Since the beginning of this project, I have learned many lessons.
First of all, the syntax of the language was too closely tied to SQL, the language I wanted to mimic and fit into this framework. The main reason was that I wanted something familiar to the end user, hoping for easier adoption. Secondly, there was a lack of Web 2.0+ features. They were not thought out in the beginning, and shoehorning them in later made little sense and complicated the language even more.
Example: Grab data from woot (v1)
foreach @srcUrl : [http://www.woot.com/, http://home.woot.com/, http://shirt.woot.com/, http://kids.woot.com/]
begin
    set @doc = select
        xpaths("//*[@class='amount']/text()"),
        xpaths("//h2[@class='fn']/text()"),
        xpaths("//*[@class='lightBox' and @rel='sale']/img/@src"),
        xpaths("//*[@class='wootOffProgressBarValue']/@style")
    from @srcUrl
    set @images = @document.xpaths("//*[@class='lightBox' and @rel='sale']/@href")
    set @first = @doc
    set @price = @first
    set @title = @first
    set @image = @first
    set @wootoff = @first.split(":")
    set @wootoff = @wootoff.replace("%", "")
    insert products (@srcUrl, @title, @price, @wootoff, @image, @images)
end
This introduced a couple of problems. First of all, we were constrained by the original language structure. Take this simple snippet:
SET @price = @FIRST
It might not look like much, but this type of declaration is very verbose, and the SET keyword could easily be removed from the syntax.
Another drawback was the lack of extensibility. Many of the basic functions were hardcoded into the parser, so adding any new language extension meant rebuilding the AST. Only later were ‘dynamic’ methods added, but again they were an afterthought and feel more like a hack.
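To illustrate the alternative to hardcoding functions in the parser, here is a minimal sketch of a function registry in Python. It is not how the original system was built; the names FUNCTIONS, register, and call are hypothetical and exist only for this example. The idea is that the parser only has to emit a generic "call this name with these arguments" node, and new functions are added by registering them.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a function name used in a script
# to the Python callable that implements it.
FUNCTIONS: Dict[str, Callable[..., object]] = {}


def register(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(fn):
        FUNCTIONS[name] = fn
        return fn
    return decorator


@register("replace")
def replace(value: str, old: str, new: str) -> str:
    return value.replace(old, new)


@register("split")
def split(value: str, sep: str) -> list:
    return value.split(sep)


def call(name: str, *args):
    """Look up a function by name at run time and invoke it."""
    try:
        fn = FUNCTIONS[name]
    except KeyError:
        raise NameError(f"unknown function: {name}") from None
    return fn(*args)
```

With something like this, adding a new function means writing one decorated Python function, and the grammar and AST stay untouched.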
There was no way to reuse existing libraries, which means I had to reimplement features that someone else had already written. This only became apparent to me when I needed to run an MD5 checksum on one of the downloaded files.
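For comparison, this is roughly what the MD5 case looks like in a language where existing libraries can be reused. It is a small sketch using Python's standard hashlib module; the function name md5_of_file and the chunk size are my own choices for illustration, not part of the original system.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of the file at `path`.

    The file is read in chunks so large downloads do not have to
    fit in memory.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The whole feature is a few lines because the hashing itself already exists in the standard library, which is exactly what the first version of the language could not take advantage of.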
Another issue for me was the lack of good support for collections (List, Set, Dictionary). There was basic support for ‘list’ and, later, ‘dictionary’, but again it was not thought out and missed many features.
As we can see, quite a few design issues were not thought out in the initial version. Many of these things were learned through running the system on real-world problems. As long as we are able to take these lessons and adapt, I think we will be in good shape.
- Reworking language syntax
- Based on modern languages / frameworks (ECMAScript / Python / Ruby / SQL / Node.js)
- Web 2.0 (Support for extracting data from dynamic websites)
- Distributed Processing (Will run on a distributed clustering framework)
- Package Manager (PhantomSQL Package Manager, PPM)