PhantomSQL 2.0

Since the beginning of this project, I have learned many lessons.

First of all, the syntax of the language was too closely tied to SQL, as this was the language that I wanted to mimic and fit into this framework. The main reason was that I wanted something familiar to the end user, hoping for easier adoption. Secondly, there was a lack of Web 2.0+ features. They were not thought out in the beginning, and shoehorning them in made no sense and complicated the language even more.

Example : Grab data from woot (v1)

foreach @srcUrl : [,,,]
	set @doc = select 
			xpaths("//*[@class='lightBox' and @rel='sale']/img/@src"),			
		 from @srcUrl
	set @images = @document.xpaths("//*[@class='lightBox' and @rel='sale']/@href")
	set @first = @doc[0]
	set @price = @first[0]	
	set @title = @first[1]
	set @image = @first[2]	
	set @wootoff = @first[3].split(":")
	set @wootoff = @wootoff[1].replace("%", "")

	insert  products (@srcUrl, @title, @price, @wootoff, @image, @images)

This introduced a couple of problems. First of all, we were constrained by the original language structure. Let’s take a simple snippet:

SET @price = @FIRST[0]

It might not look like much, but this type of declaration is very verbose and could easily be dropped from the syntax.

Another drawback was the lack of extensibility. Many of the basic functions were hardcoded into the parser, so adding any new language extension depended on recreating the AST. Only later were ‘dynamic’ methods added, but again they were an afterthought and seemed more like a hack.

There was no way to reuse existing libraries, which meant I had to reimplement features that someone else had already built. This only became apparent to me when I needed to run an MD5 checksum on one of the downloaded files.

Another issue for me was that there was no good support for collections (List, Set, Dictionary). There was basic support for ‘list’ and, later, ‘dictionary’, but again it was not thought out and missed many features.

As we can see, there were quite a few design issues that were not thought out in the initial version. Many of these things were learned through running the system on real-world problems. As long as we are able to take this and adapt, I think we will be in good shape.


  • Reworking language syntax
    • Based on modern languages / frameworks (ECMAScript / Python / Ruby / SQL / Node.js)
    • Web 2.0 (support for extracting data from dynamic websites)
  • Distributed processing (will run on a distributed clustering framework)
  • Package manager (PhantomSQL Package Manager (PPM))


Run MD5 checksum against all files in a directory

A couple of snippets that let us run checksums and extract the unique MD5 values.

This is a two-step process. First, we compute the MD5 checksum of every file

find . -type f -exec md5sum "{}" + > /opt/checklist.chk

This produces a file with the following contents

71cc452a8ac5a27c32a83e6a0909e7ae  ./PID_190_7344_0_47710322.tif6712032974632727465.tiff
71cc452a8ac5a27c32a83e6a0909e7ae  ./PID_190_7344_0_47710322.tif174464329785828524.tiff
71cc452a8ac5a27c32a83e6a0909e7ae  ./PID_190_7344_0_47710322.tif6775939766281585264.tiff
71cc452a8ac5a27c32a83e6a0909e7ae  ./PID_190_7344_0_47710322.tif7205305688614612348.tiff
71cc452a8ac5a27c32a83e6a0909e7ae  ./PID_190_7344_0_47710322.tif3909999865608008175.tiff

Next, we parse the file and keep only the unique checksums.

awk '!seen[$1]++ {print $1}' /opt/checklist.chk

This produces our distinct checksums
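
For comparison, both steps can be done in one pass from Java. This is only a minimal sketch and not a replacement for the shell commands above: the class name UniqueChecksums and its methods are made up for illustration, and it assumes each file is small enough to read fully into memory. Unlike the awk version, the result comes back sorted rather than in first-seen order.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of any library.
final class UniqueChecksums {

    // Hex-encoded MD5 of the given bytes, zero-padded to 32 characters as md5sum prints it.
    static String md5Hex(final byte[] data) throws NoSuchAlgorithmException {
        final MessageDigest md = MessageDigest.getInstance("MD5");
        return String.format("%032x", new BigInteger(1, md.digest(data)));
    }

    // Walks 'root' recursively (regular files only) and returns the distinct checksums, sorted.
    static Set<String> uniqueMd5(final Path root) throws IOException, NoSuchAlgorithmException {
        final List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        final Set<String> unique = new TreeSet<>();
        for (final Path file : files) {
            unique.add(md5Hex(Files.readAllBytes(file)));
        }
        return unique;
    }

    public static void main(final String[] args) throws Exception {
        // Defaults to the current directory when no argument is given.
        uniqueMd5(Paths.get(args.length > 0 ? args[0] : ".")).forEach(System.out::println);
    }
}
```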


Random number between two values

This is a small utility class that returns a random number uniformly distributed between ‘low’ and ‘high’. It works for floats, doubles and integers.

The inner workings of this class are straightforward: our uniform(int, int) method uses the nextInt(int) method of the Random class, which already lets us pass an (exclusive) upper bound. Float and double work by obtaining a value in the range [0.0, 1.0) and then scaling it to fit between ‘low’ and ‘high’.

As this is meant for use in a multithreaded environment, I am using java.util.concurrent.ThreadLocalRandom rather than java.util.Random for performance reasons.

import java.util.concurrent.ThreadLocalRandom;

public class RandomUtil {
    // Uniform int in [low, high); requires low < high
    public static int uniform(final int low, final int high) {
        final ThreadLocalRandom rand = ThreadLocalRandom.current();
        return rand.nextInt(high - low) + low;
    }

    // Scales a value from [0.0, 1.0) into the range low..high
    public static float uniform(final float low, final float high) {
        final ThreadLocalRandom rand = ThreadLocalRandom.current();
        return rand.nextFloat() * (high - low) + low;
    }

    // Scales a value from [0.0, 1.0) into the range low..high
    public static double uniform(final double low, final double high) {
        final ThreadLocalRandom rand = ThreadLocalRandom.current();
        return rand.nextDouble() * (high - low) + low;
    }

    public static double nextDouble() {
        final ThreadLocalRandom rand = ThreadLocalRandom.current();
        return rand.nextDouble();
    }

    public static boolean nextBoolean() {
        final ThreadLocalRandom rand = ThreadLocalRandom.current();
        return rand.nextBoolean();
    }
}

Procedural lightning effect in Unity


This is a basic tutorial on how to create a procedural lightning effect in Unity. This is my first attempt at using Unity, so if you think there are bugs, issues or better ways of doing things, let me know.

I am using a two-step process

1) Generator
2) Renderer

I like to have them separated for a couple of different reasons, but mainly so that I can render them differently and create different lightning-like effects.

The generator is responsible for generating segments and the renderer is responsible for rendering segments to the screen.

My original version used a LineRenderer, but I decided to go with a Mesh/MeshFilter for rendering as that gives me more control. Each segment creates a new quad that is added to our mesh.

Results after the first pass (after 1 generation): [images: light-001, light-002, light-003]

Current status: [images: light-mesh-002, light-mesh-003, light-mesh-004]


Segment class

using UnityEngine;

public class Segment
{
    public Vector2 start;
    public Vector2 end;
    public int generation;

    public Segment(Vector2 start, Vector2 end) : this(start, end, 0)
    {
    }

    public Segment(Vector2 start, Vector2 end, int generation)
    {
        this.start = start;
        this.end = end;
        this.generation = generation;
    }
}
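
The generator itself is not listed here. One common way to produce such segments is midpoint displacement: every generation splits each segment at its midpoint and pushes that midpoint sideways by a random amount. The sketch below assumes that technique, which may well differ from what my actual generator does. It is written in plain Java rather than Unity C#, and LightningSketch, Seg and generate are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of midpoint displacement; not the Unity generator itself.
final class LightningSketch {

    // Plain-Java stand-in for the Segment class above (no Unity types).
    static final class Seg {
        final double x1, y1, x2, y2;
        final int generation;

        Seg(double x1, double y1, double x2, double y2, int generation) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
            this.generation = generation;
        }
    }

    // Splits every segment once per generation, offsetting the midpoint along the segment's normal.
    static List<Seg> generate(Seg root, int generations, double maxOffset, Random rnd) {
        List<Seg> current = new ArrayList<>();
        current.add(root);
        double offset = maxOffset;
        for (int g = 1; g <= generations; g++) {
            List<Seg> next = new ArrayList<>();
            for (Seg s : current) {
                double dx = s.x2 - s.x1, dy = s.y2 - s.y1;
                double len = Math.hypot(dx, dy);
                // Unit normal; a zero-length segment gets no displacement.
                double nx = len == 0 ? 0 : -dy / len;
                double ny = len == 0 ? 0 : dx / len;
                double shift = (rnd.nextDouble() * 2 - 1) * offset;
                double mx = (s.x1 + s.x2) / 2 + nx * shift;
                double my = (s.y1 + s.y2) / 2 + ny * shift;
                next.add(new Seg(s.x1, s.y1, mx, my, g));
                next.add(new Seg(mx, my, s.x2, s.y2, g));
            }
            current = next;
            offset /= 2; // smaller jitter on finer generations
        }
        return current;
    }
}
```

Each generation doubles the number of segments while keeping the chain connected, so the renderer can walk the list and emit one quad per segment.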

Generate ID from UUID

This is a method to generate a long ID in the positive space.

There are a few issues to consider with this method
– UUID is 16 bytes / 128 bits
– Long is 8 bytes / 64 bits

This means that we will lose some information. If we don’t want to lose it we could use a BigInteger, but in this case we are dealing with longs.

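    /**
     * Generate unique ID from UUID in positive space
     * @return long value representing UUID
     */
    private Long generateUniqueId() {
        long val = -1;
        do {
            final UUID uid = UUID.randomUUID();
            final ByteBuffer buffer = ByteBuffer.wrap(new byte[16]);
            // longValue() keeps the low-order 64 bits, i.e. the bytes written last.
            // The least significant half of a random UUID always has its top bit set.
            buffer.putLong(uid.getLeastSignificantBits());
            buffer.putLong(uid.getMostSignificantBits());
            final BigInteger bi = new BigInteger(buffer.array());
            val = bi.longValue();
        } while (val < 0);
        return val;
    }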

This works simply by creating a new BigInteger from the parts of the UUID object and then getting the longValue. We also make sure that the ID is in positive space; if it’s not, we simply repeat the process. During testing most cases completed in one iteration, but I did encounter a few runs that took four iterations.
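
If the retry loop is unwanted, one possible alternative is to clear the sign bit instead of repeating the process. This is a sketch of a different technique, not the method above: UuidIds and positiveId are hypothetical names, and like the original it still throws away more than half of the UUID, so collisions remain possible.

```java
import java.util.UUID;

// Hypothetical helper for illustration only.
final class UuidIds {

    // Non-negative long derived from a random UUID; masks off the sign bit instead of retrying.
    static long positiveId() {
        return UUID.randomUUID().getMostSignificantBits() & Long.MAX_VALUE;
    }

    public static void main(final String[] args) {
        System.out.println(positiveId());
    }
}
```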

Configuring Java JDK on Ubuntu

This is an easy way to configure Java on a Linux box. All of this information is available online.

First we need to obtain the build.

sudo wget

Extract the archive

sudo tar xzvf jdk-7u75-linux-x64.gz

Create a symbolic link so we can update the version later

sudo ln -s /opt/jdk1.7.0_75/ /opt/java

Edit /etc/profile and add the following two lines

export JAVA_HOME=/opt/java
export PATH=$JAVA_HOME/bin:$PATH

Finally, we ‘source’ the file

source /etc/profile

At this point you should be ready to go. We can verify this by executing

java -version
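
To double-check which installation a program actually runs on, a tiny class can print the runtime's own view. This is just a sketch; JavaEnvCheck is a made-up name, and the exact value of java.home depends on the JDK version and layout.

```java
// Hypothetical helper for illustration only.
final class JavaEnvCheck {

    // Describes the runtime that actually executed this class.
    static String describe() {
        return "java.version=" + System.getProperty("java.version")
                + ", java.home=" + System.getProperty("java.home");
    }

    public static void main(final String[] args) {
        System.out.println(describe());
        // JAVA_HOME is only visible here if the profile was sourced in the launching shell.
        System.out.println("JAVA_HOME=" + System.getenv("JAVA_HOME"));
    }
}
```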

Grepping for multiple strings in a file

We will use egrep (equivalent to grep -E), which accepts an extended regular expression, to grep for multiple strings.

tail -f localhost_access_log.txt | egrep "\" 404|\" 500"

Here our example watches the access log for requests that returned a 404 or 500 status.

 "GET /favicon.ico HTTP/1.1" 404 973
 "GET /login.html  HTTP/1.1" 500 1230
 "GET /favicon.ico HTTP/1.1" 404 973

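The same alternation can be applied inside an application when the log lines are already in memory. A minimal sketch, assuming the lines arrive as a list of strings; StatusFilter and errors are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only.
final class StatusFilter {

    // Same alternation as the egrep pattern: a closing quote, a space, then 404 or 500.
    private static final Pattern ERROR_STATUS = Pattern.compile("\" (404|500)");

    // Keeps only the lines whose status field is 404 or 500.
    static List<String> errors(final List<String> lines) {
        return lines.stream()
                .filter(line -> ERROR_STATUS.matcher(line).find())
                .collect(Collectors.toList());
    }
}
```
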
Starting Jetty via command line and nohup

For some reason I am having problems starting Jetty via

service jetty start

We will be using a Unix command called nohup
“Nohup is a unix command, used to start another program, in such a way that it does not terminate when the parent process is terminated.”

I have opted for using this

nohup java -jar start.jar -Djetty.port=8085

While this works, it shows a message

nohup: ignoring input and appending output to `nohup.out'

To fix that we need to redirect input and output to /dev/null

nohup java -jar start.jar -Djetty.port=8085 < /dev/null > /dev/null 2>&1 &

Taking heap dump of java process on linux and windows

Taking a heap dump from the console when Java VisualVM and JMX are not available to us.
We will use the following tools

    • jmap
    • jps
    • ps

Dumping the heap requires two steps
1) Obtaining the target process id
2) Dumping the heap for the given pid

First we need to obtain the id of the process we would like to dump. Here I will show a couple of ways I like to use.

ps aux | grep 'java'
userx     29901  6.7 47.0 25418812 3848276 ?    Sl   Mar23  85:42 /opt/java/bin/java -Djava.util.logging.config.

Here the second column indicates our process id (pid).

A second method that is quite useful for obtaining the pid of Java processes

userx@WS4:/opt/java/bin# ./jps -l
29901 org.apache.catalina.startup.Bootstrap

As we can see, both methods returned a pid of 29901.
Now, to perform the dump, we issue our second command

userx@WS4:/opt/java/bin# ./jmap -dump:format=b,file=/tmp/heapdump-001.hprof 29901
Dumping heap to /tmp/heapdump-001.hprof ...

At this point we have our heap dump ready to be analyzed. For my analysis I use two tools: Eclipse Memory Analyzer (MAT) and Java VisualVM.
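
When you control the application code, a dump can also be triggered from inside the JVM instead of from the console. The sketch below is a different route from the jmap steps above and assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean interface is available. SelfHeapDump is a made-up name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

// Hypothetical helper for illustration only; HotSpot-specific.
final class SelfHeapDump {

    // Writes an .hprof dump of the current JVM; live = true dumps only reachable objects.
    // The target file must not already exist.
    static void dump(final Path target, final boolean live) throws IOException {
        final HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), live);
    }
}
```

The resulting file can be opened in the same tools as a dump produced by jmap.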