17 February 2012

Visualising the Twitter firehose with DataSift

In February 2011 I saw a short video named Immaterials: Light painting WiFi by Timo Arnall, Jørn Knutsen and Einar Sneve Martinussen. Something about bringing part of the technical world of bits and bytes, RF and EMR into some sort of physical manifestation resonated with me.

A little while later, in June, I seized the opportunity to be a part of the Operations team at DataSift. In the first few months we did lots of techy augmentations of a traditional nature, such as installing our 42" operations dashboards and a strobe light linked to Zenoss that warns us when the platform is unhappy (SMSs just aren't cool enough). As I was working with the dashboards and looking at the sheer volume and diversity of the data that comes through DataSift, I started thinking about how this data could be presented in a far more engaging way, and then I remembered that video.

One of DataSift's available augmentations is the Klout score of an individual, and as that is a score from 0 to 100 it made a great metric for visualisation. A plan was born: visualise the Klout scores of individuals tweeting about a particular topic in a 'physical' and engaging way.

To achieve my goal I first needed a way to consume the DataSift data and drive the physical medium. For this I settled on an ARM SBC: the TS-7553 from Technologic Systems, retailing at around $250 (with import tax etc. it came to about £320!). This is a headless 250MHz ARM computer with only 64MB of RAM, but it has a 5000 LUT Lattice FPGA, 10/100 Ethernet and, best of all, can run a native 2.6 Linux kernel, all whilst consuming only 642mA at 4.98 volts!

To provide the visualisations I opted for a 10 x 2 grid of 8000 mcd blue LEDs, which are driven by the 12mA FPGA DIO and transistors. This is all housed in a 2mm acrylic sheet:

With the basic 'core' elements completed I turned to writing a hacky little embedded C++ program that I have nicknamed DataSift Embedded (or just DSE). The code is freely available on GitHub under the GPLv3 license: https://github.com/NetworksAreMadeOfString/DSE-C.

Once compiled, the DSE binary allows you to manually turn an LED pair on or off by designating its pin (specified as constants at the top of dse.h) and its mode of either 1 or 0. Its primary mode of operation, however, is to start up in 'daemon' mode, where it utilises the awesome CURL library to connect to stream.datasift.com and subscribe to the DataSift hash with your credentials as specified in /etc/dse.conf.

Once connected to the stream it then does a very, very dirty hack to try and find the Klout score and, if successful, passes it to the ProcessScore() function:

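// Copy 11 characters (indices 0-10) because temp[10] is read below;
// temp must be at least 11 bytes long
strncpy(temp,kloutPointer,11);

// The first character of the score is taken from index 7
strScore[0] = temp[7];

// Copy further characters unless they are a JSON delimiter
if(temp[8] != ',' && temp[8] != '}')
	strScore[1] = temp[8];

if(temp[9] != ',' && temp[9] != '}')
	strScore[2] = temp[9];

if(temp[10] != ',' && temp[10] != '}')
	strScore[3] = temp[10];

intScore = atoi(strScore);

if(debug == 1)
	printf("Found a Klout Score of %i\n",intScore);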
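
The fixed-offset lookup above only works when the score starts exactly seven characters after the pointer. A slightly more defensive take might look like the sketch below. It is not what DSE does: parse_klout_score() is a hypothetical helper, and it assumes the stream text contains the literal key "score": followed by an integer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of DSE: returns the Klout score found in
 * json, or -1 if no score in the range 0-100 is present. Assumes the
 * text contains the literal key "score": followed by an integer. */
int parse_klout_score(const char *json)
{
    static const char key[] = "\"score\":";
    const char *p = strstr(json, key);
    char *end;
    long value;

    if (p == NULL)
        return -1;                 /* key not found */

    p += sizeof(key) - 1;          /* skip past the key itself */
    value = strtol(p, &end, 10);   /* strtol skips leading whitespace */

    if (end == p || value < 0 || value > 100)
        return -1;                 /* no digits, or out of range */

    return (int)value;
}
```

Letting strtol() find the end of the number removes the need to check each character for ',' or '}' by hand, and the -1 return gives the caller a way to skip tweets that carry no score.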

ProcessScore() is again quite simple, using a lot of if / else if statements to evaluate the score and then configure the FPGA:

else if(Score >= 20 && Score < 30)
{
	setdiopin(lvl2,1);
	setdiopin(lvl3,1);
	setdiopin(lvl4,0);
	setdiopin(lvl5,0);
	setdiopin(lvl6,0);
	setdiopin(lvl7,0);
	setdiopin(lvl8,0);
	setdiopin(lvl9,0);
	setdiopin(lvl10,0);
}
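
The if / else if chain could also be collapsed into a loop. The sketch below is one possible way to do that and not the code in DSE: set_level() is a hypothetical stand-in for setdiopin() that writes to an array so it runs without the board, and the banding (one extra level per started band of ten) is an assumption inferred from the single branch shown above.

```c
#define NUM_LEVELS 10

/* Hypothetical stand-in for DSE's setdiopin(): records the requested
 * state in an array instead of touching the FPGA. */
int level_state[NUM_LEVELS];

void set_level(int level, int on)   /* level is 1..NUM_LEVELS */
{
    level_state[level - 1] = on;
}

/* Number of levels to light for a score, clamped to 1..NUM_LEVELS.
 * Assumed banding: 0-9 lights one level, 10-19 two, 20-29 three, ... */
int levels_for_score(int score)
{
    int lit;

    if (score < 0)
        score = 0;
    lit = score / 10 + 1;
    if (lit > NUM_LEVELS)
        lit = NUM_LEVELS;
    return lit;
}

/* Turn on every level up to the computed count and turn the rest off. */
void show_score(int score)
{
    int lit = levels_for_score(score);
    int level;

    for (level = 1; level <= NUM_LEVELS; level++)
        set_level(level, level <= lit);
}
```

With this banding, show_score(25) lights levels 1 to 3 and clears levels 4 to 10, which matches the lvl2 / lvl3 on, lvl4 onwards off pattern of the branch above.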

This is where things get a little bit more interesting. For example, the pin-outs have logical numbering as seen below, but these may not map to the same registers in the FPGA, so you have to calculate some offsets:

    ______________________________________ 
   | 2  4  6  8 10 12 14 16 18 20 22 24 26|
 * | 1  3  5  7  9 11 13 15 17 19 21 23 25|
   \--------------------------------------/

dirPinOffSet = pin - 33; // -37 + 4 = Direction; -37 + 8 = Output
outPinOffSet = pin - 29;

// set bit [outPinOffSet] to [val] of register [0x66]
if(val)
	sbus_poke16(0x66, (sbus_peek16(0x66) | (1 << outPinOffSet)));
else
	sbus_poke16(0x66, (sbus_peek16(0x66) & ~(1 << outPinOffSet)));

// Make the specified pin into an output in direction bits
sbus_poke16(0x66, sbus_peek16(0x66) | (1 << dirPinOffSet));
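
To see what the offsets do without a board to hand, the same read-modify-write pattern can be tried against a plain variable. This is only a sketch: fake_reg, fake_peek16() and fake_poke16() are hypothetical stand-ins for register 0x66 and the real sbus calls, and the range guard is an addition of mine to keep the shifts well defined.

```c
#include <stdint.h>

/* Simulated 16-bit register standing in for FPGA register 0x66.
 * These fake_* helpers are hypothetical; on the real board the reads
 * and writes go through sbus_peek16() and sbus_poke16(). */
uint16_t fake_reg;

uint16_t fake_peek16(void)    { return fake_reg; }
void fake_poke16(uint16_t v)  { fake_reg = v; }

/* Same offset arithmetic as the snippet above: the direction bit sits
 * at pin - 33 and the output bit at pin - 29. */
void set_pin(int pin, int val)
{
    int dir_bit = pin - 33;
    int out_bit = pin - 29;

    /* Guard (not in the original): ignore pins whose bits would fall
     * outside the 16-bit register. */
    if (dir_bit < 0 || out_bit > 15)
        return;

    if (val)
        fake_poke16(fake_peek16() | (uint16_t)(1u << out_bit));
    else
        fake_poke16(fake_peek16() & (uint16_t)~(1u << out_bit));

    /* Mark the pin as an output in the direction bits. */
    fake_poke16(fake_peek16() | (uint16_t)(1u << dir_bit));
}
```

Starting from a cleared register, set_pin(37, 1) sets output bit 8 and direction bit 4, leaving the register at 0x0110. That lines up with the "-37 + 4" and "-37 + 8" note in the comment.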

sbus_peek16() is part of the ts7500ctl API; however, the guys at Technologic were kind enough to allow me to utilise the source and distribute it as part of DSE. Please note that sbus.c and sbus.h are Copyright © Technologic Systems and can *only* be used with one of their SBCs.

One cool bit I'd like to point out is the use of asm volatile:

asm volatile (
"mov %0, %1, lsl #18\n"
"orr %0, %0, #0x800000\n"
"orr %0, %0, %2, lsl #3\n"
"3: ldr r1, [%3, #0x64]\n"
"cmp r1, #0x0\n"
"bne 3b\n"
"2: str %0, [%3, #0x50]\n"
"1: ldr r1, [%3, #0x64]\n"
"cmp r1, #0x0\n"
"beq 1b\n"
"ldr %0, [%3, #0x58]\n"
"ands r1, %0, #0x1\n"
"moveq %0, #0x0\n"
"beq 3b\n"
: "+r"(dummy) : "r"(adr), "r"(d), "r"(cvspiregs) : "r1","cc"
);

Now that the code was written (and compiled) I started putting it all together. After ordering all sorts of weird and wonderful components from eBay, such as 2N3904 NPN TO-92 transistors, 20 way IDC sockets and resistors of various values, I uploaded the first demo to YouTube in September 2011:

Things were put on the back burner for a while as the entire team at DataSift was gearing up for the full launch of the platform on the 16th of November. There wasn't any time for playing with finicky micro-electronics and annoying ARM vs x86 issues because we were building a platform that, compared to this little 250MHz ARM CPU, has over 936 2.8GHz cores: http://highscalability.com/blog/2011/11/29/datasift-architecture-realtime-datamining-at-120000-tweets-p.html

Once things had calmed down I returned to the project and started with the external housing. I chose an aluminium cylinder 120mm wide and 1000mm tall, with the intention of drilling a series of holes in the front through which the LEDs would shine:

In total I drilled through 315mm of aluminium! A quick light test using 3x 10000 mcd red LEDs pointing upwards looks quite nice:

Since the aluminium is simple industrial grade stuff it already had some serious scratches, so I wanted to try and soften them, but I opted not to polish it as it would appear too 'cold', especially with the blue LEDs. I instead chose to try and brush the aluminium. A quick Google search will show that this is not exactly an easy task, so I set about creating a tool of awesome.

Using the oven (don't tell @AnimalFuzzyHead !) and some thermosetting acrylic I made a sandpaper tool that was moulded to the shape of the cylinder, therefore (hopefully) keeping the brush marks straight:

I think it worked out OK:

Some final touches were to add some foam to the back of the clear acrylic and to hot glue the power / signal leads for the LEDs, anchoring them to prevent any strain during transport.

Final testing before sealing the unit up with some frosted acrylic caps on each end:

Final Device:

I'm not an artist, a programmer or a DIY materials man, so if someone can make something that turns a DataSift stream into a tangible visualisation (or anything really) I'd love to hear about it, put it in our DataSift Devices corner and tell the world about it!