In late 2012 I wrote about migrating DataSift's Hadoop cluster to Arista switches, but what I didn't mention was that we also moved our real-time systems over to Arista too.
Within the LAN
During our fact-finding trek through the Cisco portfolio we acquired a bunch of 4948 and 3750 switches, which were re-purposed into the live platform. Unfortunately, the live platform (as opposed to Hadoop-sourced historical data) would occasionally suffer performance issues: the fan-out design of our distributed architecture amplified the impact of micro-bursts during high-traffic events.
Every interaction we receive is augmented with additional metadata such as language designation, sentiment analysis, trend analysis and more. To acquire these values an interaction is tokenised into the relevant parts (e.g. a Twitter user name for a Klout score, sentences for sentiment analysis, trigrams for language analysis) and each of those tokens is dispatched to the relevant service endpoints for processing. A stream of 15,000 interactions a second can instantly become 100,000+ additional pieces of data traversing the network, which puts load on NICs, switch backplanes and core uplinks.
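To make the fan-out concrete, here is a minimal sketch of the tokenisation step. The service names and splitting rules are illustrative assumptions, not our actual implementation:

```python
# Hypothetical sketch of the augmentation fan-out described above.
# Service names and tokenise() rules are illustrative only.

def tokenise(interaction):
    """Split one interaction into (service, token) pairs."""
    tokens = []
    if "username" in interaction:
        tokens.append(("klout", interaction["username"]))      # Klout score
    for sentence in interaction.get("text", "").split(". "):
        if sentence:
            tokens.append(("sentiment", sentence))             # sentiment analysis
    words = interaction.get("text", "").split()
    for i in range(len(words) - 2):
        tokens.append(("language", " ".join(words[i:i + 3])))  # trigrams
    return tokens

interaction = {"username": "jharris",
               "text": "Arista switches are great. Low latency matters"}
jobs = tokenise(interaction)
# One short interaction already fans out into several network calls;
# multiply by 15,000/s and the amplification on NICs and uplinks is clear.
print(len(jobs))
```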
If a particular request fails then precious time is wasted waiting for the reply, processing the failure and re-processing the request. To combat this you might duplicate calls to service endpoints (speculative execution, in Hadoop parlance) and double your chances of success, but those ~100,000 streams would become ~200,000, putting further stress on all your infrastructure.
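A bare-bones sketch of that duplicate-call pattern, assuming two interchangeable replicas (the endpoint names and fake latencies are made up for illustration):

```python
# Speculative execution sketch: fire the same request at two replicas
# and take whichever answers first. Names and delays are illustrative.
import concurrent.futures
import time

def lookup(endpoint, token):
    # Stand-in for a real network call; replica "b" is the slow one here.
    time.sleep(0.05 if endpoint == "b" else 0.01)
    return (endpoint, f"result-for-{token}")

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(lookup, ep, "trigram") for ep in ("a", "b")]
    done, _ = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    winner = done.pop().result()
    # The duplicate call doubles traffic (the ~100,000 -> ~200,000 streams
    # above) in exchange for masking a slow or failed replica.
print(winner)
```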
At DataSift we discuss internal platform latency in terms of microseconds and throughput in tens of gigabits so adding an unnecessary callout here or a millisecond extra there isn’t acceptable. We want to be as efficient, fast and reliable as possible.
When we started looking at ways of improving the performance of the real-time platform it was obvious that many of the arguments that made Arista an obvious choice for Hadoop also made it ideal for our real-time system. The Arista 7050s we'd already deployed have some impressive latency statistics, so we needed little more convincing that we were on the right path (the 1.28Tbps and 960,000,000 packets-per-second figures don't hurt either). For truly low-latency switching at the edge one would normally look at the 7150 series, but in our testing the 7048s were well within the performance threshold we wanted and enabled us to standardise our edge.
We made use of our failure-tolerant platform design (detailed further below) to move entire cabinets at a time over to the Arista 7048s with no interruption of service to customers.
Once all cabinets were migrated, and with no other optimisations at that point, we saw an immediate difference in key metrics:
Simply by deploying Arista switches on our real-time network we decreased augmentation latency from ~15,000µs down to ~2,200µs. Further optimisations to our stack, and to how we leverage the Linux kernel's myriad options, have improved things even more.
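We haven't listed our exact kernel settings, but for a sense of what "the Linux kernel's myriad options" covers, a sysctl fragment like the following is a common starting point for high-throughput, low-latency networking. These values are generic examples, not our production configuration:

```ini
# Illustrative /etc/sysctl.d/ fragment; values are examples only.
net.core.rmem_max = 16777216          # max socket receive buffer (bytes)
net.core.wmem_max = 16777216          # max socket send buffer (bytes)
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 250000  # packets queued per NIC before drops
```

The right numbers depend heavily on NIC, workload and kernel version, so anything like this should be benchmarked rather than copied.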
Epic Switches are only half the story
One of the great features of the Arista 7048 switches is their deep-buffer architecture, but in certain circumstances another buffer in the path is the last thing you want: each buffer potentially adds latency to the system before the upstream can detect the congestion and react to it.
The stack needs to be free of bottlenecks to keep those buffers from filling up, and the 7048 switches can provide up to 40Gb/s of throughput to the core, which fits nicely with 40 1U servers in a 44U cabinet. That said, we're not ones to waste time and bandwidth by leaving the ToR switch if we don't have to.
With intelligent health checks and resource routing, coupled with the Aristas' non-blocking, full wire-speed forwarding, the processing servers can call out cross-rack with very little penalty in the event of a resource pool suffering failures:
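The routing logic amounts to something like the sketch below: prefer a healthy endpoint in the local rack, and fall back cross-rack when the local pool is degraded. The structure and names are assumptions for illustration, not our actual router:

```python
# Illustrative health-checked, rack-aware endpoint selection.
# Hosts, racks and the flat health flag are hypothetical.

ENDPOINTS = [
    {"host": "sentiment-01", "rack": "r1", "healthy": False},
    {"host": "sentiment-02", "rack": "r1", "healthy": False},
    {"host": "sentiment-03", "rack": "r2", "healthy": True},
]

def pick_endpoint(endpoints, local_rack):
    healthy = [e for e in endpoints if e["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy endpoints in any rack")
    # Prefer in-rack endpoints to keep traffic on the ToR switch...
    local = [e for e in healthy if e["rack"] == local_rack]
    # ...but with wire-speed forwarding, crossing racks is a small penalty.
    return (local or healthy)[0]

choice = pick_endpoint(ENDPOINTS, local_rack="r1")
print(choice["host"])
```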
That’s Great but I’m on the other side of the Internet
We are confident enough in our ability to provide low-latency, real-time, filtered and augmented content that we publish live latency statistics of a stream being consumed by an EC2 node on the other side of the planet on our status site: http://status.DataSift.com.
We can be this confident because we control and manage every aspect of our platform from influencing how data traverses the Internet to reach us, our routers, our switches all the way down to the SSD chipset or SAS drive spindle speed in the servers. (You can’t say that if you’re on someone’s public cloud!)
When you consider the factors outside of our control it speaks volumes about the trust we have in what we’ve built.
[Diagram: the end-to-end journey of an interaction]
- User to source platform (they could be next door to a social platform DC or over in Antarctica)
- Source platform processing time, e.g. the time taken for Facebook or Twitter to process an interaction & send it on (10ms – 150ms)
- Transit to DataSift (e.g. San Jose to our furthest European processing node)
- Delivery to the customer (e.g. from a European processing node to a customer in Japan)
When dealing with social data on a global scale there can be a lot of performance uncertainty: undersea fibre cuts, carrier issues and entire IX outages. But we can rest assured that once that data hits our edge we can process it with low latency and high throughput.
In conclusion I’ve once again been impressed by Arista and would whole heartedly recommend their switches to anyone else working with high volume, latency sensitive data.
Arista switches were already a joy to work with (access to bash on a switch, what's not to love?) but Gary's insights and advice make it all the better.
Arista Warrior – Gary A. Donahue
Even with all the epicness of this hardware, if you're lazy about how you treat the steps your data goes through before it becomes a frame on the switch, you're gonna have a bad time; for heavy-duty reading, the Linux TCP/IP stack book below may help.
The Linux TCP/IP Stack: Networking for Embedded Systems – Thomas F Herbert