13 September 2014

Why I Built an Anti-Censorship Proxy

The Net interprets censorship as damage and routes around it.
- John Gilmore

In the next few days https://RoutingPacketsIsNotACrime.uk is going to go live. This blog post is about why I created it.

The Why

The Internet has always faced pushes for censorship. Early Usenet administrators attempted to censor certain discussions, but people fought back, and their efforts led to the quote from John Gilmore above.

Eleven years later BT deployed CleanFeed across its network. At the time it was difficult to argue against a filter whose sole purpose was the prevention of accidental (and especially purposeful) access to child abuse material, but the ramifications were obvious.

Fast forward another decade and we're in the envisioned nightmare: High Court judges order that CleanFeed be used for censoring whatever a lawyer demands, and Claire Perry has forced ISPs to introduce even more filtering in people's homes, in coffee shops and in restaurants. Most recently, the City of London Police's PIPCU started criminalising those who help others bypass some of these filters.

I've tried writing to my MP about Internet censorship, but they didn't believe there was a risk of over-blocking or feature creep. When I moved to a new constituency I wrote to that MP too, but they were not interested.

I tried debating (albeit over Twitter) with MP Claire Perry, the main proponent of ISP filtering, but to no avail. I wrote to her then boss, but he believed the claims regarding the infallibility of filters and the danger the Internet presented to children, so he brushed me off.

Having given up on trying to explain in writing to MPs that these filters over-block, I helped create blocked.org.uk to highlight the issues with over-blocking. Some people cared and complained to the ISPs, but it wasn't enough to cause change (although, amusingly, I'm led to believe that the ISPs were pretty annoyed that we were making them look bad).

Other people started noticing that their blogs, businesses and websites were blocked, so they too complained to the ISPs, but as they weren't customers they were ignored or fobbed off.

I donated money to create the Department of Dirty video and got a t-shirt that proved somewhat popular on Twitter;

I've also donated money to help the OpenRightsGroup bring transparency to the process of using Court Orders to censor the Internet;

I've been running Tor nodes for as long as I can remember (though I only recently started naming them after my domain namesake) and have written a minor piece of software for the Tor ecosystem (a very, very minor piece of software, but FOSS all the same, and with ~2500 active users).

I created and maintain https://SurviveTheClairePerryInter.net, which documents Internet censorship issues as well as providing education on Internet censorship circumvention methods.

But when PIPCU arrested the operator of Immunicity I realised that the slippery slope of Internet censorship was at risk of becoming a dangerously sharp precipice.

People can choose ISPs that don't filter their Internet connection (such as Andrews and Arnold), but that choice is taken away when they are locked into a contract and the ISP forces filters upon them. So I decided to build something that empowers people to choose how to route their packets regardless of the ISP they are currently using.

Unlike Immunicity et al, I'm not specifically building a "Pirate Proxy". Granted, people might use this proxy to navigate to torrent websites, but were I to sell a laptop on eBay the buyer might use it for the same reasons, so I see no difference. In fact, section 44, subsection 2 of the Serious Crime Act 2007 even states that "he is not to be taken to have intended to encourage or assist the commission of an offence merely because such encouragement or assistance was a foreseeable consequence of his act".

We know that Internet filters are fallible, and we know that people are having difficulty reaching legitimate websites to find information about sexual health, for trauma support, or to find peers. If I can help these people whilst at the same time showing MPs, ISPs and the Police that John Perry Barlow's Declaration of the Independence of Cyberspace still applies, then am I not morally obligated to do so?

I address you with no greater authority than that with which liberty itself always speaks. I declare the global social space we are building to be naturally independent of the tyrannies you seek to impose on us. You have no moral right to rule us nor do you possess any methods of enforcement we have true reason to fear.
- A Declaration of the Independence of Cyberspace 1996

The How

The platform is quite simple and is effectively just two parts: the PAC-serving front-end and the proxy back-ends.

Before we delve into the operation of RoutingPacketsIsNotACrime.uk I should probably explain what a PAC file is.

The proxy auto-config (PAC) file format was originally designed by Netscape in 1996 for Netscape Navigator 2.0. A PAC file is a text file that defines at least one JavaScript function, FindProxyForURL(url, host), which takes two arguments: url, the URL of the object being fetched, and host, the hostname derived from that URL.

A very simple example of a PAC file is:

function FindProxyForURL(url, host)
{
  return "PROXY proxy.example.com:8080; DIRECT";
}

This function instructs the browser to retrieve all pages through the proxy on port 8080 of the server proxy.example.com. Should that proxy fail to respond, the browser contacts the website directly, without using a proxy.

Using RoutingPacketsIsNotACrime.uk, people can easily create a personalised PAC file that only routes the URLs they specify via the back-end proxies; all other URLs are routed normally.
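A personalised PAC file of that kind presumably looks something like the minimal sketch below. The domains and proxy hostnames here are placeholders, not the real back-end addresses, and the actual generator may instead use PAC helpers such as dnsDomainIs():

```javascript
// Sketch of a generated, personalised PAC file. Only the listed domains
// (and their subdomains) go via the proxies; everything else is direct.
var proxied = ["example-blocked.com", "another-blocked.org"];

function FindProxyForURL(url, host) {
  for (var i = 0; i < proxied.length; i++) {
    var d = proxied[i];
    // Match the domain itself or any subdomain of it.
    if (host === d || host.substr(host.length - d.length - 1) === "." + d) {
      // Try the back-end proxies in order, falling back to a direct connection.
      return "PROXY proxy1.example.net:3128; PROXY proxy2.example.net:3128; DIRECT";
    }
  }
  // Everything else is routed normally.
  return "DIRECT";
}
```

Note the trailing DIRECT in the proxy string: even for listed domains, the browser will fall back to a direct connection if all the proxies are unreachable.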

Creating a personalised PAC file is as simple as pasting a comma-separated list of domains into the /create/ endpoint of RoutingPacketsIsNotACrime.uk.
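For illustration only, building that submission client-side might look like the sketch below. The field names (domains, name, password) are my assumptions, not a documented API:

```javascript
// Hypothetical sketch of the form body submitted to the /create/ endpoint.
// Field names are assumptions; the real endpoint may differ.
function buildCreateBody(domains, name, password) {
  // The domain list is sent as a single comma-separated value.
  return "domains=" + encodeURIComponent(domains.join(",")) +
         "&name=" + encodeURIComponent(name) +
         "&password=" + encodeURIComponent(password);
}
```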

Users can optionally choose a friendly name, a description and a password (so only they can add or remove URLs) for their PAC file. When they click Create the PAC file is generated and the back-end synchronisation process starts.

Whilst the back-end processes are running, the user is redirected to the /view/ endpoint of their PAC file. This endpoint shows basic metadata about their PAC file, including the PAC URL that will need to be added to their browser, and more.

All the URLs in the PAC file are listed and are checked against the OpenRightsGroup Blocked.org.uk database to display whether these URLs are currently being blocked by UK ISPs.

Anyone with the password can add or remove URLs from this PAC file with a single click. Other users can choose to clone the PAC file which creates a new copy without affecting the original.

All /view/ endpoints have their own Disqus comments section at the bottom for feedback (imagine a community built PAC file with a few moderators to maintain the URLs).

The proxies maintain two sets of ACLs. The first is a URL whitelist derived from the normalised URLs present in the PAC configs; it is synchronised from the front-end servers every 60 seconds over SSL with 4096-bit keys and Perfect Forward Secrecy.
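The real normalisation code isn't published, so as an assumption, "normalised" probably means something like the following: reduce whatever the user pasted to a bare, lowercase hostname.

```javascript
// Sketch of one plausible normalisation: strip the scheme, path, port and
// trailing dot, lowercase the host, and fold "www." onto the bare domain.
// This is an illustration, not the service's actual implementation.
function normaliseUrl(entry) {
  var host = entry.replace(/^[a-z]+:\/\//i, "").split("/")[0];
  host = host.split(":")[0].toLowerCase();
  if (host.charAt(host.length - 1) === ".") host = host.slice(0, -1);
  return host.replace(/^www\./, "");
}
```

Normalising this way means "HTTP://WWW.Example.COM:8080/page" and "example.com" collapse to the same whitelist entry, so the proxy-side ACL stays small and duplicate-free.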

The second ACL uses BGP to map the subnets routed by the ASNs of the UK ISPs known to filter their customers' Internet connections. This is done to reduce load on the proxies, as people without filtered Internet connections don't need the service (and it cuts down on abuse by malicious bots, etc.).
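The core test behind that second ACL can be sketched as follows. The subnets below are documentation placeholders; the real list would be derived from the BGP announcements of the relevant ASNs.

```javascript
// Sketch: is an IPv4 client address inside one of the subnets announced by a
// filtering ISP's ASN? Subnets are placeholder documentation ranges.
function ipToInt(ip) {
  return ip.split(".").reduce(function (acc, octet) {
    return (acc * 256) + parseInt(octet, 10);
  }, 0);
}

function inCidr(ip, cidr) {
  var parts = cidr.split("/");
  var hostBits = 32 - parseInt(parts[1], 10);
  // Two addresses share a prefix iff they agree once the host bits are dropped.
  return Math.floor(ipToInt(ip) / Math.pow(2, hostBits)) ===
         Math.floor(ipToInt(parts[0]) / Math.pow(2, hostBits));
}

var filteredSubnets = ["203.0.113.0/24", "198.51.100.0/22"]; // placeholders

function isFilteredClient(ip) {
  return filteredSubnets.some(function (net) { return inCidr(ip, net); });
}
```

In practice this check would live in the proxy's ACL configuration rather than in application code, but the logic is the same: only clients whose source address maps back to a filtering UK ISP are served.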

The proxies are physical dedicated servers (pfft, Cloud) and have 24GB of RAM each (useful for caching static resources such as CSS, etc.) with 1Gbit/sec connections (200Mbit/sec guaranteed).

Bandwidth statistics are generated locally using ifstat then relayed to the front-end every 60 seconds to update the bandwidth utilisation charts. Squid is also monitored using Cacti to keep an eye on CPU, RAM, cache hit ratio, DNS performance and general in/out bandwidth.

Using Chef and some crazy bash scripting, additional servers are automatically provisioned and bootstrapped if resource utilisation breaches thresholds. These servers are located in Sweden and Germany; this is not ideal for latency, but it minimises the chances of extra-legal interference.

PAC file serving is done from the UK (this same server, in fact), but the disks in this server are LUKS encrypted using a Yubikey password that was erased and changed once the server had been installed, fully updated to a new kernel and rebooted. I state here (and on Twitter, where I can't edit in retrospect) that, in the spirit of section 53(3) of the Regulation of Investigatory Powers Act 2000, I am no longer in possession of the key that can decrypt these disks. (Chef and backups can have it up and running again in roughly 60 minutes, so why risk two years in prison, or the state having access to my data?)

The Squid config is fairly simple, so it should suffice for what it is being used for:

# Evict least-frequently-used objects first (with dynamic aging)
cache_replacement_policy heap LFUDA
# Start trimming the disk cache at 90% full; trim aggressively above 95%
cache_swap_low 90
cache_swap_high 95
# Keep only small objects in the in-memory cache
maximum_object_size_in_memory 50 KB
# 40GB disk cache under /var/spool/squid (16 first-level, 256 second-level dirs)
cache_dir aufs /var/spool/squid 40000 16 256
# ~10GB of RAM for hot objects
cache_mem 10000 MB
logfile_rotate 10
# Return freed memory to the OS rather than pooling it
memory_pools off
# Don't cache anything larger than 50MB on disk
maximum_object_size 50 MB
# Abort upstream fetches as soon as the client disconnects
quick_abort_min 0 KB
quick_abort_max 0 KB
log_icp_queries off
# Don't keep per-client statistics
client_db off
# Buffer log writes for performance
buffered_logs on
# Drop half-closed client connections immediately
half_closed_clients off

Each Squid server also has custom error documents, such as the one below, warning of an attempt to reach reddit.com: either the domain hasn't made it into the PAC ACL yet, or the user is visiting from an ISP that is either not in the UK or doesn't filter.

The proxies are not designed to cloak or hide would-be criminals: the X-Forwarded-For HTTP header is set, and there is no additional encryption between the browser and the proxy servers.

The End Result?

I don't know what effect this project will have. I hope it helps people, and I hope that PIPCU don't attempt to kick down my door for it, but at the end of the day these proxies are just a small part of my continuing fight against Internet censorship.