#VATMESS - or, how a taxation change took 4 developers a week to handle

A lot of people are talking about the problems that are being caused by a recent change to taxation in the EU; this TechCrunch article gives a nice overview of the issues. But we thought it would be fun just to tell you about our experience - for general interest, and as an example of what one UK startup had to do to implement these changes. Short version: it hasn't been fun.

If you know all about the EU VAT changes and just want to know what we did at PythonAnywhere, click here to skip the intro. Otherwise, read on...

The background

"We can fight over what the taxation levels should be, but the tax system should be very, very simple and not distortionary." - Adam Davidson

The tax change is, in its most basic form, pretty simple. But some background will probably help. The following is simplified, but hopefully reasonably clear.

VAT is Value Added Tax, a tax that is charged on pretty much all purchases of anything inside the EU (basic needs like food are normally exempt). It's not dissimilar to sales or consumption tax, but the rate is quite high: in most EU countries it's something like 20%. When you buy (say) a computer, VAT is added on to the price, so a PC that the manufacturer wants EUR1,000 for might cost EUR1,200 including VAT. Prices for consumers are normally quoted with VAT included, when the seller is targeting local customers. (Companies with large numbers of international customers, like PythonAnywhere, tend to quote prices without VAT and show a note to the effect that EU customers have to pay VAT too.)

When you pay for the item, the seller takes the VAT, and they have to pay all the VAT they have collected to their local tax authority periodically. There are various controls in place to make sure this happens, that people don't pay more or less VAT than they've collected, and in general it all works out pretty simply. The net effect is that stuff is more expensive in Europe because of tax, but that's a political choice on the part of European voters (or at least their representatives).

When companies buy stuff from each other (rather than sales from companies to consumers) they pay VAT on those purchases if they're buying from a company in their own country, but they can claim that VAT back from their local tax authorities (or offset it against VAT they've collected), so it's essentially VAT-free. And when they buy from companies in other EU countries or internationally, it's VAT-free. (The actual accounting is a little more complicated, but let's not get into that.)

What changed

"Taxation without representation is tyranny." - James Otis

Historically, for "digital services" -- a category that includes hosting services like PythonAnywhere, but also downloaded music, ebooks, and that kind of thing -- the rule was that the rate of VAT that was charged was the rate that prevailed in the country where the company doing the selling was based. This made a lot of sense. Companies would just need to register as VAT-collecting businesses with their local authorities, charge a single VAT rate for EU customers (apart from sales to other VAT-registered businesses in other EU countries), and pay the VAT they collected to their local tax authority. It wasn't trivially simple, but it was doable.

But, at least from the tax authorities' side, there was a problem. Different EU countries have different VAT rates. Luxembourg, for example, charges 15%, while Hungary is 27%. This, of course, meant that Hungarian companies were at a competitive disadvantage to Luxembourgeoise companies.

There's a reasonable point to be made that governments who are unhappy that their local companies are being disadvantaged by their high tax rates might want to consider whether those high tax rates are such a good idea, but (a) we're talking about governments here, so that was obviously a non-starter, and (b) a number of large companies had a strong incentive to officially base themselves in Luxembourg, even if the bulk of their business -- both their customers and their operations -- was in higher-VAT jurisdictions.

So, a decision was made that instead of basing the VAT rate for intra-EU transactions for digital services on the VAT rate for the seller, it should be based on the VAT rate for the buyer. The VAT would then be sent to the customer's country's tax authority -- though to keep things simple, each country's tax authority would set up a process where the company's local tax authority would collect all of the money from the company along with a file saying how much was meant to go to each other country, and they'd handle the distribution. (This latter thing is called, in a wonderful piece of bureaucratese, a "Mini One-Stop Shop" or MOSS, which is why "VATMOSS" has been turned into a term for the whole change.)

As these kinds of things go, it wasn't too crazy a decision. But the knock-on effects -- both those inherent in the idea of companies having to charge different tax rates for different people, and those caused by the particular way the details of the laws have been worked out -- have been huge. What follows is what we had to change for PythonAnywhere. Let's start with the basics.

Different country, different VAT rate

"The wisdom of man never yet contrived a system of taxation that would operate with perfect equality." - Andrew Jackson

Previously we had some logic that worked out if a customer was "vattable". A user in the UK, where we're based, is vattable, as is a non-business anywhere else in the EU. EU-based businesses and anyone from outside the EU were non-vattable. If a user was vattable, we charged them 20%, a number that we'd quite sensibly put into a constant called VAT_RATE in our accounting system.
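
In code, the old logic boiled down to something like this (a simplified sketch: EU_COUNTRIES and the user attributes are hypothetical stand-ins for our real accounting models):

EU_COUNTRIES = {"GB", "FR", "DE", "HU", "LU"}  # ...and the other 23 member states

def is_vattable(user):
    if user.country == "GB":
        return True    # UK customers, business or not, pay UK VAT
    if user.country in EU_COUNTRIES:
        # EU-registered businesses handle VAT themselves ("reverse charge")
        return not user.is_registered_business
    return False       # outside the EU: no VAT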

What needed to change? Obviously, we have a country for each paying customer, from their credit card/PayPal details, so a first approximation of the system was simple. We created a new database table, keyed on the country, with VAT rates for each. Now, all the code that previously used the VAT_RATE constant could do a lookup into that table instead.
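
In Django terms (our billing code lives in a Django project), the change was roughly this; the model and field names here are illustrative rather than our actual schema:

from django.db import models

class VatRate(models.Model):
    country = models.CharField(max_length=2)    # ISO 3166-1 alpha-2 code
    rate = models.DecimalField(max_digits=4, decimal_places=2)  # e.g. 0.20

# Before: vat = net_amount * VAT_RATE
# After:  vat = net_amount * VatRate.objects.get(country=user.country).rate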

So now we're billing people the right amount. We also need to store the VAT rate on every invoice we generate so that we can produce the report to send to the MOSS, but there's nothing too tricky about that.

Simple, right? Not quite. Let's put aside that there's no solid source for VAT rates across the EU (the UK tax authorities recommend that people look at a PDF on an EU website that's updated irregularly and has a table of VAT rates using non-ISO country identifiers, with the countries' names in English but sorted by their name in their own language, so Austria is written "Austria" but sorted under "O" for "Österreich").

No, the first problem is in dealing with evidence.

Where are you from?

"Extraordinary claims require extraordinary evidence." - Carl Sagan

How do you know which country someone is from? You'd think that for a paying customer, it would be pretty simple. Like we said a moment ago, they've provided a credit card number and an address, or a PayPal billing address, so you have a country for them. But for dealing with the tax authorities, that's just not enough. Perhaps they feared that half the population of the EU would be flocking to Luxembourgeoise banks for credit cards based there to save a few euros on their downloads.

What the official UK tax authority guidelines say regarding determining someone's location is this (and it's worth quoting in all its bureaucratic glory):

1.5 Record keeping

If the presumptions referred above don’t apply, you’ll be expected to obtain and to keep in your records 2 pieces of non-contradictory evidence from the following list to support your taxing decisions. Examples include:

  • the billing address of the customer
  • the Internet Protocol (IP) address of the device used by the customer
  • location of the bank
  • the country code of SIM card used by the customer
  • the location of the customer’s fixed land line through which the service is supplied to him
  • other commercially relevant information (for example, product coding information which electronically links the sale to a particular jurisdiction)

Once you have 2 pieces of non-contradictory evidence that is all you need and you don’t need to collect any further supporting evidence. This is the case even if, for example, you obtain a third piece of evidence which happens to contradict the other 2 pieces of information. You must keep VAT MOSS records for a period of 10 years from 31 December of the year during which the transaction was carried out.

For an online service paid with credit cards, like PythonAnywhere, the only "pieces of evidence" we can reasonably collect are your billing address and your IP address.

So, our nice simple checkout process had to grow another wart. When someone signs up, we have to look at their IP address. We then compare this with a GeoIP database to get a country. When we have both that and their billing address, we have to check them (there's a sketch of the logic just after this list):

  • If neither the billing address nor the IP address is in the EU, we're fine -- continue as normal.
  • If either the billing address or the IP address is in the EU, then:
    • If they match, we're OK.
    • If they don't match, we cannot set up the subscription. There is quite literally nothing we can do, because we don't know enough about the customer to work out how much VAT to charge them.
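
Here's roughly what that check looks like (a sketch: get_country_from_ip stands in for the GeoIP lookup, and EU_COUNTRIES is a set of EU country codes, both hypothetical helpers):

def can_sign_up(billing_country, ip_address):
    ip_country = get_country_from_ip(ip_address)  # hypothetical GeoIP helper
    if billing_country not in EU_COUNTRIES and ip_country not in EU_COUNTRIES:
        return True   # no EU VAT involved -- continue as normal
    # At least one piece of evidence points at the EU, so the two
    # pieces of evidence must not contradict each other
    return billing_country == ip_country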

We've set things up so that when someone is blocked due to a location mismatch like that, we show the user an apologetic page, and the PythonAnywhere staff get an email so that we can talk to the customer and try to work something out. But this is a crazy situation:

  • If you're a British developer with a UK credit card, currently on a business trip to Germany, then you can't sign up for PythonAnywhere. This does little good for the free movement of goods and services across the EU.
  • If you're an American visiting the UK and want to sign up for our service, you're equally out of luck. This doesn't help the EU's trade balance much.
  • If you're an Italian developer who's moved to Portugal, you can't sign up for PythonAnywhere until you have a new Portuguese credit card. This makes movement of people within the EU harder.

As far as we can tell there's no other way to implement the rules in a manner consistent with the guidelines. This sucks.

And that's not all. Now we need to deal with change...

Ch-ch-changes

"Time changes everything except something within us which is always surprised by change." - Thomas Hardy

VAT rates change (normally upwards). When you're only dealing with one country's VAT rate, this is pretty rare; maybe once every few years, so as long as you've put it in a constant somewhere and you're willing to update your system when it changes, you're fine. But if you're dealing with 28 countries, it becomes something you need to plan for happening pretty frequently, and it has to be stored in data somewhere.

So, we added an extra valid_from field to our table of VAT rates. Now, when we look up the VAT rate, we plug today's date into the query to make sure that we've got the right VAT rate for this country today. (If you're wondering why we didn't just set things up with a simple country-rate mapping that would be updated at the precise point when the VAT rate changed, read on.)
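
Assuming the illustrative VatRate model from earlier grows a valid_from date field, the lookup becomes something like this:

from datetime import date

def vat_rate_for(country, on_date=None):
    # The most recent rate that had come into force by the given date
    return (VatRate.objects
            .filter(country=country, valid_from__lte=on_date or date.today())
            .order_by('-valid_from')
            .first()
            .rate)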

No big deal, right? Well, perhaps not, but it's all extra work. And of course we now need to check a PDF on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying "Beware of the Leopard" to see when it changes. We're hoping a solid API for this will crop up somewhere -- though there are obviously regulatory issues there, because we're responsible for making sure we use the right rate, and we can't push that responsibility off to someone else. The API would need to be provided by someone we were 100% sure would keep it updated at least as well as we would ourselves -- so, for example, a volunteer-led project would be unlikely to be enough.

So what went wrong next? Let's talk about subscription payments.

Reconciling ourselves to PayPal

"Nothing changes like changes, because nothing changes but the changes." - Gary Busey

Like many subscription services, at PythonAnywhere we use external companies to manage our billing. The awesome Stripe handle our credit card payments, and we also support PayPal. This means that we don't need to store credit card details in our own databases (a regulatory nightmare) or do any of the horrible integration work with legacy credit card processing systems.

So, when you sign up for a paid PythonAnywhere account, we set up a subscription on either Stripe or PayPal. The subscription is for a specific amount to be billed each month; on Stripe it's a "gross amount" (that is, including tax) while PayPal splits it into a separate "net amount" and "tax amount". But the common thing between them is that they don't do any tax calculations themselves. For international companies having to deal with billing all over the world, this makes sense, especially given that historically, say, a UK company might have been billing through their Luxembourg subsidiary to get the lower VAT rate, so there are no safe assumptions that the billing companies could make on their customers' behalf.
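
To make the difference concrete, here's how the same EUR 10/month subscription with 20% VAT is represented on each processor (illustrative numbers, not real API payloads):

from decimal import Decimal

# Stripe: a single gross amount, VAT included
stripe_amount = Decimal("12.00")

# PayPal: net and tax split out, summing to the same gross
paypal_net_amount = Decimal("10.00")
paypal_tax_amount = Decimal("2.00")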

Now, the per-country date-based VAT rate lookup into the database table that we'd done earlier meant that when a customer signed up, we'd set up a subscription on PayPal/Stripe with the right VAT amount for the time when the subscription was created. But if and when the VAT rate changed in the future, it would be wrong. Our code would make sure that we were sending out billing reminders for the right amount, but it wouldn't fix the subscriptions on PayPal or Stripe.

What all that means is that when a VAT rate is going to change, we need to go through all of our customers in the affected country, work out whether they're using PayPal or Stripe, and then tell the relevant payment processor that the amount we're charging them needs to change.

This is tricky enough as it is. What makes it even worse is an oddity with PayPal billing. You cannot update the billing amount for a customer in the 72 hours before a payment is due to be made. So, for example, let's imagine that the VAT rate for Malta is going to change from 18% to 20% on 1 May. At least three days before, you need to go through and update the billing amounts on subscriptions for all Maltese customers whose next billing date is after 1 May. And then sometime after you have to go through all of the other Maltese customers and update them too.

Keeping all of this straight is a nightmare, especially when you factor in things like the fact that (again due to PayPal) we actually charge people one day after their billing date, and users can be "delinquent" -- that is, their billing date was several days ago but due to (for example) a credit card problem we've not been able to charge them yet (but we expect to be able to do so soon). And so on.

The solution we came up with was to write a reconciliation script. Instead of having to remember "OK, so Malta's going from 18% to 20% on 1 May so we need to run script A four days before, then update the VAT rate table at midnight on 1 May, then run script B four days after", and then repeat that across all 28 countries forever, we wanted something that would run regularly and would just make everything work, by checking when people are next going to be billed and what the VAT rate is on that date, then checking what PayPal and Stripe plan to bill them, and adjusting them if appropriate.

A code sample is worth a thousand words, so here's the algorithm we use for that. It's run once for every EU user (whose subscription details are in the subscription parameter). The get_next_charge_date_and_gross_amount and update_vat_amount functions are passed in as dependencies so that we can use the same algorithm for both PayPal and Stripe.

from datetime import datetime

def reconcile_subscription(
    subscription,
    get_next_charge_date_and_gross_amount,
    update_vat_amount
):
    next_invoice_date = subscription.user.get_profile().next_invoice_date()
    next_charge_date, next_charge_gross_amount = get_next_charge_date_and_gross_amount(subscription)

    if next_invoice_date < datetime.now():
        # Cancelled subscription
        return

    if next_charge_date < next_invoice_date:
        # We're between invoicing and billing -- don't try to do anything
        return

    expected_next_invoice_vat_rate = subscription.user.get_profile().billing_vat_rate_as_of(next_invoice_date)
    expected_next_charge_vat_amount = subscription.user.get_profile().billing_net_amount * expected_next_invoice_vat_rate

    if next_charge_gross_amount == expected_next_charge_vat_amount + subscription.user.get_profile().billing_net_amount:
        # User has correct billing set up on payment processor
        return

    # Needs an update
    update_vat_amount(subscription, expected_next_charge_vat_amount)
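
And here's roughly how it's driven for the two payment processors; the injected functions are thin wrappers around the Stripe and PayPal APIs, and the wrapper and helper names here are hypothetical:

for subscription in all_active_eu_subscriptions():  # hypothetical query helper
    if subscription.is_paypal:
        reconcile_subscription(
            subscription,
            get_next_charge_date_and_gross_amount=paypal_next_charge_details,
            update_vat_amount=paypal_update_vat_amount,
        )
    else:
        reconcile_subscription(
            subscription,
            get_next_charge_date_and_gross_amount=stripe_next_charge_details,
            update_vat_amount=stripe_update_vat_amount,
        )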

We're pretty sure this works. It's passed every test we've thrown at it. And for the next few days we'll be running it in our live environment in "nerfed" mode, where it will just print out what it's going to do rather than actually updating anything on PayPal or Stripe. Then we'll run it manually for the first few weeks, before finally scheduling it as a cron job to run once a day. And then we'll hopefully be in a situation where when we hear about a VAT rate change we just need to update our database with a new row with the appropriate country, valid from date, and rate, and it will All Just Work.

(An aside: this probably all comes off as a bit of a whine against PayPal. And it is. But they do have positive aspects too. Lots of customers prefer to have their PythonAnywhere accounts billed via PayPal for the extra level of control it gives them -- about 50% of new subscriptions we get use it. And the chargeback model for fraudulent use under PayPal is much better -- even Stripe can't isolate you from the crazy-high chargeback fees that banks impose on companies when a cardholder claims that a charge was fraudulent.)

In conclusion

"You can't have a rigid view that all new taxes are evil." - Bill Gates

The changes in the EU VAT legislation came from the not-completely-unreasonable attempt by the various governments to stop companies from setting up businesses in low-VAT countries for the sole purpose of offering lower prices to their customers, despite having their operations in higher-VAT countries.

But the administrative load placed on small companies (including but not limited to tech startups) is large, it makes billing systems complex and fragile, and it imposes restrictions on sales that are likely to reduce trade. We've seen various government-sourced estimates of the cost of these regulations on businesses floating around, and they all seem incredibly low.

At PythonAnywhere we have a group of talented coders, and it still took over a week of development time that we could have spent working on stuff our customers wanted. Other startups will be in the same position; it's an irritation but not the end of the world.

For small businesses without deep tech talent, we dread to think what will happen.


New PythonAnywhere update: Mobile, UI, packages, reliability, and the dreaded EU VAT change

We released a bunch of updates to PythonAnywhere today :-) Short version: we've made some improvements to the iPad and Android experience, applied fixes to our in-browser console, added a bunch of new pre-installed packages, done a big database upgrade that should make unplanned outages rarer and shorter, and made changes required by EU VAT legislation (EU customers will soon be charged their local VAT rate instead of UK VAT).

Here are the details:

iPad and Android

  • The syntax-highlighting in-browser editor that we use, Ace, now supports the iPad and Android devices, so we've upgraded it and changed the mobile version of our site to use it instead of the rather ugly textarea we used to use.
  • We've also re-introduced the "Save and Run" button on the iPad.
  • Combined with the console upgrade we did earlier on this month, our mobile support should now be pretty solid on iPads, iPhones, and new Android (Lollipop) devices. Let us know if you encounter any problems!

User interface

  • Some fixes to our in-browser consoles: fixed problems with zooming in (the bottom line could be cut off if your browser zoom wasn't set to 100%) and with the control key being stuck down if you switched tabs while it was pressed.
  • A tiny change, but one that (we hope) might nudge people in a good direction: we now list Python 3 before Python 2 in our list of links for starting consoles and for starting web apps :-)

New packages

We've added loads of new packages to our "batteries included" list:

  • A quantitative finance library, Quandl (Python 2.7, 3.3 and 3.4)
  • A backtester for financial algorithms, zipline (Python 2.7 only)
  • A Power Spectral Densities estimation package, spectrum (2.7 only)
  • The very cool remix module from Echo Nest: make amazing things from music! (2.7 only)
  • More musical stuff: pyspotify. (2.7 only)
  • Support for the netCDF data format (2.7, 3.3 and 3.4)
  • Image tools: imagemagick and aggdraw (2.7, 3.3 and 3.4)
  • Charting: pychart (2.7 only)
  • For Django devs: django-bootstrap-form (2.7, 3.3 and 3.4)
  • For Flask devs: flask-admin (2.7, 3.3 and 3.4)
  • For web devs who prefer offbeat frameworks: falcon and wheezy.web (2.7, 3.3 and 3.4)
  • A little ORM: peewee (2.7, 3.3 and 3.4)
  • For biologists: simplehmmer (2.7 only)
  • For statistics: Augustus. (2.7 only)
  • For thermodynamicists (?): CoolProp (2.7, 3.3 and 3.4)
  • Read barcodes from images or video: zbar (2.7 only)
  • Sending texts: we've upgraded the Python 2.7 twilio package so that it works from free accounts, and also added it for Python 3.3 and 3.4.
  • Locating people by IP address: pygeoip (2.7, 3.3 and 3.4)
  • We previously had the rpy2 package installed but there was a bug that stopped you from importing rpy2.robjects. That's fixed.

Additionally, for people who like alternative shells to the ubiquitous bash, we've added fish.

Reliability improvement

We've upgraded one of our underlying infrastructural databases to SSD storage. We've had a couple of outages recently caused by problems with this database, which were made much worse by the fact that it took a long time to start up after a failover. Moving it to SSD moved it to new hardware (which we think will make it less likely to fail) and will also mean that if it does fail, it should recover much faster.

EU VAT changes

For customers outside the EU, this won't change anything. But for non-business customers inside the EU, starting 1 January 2015, we'll be charging you VAT at the rate for your country, instead of using the UK VAT rate of 20%. This is the result of some (we think rather badly-thought-through) new EU legislation. We'll write an extended post about this sometime soon. [Update: here it is.]


PythonAnywhere now supports Postgres

Finally!

tl;dr: upgrade to a Custom account and you can now add Postgres

Say no to multi-tenancy (aka The Whale vs the Elephant vs the Dolphin)

Postgres has been the top requested feature for as long as we've supported web apps (3 years? who's counting!). We did have a brief beta based on the idea of a single Postgres server, with multi-tenanted, low-privileged accounts for each user, but it turned out Postgres really doesn't work too well that way (unlike MySQL).

Our new solution uses Linux containers to provide an isolated server for each user, so everyone can have full superuser access. And, yes, it uses the ubiquitous Docker under the hood.

How to get it

  • You'll need to upgrade to a Custom account, and enable Postgres, as well as choosing how much storage you need for your database.

  • Then, head on over to the Databases tab, and click the big button that says "Start a Postgres server".

  • Once it's ready, take a note of its hostname and port.

  • Set your superuser password. We'll save it to ~/.pgpass for convenience.

  • And now hit "Start a Postgres console" and take a look around!
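
And if you'd rather poke at it from Python, something like this should work once the server is up (the hostname and port are the ones from the Databases tab; since psycopg2 uses libpq under the hood, your password is picked up from ~/.pgpass automatically):

import psycopg2

connection = psycopg2.connect(
    database="postgres",
    user="postgres",                  # or whichever superuser you created
    host="your-postgres-hostname",    # from the Databases tab
    port=10123,                       # likewise
)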

How it works under the hood

We run several different machines to host postgres containers for our users. When you hit "start my server", we scan through to find a machine with spare capacity, and ask it to build up a container for you.

It's a Docker container based on our postgres image, but each individual user's is customised slightly. For example, we create a superuser account called "pythonanywhere_helper" with a unique, random password to enable us to perform some admin functions. (Once you have your own superuser account you could theoretically delete this guy, but we'd rather you didn't...)

We also set up a special permanent storage area for your database, which gives us the ability to migrate you to a different server if we need to.

You can read a bit more about our TDD process for developing this part of the codebase here.

What it costs

Aha, the dreaded question. We've set the price for the basic service at $15/month, which includes 1GB of storage; subsequent gigabytes are 20 cents each, which we think is pretty fair... Feel free to have a moan if you think it isn't!

What to do next

That's up to you! We've tried to supply you with everything you could need, including the bleeding-edge 9.4 version of Postgres, PostGIS, PL/Python and PL/Python3... Let us know how you get on!


New release -- new console with 256 colours, some fixes to task logging, and the P-thing.

Exciting new deploy today!

A new console

Obviously, the most important thing we did was to switch out our JavaScript console for a new one that supports 256 colours! And slightly more sane copy + paste. And it works on Android, or at least it does on Lollipop. Giles recommends the Hacker's Keyboard. Still doesn't work on my BlackBerry though.

For the curious, it's based on hterm which is a part of Chromium...

Some new packages

Of secondary importance, we added a few new packages, including TA-lib, pytesseract, and a thing called ruffus.

Improved logging of scheduled tasks

Scheduled tasks now log directly to files in /var/log, rather than storing their output in our database. That means they'll get log-rotated like everything else in there, and if you call flush on your sys.stdout, you may even be able to see live updates while tasks are still running. I think.
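
So if you want to watch a long-running task's progress as it happens, something along these lines in your task script should do the trick (do_some_work is a stand-in for whatever your task actually does):

import sys
import time

for chunk in range(10):
    do_some_work(chunk)   # stand-in for the real work
    print("Finished chunk {}".format(chunk))
    sys.stdout.flush()    # push the output through to the log file right away
    time.sleep(60)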

New database type supported.

Oh, and we also released a new database type, it's called Postgres, I'm told it's quite popular. Skip on over to the accounts page and get yourself a Custom account if you want to check it out.

Happy coding everyone!


Outage report: 1st November 2014

We had an outage this morning that lasted about an hour. We've established the cause, fixed the problem, and all sites are now back up. Apologies to all those affected. More detail follows.

We were alerted to an initial problem via our monitoring systems around 9:50AM. The symptom was that one of the web servers was seeing intermittent outages, due to a memory leak in one of our users' web applications causing a lot of swapping.

Our failover procedure involves taking the affected server out of rotation on the load balancer, redistributing its workload across other servers, and rebooting it. We saw the same user using a lot of memory on a second server, so we were able to confirm that it was a repeatable issue. We disabled his web app and rebooted this second server.

At this point the larger issue kicked in: the rebooted servers seemed to be non-functional when they came back, which left the remaining servers struggling to keep up with the load and caused outages for more customers. By this point two of us were working on the issue, and it took us a while to identify the root cause. It turned out to be due to a change in our logging configuration which was causing nginx to hang on startup. Specifically, it only affected users with custom SSL configuration. The reason this was particularly baffling is that our deploy procedure involves a manual check on a sample of custom SSL users, and we had confirmed they were functional when we did that deploy two days ago. Our working theory is that nginx will reload happily with a broken logging config, but not restart happily:

On deploy:

  1. Start nginx
  2. Add custom SSL webapp configs
  3. Reload nginx

--> not a problem, despite broken SSL webapp logging

On reboot:

  1. Custom SSL configs with broken logging are already present on disk
  2. Nginx refuses to start

We'll be confirming this theory in development environments over the next few days.

In the meantime, we've fixed the offending configuration file template, and confirmed that both regular users' and custom-SSL users' sites are back up. We're also adding some safeguards to prevent any other users' web apps from using up too much memory.

Once again, we apologise to all those affected.


Maintenance release: trusty + process listings

Hi All,

A maintenance release today, so nothing too exciting. Still, a couple of things you may care about:

  • We've updated to Ubuntu Trusty. Although we weren't vulnerable to Shellshock, it's nice to have the updated Bash, and to be on an LTS release.

  • We've added an oft-requested feature: the ability to view all your running console processes. You'll find it at the bottom of the consoles page. The UI probably needs a bit of work (you need to hit refresh to update the list), but it's a solution for when you think you have some detached processes chewing up your CPU quota! Let us know what you think.

Other than that, we've updated our client-side for our Postgres beta to 9.4, and added some of the PostGIS utilities. (Email us if you want to check out the beta). We also fixed an issue where a redirect loop would break the "reload" button on web apps, and we've added weasyprint and python-svn to the batteries included.


Try PythonAnywhere for free for a month!

Step Right Up Folks, Step Right Up...

OK, so we already have a free account, but we'd like to give out a free trial of our most popular paid plan, the "Web Developer" plan. Here's what you have to do to claim your free schtuff:

  1. Sign up for PythonAnywhere, if you haven't already
  2. Upgrade to a Web Developer account using Paypal or a credit card (we use this as a basic way of verifying your identity, to mitigate the likelihood of being used as a DDoS platform)
  3. Build a Thing using PythonAnywhere (even a really silly trivial Thing. Say a double-ROT13 encrypter)
  4. Get in touch with us, quoting the code UNGUESSABLEUNIQUECODE1, and show us your Thing
  5. And we'll refund your first month's payment!

Then, play around with the shiny paid features for the rest of the month! Build up to 3 web apps on whatever domains you like, splurge your 3,000-second CPU allowance or your 5GB of storage, use SSH and SSL to your heart's content, whatever takes your fancy. Then when your month is up, you can cancel if you don't like it (no need to quote any more codes or anything, just hit cancel, no questions asked).

PS -- this isn't some scheme to catch you out and tax you for a month if you forget to cancel. We'll send you a reminder a couple of days before the renewal date, reminding you that your trial is almost over and that you'll be auto-renewed if you don't cancel. And if you happen to miss that, and you get in touch all sore about how we charged you for the second month, we'll totally refund you that too. IT IS MATHEMATICALLY IMPOSSIBLE TO SAY FAIRER THAN THAT.

PPS -- offer expires at the end of this week! You have until 23:59 UTC on the 2nd of Nov.


Site updates on 1 October 2014

We've just updated PythonAnywhere, and there's some great news: Postgres is now in beta! We've switched it on for a select list of beta testers; if you'd like to join, drop us a line at support@pythonanywhere.com.

There have also been some minor tweaks and updates:

  • New installed packages: OpenCV, the libproj Ubuntu package (useful for some Python GIS packages), and WeasyPrint.
  • General website speedup (improved minifying of CSS and JavaScript).
  • The "Web" and "Databases" tabs remember which sub-tab you're on between visits.
  • Better validation on the web tab.
  • The page displayed for domains that have their DNS set up to route to PythonAnywhere, but don't yet have a web app configured, has a better explanation of what's going on.

Test-Driving a docker-based Postgres service using py.test

[cross-posted at Obey The Testing Goat!]

We've been working on incorporating a Postgres database service into PythonAnywhere, and we decided to make it into a bit of a standalone project. The shiny is that we're using Docker to containerise Postgres servers for our users, and while we were at it we thought we'd try a bit of a different approach to testing. I'd be interested in feedback -- what do you like, what might you do differently?

Context: A Docker-based Postgres service

The objective is to build a service that, on demand, will spin up a Docker container with Postgres running on it, and listening on a particular port. The service is going to be controlled by a web API. We've got Flask to run the web service, docker-py to control containers, and Ansible to provision servers.

A single loop of integrated tests

Normally we use a "double-loop" TDD process, with an outside loop of functional tests that use selenium to interact with our web app, and an inner loop of more isolated unit tests. For our development of the Postgres service, we still have the outer loop of functional tests -- selenium tests that log into the site via a browser, and test the service from the perspective of the user -- clicking through the right buttons on our UI and seeing if they can get a console that connects to a new Postgres service.

But for the inner loop we were in a green field -- this wasn't going to be another app in our monolithic Django project, we wanted it to be a standalone service, one that you could package up and use in another context. It would provide all its services via an API, and need no knowledge of the rest of PythonAnywhere. So how should we write the self-contained tests for this app? Should it, in turn, have a double loop? Relying on isolated unit tests only felt like a waste of time -- after all, the whole app was basically a thin wrapper that hooks up a web service to a series of Docker commands. All boundaries. Isolated unit tests would end up being all mocks. And from a TDD-process point of view, because we'd never actually used docker-py before, we didn't know its API, so we wouldn't know what mocks to write before we'd actually decided what the code was going to look like, and tried it out. And trying it out would involve either running one of the PythonAnywhere FTs (super-slow, so a tedious and onerous feedback loop) or doing manual tests, with all the uncertainty that implies.

So instead, it felt like starting with an intermediate-level layer of integrated tests might be best: we've already got our top-level UI layer full-stack tests in the form of functional tests. The next level down was the API level -- does calling this particular URL on the API actually give us a working container?

An example test

import psycopg2

def test_create_starts_container_with_postgres_connectable(docker_cleanup):
    response = post_to_api_create()

    port = response.json()["port"]
    assert port > 1024

    connection = psycopg2.connect(
        database="postgres",
        user="pythonanywhere_helper", password="papwd",
        host="localhost", port=port,
    )
    connection.close()

Where

import requests

def post_to_api_create():
    response = requests.post(
        "http://localhost:5000/api/create",
        {"admin_password": "papwd"}
    )
    assert response.status_code == 200
    assert response.json()["status"] == "OK"
    return response

So you can see that's a very integration-ey, end-to-end test -- it does a real POST request, to a place where it expects to see an actual webapp running, and it expects to see a real, connectable database spun up and ready for it.

Now this test runs in about 10 seconds - not super-fast, like the milliseconds you might want a unit test to run in, but much faster than our FT, which takes 5 or 6 minutes. And, meanwhile, we can actually write this test first. To write an isolated, mocky test, we'd need to know the docker-py API already, and be sure that it was going to work, which we weren't.

To illustrate this point, take a look at the difference between an early implementation and a later one:

A first implementation

import os
import re
import tempfile
from hashlib import md5

# (docker here is a docker-py Client instance -- the same kind of thing
# as the one created in the py.test fixture shown further down)

USER_IMAGE_DOCKERFILE = '''
FROM postgres
USER postgres
RUN /etc/init.d/postgresql start && \\
    psql -c "CREATE USER pythonanywhere_helper WITH SUPERUSER PASSWORD '{hashed}';"
CMD ["/usr/lib/postgresql/9.3/bin/postgres", "-D", "/var/lib/postgresql/9.3/main", "-c", "config_file=/etc/postgresql/9.3/main/postgresql.conf"]
'''

def get_user_dockerfile(admin_password):
    hashed = 'md5' + md5(admin_password + 'pythonanywhere_helper').hexdigest()
    return USER_IMAGE_DOCKERFILE.format(
        hashed=hashed,
    )

def create_container_with_password(password):
    tempdir = tempfile.mkdtemp()
    with open(os.path.join(tempdir, 'Dockerfile'), 'w') as f:
        f.write(get_user_dockerfile(password))

    response = docker.build(path=tempdir)
    response_lines = list(response)

    image_finder = r'Successfully built ([0-9a-f]+)'
    match = re.search(image_finder, response_lines[-1])
    if match:
        image_id = match.group(1)
    else:
        raise Exception('Image failed to build:\n{}'.format(
            '\n'.join(response_lines)
        ))

    container = docker.create_container(
        image=image_id,
    )
    return container

(These are some library functions we wrote; I won't show you the trivial Flask app that calls them, though there's a sketch of roughly what it looks like just below.)
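
As a sketch (not our actual code), that Flask layer would be something like this; start_container_and_get_port is a hypothetical stand-in for the container start-up and port-allocation code:

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/api/create', methods=['POST'])
def create():
    container = create_container_with_password(request.form['admin_password'])
    port = start_container_and_get_port(container)  # hypothetical helper
    return jsonify({'status': 'OK', 'port': port})

Anyway, back to those library functions.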

This was one of our first attempts -- we needed to be able to customise the Postgres superuser password for each user, and our initial solution involved building a new image for each user, by generating and running a custom Dockerfile for them.

We were never quite sure whether the Dockerfile voodoo was going to work, and we weren't really Postgres experts either, so having the high-level integration test, which actually tried to spin up a container and connect to the Postgres database that should be running inside it, was a really good way of getting to a solution that worked.

Imagine what a more isolated test for this code might look like:

@patch('containers.docker')
def test_uses_dockerfile_to_build_new_image(mock_docker):
    expected_dockerfile = USER_IMAGE_DOCKERFILE.format(
        hashed='md5' + md5('sekrit' + 'pythonanywhere_helper').hexdigest()
    )
    def check_dockerfile_contents(path):
        with open(os.path.join(path, 'Dockerfile')) as f:
            assert f.read() == expected_dockerfile

    mock_docker.build.side_effect = check_dockerfile_contents

    create_container_with_password('sekrit')

    assert mock_docker.build.called is True

@patch('containers.docker')
def test_creates_container_from_docker_image(mock_docker):
    create_container_with_password('sekrit')
    mock_docker.create_container.assert_called_once_with(
        image=mock_docker.build.return_value
    )

There's no way we could have written that test until we actually had a working solution. And, on top of that, the test would have been totally useless when it came to evolving our requirements and our solution.

A later implementation -- but minimal change to the main test

To give you an idea, here's what our current implementation looks like:

def start_new_container(storage_dirname, password, requested_port):
    prep_storage_dir(storage_dirname)
    run_command_on_temporary_container_with_mounts(
        command=['chown', '-R', 'postgres:postgres', POSTGRES_DIR],
        storage_dirname=storage_dirname,
        user='root',
    )
    run_command_on_temporary_container_with_mounts(
        command=[
            'bash', '-c', 
            INITIALISE_POSTGRES_AND_SET_PASSWORD.format(password)
        ],
        storage_dirname=storage_dirname
    )
    user_container = create_postgres_container(name=storage_dirname)
    start_container_with_storage(
        user_container, storage_dirname, 
        ports={POSTGRES_PORT: requested_port},
    )
    with open(port_file_path(storage_dirname), 'w') as f:
        f.write(str(requested_port))
    return requested_port

I won't bore you with the details of run_command_on_temporary_container_with_mounts, but one way or another we realised that building separate images for each user wasn't going to work, and that instead we were going to want to have some permanent storage mounted in from outside of Docker, which would contain the Postgres data directory, and which would effectively "save" customisations like the user's password.

So a radically different implementation, but look how little the main test changed:

import random
import uuid

def post_to_api_create(storage_dir=None, port=None):
    if storage_dir is None:
        storage_dir = uuid.uuid4()
    if port is None:
        port = random.randint(6000, 9999)
    response = requests.post(
        "https://localhost/api/containers/",
        {
            "storage_dir": storage_dir,
            "admin_password": OUR_PASSWORD,
            "port": port,
        },
        verify=False,
    )
    return response

def test_create_starts_container_with_postgres_connectable(docker_cleanup):
    response = post_to_api_create(port=6123)
    # rest of test as before!

And now imagine all the time we'd have had to spend rewriting mocks, if we'd decided to have isolated tests as well.

Aside: py.test observations

One py.test selling point is "less boilerplate". Notice that none of these tests are methods in a class, and there's no self variable. On top of that, we just use plain assert statements -- no complicated remembering of self.assertIn, self.assertIsNotNone, and so on. Absolutely loving that.

py.test fixtures

Another thing you may be interested in is the docker_cleanup argument to the test. py.test will magically look for a special fixture function named the same as that argument, and use it in the test. Here's how it looks:

from docker import Client
docker = Client(base_url='unix://var/run/docker.sock')

@pytest.fixture()
def docker_cleanup(request):
    containers_before = docker.containers()

    def kill_new_containers():
        current_containers = docker.containers()
        for container in current_containers:
            if container not in containers_before:
                print('killing {}'.format(container['Names'][0]))
                docker.kill(container)

    request.addfinalizer(kill_new_containers)
    return kill_new_containers

The fixture function has a couple of jobs:

  • It adds a "finalizer" (the equivalent of unittest addCleanup or tearDown) which will run at the end of the tests, to kill any containers that have been started by the test

  • It provides that same finalizer, and a helper method to identify new containers, to the tests that use the fixture, as a helper tool (I haven't shown any examples of that here though)

As it's illustrated here, there are no obvious advantages over the unittest setUp/tearDown ideas, although you can see it would make it a little easier to share setup and cleanup code between tests in different files and classes. There's a lot more to them, though, and if you really want to get #mindblown, go check out pytest yield fixtures.
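
For comparison, here's what the same fixture might look like rewritten in yield style (a sketch using the pytest.yield_fixture decorator, reusing the docker client from above, and dropping the "fixture as helper function" trick -- everything after the yield is teardown):

@pytest.yield_fixture()
def docker_cleanup():
    containers_before = docker.containers()

    yield   # the test itself runs at this point

    # Everything from here on is cleanup
    for container in docker.containers():
        if container not in containers_before:
            print('killing {}'.format(container['Names'][0]))
            docker.kill(container)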

Incidentally, until I started using py.test I'd always associated "fixtures" with Django "fixtures", which basically means serialized versions of model data. py.test is using the word in a more correct sense, to mean "state that the world has to be in for the test to run properly".

The pros & cons of the "integrated-tests-only" workflow

Pros:

  • Allowed us to experiment freely with an API that was new to us, and get feedback on whether it was really working
  • Allowed us to refactor code freely, extracting helper functions etc, without needing to rewrite mocky unit tests

Cons:

  • Being end-to-end tests, they ran much slower than unit tests would - on the order of seconds, and later, a minute or two, once we grew from three or four tests to a dozen or two. And, on top of that...

  • Being integrated tests, they're not designed to run on a development machine. Instead, each code change means pushing updated source up to the server using Ansible, restarting the control webapp, and then re-running the tests in an SSH session.

  • Because the tests call across a web API, the code being tested runs in a different process to the test code, meaning tracebacks aren't integrated into your test results. Instead, you have to tail a logfile, and make sure you have logging set up appropriately.

Conclusions and next steps

I can potentially imagine a time when we might start to see value in a layer of "real" unit tests... So far though, there's really no "business logic" that we could extract and write fast unit tests for. Or at least, there's no business logic that I can identify as such, and I'd be very pleased for someone to come along and school me about it.

On the other hand, I can definitely see a time where we might want to split out our tests for the web API from the tests for the Postgres and Docker stuff, and I can see value in a setup where a developer can run these tests locally rather than having to push code up to a dev box. Vagrant and VirtualBox might be one solution, but, honestly, installing Docker and Postgres on a dev box doesn't feel that onerous either, as long as we know we'll be testing on a "real" box in CI. Or at least, it doesn't feel onerous until we start talking about my poor laptop with its paltry 120GB SSD. No room here!

And the bonus of being able to see honest-to-God tracebacks in your test run output feels like it might be worth it.

But, overall, at this stage in development, given the almost total lack of "business logic" in our app, and given the fact that we were working with a new API and a new set of technologies -- I've found that doing without "real" unit tests has actually worked very well.


New release - a few new packages, some steps towards postgres, and forum post previews

Today's upgrade included a few new packages in the standard server image:

  • OpenSCAD
  • FreeCAD
  • inkscape
  • Pillow for 3.3 and 3.4
  • flask-bootstrap
  • gensim
  • textblob

We also improved the default "Unhandled Exception" page, which is shown when a user's web app allows an exception to bubble up to our part of the stack. We now include a slightly friendlier message, explaining to any of the user's users that there's an error, and explaining to the user where they can find their log files and look for debug info.

And in the background, we've deployed a bunch of infrastructure changes related to postgres support. We're getting there, slowly slowly!

Oh yes, and we've enabled dynamic previews in the forums, so you get an idea of how the Markdown syntax will translate. It actually uses the same library as Stack Overflow; it's called pagedown. Hope you find 'em useful!


