Scaling a popular internet radio station: an interview with Mark, creator of Stereodose

Or: What happened when I hit the front page of reddit…

Mark is the creator of Stereodose, a wildly popular internet radio station with a unique approach to generating playlists: you pick your drug of choice, you pick your mood, and a customised selection of tracks starts to play.

Stereodose is hosted by PythonAnywhere, a Python-focused PaaS and browser-based programming environment.

Read more…

Outage Report for 19 November 2013

[UPDATE: as of 22 November, backups are working again.]

Backups have always been a source of trouble for us here at PythonAnywhere. We have tried a number of ways to back up your files and all of them have characteristics that make them less than suitable:

  • EBS snapshots - these generate a nice, consistent point-in-time snapshot of everyone’s files, but they slow disk access down too much and for too long (in our experiments, a snapshot could entirely take down every user website on the disk that’s being backed up for half an hour and could cause slow disk accesses for up to 6 hours).

  • Rsync - this is nice and easy, but it also competes with users for disk access and, because it takes a long time to run, can’t be used to provide continually updated backups.

With that in mind, we set about finding a new backup solution that would provide continual backups that we could then take point-in-time snapshots of. As an extra bonus we’d like it to also provide on-line hot fail-over (and a pony!)

We found our solution in DRBD. Essentially, it keeps 2 disks on different machines synchronised across the network. Our users could use a set of primary disks, and they’d be constantly synchronised with a set of secondary disks. We could then use the secondary disks to take snapshots with no effect on the performance of the primary disks that our users relied on. If one of the primary disks failed, we could immediately switch to using its secondary disk without anyone even noticing the switch. As an added bonus, DRBD would enable zero-downtime upgrades to PythonAnywhere, a goal that we’re very keen to achieve.

That was the theory. In practice, we needed a multi-step process to implement DRBD in our infrastructure without jeopardising our users’ data. The upgrade on 19 November was the second step of the process and, on the surface, it should have been a simple step that was easy to do. Here’s how it went wrong (all times in UTC).

Read more…

Latest updates -- embeddable Python 3, logrotate and fixed HTTPS/proxy/requests issue

Hi everyone,

New, fresh PythonAnywhere today! Here’s what you get:

  • Embeddable Python 3 consoles! Just use an iframe like this:

    <iframe style="width: 640px; height: 480px; border: none;" name="embedded_python_anywhere" id="id_florence_iframe" src="https://www.pythonanywhere.com/embedded3/"></iframe>

Thanks to Gerald for pushing us to get that in.

  • Logrotate is now switched on for your web app log files by default, so no scrolling through 10-meg log files any more!

And finally, and this was a long time coming –

  • requests now works properly over HTTPS via our proxy! (If you’re using a virtualenv you’ll need to update it yourself – pip install -U requests should do the job. If you’re not using a virtualenv it should Just Work.)

Thanks for all the hard work from the guys that maintain requests and urllib3.

Speeding up the filesystem

Today we updated PythonAnywhere with a simple, but effective improvement. Filesystem access from your web apps and consoles should now be much faster. Here’s what we did.

Read more…

XFS project IDs - why we switched from Debian to Ubuntu for our file storage

Back in March, we discovered a problem on PythonAnywhere. Some of the people who were signing up reported that the site was telling them that they’d used all of their 500MB disk quota, even though they had almost no files. When we logged in to our file servers and checked manually using the system tools – like df – we saw the same thing. Our system wasn’t misreporting what the operating system said; the operating system itself was at fault. But curiously, when we used du to see how much space their files were taking up, it gave the correct (much smaller) numbers. This blog post explains what we discovered, and how we fixed it.
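
If you want to run the same kind of cross-check from Python, here is a minimal sketch. It is not the tooling we used on our file servers: the function names apparent_size and filesystem_used are made up for this example, and it only compares what the filesystem reports for a path (roughly what df shows) with a walk that adds up file sizes (roughly what du --apparent-size shows, ignoring the space taken by directories themselves).

```python
import os
import shutil


def apparent_size(root):
    """Add up the sizes of all files under root (files only, no directory overhead)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat measures a symlink itself rather than the file it points to
            total += os.lstat(path).st_size
    return total


def filesystem_used(path):
    """Bytes the filesystem holding path reports as used (the df-style number)."""
    return shutil.disk_usage(path).used
```

Calling both functions on the same directory and printing the results side by side gives the two numbers to compare. A large gap is not a problem in itself, because the filesystem figure normally covers everything on the filesystem (or quota) and not just that directory, but a figure that says "full" next to a walk that finds almost nothing is the kind of mismatch described above.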

Read more…

Better, faster, stronger webapps

Today we deployed the first in a set of large-scale infrastructure improvements, and the big win for everyone is that web apps are much faster, and much better insulated from each other. It should also help reliability – we’re expecting that the daily problems where some apps had 502 errors at around mid-morning UTC will no longer occur.

There are a bunch more infrastructural improvements in the pipeline – stay tuned!

Outage report for today - an AWS interface mishap

We deployed a new version of PythonAnywhere today with some cool new stuff – more on that later. But there was a nasty outage, and it might be worth explaining just in case anyone else is at risk of getting bitten by the same problem.

Read more…

Filesystem performance problems

A number of people (including ourselves while we’re dogfooding) are seeing very slow disk access on PythonAnywhere at the moment.

The problem is twofold:

  • A smaller cause: we moved to Ubuntu as our underlying base operating system recently, and this appears to be more disk-intensive than our older Debian-based system.
  • The big problem: our system currently uses NFS to share each user’s files between their web applications, consoles, and scheduled tasks. This gives a bottleneck which hasn’t been a huge problem in the past, but is becoming quite serious as more people join PythonAnywhere.

The OS upgrade problem may be fixable in the short term – we’re working to identify what’s causing the greater disk access – but the bigger problem will need a bigger fix. We’re benchmarking a number of alternative filesystems (the current leader seems to be GlusterFS) to see which is most likely to handle scaling better, and will make the move as soon as we can.

Please bear with us!

How many Python programmers are there in the world?

Cross-posted from our founder’s blog, gilesthomas.com

We’ve been talking to some people recently who really wanted to know what the potential market size was for PythonAnywhere, our Python Platform-as-a-Service and cloud-based IDE.

There are a bunch of different ways to look at that, but the most obvious starting point is, “how many people are coding Python?” This blog post is an attempt to get some kind of order-of-magnitude number for that.

Read more…

Ubuntu upgrade -- some minor user-visible features

Switching from Debian to Ubuntu was obviously quite a big infrastructure leap so we’ve tried to keep this release as feature-free as possible, but there are still a few user-visible changes that you’ll see.

  • Vim goes from version 7.2 to 7.3, woo!
  • Emacs goes from version 23.2 to 23.4. Meh. We’ll upgrade to 24 when we get a chance (as if anyone uses Emacs, anyway).
  • Git goes from version 1.7.10 to version 1.8.1 (release notes)
  • Python goes from version 2.7.3 to 2.7.4 and 3.3.0 to 3.3.1

And you get a few other Ubuntu goodies, like ll as an alias for ls -l, more colourful bash output, and so on… Let us know if you have any particular favourites.

But for us, the big win was the move from kernel 2.6.32 to 3.8.0. W00t.

PS – as part of this release we’re also allowing community submissions for our mini tutorials. More info in the forums.