PythonAnywhere problems on Friday and Saturday

Like a lot of other companies that use Amazon Web Services for their underlying infrastructure, we were affected by the problems they had at their US East Coast data centre on Friday and Saturday. Almost everything was recovered by 09:30 UTC on Saturday, and we don’t believe there was any data loss, though certain scheduled tasks and Dropbox share requests were delayed until late Sunday.

Here are the details for anyone who’s interested…

On Friday, Amazon had a problem with some of their routing hardware. This hit PythonAnywhere at approximately 14:40 UTC and led to a situation where people could log in, but got internal server errors on many pages. After investigating and discovering that the problem was an Amazon outage, we took the site down to prevent it from getting into an inconsistent state. Amazon rectified the problem by 17:00 UTC, so we brought PythonAnywhere back up. Our check-out confirmed that everything was OK, so we left it running. However, we had inadvertently left some of the code that runs scheduled tasks and Dropbox share handling deactivated at this point. (Obviously we need to improve our check-out procedure.)

On Saturday at 03:30 UTC, Amazon had another outage, this one due to an electrical storm in Northern Virginia which took out their power supply. We became aware of this at 08:10 UTC, by which time the data centre was up and running again, so we were able to log in and start fixing things. Some of our servers had become disconnected from their file systems, so any data users had created during the outage (after the servers had come back but before we had checked them out) was stored in the wrong place; we backed up that data and recovered the servers by 09:30 UTC. All of the data that had been put in the wrong place has been copied back to the right place, and all affected users have been notified. (The timing of this outage meant that very few people were affected anyway.)

On Sunday we were alerted on the forums that there was a problem for some users with scheduled tasks and Dropbox share handling; we immediately realised what had happened and re-activated the code that had been disabled since Friday evening.

Everything is now running normally. If you have any problems at all with PythonAnywhere, please don’t hesitate to get in touch via the forums or the “feedback” link.
