As of today, PythonAnywhere supports Python 3 web apps for all frameworks that work with Python 3. We'd love to hear any feedback people have.
To create a Python 3 web app, just click the "Add a new web app" button. If you have an existing web app that you'd like to switch over from Python 2.7 to Python 3, drop us a line with the "Send feedback" option.
In an attempt to get more clicks, this post (which, emphatically, is not a newsletter) has been rewritten in an annoying, nu-internet upworthy/buzzfeed stylee. Welcome aboard!
At 1:37 he hooks you in with a meteoric rise in users numbers. At
2:56 he explains some pretty cool hacks with SQL
ORDER BY RAND. At 3:17 your
mind is totally blown by the web-scale.
OK, so it's not even a video. But it is a pretty cool case study. Over the last year or so we've gone from hosting small hobby projects and small apps to hosting some serious, high-volume sites spiking up to hundreds of thousands of hits per hour. Read more about one of our customers who's built a popular web-radio station out of nothing, using web2py.
Also: illegal narcotics are involved. If that's not clickbait, I don't know what is!
You didn't know Miley was one of our long-standing users? She has a Ph.D. in Computer Science, you know. And she shares a name with a celebrity. Here's what she had to say.
We use Ace for our in-browser editor -- it's an awesome open source component: the devs have been busy, and it now has loads of new features, including autocomplete, multi-line comment/uncomment, block selection, and much more.
You can consult the full list under the "Keyboard Shortcuts" link if you twerk your way over to the editor via the Files tab.
If you had just one chance to come up with a hosting plan, would you come up with anything as crazy as these guys did?
So we just launched a $99 "Startup plan", as part of our "all-growed-up-now" offering to more serious hosting users. Some people have told us they worry it's too big a leap from the $12 plan up to the $99 version. What do you think might work well as a stepping stone between the two? Email us -- firstname.lastname@example.org
Well, by "asked", we meant "imagined asking". And I don't think we really stretched to imagining asking 75 times. Anyway, here's refactoring cat!
Remember folks, only good test coverage can save you from becoming the refactoring cat!
It's all fun and games until everything explodes. Read up on one of the more stressful hours we've spent recently.
More generally, we're working towards a zero-downtime architecture, but while we're on our way there, we're going to have to keep having planned outages for new releases. As we get more and more serious customers, we can't get away with announcing them half an hour ahead of time on Twitter. We were thinking email notification, 24 hours ahead of time. But maybe you'd prefer something else? Answers on a postcard to email@example.com please!
[All right, that's enough - Ed.]
Keep in touch everyone! We always love to hear what y'all have to say.
Mark is the creator of Stereodose, a wildly popular internet radio station with a unique approach to generating playlists: you pick your drug of choice, you pick your mood, and a customised selection of tracks starts to play.
Stereodose is hosted by PythonAnywhere, a Python-focused PaaS and browser-based programming environment.
| Stat | Value |
| --- | --- |
| Pageviews | 500k / month |
| Unique visitors | 70k / month |
Disclaimer: PythonAnywhere LLP naturally does not condone the use of harmful drugs. This is why we do not support PHP. Although we do believe that people should be free to use PHP if they wish to, in the privacy of their own home.
Here's our exclusive interview with Mark:
I graduated from college in 2011, with a degree in Economics. I wasn't interested in finance or consulting, which seemed to be what all my classmates were doing, so I started working at a small music marketing company out of college. I interacted with many blogs at the marketing job, and I wanted to create something with more functionality than what I was seeing, especially one that catered to my music tastes.
I came up with the idea some time in 2011. While coming up with a theme for the music blog I wanted to create, I realized there weren't any music blogs focused on "drug music." I created the music blog, but it didn't really go anywhere. I decided the "drug music" message needed to be as blunt as possible, so I came up with a "pick your drug, we'll make your playlist" format.
At first, I used Wordpress to build the site. It was doable, but I was forcing Wordpress to do something that it wasn't optimized to do. I started looking into other options, realizing a web app framework would be my best bet. After doing lots of research, I came up with Rails, Django, and web2py. Looking at languages, I found PHP to be really ugly, with complex syntax, compared to Ruby and Python.
I researched each framework as thoroughly as I could, although I didn't understand many of the specifics at that point, just the general differences. I remember reading a lot of discussions, and finding Massimo and other core web2py developers actively supporting users. The smaller community and passion for web2py (having the creator write Stackoverflow responses), was very convincing. I saw web2py as a reaction to Rails and Django, taking inspiration from both, but improving on the flaws of each framework.
I don't quite remember where I found PythonAnywhere first, either in the web2py forums or this web development tutorial by Marco Laspe. However, it was definitely the tutorial that convinced me that PythonAnywhere was the way to go, as deploying web2py easily was very important to me (being so new to web development).
I searched through a couple other hosts, but PythonAnywhere was the only host with a special interest in web2py and Python.
There are many things I like about web2py and PythonAnywhere, so I'll just cover the most important points. For PythonAnywhere, the in-browser Bash Console, Scheduler, and amazing customer support are the big winners. There is an undeniably human aspect to PythonAnywhere, with very real interactions and help when you need it. I really feel that I'm working with people who care about innovation and programming, not just an anonymous corporation.
For web2py, the dedication to creating a quality product from the web2py community is my favorite thing about using web2py. I have many specific needs for my site, and so far, web2py has provided the tools to build a wide array of functionality. It always surprises me how vast web2py's tool set is, and the fact that many of the features have come from web2py community suggestions.
As for things I hate, I'm going to be very careful with what I say hahaha. Most of my frustrations come from the fact that I'm learning as I'm maintaining/creating for Stereodose; I'm doing many things for the first time. I can't say that I really hate anything about web2py or PythonAnywhere. I'm not at a level of competence to where I can accurately criticize either (sorry for the safe answer).
[Ed: if you want to check out web2py on PythonAnywhere, check out our try-web2py demo, no installation or signup required]
My first breakthrough was after I posted the site on /r/drugs (the drug subforum of Reddit). After that, people started periodically posting the site on other forums as well, with a fair number from Reddit too.
The biggest break was earlier this year, when the site got put on rotation with Stumbleupon, and at the same time it front-paged Reddit (through r/Music). I never spent much time marketing the site myself, and was planning on leaving it be, before this traffic surge. I liked the idea of growing Stereodose and wanted to see its full potential, so I abandoned a resume builder project I was working on to focus on Stereodose.
In the most recent month:
One interesting problem was getting random records from the database efficiently -- the playlist needs to be fresh every time you load it. I originally used web2py's DAL, with `orderby="<random>"` to get random records from the database. Unfortunately, this would do a full table scan of the entire song data table. This is expected behavior from `orderby="<random>"`, since it translates to MySQL's `ORDER BY RAND()`. As the song table grew larger, this presented more and more of a problem, contributing to many 502 errors and slow page loads.
Thinking about a solution, I wanted to make sure that the number of records analyzed by MySQL when obtaining random records was equal to the number of records I actually needed. If I wanted 30 songs, I didn't want to analyze 12,000 records with a full table scan, but only those 30 records.
The main solution I found was to generate 30 numbers randomly, and then find the record IDs that matched those numbers. Frustratingly, this only works if your record IDs are sequential, without any holes. Using a WHERE clause in the database query naturally creates holes in the IDs of your dataset, since the results that match a WHERE clause are not sequential. I needed the WHERE clause, because the songs are chosen at random ONLY if they match the genre that the user selected.
Thus, my solution was to add a column which contained a sequential number for each song, per genre (`row_id_per_genre`). If I wanted to select 30 songs at random from Genre A, I would find the number of records in Genre A (stored in a summary table so I don't have to count each time), create 30 unique random numbers from 1 to (# of records in Genre A), and directly find those records in the database. The `row_id_per_genre` column eliminates the holes I would usually get from using the WHERE clause.
The result: the database can use the index on `row_id_per_genre` to find random records directly, avoiding a full table scan and making selects much faster -- a drop from about 300-500ms to 15ms. Other playlist-related pages had the same problem, and I used the basic principle from this solution to speed up their random selects too.
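The trick described above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 (Stereodose used MySQL via web2py's DAL); the table and column names other than `row_id_per_genre` are hypothetical:

```python
# Sketch of gap-free random selection: pick N random positions, then fetch
# exactly those rows by an indexed per-genre counter -- no ORDER BY RAND().
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE song (
    id INTEGER PRIMARY KEY,
    genre TEXT,
    title TEXT,
    row_id_per_genre INTEGER  -- sequential per genre, no holes
)""")
conn.execute("CREATE INDEX idx_genre_row ON song (genre, row_id_per_genre)")

# Populate genre 'A' with 100 songs, numbered 1..100 within the genre.
for n in range(1, 101):
    conn.execute(
        "INSERT INTO song (genre, title, row_id_per_genre) VALUES (?, ?, ?)",
        ("A", "song-%d" % n, n),
    )

def random_songs(genre, count, genre_size):
    # genre_size would come from a summary table, as in the post.
    picks = random.sample(range(1, genre_size + 1), count)
    qmarks = ",".join("?" * len(picks))
    rows = conn.execute(
        "SELECT title FROM song WHERE genre = ? "
        "AND row_id_per_genre IN (%s)" % qmarks,
        [genre] + picks,
    ).fetchall()
    return [r[0] for r in rows]

playlist = random_songs("A", 30, genre_size=100)
print(len(playlist))  # → 30
```

The key property is that the `WHERE ... IN` lookup touches only the 30 requested rows via the index, instead of scanning and sorting the whole table.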
I didn't realize how unscalable `ORDER BY RAND()` was, because I never had a big enough table with decent traffic to cause any problems.
Most people who know me, also know that I run Stereodose. My family is somewhat iffy about the site; they come from a culture where drugs are very taboo, so it's understandable that they'd have some reservations about Stereodose.
It's not impossible to learn on your own, as long as you have an internet connection and can use Google search! Might seem daunting at first, but give it time and you'll be surprised at how much you can understand.
I have a couple big features coming up for the site, the main one being user-created playlists (and with that, more drugs and moods). There are a couple project ideas rattling around in my head, but I'm not ready to materialize any of them yet.
I'm also looking to see if there are other companies or projects I can contribute to once Stereodose can run on its own. Some professional experience working with other programmers would be a good change of scene (as well as some pocket change).
Oooo that's a tough one...I love to dance and get down with a funky vibe, so my top choice would be Weed -> Groovin. But I'm always changing it up!
Editor's note, aka the shameless marketing bit: Stereodose started out on our $12/month Web Developer plan, which we market as "able to handle hitting the front page of Hacker News now and again". As Mark's requirements grew, we started scoping out a higher-end $99 Startup Plan, although even he doesn't need quite that much! We've put together a custom package for him; if you're interested in hosting with us at a custom price point, do get in touch and tell us about your requirements. Don't be scared! We have no salespeople, just friendly devs :)
Drop us a line at firstname.lastname@example.org and we'll have a chat!
[UPDATE: as of 22 November, backups are working again.]
Backups have always been a source of trouble for us here at PythonAnywhere. We have tried a number of ways to back up your files and all of them have characteristics that make them less than suitable:
EBS snapshots - these generate a nice, consistent point-in-time snapshot of everyone's files, but they slow disk access down too much and for too long (in our experiments, a snapshot could entirely take down every user website on the disk that's backing up for half an hour and could cause slow disk accesses for up to 6 hours)
Rsync - is nice and easy, but it also competes with users for disk access and, because it takes a long time to run, can't be used to provide continually updated backups.
With that in mind, we set about finding a new backup solution that would provide continual backups that we could then take point-in-time snapshots of. As an extra bonus we'd like it to also provide on-line hot fail-over (and a pony!)
We found our solution in DRBD. Essentially, it keeps 2 disks on different machines synchronised across the network. Our users could use a set of primary disks, and they'd be constantly synchronised with a set of secondary disks. We could then use the secondary disks to take snapshots with no effect on the performance of the primary disks that our users relied on, and we could (if one of the primary disks failed) immediately switch to using its secondary disk without anyone even noticing the switch. As an added bonus, DRBD would enable zero-downtime upgrades to PythonAnywhere, and that is a goal that we're very keen to achieve.
That was the theory. In practice, we needed a multi-step process to implement DRBD in our infrastructure without jeopardising our users' data. The upgrade on 19 November was the second step of the process and, on the surface, it should have been a simple step that was easy to do. Here's how it went wrong (all times in UTC):
We are currently running without automated backups, but we're going to be putting together an emergency backup process that can tide us over until we have DRBD working correctly.
We deeply regret the inconvenience to all of our users and we'd like to prevent this sort of thing in future. It will be some time before we're finished implementing zero-downtime upgrades to PythonAnywhere, so we'd like to hear from you how you think we should manage our upgrades:
Send any suggestions or thoughts to email@example.com. We'll take everyone's ideas into consideration and come up with a plan.
A number of people have asked us how they can persuade their (non-technical) bosses that they should use PythonAnywhere in a project. We've put together a first draft of an explanation. We'd love to know what everyone thinks!
New, fresh PythonAnywhere today! Here's what you get:
Embeddable Python 3 consoles! Just use an iframe like this:
<iframe style="width: 640px; height: 480px; border: none;" name="embedded_python_anywhere" id="id_florence_iframe" src="http://www.pythonanywhere.com/embedded3/"></iframe>
Thanks to Gerald for pushing us to get that in.
And finally, and this was a long time coming -- `requests` now works properly over HTTPS via our proxy! (If you're using a virtualenv you'll need to update it yourself -- `pip install -U requests` should do the job. If you're not using a virtualenv, it should Just Work.)
Thanks to all the hard work from the guys that maintain requests and urllib3.
Today we updated PythonAnywhere with a simple, but effective improvement. Filesystem access from your web apps and consoles should now be much faster. Here's what we did.
All of your consoles and your web apps on PythonAnywhere have access to an identical filesystem. You can write to a file from a web app, and read the file from the console, and see it updating in real time. This is despite the fact that your web apps and consoles can all be running on different physical machines.
Gluster? Ceph? AFS even? No my friends: NFS.
Obviously, to share the filesystem between multiple machines, we need to use a network filesystem under the hood. Now, with all of the recent work on distributed filesystems and all of the new cool stuff coming out, you might think that we were using something clever like GlusterFS, Ceph, or even that old stalwart, AFS. But while they're all great for keeping large amounts of data, if all you want to do is share a few tens of gigabytes per user between machines, the best bet is actually still NFS. So that's what we use.
Unfortunately, we recently discovered that filesystem performance on PythonAnywhere was getting unacceptably slow. Investigations uncovered the fact that the fault actually lay with the way we handle Dropbox shares.
Dropbox is a great product, and we love it. But we use it at an unusually large scale (tens of thousands of shares, etc), and we run the clients that keep it up-to-date on our side on Linux machines -- Linux is a pretty small market for them, and presumably they don't put as much time into making their Linux client as fast and efficient as possible as they do into optimising the Windows and Mac versions.
Added to this is the fact that a Dropbox client cannot run on an NFS client. If you have an NFS server that shares a directory, then mount it on an NFS client, and then run Dropbox on the NFS client, changes to files made elsewhere won't be detected by Dropbox, and so they won't be synchronised. (For people who eat this kind of stuff for breakfast -- it turns out Dropbox uses `inotify`, which doesn't work over NFS.)
So, historically, each NFS file server that was serving up your home directory was also running Dropbox so that it could keep everything in sync. But Dropbox was sucking up more and more machine resources as more and more people joined us on PythonAnywhere. Something had to change, and the change was pretty simple.
Dropbox files are now on a separate NFS server
The important thing for Dropbox is that it runs on the NFS server that is serving up the Dropbox directory. But there's no reason that the NFS server that's serving up the Dropbox directory should also be the one that's serving up the rest of your files. So now we have two classes of file server; Dropbox file servers, which handle just the Dropbox subdirectory for each user's filesystem, and regular file servers, which handle all of the other space that you can write to. (Files you can't write to are stored locally on each machine in our cluster.)
The net result is that accessing files inside your Dropbox subdirectory will be slightly faster than before (because the server handling it isn't having to handle your other files), but more importantly, access to your other files will be much faster because it's not having to handle a heavyweight Dropbox connection.
To give some numbers: before this change, the CPU load on a combined file server would frequently spike to 100% and things would often get backed up. Now we've made this change, a typical Dropbox server load is around 50% of CPU, while on a regular file server things are down at less than 10%.
We've got some other ideas planned for making filesystem access faster, but we're hoping that this one alone will fix most of the problems people have seen.
Back in March, we discovered a problem on PythonAnywhere. Some of the people who were signing up reported that the site was telling them that they'd used all of their 500MB disk quota, even though they had almost no files. When we logged in to our file servers and checked manually using the system tools -- like `df` -- we saw the same thing. Our system wasn't misreporting what the operating system said; the operating system itself was at fault. But curiously, when we used `du` to see how much space their files were taking up, it gave the correct (much smaller) numbers. This blog post explains what we discovered, and how we fixed it.
On PythonAnywhere, your files are stored on XFS-based filesystems, on Amazon EBS volumes. XFS allows us to have a large number of "projects" on a system, each of which can have an independently-managed quota. We use one project for each user; a project can contain a number of different directories -- for example, your project has your home directory and your `/tmp` -- and the quota applies across all of a project's directories.
So, let's say a user called `fred` signs up. We generate a line in a file called `projid` specifying the name of the project (for which we use the username) and a numerical ID for it, for which we just use the ID of the user in our own database. It looks like this:
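(The post's original example isn't shown here; per the xfs_quota(8) man page the `projid` file maps a project name to its numeric ID, one entry per line. The ID below is made up for illustration.)

```
fred:42
```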
Next, we put one line for each of fred's top-level quota'ed directories in a file called `projects`, which says which directories belong in that project. They look something like this:
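(Again, the original example is missing; per the xfs_quota(8) man page each `projects` line maps the numeric project ID to one directory. The ID and paths below are hypothetical, chosen to match the home-directory and `/tmp` example earlier in the post.)

```
42:/home/fred
42:/tmp/fred
```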
We then run a couple of `xfs_quota` commands to tell XFS to read this stuff from the files and apply it to the disk. One of the commands specifies the size of the `fred` project's quota. Once that's done, the quota is set up. And we can always re-run the command later to adjust the quota size (for example, if he upgrades and gets more disk space).
`df` and XFS's own tools were telling us that certain users had filled up their disk quota, but tools like `du` were telling us that they'd only used a few kB of disk space. What was going on? It took a little while for us to notice this, but when we did it was obvious. The first person to have the problem had project ID 65,542. When we looked at the project IDs of the others who were seeing the problem, they all had project IDs above 65,536, apart from a small scattering with IDs of less than 100.
The problem was that the version of Debian we were using had an older XFS implementation, which only supported 16-bit project IDs. Once a project was created with an ID greater than 65,535, it was in effect merged with the existing project whose ID was equal modulo 65,536. The merging triggered all kinds of weird behaviour -- the higher project would always wind up looking like it was full, and the lower one (the one modulo 65,536) would sometimes look like it was full too.
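A tiny illustration of the wraparound (the numbers are from the post; the helper function is ours):

```python
# Demonstrate the 16-bit project-ID truncation described above.
def stored_project_id(requested_id, id_bits=16):
    """The ID as an XFS implementation with id_bits-wide project IDs stores it."""
    return requested_id % (1 << id_bits)

# The first affected user's project, 65542, collides with project 6,
# since 65542 % 65536 == 6 -- so quota accounting for the two merges.
print(stored_project_id(65542))  # → 6
print(stored_project_id(100))    # → 100 (small IDs are unaffected directly)
```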
So, our first thought was that we could just upgrade Debian. But at that time, the latest version was still squeeze (6.0). The changes we wanted were in wheezy, but that was still a few months away. We were able to find a package with 32-bit XFS management tools for squeeze, but not the kernel modules.
We'd actually been planning to move our entire infrastructure over to Ubuntu in the medium term anyway (and we've since done that). And we discovered that the then-current Ubuntu had the right kernel modules for the version of XFS with 32-bit IDs. So we figured it would make sense to upgrade just the file servers (which have much less stuff running on them than our other server types) to Ubuntu, firstly to fix this bug and secondly to see how hard the full migration was likely to be.
So that's what we did; it turned out to be pretty easy, and after a slightly scary deploy at 3am (where we had to change the existing storage volumes from 16-bit to 32-bit), everything was working perfectly again.
The moral of the story? Whenever you encounter an OS-level thing behaving strangely, check any associated integers and see if they've just passed a particular power of 2.
Today we deployed the first in a set of large-scale infrastructure improvements, and the big win for everyone is that web apps are much faster, and much-better insulated from each other. It should also help reliability -- we're expecting that the daily problems where some apps had 502 errors at around mid-morning UTC will no longer occur.
There are a bunch more infrastructural improvements in the pipeline -- stay tuned!
We deployed a new version of PythonAnywhere today with some cool new stuff -- more on that later. But there was a nasty outage, and it might be worth explaining just in case anyone else is at risk of getting bitten by the same problem.
Amazon AWS provide a feature called "Elastic IPs". An EIP is an IP address that you can associate to any machine you want that you're running on AWS. We use these for all of the public-facing IP addresses on our live cluster.
When we deploy a new PythonAnywhere cluster, all of its public-facing IPs are initially random. Once it's fully up and running and we've checked it's all OK, we run a script that disassociates the EIPs from the old cluster, and associate them with the new cluster. Then, when we're sure all's well, we shut down the old cluster. (This is a slight simplification, but should suffice.)
Now, when you shut down a machine in the AWS web-based console, one option you are given is to "release" any EIPs that were associated with it. This is because Amazon charge you a small amount of money for unused EIPs, to stop people hoarding them. This time, when we deployed the new version of PythonAnywhere, we decided to use this option.
Unfortunately, we hadn't refreshed the web browser we were doing this in after the go-live script had switched the EIPs from the old cluster to the new one. So the interface in that particular browser thought that the live service's EIPs were still associated with the old cluster... and it released them all. So our live cluster suddenly had no public-facing IPs.
This meant that shortly after the new PythonAnywhere cluster went live, we shut down the old one and the site went down. Even worse, once you've released an EIP, you can't get it back. This meant that all of our DNS settings for www.pythonanywhere.com -- and, more importantly, all of our customers' sites -- were pointing to an IP address that we no longer controlled.
This could have been a disaster -- sure, DNS settings can be changed, and most of our customers use CNAMEs so we could get everything pointing at the new IPs by changing our own DNS settings. But DNS settings can also take a long time to propagate, in particular if your TTL (time-to-live) settings are high, because a high TTL means that DNS servers all over the Internet might be caching the old values.
We got in touch with Amazon. Thankfully, their engineers were very responsive and were able to reclaim our old EIPs and reassociate them with our account within 45 minutes, and we were able to get the site back up shortly thereafter. We were lucky -- when you release an EIP, it can get reallocated to any other Amazon user who asks for one.
Lesson learned -- always refresh the AWS console in your browser before doing anything.
[UPDATE] The AWS team have told us that they've fixed this problem -- the "release EIPs" option when you shut down a server will only release the EIP if it currently belongs to the server that you're shutting down. The bug was a pain, but we've been really pleased with the AWS team's responsiveness in helping us work around it and in fixing it.