Securing PythonAnywhere from the Heartbleed bug

The short version

The Heartbleed bug impacted PythonAnywhere (along with pretty much every Linux-based web service out there). We don't believe there's any risk that customer data has been leaked as a result of this problem, with the single exception of private keys for HTTPS certificates for custom domains -- that is, for websites hosted with us that don't end with .pythonanywhere.com. We don't have any reason to believe that those private keys were leaked either -- they're just the only data that we think could possibly have been leaked by it.

[UPDATE: Robert Graham at Errata Security points out that Heartbleed could also potentially have been used to harvest session cookies, usernames and passwords from users of affected sites. He's right, though it would be hard to do, and unlikely that someone would have targeted us for that. But just to be sure, we recommend you change your PythonAnywhere password, log out, then log back in again, and get users of your website to do likewise. Just to be clear on this: we don't think this has been used against us, and have no indication that it has. But it's better to be safe than sorry.]

The details

As you may have read, a bug in OpenSSL was announced last night that could potentially have been used to extract data from webservers, for example the private keys for websites' SSL certificates. It exploits the SSL heartbeat extension, and has been nicknamed "Heartbleed". There's more information in this TechCrunch article.

All servers running recent versions of Linux were affected -- a very large percentage of the Internet -- and PythonAnywhere's were among them. All of our servers have been patched since early this morning, so the attack is now not possible against us. The only risk is that data might have been leaked before then.

We do not believe at this time that there's any risk that any data apart from SSL certificates' private keys could have been leaked. So for most PythonAnywhere users, everything should be fine. (Our own key for our own certificate for www.pythonanywhere.com might have been leaked, but we've changed it and are working on revoking the old certificate.)

For those customers who host websites on custom domains with PythonAnywhere (that is, domains that don't end with .pythonanywhere.com), there is a possibility that hackers who knew about this bug before this morning could have used it to extract their private keys. We have notified all such customers by email with details on what to do next; if you do have a custom domain with your own certificate and haven't received an email from us, drop us a line and we'll let you know what to do next.

If you have any questions, just let us know.

What we did

Due to some heroic work on the part of the Ubuntu team, patched versions of the affected libraries were ready by the time we started working on this. So patching all of our servers was just a few commands on each server that did HTTPS:

apt-get update
apt-get install openssl libssl-dev libssl1.0.0

And then a service tornado restart or service nginx restart, depending on what the HTTPS service was on the server.

We used @titanous's Heartbleeder command-line tool and Filippo Valsorda's Heartbleed test page both before and after the fix to make sure we really had fixed the problem.

We're confident that the patches we've applied are enough to fix the bug, at least as it's currently understood.


New release: Custom plans

[updated 09:23 GMT to add bit about reload web app button]

We just released a new version of PythonAnywhere, featuring the usual host of impossible-for-you-to-verify-but-we'll-still-claim-them stability improvements and bugfixes, but there are also some highly visible features, which we hope you'll like.

Pick 'n mix pricing plans

People have been telling us there was too big a leap between our $12 Web Developer plan and the $99 Startup plan, so we're pleased to present you with a new option for Custom plans.

You can find the custom plans on the Accounts & Pricing page.

Obviously we're extremely interested in what you think about this. How do you feel about the prices? How do they compare against other options you're evaluating, and, more interestingly, how do you feel they're aligned with the value you derive from them?

Other bits 'n pieces

We've also squeezed in a couple of small but (hopefully) helpful features:

File changed on disk warning

Ever do that thing where you've got a file open in the editor, and then you do a git checkout in a console that changes the file, only to forget about it, and overwrite your changes when you hit save in the editor? Or even accidentally open the same file in two different tabs and lose an hour's work?

Our editor will now warn you if it thinks the file has any changes that have happened on disk while you were editing it:

Can you guess how it works?

Start Console here option

The file browser now has the option to start a Bash console in the current folder.

Reload Web App from any file in app directory

People were also telling us it's tedious to have to skip back to the Web tab to reload your web app when you're working on the code for your app. So, we've added a button to the editor toolbar to reload the web app.

It's only visible for files that are inside the folder that's associated with your web app, and we only know which folder that is if you went through one of our "start new app" wizards... If you used the "custom" button, unfortunately, we don't know where your app is. But, zap us a note with "Send Feedback" and we can manually register a folder for you...

Your suggestion here

Hope you find those useful. If you have suggestions for other little UI tweaks, we'd love to hear them. Or indeed, any other features you'd like to see. Keep in touch!


Interactive shells on Python.org

We're really proud to announce that we're providing a "Launch interactive shell" feature for the newly-redesigned Python.org website. We hope that the ease of just clicking on something on the site to try it out will help bring even more people over to The World's Best Programming Language!

Light technical details after the pretty picture...

Drawing

PythonAnywhere is a platform as a service for Python developers. One of our neat features is that you can work on your code from anywhere, from a computer or a tablet, by starting up a Python or Bash command line inside your web browser. So it was easy for us to roll out a simplified version of the system that just provided a simple Python 3 console (using IPython) for anyone who wanted one, and make that available to Python.org.

So, how does it work?

Getting started

When your browser loads the main Python.org front page, a small bit of JavaScript runs. This pings a URL on the PythonAnywhere servers to check that our service is up and running -- this allows us to rate-limit things if the number of visitors to the page gets beyond our abilities to handle new consoles. So far we've not had to do that, but it's good for our peace of mind to have a way of pulling the plug in an emergency.

Assuming that the response says that all systems are go, the JavaScript displays a button above the code sample on the front page that says "Launch interactive shell". That's all for now. Starting a new console takes a certain amount of machine resources, so we don't start one for every hit on the Python.org front page.

onclick="..."

When you click the "Launch interactive shell" button, a bit more JavaScript is run. This injects an iframe into the document, sized to cover the code sample. The src of the iframe is a URL on PythonAnywhere's servers. Without going into too much detail, hitting this URL creates a new "unregistered" user on our server cluster (one with very limited capabilities) and returns a load of HTML and JavaScript that displays a vt100 terminal emulator, and connects back using SockJS (which normally uses WebSockets) to one of our cluster of console servers.

The console server

Our console servers run Tornado. Specifically, they run a Tornado application that listens on port 443 for incoming TLS SockJS connections. When one comes in, it and the JavaScript on the front end do a short authentication exchange to make sure it's from a real user (not super-important for the publicly-accessible consoles on Python.org, but much more important for our normal site). After the auth, the console server constructs a sandbox for the user. This involves setting up a filesystem, and enabling the various limitations for the user. The Tornado process then forks off a new Python process, running as that user, chrooted into the filesystem. (For a more in-depth look at the Tornado server, check out this EuroPython talk by Giles Thomas.)

And off and running

Once the chrooted Python process is up and running, the Tornado server just works to forward keystrokes from the browser down to the process, and results back from the process to the browser. The JavaScript vt100 in the browser handles all of the formatting.
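A minimal sketch of the forwarding idea, for illustration only: it is not the actual console server code. The helper name run_console is made up, it handles one batch of input rather than streaming over SockJS, and it leaves out the chroot and the switch to a restricted user (both need root).

```python
import subprocess
import sys


def run_console(keystrokes, timeout=10):
    """Feed keystrokes to a child Python process and return what it prints.

    Hypothetical helper. A real console server would stream data in both
    directions and would start the child as a restricted user inside a
    chroot; both steps are omitted here.
    """
    child = subprocess.Popen(
        [sys.executable, "-i", "-q", "-u"],  # interactive, no banner, unbuffered
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # the >>> prompts go to stderr; drop them
        text=True,
    )
    # communicate() writes the input, closes stdin (so the child exits at
    # end-of-file) and collects everything the child wrote to stdout.
    output, _ = child.communicate(keystrokes, timeout=timeout)
    return output


if __name__ == "__main__":
    print(run_console("print(6 * 7)\n"))
```

In the real system the browser's terminal emulator sits where the caller of run_console is here, and data flows continuously rather than in one round trip.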

That's pretty much it for the overview! If there's anything you'd like more details on, leave a comment below and we'll respond there -- or we'll update the blog post if enough people are interested in the same bits.


PythonAnywhere now accepts credit cards.

We're pleased to announce that we now support credit card payments!

Screenshot of Credit card upgrade screen

Note screenshot subliminally encouraging people to upgrade to the most expensive possible plan...

Effective immediately...

You can use it effective immediately, and you can switch existing accounts away from PayPal too, if you like.

Lightning-fast turnaround

Screenshot of old trac ticket

Agile development in action

In all seriousness, thanks to everyone who kept pestering us for this, and thanks for bearing with us. In a future post we'll spend a bit of time moaning about PayPal's flaky sandbox environment and how much better Stripe's is...

Kitchen sink...

In other news, we released several minor bugfixes, as well as a whole heap of extra packages:

pyamf, beautifulsoup for python 3.3, marisa-trie, dataset, geoip, texlive-latex-extra, flask-httpauth, pygal, python-ldap, south for Python 3, xvfb-run, the latest scipy for Python 3, mysql-connector for Python 3, patsy, statsmodels, snappy and a full Haskell environment, just in case....

Let us know how you get on with all that, folks!


Scraping and number crunching for a sentiment analysis website

Harrison Kinsley uses PythonAnywhere to generate the data he hosts at his sentiment analysis website, sentdex.com. We asked him a bit about how the site works, and how he uses PythonAnywhere.

Screenshot of the Sentdex front page

Editor's note, aka the shameless marketing bit: the Sentdex code runs on our $99/month Startup Plan.

What does sentdex.com do?

We do sentiment analysis for stocks/forex/bitcoin, politics, and global sentiment, plotted on a three.js [ed: WebGL-based in-browser 3D graphics] globe.

What first gave you the idea to build a site like this?

I have always been interested in financial markets. I got the idea to attempt to get a program to sift through all the news for public companies to generate a sentiment rating on them, which could help long term investors find price inconsistencies in the short term for buying undervalued companies, or even short term traders trade on the sentiment.

I had a developer at the time who I used to develop all my silly ideas, but he wasn't able to do this one, so I had to go at it alone. I somewhat felt like I was a pioneer in the idea that you could use computers to gauge sentiment. I had seen people do Twitter mentions.. but not much else. Now that I've been here for a while, I can see I am hardly alone. Nothing new under the sun, as the saying goes.

Who uses the site?

It started out purely for stocks, and since then I have moved to politics and global sentiment. About 40% of my viewers come for the stocks, another ~30% comes for the political or general sentiment, and then the other 30% comes for the tutorials I give on Python.

How busy is the site (hits/day etc)?

I'd say it is still pretty small. We're currently getting 500-700 visits a day during the week, with 1000-2000 pageviews, and more like 300 or so visits over the weekend (less stock pages getting viewed with markets closed). It varies a lot and has been growing pretty much every day. The website is fairly new, with doors opening about 6 months ago. We've started getting a lot more organic results, so that is nice! We also recently got listed on the Chrome Experiments website for our WebGL globe of general sentiment. That's been a huge help.

How do you use PythonAnywhere?

Currently, I run the crawl bot, stock prices, and some of the number crunching on PythonAnywhere. This powers the stocks, politics, and bitcoin aspects of the website almost entirely. I'd like to eventually run the entire back-end on PythonAnywhere, but the other aspects involve processing through 5-15+ million tweets a day, and then some daily larger number crunching. Those two appear to really kill the CPU time, but I am also trying to shift the loads around a bit to make things more efficient and streamline them more towards the scheduled tasks that can be set up.

What frameworks/Python modules are you using?

I ended up creating my own natural language processing module, and usually wind up writing a lot of things myself. Matplotlib and Numpy are the two major modules that I can't thank enough for visualization and number crunching.

Other than that, I use the heck out of urllib2, re, and cookielib (for the exceedingly tricky websites that don't have love for bots :(... ).

Were there any interesting problems/challenges with getting it working?

I started out with very little serious programming skills. It all started when I decided that I wanted to make a computer read and understand text. That was probably my biggest challenge.

What's your background? How long have you been programming?

I am 23 years old, a university graduate, with no CS schooling, and getting married in a few months! I double majored in Philosophy and Criminal Justice. If I could change it, I probably would have done Philosophy and Economics instead.

I am self taught at programming. I started programming by algorithmic trading virtual currencies. I made a living from ~12 to 21 in the virtual currency market, but I wanted to move on to other things. I was also heavily flipping domains at that time for side income. That is when I decided on the natural language processing idea, but I didn't know anything about that field.

Even though I was algo trading with a program I wrote myself, I still didn't even understand what a function was. It was basically 1 giant loop with a lot of if statements.

A friend pointed me towards the NLTK text book, which covered Natural Language Processing in Python. At the time, I was trying to figure out what language to even learn. The algo trading was done in Pascal, so... yeah... it was time to do something else! I tried Java and C... but I just couldn't learn it, mostly because I felt like I wasn't making any progress towards the goal I wanted to achieve.

Even though NLP wasn't the best or easiest thing to start on, I went through the book in a couple days, simply because that was exactly what I wanted to learn. It was really nice to have found a huge body of work aimed at NLP, while also being written in a way that a true noob could understand.


Got a PythonAnywhere story you'd like to share next month?

Drop us a line at support@pythonanywhere.com and we'll have a chat!


PythonAnywhere and CloudFlare

CloudFlare is a security and acceleration service that sits between your application and the big, bad internet. Here's how to get all that goodness for your PythonAnywhere web app.

Since CloudFlare works by taking over the DNS configuration for a site, this will only work for custom domains. I have used minimumviableserver.com, which is one of the domains we have lying around.

The setup of a website on CloudFlare is simplicity itself. CloudFlare interrogates the DNS system for the domain's current settings and provides excellent instructions about what needs to be changed. After following this process, there is only one thing more to be done.

In order to configure CloudFlare so that it serves a web app from PythonAnywhere, the one change that is necessary is to set the "www" CNAME to point to a PythonAnywhere account. With these settings, the naked domain minimumviableserver.com gets redirected to www.minimumviableserver.com.

Screenshot of CloudFlare DNS settings

The CNAME for direct.minimumviableserver.com that you can see in the screenshot is provided by CloudFlare so you can get direct access to your servers without going through CloudFlare (for ssh or ftp, for example). It is not very useful for PythonAnywhere because <username>.pythonanywhere.com is probably a different web app and ssh.pythonanywhere.com is there for direct ssh access.

Once everything's set up on CloudFlare, it's time to create the web app on PythonAnywhere.

Screenshot of web app creation

So now when we go to www.minimumviableserver.com, we see our web app, but how do we know it's going through CloudFlare? Looking at the network logs in our browser's developer tools, we can see that there are plenty of indicators in the response headers.

Here are the response headers for a dynamically generated HTML page:

Screenshot of dynamic response headers

and these are the response headers for a static resource on that page:

Screenshot of static response headers
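If you'd rather check from a script than from the developer tools, here is a minimal sketch. It rests on assumptions that aren't spelled out in this post: that CloudFlare adds a CF-RAY header and/or names itself in the Server header. The function name and the sample headers are made up for illustration.

```python
def looks_like_cloudflare(headers):
    """Guess whether a response passed through CloudFlare.

    Assumption: the proxy adds a CF-RAY header and/or names itself in the
    Server header. Header names are compared case-insensitively.
    """
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    return "cf-ray" in lowered or "cloudflare" in lowered.get("server", "")


# Hypothetical header sets, not copied from a real response.
proxied = {"Server": "cloudflare-nginx", "CF-RAY": "0123456789abcdef-LHR"}
direct = {"Server": "nginx", "Content-Type": "text/html"}
```

Pass it the headers from whichever HTTP client you use; anything that behaves like a dict of header names to values will do.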


HSK东西 Scripts: a site for learning Chinese characters - or, "Handling Chinese characters with Python Unicode strings is less hassle than I thought it would be."

Alan Davies is learning Chinese and couldn't find a site that would work out what level of difficulty a text or a vocabulary list would be. So he built a site to do that on PythonAnywhere, our Python-focused PaaS and browser-based programming environment. Bravely enough, he did it in Python 2, which is not renowned for its Unicode support. While he says that "there are a few little things you have to be aware of" Unicode-wise, it turned out to be entirely doable and is now used by people learning Chinese all over the world.

Screenshot of the HSK Scripts front page

What does your site do?

There is a standardised test of Mandarin Chinese for foreigners called the HSK. The HSK has six levels, with HSK 1 requiring the ability to work with a vocabulary of 150 Chinese words, and HSK 6 requiring about 5,000 words. It's useful to be able to ask questions about all these word lists:

  • How much of the HSK level 3 vocabulary do I know?
  • Which words do I need to know to get to HSK 5?
  • Given the vocabulary list for a textbook, what level will it get me to?
  • Which words were added to or removed from a level when the vocabulary was revised?
  • I have passed HSK 4, will I be able to read this short story?

I wanted something that could answer these questions, so I wrote a quick script for myself to use, taking some Chinese text or a vocabulary list, and breaking it down according to the 6 levels of the standardised Chinese HSK exams.

What first gave you the idea to build a site like this?

I started learning Mandarin Chinese a couple of years ago. Once I got started it was addictive, I think it really appeals to the brain of a programmer because of the highly logical and consistent grammatical structure, with every character pronounced as exactly one syllable. There are many other interesting features such as how the characters are constructed from smaller components which often give hints as to their meaning or pronunciation, a bit like a cryptic crossword clue in each character. One quick example: 马 (pronounced 'ma', ignoring the tones) means horse, and 女 means woman. 妈, which means 'mother', is also pronounced 'ma'. The left side of 妈 gives a hint to its meaning, and the right side to its pronunciation.

How does it work?

My script has a few text files as inputs: the HSK words at each level, word and character frequencies, a free Chinese-English dictionary, and a file that gives character composition information (see above). The words and characters at each HSK level are stored in Python set() objects, and when a user vocabulary list has been input, the various questions to do with HSK characters in the list can be answered with various combinations of the very powerful set operations. The resulting lists are sorted by character or word frequency as appropriate, to keep the more interesting and frequent characters near the top. The in-memory data structures (which are all sets, dictionaries, and arrays) are just pickled to disk, to save re-parsing the source files.
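[ed: a minimal sketch of the kind of set operations described above. This is not Alan's code. The word lists and frequency counts are invented, and we assume each level's set holds only the words new at that level.]

```python
# Hypothetical miniature word lists; the real HSK lists are much longer.
hsk_words = {
    1: {"我", "你", "好", "人"},
    2: {"马", "妈妈", "中国"},
    3: {"国人", "故事"},
}
# Hypothetical frequency counts (higher means more common).
word_frequency = {"我": 900, "你": 800, "好": 700, "人": 600,
                  "马": 50, "妈妈": 300, "中国": 400, "国人": 20, "故事": 100}


def words_up_to(level):
    """All words required at the given level or any level below it."""
    return set().union(*(hsk_words[n] for n in range(1, level + 1)))


def by_frequency(words):
    """Sort most frequent first, so the more useful words come out on top."""
    return sorted(words, key=lambda w: word_frequency.get(w, 0), reverse=True)


known = {"我", "你", "好", "妈妈", "故事"}

# "How much of the HSK level 2 vocabulary do I know?" is an intersection.
known_at_2 = known & words_up_to(2)
# "Which words do I need to know to get to HSK 3?" is a difference.
missing_for_3 = by_frequency(words_up_to(3) - known)
```

[ed: with these made-up lists, known_at_2 holds four of the seven words up to level 2, and missing_for_3 lists the four unknown words with the most common first.]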

I also added the ability to take a block of text and try to turn it into a vocabulary list. Parsing Chinese sentences into words is quite a difficult problem, and not just because it is quite difficult to distinguish between 'words' and 'phrases' sometimes in Chinese (editor's note: Section 5 of this blog post gives you some idea of how hard this is). Secondly, Chinese speakers run 'words' together with pauses between them as we do in English, but when they write these words down there are no spaces between them. There generally isn't too much ambiguity when being read by a fluent Chinese speaker, but for a computer program this is a nightmare. Tokenising Chinese sentences has been the subject of many research projects, and all sorts of statistical, grammatical, and AI techniques have been tried. Another example: the characters 中 (middle), 国 (country), and 人 (person) are each words in their own right. Put them together and you can create the words 中国 (China, the 'Middle Kingdom'), 国人 (compatriot), and 中国人 (Chinese person). All this can of course be in the middle of a sentence with other potential word combinations formed from the first and last character. I chose a simple and crude way to resolve this ambiguity by looking at the frequency of the possible word combinations, it could probably use some more work though, maybe next time I am delayed at an airport!
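[ed: one way to picture a frequency-based splitter is the sketch below. It is not Alan's code. The counts are invented, and scoring each possible split by the sum of log frequencies is our assumption about how "looking at the frequency of the possible word combinations" might be done.]

```python
import math

# Hypothetical frequency counts, for illustration only.
WORD_FREQ = {
    "我": 1000, "是": 1200, "中": 500, "国": 400, "人": 900,
    "中国": 300, "国人": 20, "中国人": 400,
}
TOTAL = sum(WORD_FREQ.values())
MAX_WORD_LEN = max(len(word) for word in WORD_FREQ)
UNKNOWN_COUNT = 0.5  # small stand-in count for characters not in the list


def word_score(word):
    """Log relative frequency of a word, or None if it cannot be a word."""
    count = WORD_FREQ.get(word)
    if count is None:
        if len(word) > 1:
            return None  # unknown multi-character strings are not candidates
        count = UNKNOWN_COUNT  # unknown single characters stand alone
    return math.log(count / TOTAL)


def segment(text):
    """Split text into the word sequence with the highest total score."""
    # best[i] is the (score, words) pair for the best split of text[:i].
    best = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - MAX_WORD_LEN), end):
            score = word_score(text[start:end])
            if score is not None:
                prev_score, prev_words = best[start]
                candidates.append((prev_score + score,
                                   prev_words + [text[start:end]]))
        best.append(max(candidates, key=lambda pair: pair[0]))
    return best[-1][1]
```

[ed: because every extra word adds another negative log term, this scoring tends to prefer fewer, longer words whenever they are reasonably common, which is the behaviour you want for 中国人.]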

Who uses the site?

Judging by the feedback I've had, people learning Chinese all over the world. The top countries are the US, China, Thailand, Germany, and Taiwan.

How busy is the site (hits/day etc)?

At the moment about 100 unique users per day, as it's really still just in early development. I've shared it with a few people to test it, but if it becomes popular I'll polish it up and host it under my own domain name.

What frameworks/Python modules are you using?

The only non standard library module this project uses is Flask, I just wanted the simplest possible framework that would let objects persist on the server between requests, so I could keep some of the data around without re-parsing the data files for every request. The HTML and CSS is as simple as possible at the moment, making it look pretty is not the part that interests me much.

Were there any interesting problems/challenges with getting it working? We're guessing Unicode...

Handling Chinese characters with Python 2 Unicode strings is less hassle than I thought it would be, but there are a few little things you have to be aware of. You end up being paranoid and prefixing all of your string literals with 'u', a problem that I believe is fixed in Python 3 where all strings are unicode by default. Missing the 'u' prefix will cause some nasty behaviour in the examples below: 1 will output a bit of line noise, 2 will output "\u4e2d" rather than a properly encoded character, and worst of all, 3 and 4 will throw a run-time exception. 5 will usually work correctly, depending on whether your editor has saved the file with the correct encoding, although for safety I stick with 6 and keep my source files in ASCII:

# -*- coding: utf-8 -*-
# Python 2 only; the line above is needed for the non-ASCII literals to parse.
print "1: {}".format("中")
print "2: {}".format("\u4e2d")
print "3: {}".format(u"中")
print "4: {}".format(u"\u4e2d")
print u"5: {}".format(u"中")
print u"6: {}".format(u"\u4e2d")
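[ed: for comparison, a minimal Python 3 sketch, where every string literal is already Unicode. The variable names are ours, not Alan's.]

```python
# Python 3: every str literal is already Unicode, so no u prefix is needed.
literal = "中"
escaped = "\u4e2d"

# Formatting works the same way whichever spelling you use.
message_1 = "1: {}".format(literal)
message_2 = "2: {}".format(escaped)

# Bytes only appear when you explicitly encode.
encoded = literal.encode("utf-8")
```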

It is also important to specify the encoding of files when reading and writing non-ASCII encodings, otherwise you will have garbled characters. For UTF-8 it is usually better to include a byte order mark as well. The following code opens files for reading and writing:

import codecs
infile = codecs.open(infilename, 'r', "utf-8")

outfile = codecs.open(outfilename, 'w', "utf-8")
outfile.write(codecs.BOM_UTF8.decode("utf8"))
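[ed: a minimal Python 3 sketch of the same idea, using the built-in open() with the standard "utf-8-sig" codec, which writes the byte order mark for you and strips it on reading. The file name is made up and the file lives in a temporary directory.]

```python
import os
import tempfile

text = "中国人\n"

# Hypothetical file name, created in a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "vocab.txt")

# "utf-8-sig" writes a byte order mark at the start of the file...
with open(path, "w", encoding="utf-8-sig") as outfile:
    outfile.write(text)

# ...and strips it again on reading, so the text round-trips unchanged.
with open(path, "r", encoding="utf-8-sig") as infile:
    round_tripped = infile.read()

# Reading the raw bytes shows the mark really is there on disk.
with open(path, "rb") as raw:
    starts_with_bom = raw.read(3) == b"\xef\xbb\xbf"
```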

What's your background? How long have you been programming?

I messed about a bit with BASIC on the BBC Micro and ZX Spectrum when I was young, and then Assembler on the Atari ST. I did a degree in Computer Engineering and started work as a software developer, working in C, then C++, and more recently various .NET languages. I'm now doing a PhD in Finance so any programming I do tends to be econometrics in R, SAS, and Stata. For about 12 years though I have tended to use Python for my own small projects, as I find it very quick and intuitive to write, and very readable.

How did PythonAnywhere help?

With PythonAnywhere I can log in from anywhere when I have a few minutes and fix a bug or add or improve a feature. And the processing power available is blinding.


Got a PythonAnywhere story you'd like to share next month?

Drop us a line at support@pythonanywhere.com and we'll have a chat!


Python 3 web apps!

As of today, PythonAnywhere supports Python 3 web apps for all frameworks that work with Python 3. We'd love to hear any feedback people have.

To create a Python 3 web app, just click the "Add a new web app" button. If you have an existing web app that you'd like to switch over from Python 2.7 to Python 3, drop us a line with the "Send feedback" option.


17 Xs About PythonAnywhere That Will Make You Y What You Thought About Z

In an attempt to get more clicks, this post (which, emphatically, is not a newsletter) has been rewritten in an annoying, nu-internet upworthy/buzzfeed stylee. Welcome aboard!

1. That Amazing Moment When Your Hit Rate Goes Through The Roof

At 1:37 he hooks you in with a meteoric rise in user numbers. At 2:56 he explains some pretty cool hacks with SQL ORDER BY RAND. At 3:17 your mind is totally blown by the web-scale.

Ricky Martin's Greatest Python Hits

OK, so it's not even a video. But it is a pretty cool case study. Over the last year or so we've gone from hosting small hobby projects and small apps to hosting some serious, high-volume sites spiking up to hundreds of thousands of hits per hour. Read more about one of our customers who's built a popular web-radio station out of nothing, using web2py.

Also: illegal narcotics are involved. If that's not clickbait, I don't know what is!

Scaling a popular internet radio station, an interview with Mark, creator of Stereodose

2. What Happened When Miley Cyrus Saw The New Autocomplete Feature In Our Editor

You didn't know Miley was one of our long-standing users? She has a Ph. D. in Computer Science, you know. And she shares a name with a celebrity. Here's what she had to say.

Miley Cyrus says Autocomplete Rocks!

We use Ace for our in-browser editor -- it's an awesome open source component: the devs have been busy, and it now has loads of new features, including autocomplete, multi-line comment/uncomment, block selection, and much more.

  • Ctrl + / comment/uncomment
  • Ctrl + <Space> Autocomplete!
  • Ctrl + P Jump to matching bracket
  • Ctrl + Alt + Up multiple-cursors add one line up
  • Ctrl + Alt + Right multiple-cursors add to word matching selection

You can consult a full list under the link "Keyboard Shortcuts" if you twerk your way over to editor via the files tab.

3. The one plan you have to check out before you die

If you had just one chance to come up with a hosting plan, would you come up with anything as crazy as these guys did?

New Pricing table!

So we just launched a $99 "Startup plan", as part of our "all-growed-up-now" offering to more serious hosting users. Some people have told us they worry it's too big a leap from the $12 plan up to the $99 version. What do you think might work well as a stepping stone between the two? Email us -- support@pythonanywhere.com

4. We Asked 75 Python Core Devs What Their Favourite Animated GIF Was. You Won't Believe What They Came Up With.

Well, by "asked", we meant "imagined asking". And I don't think we really stretched to imagining asking 75 times. Anyway, here's refactoring cat!

PythonAnywhere static files

Remember folks, only good test coverage can save you from becoming the refactoring cat!

5. Shocking Outage Reports And Downtime Notifications You'll Tell Your Children About Some Day

It's all fun and games until everything explodes. Read up on one of the more stressful hours we've spent recently.

Catastrophic failure error message

More generally, we're working towards a zero-downtime architecture, but while we're on our way there, we're going to have to keep having planned outages for new releases. As we get more and more serious customers, we can't get away with announcing them half an hour ahead of time on Twitter. We were thinking email notification, 24 hours ahead of time. But maybe you'd prefer something else? Answers on a postcard to support@pythonanywhere.com please!

6. If You Can Read This Next Article Without Tearing Up, You're Made Of Stone

[All right, that's enough - Ed.]

Keep in touch everyone! We always love to hear what y'all have to say.


Scaling a popular internet radio station, an interview with Mark, creator of Stereodose

Or: What happened when I hit the front page of reddit...

Mark is the creator of Stereodose, a wildly popular internet radio station with a unique approach to generating playlists: you pick your drug of choice, you pick your mood, and a customised selection of tracks starts to play.

Stereodose is hosted by PythonAnywhere, a Python-focused PaaS and browser-based programming environment.

Stereodose.com stats
Users: 80k
Pageviews: 500k / month
Unique visitors: 70k / month

Screenshot of the stereodose front page

Disclaimer: PythonAnywhere LLP naturally does not condone the use of harmful drugs. This is why we do not support PHP. Although we do believe that people should be free to use PHP if they wish to, in the privacy of their own home.

Here's our exclusive interview with Mark:

What's your background? How long have you been programming?

I graduated from college in 2011, with a degree in Economics. I wasn't interested in finance or consulting, which seemed to be what all my classmates were doing, so I started working at a small music marketing company out of college. I interacted with many blogs at the marketing job, and I wanted to create something with more functionality than what I was seeing, especially one that catered to my music tastes.

Around late 2011, I started learning how to program, with my main goal being to build my own Wordpress music blog (learning HTML, JavaScript, and PHP). I enjoyed programming and creating much more than marketing, so I eventually quit my job in 2012 and started learning more/resume building. In total, I've been programming a little over 2 years now.

What first gave you the idea to build a site like this?

I came up with the idea some time in 2011. While coming up with a theme for a music blog I wanted to create, I realized there weren't any music blogs focused on "drug music." I created the music blog, but it didn't really go anywhere. I decided the "drug music" message needed to be as blunt as possible, so I came up with a "pick your drug, we'll make your playlist" format.

Why did you choose web2py and PythonAnywhere? What do you like about them? What do you hate about them?

At first, I used Wordpress to build the site. It was doable, but I was forcing Wordpress to do something that it wasn't optimized to do. I started looking into other options, realizing a web app framework would be my best bet. After doing lots of research, I came up with Rails, Django, and web2py. Looking at languages, I found PHP to be really ugly, with complex syntax, compared to Ruby and Python.

I researched each framework as thoroughly as I could, although I didn't understand many of the specifics at that point, just the general differences. I remember reading a lot of discussions, and finding Massimo and other core web2py developers actively supporting users. The smaller community and the passion for web2py (having the creator write Stack Overflow responses) were very convincing. I saw web2py as a reaction to Rails and Django, taking inspiration from both, but improving on the flaws of each framework.

I don't quite remember where I found PythonAnywhere first, either in the web2py forums or this web development tutorial by Marco Laspe. However, it was definitely the tutorial that convinced me that PythonAnywhere was the way to go, as deploying web2py easily was very important to me (being so new to web development).

I searched through a couple other hosts, but PythonAnywhere was the only host with a special interest in web2py and Python.

There are many things I like about web2py and PythonAnywhere, so I'll just cover the most important points. For PythonAnywhere, the in-browser Bash Console, Scheduler, and amazing customer support are the big winners. There is an undeniably human aspect to PythonAnywhere, with very real interactions and help when you need it. I really feel that I'm working with people who care about innovation and programming, not just an anonymous corporation.

For web2py, the dedication to creating a quality product from the web2py community is my favorite thing about using web2py. I have many specific needs for my site, and so far, web2py has provided the tools to build a wide array of functionality. It always surprises me how vast web2py's tool set is, and the fact that many of the features have come from web2py community suggestions.

As for things I hate, I'm going to be very careful with what I say hahaha. Most of my frustrations come from the fact that I'm learning as I'm maintaining/creating for Stereodose; I'm doing many things for the first time. I can't say that I really hate anything about web2py or PythonAnywhere. I'm not at a level of competence to where I can accurately criticize either (sorry for the safe answer).

[Ed: if you want to check out web2py on PythonAnywhere, check out our try-web2py demo, no installation or signup required]

When did you first realise the site was getting a serious amount of hits? Where do you think they came from?

My first breakthrough was after I posted the site on /r/drugs (the drug subforum of Reddit). After that, people started periodically posting the site on other forums as well, with a fair number from Reddit too.

The biggest break was earlier this year, when the site got put on rotation with Stumbleupon, and at the same time it front-paged Reddit (through r/Music). I never spent much time marketing the site myself, and was planning on leaving it be, before this traffic surge. I liked the idea of growing Stereodose and wanted to see its full potential, so I abandoned a resume builder project I was working on to focus on Stereodose.

What kind of traffic are you seeing now?

In the most recent month:

  • Total number of users: 80k
  • Pageviews/month: 500k (website only)
  • Unique visitors/month: 70k (website only)
  • 600k songs favorited/liked by users.
  • 14k playlists played daily.
  • Anywhere from 100k - 200k songs played daily (very rough estimate, multiplying playlists by number of songs per playlist, which depends on how long the listener listens for).

What did you learn from having to scale the site so quickly?

One interesting problem was getting random records from the database efficiently -- the playlist needs to be fresh every time you load it. I originally used web2py's DAL, using orderby="<random>" to get random records from the database. Unfortunately, this would do a full table scan of the entire song data table. This is expected behavior from orderby="<random>", since it maps to MySQL's ORDER BY RAND(). As the song table grew larger, this became more and more of a problem, contributing to many 502 errors and slow page loads.

Thinking about a solution, I wanted to make sure that, when obtaining random records, the number of records analyzed by MySQL was equal to the number of records I actually needed. If I wanted 30 songs, I didn't want to analyze 12000 records with a full table scan, but only those 30 records.

The main solution I found was to generate 30 numbers randomly, and then find the record ids that matched those numbers. Frustratingly, this only works if your record ids are sequential, without any holes. Using a WHERE clause in the database query naturally creates holes in the ids of your dataset, since the results that match a WHERE clause are not sequential. I needed the WHERE clause, because the songs are chosen at random ONLY if they match the genre that the user selected.
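[Ed: to make the "holes" problem concrete, here is a tiny sketch in plain Python. The data is made up for illustration (12 songs whose ids alternate between two hypothetical genres, "A" and "B"), and the function name is ours, not anything from Stereodose.]

```python
import random

# Hypothetical data: 12 songs whose ids alternate between two genres.
songs = {song_id: ("A" if song_id % 2 else "B") for song_id in range(1, 13)}


def naive_random_pick(genre, wanted, rng):
    """Draw `wanted` random ids from the whole id range, then apply the genre filter."""
    candidate_ids = rng.sample(range(1, len(songs) + 1), wanted)
    # Ids that belong to the other genre are "holes" and get thrown away here.
    return [song_id for song_id in candidate_ids if songs[song_id] == genre]


picked = naive_random_pick("A", 4, random.Random())
print(len(picked), "of the 4 requested ids were genre A songs")
```

Because only half of the ids belong to genre A, most runs come back with fewer than the four songs asked for, which is why ids that are sequential within a genre are needed.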

Thus, my solution was to add a column which contained a sequential number for each song, per genre (row_id_per_genre). If I wanted to select 30 songs at random from Genre A, I would find the number of records in Genre A (stored in a summary table so I don't have to count each time), create 30 unique random numbers from 1 to (# of records in Genre A), and directly find those records in the database. The row_id_per_genre column eliminates holes I would usually get from using the WHERE clause.
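[Ed: here is a minimal sketch of that approach, so you can try the idea yourself. It is not Mark's code. It uses Python's built-in sqlite3 module rather than MySQL and the web2py DAL so that it runs anywhere, and apart from row_id_per_genre every table, column, index, and function name in it is an assumption made for illustration.]

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE songs (
        id INTEGER PRIMARY KEY,
        genre TEXT NOT NULL,
        title TEXT NOT NULL,
        row_id_per_genre INTEGER NOT NULL  -- 1, 2, 3, ... within each genre
    );
    -- Hypothetical index so lookups by (genre, row_id_per_genre) avoid a table scan.
    CREATE UNIQUE INDEX idx_genre_row ON songs (genre, row_id_per_genre);
    -- Hypothetical summary table, so we never have to COUNT(*) at request time.
    CREATE TABLE genre_counts (genre TEXT PRIMARY KEY, song_count INTEGER NOT NULL);
""")

# Made-up data: 200 songs with interleaved genres, so the global ids of any
# one genre have holes, while row_id_per_genre stays sequential per genre.
per_genre = {}
for n in range(1, 201):
    genre = "A" if n % 3 else "B"
    per_genre[genre] = per_genre.get(genre, 0) + 1
    conn.execute(
        "INSERT INTO songs (genre, title, row_id_per_genre) VALUES (?, ?, ?)",
        (genre, f"song {n}", per_genre[genre]),
    )
conn.executemany("INSERT INTO genre_counts VALUES (?, ?)", per_genre.items())


def random_songs(conn, genre, wanted):
    """Return up to `wanted` distinct random (id, title) rows for `genre`."""
    (total,) = conn.execute(
        "SELECT song_count FROM genre_counts WHERE genre = ?", (genre,)
    ).fetchone()
    # Unique random positions from 1 to the number of songs in the genre.
    picks = random.sample(range(1, total + 1), min(wanted, total))
    placeholders = ",".join("?" * len(picks))
    rows = conn.execute(
        "SELECT id, title FROM songs "
        f"WHERE genre = ? AND row_id_per_genre IN ({placeholders})",
        [genre, *picks],
    ).fetchall()
    random.shuffle(rows)  # the database may return rows in index order
    return rows


playlist = random_songs(conn, "A", 30)
print(len(playlist), "songs picked")
```

One thing to watch with a scheme like this: if songs are ever deleted, row_id_per_genre and the summary counts need to be renumbered or updated, otherwise the holes come back.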

The resulting query can use the index on row_id_per_genre to find random records directly, and avoids a full table scan, resulting in much faster database selects: a drop from about 300-500ms to 15ms. Other playlist-related pages also had this problem, and I used the basic principle from this solution to speed up other random selects.

I didn't realize how unscalable ORDER BY RAND() was, because I never had a big enough table with decent traffic to cause any problems.

Do your boss / college tutor / parents / children / significant other know about the site?

Most people who know me also know that I run Stereodose. My family is somewhat iffy about the site; they come from a culture where drugs are very taboo, so it's understandable that they'd have some reservations about Stereodose.

Do you have any advice for other aspiring web developers?

It's not impossible to learn on your own, as long as you have an internet connection and can use Google search! Might seem daunting at first, but give it time and you'll be surprised at how much you can understand.

Do you have any ideas for what to do next with the site? What about other projects?

I have a couple big features coming up for the site, the main one being user-created playlists (and with that, more drugs and moods). There are a couple project ideas rattling around in my head, but I'm not ready to materialize any of them yet.

I'm also looking to see if there are other companies or projects I can contribute to once Stereodose can run on its own. Some professional experience working with other programmers would be a good change of scene (as well as some pocket change).

Final question: what do you usually pick as your playlist?

Oooo that's a tough one...I love to dance and get down with a funky vibe, so my top choice would be Weed -> Groovin. But I'm always changing it up!


Editor's note, aka the shameless marketing bit: Stereodose started out on our $12/month Web Developer plan, which we market as "able to handle hitting the front page of Hacker News now and again". As Mark's requirements grew, we started scoping out a higher-end $99 Startup Plan, although even he doesn't need quite that much! We've put together a custom package for him; if you're interested in hosting with us at a custom price point, do get in touch and tell us about your requirements. Don't be scared! We have no salespeople, just friendly devs :)

Got a PythonAnywhere story you'd like to share next month?

Drop us a line at support@pythonanywhere.com and we'll have a chat!

