System update this morning

This morning's system update went smoothly :-)

It was primarily a maintenance update, bringing our US-based system up to the same version as our EU-based system. There were a number of minor bugfixes, along with a bunch of improvements to our system administration tools, which won't be visible to you, but do mean that we'll be able to spend less time on admin stuff -- which gives us more time to work on adding cool new features!


Announcing eu.pythonanywhere.com

We're proud to announce today that we now have an EU-hosted PythonAnywhere system :-) You can access it at eu.pythonanywhere.com. It's completely separated from our normal system, but has all of the same features -- plus billing in euros.

Why we built it

Since the GDPR came into effect last year, a lot of people have asked us whether our servers are compliant. The answer is a resounding "yes!" -- historically all of our servers have been in the US, but they're in a datacenter run by Amazon Web Services, which is covered by the EU-U.S. Privacy Shield Framework -- an agreement that allows certain US-based hosting solutions to be treated as "as good as" EU-based systems for the purposes of the GDPR.

But we appreciate that some people are still concerned. Agreements like Privacy Shield can potentially come to an end, and while things are looking good right now, there were some issues last year.

So now, if you want to be sure that your own account's data, and any data you're using PythonAnywhere to manage, is stored in the EU, we have a solution. The servers for eu.pythonanywhere.com are based in AWS's eu-central-1 datacenter in Frankfurt, and there's no connection between the two systems apart from a common list of usernames (which is also in Frankfurt).

Why you might want to use it

If you're based in the EU, there's almost no downside to using the new system. You will have the comfort of knowing your data is in the EU, it's closer to you in network terms (in our tests, the network latency is about 8ms from Amsterdam, versus 90ms to our US servers), and for paid accounts, billing is in euros so you don't need to worry about foreign exchange fees on your card payments. If we need to perform system maintenance on the servers, we'll do it late at night European time, rather than during the early morning timeslot that we use for the existing system. The only reason you might want to stick with the US system is that it's a little cheaper; our underlying hosting costs are a bit higher for the EU service, which is reflected in the prices -- a Hacker account that costs $5/month on the US servers is €5/month on our new system (plus VAT if applicable).

If you're not in the EU, then we'd recommend you stick with www.pythonanywhere.com unless you're handling a lot of data relating to EU citizens and are concerned about the GDPR. But that said, you're entirely welcome to use the new system if you prefer :-)

How to get started

You can sign up on the EU servers right now. If you already have an account on our existing system, you'll have to use a new username; we're blocking duplicate usernames at the moment because soon we'll be offering the ability to migrate existing accounts over from the US system to the EU. If you're interested in having an existing account moved over, email us at eusupport@pythonanywhere.com and let us know your account username.

Any questions?

If you have any questions about the new EU systems, please do let us know -- you can post a comment below, or send us an email.


How DNS works: a beginner's guide

We sometimes get emails from people who are trying to point their custom domain at PythonAnywhere so that they can host their website, but are struggling to set up their DNS settings. Normally DNS setup is pretty simple, but sometimes people can get bogged down due to confusing interfaces on their registrar's site, or complexities in the terminology people use.

The parts of DNS that you need to know about in order to host a website are actually not all that complicated, but some domain registrars have complicated, hard-to-understand interfaces. Either they assume that you understand all of the technical details about how the whole thing works -- which makes it hard for first-timers -- or they try to put a simple user-friendly interface on top of it, but simplify it so much that it's actually harder to use because they're hiding important stuff from you.

Given that basic DNS stuff really isn't all that hard, we felt that it would be a good idea to post an explanation, going from the basics up to some slightly deeper stuff. This post is written so that if you only want the basics, you can just read the first part, while if you want a deeper understanding -- either out of interest, or because your domain registrar has got such a low-level interface that you need to -- then you can keep reading.

It's worth noting that for most people, you don't need to know any of this stuff to set up a website on PythonAnywhere, even with a custom domain; it's meant more as an explanation so that people who do run into problems with their registrar have the background knowledge they need to solve the problem -- or, indeed, to explain to the registrar's tech support team what the problem is. And, of course, it's a bit of light reading for people who are just interested in this stuff :-)

The basics: domain names and IP addresses

When a browser wants to connect to www.yourdomain.com, it needs to know which computer on the Internet is hosting that site. The string www.yourdomain.com is a hostname (technically it's a "fully qualified domain name"), but at the underlying network layer, all computers are identified by IP addresses, which are numerical. So somehow the browser needs to find out which numerical IP address it should use when it wants to talk to www.yourdomain.com.

It does that by asking a DNS server -- in full, a Domain Name System server. Normally this will be a server that's provided by your ISP or your local network administrator; in general, when your computer joins a network, the details of the DNS server are part of the information it gets from the router. So to convert a hostname to an IP address, the browser makes a system call to the operating system saying "please get me the IP address for www.yourdomain.com", the operating system sends a message to the DNS server with the same request, and the DNS server responds with the IP address: something like "23.145.21.243" (in IP version 4, the version that's currently most widely used, IP addresses are normally written as four numbers between 0 and 255, separated by dots -- I'll use similar addresses as examples later on). Now the browser can connect to the server at that IP address, send it an HTTP message saying "please send me the contents of the front page for www.yourdomain.com", and the server will respond with the appropriate stuff.
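If you'd like to see that lookup from Python's side, here's a minimal sketch. It uses the standard library's socket.getaddrinfo, which hands the request to the operating system in much the same way a browser would; the function name resolve_ipv4 is just something we've made up for this example.

```python
import socket


def resolve_ipv4(hostname):
    """Ask the operating system's resolver for the IPv4 addresses of a hostname."""
    # getaddrinfo is the standard-library wrapper around the OS-level lookup
    # described above; restricting it to AF_INET asks for IPv4 results only.
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr), and for IPv4
    # the sockaddr is an (ip_address, port) pair.
    return sorted({sockaddr[0] for *_, sockaddr in results})


# Example (needs a working network connection, so it's left commented out):
# print(resolve_ipv4("www.pythonanywhere.com"))
```

The addresses you get back will depend on when and where you run it, which is why there's no expected output shown here.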

What all this means is that when you want your custom domain to be hosted on PythonAnywhere, you're essentially setting things up so that all of the DNS servers across the Internet know what IP address to provide when someone wants to access your site. (This, by the way, is why we say that the changes you make "may take some time to propagate across the Internet" -- it's not just one central database that needs to be updated; instead, different ISPs will pick up the change at different times. A little more about that later.)

DNS records

The explanation above is, of course, a bit simplified. The DNS database isn't just a mapping from hostnames like www.yourdomain.com to IP addresses. Instead, for each "domain name", it can keep a bunch of different types of records. I put "domain name" in quotes back there because the technical meaning of the words here is slightly different to what people normally use it to mean. When we normally use the phrase "domain name", we mean something like "yourdomain.com", "google.com", or "bbc.co.uk". That is, we mean the portion of a network address that identifies a particular organisation. We'd expect (for example) Google to own the domain name google.com, to have a website at www.google.com, and for people who work there to have email addresses like alice.jones@google.com.

But in the technical parlance of DNS, a "domain name" is something a little more subtle. The Wikipedia article is (of course) a good explanation, but a summary is:

  • There is a "root" domain called .
  • There is a subdomain of the root domain called com.
  • If you buy yourdomain.com then you own yourdomain.com., which is a subdomain of com.
  • If you create a website at www.yourdomain.com then technically you've created a subdomain of yourdomain.com. with the name www.yourdomain.com..

(You'll notice that all of the domain names above end with the . to represent the root domain. Technically domain names always should, but no-one ever bothers, so I won't use the extra . in the rest of this blog post.)
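To make the hierarchy a bit more concrete, here's a small sketch that lists every domain name from the root down to a given hostname. The function name domain_hierarchy is our own invention for illustration, and the code just does string manipulation -- it doesn't talk to any DNS server.

```python
def domain_hierarchy(hostname):
    """Return every domain name from the root down to the given hostname."""
    # Drop any trailing dot so "www.yourdomain.com" and "www.yourdomain.com."
    # are treated the same way.
    labels = hostname.rstrip(".").split(".")
    chain = ["."]  # the root domain
    # Build up from the right: com., then yourdomain.com., and so on.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(domain_hierarchy("www.yourdomain.com"))
# ['.', 'com.', 'yourdomain.com.', 'www.yourdomain.com.']
```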

Basically, any level of the hierarchy is a domain name. A number of different kinds of "records" can be associated with a domain name. The domain yourdomain.com might have various records, and so might www.yourdomain.com. Any domain can have a number of different records, including multiple records of the same type.

Some examples of record types are:

  • A ("Address") records, which are how you specify an IPv4 address like 23.145.21.243. These tell DNS that if someone wants to treat the domain name as the name of a computer, then this is the IP address they should use.
  • MX ("Mail eXchange") records, which specify what email server should be used for a domain. If someone wants to send email to bob@yourdomain.com, the email systems involved will use the MX record for yourdomain.com to work out which computer to deliver it to.
  • TXT ("TeXT") records, which are pretty much free-form text and are used for various purposes including verifying senders of email messages.

There are a bunch of others -- NS, SOA, and so on. But apart from A records (which you can see are obviously important in the example above where we got the IP address for a hostname), there's one other interesting kind of record: CNAME.

CNAMEs

When a browser (via its operating system) asks its DNS server for the IP address corresponding to a hostname, the server can respond in a number of different ways:

  • If it doesn't know what the IP address is, it will just return a response saying so, so that the browser can display an appropriate error page. (For example Chrome will say something like "www.nonexistent.com’s server IP address could not be found.")
  • If it has an A record for the hostname, then it will just return the record and the browser will use the IP address contained in it.
  • If it doesn't have an A record, but does have a CNAME (abbreviated from "canonical name") record then it will return that. The CNAME record doesn't contain an IP address; instead, it contains another hostname. The browser can then send the DNS server a second message, saying "OK, then -- so what's the IP address of that hostname" -- and hopefully the response this time will have an IP address. (Of course, it could just respond with another CNAME record, which would require a third lookup, and so on.)
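The three cases above can be sketched as a toy resolver. This is only an illustration: the records live in a Python dictionary rather than on a real DNS server, the hostnames and the IP address are made-up examples, and the max_lookups limit is an assumption of ours to stop the loop if two CNAMEs point at each other.

```python
# A toy, in-memory stand-in for a DNS server's records. The hostnames and
# the IP address are made-up examples, not real PythonAnywhere settings.
RECORDS = {
    "www.yourdomain.com": ("CNAME", "webapp-123456.pythonanywhere.com"),
    "webapp-123456.pythonanywhere.com": ("A", "1.2.3.4"),
}


def resolve(hostname, records=RECORDS, max_lookups=8):
    """Follow CNAME records until an A record is found.

    Returns (ip_address, number_of_lookups), or (None, number_of_lookups)
    if the name can't be resolved.
    """
    lookups = 0
    while lookups < max_lookups:
        lookups += 1
        record = records.get(hostname)
        if record is None:
            return None, lookups  # "IP address could not be found"
        record_type, value = record
        if record_type == "A":
            return value, lookups
        hostname = value  # a CNAME: look up the name it points to
    return None, lookups  # give up rather than loop forever on a CNAME cycle


print(resolve("www.yourdomain.com"))   # ('1.2.3.4', 2)
print(resolve("www.nonexistent.com"))  # (None, 1)
```

Note that the CNAME case takes two lookups where a plain A record would take one.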

CNAME records, on the face of it, look like a somewhat roundabout and inefficient way to do things. Instead of making one request to get from a hostname to an IP address, a browser will need to make two or more. So why is it that when you create a website with a custom domain on PythonAnywhere, we ask you to set up a CNAME record to point www.yourdomain.com to a hostname like webapp-123456.pythonanywhere.com, rather than just providing you with an IP address so that you can create an A record?

The answer is that CNAMEs are actually really useful -- and also not all that inefficient in practice.

The usefulness first: let's imagine you had a website hosted at www.yourdomain.com, and you set up an A record with your registrar to point it at an IP address that belonged to PythonAnywhere. Your site would be up and running, and everything would work fine so long as that IP address was always the right one for your site. But if it changed, you'd need to log in to your registrar again and update it.

But IP addresses sometimes have to change. Sometimes they get blocked in certain countries. Sometimes a specific IP address might be subjected to a denial-of-service attack, and be unusable. Or sometimes we at PythonAnywhere might want to move your site from one IP address to another simply to balance out load across our cluster of servers.

If you're using a CNAME record, then IP address changes like that are something you don't need to worry about. We can update the A records for our own hostnames, like webapp-123456.pythonanywhere.com, because we control the DNS settings for pythonanywhere.com and all of its subdomains. Because your website is pointed at us using a CNAME, browsers that want to connect to your site will start using the new IP address without you needing to do anything. They'll ask for the IP address of www.yourdomain.com, they'll get a CNAME response saying that it's the same as the one for webapp-123456.pythonanywhere.com, so they'll look up the IP address for that and will get the correct new address from the A record that comes back.

But we can't update DNS records for your domain -- only you can do that -- so if you use an A record, when the IP address changes you'll have to update it yourself every time. If you happened to be away, or if we don't have up-to-date contact details for you, it might be some time before you knew about the problem and were able to fix it.

So what about the inefficiency? We always make sure that the webapp-XXXXXX.pythonanywhere.com addresses point to an A record rather than another CNAME, but that's still, in theory, two lookups to go from www.yourdomain.com to an IP address -- one to get the CNAME, and one to get the IP address from the hostname stored there. According to Cloudflare, the average ISP has a 70 millisecond round-trip time for queries, so that might mean that your site would load up 70 milliseconds slower if it needs to do two queries rather than one. Not a huge amount of time, but given that research shows that people are less engaged with slower websites, every millisecond counts.

The answer here is twofold -- firstly, your computer will generally cache the results of DNS lookups for some time. So this extra time will only impact some hits to your page. Secondly, many DNS servers are pretty smart -- if someone asks for a hostname, and the result is going to be a CNAME record, they know that the next request is likely to be for the CNAME's value -- so they'll attach the results for that to their response as well. So, for example, a browser might say "what's the address for www.yourdomain.com?" and the DNS server would reply "www.yourdomain.com has a CNAME pointing to webapp-123456.pythonanywhere.com. Oh, and by the way, webapp-123456.pythonanywhere.com has an A record pointing to the IP address 1.2.3.4". The browser can then just use the IP address directly; there's only one lookup, but the CNAME is fully resolved.
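Here's a rough sketch of the caching idea. It's not how any particular operating system or browser implements its cache; the class name DnsCache and the 300-second default are assumptions for the example (real DNS answers carry their own time-to-live value, which a real cache would honour instead).

```python
import time


class DnsCache:
    """Remember lookup results for a limited time."""

    def __init__(self, lookup, ttl_seconds=300, clock=time.monotonic):
        self._lookup = lookup        # function that does the real lookup
        self._ttl = ttl_seconds      # assumed fixed lifetime for every answer
        self._clock = clock          # injectable so the cache is easy to test
        self._entries = {}           # hostname -> (expiry_time, ip_address)

    def resolve(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no DNS query needed
        ip_address = self._lookup(hostname)
        self._entries[hostname] = (now + self._ttl, ip_address)
        return ip_address
```

You'd pass in whatever function does the real lookup; only the first request for a hostname, and the first one after an entry expires, pays the cost of the extra DNS queries.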

So using CNAME records to point your website at PythonAnywhere means less maintenance for you, and in general no real performance overhead.

There is one case where CNAMEs can be problematic, though:

Naked domains

One problem with CNAMEs is that in general, you can't use them for "naked" domains. A naked domain is the domain name that you buy from your registrar, something like yourdomain.com -- without anything like www. in front of it. For relatively arcane technical reasons a naked domain can't use a CNAME, only subdomains like www.yourdomain.com -- and so if you want yourdomain.com without the www. to point to PythonAnywhere, you have to use an A record.

That's why we suggest that if you want your site to be accessible without the www., you should host it at www.yourdomain.com, and then use a redirection service so that people who visit the address without the www. will be transparently redirected to the address with it. Most registrars have this kind of thing built in. You tell them to do the redirection, and they'll set up an A record pointing to one of their servers, and that server will do the redirect when it receives a request -- and because, unlike us, they can update A records on your behalf, they can fix things if it needs to change. If your registrar doesn't support it, there are third-party services you can point to with an A record you set up yourself.

See our help pages for details of how to set up a redirect -- we have links to the appropriate documentation for a bunch of popular domain registrars, and also some links to third-party services if you need to use them.

How to use all of that information

Hopefully by now it should be clear what you need to do when pointing a custom domain to PythonAnywhere; you log into your domain registrar's website, and find the place where you set up your DNS configuration. Once you're there, you set up a CNAME record to point www.yourdomain.com to the webapp-XXXXXX.pythonanywhere.com value shown when you look at your website's configuration on the "Web" page inside PythonAnywhere. If there are any other A or CNAME records for www.yourdomain.com then you should delete them (because having two records for the same hostname will potentially confuse the DNS). But don't delete any other records -- there may be things like MX records, which you'll remember are used for email, or "NS" or "SOA" records, which are low-level DNS stuff that you shouldn't touch without knowing pretty clearly what you're doing.

Different registrars have different interfaces, but with most of them it's reasonably easy to find the bit you need once you know what you're looking for. We also have some links to the appropriate documentation on the sites of popular registrars on our help pages.

One thing that does sometimes confuse people is that many registrars only require you to type in the bit that goes before your domain name when specifying a record -- that is, for www.yourdomain.com you would set up a CNAME with a "name" of www and a "value" of webapp-XXXXXX.pythonanywhere.com. If you created one with a "name" of www.yourdomain.com, then the CNAME would actually point the hostname www.yourdomain.com.yourdomain.com at PythonAnywhere, which would be unhelpful :-)
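As a quick illustration of why that happens, here's a sketch that mimics a registrar form that always appends your domain to whatever you type in the "name" field. Both function names are made up for this example, and not every registrar behaves this way, so treat it as a model of the common case rather than a description of any specific site.

```python
def full_hostname(name, domain):
    """Mimic a registrar form that appends your domain to the "name" field."""
    return f"{name}.{domain}"


def name_field_for(hostname, domain):
    """Work out what to type in the "name" field for a full hostname."""
    suffix = "." + domain
    if hostname.endswith(suffix):
        return hostname[: -len(suffix)]
    return hostname


print(full_hostname("www", "yourdomain.com"))
# www.yourdomain.com
print(full_hostname("www.yourdomain.com", "yourdomain.com"))
# www.yourdomain.com.yourdomain.com
print(name_field_for("www.yourdomain.com", "yourdomain.com"))
# www
```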

Sometimes people are told by their registrar that instead of setting up a CNAME, they need to provide a name server (or even two name servers). Oddly, this sometimes happens even when the registrar in question really does support CNAMEs. Without wanting to call any registrar out in particular, if you are told that, but your registrar appears on the list on our help pages then the customer service person you're talking to is a little confused, and you should check out the documentation that we link to.

But some registrars really do not allow you to set up CNAMEs; they require you to specify name servers. This can happen, for example, with some country-level domains. Explaining what that means is what we'll move on to next.

Domain registrars versus DNS and name servers

Providing the ability to register domains, and providing the DNS stuff for those domains are, technically speaking, two different services. Given that in general it's pretty pointless to own a domain without being able to associate it with an IP address so that you can host a website there, most companies that do domain registration provide both registration and DNS services as a bundle, so you don't need to know about the separation -- but it is there, and if you find yourself having to use a registration-only service, you need to know a bit more about how things work.

The first thing to explain is what a name server is. Remember that earlier we said that when a browser wants to connect to www.yourdomain.com, it asks its local DNS server -- the one provided by the ISP -- what the IP address is. Obviously, the DNS server needs to find the answer somehow. It might have cached the DNS records for the hostname you're looking for due to some previous request, but if this is the first time it's heard about this hostname, it will have to pass the query on to a different server. This is the "authoritative nameserver" for the domain; different domains will have different nameservers set up for them. In a normal setup, when you buy yourdomain.com, there will have to be one or more computers somewhere on the Internet that, when asked for the details of www.yourdomain.com, can give the official answer -- a CNAME, an A record, or whatever. So if the DNS server belonging to your ISP doesn't know the IP address for www.yourdomain.com, it needs to ask the nameserver that is authoritative for yourdomain.com for the answer.

You might wonder how your ISP's DNS server knows how to find out what those nameservers are -- how do they know which nameserver is authoritative for yourdomain.com? The answer is that they ask the nameservers that are authoritative for "one level up" -- that is, if they want to know which nameservers are authoritative for yourdomain.com, they'll ask the nameservers that are responsible for .com to tell them. Likewise, if asked "which nameservers are authoritative for yourdomain.co.uk?", they would ask the nameservers that are authoritative for .co.uk, which might first require them to ask who is authoritative for .uk in order to make that query. (You might in turn wonder how the whole thing gets bootstrapped -- how the DNS server works out the nameservers that are authoritative for these top-level domains like .com and .uk. The answer is that there are some special servers that handle that -- that is, servers that are right at the top of the tree, and are authoritative for the . "root" domain we mentioned a while back.)
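That "ask one level up" process can be sketched like this. It's a toy model: the delegation data is a hard-coded dictionary, every server name in it is a placeholder we've invented, and a real DNS server does this by sending queries over the network rather than looking things up in memory.

```python
# Made-up delegation data: each domain maps to the nameservers that are
# authoritative for it. All of the server names here are placeholders.
DELEGATIONS = {
    ".": ["root-server.example"],
    "com.": ["com-server.example"],
    "yourdomain.com.": ["ns1.registrar.example", "ns2.registrar.example"],
}


def find_authoritative(domain, delegations=DELEGATIONS):
    """Walk down from the root, asking each level who handles the next."""
    labels = domain.rstrip(".").split(".")
    current = "."
    servers = delegations[current]
    path = [(current, servers)]
    for i in range(len(labels) - 1, -1, -1):
        child = ".".join(labels[i:]) + "."
        if child not in delegations:
            break  # no further delegation: the current servers answer for it
        current, servers = child, delegations[child]
        path.append((current, servers))
    return servers, path


servers, path = find_authoritative("www.yourdomain.com")
print(servers)
# ['ns1.registrar.example', 'ns2.registrar.example']
print([name for name, _ in path])
# ['.', 'com.', 'yourdomain.com.']
```

The walk stops at yourdomain.com. because nothing further down has been delegated, so the nameservers for yourdomain.com are the ones that answer questions about www.yourdomain.com.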

So -- when you register a domain name with a normal "bundled" registrar like GoDaddy, they will do three things:

  • Register you as the owner of the domain with the organisation who is responsible for keeping track of such things (which will be a different one for each top-level domain -- Verisign for .com, Nominet UK for .uk, and so on)
  • Set up one or more of their own nameservers so that they can give authoritative answers to questions about your domain.
  • Tell the authoritative nameservers one level up from your domain (that is, the .com ones for yourdomain.com) that their -- that is, the registrar's -- nameservers are authoritative for your domain.

Once that's done, you're all set -- you can use their web interface to update your DNS settings, and when you do that they'll pass the changes on to the appropriate nameservers.

However, if you are using a registrar which doesn't do DNS for you, they won't do the last two steps. They'll just register the domain. They will, however, have a way to tell the one-level-up nameservers which nameservers are authoritative for your domain -- but you'll need to provide them with the names of some servers in order for them to do that.

Setting up your own nameserver is really tricky, but luckily you don't need to do that. There are free services out there that can handle all of it for you, so you just need to sign up with one of them. They'll provide you with the addresses of some nameservers that you can then pass on to your registrar, and a web-based interface so that you can set up all of your DNS records like the CNAME and so on. One that consistently gets good reports from our users is the FreeDNS service from Namecheap.

Conclusion

Hopefully that was all reasonably clear. You now know the basics of how DNS works, and how it interacts with domain registration. If you've got any questions, please leave them in the comments below -- and additionally, if you think there are other things that you'd like us to add to this post (or a future one), please do let us know!


Slow scheduled tasks after yesterday's system update

After our system update yesterday, there was a period when some people's scheduled tasks were running slowly. This is an update on what caused the issue and what we did to fix it.

The slowdown

Different code that you run on PythonAnywhere runs on different specific servers; this is because different workloads have different CPU/memory requirements, and the most effective way to set things up from our side is to group like with like -- so, for example, websites run on web servers, consoles on console servers, and scheduled tasks on task servers.

At around 10am UTC, three hours after the system update, we noticed that the load average on one of the task servers was extremely high -- around 100. This meant that 100 processes were in the run queue -- that is, they wanted some processor time to do calculations, and were waiting for it. This was clearly an indication of a problem; the run queue should normally be pretty much empty -- or in the worst case, have a number of processes approximately equal to the number of cores on the machine.
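If you want to run the same kind of check on a machine of your own, something like the sketch below would do it. It isn't our monitoring code; the function names are made up, the four-core machine in the example is an assumption, and os.getloadavg is only available on Unix-like systems.

```python
import os


def run_queue_is_overloaded(load_average, cpu_cores):
    """True if more processes want CPU time than there are cores to run them."""
    return load_average > cpu_cores


def check_this_machine():
    """Compare the 1-minute load average with the core count (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    cores = os.cpu_count() or 1
    return one_minute, cores, run_queue_is_overloaded(one_minute, cores)


# A load average of 100 on a hypothetical four-core machine:
print(run_queue_is_overloaded(100.0, 4))  # True
print(run_queue_is_overloaded(0.3, 4))    # False
```

One thing worth knowing is that on Linux the load average also counts processes stuck waiting on IO, not just those waiting for a CPU, so a high number doesn't always mean the processor is the bottleneck.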

We logged in to the affected machine, and indeed there were a lot of processes running. But the CPU itself didn't seem particularly busy. Investigating, we discovered that the connection from that task server to one of the file servers was slow. It looked like perhaps all of the processes were queueing up blocked on IO.

A file server is exactly what it sounds like -- just a server where people's files are stored. Each account has all of its files on one file server; the file server makes the files available to the servers where code is actually run (as well as managing things like disk mirroring, backups, and so on). So a slow connection from a task server to a specific file server meant slow processes for the users on that task server whose files were on that file server.

While we were investigating this, early indications of similar problems started cropping up on other task servers. In each case it was a different file server -- for example, task server 4 was having problems talking to file server 3, and task 6 to file 2.

The investigation

Initially this looked like it might be due to connectivity problems between the servers in question. That's something that is managed for us by Amazon AWS, so we got in touch with them. After much discussion, network issues were eliminated as a cause, and we had to go back to first principles.

The main question was, what had changed on these servers as a result of the system update? The changes we had made didn't really have much to do with the scheduled task system. However, one change had been made that was common to all servers where our users' code is executed: an improvement to our tarpit system.

Each user on PythonAnywhere gets a quota of CPU-seconds that they can use over each 24-hour period. This is the number of seconds for which they can use 100% of a CPU (rather than actual 'wall clock' time), and different users have different amounts, free users (of course) having the least. When a user runs out of their quota for the day, they are "put into the tarpit". This means that instead of getting guaranteed CPU time, they get a slice of whatever is left over after other (non-tarpitted) people have had their share.
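The difference between CPU-seconds and wall-clock time is easy to see from Python. This is just a sketch using the standard library's timers to measure a single process; it's not how our quota accounting works, and the function names are made up for the example.

```python
import time


def measure(func):
    """Run func and return (cpu_seconds, wall_clock_seconds) for this process."""
    cpu_start = time.process_time()    # CPU time used by this process
    wall_start = time.perf_counter()   # ordinary elapsed time
    func()
    return (
        time.process_time() - cpu_start,
        time.perf_counter() - wall_start,
    )


def busy_loop():
    """Do some pointless arithmetic to keep the CPU occupied."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


# Sleeping takes wall-clock time but uses almost no CPU...
print(measure(lambda: time.sleep(0.2)))
# ...while a busy loop uses CPU for nearly all of the time it runs.
print(measure(busy_loop))
```

The exact numbers will vary from machine to machine, but a script that spends most of its time sleeping or waiting for the network uses up very little of a CPU-seconds quota.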

We had recently found certain cases where people who were in the tarpit could use up quite a lot of CPU. This wasn't generally a case of people trying to "cheat" -- more that in certain cases, errors that they made in their code could lead to them wasting CPU to no benefit to themselves. We had made a change that fixed that, using the CFS ("completely fair scheduler") quota cgroup feature that is part of the Linux kernel.

While there was no obvious way we could see that this could impact file system access, especially at a system-wide level (as opposed to just for the people who were in the tarpit), it was the only change we could see from the previous system. And on the impacted task servers, there were users who (a) were in the tarpit, (b) had their files on the file servers with which those specific task servers were having slow connectivity, and (c) had lots of processes running.

So, experimentally, we disabled the new tarpit behaviour on one of the misbehaving servers, and watched. Load average on that server immediately started dropping -- and a test of the connection to the fileserver had it going back to normal levels of latency. We had our culprit!

The next step, of course, was to disable the new tarpit behaviour across all task servers; both the other ones that were showing the problem, and the unaffected ones. That was done by about 5pm, and all has been well since then. Load averages across the system are normal.

Conclusion

We're still trying to work out exactly what interaction between the CFS-based tarpit and the file server system caused the problem. One -- somewhat vague -- hypothesis is that when someone went into the tarpit, but was trying to do lots of file IO, some kind of internal buffers between the file servers and the user's code filled up. The user's code was running slowly, and pulling stuff off the buffers slowly, which meant that the buffers were full -- but perhaps the buffers were shared between everyone on the machine rather than being per-user. This doesn't sound quite right, though, and more digging is definitely needed.

We'll keep investigating, though, and for now, the new tarpit system will remain disabled on task servers.


Today's upgrade: Let's Encrypt auto-renew and much much more!

This morning's system update went pretty smoothly, and we have some cool new stuff to announce:

Let's Encrypt certificates with automatic renewal

You can now get an HTTPS certificate for your custom domain using Let's Encrypt without all that tedious mucking around with dehydrated -- and you don't need to remember to renew the certificate either, or even set up a scheduled task to renew your certificate for you.

Just go to the new "HTTPS certificate" line in the "Security" section of the "Web" tab. You'll see a pencil icon next to the kind of certificate you have (which will probably be "None" or "Custom"). Click the pencil, and you'll see that there's an option called "Auto-renewed Let's Encrypt certificate". If you select that and click "Save", we'll get a fresh certificate for your site from Let's Encrypt -- and well before it expires, our system will automatically renew it for you.

If you have a certificate that you've bought from some other organisation like Comodo or GoDaddy, you can also configure it from here -- select the "Custom certificate" option, and you'll get input fields where you can copy and paste the private key and the combined certificate.

We have detailed instructions with screenshots on the help site.

The old ways of setting up a certificate still work -- you can use dehydrated, or get a certificate from a third party like GoDaddy, and upload everything using the command-line scripts.

MySQL 5.7

New accounts created from today will use MySQL 5.7. If you're still on 5.6 and would like your databases moved over to a 5.7-compatible server, get in touch at support@pythonanywhere.com -- the move won't happen until early next year, though.

Fixes for Firefox and Selenium from website code

Several people reported a problem where you could not run Selenium from website code from a Hacker account -- you needed a Web Dev account or better. This was a bug, not a feature, so we fixed it :-)

CPU sharing enhancements

There was a problem where people who had used all of their CPU allowance could continue to use lots of server resources; this isn't something that would have been useful for anyone or that we think anyone was doing deliberately -- it would only happen if they started processes which did nothing and then restarted. So it just meant that if one person had a certain kind of bug in their code and then went into the tarpit, they'd use up CPU that could have been put to better use by people who were actually trying to run working code :-) We've put a fix in place to stop that from happening.

And that's it!

Of course, there were the normal minor tweaks and bugfixes, but those are the highlights. A very happy holiday season to everyone, and we look forward to being able to show you some cool new stuff in the new year!


Always-on tasks

Always-on tasks are a new feature we rolled out in our last system update. Essentially, they're a way you can specify a program and tell us that you want us to keep it running all the time. If it exits for any reason, we'll automatically restart it -- and even in extreme circumstances, for instance if the server that it's running on has a hardware failure, it will fail over to a working machine quickly.

We already have that kind of availability for websites, of course -- always-on tasks are a way of providing the same kind of uptime for non-website scripts, so they're the right solution if you want a non-website program that runs 24/7 -- for example, a chat bot on Twitter or Discord, or something that streams data from an external source. All paid accounts get one always-on task by default, and you can customize your account to add more if you need them.

If you have a paid account and would like to try them out, we have detailed documentation here.

We added them because a lot of people want to run something all the time, and would try doing that in a console -- this works, and we keep consoles running for as long as we can, but they do need to be rebooted from time to time for system maintenance, and when that happens, your programs stop running.
Historically we'd advised people to set up a scheduled task to run their script, with some locking code to make sure that only one copy was running at a time -- but this was not ideal, as if the program crashed, it could be some time before it was restarted.
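The "locking code" mentioned above can take several forms. Here is a minimal sketch of one of them, assuming a Unix-like system where Python's standard fcntl module is available. The helper name try_acquire_lock and the lock file location are hypothetical, chosen just for illustration:

```python
import fcntl
import os
import tempfile


def try_acquire_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    lock_file = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file


# Hypothetical location for the lock file.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "my_task_example.lock")

lock = try_acquire_lock(LOCK_PATH)
if lock is None:
    print("Another copy is already running; exiting.")
else:
    print("Got the lock; the real work would go here.")
    # The lock is released when the file is closed or the process exits.
    lock.close()
```

A scheduled task that starts with a check like this simply exits if an earlier copy is still running, which is why a crashed program would stay down until the next scheduled run.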

The one thing you can't do with always-on tasks right now is use them to run a server; we have plans to address that in the future, but we don't have any timelines yet. Do let us know if that's something you'd be interested in -- say, running Celery or even an async website in a task. The more people that ask for it, the higher up our priority list it goes :-)


Today's Upgrades: Always-On Tasks

Always-On Tasks

We are officially live with our always-on tasks! All paying customers will get one always-on task, and you can add more by customizing your plan on our accounts page. Our infrastructure will try to keep your script running all the time (i.e. we will restart it if it hits an error and stops). We'd love to know what you think -- just drop us a line using the "Feedback" link, or email us at support@pythonanywhere.com!

Logging Improvements

We have also improved user experience working with log files. You can now access our API to delete log files (or wipe the file if your particular log file is currently in use), and we have better formatting in place when logging certain web app errors.

Other Stuff

We also optimized the editor that you can access from the 'Files' tab to make consoles within it start faster and to avoid scripts rerunning unintentionally.


Auto-renewing your Let's Encrypt certificate with scheduled tasks

This blog post is out-of-date -- we can now manage all of your Let's Encrypt certificates automatically. See this help page for details.

Let's Encrypt certificates are really useful for custom domains -- you can get HTTPS working on your site for free. Their one downside is that the certificate only lasts for 90 days, so you need to remember to renew it.

The good news is that you can set up a scheduled task to do that all for you -- no need to put anything in your calendar. Once you've done the initial Let's Encrypt setup to get the original certificate installed, and you've confirmed that it's all working, go to the "Tasks" tab, and set up a daily scheduled task (not an always-on task) with this command:

cd ~/letsencrypt && ~/dehydrated/dehydrated --cron --domain www.yourdomain.com --out . --challenge http-01 && pa_install_webapp_letsencrypt_ssl.py www.yourdomain.com

Don't forget to replace both instances of www.yourdomain.com with your actual website's hostname.

Most days, this will fail with a message like this from the dehydrated script:

Valid till Nov 12 15:23:59 2018 GMT (Longer than 30 days). Skipping renew!

Followed by a message from the pa_install_webapp_letsencrypt_ssl.py script saying something like this:

POST to set SSL details via API failed, got <Response [400]>:{"cert":["Certificate has not changed."]}

...but this is harmless. When your certificate really does have just 30 days to go, it will succeed and your certificate will be renewed, and the new one installed.
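The "30 days" rule in those messages is simple date arithmetic. As a rough illustration (this is not dehydrated's own code, and the function name and the threshold_days parameter are made up for the example), the check could be sketched in Python like this, using the date format from the message above:

```python
from datetime import datetime, timedelta, timezone


def needs_renewal(valid_till, now, threshold_days=30):
    """Return True if the certificate expires within threshold_days of now."""
    expires = datetime.strptime(valid_till, "%b %d %H:%M:%S %Y GMT")
    expires = expires.replace(tzinfo=timezone.utc)
    return expires - now <= timedelta(days=threshold_days)


valid_till = "Nov 12 15:23:59 2018 GMT"
early_september = datetime(2018, 9, 1, tzinfo=timezone.utc)
late_october = datetime(2018, 10, 20, tzinfo=timezone.utc)

print(needs_renewal(valid_till, early_september))  # more than 30 days left
print(needs_renewal(valid_till, late_october))     # fewer than 30 days left
```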


Turning a Python script into a website

One question we often hear from people starting out with PythonAnywhere is "how do I turn this script I've written into a website so that other people can run it?"

That's actually a bigger topic than you might imagine, and a complete answer would wind up having to explain almost everything about web development. So we won't do all of that in this blog post :-) The good news is that simple scripts can often be turned into simple websites pretty easily, and in this blog post we'll work through a couple of examples.

Let's get started!

The simplest case: a script that takes some inputs and returns an output

Let's say you have this Python 3.x script:

number1 = float(input("Enter the first number: "))
number2 = float(input("Enter the second number: "))
solution = number1 + number2
print("The sum of your numbers is {}".format(solution))

Obviously that's a super-simple example, but a lot of more complicated scripts follow the same kind of form. For example, a script for a financial analyst might have these equivalent steps:

  • Get data about a particular stock from the user.
  • Run some kind of complicated analysis on the data.
  • Print out a result saying how good the algorithm thinks the stock is as an investment.

The point is, we have three phases: input, processing and output.

(Some scripts have more phases -- they gather some data, do some processing, gather some more data, do more processing, and so on, and eventually print out a result. We'll come on to those later on.)

Let's work through how we would change our three-phase input-process-output script into a website.

Step 1: extract the processing into a function

In a website's code, we don't have access to the Python input or print functions, so the input and output phases will be different -- but the processing phase will be the same as it was in the original script. So the first step is to extract our processing code into a function so that it can be re-used. For our example, that leaves us with something like this:

def do_calculation(number1, number2):
    return number1 + number2

number1 = float(input("Enter the first number: "))
number2 = float(input("Enter the second number: "))
solution = do_calculation(number1, number2)
print("The sum of your numbers is {}".format(solution))

Simple enough. In real-world cases like the stock-analysis example there would, of course, be more inputs, and the do_calculation function would be considerably more complicated, but the theory is the same.

Step 2: create a website

Firstly, create a PythonAnywhere account if you haven't already. A free "Beginner" account is enough for this tutorial.

Once you've signed up, you'll be taken to the dashboard, with a tour window. It's worth going through the tour so that you can learn how the site works -- it'll only take a minute or so.

At the end of the tour you'll be presented with some options to "learn more". You can just click "End tour" here, because this tutorial will tell you all you need to know.

Now you're presented with the PythonAnywhere dashboard. I recommend you check your email and confirm your email address -- otherwise if you forget your password later, you won't be able to reset it.

Now you need to create a website, which requires a web framework. The easiest web framework to get started with when creating this kind of thing is Flask; it's very simple and doesn't have a lot of the built-in stuff that other web frameworks have, but for our purposes that's a good thing.

To create your site, go to the "Web" page using the tab near the top right:

Click on the "Add a new web app" button to the left. This will pop up a "Wizard" which allows you to configure your site. If you have a free account, it will look like this:

If you decided to go for a paid account (thanks :-), then it will be a bit different:

What we're doing on this page is specifying the host name in the URL that people will enter to see your website. Free accounts can have one website, and it must be at yourusername.pythonanywhere.com. Paid accounts have the option of using their own custom host names in their URLs.

For now, we'll stick to the free option. If you have a free account, just click the "Next" button, and if you have a paid one, click the checkbox next to the yourusername.pythonanywhere.com, then click "Next". This will take you on to the next page in the wizard.

This page is where we select the web framework we want to use. We're using Flask, so click that one to go on to the next page.

PythonAnywhere has various versions of Python installed, and each version has its associated version of Flask. You can use different Flask versions to the ones we supply by default, but it's a little more tricky (you need to use a thing called a virtualenv), so for this tutorial we'll create a site using Python 3.6, with the default Flask version. Click the option, and you'll be taken to the next page:

This page is asking you where you want to put your code. Code on PythonAnywhere is stored in your home directory, /home/yourusername, and in its subdirectories. Flask is a particularly lightweight framework, and you can write a simple Flask app in a single file. PythonAnywhere is asking you where it should create a directory and put a single file with a really really simple website. The default should be fine; it will create a subdirectory of your home directory called mysite and then will put the Flask code into a file called flask_app.py inside that directory.

(It will overwrite any other file with the same name, so if you're not using a new PythonAnywhere account, make sure that the file that it's got in the "Path" input box isn't one of your existing files.)

Once you're sure you're OK with the filename, click "Next". There will be a brief pause while PythonAnywhere sets up the website, and then you'll be taken to the configuration page for the site:

You can see that the host name for the site is on the left-hand side, along with the "Add a new web app" button. If you had multiple websites in your PythonAnywhere account, they would appear there too. But the one that's currently selected is the one you just created, and if you scroll down a bit you can see all of its settings. We'll ignore most of these for the moment, but one that is worth noting is the "Best before date" section.

If you have a paid account, you won't see that -- it only applies to free accounts. But if you have a free account, you'll see something saying that your site will be disabled on a date in three months' time. Don't worry! You can keep a free site up and running on PythonAnywhere for as long as you want, without having to pay us a penny. But we do ask you to log in every now and then and click the "Run until 3 months from today" button, just so that we know you're still interested in keeping it running.

Before we do any coding, let's check out the site that PythonAnywhere has generated for us by default. Right-click the host name, just after the words "Configuration for", and select the "Open in new tab" option; this will (of course) open your site in a new tab, which is useful when you're developing -- you can keep the site open in one tab and the code and other stuff in another, so it's easier to check out the effects of the changes you make.

Here's what it should look like.

OK, it's pretty simple, but it's a start. Let's take a look at the code! Go back to the tab showing the website configuration (keeping the one showing your site open), and click on the "Go to directory" link next to the "Source code" bit in the "Code" section:

You'll be taken to a different page, showing the contents of the subdirectory of your home directory where your website's code lives:

Click on the flask_app.py file, and you'll see the (really really simple) code that defines your Flask app. It looks like this:

It's worth working through this line-by-line:

from flask import Flask

As you'd expect, this loads the Flask framework so that you can use it.

app = Flask(__name__)

This creates a Flask application to run your code.

@app.route('/')

This decorator specifies that the following function defines what happens when someone goes to the location "/" on your site -- eg. if they go to http://yourusername.pythonanywhere.com/. If you wanted to define what happens when they go to http://yourusername.pythonanywhere.com/foo then you'd use @app.route('/foo') instead.

def hello_world():
    return 'Hello from Flask!'

This simple function just says that when someone goes to the location, they get back the (unformatted) text "Hello from Flask!".
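If decorators like @app.route are new to you, it can help to see a toy version of the idea. The sketch below is not how Flask is actually implemented. TinyApp and its dispatch method are invented for this illustration, and all they do is record which function belongs to which location and call it on request:

```python
class TinyApp:
    """A toy stand-in for a web framework's routing table."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        # Return a decorator that registers the function under the given path.
        def decorator(view_function):
            self.routes[path] = view_function
            return view_function
        return decorator

    def dispatch(self, path):
        # Look up the function registered for this path and call it.
        return self.routes[path]()


app = TinyApp()


@app.route('/')
def hello_world():
    return 'Hello from Flask!'


@app.route('/foo')
def foo_page():
    return 'This is the foo page'


print(app.dispatch('/'))
print(app.dispatch('/foo'))
```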

Try changing it -- for example, to "This is my new shiny Flask app". Once you've made the change, click the "Save" button at the top to save the file to PythonAnywhere:

...then the reload button (to the far right, looking like two curved arrows making a circle), which stops your website and then starts it again with the fresh code.

A "spinner" will appear next to the button to tell you that PythonAnywhere is working. Once it has disappeared, go to the tab showing the website again, hit the page refresh button, and you'll see that it has changed as you'd expect.

Step 3: make the processing code available to the web app

Now, we want our Flask app to be able to run our code. We've already extracted it into a function of its own. It's generally a good idea to keep the web app code -- the basic stuff to display pages -- separate from the more complicated processing code (after all, if we were doing the stock analysis example rather than this simple add-two-numbers script, the processing could be thousands of lines long).

So, we'll create a new file for our processing code. Go back to the browser tab that's showing your editor page; up at the top, you'll see "breadcrumb" links showing you where the file is stored. They'll be a series of directory names separated by "/" characters, each one apart from the last being a link. The last one, just before the name of the file containing your Flask code, will probably be mysite. Right-click on that, and open it in a new browser tab -- the new tab will show the directory listing you had before:

In the input near the top right, where it says "Enter new file name, eg. hello.py", enter the name of the file that will contain the processing code. Let's (uninventively) call it processing.py. Click the "New file" button, and you'll have another editor window open, showing an empty file. Copy/paste your processing function into there; that means that the file should simply contain this code:

def do_calculation(number1, number2):
    return number1 + number2

Save that file, then go back to the tab you kept open that contains the Flask code. At the top, add a new line just after the line that imports Flask, to import your processing code:

from processing import do_calculation

While we're at it, let's also add a line to make debugging easier if you have a typo or other error in the code; just after the line that says

app = Flask(__name__)

...add this:

app.config["DEBUG"] = True

Save the file; you'll see that you get a warning icon next to the new import line. If you move your mouse pointer over the icon, you'll see the details:

It says that the function was imported but is not being used, which is completely true! That moves us on to the next step.

Step 4: Accepting input

What we want our site to do is display a page that allows the user to enter two numbers. To do that, we'll change the existing function that is run to display the page. Right now we have this:

@app.route('/')
def hello_world():
    return 'This is my new shiny Flask app'

We want to display more than text, we want to display some HTML. Now, the best way to do HTML in Flask apps is to use templates (which allow you to keep the Python code that Flask needs in separate files from the HTML), but we have other tutorials that go into the details of that. In this case we'll just put the HTML right there inside our Flask code -- and while we're at it, we'll rename the function:

@app.route('/')
def adder_page():
    return '''
        <html>
            <body>
                <p>Enter your numbers:
                <form>
                    <p><input name="number1" /></p>
                    <p><input name="number2" /></p>
                    <p><input type="submit" value="Do calculation" /></p>
                </form>
            </body>
        </html>
    '''

We won't go into the details of how HTML works here; there are lots of excellent tutorials online, and one that suits the way you learn is just a Google search away. For now, all we need to know is that where we were previously returning a single-line string, we're now returning a multi-line one (that's what the three quotes in a row mean, in case you're not familiar with them -- one string split over multiple lines). The multi-line string contains HTML code, which just displays a page that asks the user to enter two numbers, and a button that says "Do calculation". Click on the editor's "reload website" button:

...and then check out your website again in the tab that you (hopefully) kept open, and you'll see something like this:

However, as we haven't done anything to wire up the input to the processing, clicking the "Do calculation" button won't do anything but reload the page.

Step 5: validating input

We could at this stage go straight to adding on the code to do the calculations, and I was originally planning to do that here. But after thinking about it, I realised that doing that would basically be teaching you to shoot yourself in the foot... When you put a website up on the Internet, you have to allow for the fact that the people using it will make mistakes. If you create a site that allows people to enter numbers and add them, sooner or later someone will type in "wombat" for one of the numbers, or something like that, and it would be embarrassing if your site responded with an internal server error.

So let's add on some basic validation -- that is, some code that makes sure that people aren't providing us with random marsupials instead of numbers.

A good website will, when you enter an invalid input, display the page again with an error message in it. A bad website will display a page saying "Invalid input, please click the back button and try again". Let's write a good website.

The first step is to change our HTML so that the person viewing the page can click the "Do calculation" button and get a response. Just change the line that says

                <form>

So that it says this:

                <form method="post" action=".">

What that means is that previously we had a form, but now we have a form that has an "action" telling it that when the button that has the type "submit" is clicked, it should request the same page as it is already on, but this time it should use the "post" method.

(HTTP methods are extra bits of information that are tacked on to requests that are made by a browser to a server. The "get" method, as you might expect, means "I just want to get a page". The "post" method means "I want to provide the server with some information to store or process". There are vast reams of details that I'm skipping over here, but that's the most important stuff for now.)
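To make that a little more concrete: when a form is submitted with the "post" method, the browser sends the field names and values in the body of the request, by default in the same name=value format used in URL query strings. The following sketch uses only Python's standard library to show roughly what that data looks like for our form. The values 23 and 19 are just sample inputs:

```python
from urllib.parse import parse_qs, urlencode

# What the user typed into the two fields of the form.
fields = {"number1": "23", "number2": "19"}

# Roughly what the browser puts in the body of the "post" request.
body = urlencode(fields)
print(body)

# Roughly what the server does to get the values back out.
decoded = parse_qs(body)
print(decoded["number1"][0], decoded["number2"][0])
```

Note that the values arrive as text, which is why the server-side code has to convert them to numbers itself.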

So now we have a way for data to be sent back to the server. Reload the site using the button in the editor, and refresh the page in the tab where you're viewing your site. Try entering some numbers, and click the "Do calculation" button, and you'll get... an incomprehensible error message:

Well, perhaps not entirely incomprehensible. It says "method not allowed". Previously we were using the "get" method to get our page, but we just told the form that it should use the "post" method when the data was submitted. So Flask is telling us that it's not going to allow that page to be requested with the "post" method.

By default, Flask view functions only accept requests using the "get" method. It's easy to change that. Back in the code file, where we have this line:

@app.route('/')

...replace it with this:

@app.route("/", methods=["GET", "POST"])

Save the file, hit the reload button in the editor, then go to the tab showing your page; click back to get away from the error page if it's still showing, then enter some numbers and click the "Do calculation" button again.

You'll be taken back to the page with no error. Success! Kind of.

Now let's add the validation code. The numbers that were entered will be made available to us in our Flask code via the form attribute of a global variable called request. So we can add validation logic by using that. The first step is to make the request variable available by importing it; change the line that says

from flask import Flask

to say

from flask import Flask, request

Now, add this code to the view function, before the return statement:

    errors = ""
    if request.method == "POST":
        number1 = None
        number2 = None
        try:
            number1 = float(request.form["number1"])
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number1"])
        try:
            number2 = float(request.form["number2"])
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number2"])

Basically, we're saying that if the method is "post", we do the validation.
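As an aside, the two try/except blocks are nearly identical, and they put whatever the user typed straight into the page's HTML. One possible variation, shown here as a sketch rather than as part of the tutorial's code, is a small helper (the name parse_number is made up) that catches only ValueError, which is what float raises for text like "wombat", and escapes the input with the standard library's html.escape before including it in the message:

```python
import html


def parse_number(raw_value):
    """Return (number, error_html); the error is empty if the number is valid."""
    try:
        return float(raw_value), ""
    except ValueError:
        # Escape the input so that any HTML the user typed is shown as text.
        safe = html.escape(repr(raw_value))
        return None, "<p>{} is not a number.</p>\n".format(safe)


for raw in ["23", "wombat", "<b>koala</b>"]:
    number, error = parse_number(raw)
    print(repr(raw), "->", number, error.strip())
```

The rest of this tutorial sticks with the simpler inline version.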

Finally, add some code to put those errors into the page's HTML; replace the bit that returns the multi-line string with this:

    return '''
        <html>
            <body>
                {errors}
                <p>Enter your numbers:
                <form method="post" action=".">
                    <p><input name="number1" /></p>
                    <p><input name="number2" /></p>
                    <p><input type="submit" value="Do calculation" /></p>
                </form>
            </body>
        </html>
    '''.format(errors=errors)

This is exactly the same page as before; we're just interpolating the string that contains any errors into it just above the "Enter your numbers" header.

Save the file; you'll see more warnings for the lines where we define variables called number1 and number2, because we're not using those variables. We know we're going to fix that, so they can be ignored for now.

Reload the site, and head over to the page where we're viewing it, and try to add a koala to a wallaby -- you'll get an appropriate error:

Try adding 23 to 19, however, and you won't get 42 -- you'll just get the same input form again. So now, the final step that brings it all together.

Step 6: doing the calculation!

We're all set up to do the calculation. What we want to do is:

  • If the request used a "get" method, just display the input form
  • If the request used a "post" method, but one or both of the numbers are not valid, then display the input form with error messages.
  • If the request used a "post" method, and both numbers are valid, then display the result.

We can do that by adding something inside the if request.method == "POST": block, just after we've checked that number2 is valid:

        if number1 is not None and number2 is not None:
            result = do_calculation(number1, number2)
            return '''
                <html>
                    <body>
                        <p>The result is {result}</p>
                        <p><a href="/">Click here to calculate again</a>
                    </body>
                </html>
            '''.format(result=result)

Adding that code should clear out all of the warnings in the editor page, and if you reload your site and then try using it again, it should all work fine!
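The three cases listed above can also be written as a plain function with no Flask involved, which makes the logic easy to check on its own. This is only a sketch: handle_request and the tuples it returns are invented for the illustration, and a plain dictionary stands in for request.form:

```python
def do_calculation(number1, number2):
    return number1 + number2


def handle_request(method, form):
    """Return ("form", errors) or ("result", value) for a request."""
    errors = []
    if method == "POST":
        numbers = []
        for field in ("number1", "number2"):
            try:
                numbers.append(float(form[field]))
            except ValueError:
                errors.append("{!r} is not a number.".format(form[field]))
        if not errors:
            # Both numbers were valid, so show the result.
            return "result", do_calculation(*numbers)
    # A "get" request, or a "post" with at least one invalid number.
    return "form", errors


print(handle_request("GET", {}))
print(handle_request("POST", {"number1": "koala", "number2": "19"}))
print(handle_request("POST", {"number1": "23", "number2": "19"}))
```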

Pause for breath...

So if all has gone well, you've now converted a simple script that could add two numbers into a simple website that lets other people add numbers. If you're getting error messages, it's well worth trying to debug them yourself to find out where any typos came in. An excellent resource is the website's error log; there's a link on the "Web" page:

...and the most recent error will be at the bottom:

That error message is telling me that I mistyped "flask" as "falsk", and the traceback tells me exactly which line the typo is on.

However, if you get completely stuck, here's the code you should currently have:

from flask import Flask, request

from processing import do_calculation

app = Flask(__name__)
app.config["DEBUG"] = True

@app.route("/", methods=["GET", "POST"])
def adder_page():
    errors = ""
    if request.method == "POST":
        number1 = None
        number2 = None
        try:
            number1 = float(request.form["number1"])
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number1"])
        try:
            number2 = float(request.form["number2"])
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number2"])
        if number1 is not None and number2 is not None:
            result = do_calculation(number1, number2)
            return '''
                <html>
                    <body>
                        <p>The result is {result}</p>
                        <p><a href="/">Click here to calculate again</a>
                    </body>
                </html>
            '''.format(result=result)

    return '''
        <html>
            <body>
                {errors}
                <p>Enter your numbers:
                <form method="post" action=".">
                    <p><input name="number1" /></p>
                    <p><input name="number2" /></p>
                    <p><input type="submit" value="Do calculation" /></p>
                </form>
            </body>
        </html>
    '''.format(errors=errors)

The next step -- multi-phase scripts

So now that we've managed to turn a script that had the simple three-phase input-process-output structure into a website, how about handling the more complicated case where you have more phases? A common case is where you have an indefinite number of inputs, and the output depends on all of them. For example, here's a simple script that will allow you to enter a list of numbers, one after another, and then will display the statistical mode (the most common number) in the list, with an appropriate error message if there is no most common number (for example in the list [1, 2, 3, 4]).

import statistics

def calculate_mode(number_list):
    try:
        return "The mode of the numbers is {}".format(statistics.mode(number_list))
    except statistics.StatisticsError as exc:
        return "Error calculating mode: {}".format(exc)


inputs = []
while True:
    if len(inputs) != 0:
        print("Numbers so far:")
        for input_value in inputs:
            print(input_value)
    value = input("Enter a number, or just hit return to calculate: ")
    if value == "":
        break
    try:
        inputs.append(float(value))
    except:
        print("{} is not a number".format(value))

print(calculate_mode(inputs))
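One thing to be aware of if you run this script on a newer Python than the 3.6 used in this tutorial: from Python 3.8 onwards, statistics.mode no longer raises StatisticsError when several values are tied for most common; it returns the first of them instead. If you want the "no most common number" message regardless of Python version, one option is to count the values yourself. The following is a sketch of that idea using collections.Counter. The function name calculate_unique_mode and the wording of its error messages are made up for the example:

```python
from collections import Counter


def calculate_unique_mode(number_list):
    if not number_list:
        return "Error calculating mode: no numbers were entered"
    counts = Counter(number_list)
    highest = max(counts.values())
    # Collect every value that appears the maximum number of times.
    modes = [number for number, count in counts.items() if count == highest]
    if len(modes) > 1:
        return "Error calculating mode: no unique most common number"
    return "The mode of the numbers is {}".format(modes[0])


print(calculate_unique_mode([1.0, 2.0, 2.0, 3.0]))
print(calculate_unique_mode([1.0, 2.0, 3.0, 4.0]))
```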

How can we turn that into a website? We could display, say, 100 input fields and let the user leave the ones they don't want blank, but (a) that would look hideous, and (b) it would leave people who wanted to get the mode of 150 numbers stuck.

(Let's put aside for the moment the fact that entering lots of numbers into a website would be deathly dull -- there's a solution coming for that :-)

What we need is a page that can accumulate numbers; the user enters the first, then clicks a button to send it to the server, which puts it in a list somewhere. Then they enter the next, and the server adds that one to the list. Then the next, and so on, until they're finished, at which point they click a button to get the result.

Here's a naive implementation. By "naive", I mean that it sort of works in some cases, but doesn't in general; it's the kind of thing that one might write, only to discover that when other people start using it, it breaks in really weird and confusing ways. It's worth going through, though, because the way in which it is wrong is instructive.

Firstly, in our processing.py file we have the processing code, just as before:

import statistics

def calculate_mode(number_list):
    try:
        return "The mode of the numbers is {}".format(statistics.mode(number_list))
    except statistics.StatisticsError as exc:
        return "Error calculating mode: {}".format(exc)

That should be pretty clear. Now, in flask_app.py we have the following code:

(A step-by-step explanation is coming later, but it's worth reading through now to see if you can work out how at least some of it works.)

from flask import Flask, request

from processing import calculate_mode

app = Flask(__name__)
app.config["DEBUG"] = True

inputs = []

@app.route("/", methods=["GET", "POST"])
def mode_page():
    errors = ""
    if request.method == "POST":
        try:
            inputs.append(float(request.form["number"]))
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number"])

        if request.form["action"] == "Calculate number":
            result = calculate_mode(inputs)
            inputs.clear()
            return '''
                <html>
                    <body>
                        <p>{result}</p>
                        <p><a href="/">Click here to calculate again</a>
                    </body>
                </html>
            '''.format(result=result)

    if len(inputs) == 0:
        numbers_so_far = ""
    else:
        numbers_so_far = "<p>Numbers so far:</p>"
        for number in inputs:
            numbers_so_far += "<p>{}</p>".format(number)

    return '''
        <html>
            <body>
                {numbers_so_far}
                {errors}
                <p>Enter your number:
                <form method="post" action=".">
                    <p><input name="number" /></p>
                    <p><input type="submit" name="action" value="Add another" /></p>
                    <p><input type="submit" name="action" value="Calculate number" /></p>
                </form>
            </body>
        </html>
    '''.format(numbers_so_far=numbers_so_far, errors=errors)

All clear? Maybe... It does work, though, sort of. Let's try it -- copy the code for the two files into your editor tabs, reload the site, and give it a go. If you have a free account, it will work!

Enter "1", and you get this:

Enter some more numbers:

...and calculate the result:

But if you have a paid account, you'll see some weird behaviour. Exactly what you'll get will depend on various random factors, but it will be something like this:

Enter 1, and you might get this:

Enter 2, and you might get this:

Huh? Where did the "1" go? Well, let's enter "3":

Well, that seems to have worked. We'll add "4":

And now we'll add "1" again:

So now our original 1 has come back, but all of the other numbers have disappeared.

In general, it will seem to sometimes forget numbers, and then remember them again later, as if it has multiple lists of numbers -- which is exactly what it does.
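That behaviour can be reproduced without a web server. The sketch below is a simulation under an assumption: that requests to the site are handled by more than one copy of the code, each with its own separate inputs list. The Worker class and the order in which requests reach each copy are invented for the illustration:

```python
class Worker:
    """Stands in for one running copy of the website code."""

    def __init__(self):
        self.inputs = []  # each copy has its own "global" list

    def handle(self, number):
        self.inputs.append(number)
        return list(self.inputs)  # the "Numbers so far" this copy would show


workers = [Worker(), Worker()]

# Which copy happens to receive each request (assumed, for illustration).
schedule = [0, 1, 1, 1, 0]
numbers = [1, 2, 3, 4, 1]

seen = [workers[index].handle(number) for index, number in zip(schedule, numbers)]
for shown in seen:
    print("Numbers so far:", shown)
```

Each copy only ever sees the numbers that were sent to it, so the list shown to the user depends on which copy answered the latest request.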

Before we go into why it's actually wrong (and why, counterintuitively, it works worse on a paid account than on a free one), here's the promised step-by-step runthrough, with comments after each block of code. Starting off:

from flask import Flask, request

from processing import calculate_mode

app = Flask(__name__)
app.config["DEBUG"] = True

All that is just copied from the previous website.

inputs = []

We're initialising a list for our inputs, and putting it in the global scope, so that it will persist over time. This is because each view of our page will involve a call to the view function:

@app.route("/", methods=["GET", "POST"])
def mode_page():

...which is exactly the same kind of setup for a view function as we had before.

    errors = ""
    if request.method == "POST":
        try:
            inputs.append(float(request.form["number"]))
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number"])

We validate the number in much the same way as we did in our last website, and if it's valid we add it to the global list.

        if request.form["action"] == "Calculate number":

This bit is a little more tricky. On our page, we have two buttons -- one to add a number, and one to say "do the calculation" -- here's the bit of the HTML code from further down that specifies them:

                    <p><input type="submit" name="action" value="Add another" /></p>
                    <p><input type="submit" name="action" value="Calculate number" /></p>

This means that when we get a post request from a browser, the "action" value in the form object will contain the text of the submit button that was actually clicked.
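As a rough illustration of what the browser actually sends, here's a sketch that uses only the standard library. The request body shown is an assumption about a typical submission (a form posted without an enctype is sent URL-encoded); in the real app, Flask does this parsing for us and exposes the result as request.form.

```python
from urllib.parse import parse_qs

# Hypothetical body of the POST request when the user types 3 and
# clicks "Add another". Only the submit button that was clicked is included.
body = "number=3&action=Add+another"

# parse_qs returns a list of values per key; we take the first of each.
form = {key: values[0] for key, values in parse_qs(body).items()}
print(form["number"])
print(form["action"])
```

If the user had clicked the other button, the body would carry action=Calculate+number instead, which is the value the view function checks for.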

So, if the "Calculate number" button was the one that the user clicked...

            result = calculate_mode(inputs)
            inputs.clear()
            return '''
                <html>
                    <body>
                        <p>{result}</p>
                        <p><a href="/">Click here to calculate again</a>
                    </body>
                </html>
            '''.format(result=result)

...we do the calculation and return the result (clearing the list of the inputs at the same time so that the user can try again with another list).

If, however, we get past that if request.form["action"] == "Calculate number" statement, it means either that:

  • The request was using the post method, and we've just added a number to the list or set the error string to reflect the fact that the user entered an invalid number, or
  • The request was using the get method

So:

    if len(inputs) == 0:
        numbers_so_far = ""
    else:
        numbers_so_far = "<p>Numbers so far:</p>"
        for number in inputs:
            numbers_so_far += "<p>{}</p>".format(number)

...we generate a list of the numbers so far, if there are any, and then:

    return '''
        <html>
            <body>
                {numbers_so_far}
                {errors}
                <p>Enter your number:
                <form method="post" action=".">
                    <p><input name="number" /></p>
                    <p><input type="submit" name="action" value="Add another" /></p>
                    <p><input type="submit" name="action" value="Calculate number" /></p>
                </form>
            </body>
        </html>
    '''.format(numbers_so_far=numbers_so_far, errors=errors)

We return our page asking for a number, with the list of numbers so far and errors if either is applicable.

Phew!

So why is it incorrect? If you have a paid account, you've already seen evidence that it doesn't work very well. If you have a free account, here's a thought experiment -- what if two people were viewing the site at the same time? In fact, you can see exactly what would happen if you use the "incognito" or "private tab" feature on your browser -- or, if you have multiple browsers installed, if you use two different browsers (say by visiting the site in Chrome and in Firefox at the same time).

What you'll see is that both users are sharing a list of numbers. The Chrome user starts off, and adds a number to the list:

Now the Firefox user adds a number -- but they see not only the number they added, but also the Chrome user's number:

It's pretty clear what's going on here. There's one server handling the requests from both users, so there's only one list of inputs -- so everyone shares the same list.

But what about the situation for websites running on paid accounts? If you'll remember, it looked like the opposite was going on there -- there were multiple lists, even within the same browser.

This is because paid accounts have multiple servers for the same website. This is a good thing: it means that if they get lots of requests coming in at the same time, then everything gets processed more quickly -- so they can have higher-traffic websites. But it also means that different requests, even successive requests from the same browser, can wind up going to different servers, and because each server has its own list, the browser will see one list for one request, but see a different list on the next request.

What this all means is that global variables don't work for storing state in website code. On each server that's running to control your site, everyone will see the same global variables. And if you have multiple servers, then each one will have a different set of global variables.
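Here's a small simulation of the multiple-server case. It isn't real server code: the FakeServer class and the strictly alternating routing are assumptions made up for illustration (real routing is less predictable), but it reproduces the "forgetting" behaviour we saw above.

```python
import itertools

class FakeServer:
    """Stand-in for one server, with its own copy of the global list."""
    def __init__(self):
        self.inputs = []

    def add_number(self, number):
        self.inputs.append(number)
        # Return a copy of what this server would show on the page.
        return list(self.inputs)

# Two servers, with requests handed to them alternately.
servers = [FakeServer(), FakeServer()]
router = itertools.cycle(servers)

pages = [next(router).add_number(n) for n in [1, 2, 3, 4]]
for page in pages:
    print(page)
```

The first server ends up with 1 and 3, the second with 2 and 4, so the list the user sees flips between the two from one request to the next.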

What to do?

Sessions to the rescue!

What we need is a way to keep a set of "global" variables that are specific to each person viewing the site, and are shared between all servers. If two people, Alice and Bob, are using the site, then Alice will have her own list of inputs, which all servers can see, and Bob will have a different list of inputs, separate from Alice's but likewise shared between servers.
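One common way to get per-user state that every server can see is to keep it in a cookie in the user's browser, signed with a secret that only the servers know. Here's a deliberately simplified sketch of that idea using only the standard library. The function names and the cookie format are made up for illustration; the mechanism we're about to use takes roughly this approach by default, but this is not its actual implementation.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"not-a-real-key"  # placeholder for illustration only

def make_cookie(data):
    """Serialise data and append a signature derived from SECRET_KEY."""
    payload = base64.urlsafe_b64encode(json.dumps(data).encode("utf-8"))
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + signature

def read_cookie(cookie):
    """Return the stored data, or None if the signature does not match."""
    payload, _, signature = cookie.rpartition(".")
    expected = hmac.new(
        SECRET_KEY, payload.encode("ascii"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))

cookie = make_cookie({"inputs": [1.0, 2.0]})
print(read_cookie(cookie))        # Alice's data, readable by any server
print(read_cookie(cookie + "0"))  # a tampered cookie is rejected
```

Because the data travels with each request, it doesn't matter which server handles it, and because each browser has its own cookie, Alice and Bob get separate lists.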

The web dev mechanism for this is called sessions, and is built into Flask. Let's make a tiny set of modifications to the Flask app to make it work properly. Firstly, we'll import support for sessions by changing our Flask import line from this:

from flask import Flask, request

...to this:

from flask import Flask, request, session

In order to use sessions, we'll also need to configure Flask with a "secret key" -- sessions use cryptography, which requires a random number. Add a line like this just after the line where we configure Flask's debug setting to be True:

app.config["SECRET_KEY"] = "lkmaslkdsldsamdlsdmasldsmkdd"

Use a different string to the one I put above; mashing the keyboard randomly is a good way to get a reasonably random string, though if you want to do things properly, find something truly random.
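If you'd rather not mash the keyboard, Python's standard secrets module can generate a suitable value. This is just one way to do it; run it once in a console and paste the result into your code as the string.

```python
import secrets

# 32 random bytes, rendered as 64 hexadecimal characters.
secret_key = secrets.token_hex(32)
print(secret_key)
```

Don't generate the key inside the app itself each time it starts up: each server would then get a different key, and a session created by one wouldn't be readable by another.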

Next, we'll get rid of the global inputs list by deleting this line:

inputs = []

Now we'll use an inputs list that's stored inside the session object (which looks like a dictionary) instead of using our global variable. Firstly, let's make sure that whenever we're in our view function, there's a list of inputs associated with the current session, creating one if there isn't one already. Right at the start of the view function, add this:

    if "inputs" not in session:
        session["inputs"] = []

Next, inside the bit of code where we're adding a number to the inputs list, replace this line:

        inputs.append(float(request.form["number"]))

...with this one that uses the list on the session:

        session["inputs"].append(float(request.form["number"]))

There's also a subtlety here; because we're changing a list inside a session (instead of adding a new thing to the session), we need to tell the session object that it has changed by putting this line immediately after the last one:

        session.modified = True
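To see why that flag is needed, here's a toy stand-in for the session object. The TrackingDict class is hypothetical and much simpler than Flask's real session, but it shows the principle: an object can notice when you assign to one of its keys, but not when you change a list that it merely holds a reference to.

```python
class TrackingDict(dict):
    """Toy dictionary that records when one of its keys is assigned."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.modified = False

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.modified = True

session = TrackingDict(inputs=[])

session["inputs"].append(1.0)   # changes the list, not the dictionary
after_append = session.modified
print(after_append)

session["inputs"] = session["inputs"] + [2.0]   # assigns to a key
after_assign = session.modified
print(after_assign)
```

The append goes unnoticed, while the assignment is picked up. Setting the flag by hand, as we do in the app, covers the first case.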

Next, when we're calculating the mode, we need to look at our session again to get the list of inputs:

        result = calculate_mode(inputs)

...becomes

        result = calculate_mode(session["inputs"])

...and the line that clears the inputs so that the user can do another list likewise changes from

        inputs.clear()

to:

        session["inputs"].clear()
        session.modified = True

Finally, the code that generates the "numbers so far" list at the start of the page needs to change to use the session:

if len(inputs) == 0:
    numbers_so_far = ""
else:
    numbers_so_far = "<p>Numbers so far:</p>"
    for number in inputs:
        numbers_so_far += "<p>{}</p>".format(number)

...becomes:

if len(session["inputs"]) == 0:
    numbers_so_far = ""
else:
    numbers_so_far = "<p>Numbers so far:</p>"
    for number in session["inputs"]:
        numbers_so_far += "<p>{}</p>".format(number)

Once all of those code changes have been done, you should have this:

from flask import Flask, request, session

from processing import calculate_mode

app = Flask(__name__)
app.config["DEBUG"] = True
app.config["SECRET_KEY"] = "lkmaslkdsldsamdlsdmasldsmkdd"

@app.route("/", methods=["GET", "POST"])
def mode_page():
    if "inputs" not in session:
        session["inputs"] = []

    errors = ""
    if request.method == "POST":
        try:
            session["inputs"].append(float(request.form["number"]))
            session.modified = True
        except:
            errors += "<p>{!r} is not a number.</p>\n".format(request.form["number"])

        if request.form["action"] == "Calculate number":
            result = calculate_mode(session["inputs"])
            session["inputs"].clear()
            session.modified = True
            return '''
                <html>
                    <body>
                        <p>{result}</p>
                        <p><a href="/">Click here to calculate again</a>
                    </body>
                </html>
            '''.format(result=result)

    if len(session["inputs"]) == 0:
        numbers_so_far = ""
    else:
        numbers_so_far = "<p>Numbers so far:</p>"
        for number in session["inputs"]:
            numbers_so_far += "<p>{}</p>".format(number)

    return '''
        <html>
            <body>
                {numbers_so_far}
                {errors}
                <p>Enter your number:
                <form method="post" action=".">
                    <p><input name="number" /></p>
                    <p><input type="submit" name="action" value="Add another" /></p>
                    <p><input type="submit" name="action" value="Calculate number" /></p>
                </form>
            </body>
        </html>
    '''.format(numbers_so_far=numbers_so_far, errors=errors)

Hit the reload button, and give it a try! If you have a paid account, you'll find that now it all works properly -- and if you have a free account, you'll see that separate browsers now have separate lists of numbers :-)

So now we have a multi-user website that keeps state around between page visits.

Processing files

Now, entering all of those numbers one-by-one would be tedious if there were a lot of them. A lot of Python scripts don't ask the user to enter data a line at a time; they take a file as their input, process it, and produce a file as the output. Here's a simple script that asks for an input filename and an output filename. It expects the input file to contain a number of lines, each with a list of numbers separated by a comma and a space. For each line of the input file, it writes a line to the output file containing the sum of the numbers on that input line.

def process_data(input_data):
    result = ""
    for line in input_data.split("\n"):
        if line != "":
            numbers = [float(n) for n in line.split(", ")]
            result += str(sum(numbers))
        result += "\n"
    return result

input_filename = input("Enter the input filename: ")
output_filename = input("Enter the output filename: ")

with open(input_filename, "r") as input_file:
    input_data = input_file.read()

with open(output_filename, "w") as output_file:
    output_file.write(process_data(input_data))
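To make the file format concrete, here's the function from that script applied to a small sample. The sample text is made up for illustration, and is passed in as a string rather than read from a file.

```python
def process_data(input_data):
    result = ""
    for line in input_data.split("\n"):
        if line != "":
            numbers = [float(n) for n in line.split(", ")]
            result += str(sum(numbers))
        result += "\n"
    return result

# Two lines of input, each a list of numbers separated by ", ".
sample = "1, 2, 3\n4, 5, 6"
output = process_data(sample)
print(output)
```

Each input line becomes one output line holding the sum as a float, so this prints 6.0 and then 15.0.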

What we want is a Flask app that will allow the user to upload a file like the input file that the script requires, and will then provide the output file to download. This is actually pretty similar to the original app we did -- there are just three phases, input-process-output. So the Flask app looks very similar.

Firstly, we put our calculating routine into processing.py, as normal:

def process_data(input_data):
    result = ""
    for line in input_data.split("\n"):
        if line != "":
            numbers = [float(n) for n in line.split(", ")]
            result += str(sum(numbers))
        result += "\n"
    return result

...and now we write a Flask app that looks like this:

from flask import Flask, make_response, request

from processing import process_data

app = Flask(__name__)
app.config["DEBUG"] = True

@app.route("/", methods=["GET", "POST"])
def file_summer_page():
    if request.method == "POST":
        input_file = request.files["input_file"]
        input_data = input_file.stream.read().decode("utf-8")
        output_data = process_data(input_data)
        response = make_response(output_data)
        response.headers["Content-Disposition"] = "attachment; filename=result.csv"
        return response

    return '''
        <html>
            <body>
                <p>Select the file you want to sum up:
                <form method="post" action="." enctype="multipart/form-data">
                    <p><input type="file" name="input_file" /></p>
                    <p><input type="submit" value="Process the file" /></p>
                </form>
            </body>
        </html>
    '''

Again, we'll go through that bit-by-bit in a moment (though it's worth noting that although this feels like something that should be much harder than the first case, the Flask app is much shorter :-) But let's try it out first -- once you've saved the code on PythonAnywhere and reloaded the site, visit the page:

We specify a file with contents (mine just has "1, 2, 3" on the first line and "4, 5, 6" on the second):

...then we click the button. You'll have to watch for it, but a file download will almost immediately start. In Chrome, for example, this will appear at the bottom of the window:

Open the file in an appropriate application -- here's what it looks like in gedit:

We've got a website where we can upload a file, process it, and download the results :-)

Obviously the user interface could use a bit of work, but that's left as an exercise for the reader...

So, how does the code work? Here's the breakdown:

from flask import Flask, make_response, request

from processing import process_data

app = Flask(__name__)
app.config["DEBUG"] = True

This is our normal Flask setup code.

@app.route("/", methods=["GET", "POST"])
def file_summer_page():

As usual, we define a view.

    if request.method == "POST":

If the request uses the "post" method...

        input_file = request.files["input_file"]
        input_data = input_file.stream.read().decode("utf-8")

...we ask Flask to extract the uploaded file from the request object, and then we read it into memory. The file it will provide us with will be in binary format, so we convert it into a string, assuming that it's in the UTF-8 character set.
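Here's a minimal sketch of that bytes-to-string step on its own, with an in-memory stream standing in for the uploaded file (an assumption for illustration; in the app, the stream comes from Flask). The app as written doesn't deal with files in other encodings; catching UnicodeDecodeError, as below, is one option, and what to do in that case is up to you.

```python
import io

# Stand-in for the uploaded file: a binary stream of UTF-8 bytes.
upload = io.BytesIO("1, 2, 3\n4, 5, 6\n".encode("utf-8"))

raw = upload.read()
print(type(raw).__name__)

try:
    text = raw.decode("utf-8")
except UnicodeDecodeError:
    # A file that isn't valid UTF-8 would end up here.
    text = None
print(repr(text))
```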

        output_data = process_data(input_data)

Now we process the data using our function. The next step is where it gets a little more complicated:

        response = make_response(output_data)
        response.headers["Content-Disposition"] = "attachment; filename=result.csv"

In the past, we just returned strings from our Flask view functions and let it sort out how that should be presented to the browser. But this time, we want to take a little more control over the kind of response that's going back. In particular, we don't want to dump all of the output into the browser window so that the user has to copy/paste the (potentially thousands of lines of) output into their spreadsheet or whatever. Instead, we want to tell the browser "the thing I'm sending you is a file called 'result.csv', so please download it appropriately". That's what these two lines do -- the first is just a way to tell Flask that we're going to need some detailed control over the response, and the second does that control. Next:

        return response

...we just return the response.
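To see what that header looks like when it goes back to the browser, here's a sketch using the standard library's wsgiref.headers module. It's used purely as an illustration and has nothing to do with how Flask builds its responses internally.

```python
from wsgiref.headers import Headers

headers = Headers([])
# Extra keyword arguments become parameters on the header value.
headers.add_header("Content-Disposition", "attachment", filename="result.csv")

value = headers["Content-Disposition"]
print("Content-Disposition: " + value)
```

The quotes that it puts around the file name are optional for a simple name like this one, which is why the string in our app works without them.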

Now that we're out of that first if statement, we know that the request we're handling isn't one with a "post" method, so it must be a "get". So we display the form:

    return '''
        <html>
            <body>
                <p>Select the file you want to sum up:
                <form method="post" action="." enctype="multipart/form-data">
                    <p><input type="file" name="input_file" /></p>
                    <p><input type="submit" value="Process the file" /></p>
                </form>
            </body>
        </html>
    '''

In this case we just return a string of HTML like we did in the previous examples. There are only two new things in there:

                <form method="post" action="." enctype="multipart/form-data">

The enctype="multipart/form-data" in there is just an extra flag that is needed to tell the browser how to format files when it uploads them as part of the "post" request that it's sending to the server, and:

                    <p><input type="file" name="input_file" /></p>

...is just how you specify an input where the user can select a file to upload.

So that's it!

And we're done

In this blog post we've presented three different Flask apps, each of which shows how a specific kind of normal Python script can be converted into a website that other people can access to reap the benefits of the code you've written.

Hopefully they're all reasonably clear, and you can see how you could apply the same techniques to your own scripts. If you have any comments or questions, please post them in the comments below -- and if you have any thoughts about other kinds of patterns that we could consider adding to an updated version of this post, or to a follow-up, do let us know.

Thanks for reading!


The PythonAnywhere newsletter, September 2018

Well, our last "monthly" newsletter was in September 2017. We must have shifted the bits in the period left one, or something like that :-)

Anyway, welcome to the September 2018 PythonAnywhere newsletter :-) Here's what we've been up to.

Python 3.7 support

We recently added support for Python 3.7. If you signed up since 28 August, you'll have it available on your account -- you can use it just like any other Python version.

If you signed up before then, it's a little more complicated, but we can update your account to provide it -- there's more information in this blog post.

Self-installation of HTTPS certificates

We've also been working on making setting up HTTPS on your website a bit more streamlined. Previously you had to get the certificate and the private key, and then email us asking for them to be installed, which could take up to 24 hours. Now you can cut our support team out of the loop and install it all yourself. Check out this blog post for the details.

There will be more improvements to HTTPS support coming soon...

Force HTTPS

Another shiny new feature: built-in support for forcing people who visit your site to use HTTPS instead of non-secure HTTP, without the need to change your code! Once again, there's more info on the blog.

PythonAnywhere metrics

We're wondering if it would be interesting for you to hear a bit about some of the metrics we monitor internally to see what's happening in our systems. Here's a random grab-bag of some numbers for this month:

  • Web requests: we're processing on average about 225 hits/second through our systems (across all websites) with spikes at busy times of up to 350/second. For comparison -- apparently that's about what Stack Overflow have to deal with. But there's a difference; they're just one site, but for us...
  • That's across about 28,000 websites. Of course the number of hits sites get is very much spread over a long tail distribution -- many of those sites are ones that people set up as part of tutorials (like the excellent Django Girls), so they only get hits from their owners, while on the other hand the busiest websites might be processing 40 hits/second at their peak times
  • By contrast, there are only 10,000 scheduled tasks :-S
  • Our live system currently comprises 51 separate machines on Amazon AWS.

Let us know whether those are the metrics you'd like to see, whether you'd like to see more, or if you think it's completely uninteresting :-)

GDPR

Like every tech business in the world, we spent a lot of time late last year and early this year working on ensuring that we were compliant with the GDPR. This was doubly-important to us -- most companies are "data controllers" in GDPR terminology, which means that they store data about people, but we are also "data processors", which means that we run computers and programs that other people use in their role as data controllers. Or, to make that a bit more concrete -- if you have personal data about people on a website that you host with us, you're a data controller, and you're delegating the data processing to us.

This, of course, meant that it was more than twice as much work for us as it was for most people, but we got it all done a week before the deadline :-)

One interesting side-effect of all of this was that we realised that these newsletters sometimes say things like "hey, we've added this cool new feature (for paid accounts only)" -- and when they do say something like that, it's not unreasonable to see them as marketing. We had to make super-sure that all marketing messages from us were opt-in only, so we unsubscribed everyone from the newsletter and put up a banner on login so that people would know about it.

That in turn means that this newsletter is going to about ten times fewer people than the one before -- so if you're reading it over email, thanks for choosing to receive it :-) Of course, you can always unsubscribe using the link at the bottom of the message, or from the email settings tab on the Account page.

New modules

Although you can install Python packages on PythonAnywhere yourself, we like to make sure that we have plenty of batteries included.

Everything got updated for the new system image that provides access to Python 3.7, so if you're using that image, you should have the most recent (or at least a very recent) version of everything :-)

New whitelisted sites

Paying PythonAnywhere customers get unrestricted Internet access, but if you're a free PythonAnywhere user, you may have hit problems when writing code that tries to access sites elsewhere on the Internet. We have to restrict you to sites on a whitelist to stop hackers from creating dummy accounts to hide their identities when breaking into other people's websites.

But we really do encourage you to suggest new sites that should be on the whitelist. Our rule is, if it's got an official public API, which means that the site's owners are encouraging automated access to their server, then we'll whitelist it. Just drop us a line with a link to the API docs.

We've added too many sites since our last newsletter to list them all -- but please keep them coming!

That's all for now

That's all we've got this time around. We have some big new features in the pipeline, so stay tuned! Maybe we'll even get our next newsletter out in October 2018. Or at least sometime in 2018...


