This post is intended for users who are beginning their adventure with web applications. Below you’ll find how to structure a web app that relies on heavy processing of the input data – processing that takes so long that you can’t do it inside a request handler with a five-minute timeout, or at least so long that you don’t want to risk slowing down your website by doing it inside the website’s own code. You want it to happen in the background. You’ll see an example implementing some hints from the “Async work in Web apps” help page, which involve writing a jQuery script that polls a simple API endpoint, which in turn communicates with a database updated by an external script (so there will be some sqlalchemy stuff too).
Intro¶
What you will learn¶
The main goal of this entry, however, is to show a solution to an architectural problem in a web application rather than to teach how to write some JavaScript code or how to manage MySQL queries and connections using Python. So, please, bear this in mind as some parts of the code will not be exhaustively explained line by line. For the same reason, the code presented here should be treated as an example and an inspiration, not as a production-ready web application.
Requirements¶
To implement it fully on PythonAnywhere, you’d probably need the Always-on task feature, which is a paid one, but there could be scenarios in which a simple Scheduled task would suffice. You can, however, test this code on a free account (as the development process will use features available to all our users). If you want to follow along with your own web app as you read:
- create a default Flask app (using the “Add a new web app” wizard on the Web page)1 in the ~/asyncinweb directory in your PythonAnywhere account
- create a MySQL database (on the Databases page) called asyncinweb (and make sure you know your MySQL password)
- the code assumes Python 3.8+ and SQLAlchemy 1.4+ (so you may need to upgrade it and Flask_SQLAlchemy as well).
Problem¶
Imagine you have a web app which you want to use as a means to present some data in response to a user’s request. The whole point of it lies in the value your app adds to the input provided by the user. That usually depends on pre-processing of the input data, often involves gathering other data from external (other web services) or internal (database) sources and always requires some kind of processing of the gathered material. The processing is your added value.
Let’s illustrate this scenario with a sample Flask web app. The app will have two endpoints – the main page, which greets the user, and /processing – where the user can submit their data and get a result displayed.
Our example is an abstract one. There may be countless implementations of this scenario. How should we represent the processing in our example to make it congruent with a real-world application? We could insert a line with some calculations, but we can easily think of calculations as timeless operations, whereas our processing will most likely involve multiple actions that are time-consuming (performing a database query, connecting to an external service, etc.). In such a case, the most adequate way of representing the processing in our web app will be a time lapse:
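As a rough, framework-free sketch of that idea (an assumption about how the body of the view could look, not the exact listing), the processing can be modelled as a function that does nothing but wait a little longer than the five-minute limit. The names `PROCESSING_SECONDS` and `processing` are made up for illustration; in the Flask app this body would sit inside the view registered for /processing.

```python
from time import sleep

# Hypothetical constant: one second more than the five-minute request limit.
PROCESSING_SECONDS = 5 * 60 + 1


def processing(delay=PROCESSING_SECONDS):
    """Stand-in for the body of the /processing view."""
    # The "heavy processing": nothing but a time lapse.
    sleep(delay)
    # Only after the wait is over can the answer be served.
    return "Here's your answer!"
```

The `delay` parameter exists only so the function can be tried out without actually waiting five minutes, for example with `processing(delay=0)`.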
Now, what happens when a user hits the /processing endpoint? If you’re patient, you may try it yourself. (If you are patient – which, with all due respect to your personal virtues, you’re not! At least when it comes to web apps. And that’s OK, because we expect web apps to be fast. That’s the whole point, isn’t it?) Let me suffer on your behalf: if you waited long enough, you’d see the infamous “Something went wrong :-(” message on the screen with an indication of a 50x error, and lines similar to these in your web app’s server log:
2021-09-15 15:07:17 Wed Sep 15 15:07:16 2021 - *** HARAKIRI ON WORKER 2 (pid: 6, try: 1) ***
2021-09-15 15:07:17 Wed Sep 15 15:07:16 2021 - HARAKIRI !!! worker 2 status !!!
2021-09-15 15:07:17 Wed Sep 15 15:07:16 2021 - HARAKIRI [core 0] 10.0.0.224 - GET /processing since 1631718135
2021-09-15 15:07:17 Wed Sep 15 15:07:16 2021 - HARAKIRI !!! end of worker 2 status !!!
2021-09-15 15:07:17 DAMN ! worker 2 (pid: 6) died, killed by signal 9 :( trying respawn ...
2021-09-15 15:07:17 Respawned uWSGI worker 2 (new pid: 11)
2021-09-15 15:07:17 spawned 2 offload threads for uWSGI worker 2
BTW – workers are such honorable guys!
On PythonAnywhere, we assumed that processing a request shouldn’t reasonably take longer than brewing a good tea, that is, 5 minutes (just imagine what is going on in the parallel steampunk universe where our alter egos decided that their difference engines shouldn’t time out before a good pipe smoke, that is, at least 45 minutes!). If the tea is brewed but the response is not ready, we assume that the process has probably hung and needs to be restarted. Also, web apps have a limited number of workers (processes) available. If all of them are hanging, your web app becomes slow or even unresponsive.
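To get a feel for why hanging workers hurt, here is a small simulation (not PythonAnywhere’s actual uWSGI setup): a thread pool with two workers stands in for a web app with two worker processes, and `handle` is a hypothetical request handler. Two slow requests occupy both workers, so a third, instant request has to wait until one of them finishes.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SLOW = 0.3  # seconds; a scaled-down stand-in for a multi-minute request


def handle(delay):
    """Hypothetical request handler that is busy for `delay` seconds."""
    time.sleep(delay)
    return delay


def time_fast_request(workers=2):
    """Return how long an instant request takes while all workers are busy."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Occupy every worker with a slow request...
        for _ in range(workers):
            pool.submit(handle, SLOW)
        # ...then send a request that needs no processing time at all.
        started = time.monotonic()
        pool.submit(handle, 0).result()
        return time.monotonic() - started


waited = time_fast_request()
print(f"The instant request waited {waited:.2f} seconds for a free worker")
```

Scale `SLOW` up to several minutes and the instant request is exactly the visitor staring at a page that won’t load.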
So, you have a web app which should serve some heavily processed data. And you have a problem: you want to have your cake and eat it too.
Solution¶
Problem analysis¶
“You can’t have a cake and eat it.” Obviously. Unless… you realize that we are talking about multiple cakes here! The expressiveness of that figure of speech relies on an ellipsis which tricks our brain into merging two pieces into one. If someone told me “you can’t have a brownie and eat a baklava”, I’d reply “Why not?”. However, with some crucial information omitted (an ellipsis achieved by using generic names instead of specific ones), I’m ready to assume that the prohibition refers to the very cake that I have already eaten.
Let’s have a look at our code again.
Well, you can’t digest (process) the cake and serve it too! And even if you could… (Crap, we need to stop this analogy right now!) Let’s dismantle our sleep metaphor.
We already know that our web app serves some processed data as a response to a user’s request. (BTW – the user revealed that their name is Sidney.) Let’s distinguish that in our example. In production code, the user would probably be recognized by your app as logged in (so there would be some authentication involved). We will keep our code as simple as possible, so I’m going to use the request’s IP and date to have a unique representation of the user:
from datetime import datetime
from time import sleep
from flask import Flask, request
...
@app.route("/processing")
def processing():
    # Here's Sidney's request (aka "Cake 1")
    ip = request.headers.get("X-Real-Ip", "")
    now = datetime.utcnow().isoformat()
    user = f"{ip}{now}"
    # Here we do the processing (aka "Cake 2")
    sleep(5 * 60 + 1)
    # Here we're serving the result (aka "Cake 3")
    return f"Hi {user}, here's your answer!"
All right! We’ve got at least three things going on here:
- getting some data from Sidney’s request
- doing the heavy, time-consuming processing
- serving the result.
It’s high time to replace our cake metaphor and use more meaningful descriptors. Our first step consists of getting some input data (and possibly involves some actions and calculations on it). The next step is represented in our example as a wait with a simple calculation, but in reality it would involve acquiring more data and performing additional actions and calculations. The last step is the action of serving the result. We can distinguish between data, calculations and actions conceptually and operationally 2.
Now we can ask ourselves which steps are inherently related to our web app. We can’t remove Sidney’s request, nor can we remove our response. We can, however, remove the processing, and the web app will still be usable as a web app (we need it as a means of getting the input data and a tool to perform the action of serving the result). That should drastically improve our web app’s performance time-wise!
When we pull the time-consuming part out of the web app into an independent script (one that is not executed by the web app directly), another problem occurs – how can the script and the web app communicate? We need some sort of bridge between them, so the input data is transmitted downwards to the script, and the output data is transmitted upwards to the web app. Since we will be transmitting data, the means to store it will obviously be a database.
The solution to our problem will consist of separating the processing from the web app and adding an intermediary element that allows them to communicate. That will add some complexity (verbosity) to our application, but that’s how we get rid of the fake simplicity of the original “elliptic” code, which leads to failure. And – last but not least – we will eat the cake and have it too!
Let’s have a look at a sample implementation of that solution.
Implementation¶
We will no longer need the /processing endpoint in the web app; it should become a script called processing.py instead. Since the script and the web app will both connect to the same database, they should share the same config.
Let’s create the config.py file (if you skipped the requirements section, you may want to check it now):
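A config.py for this setup could look roughly like the sketch below. The setting names are the standard Flask-SQLAlchemy ones, but every value is a placeholder assumption: `yourusername` and the password are made up, the hostname and the `username$database` naming follow PythonAnywhere’s usual MySQL conventions, the `mysql+mysqlconnector` driver prefix is just one possible choice that has to match a driver you actually have installed, and 280 seconds is an assumed value chosen to stay below the server’s idle-connection timeout.

```python
# config.py: shared by the web app and the background script.
# All values below are placeholders; replace them with your own details.
USERNAME = "yourusername"          # hypothetical PythonAnywhere username
PASSWORD = "your-mysql-password"   # the MySQL password from the Databases page
HOSTNAME = f"{USERNAME}.mysql.pythonanywhere-services.com"
DATABASE = f"{USERNAME}$asyncinweb"

SQLALCHEMY_DATABASE_URI = (
    f"mysql+mysqlconnector://{USERNAME}:{PASSWORD}@{HOSTNAME}/{DATABASE}"
)
# Recycle pooled connections before the MySQL server drops idle ones.
SQLALCHEMY_ENGINE_OPTIONS = {"pool_recycle": 280}
# Switch off Flask-SQLAlchemy's modification tracking, which we don't use.
SQLALCHEMY_TRACK_MODIFICATIONS = False
```

Keeping these values in one module means the web app and the background script cannot drift apart on which database they talk to.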
We will take care of the database in our flask_app.py. Let’s have a jobs table that will store information about the incoming requests and their processing state (represented by Job instances):
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
app.config.from_object('config')
db = SQLAlchemy(app)
...
class Job(db.Model):
    __tablename__ = "jobs"
    id = db.Column(db.Integer, primary_key=True)
    slug = db.Column(db.String(64), nullable=False)
    state = db.Column(db.String(10), nullable=False, default="queued")
    result = db.Column(db.Integer, default=0)
(Don’t forget to run db.create_all() in a Python console before launching the web app!3)
We will use the slug only as the representation of the user query; state will be “queued” (awaiting), “processing” or “completed”.
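Since the three states are plain strings, a typo in either program would go unnoticed. One optional way to guard against that, sketched here with a hypothetical `JobState` enum and `can_move` helper (neither is part of the code in this post), is to name the states once and spell out which transitions make sense.

```python
from enum import Enum


class JobState(str, Enum):
    """The three values stored in the `state` column."""
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"


# Which state may follow which: a job only ever moves forward.
ALLOWED = {
    JobState.QUEUED: {JobState.PROCESSING},
    JobState.PROCESSING: {JobState.COMPLETED},
    JobState.COMPLETED: set(),
}


def can_move(current, new):
    """Return True if a job may go from `current` to `new`.

    Both arguments may be plain strings; an unknown state raises ValueError.
    """
    return JobState(new) in ALLOWED[JobState(current)]


print(can_move("queued", "processing"))
print(can_move("completed", "queued"))
```

Because `JobState` subclasses `str`, its members compare equal to the plain strings already stored in the table.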
Once we have the storage for our data, we can use it. First we need to take care of putting the incoming requests into the queue:
@app.route("/")
def main():
    # Process Sidney's request
    ip = request.headers.get("X-Real-Ip", "")
    now = datetime.utcnow().isoformat()
    job_id = ip + now
    # Store the request info in the database
    data = Job(slug=job_id)
    db.session.add(data)
    db.session.commit()
    return f"Hi, your query is: {job_id}..."
All right! Whenever someone hits our main page, we will store the request information in the database. At this stage we can move on to the processing part. In our simplistic example we need to do only two things: check for a pending request in the database and perform the processing if needed:
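The script can be sketched along these lines. To keep the example self-contained it swaps in the standard library’s sqlite3 module for the MySQL database and SQLAlchemy session that the real script would share with the web app via config.py; the function names (`make_db`, `process_next_job`, `run_forever`) and the one-second idle pause are assumptions made for illustration. The “processing” is still a random wait of up to 10 seconds, and, as a further assumption, the number of seconds doubles as the non-zero result.

```python
import random
import sqlite3
import time


def make_db(path=":memory:"):
    """Create a stand-in `jobs` table mirroring the Job model's columns."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id INTEGER PRIMARY KEY, "
        "slug TEXT NOT NULL, "
        "state TEXT NOT NULL DEFAULT 'queued', "
        "result INTEGER DEFAULT 0)"
    )
    conn.commit()
    return conn


def process_next_job(conn, work=time.sleep):
    """Process the oldest queued job; return its slug, or None if idle."""
    row = conn.execute(
        "SELECT id, slug FROM jobs WHERE state = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, slug = row
    # Mark the job as taken so the web app can report progress.
    conn.execute("UPDATE jobs SET state = 'processing' WHERE id = ?", (job_id,))
    conn.commit()
    print(f"Processing job: {slug}...", end=" ", flush=True)
    seconds = random.randint(1, 10)
    work(seconds)  # the "heavy processing" is still just a time lapse
    # Assumption: the elapsed seconds double as the (non-zero) result.
    conn.execute(
        "UPDATE jobs SET state = 'completed', result = ? WHERE id = ?",
        (seconds, job_id),
    )
    conn.commit()
    print(f"done after {seconds} seconds!")
    return slug


def run_forever(conn, idle_pause=1):
    """Keep checking the queue; meant to be started from a console or task."""
    while True:
        if process_next_job(conn) is None:
            time.sleep(idle_pause)  # nothing queued, so wait before re-checking
```

Calling `run_forever(make_db("jobs.db"))` would start the loop against a local file; the `work` parameter is there only so the wait can be replaced when trying the function out.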
Now, if we run this script in a console and start hitting our web app, we should see something like this:
Processing job: 117.50.83.792021-09-24T13:20:46.002222... done after 3 seconds!
Processing job: 117.50.83.792021-09-24T13:20:46.773984... done after 8 seconds!
Processing job: 117.50.83.792021-09-24T13:20:47.358036... done after 5 seconds!
We have the processing part done, now we need to show Sidney the result they’re waiting for. Let’s go back to our web app code.
@app.route("/")
def main():
    # Process Sidney's request
    ...
    # Store the request info in the database
    ...
    return f"Hi, your query is: {job_id}..."
So far we’re only greeting the user whereas we want to keep them updated
and finally show the result. Moreover, we want our site to refresh
automatically when the state of the request changes. To achieve that, we
will need some extra code running in the user’s browser. We keep the state
of the request recorded in the database, so the frontend code needs to know
about the request id and perform queries about it. However, we don’t want
to perform direct database queries from the browser because that would
expose our database. In order to keep the database hidden we will need one
more step – creating a simple API endpoint which will be hit by the
JavaScript code in the browser. Let’s do that first by adding an extra view in flask_app.py:
from flask import jsonify  # add this to flask imports
...
@app.route("/query", methods=["POST"])
def query():
    # The id of the queried request comes in with a new request
    # sent from the frontend JS code
    job_id = request.form["id"]
    # Now we can ask database about the state of that request
    data = Job.query.filter_by(slug=job_id).first()
    # And return a response containing the state and the result
    return jsonify(
        {
            "state": data.state,
            "result": data.result,
        }
    )
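One case the view above leaves open is an id that matches no row: `first()` then returns `None`, and reading `data.state` raises an exception. A hedged sketch of one way to handle it is a small helper that turns a job (or its absence) into a payload and an HTTP status code; `status_payload` is a hypothetical name, and the 404 response is an assumption about how you might want the API to behave.

```python
from types import SimpleNamespace


def status_payload(job):
    """Return (payload, http_status) for a Job-like object or None."""
    if job is None:
        # No row matched the submitted id.
        return {"error": "unknown job id"}, 404
    return {"state": job.state, "result": job.result}, 200


# Usage with a stand-in object exposing the Job model's attributes:
queued_job = SimpleNamespace(state="queued", result=0)
print(status_payload(queued_job))
print(status_payload(None))
```

Inside the view, the two values could then be unpacked and returned as `jsonify(payload), status`.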
Excellent. It looks like we’ve got our backend code almost ready! Before we implement the frontend part, where the magic will happen, let’s sum up the data flow in our web app:
- Sidney visits our site and provides some data.
- We store the data in the database.
- Our background script constantly monitors the database and picks up new data to be processed, then updates its state in the database again.
- Meanwhile the information about Sidney’s request is sent to their browser as well, where the code will perform queries to our API about the state of the request.
- When the frontend code sends a query about the request that it received, some response is returned by our backend code (API) and the information about the query is updated in Sidney’s browser.
- Once the data is processed and Sidney’s got their result, the code in the browser stops the polling.
In order for that to work, we need to create the code in Sidney’s browser
which will store information about their original request. We will use a
template and pass the information about the original request in its
context. Let’s add the missing part to the main view:
from flask import render_template  # update the flask imports
...
@app.route("/")
def main():
    # Process Sidney's request
    ip = request.headers.get("X-Real-Ip", "")
    now = datetime.utcnow().isoformat()
    job_id = f"{ip}{now}"
    # Store the request info in the database
    ...
    # Serve a self-updating template which knows about the original request
    return render_template(
        "main.html",  # template's name
        job_id=job_id  # template's context
    )
The remaining part is the template itself. We will keep it under the templates/ directory. Let’s make a super simple body which will display the information about the request’s state and result:
<body>
<div class="content">
<h1>Hello to Async Work in Web Apps Dummy Site</h1>
<p>Status of your request: <span id="status"></span></p>
<p>Your result: <span id="result">?</span></p>
</div>
</body>
The span tags with ids will be updated by the JavaScript which we put in the head tag. Out of respect for the elders and tradition we’ll use good old jQuery here (since that is what our help page suggests). See how $( '#status' ) and $( '#result' ) match the ids of the span tags in the body. Also, you may notice that the doPoll function is a recursive one: as long as there is no result, it schedules another call to itself.
function doPoll(){
    $.post(
        '/query',        // <- We're hitting our /query API endpoint
        { id: job_id },  // <- with this payload (passing job_id
                         //    which will be stored in the template)
        function(data) { // <- when the response comes, we'll process
                         //    its data with this function
            $( '#status' ).text( data.state );
            if (data.result == 0) {
                // We're setting the timeout for 5 seconds
                // so the page will update itself
                // every 5 seconds until we get a non-zero result
                setTimeout(doPoll, 5000);
            } else {
                // When we finally get the result we update it on the page
                // and stop the polling.
                $( '#result' ).text( data.result );
            }
        }
    );
}
// Here we actually instruct the browser to call the function for the effect:
$(document).ready(function(){
    doPoll();
});
Since we’re using this code inside a template and we know about the job_id from its context, in the final code we’ll have to adjust the payload we send to the API to use the data from the template’s context (that’s the {{ job_id|tojson }} part – remember the job_id parameter from the render_template function in the main view?).
Let’s put everything together:
Et voilà! We have all of the pieces of our web app ready. You may now reload the app on the Web page, start the processing.py script in a console and open multiple browser tabs with our site. You should see how consecutive tabs update while the script is processing the data.
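If you’d like to watch a job from a script rather than a browser tab, the logic of doPoll translates to Python roughly as follows. This is only a sketch: `poll_until_done` and its parameters are hypothetical, and `fetch_status` is any callable that returns the decoded JSON from /query as a dict, so the HTTP call itself is left out.

```python
import time


def poll_until_done(fetch_status, interval=5, max_polls=None, sleep=time.sleep):
    """Call fetch_status() until it reports a non-zero result.

    Returns the result, or None if max_polls was reached first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        data = fetch_status()
        polls += 1
        print(f"Status of your request: {data['state']}")
        if data["result"] != 0:
            # Same stop condition as the browser code: a non-zero result.
            return data["result"]
        sleep(interval)  # wait before asking again
    return None


# Usage with canned responses instead of a real HTTP request:
responses = iter([
    {"state": "queued", "result": 0},
    {"state": "processing", "result": 0},
    {"state": "completed", "result": 7},
])
print(poll_until_done(lambda: next(responses), sleep=lambda s: None))
```

Unlike the browser version, this one takes an optional `max_polls` so that a script does not keep asking forever about a job that never completes.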
Final remarks¶
All right – someone might ask – but what kind of problem did we exactly solve? We cheated a bit in processing.py, where we set the processing time to be at most 10 seconds, while at the beginning we were testing an unbearable five-minute wait which eventually made our web app crash! That’s a fair point. What we actually achieved is not a final application; it’s rather a boilerplate which should be refactored and expanded to meet real-world use cases. The code we have now will make our web app work even if the processing takes longer than 5 minutes but, let’s be frank, that’s a dummy web app which reflects the nature of our only user, Sidney, who only wants to eat as many cookies as possible, and have even more at the same time!
Apart from a few technical details of the implementation, however, we’ve learned that we can dismantle time-consuming processes by distinguishing between different categories (data, calculations and actions). It’s very likely that the most time-consuming part of your web app consists of some actions or calculations that are not strictly tied to the user’s request and may be performed asynchronously (our dummy app performs everything in a synchronous manner). A full implementation on PythonAnywhere would probably have one or more Always-on tasks running in the background to perform those actions and calculations, leaving only the necessary bit waiting for the user’s input.
But that’s the fun I’ll leave to you. And me? I’m going, om nom nom, to eat, nom nom, that brownie now, nom! Will have the baklava later… om nom nom!
Full code¶
File structure¶
12:56 ~/asyncinweb $ tree
.
├── config.py
├── flask_app.py
├── processing.py
└── templates
    └── main.html
config.py¶
flask_app.py¶
templates/main.html¶
processing.py¶
- If you want to follow a tutorial on how to create a simple Flask web app with a database on PythonAnywhere, check this post. ↩︎
- I’ve learned that distinction from Eric Normand’s Grokking Simplicity book. ↩︎
- You can use the >>> Run button while editing the flask_app.py file in our in-browser editor, which will start a Python console and import the database object for you. If you prefer to start a Python REPL differently (there are many possibilities on PythonAnywhere!), don’t forget to import the db object from the flask_app.py file. ↩︎