Async work in Web Apps or – Have Your Cake and Eat It Too


This post is intended for users who are beginning their adventure with web applications. Below you’ll find how to structure a web app that relies on heavy processing of the input data – processing that takes so long that you can’t do it inside a request handler with a five-minute timeout, or at least so long that you don’t want to risk slowing down your website by doing it inside the website’s own code. You want it to happen in the background. You’ll see an example implementing some hints from the “Async work in Web apps” help page, which involves writing a jQuery script that polls a simple API endpoint, which in turn talks to a database updated by an external script (so there will be some SQLAlchemy stuff too).

Intro

What you will learn

The main goal of this entry, however, is to show a solution to an architectural problem in a web application rather than to teach how to write some JavaScript code or how to manage MySQL queries and connections using Python. So, please, bear this in mind as some parts of the code will not be exhaustively explained line by line. For the same reason, the code presented here should be treated as an example and an inspiration, not as a production-ready web application.

Requirements

To implement it fully on PythonAnywhere, you’d probably need the Always-on task feature, which is a paid one, but there could be scenarios where a simple Scheduled task would suffice. You can, however, test this code on a free account (as the development process will use features available for all our users). If you want to follow along with your own web app as you read:

  • create a default Flask app (using the “Add a new web app” wizard on the Web page) [1] in the ~/asyncinweb directory in your PythonAnywhere account
  • create a MySQL database (on the Databases page) called asyncinweb (and make sure you know your MySQL password)
  • the code assumes Python 3.8+ and SQLAlchemy 1.4+ (so you may need to upgrade it and Flask-SQLAlchemy as well).

Problem

Imagine you have a web app which you want to use as a means to present some data in response to a user’s request. The whole point of it is the value your app adds to the input provided by the user. That usually depends on pre-processing of the input data, often involves gathering other data from external (other web services) or internal (database) sources and always requires some kind of processing of the gathered material. The processing is your added value.

Let’s illustrate this scenario with a sample Flask web app. The app will have two endpoints – the main page, which greets the user, and /processing – where the user can put their data and get a result displayed.

Our example is an abstract one. There may be countless implementations of this scenario. How should we represent the processing in our example to make it congruent with a real-world application? We could insert a line with some calculations, but we can easily think of calculations as timeless operations, whereas our processing will most likely involve multiple actions that are time-consuming (performing a database query, connecting to an external service, etc.). In such a case, the most adequate way of representing the processing in our web app will be a time lapse:

from time import sleep

from flask import Flask

app = Flask(__name__)


@app.route("/")            # <- this is our main page
def hello_world():
    return "Hello world!"


@app.route("/processing")  # <- this is where user gets the "answer"
def processing():
    # The following 'sleep' represents time needed to process
    # the user's request. Let's make it 5 minutes and 1 second long.
    sleep(5 * 60 + 1)
    return "Here's your processed data!"

Code Snippet 1: What happens when we hit the /processing endpoint?

Now, what happens when a user hits the /processing endpoint? If you’re patient, you may try it by yourself. (If you are patient – which, with all due respect to your personal virtues, you’re not! At least when it comes to web apps. And that’s OK, because we expect web apps to be fast. That’s the whole point, isn’t it?) Let me suffer on your behalf: if you waited long enough, you’d see the infamous “Something went wrong :-(” message on the screen with an indication of a 50x error, and some lines similar to these in your web app’s server log:

2021-09-15 15:07:17 Wed Sep 15 15:07:16 2021 - *** HARAKIRI ON WORKER 2 (pid: 6, try: 1) ***
2021-09-15 15:07:17 Wed Sep 15 15:07:16 2021 - HARAKIRI !!! worker 2 status !!!
2021-09-15 15:07:17 Wed Sep 15 15:07:16 2021 - HARAKIRI [core 0] 10.0.0.224 - GET /processing since 1631718135
2021-09-15 15:07:17 Wed Sep 15 15:07:16 2021 - HARAKIRI !!! end of worker 2 status !!!
2021-09-15 15:07:17 DAMN ! worker 2 (pid: 6) died, killed by signal 9 :( trying respawn ...
2021-09-15 15:07:17 Respawned uWSGI worker 2 (new pid: 11)
2021-09-15 15:07:17 spawned 2 offload threads for uWSGI worker 2
Code Snippet 2: This happens! (Excerpt from the server log)

BTW – workers are such honorable guys!

In PythonAnywhere, we assumed that a reasonable time to process a request shouldn’t be longer than brewing a good tea, that is 5 minutes (just imagine what is going on in the parallel steampunk universe where our alter egos decided that their difference engines shouldn’t time out before a good pipe smoke, that is, at least 45 minutes!). If the tea is brewed, but the response is not ready, we assume the process has probably hung and needs to be restarted. Also, web apps have a limited number of workers (processes) available. If all processes are hanging, your web app becomes slow or even unresponsive.
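
To see why a couple of slow requests can stall everything else, here is a minimal sketch using only the standard library. It does not reproduce how uWSGI works internally; a two-thread pool merely stands in for a web app with two workers, and the names slow_request and fast_request are made up for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_request():
    # Stand-in for a handler stuck in heavy processing
    time.sleep(0.3)
    return "slow"


def fast_request():
    # Stand-in for a cheap request such as the main page
    return "fast"


def time_fast_request_behind_slow_ones(workers=2):
    """Return how long a fast request waits when all workers are busy."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Occupy every worker with a slow request...
        for _ in range(workers):
            pool.submit(slow_request)
        # ...then send a request that needs no time at all.
        started = time.monotonic()
        pool.submit(fast_request).result()
        return time.monotonic() - started


waited = time_fast_request_behind_slow_ones()
print(f"The fast request waited {waited:.2f} s for a free worker")
```

The fast request does no work itself, yet it can only be answered once one of the slow ones has finished.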

So, you have a web app which should serve some heavy processed data. And you have a problem: you want to have a cake, and eat it too.

Solution

Problem analysis

“You can’t have a cake and eat it.” Obviously. Unless… you realize that we are talking about multiple cakes here! The expressiveness of that figure of speech relies on an ellipsis which tricks our brain into merging two pieces into one. If someone told me “you can’t have a brownie and eat a baklava”, I’d reply “Why not?”. However, with some crucial information omitted (an ellipsis achieved by using generic names instead of specific ones), I’m ready to assume that the prohibition refers to the very cake that I have already eaten.

Let’s have a look at our code again:

@app.route("/processing")
def processing():
    # This is our "cake":
    sleep(5 * 60 + 1)        # we can't process it...
    return "Processed data!" # and serve it at the same time

Code Snippet 3: How many cakes do you see?

Well, you can’t digest (process) the cake and serve it too! And even if you could… (Crap, we need to stop this analogy right now!) Let’s dismantle our sleep metaphor.

We already know that our web app serves some processed data as a response to a user’s request. (BTW – the user revealed that their name is Sidney.) Let’s distinguish that in our example. In production code, the user would probably be recognized by your app as logged in (so there would be some authentication involved). We will keep our code as simple as possible, so I’m going to use the request’s IP and date to have a unique representation of the user:

from datetime import datetime

from flask import Flask, request

...


@app.route("/processing")
def processing():
    # Here's Sidney's request (aka "Cake 1")
    ip = request.headers.get("X-Real-Ip", "")
    now = datetime.utcnow().isoformat()
    user = f"{ip}{now}"

    # Here we do the processing (aka "Cake 2")
    sleep(5 * 60 + 1)

    # Here we're serving the result (aka "Cake 3")
    return f"Hi {user}, here's your answer!"

All right! We’ve got at least three things going on here:

  1. getting some data from Sidney’s request
  2. doing the heavy, time-consuming processing
  3. serving the result.

It’s high time to replace our cake metaphor and use more meaningful descriptors. Our first step consists of getting some input data (and possibly involves some actions and calculations on it). The next step is represented in our example as a wait with a simple calculation, but in reality it would involve acquiring more data and performing additional actions and calculations. The last step is the action of serving the result. We can distinguish between data, calculations and actions conceptually and operationally [2].

Now we can ask ourselves which steps are inherently related to our web app. We can’t remove Sidney’s request, nor can we remove our response. We can, however, remove the processing, and the web app will still be usable as a web app (we need it as a means of getting the input data and a tool to perform the action of serving the result). That should drastically improve our web app’s performance time-wise!

When we pull the time-consuming part out of the web app into an independent script (one that is not executed by the web app directly), another problem occurs: how can the script and the web app communicate? We need some sort of bridge between them, so that the input data is transmitted downwards to the script, and the output data is transmitted upwards to the web app. Since we will be transmitting data, the obvious means to store it is a database.

The solution to our problem will consist of separating the processing from the web app, and adding an intermediary element that allows them to communicate. That will add some complexity (verbosity) to our application, but that’s how we get rid of the fake simplicity of the original “elliptic” code, which leads to failure. And – last but not least – we will eat the cake and have it too!
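
Before touching Flask or MySQL, the idea can be sketched with nothing but the standard library. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the real database, the table layout and the function names (web_app_enqueue, script_process_one, web_app_status) are made up here, and the "processing" is a placeholder.

```python
import sqlite3


def make_db():
    # In-memory SQLite stands in for the MySQL database used in this post
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY, slug TEXT NOT NULL, "
        "state TEXT NOT NULL DEFAULT 'queued', result INTEGER DEFAULT 0)"
    )
    return conn


def web_app_enqueue(conn, slug):
    # Web app side: record the request and return immediately
    with conn:
        conn.execute("INSERT INTO jobs (slug) VALUES (?)", (slug,))


def script_process_one(conn):
    # Script side: pick the oldest queued job, "process" it, store the result
    row = conn.execute(
        "SELECT id, slug FROM jobs WHERE state = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    job_id, slug = row
    result = len(slug)  # placeholder for the heavy processing
    with conn:
        conn.execute(
            "UPDATE jobs SET state = 'completed', result = ? WHERE id = ?",
            (result, job_id),
        )
    return slug


def web_app_status(conn, slug):
    # Web app side again: read back whatever the script has stored
    return conn.execute(
        "SELECT state, result FROM jobs WHERE slug = ?", (slug,)
    ).fetchone()


conn = make_db()
web_app_enqueue(conn, "sidney-1")
print(web_app_status(conn, "sidney-1"))  # still queued, no result yet
script_process_one(conn)
print(web_app_status(conn, "sidney-1"))  # completed, with a result
```

Neither side ever calls the other directly; the database row is the only thing they share.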

Let’s have a look at a sample implementation of that solution.

Implementation

We will no longer need the /processing endpoint in the web app; it should become a script called processing.py instead. Since the script and the web app will both connect to the same database, they should share the same config.

Let’s create the config.py file (if you skipped the requirements section, you may want to check it now):

from getpass import getuser


# Database setup
username = getuser()        # you may put your username instead
password = "c00kieMonst3r"  # use your MySQL password
hostname = f"{username}.mysql.pythonanywhere-services.com"
databasename = f"{username}$asyncinweb"

SQLALCHEMY_DATABASE_URI = (
    f"mysql://{username}:{password}@{hostname}/{databasename}"
)
SQLALCHEMY_ENGINE_OPTIONS = {"pool_recycle": 299}
SQLALCHEMY_TRACK_MODIFICATIONS = False

Code Snippet 4: Our app's shared configuration
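
If you would rather not keep the password in a source file, one possible variation is to read it from the environment and to quote it before building the URI. This is a sketch under assumptions: MYSQL_PASSWORD is a hypothetical variable name, build_database_uri is a helper made up for this example, and "sidney" is just a sample username.

```python
import os
from urllib.parse import quote_plus


def build_database_uri(username, password, database="asyncinweb"):
    hostname = f"{username}.mysql.pythonanywhere-services.com"
    databasename = f"{username}${database}"
    # quote_plus keeps characters such as "@" or "/" in the password
    # from being mistaken for parts of the URI
    return (
        f"mysql://{username}:{quote_plus(password)}"
        f"@{hostname}/{databasename}"
    )


# MYSQL_PASSWORD is a hypothetical name; the fallback is the sample
# password from the snippet above
password = os.environ.get("MYSQL_PASSWORD", "c00kieMonst3r")
SQLALCHEMY_DATABASE_URI = build_database_uri("sidney", password)
```

The quoting only matters when the password contains characters that have a meaning inside a URI; for a purely alphanumeric password the result is the same as in the snippet above.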

We will take care of the database in our flask_app.py. Let’s have a jobs table that will store information about the incoming requests and their processing state (represented by Job instances):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config.from_object('config')
db = SQLAlchemy(app)

...

class Job(db.Model):
    __tablename__ = "jobs"
    id = db.Column(db.Integer, primary_key=True)
    slug = db.Column(db.String(64), nullable=False)
    state = db.Column(db.String(10), nullable=False, default="queued")
    result = db.Column(db.Integer, default=0)

(Don’t forget to run db.create_all() in a Python console before launching the web app! [3])

We will use the slug only as the representation of the user query; state will be “queued” (awaiting), “processing” or “completed”.
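
The three states form a simple one-way sequence. The post itself stores them as plain strings; the sketch below is only an optional way to make the allowed order explicit, and the names JobState, ALLOWED_TRANSITIONS and can_move are made up for this example.

```python
from enum import Enum


class JobState(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"


# Each state may only move forward to the next one
ALLOWED_TRANSITIONS = {
    JobState.QUEUED: {JobState.PROCESSING},
    JobState.PROCESSING: {JobState.COMPLETED},
    JobState.COMPLETED: set(),
}


def can_move(current, new):
    """Return True if a job may go from `current` to `new`."""
    return JobState(new) in ALLOWED_TRANSITIONS[JobState(current)]


print(can_move("queued", "processing"))
print(can_move("completed", "queued"))
```

Note that the longest value, "processing", is exactly 10 characters long, so it just fits the String(10) column defined above.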

Once we have the storage for our data, we can use it. First we need to take care of putting the incoming requests into the queue:

@app.route("/")
def main():
    # Process Sidney's request
    ip = request.headers.get("X-Real-Ip", "")
    now = datetime.utcnow().isoformat()
    job_id = ip + now

    # Store the request info in the database
    data = Job(slug=job_id)
    db.session.add(data)
    db.session.commit()

    return f"Hi, your query is: {job_id}..."
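
The IP address plus a timestamp is good enough for this demo, but it is not guaranteed to be unique (for instance, the IP part is empty whenever the X-Real-Ip header is missing). If that matters to you, one possible alternative, not used in the rest of this post, is a random UUID. The helper name new_job_slug is made up for this sketch.

```python
import uuid


def new_job_slug():
    # uuid4().hex is 32 hexadecimal characters, so it fits
    # the 64-character slug column of the Job model
    return uuid.uuid4().hex


print(new_job_slug())
```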

All right! Whenever someone hits our main page, we will store the request information in the database. At this stage we can move on to the processing part. In our simplistic example we need to do only two things: check for a pending request in the database and perform the processing if needed:

from random import randrange
from time import sleep

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from config import SQLALCHEMY_DATABASE_URI, SQLALCHEMY_ENGINE_OPTIONS
from flask_app import Job

# Database connection setup
engine = create_engine(
    SQLALCHEMY_DATABASE_URI, **SQLALCHEMY_ENGINE_OPTIONS
)
Session = sessionmaker(engine)


def get_pending_job():
    with Session.begin() as session:
        queue = session.query(Job).filter_by(state="queued")
        if job := queue.first():
            job.state = "processing"
            return job.slug


def process_job(slug):
    print(f"Processing job: {slug}...", end=" ", flush=True)

    # The heavy processing happens here:
    # I use a short wait time here to ease development,
    # but you can experiment with time > 5 min
    # and see if the web app will manage it!
    result = randrange(1, 10)
    sleep(result)

    # With processing done, we update the db record:
    with Session.begin() as session:
        session.query(Job).filter_by(slug=slug).update(
            {"result": result, "state": "completed"}
        )

    print(f"done after {result} seconds!")


if __name__ == "__main__":
    while True:
        if slug := get_pending_job():
            process_job(slug)
        else:
            # We don't need to continuously hammer the database
            # if there are no requests coming in, so let's
            # give it a break!
            sleep(1)

Code Snippet 5: Let's do the lifting!

Now, if we run this script in a console and start hitting our web app, we should see something like this:

Processing job: 117.50.83.792021-09-24T13:20:46.002222... done after 3 seconds!
Processing job: 117.50.83.792021-09-24T13:20:46.773984... done after 8 seconds!
Processing job: 117.50.83.792021-09-24T13:20:47.358036... done after 5 seconds!

We have the processing part done, now we need to show Sidney the result they’re waiting for. Let’s go back to our web app code.

@app.route("/")
def main():
    # Process Sidney's request
    ...
    # Store the request info in the database
    ...
    return f"Hi, your query is: {job_id}..."

So far we’re only greeting the user whereas we want to keep them updated and finally show the result. Moreover, we want our site to refresh automatically when the state of the request changes. To achieve that, we will need some extra code running in the user’s browser. We keep the state of the request recorded in the database, so the frontend code needs to know about the request id and perform queries about it. However, we don’t want to perform direct database queries from the browser because that would expose our database. In order to keep the database hidden we will need one more step – creating a simple API endpoint which will be hit by the JavaScript code in the browser. Let’s do that first by adding an extra view in the flask_app.py:

from flask import jsonify  # add this to flask imports

...


@app.route("/query", methods=["POST"])
def query():
    # The id of the queried request comes in with a new request
    # sent from the frontend JS code
    job_id = request.form["id"]
    # Now we can ask database about the state of that request
    data = Job.query.filter_by(slug=job_id).first()
    # And return a response containing the state and the result
    return jsonify(
        {
            "state": data.state,
            "result": data.result,
        }
    )
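
One edge case the view above does not cover: .first() returns None when no job matches the given id, and data.state would then raise an AttributeError. A possible guard is sketched below as a plain function so that it runs without Flask or a database; FakeJob and status_payload are names made up for this example, and the "unknown" state is an assumption, not part of the app.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeJob:
    # Stand-in for the Job model, so the sketch runs without a database
    state: str
    result: int


def status_payload(job: Optional[FakeJob]) -> Tuple[dict, int]:
    """Build the JSON body and HTTP status code for the /query response."""
    if job is None:
        # No row matched the requested slug
        return {"state": "unknown", "result": 0}, 404
    return {"state": job.state, "result": job.result}, 200


print(status_payload(FakeJob("processing", 0)))
print(status_payload(None))
```

In the Flask view, the pair could be returned as jsonify(body), status.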

Excellent. It looks like we’ve got our backend code almost ready! Before we implement the frontend part, where the magic will happen, let’s sum up the data flow in our web app:

  1. Sidney visits our site and provides some data.
  2. We store the data in the database.
  3. Our background script constantly monitors the database and picks up new data to be processed, then updates its state in the database again.
  4. Meanwhile the information about Sidney’s request is sent to their browser as well, where the code will perform queries to our API about the state of the request.
  5. When the frontend code sends a query about the request that it received, some response is returned by our backend code (API) and the information about the query is updated in Sidney’s browser.
  6. Once the data is processed and Sidney’s got their result, the code in the browser stops the polling.
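
The polling part of this flow (steps 4 to 6) will be written in JavaScript, but its logic can be sketched in Python first. This is an illustration only: poll_until_done is a made-up name, the fake API simply replays three canned responses, and the max_attempts limit is an addition that the browser code in this post does not have.

```python
import time


def poll_until_done(fetch_status, interval=0.01, max_attempts=50):
    """Call fetch_status() until it reports a non-zero result."""
    for attempt in range(1, max_attempts + 1):
        data = fetch_status()
        if data["result"] != 0:
            return data, attempt
        # Not ready yet: wait a bit before asking again
        time.sleep(interval)
    raise TimeoutError("the job did not finish in time")


# A fake API that needs three calls before the job is completed
responses = iter(
    [
        {"state": "queued", "result": 0},
        {"state": "processing", "result": 0},
        {"state": "completed", "result": 7},
    ]
)
data, attempts = poll_until_done(lambda: next(responses))
print(data, attempts)
```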

In order for that to work, we need to create the code in Sidney’s browser which will store information about their original request. We will use a template and pass the information about the original request in its context. Let’s add the missing part to the main view:

from flask import render_template  # update the flask imports

...

@app.route("/")
def main():
    # Process Sidney's request
    ip = request.headers.get("X-Real-Ip", "")
    now = datetime.utcnow().isoformat()
    job_id = f"{ip}{now}"

    # Store the request info in the database
    ...

    # Serve a self-updating template which knows about the original request
    return render_template(
        "main.html",   # template's name
        job_id=job_id  # template's context
    )

The remaining part is the template itself. We will keep it under the templates/ directory. Let’s make a super simple body which will display the information about the request’s state and result:

<body>
  <div class="content">
    <h1>Hello to Async Work in Web Apps Dummy Site</h1>
    <p>Status of your request: <span id="status"></span></p>
    <p>Your result: <span id="result">?</span></p>
  </div>
</body>

The span tags with ids will be updated by the JavaScript which we put in the head tag. Out of respect for the elders and tradition we’ll use good old jQuery here (since that is what our help page suggests). See how $('#status') and $('#result') match the ids of the span tags in the body. Also, you may notice that the doPoll function re-schedules itself with setTimeout until the result arrives, so it behaves like a recursive one:

function doPoll(){
  $.post(
    '/query',         // <- We're hitting our /query API endpoint
    { id: job_id },   // <- with this payload (passing the job_id,
                      // which will come from the template's context)
    function(data) {  // <- when the response comes, we'll process
                      // its data with this function
      $( '#status' ).text( data.state );
      if (data.result == 0) {
        // We're setting the timeout for 5 seconds
        // so the page will update itself
        // every 5 seconds until we get a non-zero result
        setTimeout(doPoll, 5000);
      } else {
        // When we finally get the result we update it on the page
        // and stop the polling.
        $( '#result' ).text( data.result );
      }
    }
  );
}
// Here we actually instruct the browser to call the function for the effect:
$(document).ready(function(){
  doPoll();
});

Since we’re using this code inside a template and we know about the job_id from its context, in the final code we’ll have to adjust the payload we send to the API to use the data from the template’s context (that’s the {{ job_id|tojson }} part – remember the job_id parameter from the render_template function in the main view?).
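
What the tojson filter contributes can be approximated with json.dumps from the standard library. This is only an approximation: the real filter also escapes a few HTML-sensitive characters, which json.dumps does not do. The sample job id is taken from the console output shown earlier.

```python
import json

job_id = "117.50.83.792021-09-24T13:20:46.002222"

# Without serialization the value would land in the script as a bare token
broken = "{ id: %s }" % job_id
# Serialized to JSON it becomes a quoted JavaScript string literal
working = "{ id: %s }" % json.dumps(job_id)

print(broken)
print(working)
```

Only the second form is valid JavaScript, because the id ends up wrapped in quotes.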

Let’s put everything together:

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="UTF-8">
  <title>Async Work In Web Apps Dummy Site</title>

  <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
  <script>
    function doPoll(){
      $.post('/query', { id: {{ job_id|tojson }} }, function(data) {
        $( '#status' ).text( data.state );
        if (data.result == 0) {
          setTimeout(doPoll, 5000);
        } else {
          $( '#result' ).text( data.result );
        }
      });
    }
    $(document).ready(function(){
      doPoll();
    });
  </script>
</head>

<body>
  <div class="content">
    <h1>Hello to Async Work in Web Apps Dummy Site</h1>
    <p>Status of your request: <span id="status"></span></p>
    <p>Your result: <span id="result">?</span></p>
  </div>
</body>

</html>

Code Snippet 6: The final template with auto-updating code.

Et voilà! We have all of the pieces of our web app ready. You may now reload the app on the Web page, start the processing.py script in a console and open multiple browser tabs with our site. You should see how consecutive tabs update while the script is processing the data.

Final remarks

All right – someone might ask – but what kind of problem did we exactly solve? We cheated a bit in processing.py, where we set the processing time to be under 10 seconds, while at the beginning we were testing an unbearable wait of over 5 minutes, which eventually made our web app crash! That’s a fair point. What we actually achieved is not a final application; it’s rather a boilerplate which should be refactored and expanded to meet real-world use cases. The code we have now will make our web app work even if the processing takes longer than 5 minutes but, let’s be frank, that’s a dummy web app which reflects the nature of our only user, Sidney, who only wants to eat as many cookies as possible, and have even more at the same time!

Apart from a few technical details of the implementation, we’ve learned, however, that we can break down time-consuming processes by distinguishing between different categories (data, calculations and actions). It’s very likely that the most time-consuming part of your web app is related to some actions or calculations that are not strictly related to the user’s request, and may be performed asynchronously (our dummy app performs everything in a synchronous manner). A full implementation of this solution on PythonAnywhere would probably have one or more Always-on tasks running in the background to perform those actions and calculations, leaving only the necessary bit waiting for the user’s input.
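
One thing to watch out for when several background tasks share the queue: two of them may select the same queued row before either has marked it as processing. A common way to hedge against that is to make the claim conditional and check how many rows were actually updated. The sketch below shows the idea with SQLite standing in for MySQL; claim_next_job and the simplified table are made up for this example and are not part of the code in this post.

```python
import sqlite3


def claim_next_job(conn):
    """Atomically move one queued job to 'processing' and return its slug."""
    while True:
        row = conn.execute(
            "SELECT id, slug FROM jobs WHERE state = 'queued' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        job_id, slug = row
        with conn:
            # The extra state check makes the update a no-op
            # if another worker claimed the job in the meantime
            cursor = conn.execute(
                "UPDATE jobs SET state = 'processing' "
                "WHERE id = ? AND state = 'queued'",
                (job_id,),
            )
        if cursor.rowcount == 1:
            return slug
        # Someone else was faster; look for the next queued job


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, slug TEXT, "
    "state TEXT DEFAULT 'queued')"
)
conn.executemany("INSERT INTO jobs (slug) VALUES (?)", [("a",), ("b",)])
conn.commit()
print(claim_next_job(conn), claim_next_job(conn), claim_next_job(conn))
```

Each job is handed out at most once, and the function returns None when the queue is empty.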

But that’s the fun I’ll leave to you. And me? I’m going, om nom nom, to eat, nom nom, that brownie now, nom! Will have the baklava later… om nom nom!

Full code

File structure

12:56 ~/asyncinweb $ tree
.
├── config.py
├── flask_app.py
├── processing.py
└── templates
    └── main.html

config.py

from getpass import getuser


# Database setup
username = getuser()        # you may put your username instead
password = "c00kieMonst3r"  # use your MySQL password
hostname = f"{username}.mysql.pythonanywhere-services.com"
databasename = f"{username}$asyncinweb"

SQLALCHEMY_DATABASE_URI = (
    f"mysql://{username}:{password}@{hostname}/{databasename}"
)
SQLALCHEMY_ENGINE_OPTIONS = {"pool_recycle": 299}
SQLALCHEMY_TRACK_MODIFICATIONS = False

flask_app.py

from datetime import datetime

from flask import Flask, request, render_template, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config.from_object('config')
db = SQLAlchemy(app)


@app.route("/")
def main():
    ip = request.headers.get("X-Real-Ip", "")
    now = datetime.utcnow().isoformat()
    job_id = f"{ip}{now}"

    data = Job(slug=job_id)
    db.session.add(data)
    db.session.commit()

    return render_template("main.html", job_id=job_id)


@app.route("/query", methods=["POST"])
def query():
    job_id = request.form["id"]
    data = Job.query.filter_by(slug=job_id).first()
    return jsonify(
        {
            "state": data.state,
            "result": data.result,
        }
    )


class Job(db.Model):
    __tablename__ = "jobs"
    id = db.Column(db.Integer, primary_key=True)
    slug = db.Column(db.String(64), nullable=False)
    state = db.Column(db.String(10), nullable=False, default="queued")
    result = db.Column(db.Integer, default=0)

templates/main.html

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="UTF-8">
  <title>Async Work In Web Apps Dummy Site</title>

  <script src="https://code.jquery.com/jquery-3.6.0.min.js"></script>
  <script>
    function doPoll(){
      $.post('/query', { id: {{ job_id|tojson }} }, function(data) {
        $( '#status' ).text( data.state );
        if (data.result == 0) {
          setTimeout(doPoll, 5000);
        } else {
          $( '#result' ).text( data.result );
        }
      });
    }
    $(document).ready(function(){
      doPoll();
    });
  </script>
</head>

<body>
  <div class="content">
    <h1>Hello to Async Work in Web Apps Dummy Site</h1>
    <p>Status of your request: <span id="status"></span></p>
    <p>Your result: <span id="result">?</span></p>
  </div>
</body>

</html>

processing.py

from random import randrange
from time import sleep

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from config import SQLALCHEMY_DATABASE_URI, SQLALCHEMY_ENGINE_OPTIONS
from flask_app import Job


engine = create_engine(
    SQLALCHEMY_DATABASE_URI, **SQLALCHEMY_ENGINE_OPTIONS
)
Session = sessionmaker(engine)


def get_pending_job():
    with Session.begin() as session:
        queue = session.query(Job).filter_by(state="queued")
        if job := queue.first():
            job.state = "processing"
            return job.slug


def process_job(slug):
    print(f"Processing job: {slug}...", end=" ", flush=True)

    result = randrange(1, 10)
    sleep(result)

    with Session.begin() as session:
        session.query(Job).filter_by(slug=slug).update(
            {"result": result, "state": "completed"}
        )

    print(f"done after {result} seconds!")


if __name__ == "__main__":
    while True:
        if slug := get_pending_job():
            process_job(slug)
        else:
            sleep(1)

  1. If you want to follow a tutorial on how to create a simple Flask web app with a database on PythonAnywhere, check this post.

  2. I’ve learned that distinction from Eric Normand’s Grokking Simplicity book.

  3. You can use the >>> Run button while editing the flask_app.py file in our in-browser editor, which will start a Python console and import the database object for you. If you prefer to start a Python REPL differently (there are many possibilities on PythonAnywhere!), don’t forget to import the db object from the flask_app.py file.
