We’re experimenting with different ways of balancing resource usage between our users. One problem we’ve been experiencing is that individual users could quite easily suck up almost all of the CPU power on a console server.
cgroups to the rescue!
We’ve set up some cgroups at:
/cgroup/cpuacct/users/$username
and
/cgroup/cpu/user_types/anon (150)
/cgroup/cpu/user_types/free (300)
/cgroup/cpu/user_types/paying (700)
/cgroup/cpu/user_types/tarpit (2)
The cpuacct cgroup lets us keep track of how much CPU time each user is using. Then we use the cpu cgroup by setting a value called cpu.shares (the figure in brackets above), which defines the relative proportion of the processor each group is allowed at busy times.
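To get a feel for what those figures mean, here is a minimal sketch of the arithmetic, assuming a simplified model where each group's slice is its cpu.shares value divided by the total for all groups that currently have runnable work. The function name is made up for illustration, and the model ignores multi-core effects and how users split the CPU within a group.

```python
# cpu.shares values copied from the list of cgroups above.
CPU_SHARES = {"anon": 150, "free": 300, "paying": 700, "tarpit": 2}


def cpu_fractions(busy_groups, shares=CPU_SHARES):
    """Approximate fraction of CPU each busy group gets under contention.

    Simplified model (an assumption): only groups with runnable work
    compete, and each gets its shares divided by the sum of the
    competitors' shares.
    """
    total = sum(shares[group] for group in busy_groups)
    return {group: shares[group] / total for group in busy_groups}


if __name__ == "__main__":
    # Print the split for a few combinations of busy groups.
    for busy in (["tarpit"], ["paying", "tarpit"], ["anon", "free", "paying", "tarpit"]):
        print(busy, cpu_fractions(busy))
```

Under this model a group that is alone on the server gets everything regardless of its shares value; the numbers only matter once two or more groups want the CPU at the same time.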
We then set a daily allowance of CPU time for each user type: currently 100 seconds for free users, 5000 seconds for Hacker users, and 20,000 seconds for Web-Developer users. When a user goes over their daily limit, we move them into the tarpit. We then reset the counters once a day (at different times for each user).
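The check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code we actually run: it assumes the cgroup v1 file layout, where cpuacct.usage holds cumulative CPU time in nanoseconds and writing a PID to a group's tasks file moves that task into the group. All function names are hypothetical.

```python
import os


def cpu_seconds_used(cpuacct_dir):
    """Read cumulative CPU time for a cpuacct cgroup, converted to seconds."""
    with open(os.path.join(cpuacct_dir, "cpuacct.usage")) as f:
        return int(f.read().strip()) / 1e9  # the file reports nanoseconds


def read_pids(cgroup_dir):
    """List the task IDs currently attached to a cgroup."""
    with open(os.path.join(cgroup_dir, "tasks")) as f:
        return [int(line) for line in f if line.strip()]


def move_tasks(pids, target_dir):
    """Attach each task to the target cgroup, one PID per write."""
    for pid in pids:
        with open(os.path.join(target_dir, "tasks"), "a") as f:
            f.write(f"{pid}\n")


def enforce_allowance(user_cpuacct_dir, tarpit_dir, allowance_seconds):
    """Move a user's tasks into the tarpit if they are over their allowance.

    Returns True if the user was tarpitted, False otherwise.
    """
    if cpu_seconds_used(user_cpuacct_dir) > allowance_seconds:
        move_tasks(read_pids(user_cpuacct_dir), tarpit_dir)
        return True
    return False
```

For a free user this might be called as enforce_allowance("/cgroup/cpuacct/users/" + username, "/cgroup/cpu/user_types/tarpit", 100). The sketch leaves out the daily reset and does not handle tasks that exit between being listed and being moved.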
The tarpit's tiny cpu.shares value only comes into play when the server gets busy. So if no one else is using the server, you still get 100% of the CPU. But once there is contention, your jobs will start to run much more slowly.
It’s not a perfect solution, and sometimes we find ourselves needing to straight-out kill user processes when they’re causing trouble for others (that only happens rarely though). But at least now, users will have some kind of visible warning that they’re exceeding their resource allocation.
There’s more info at www.pythonanywhere.com/tarpit/
We’re keen on feedback! What do you think of this system? How would you improve it? What is it like, from the user’s point of view?