Carl Sagan famously said, "If you wish to make an apple pie from scratch, you must first invent the universe". Like an apple pie, many things that seem simple at first are extremely complex; it's just that we choose to ignore the complexity for practical purposes. In this post, I'll try to go as deep into the abstractions beneath the Frappe Scheduler as I can, but I don't promise a recipe for inventing the universe.
Frappe's scheduler is one of the features that truly make it "batteries-included". It lets you specify recurring jobs with a frequency or even cron expressions. It lets you update this configuration in real time without restarting the scheduler process. It even lets you schedule custom code written from the UI, a.k.a. Server Scripts.
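For a sense of what that configuration looks like, recurring jobs are typically declared in an app's hooks.py under scheduler_events. Here's a rough sketch; the dotted module paths are made up for illustration:

```python
# hooks.py (illustrative; the "my_app.tasks.*" paths are hypothetical)
scheduler_events = {
    "daily": [
        "my_app.tasks.send_birthday_reminders",
    ],
    "cron": {
        # standard cron expression: every 15 minutes
        "*/15 * * * *": [
            "my_app.tasks.sync_inventory",
        ],
    },
}
```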
So how does all this work behind the scenes? It's surprisingly simple on the surface and increasingly complex the deeper you go. Here's the rough Python code for scheduler.py with the important bits.
def start_scheduler() -> NoReturn:
    scheduler_tick = get_scheduler_tick()

    while True:
        time.sleep(scheduler_tick)
        for site in get_sites():
            for job in get_all_jobs(site):
                if job.is_due():
                    job.enqueue()
The scheduler is just an infinite loop that keeps checking if any job on any site is due, and then enqueues it. As I said, it is surprisingly simple on the surface, except for that scheduler tick business, which seems interesting.
Scheduler Tick
time.sleep(scheduler_tick)
What is that scheduler tick? Well, that tick-long sleep is what prevents this infinite loop from hammering the CPU constantly. Most Frappe jobs do not have granularity lower than a minute, so checking for pending jobs constantly is busy looping with little to no benefit. By default, Frappe uses a scheduler tick of 60 seconds, i.e. it sleeps for 60 seconds after every loop iteration.
Scheduling Delays
If we sleep for 60 seconds between loop iterations, it implies that your job can be delayed by 30 seconds on average. This small error is usually acceptable compared to the CPU we'd waste by busy looping.
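A quick sanity check of that 30-second figure: if a job becomes due at a uniformly random moment inside a 60-second tick, the scheduler only notices it at the end of the tick, so the delay is whatever is left of the tick, which averages out to half of it. A tiny simulation (not Frappe code, just arithmetic):

```python
import random

tick = 60  # seconds
# Delay = time remaining in the tick when the job becomes due.
delays = [tick - random.uniform(0, tick) for _ in range(100_000)]
print(sum(delays) / len(delays))  # prints roughly 30.0
```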
Except, it's not always 30 seconds. On a multi-tenant system we don't want to enqueue all long-running jobs at 12:00 AM. That would add significant sustained load on the server and might degrade overall performance, so we also add some scheduling jitter to delay long-running jobs randomly by up to 10 minutes. Is this ideal? No. But hardly anyone cares if their birthday reminders are sent at 12:00 AM or 12:10 AM.
Here's the code for adding jitter. It is, again, surprisingly simple.
from datetime import timedelta
from random import randint

next_execution = get_next_execution()  # based on cron expression
next_execution += timedelta(seconds=randint(1, 600))  # up to 10 minutes of jitter
None of this has been hard to understand so far, so where is the complexity I promised?
time.sleep()
How do you think time.sleep() works? A conceptual model of time.sleep in Python might look something like this:
def sleep(seconds):
    now = datetime.now()
    wait_until = now + timedelta(seconds=seconds)
    while wait_until > datetime.now():
        continue
    return
If you implement this, it will totally work as expected, except we've just introduced busy looping again. Well, at least we are not calling the database in a busy loop, so it's an improvement, right? Surely it can't be this dumb? Let's check the CPU usage of the scheduler when it's idle.
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
110408 ankush 39 19 114M 75184 22272 S 0.0 0.5 0:22.84 frappe-scheduler: idle
It's 0%, and the total CPU time is just 22 seconds even though the process has been running for an hour, so it's definitely NOT busy-looping. That means Python's implementation of sleep is more efficient than the mental model we constructed.
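You can observe the same thing without htop. Here's a small standalone script (not part of Frappe) that compares the CPU time consumed by time.sleep against a hand-rolled busy wait:

```python
import time

def busy_sleep(seconds):
    # Our naive mental model: spin until the deadline passes.
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass

for label, fn in [("time.sleep", time.sleep), ("busy_sleep", busy_sleep)]:
    cpu_before = time.process_time()  # CPU time, not wall-clock time
    fn(2)
    print(f"{label}: ~{time.process_time() - cpu_before:.2f}s of CPU for 2s of waiting")
```

The busy wait should burn roughly 2 seconds of CPU, while time.sleep should burn close to none.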
Finding what time.sleep actually does
The best tool for understanding abstractions is "Go To Definition" in your code editor, so I tried it on time.sleep. It gives you def sleep(seconds: float, /) -> None: ... which is just a type stub instead of an implementation, implying it's actually implemented in C.
Now, we could go read the cpython source and figure out what time.sleep is doing, but sleeping efficiently seems like the operating system's job. An easier method is to just trace system calls and take it from there. Linux has an excellent tool called strace to trace all the system calls a process makes. So let's write a trivial sleep program and trace its system calls.
# sleep.py
import time
time.sleep(5)
$ strace python sleep.py
This gives a lot of output, but it's easy to spot the interesting system call because we see a 5-second pause around it.
read(3, "import time\n\ntime.sleep(5)\n", 4096) = 27
lseek(3, 0, SEEK_SET) = 0
read(3, "import time\n\ntime.sleep(5)\n", 4096) = 27
read(3, "", 4096) = 0
close(3) = 0
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {tv_sec=30266, tv_nsec=649307312}, NULL) = 0
# ****** No output for 5 seconds ******
rt_sigaction(SIGINT, {...}, {...}, 8) = 0
Aha, clock_nanosleep seems like the most likely candidate, and sure enough, if you check the man pages it is exactly what we were looking for.
CLOCK_NANOSLEEP(2) Linux Programmer's Manual CLOCK_NANOSLEEP(2)
NAME
clock_nanosleep - high-resolution sleep with specifiable clock
DESCRIPTION
clock_nanosleep() allows the calling thread to sleep for
an interval specified with nanosecond precision.
...
clock_nanosleep() suspends the execution of the calling thread until either
at least the time specified by request has elapsed, or a signal is delivered
that causes a signal handler to be called or that terminates the process.
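If you want to poke at clock_nanosleep from Python without going through time.sleep, you can call it via ctypes. This is a rough sketch that assumes glibc on 64-bit Linux (constants and struct layout may differ elsewhere):

```python
import ctypes

# struct timespec: on 64-bit Linux both fields are C longs.
class timespec(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]

libc = ctypes.CDLL("libc.so.6", use_errno=True)

CLOCK_MONOTONIC = 1  # clock id, as defined in the kernel headers
request = timespec(tv_sec=5, tv_nsec=0)

# flags=0 requests a plain *relative* 5-second sleep; note that CPython's
# own call used TIMER_ABSTIME instead (we'll get to what that means below).
libc.clock_nanosleep(CLOCK_MONOTONIC, 0, ctypes.byref(request), None)
```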
It is schedulers all the way down
A system call is already so far down the layers of abstraction that one might be forgiven for not knowing how it actually works, but where's the fun in that? So how does Linux put your thread to sleep efficiently and then wake it up when the time comes? Let's look at the system call again, this time with more focus on the arguments.
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {tv_sec=30266, tv_nsec=649307312}, NULL) = 0
We put the thread to sleep for 5 seconds, yet we can't see 5 anywhere here. What are tv_sec and tv_nsec? They seem like some translation of our command to sleep for 5 seconds. Let's run it multiple times and see if anything changes.
λ strace -e trace=clock_nanosleep python sleep.py
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {tv_sec=31402, tv_nsec=617991081}, NULL) = 0
λ strace -e trace=clock_nanosleep python sleep.py
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {tv_sec=31411, tv_nsec=9811133}, NULL) = 0
λ strace -e trace=clock_nanosleep python sleep.py
clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, {tv_sec=31416, tv_nsec=752812932}, NULL) = 0
tv_nsec seems random, but tv_sec seems to increase by roughly 5-ish seconds plus the random time I wasted between runs of the script. So maybe that is Linux epoch time? It seems similar to our mental model of sleep looping until datetime.now() + timedelta(seconds=seconds).
If you dig deeper, it turns out it is not epoch time. Linux has a continuously incrementing internal clock called "monotonic time", which is different from real-world time. This avoids all the complexities that come with real-world time, such as timezones and time synchronization with NTP.
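Python exposes this clock directly, so you can see it for yourself. The value below is seconds since an arbitrary fixed starting point (usually boot), and if you compare it with the tv_sec from strace, the deadline passed to clock_nanosleep is roughly this value plus the 5 seconds we asked to sleep:

```python
import time

# Monotonic time: always moves forward, immune to NTP adjustments,
# timezones and manual clock changes.
print(time.clock_gettime(time.CLOCK_MONOTONIC))
print(time.monotonic())  # portable wrapper around the same idea
```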
So here's roughly what is happening when we put our thread to sleep:
1. Create a Timer for a future time that is `current_monotonic_time + duration`.
2. De-schedule the thread so it doesn't consume CPU time.
3. When the Timer expires, the thread is put back into the run queue.
This still sounds like a bunch of abstractions to me, not so different from our mental model of time.sleep busy looping. Surely checking timer expiry is just yet another busy loop?
Timers, Interrupts and the Linux Process Scheduler
It turns out, Linux IS doing a busy loop, although in a much more efficient manner. Linux maintains a tick counter called jiffies, which is incremented 250 times a second by an onboard hardware clock using interrupts. On every increment it checks if there are pending expired timers and executes them. That feels very similar to how the Frappe scheduler works, at a very high level.
If I were to write it as pseudocode, here's a simplified implementation of this whole process:
HZ: Constant = 250
jiffies: Int = 0
timers: List[Timer] = []

# This is called by a hardware clock "HZ" times in a second.
def increment_jiffies():
    global jiffies
    jiffies += 1
    run_expired_timers()

def run_expired_timers():
    # Iterate over a copy, since we remove timers while looping.
    for timer in list(timers):
        if jiffies >= timer.expiry:
            timer.callback()
            timers.remove(timer)

def clock_nanosleep(seconds):
    thread = current()
    timer = Timer(expiry=jiffies + seconds * HZ, callback=lambda: thread.schedule())
    timers.append(timer)
    thread.deschedule()
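If you want to convince yourself the timer bookkeeping above behaves as described, here's a tiny, self-contained toy version (pure Python, no kernel involved) that drives the "hardware clock" by hand instead of relying on real interrupts:

```python
HZ = 250
jiffies = 0
timers = []  # (expiry_in_jiffies, callback) pairs

def increment_jiffies():
    global jiffies
    jiffies += 1
    for timer in list(timers):
        expiry, callback = timer
        if jiffies >= expiry:
            callback()
            timers.remove(timer)

# Register a "wake up in 2 seconds" timer, then simulate 3 seconds of ticks.
timers.append((jiffies + 2 * HZ, lambda: print(f"timer fired at jiffy {jiffies}")))
for _ in range(3 * HZ):
    increment_jiffies()  # prints: timer fired at jiffy 500
```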
So what now
This was a thinly veiled "why you should take an Operating Systems class" post. Often neglected, it is easily one of the most fundamental and interesting subjects that every programmer should at least have some idea about. You will learn about scheduling, isolation, virtualisation, performance, concurrency and, most importantly, the "magic" that makes it all work.
While it's not the same as understanding the process of inventing a universe, it comes pretty close for software engineers.
I recommend these books to start with: Operating Systems: Three Easy Pieces (free) for theory and Linux Kernel Development for practical implementations.