COMS W4118 Operating Systems I

Run Queues and Wait Queues

Process States

Task states about runnability: include/linux/sched.h

/* Used in tsk->state: */
#define TASK_RUNNING            0x0000
#define TASK_INTERRUPTIBLE      0x0001
#define TASK_UNINTERRUPTIBLE    0x0002
...

TASK_RUNNING: the task is runnable – either currently running or on a run queue waiting to run

TASK_INTERRUPTIBLE: the task is sleeping waiting for some condition to exist – can be awakened prematurely if it receives a signal

TASK_UNINTERRUPTIBLE: the task is sleeping waiting for some condition to exist – cannot be awakened prematurely if it receives a signal

OSTEP 4.4 summary:

Run Queue

runqueue

task_structs are linked together in hierarchy via children/sibling list_heads.

Per-CPU run_queue links tasks with state TASK_RUNNING (i.e. running and runnable tasks).

Need a separate list_head embedded into task_struct for run queue:

Wait Queue

waitqueue

Per-event wait_queue links tasks with state TASK_INTERRUPTIBLE/TASK_UNINTERRUPTIBLE (i.e. blocked tasks) that are waiting for the same condition to exist

Compare with run_queue: wait queue entry is NOT embedded into task_struct.

Wait queue kernel data structures (psuedo-code):

struct wait_queue_head {
        spin_lock_t lock;
        struct list_head task_list;
};

struct waitqueue {
        struct task_struct *task;
        wait_queue_func_t func; // callback function, e.g. try_to_wake_up()
        struct list_head entry;
};

wait_event_interruptible() macro, defined in wait.h (kernel 3.12.74, chosen for simplicity):

#define __wait_event_interruptible(wq, condition, ret)              \
do {                                                                \
    DEFINE_WAIT(__wait);                                            \
                                                                    \
    for (;;) {                                                      \
            prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE);      \
            if (condition)                                          \
                    break;                                          \
            if (!signal_pending(current)) {                         \
                    schedule();                                     \
                    continue;                                       \
            }                                                       \
            ret = -ERESTARTSYS;                                     \
            break;                                                  \
    }                                                               \
    finish_wait(&wq, &__wait);                                      \
} while (0)

/**
* wait_event_interruptible - sleep until a condition gets true
* @wq: the waitqueue to wait on
* @condition: a C expression for the event to wait for
*
* The process is put to sleep (TASK_INTERRUPTIBLE) until the
* @condition evaluates to true or a signal is received.
* The @condition is checked each time the waitqueue @wq is woken up.
*
* wake_up() has to be called after changing any variable that could
* change the result of the wait condition.
*
* The function will return -ERESTARTSYS if it was interrupted by a
* signal and 0 if @condition evaluated to true.
*/
#define wait_event_interruptible(wq, condition)                     \
({                                                                  \
    int __ret = 0;                                                  \
    if (!(condition))                                               \
            __wait_event_interruptible(wq, condition, __ret);       \
    __ret;                                                          \
})

Wait event loop:

  1. prepare_to_wait(): add yourself to wait queue, change state to TASK_INTERRUPTIBLE
  2. signal_pending(): check for “spurious wakeup”, i.e. signal interrupted sleep before condition was met
    • break out of loop instead of sleeping
  3. schedule(): put yourself to sleep
  4. finish_wait(): change state to TASK_RUNNING, remove yourself from the wait queue

Perform 1-4 in a loop since there can be multiple waiters on the same condition (e.g. recall condition variable from L06-thread), spurious wakeup

Additional notes:

Scheduling Basics

Core scheduling logic defined in kernel/sched/core.c

schedule():

  1. pick_next_task() choose a new task to run from the run queue
  2. context_switch(): put current task to sleep, start running new task

Wait Queue Scheduling

Sleeping:

  1. wait_event()
  2. Enqueued on wait queue
  3. schedule()
  4. Remove from run queue
  5. pick_next_task()
  6. context_switch()
  7. Other task runs

Waking up

  1. Task signals event: wake_up()
  2. Call try_to_wake_up() on each task
  3. Enqueue each task on run queue
  4. Eventually other task calls schedule() and previously sleeping task gets chosen
  5. Previously sleeping thread checks condition
    • if true, finish_wait() to remove from wait queue and continue execution
    • else, repeat 3-6 from “Sleeping” above

Process State Transitions

proc-state-transitions

When a user process makes read() system call, for example:

  1. Trap into kernel
    • save registers into per-proc kernel stack
  2. Device driver issues an I/O request to the device

  3. Put the calling process to sleep
    • wait_event() -> schedule() -> pick_next_task() -> context_switch()
  4. Another process starts running

  5. The device completes the I/O request and raises a hardware interrupt

  6. Trap into kernel and jump to the interrupt handler
    • wake_up(): enqueue blocked tasks back on run queue
    • Current task eventually calls schedule() -> pick_next_task() -> context_switch()
  7. Another process starts running
    • this process may or may not be the one that called read()

Last updated: 2023-02-19