COMS W4118 Operating Systems I

System Calls

Virtual Address Space Revisited

Recall strace: shows syscalls invoked during program execution.

Heap

malloc() does not appear in the strace output because it is not a syscall. How does a process increase the size of its heap?

[Figure: virtaddr1]

See the following lines in strace output:

brk(NULL)                               = 0x558d52610000
brk(0x558d52631000)                     = 0x558d52631000

brk(): changes the location of the program break, which defines the end of the process’s data segment (i.e., the program break is the first location after the end of the uninitialized data segment). Increasing the program break has the effect of allocating memory to the process; decreasing the break deallocates memory.

A program’s break is the address of the top of its heap. brk(NULL) gets the current program break and brk(addr) sets the break to addr.

File-backed mappings

Program setup involves mapping in the C standard library:

openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
mmap(0x7f6e911ec000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f6e911ec000
close(3)

Recall from L05-ipc: mmap() creates a file-backed mapping in virtual memory.

The mapped region of virtual memory refers to an in-memory “snapshot” of the file’s contents, held in one or more “pages”.

Full Reveal: Virtual Address Space

Every mapped region of virtual memory is backed by pages of physical memory:

[Figure: virtaddr2]

All processes have a direct mapping to shared kernel code & data

The kernel mapping sits above the stack, but it is not accessible to userspace programs because of memory protection. How do syscalls safely access the kernel mapping?

Processor Modes

The CPU runs in a privilege mode that determines which kinds of operations it can perform:

A userspace program (running in user mode) can issue a syscall (e.g. read()) to enter kernel mode so that it can perform a privileged operation (e.g. issue I/O request).

[Figure: priv-mode]

Syscalls act as predefined entry-points into the kernel. They allow userspace programs to “trap” into the kernel to perform a privileged operation and then “return-from-trap” back to user mode.

Interrupts

The processor needs some signal that tells it to stop executing user code and trap into the kernel to perform privileged operations.

Three kinds of interrupts:

Simplified CPU hardware execution loop:

while (1) {
    if (interrupt or exception) {
        n = interrupt/exception type
        call interrupt handler n
    }

    fetch next instruction
    if (instruction == int n)
        call interrupt handler n
    else
        run instruction
}

Note: do not confuse interrupts with userspace signals!

Linux System Call Dispatch

[Figure: dispatch]

User program invokes read():

  1. libc system call wrapper invokes software interrupt 0x80 (system call)
    • Places syscall number __NR_read into %eax register
  2. Trap into kernel mode, look up 0x80 in Interrupt Descriptor Table (IDT)
  3. Jump to interrupt 0x80’s handler: system_call()
  4. system_call() looks up __NR_read in sys_call_table
  5. Unpack registers, call __NR_read’s handler: sys_read()
  6. Perform read()’s real work in sys_read()
    • file entry management, I/O requests, copying data, etc.

Notes:

System Call Parameters

Syscall parameters are passed via registers

Memory validation

We can’t let user programs trick the kernel using malicious addresses:

System calls must validate pointer parameters before copying:

// /include/linux/uaccess.h

static __always_inline unsigned long __must_check
copy_to_user(void __user *to, const void *from, unsigned long n);

static __always_inline unsigned long __must_check
copy_from_user(void *to, const void __user *from, unsigned long n);

Last updated: 2023-02-19