Tabletop

Submission

As with previous assignments, we will be using GitHub to distribute skeleton code and collect submissions. Please refer to our Git Workflow guide for more details. Note that we will be using multiple tags for this assignment: one for each deliverable part.

For students on arm64 computers (e.g. M1/M2 machines): if you want your submission to be built/tested for ARM, you must create and submit a file called .armpls in the top-level directory of your repo; feel free to use the following one-liner:

cd "$(git rev-parse --show-toplevel)" && touch .armpls && git add -f .armpls && git commit .armpls -m "ARM pls"

You should do this first so that this file is present in all parts.

Code Style

There is a script in the skeleton code named run_checkpatch.sh. It is a wrapper around linux/scripts/checkpatch.pl, a Perl script shipped with the Linux kernel that checks whether your code conforms to the kernel coding style.

Run run_checkpatch.sh to check whether your code conforms to the kernel style; it will tell you what changes to make. We recommend making those changes.

Passing run_checkpatch.sh with no warnings and no errors is NOT required for this assignment, but will be for the next one. We recommend you get familiar with this workflow now: run the script and fix what it suggests before pushing a tag.

Part 1: Building a Kernel in Debian Linux

Reading

LKD chapter 2 (previously assigned)

Task

Kernel Compilation in Debian Linux

Follow the above guide and compile yourself a pristine (unmodified) kernel from the 5.10.158 Linux source provided in the skeleton repo. You should name it 5.10.158-cs4118, and keep it around as your fallback kernel for all future assignments (including this one), in case you run into any trouble booting into the kernel you’re working on.

Submission

None

Part 2: Set the Table - Adding a Syscall

You’re about to make changes to the pristine kernel source. This means that what you build might not even boot. In order to ensure that you always have the -cs4118 pristine kernel as a fallback to boot into, you should avoid overwriting it by setting the local version in your .config file to something else, like your UNI (for example, -abc1234). Make sure to verify your changes:

$ scripts/diffconfig .config.old .config
LOCALVERSION "-cs4118" -> "-abc1234"

From now on, you should use your UNI as the local version for all modified kernels you build for this course.

You are now ready to add system calls to your kernel.

Reading

LKD Chapter 5: You must read this to understand how adding a system call works in general. Unfortunately, some of the steps described in LKD have changed since the book was written.
What’s Available to Your Module, What Isn’t: In part 3, you will convert your syscall implementation into a kernel module. In order to minimize the amount of modifications you’ll need to make, you should ensure that your syscall implementation only uses symbols that are also available to a kernel module.
The following documentation from the Linux kernel source tree describes how to add a new system call: /Documentation/process/adding-syscalls.rst.
- Follow the “Generic System Call Implementation” section for arm64.
  - You should not add a CONFIG option or a fallback stub for your new system call. This should be done for real syscalls, but won’t work with our grading infrastructure.
- Follow the “x86 System Call Implementation” section for x86.
- Skip the other sections.

Task

In the next few parts, we will implement a new system call: inspect_table(). It retrieves the file descriptor table of a specified process. This syscall should work on both x86 and arm64 architectures.

At the time of writing, the Linux kernel has about 400 system calls, and new ones are added regularly. To leave some room, we will use 500 as the syscall number for inspect_table().

The system call should have the following interface:

long inspect_table(pid_t pid, struct fd_info *entries, size_t max_entries);

where struct fd_info is defined as follows:

#define TABLETOP_MAX_PATH_LENGTH 256

struct fd_info {
    unsigned int fd;
    unsigned int flags;
    long long pos;
    char path[TABLETOP_MAX_PATH_LENGTH];
};

We will build up its functionality over the next few parts. For this part, inspect_table() should do the following:

Retrieve the task_struct corresponding to the specified pid.
- If pid < -1, the syscall should return -EINVAL.
- If pid == -1, retrieve the task_struct of the calling process.
- If pid >= 0, retrieve the task_struct of the process corresponding to the given pid. If no task corresponds to that pid, the syscall should return -ESRCH.
Ensure that the calling process has permission to inspect the target process.
- If the euid of the calling process is neither root nor the uid of the target process, the syscall should return -EPERM.
On success, the syscall should return 0.
Ignore the entries and max_entries arguments in this part.
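It can help to pin these rules down before touching kernel APIs. The sketch below is a plain userspace model with hypothetical helper names (tabletop_check_pid(), tabletop_may_inspect()); it is not kernel code. In the kernel, the lookup would typically go through find_get_pid() and get_pid_task(..., PIDTYPE_PID) (or current when pid == -1), the uid check through current_euid(), task_uid() and uid_eq() on kuid_t values, and any references taken would be dropped with put_task_struct() and put_pid().

```c
#include <errno.h>
#include <sys/types.h>

/*
 * Hypothetical userspace model of the argument and permission rules.
 * The kernel version compares kuid_t values with uid_eq() instead of ==.
 */

/* Returns 0 if pid is an acceptable argument, -EINVAL otherwise. */
static long tabletop_check_pid(pid_t pid)
{
	return pid < -1 ? -EINVAL : 0;
}

/* Returns 0 if a caller with caller_euid may inspect a target owned by target_uid. */
static long tabletop_may_inspect(uid_t caller_euid, uid_t target_uid)
{
	if (caller_euid == 0 || caller_euid == target_uid)
		return 0;
	return -EPERM;
}
```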

Your syscall will use this task_struct in subsequent parts of the assignment.

In order to make TABLETOP_MAX_PATH_LENGTH and struct fd_info available throughout the kernel, you should do the following:

Place the TABLETOP_MAX_PATH_LENGTH macro and the struct fd_info definitions in linux/include/uapi/linux/tabletop.h. Make sure to add include guards in a style similar to that of other header files in that directory.
Create linux/include/linux/tabletop.h, which just includes linux/include/uapi/linux/tabletop.h, along with correct include guards.
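For reference, here is a minimal sketch of the uAPI header. The guard name _UAPI_LINUX_TABLETOP_H is an assumption modeled on the _UAPI_ prefix that many files in include/uapi/linux/ use; check a neighboring header and match its style. The internal header appears only as a comment, because its #include <uapi/linux/tabletop.h> resolves only inside the kernel build.

```c
/* include/uapi/linux/tabletop.h (sketch) */
#ifndef _UAPI_LINUX_TABLETOP_H
#define _UAPI_LINUX_TABLETOP_H

#define TABLETOP_MAX_PATH_LENGTH 256

struct fd_info {
	unsigned int fd;			/* file descriptor number */
	unsigned int flags;			/* open flags, e.g. O_RDWR | O_APPEND */
	long long pos;				/* current offset into the file */
	char path[TABLETOP_MAX_PATH_LENGTH];	/* absolute path of the file */
};

#endif /* _UAPI_LINUX_TABLETOP_H */

/*
 * include/linux/tabletop.h would then contain only:
 *
 *	#ifndef _LINUX_TABLETOP_H
 *	#define _LINUX_TABLETOP_H
 *	#include <uapi/linux/tabletop.h>
 *	#endif
 */
```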

A few resources that you might find helpful:

Kernel documentation is located in the Documentation subdirectory of the kernel source.
This article shows how to find a task_struct corresponding to a given pid.
- Note that the code from the article works, but has insufficient error-checking. Fix it.
- Optional: the code also does not properly release acquired resources. You may fix the code by:
  1. Replacing pid_task() with get_pid_task()
  2. Figuring out how to decrement the reference counts for struct pid and struct task_struct.
To learn more about user credentials, check out man 7 credentials.
Take a look at code for other related system calls, such as geteuid() and getuid(). The following header files may also be helpful: include/linux/cred.h and include/linux/uidgid.h.
This article explains why you need both a uAPI header and an internal kernel header. Reading it is not strictly required, but it explains the motivation behind this design choice.

Installing kernel headers

The syscall you’ve implemented takes a struct pointer as a parameter, so the struct definition must be available in both kernel space and userspace. You’ll need to install the new uAPI header you’ve created (linux/include/uapi/linux/tabletop.h) from the kernel source tree to userspace.

After compiling and booting into your new kernel, run umask and make sure it outputs 0022. If not, run umask 0022.

Then, run the following:

# make headers_install INSTALL_HDR_PATH=/usr

This command will install the headers found under include/uapi/ in your Linux source tree into /usr/include/. Now you should be able to #include <linux/tabletop.h> from userspace! Try compiling the tester program (see below) to make sure this works.

Testing

We’ve provided a userspace test program under user/test/table-inspector. To test this part’s functionality, run it like so:

./table-inspector <pid> 0

Here is some sample output for this part:

$ ./table-inspector -1 0
inspect_table (0): Success
$ ps aux | grep '/usr/sbin/sshd'
root         490  0.0  0.0  13292  7704 ?        Ss   Feb05   0:00 sshd: /usr/sbin/sshd -D
$ ./table-inspector 490 0
inspect_table (-1): Operation not permitted
$ sudo ./table-inspector 490 0
inspect_table (0): Success
$ ./table-inspector -420 0
inspect_table (-1): Invalid argument
$ ./table-inspector 50000 0 # this pid isn't in use
inspect_table (-1): No such process

You may optionally submit your own test program under user/test/. To learn how to invoke a syscall, read:

man 2 syscall
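As a starting point for your own test, here is a hedged sketch of a userspace wrapper around syscall(2). The wrapper name and the TABLETOP_NR_INSPECT_TABLE macro are hypothetical; 500 is the number chosen above. Because only a pointer is passed, an incomplete struct fd_info declaration is enough to compile it; include <linux/tabletop.h> once the headers are installed and you need to read the entries.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Syscall number chosen for inspect_table() in this assignment. */
#define TABLETOP_NR_INSPECT_TABLE 500

/* Only a pointer is passed, so an incomplete type suffices here. */
struct fd_info;

/*
 * Hypothetical wrapper: syscall() returns the kernel's non-negative
 * result, or -1 with errno set to the (positive) error code.
 */
static long inspect_table(pid_t pid, struct fd_info *entries,
			  size_t max_entries)
{
	return syscall(TABLETOP_NR_INSPECT_TABLE, (long)pid, entries,
		       max_entries);
}
```

On a kernel without the syscall, a call such as inspect_table(-1, NULL, 0) returns -1, and perror() then prints "Function not implemented" (ENOSYS); Part 3 relies on the same behavior when the module is not loaded.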

Submission

Deliverables:

The inspect_table() system call should be implemented in linux/kernel/tabletop.c.
Any other modifications to kernel source code, including updating linux/kernel/Makefile so that linux/kernel/tabletop.o is linked into the kernel, and adding the appropriate header files.
Optional: Your own test program that invokes the syscall from userspace, under user/test/.

To submit this part, push the hw4p2handin tag with the following:

$ git tag -a -m "Completed hw4 part2." hw4p2handin
$ git push origin master
$ git push origin hw4p2handin

Part 3: Moving a system call into a kernel module

By now, you must be tired of rebooting your VM every time you make a small change in system call code. In this part, we will move the code for inspect_table() into a dynamically loadable kernel module. Our goal is to be able to make modifications to the system call code without having to reboot the machine.

Reading

Go over the kernel module reading from part 2 again.

Task

There are two ways to implement a system call using a module. One is to modify the system call table from the module’s initialization code. Changes in recent kernels to the mechanics of setting up system calls have made this approach more cumbersome than it used to be, so we will not take it.

The other is to leave the system call definition in the statically built kernel code, but have it call a function defined in the module. This is the approach we will take.

Move your implementation of the inspect_table() syscall to the provided module skeleton code:

The system call will be activated when you run sudo insmod tabletop.ko. When it is activated, user programs should be able to call the syscall as usual.
The system call will be deactivated when you run sudo rmmod tabletop. When the inspect_table() syscall is not active (before you run insmod or after you run rmmod), it will return -ENOSYS, indicating that the function is not implemented.

Big hint: use function pointers.
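To make the hint concrete, here is a userspace model of the function-pointer pattern. All names are hypothetical, and insmod/rmmod are simulated by assigning and clearing the pointer. In the real kernel, the pointer lives in the built-in code and must be exported (for example with EXPORT_SYMBOL()) so the module can set it. A real version also has to consider a call that races with rmmod, which this model ignores; try_module_get()/module_put() or a lock are common tools for that.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct fd_info;

/* Hypothetical hook type matching inspect_table()'s signature. */
typedef long (*inspect_table_fn)(pid_t pid, struct fd_info *entries,
				 size_t max_entries);

/* "Built-in kernel" side: NULL means no module is loaded. */
static inspect_table_fn inspect_table_hook;

/* What the always-present syscall entry point would do. */
static long inspect_table_entry(pid_t pid, struct fd_info *entries,
				size_t max_entries)
{
	inspect_table_fn fn = inspect_table_hook;	/* read the pointer once */

	if (!fn)
		return -ENOSYS;
	return fn(pid, entries, max_entries);
}

/* "Module" side: the real implementation, installed on load. */
static long module_inspect_table(pid_t pid, struct fd_info *entries,
				 size_t max_entries)
{
	(void)entries;
	(void)max_entries;
	return pid < -1 ? -EINVAL : 0;
}

static void model_insmod(void)
{
	inspect_table_hook = module_inspect_table;
}

static void model_rmmod(void)
{
	inspect_table_hook = NULL;
}
```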

Testing

Make sure you can still run your test program (table-inspector, or your own from Part 2) before you load your module (deactivated), after you load it (activated), and after you unload it (deactivated again). Check that errors are handled gracefully, i.e. that the appropriate errno values are set and checked.

Submission

Deliverables:

The inspect_table() system call should be stubbed out in linux/kernel/tabletop.c, as described above.
The functionality of the inspect_table() system call should be implemented in user/module/tabletop/tabletop.c.
Any other modifications to kernel source code.

To submit this part, push the hw4p3handin tag with the following:

$ git tag -a -m "Completed hw4 part3." hw4p3handin
$ git push origin master
$ git push origin hw4p3handin

Part 4: Appetizer - Listing Open File Descriptors

Now that we have a working syscall implemented in a kernel module, we will begin adding more functionality. Here, we will have the syscall print the target process’s open file descriptors to the kernel log buffer (in addition to the functionality specified in part 2).

Readings

Before getting started, you should understand the data structures related to the file descriptor table. This article implements a simple kernel module that prints the calling task’s open file descriptors with full paths. This code is almost sufficient for this part, but it has a major bug: what will this module do if there is a hole in the file descriptor table? For example, a process may have opened file descriptors 3 and 4 and then closed 3.

Another problem with the code from the above article is that it completely ignores synchronization and resource management while accessing the data structures. This is fine for the sake of this assignment. This article attempts to handle synchronization and resource management properly. It also provides a more in-depth explanation with diagrams of the data structures. We recommend reading this article, or at least studying its diagrams.

Task

In this part, you will add the following functionality to inspect_table() in addition to the functionality implemented in part 2:

Retrieve the file descriptor table of the target process.
- Note that the target task’s struct files_struct pointer may be NULL. In this case, the syscall should return -ESRCH.
Print the following header line: Open fds for <pid>:, where <pid> is replaced with the target process’s pid.
Iterate through the task’s file descriptors starting at 0, using the max_fds field from struct fdtable as an (exclusive) upper bound. Print each open file descriptor number to the kernel log buffer.
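The hole bug is easiest to see in a small model. The sketch below uses a hypothetical struct model_fdtable in place of the kernel’s struct fdtable: a NULL slot means the descriptor is closed, and a NULL table stands in for the NULL files_struct case. The key point is to skip empty slots rather than stop at the first one. A kernel version would walk fdt->fd[] (obtained via files_fdtable()) the same way and print each number, for example with pr_info(), instead of storing it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fdtable: fd[i] is NULL when i is closed. */
struct model_fdtable {
	unsigned int max_fds;
	void **fd;
};

/*
 * Stores the numbers of open descriptors in out (capacity out_cap) and
 * returns how many were found, or -ESRCH when there is no table at all.
 */
static long collect_open_fds(const struct model_fdtable *fdt,
			     unsigned int *out, size_t out_cap)
{
	size_t n = 0;
	unsigned int i;

	if (!fdt)
		return -ESRCH;

	for (i = 0; i < fdt->max_fds && n < out_cap; i++) {
		if (!fdt->fd[i])	/* a hole: skip it, do not stop */
			continue;
		out[n++] = i;
	}
	return (long)n;
}
```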

Testing

Here is some sample output from inserting the module, running table-inspector for two processes, and then removing the module. Your output format must match EXACTLY.

$ sudo dmesg -Hw
[Feb 7 22:37] Loading tabletop
[  +7.318509] Open fds for 36944:
[  +0.000010] 0
[  +0.000008] 1
[  +0.000008] 2
[Feb 7 22:38] Open fds for 36949:
[  +0.000020] 0
[  +0.000009] 1
[  +0.000009] 2
[  +0.000010] 4
[  +5.033228] Removing tabletop

Submission

Deliverables:

The inspect_table() system call should be stubbed out in linux/kernel/tabletop.c, as described above.
The functionality of the inspect_table() system call with this part’s additions should be implemented in user/module/tabletop/tabletop.c.
Optional: Your own test program that invokes the syscall from userspace, under user/test/.

To submit this part, push the hw4p4handin tag with the following:

$ git tag -a -m "Completed hw4 part4." hw4p4handin
$ git push origin master
$ git push origin hw4p4handin

Part 5: Let’s Eat! - Flags, Pos, and Path

Task

In this final part, we complete the syscall implementation by making use of the entries and max_entries parameters. In addition to the functionality described in parts 2 and 4, your syscall should now do the following:

Instead of simply printing file descriptors to the kernel log buffer, you should now collect all the file metadata specified in struct fd_info for each open file descriptor.
- unsigned int fd is the file descriptor number.
- unsigned int flags contains the flags for the file descriptor (O_RDWR, O_APPEND, etc.)
- long long pos is the file descriptor’s current offset into the file.
- char path[TABLETOP_MAX_PATH_LENGTH] is the absolute path of the file that the file descriptor refers to.
Your loop should now terminate when you reach fdt->max_fds or max_entries, whichever is smaller. That is, max_entries is the maximum number of struct fd_info entries that will be copied back to userspace.
Copy struct fd_info for all open file descriptors, up to max_entries, into userspace memory pointed to by the entries argument.
- LKD Chapter 5 tells you what you need to do to copy values between kernel and user space. Be sure to perform proper error-checking.
- If you choose to implement the optional synchronization, keep in mind that you cannot call functions that may sleep, such as kmalloc() with GFP_KERNEL or copy_to_user(), while holding a spin lock.
On success, your syscall should return the number of structs copied to userspace.
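The collection step might be structured as in the following userspace model (hypothetical names; struct model_file stands in for struct file). It reads max_entries as a cap on the number of records returned, stages the records in a temporary buffer sized min(max_entries, max_fds), and copies the batch out once at the end. In the kernel, calloc() would become kcalloc() or similar, memcpy() would become copy_to_user(), whose nonzero return (bytes not copied) is conventionally turned into -EFAULT, and the path would come from d_path(), whose result must be checked with IS_ERR() before use.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TABLETOP_MAX_PATH_LENGTH 256

struct fd_info {
	unsigned int fd;
	unsigned int flags;
	long long pos;
	char path[TABLETOP_MAX_PATH_LENGTH];
};

/* Hypothetical stand-in for an open file: just what fd_info needs. */
struct model_file {
	unsigned int flags;
	long long pos;
	const char *path;
};

/*
 * Collects at most max_entries records for the open slots in
 * files[0..max_fds), skipping holes, then copies them to entries in one
 * step. Returns the number of records copied or -ENOMEM.
 */
static long fill_entries(struct model_file **files, unsigned int max_fds,
			 struct fd_info *entries, size_t max_entries)
{
	size_t cap = max_entries < max_fds ? max_entries : max_fds;
	struct fd_info *buf;
	size_t n = 0;
	unsigned int i;

	if (cap == 0)
		return 0;
	buf = calloc(cap, sizeof(*buf));	/* kcalloc() in the kernel */
	if (!buf)
		return -ENOMEM;

	for (i = 0; i < max_fds && n < cap; i++) {
		const struct model_file *f = files[i];

		if (!f)			/* closed descriptor: a hole */
			continue;
		buf[n].fd = i;
		buf[n].flags = f->flags;
		buf[n].pos = f->pos;
		/* Bounded copy that always NUL-terminates the path. */
		snprintf(buf[n].path, sizeof(buf[n].path), "%s", f->path);
		n++;
	}

	memcpy(entries, buf, n * sizeof(*buf));	/* copy_to_user() in the kernel */
	free(buf);
	return (long)n;
}
```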

Hints

Use the kernel function d_path() to retrieve the file path. Be sure to read its documentation very carefully and perform necessary error-checking. If d_path() fails, the syscall should return the error d_path() returned. Don’t worry about copying to userspace if this case occurs.
Try running cat /proc/<pid>/fdinfo/<fd>, which will show information similar to the output of our syscall. Read more about fdinfo in man 5 proc. Check out seq_show(), defined in fs/proc/fd.c, to see how it retrieves the file descriptor flags, including O_CLOEXEC, which is handled differently than other flags.
You should also run ls -alF /proc/<pid>/fd. The files in this directory are symlinks to the actual files referred to by a process’s file descriptors, so they show examples of the expected file paths.

Testing

Here is a sample program and its table-inspector output:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        int fd1, fd2, fd3;

        fd1 = open("/tmp/tabletop.tmp", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd1 < 0 || write(fd1, "hello", 5) != 5) {
                perror("fd1");
                return 1;
        }

        fd2 = open("/tmp/tabletop.tmp", O_RDONLY);
        fd3 = open("/tmp/tabletop.tmp", O_RDONLY | O_CLOEXEC);
        if (fd2 < 0 || fd3 < 0) {
                perror("open");
                return 1;
        }
        close(fd2);

        pause(); // table-inspector is run while the program is blocked on pause()

        close(fd1);
        close(fd3);
        return 0;
}

$ ./table-inspector 38302 10
inspect_table (5): Success

----------------------------
fd: 0
path: /dev/pts/1
pos: 0
flags: (02002) O_RDWR O_APPEND
----------------------------
fd: 1
path: /dev/pts/1
pos: 0
flags: (02002) O_RDWR O_APPEND
----------------------------
fd: 2
path: /dev/pts/1
pos: 0
flags: (02002) O_RDWR O_APPEND
----------------------------
fd: 3
path: /tmp/tabletop.tmp
pos: 5
flags: (0102001) O_WRONLY O_APPEND
----------------------------
fd: 5
path: /tmp/tabletop.tmp
pos: 0
flags: (02100000) O_RDONLY O_CLOEXEC
----------------------------

Submission

Deliverables:

The inspect_table() system call should be stubbed out in linux/kernel/tabletop.c, as described above.
The functionality of the inspect_table() system call with this part’s additions should be implemented in user/module/tabletop/tabletop.c.
Optional: Your own test program that invokes the syscall from userspace, under user/test/.

To submit this part, push the hw4p5handin tag with the following:

$ git tag -a -m "Completed hw4 part5." hw4p5handin
$ git push origin master
$ git push origin hw4p5handin

Good luck!

Acknowledgements

The Tabletop assignment was designed and implemented by the following TAs of COMS W4118 Operating Systems I, Spring 2022, Columbia University:

Kent Hall
Eilam Lehrman
Xijiao Li
Hans Montero
Tal Zussman

The Tabletop assignment was updated for 64-bit Linux version 5.10.158 by the following TAs of COMS W4118 Operating Systems I, Spring 2023, Columbia University:

Tal Zussman

Last updated: 2023-02-20