As with previous assignments, we will be using GitHub to distribute skeleton code and collect submissions. Please refer to our Git Workflow guide for more details. Note that we will be using multiple tags for this assignment: one for each deliverable part.
For students on arm64 computers (e.g. M1/M2 machines): if you want your
submission to be built/tested for ARM, you must create and submit a file called
.armpls in the top-level directory of your repo; feel free to use the following
one-liner:
cd "$(git rev-parse --show-toplevel)" && touch .armpls && git add -f .armpls && git commit .armpls -m "ARM pls"
You should do this first so that this file is present in all parts.
There is a script in the skeleton code named run_checkpatch.sh. It is a wrapper
over linux/scripts/checkpatch.pl, a Perl script that comes with the Linux kernel
and checks whether your code conforms to the kernel coding style.
Execute run_checkpatch.sh to see if your code conforms to the kernel style;
it’ll let you know what changes you should make. We recommend you make those
changes.
Passing run_checkpatch.sh with no warnings and no errors is NOT required for
this assignment, but will be for the next one. We recommend you get familiar
with this workflow now: run the script and fix what it suggests before pushing a
tag.
Kernel Compilation in Debian Linux
Follow the above guide and compile yourself a pristine (unmodified) kernel from
the 5.10.158 Linux source provided in the skeleton repo. You should name it
5.10.158-cs4118 and keep it around as your fallback kernel for all future
assignments (including this one), in case you run into any trouble booting into
the kernel you’re working on.
You’re about to make changes to the pristine kernel source. This means that what
you build might not even boot. To ensure that you always have the pristine
-cs4118 kernel as a fallback to boot into, avoid overwriting it by setting the
local version in your .config file to something else, like your UNI (for
example, -abc1234). Make sure to verify your changes:
$ scripts/diffconfig .config.old .config
LOCALVERSION "-cs4118" -> "-abc1234"
From now on, you should use your UNI as the local version for all modified kernels you build for this course.
You are now ready to add system calls to your kernel.
LKD Chapter 5: You must read this to understand how adding a system call works in general. Unfortunately, some of the steps described in LKD have changed since the book was written.
What’s Available to Your Module, What Isn’t: In part 3, you will convert your syscall implementation into a kernel module. In order to minimize the amount of modifications you’ll need to make, you should ensure that your syscall implementation only uses symbols that are also available to a kernel module.
The following documentation from the Linux kernel source tree describes how to
add a new system call: /Documentation/process/adding-syscalls.rst.
Note: do not add a CONFIG option or a fallback stub for your new system call.
This should be done for real syscalls, but won’t work with our grading
infrastructure.
In the next few parts, we will implement a new system call: inspect_table().
It retrieves the file descriptor table of a specified process. This syscall
should work on both x86 and arm64 architectures.
At the time of writing, the Linux kernel has about 400 system calls, though new
system calls are constantly being added to the Linux kernel. Let’s leave some
room and use 500 as the syscall number for inspect_table().
The system call should have the following interface:
long inspect_table(pid_t pid, struct fd_info *entries, size_t max_entries);
where struct fd_info is defined as follows:
#define TABLETOP_MAX_PATH_LENGTH 256

struct fd_info {
	unsigned int fd;
	unsigned int flags;
	long long pos;
	char path[TABLETOP_MAX_PATH_LENGTH];
};
We will build up its functionality over the next few parts. For this part,
inspect_table() should do the following:
- Retrieve the task_struct corresponding to the specified pid.
  - If pid < -1, the syscall should return -EINVAL.
  - If pid == -1, retrieve the task_struct of the calling process.
  - If pid >= 0, retrieve the task_struct of the process corresponding to
    the given pid. If there is no task corresponding to the provided valid PID,
    the syscall should return -ESRCH.
- If the euid of the calling process isn’t root and it isn’t the uid of
  the target process, the syscall should return -EPERM.
- You may ignore the entries and max_entries arguments in this part.
Your syscall will use this task_struct in subsequent parts of the assignment.
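The order of these checks can be modeled in plain userspace C. This is only a sketch of the decision logic, not kernel code: `struct target_info` and `check_inspect_args()` are hypothetical names, and the lookup result and credentials are passed in as arguments, whereas the real syscall would obtain them from the kernel's task and credential structures.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for what the kernel would learn from the pid lookup. */
struct target_info {
	bool exists;	/* did the lookup find a task? */
	uid_t uid;	/* uid of the target task */
};

/*
 * Userspace model of the part 2 checks, in the order the list above gives
 * them. In the kernel, caller_euid would come from the caller's credentials
 * and target from the task lookup; here both are supplied by the caller.
 */
static long check_inspect_args(pid_t pid, uid_t caller_euid,
			       const struct target_info *target)
{
	if (pid < -1)
		return -EINVAL;
	/* pid == -1 means the calling process, which always exists. */
	if (pid != -1 && !target->exists)
		return -ESRCH;
	/* Only root or a caller whose euid matches the target's uid may look. */
	if (caller_euid != 0 && caller_euid != target->uid)
		return -EPERM;
	return 0;
}
```

For example, a non-root caller (euid 1000) asking about a root-owned task gets -EPERM from this model, while the same request made as root returns 0.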
In order to make TABLETOP_MAX_PATH_LENGTH and struct fd_info available
throughout the kernel, you should do the following:
- Place the TABLETOP_MAX_PATH_LENGTH macro and the struct fd_info definition
  in linux/include/uapi/linux/tabletop.h. Make sure to add include guards in a
  style similar to that of other header files in that directory.
- Create linux/include/linux/tabletop.h, which just includes
  linux/include/uapi/linux/tabletop.h, along with correct include guards.
A few resources that you might find helpful:
- Kernel documentation is located in the Documentation subdirectory of the
  kernel source.
- Look into how to find the task_struct corresponding to a given pid:
  - Compare pid_task() with get_pid_task().
  - Study struct pid and struct task_struct.
- To learn more about user credentials, check out man 7 credentials.
  Take a look at code for other related system calls, such as geteuid() and
  getuid(). The following header files may also be helpful:
  include/linux/cred.h and include/linux/uidgid.h.
The syscall you’ve implemented has a struct pointer as a parameter. This means
that the struct definition needs to be available in both kernel and user land.
You’ll need to install the new uAPI header you’ve created
(linux/include/uapi/linux/tabletop.h) from the kernel source tree to userspace.
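As a rough sketch, the shared uAPI header could look like the following. The guard name `_UAPI_LINUX_TABLETOP_H` and the SPDX comment are assumptions modeled on the style commonly seen in include/uapi/linux/; check neighboring headers in your tree and match what they do.

```c
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _UAPI_LINUX_TABLETOP_H	/* guard name is an assumption */
#define _UAPI_LINUX_TABLETOP_H

#define TABLETOP_MAX_PATH_LENGTH 256

/* One entry per open file descriptor, shared by kernel and userspace. */
struct fd_info {
	unsigned int fd;
	unsigned int flags;
	long long pos;
	char path[TABLETOP_MAX_PATH_LENGTH];
};

#endif /* _UAPI_LINUX_TABLETOP_H */
```

The in-kernel wrapper header would then presumably hold only its own include guard and a single #include of the uAPI header, so that both sides see one definition of the struct.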
After compiling and booting into your new kernel, run umask and make sure it
outputs 0022. If not, run umask 0022.
Then, run the following:
# make headers_install INSTALL_HDR_PATH=/usr
This command will install the headers found under include/uapi/ in your Linux
source tree into /usr/include/. Now you should be able to
#include <linux/tabletop.h> from userspace! Try compiling the tester program
(see below) to make sure this works.
We’ve provided a userspace test program under user/test/table-inspector. To
test this part’s functionality, run it like so:
./table-inspector <pid> 0
Here is some sample output for this part:
$ ./table-inspector -1 0
inspect_table (0): Success
$ ps aux | grep '/usr/sbin/sshd'
root 490 0.0 0.0 13292 7704 ? Ss Feb05 0:00 sshd: /usr/sbin/sshd -D
$ ./table-inspector 490 0
inspect_table (-1): Operation not permitted
$ sudo ./table-inspector 490 0
inspect_table (0): Success
$ ./table-inspector -420 0
inspect_table (-1): Invalid argument
$ ./table-inspector 50000 0 # this pid isn't in use
inspect_table (-1): No such process
You may optionally submit your own test program under user/test/. To learn how
to invoke a syscall, read:
man 2 syscall
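A minimal sketch of such a test helper follows, assuming syscall number 500 as chosen above. `NR_INSPECT_TABLE`, the `inspect_table()` wrapper, and `report()` are names made up for this example. The struct is repeated locally so the sketch stands alone; on the course VM you would include <linux/tabletop.h> instead.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Local copy for a standalone sketch; normally from <linux/tabletop.h>. */
#define TABLETOP_MAX_PATH_LENGTH 256
struct fd_info {
	unsigned int fd;
	unsigned int flags;
	long long pos;
	char path[TABLETOP_MAX_PATH_LENGTH];
};

#define NR_INSPECT_TABLE 500	/* syscall number used in this assignment */

/* Thin wrapper: returns -1 and sets errno on failure, like other syscalls. */
static long inspect_table(pid_t pid, struct fd_info *entries,
			  size_t max_entries)
{
	return syscall(NR_INSPECT_TABLE, (long)pid, entries, max_entries);
}

/* Print the result in a shape similar to the sample output above. */
static void report(long ret)
{
	printf("inspect_table (%ld): %s\n", ret,
	       strerror(ret < 0 ? errno : 0));
}
```

A test program's main() could call report(inspect_table(-1, NULL, 0)) to exercise the part 2 checks. On a kernel without the new syscall, the wrapper is expected to fail with ENOSYS.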
Deliverables:
- The inspect_table() system call should be implemented in
  linux/kernel/tabletop.c.
- Any other modifications to kernel source code, including updating
  linux/kernel/Makefile so that linux/kernel/tabletop.o is linked into the
  kernel, and adding the appropriate header files.
- Optional: Your own test program that invokes the syscall from userspace,
  under user/test/.
To submit this part, push the hw4p2handin tag with the following:
$ git tag -a -m "Completed hw4 part2." hw4p2handin
$ git push origin master
$ git push origin hw4p2handin
By now, you must be tired of rebooting your VM every time you make a small
change in system call code. In this part, we will move the code for
inspect_table() into a dynamically loadable kernel module. Our goal is to be
able to make modifications to the system call code without having to reboot the
machine.
There are two ways to implement a system call using a module. You can try to modify the system call table from the module initialization code. Changes in recent kernels to the mechanics of setting up system calls make this method more cumbersome than it used to be, so we are not going to do this.
Another way is to leave the system call definition in the static kernel code, but have it call another function defined in a module. This is our approach.
Move your implementation of the inspect_table() syscall to the provided module
skeleton code:
- The system call will be activated when you run sudo insmod tabletop.ko.
  When it is activated, user programs should be able to call the syscall as
  usual.
- The system call will be deactivated when you call sudo rmmod tabletop.
- When the inspect_table() syscall is not active (before you run insmod or
  after you run rmmod), it will return -ENOSYS, indicating that the function
  is not implemented.
Big hint: use function pointers.
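The function-pointer idea can be modeled in userspace as below. This is a sketch of the dispatch pattern only: `inspect_table_impl`, `inspect_table_stub()`, `tabletop_register()`, `tabletop_unregister()`, and `fake_impl()` are all hypothetical names. In the kernel, the pointer would live in the static code and would need to be exported for the module to set it, and this model ignores what happens if the module is removed while a call is in flight.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

struct fd_info;	/* opaque here; only pointers to it are passed around */

typedef long (*inspect_table_fn)(pid_t pid, struct fd_info *entries,
				 size_t max_entries);

/* Lives on the always-present side; NULL means no module is loaded. */
static inspect_table_fn inspect_table_impl;

/* What the always-present syscall stub does: dispatch or report ENOSYS. */
static long inspect_table_stub(pid_t pid, struct fd_info *entries,
			       size_t max_entries)
{
	inspect_table_fn fn = inspect_table_impl;

	if (!fn)
		return -ENOSYS;
	return fn(pid, entries, max_entries);
}

/* What the module's init and exit functions would do, respectively. */
static void tabletop_register(inspect_table_fn fn)
{
	inspect_table_impl = fn;
}

static void tabletop_unregister(void)
{
	inspect_table_impl = NULL;
}

/* Stand-in for the real implementation provided by the module. */
static long fake_impl(pid_t pid, struct fd_info *entries, size_t max_entries)
{
	(void)pid;
	(void)entries;
	(void)max_entries;
	return 0;
}
```

Before tabletop_register(fake_impl) and after tabletop_unregister(), the stub returns -ENOSYS; in between, it returns whatever the registered function returns.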
Make sure you’re still able to run the test program you wrote in Part 2 before
you load your module (deactivated), after you load your module (activated), and
after you unload your module (deactivated again). You should check that errors
are gracefully handled (i.e. the appropriate errno values are set and checked
for).
Deliverables:
- The inspect_table() system call should be stubbed out in
  linux/kernel/tabletop.c, as described above.
- The functionality of the inspect_table() system call should be implemented
  in user/module/tabletop/tabletop.c.
- Any other modifications to kernel source code.
To submit this part, push the hw4p3handin tag with the following:
$ git tag -a -m "Completed hw4 part3." hw4p3handin
$ git push origin master
$ git push origin hw4p3handin
Now that we have a working syscall implemented in a kernel module, we will begin adding more functionality. Here, we will have the syscall print the target process’s open file descriptors to the kernel log buffer (in addition to the functionality specified in part 2).
Before getting started, you should understand the data structures related to the file descriptor table. This article implements a simple kernel module that prints the calling task’s open file descriptors with full paths. This code is almost sufficient for this part, but it has a major bug. What will this module do if there is a hole in the file descriptor table? For example, a process may have opened file descriptors 3 and 4 and then closed 3.
Another problem with the code from the above article is that it completely ignores synchronization and resource management while accessing the data structures. This is fine for the sake of this assignment. A second article attempts to handle synchronization and resource management properly. It also provides a more in-depth explanation, with diagrams, of the data structures. We recommend reading this second article, or at least studying its diagrams.
In this part, you will add the following functionality to inspect_table() in
addition to the functionality implemented in part 2:
- The target task’s struct files_struct pointer may be NULL. In this case, the
  syscall should return -ESRCH.
- Print Open fds for <pid>: to the kernel log buffer, where <pid> is replaced
  with the target process’s pid.
- Iterate over the file descriptor table, using the max_fds field from
  struct fdtable as an upper bound. Print each open file descriptor number to
  the kernel log buffer.
Here is some sample output from inserting the module, running table-inspector
for two processes, and then removing the module. Your output format must match
EXACTLY.
$ sudo dmesg -Hw
[Feb 7 22:37] Loading tabletop
[ +7.318509] Open fds for 36944:
[ +0.000010] 0
[ +0.000008] 1
[ +0.000008] 2
[Feb 7 22:38] Open fds for 36949:
[ +0.000020] 0
[ +0.000009] 1
[ +0.000009] 2
[ +0.000010] 4
[ +5.033228] Removing tabletop
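The second process in the sample output has a hole at descriptor 3. The hole-skipping iteration can be modeled in userspace as follows. This is a sketch with made-up names (`struct fake_file`, `list_open_fds()`): a plain array of pointers stands in for the kernel's descriptor array, and all locking is left out.

```c
#include <stddef.h>

/* Hypothetical stand-in for an open file; only NULL vs non-NULL matters. */
struct fake_file {
	int unused;
};

/*
 * Walk every slot below max_fds and record the ones that are open.
 * A NULL slot is a hole (a closed descriptor), so it is skipped rather
 * than treated as the end of the table. out[] must have room for
 * max_fds entries. Returns the number of open descriptors found.
 */
static size_t list_open_fds(struct fake_file *const *table, size_t max_fds,
			    unsigned int *out)
{
	size_t n = 0;

	for (size_t fd = 0; fd < max_fds; fd++) {
		if (!table[fd])
			continue;
		out[n++] = (unsigned int)fd;
	}
	return n;
}
```

With slots 0, 1, 2, and 4 populated and slot 3 set to NULL, the function reports four descriptors: 0, 1, 2, and 4. A loop that stopped at the first NULL slot would miss descriptor 4.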
Deliverables:
- The inspect_table() system call should be stubbed out in
  linux/kernel/tabletop.c, as described above.
- The functionality of the inspect_table() system call with this part’s
  additions should be implemented in user/module/tabletop/tabletop.c.
- Optional: Your own test program that invokes the syscall from userspace,
  under user/test/.
To submit this part, push the hw4p4handin tag with the following:
$ git tag -a -m "Completed hw4 part4." hw4p4handin
$ git push origin master
$ git push origin hw4p4handin
In this final part, we complete the syscall implementation by making use of the
entries and max_entries parameters. In addition to the functionality
described in parts 2 and 4, your syscall should now do the following:
- Instead of simply printing file descriptors to the kernel log buffer, you
  should now collect all the file metadata specified in struct fd_info for a
  given open file descriptor.
  - unsigned int fd is the file descriptor number.
  - unsigned int flags contains the flags for the file descriptor (O_RDWR,
    O_APPEND, etc.)
  - long long pos is the file descriptor’s current offset into the file.
  - char path[TABLETOP_MAX_PATH_LENGTH] is the absolute path of the file that
    the file descriptor refers to.
- Collect metadata for at most fdt->max_fds or max_entries descriptors,
  whichever is smaller. That is, max_entries corresponds to the maximal number
  of struct fd_info entries that will be copied back to userspace.
- Copy the struct fd_info for all open file descriptors, up to max_entries,
  into userspace memory pointed to by the entries argument.
  - Note that you should not call kmalloc() or copy_to_user() while holding a
    spin lock.
- Use d_path() to retrieve the file path. Be sure to read its documentation
  very carefully and perform necessary error-checking. If d_path() fails, the
  syscall should return the error d_path() returned. Don’t worry about copying
  to userspace if this case occurs.
- Try running cat /proc/<pid>/fdinfo/<fd>, which will show information similar
  to the output of our syscall. Read more about fdinfo in man 5 proc. Check
  out seq_show(), defined in fs/proc/fd.c, to see how it retrieves the file
  descriptor flags, including O_CLOEXEC, which is handled differently than
  other flags.
- Try running ls -alF /proc/<pid>/fd. The files in this directory are
  symlinks to the actual files referred to by a process’s file descriptors. You
  can see examples of the expected file path through these symlinks.
Here is a sample program and its table-inspector output:
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd1, fd2, fd3;

	fd1 = open("/tmp/tabletop.tmp", O_WRONLY | O_CREAT | O_APPEND, 0644);
	write(fd1, "hello", 5);
	fd2 = open("/tmp/tabletop.tmp", O_RDONLY);
	fd3 = open("/tmp/tabletop.tmp", O_RDONLY | O_CLOEXEC);
	close(fd2);
	pause(); // table-inspector is run while the program is blocked on pause()
	close(fd1);
	close(fd3);
}
$ ./table-inspector 38302 10
inspect_table (5): Success
----------------------------
fd: 0
path: /dev/pts/1
pos: 0
flags: (02002) O_RDWR O_APPEND
----------------------------
fd: 1
path: /dev/pts/1
pos: 0
flags: (02002) O_RDWR O_APPEND
----------------------------
fd: 2
path: /dev/pts/1
pos: 0
flags: (02002) O_RDWR O_APPEND
----------------------------
fd: 3
path: /tmp/tabletop.tmp
pos: 5
flags: (0102001) O_WRONLY O_APPEND
----------------------------
fd: 5
path: /tmp/tabletop.tmp
pos: 0
flags: (02100000) O_RDONLY O_CLOEXEC
----------------------------
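To make the flags lines above easier to read, here is a small userspace sketch that turns a flags word into names. It is not the provided tester's implementation: `describe_flags()` is a hypothetical helper that only knows the access mode plus O_APPEND and O_CLOEXEC, the flags that appear in the sample, and it ignores every other bit.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write a space-separated list of flag names for `flags` into buf.
 * The access mode is a small field selected by O_ACCMODE, not a single
 * bit, so it is decoded with a switch instead of a bit test.
 */
static void describe_flags(unsigned int flags, char *buf, size_t len)
{
	const char *mode;

	switch (flags & O_ACCMODE) {
	case O_RDONLY:
		mode = "O_RDONLY";
		break;
	case O_WRONLY:
		mode = "O_WRONLY";
		break;
	case O_RDWR:
		mode = "O_RDWR";
		break;
	default:
		mode = "?";
		break;
	}

	snprintf(buf, len, "%s%s%s", mode,
		 (flags & O_APPEND) ? " O_APPEND" : "",
		 (flags & O_CLOEXEC) ? " O_CLOEXEC" : "");
}
```

Passing O_RDWR | O_APPEND yields the string "O_RDWR O_APPEND", matching the names shown for descriptors 0 through 2 in the sample.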
Deliverables:
- The inspect_table() system call should be stubbed out in
  linux/kernel/tabletop.c, as described above.
- The functionality of the inspect_table() system call with this part’s
  additions should be implemented in user/module/tabletop/tabletop.c.
- Optional: Your own test program that invokes the syscall from userspace,
  under user/test/.
To submit this part, push the hw4p5handin tag with the following:
$ git tag -a -m "Completed hw4 part5." hw4p5handin
$ git push origin master
$ git push origin hw4p5handin
Good luck!
The Tabletop assignment was designed and implemented by the following TAs of COMS W4118 Operating Systems I, Spring 2022, Columbia University:
The Tabletop assignment was updated for 64-bit Linux version 5.10.158 by the following TAs of COMS W4118 Operating Systems I, Spring 2023, Columbia University:
Last updated: 2023-02-20