As with previous assignments, we will be using GitHub to distribute skeleton code and collect submissions. Please refer to our Git Workflow guide for more details. Note that we will be using multiple tags for this assignment: one for each deliverable part.
For students on ARM Mac computers (e.g. with M1 chip): if you want your
submission to be built/tested for ARM, you must create and submit a file called
.armpls in the top-level directory of your repo; feel free to use the following
one-liner:
cd "$(git rev-parse --show-toplevel)" && touch .armpls && git add .armpls && git commit -m "ARM pls"
You should do this first so that this file is present in all parts.
There is a script in the skeleton code named run_checkpatch.sh. It is a wrapper
over linux/scripts/checkpatch.pl, a Perl script shipped with the Linux kernel
that checks whether your code conforms to the kernel coding style.
Execute run_checkpatch.sh to see if your code conforms to the kernel style –
it’ll let you know what changes you should make. We recommend you make those
changes.
Passing run_checkpatch.sh with no warnings and no errors is NOT required for
this assignment, but will be for the next one. We recommend you get familiar
with this workflow now: run the script and fix what it suggests before pushing a
tag.
Kernel Compilation in Debian Linux
First, follow the above guide and compile yourself a pristine (unmodified)
kernel from the 5.10.57 Linux source provided in the skeleton repo. You should
name it 5.10.57-cs4118, and keep it around as your fallback kernel for all
future assignments (including this one), in case you run into any trouble
booting into the kernel you’re working on.
Additionally, make sure that the CONFIG_BLK_DEV_LOOP option is set to y in your
.config file before you build and install your pristine kernel. This will come
in handy in later assignments.
In this part, you will reduce your kernel build time drastically.
localmodconfig
A large amount of time is spent compiling and installing kernel modules you
never use. You can regenerate .config so that it contains only those modules
you are currently using. This will drastically cut down the number of modules.
This is how:
First, back up your .config to something like .config.<UNI>-from-lts.
5.10.57-cs4118.
Run make localmodconfig in your Linux kernel source tree. This will take your
current .config and turn off all modules that you are not using. It will ask
you a few questions. You can hit ENTER to accept the defaults, or just have yes
do so for you:
$ yes '' | make localmodconfig
Make sure that CONFIG_BLK_DEV_LOOP is still set to y before building and
installing this kernel.
Now you have a much smaller .config. You can follow the rest of the steps
starting from make.
When you are hacking kernel code, you’ll often make simple changes to only a
handful of .c files. If you didn’t touch any header files, the modules will not
be rebuilt when you run make; thus there is no reason to reinstall all modules
every time you rebuild your kernel.
You’re about to make changes to the pristine kernel source. This means that what
you build from it might not even boot. In order to make sure that you always
have the -cs4118 pristine kernel as a fallback to boot into, you should avoid
overwriting it by setting the local version in your .config file to something
else, like your UNI (for example, -abc1234). Make sure to verify your changes:
$ scripts/diffconfig .config.old .config
LOCALVERSION "-cs4118" -> "-abc1234"
You should use your UNI as the local version for all modified kernels you build for this course from now on.
You are now ready to add system calls to your kernel.
LKD Chapter 5: you must read this to understand how adding a system call works in general. Unfortunately, some of the steps described in the LKD book have changed since the book was written.
What’s Available to Your Module, What Isn’t: In part 4, you will convert your syscall implementation into a kernel module. To minimize the number of modifications you’ll need to make, you should ensure that your syscall implementation only uses symbols that will also be available in a kernel module.
The following documentation from the Linux kernel source tree describes how to
add a new system call: /Documentation/process/adding-syscalls.rst.
CONFIG option and a fallback stub for your new system call.
In the next few parts, we will implement a new system call: inspect_table().
It retrieves the file descriptor table of a specified process. This syscall
should work on both x86 and arm64 architectures.
At the time of writing, the Linux kernel has about 400 system calls, though new
system calls are constantly being added to the Linux kernel. Let’s leave some
room and use 500 as the syscall number for inspect_table().
The system call should have the following interface:
long inspect_table(pid_t pid, struct fd_info *entries, int max_entries);
where struct fd_info is defined as follows:
#define TABLETOP_MAX_PATH_LENGTH 256
struct fd_info {
	int fd;
	unsigned int flags;
	long long pos;
	char path[TABLETOP_MAX_PATH_LENGTH];
};
We will build up its functionality over the next few parts. For this part,
inspect_table() should do the following:
Retrieve the task_struct corresponding to the specified pid.
If pid < -1, the syscall should return -1 and set errno to EINVAL.
If pid == -1, retrieve the task_struct of the calling process.
If pid >= 0, retrieve the task_struct of the process corresponding to the given
pid. If there is no task corresponding to the provided valid pid, the syscall
should return -1 and set errno to ESRCH.
If the euid of the calling process isn’t root and it isn’t the uid of the
target process, the syscall should return -1 and set errno to EPERM.
Ignore the entries and max_entries arguments in this part.
Your syscall will use this task_struct in subsequent parts of the assignment.
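The argument checks listed above can be tried out in plain userspace C before any kernel code is written. The following is only a sketch of the decision order, not the kernel implementation: the names classify_pid(), may_inspect(), and enum pid_kind are hypothetical, and the real syscall would work with task_struct and kernel credential types rather than bare integers.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical outcome codes for the pid argument (not part of the handout). */
enum pid_kind { PID_INVALID, PID_SELF, PID_LOOKUP };

/* Classify the pid argument following the rules listed above. */
enum pid_kind classify_pid(pid_t pid)
{
	if (pid < -1)
		return PID_INVALID;	/* caller sees -1 with errno == EINVAL */
	if (pid == -1)
		return PID_SELF;	/* inspect the calling process */
	return PID_LOOKUP;		/* look the task up; ESRCH if none exists */
}

/*
 * Permission rule: allowed when the caller's euid is root (0) or equals the
 * target's uid. Returns 0 when allowed and -EPERM otherwise, mirroring the
 * kernel convention of returning negative errno values.
 */
int may_inspect(uid_t caller_euid, uid_t target_uid)
{
	if (caller_euid == 0 || caller_euid == target_uid)
		return 0;
	return -EPERM;
}
```

For instance, may_inspect(1000, 0) yields -EPERM, which corresponds to an unprivileged user inspecting a root-owned process.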
A few things that you might find helpful:
Kernel documentation is located in the Documentation subdirectory of the kernel
source.
task_struct corresponding to a given pid.
pid_task() with get_pid_task()
struct pid and struct task_struct.
To learn more about user credentials, check out man 7 credentials.
geteuid() and getuid(). The following header files may also be helpful:
include/linux/cred.h and include/linux/uidgid.h.
We’ve provided a userspace test program under user/test/table-inspector. To
test this part’s functionality, run it like this:
./table-inspector <pid> 0
Here is some sample output for this part:
$ ./table-inspector -1 0
inspect_table (0): Success
$ ps aux | grep '/usr/sbin/sshd'
root 490 0.0 0.0 13292 7704 ? Ss Feb05 0:00 sshd: /usr/sbin/sshd -D
$ ./table-inspector 490 0
inspect_table (-1): Operation not permitted
$ sudo ./table-inspector 490 0
inspect_table (0): Success
$ ./table-inspector -420 0
inspect_table (-1): Invalid argument
$ ./table-inspector 50000 0 # this pid isn't in use
inspect_table (-1): No such process
You may optionally submit your own test program under user/test/. To learn how
to invoke a syscall, read:
man 2 syscall
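As a starting point for your own test program, here is a minimal sketch of a userspace wrapper built on syscall(2). The wrapper name inspect_table_call() and the macro NR_INSPECT_TABLE are hypothetical; the number 500 and the struct layout are the ones given in this handout.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define TABLETOP_MAX_PATH_LENGTH 256
#define NR_INSPECT_TABLE 500	/* syscall number chosen in this assignment */

struct fd_info {
	int fd;
	unsigned int flags;
	long long pos;
	char path[TABLETOP_MAX_PATH_LENGTH];
};

/*
 * Hypothetical wrapper: forwards its arguments to syscall number 500.
 * Like other libc wrappers, syscall() returns -1 and sets errno on failure.
 */
long inspect_table_call(pid_t pid, struct fd_info *entries, int max_entries)
{
	return syscall(NR_INSPECT_TABLE, pid, entries, max_entries);
}
```

Calling inspect_table_call(-1, NULL, 0) mirrors running ./table-inspector -1 0. On a kernel that does not provide syscall 500, the call is expected to fail with errno set to ENOSYS.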
Deliverables:
The inspect_table() system call should be implemented in
linux/kernel/tabletop.c. You should place the definition of struct fd_info at
the top of this file.
Any other modifications to kernel source code, including updating
linux/kernel/Makefile so that linux/kernel/tabletop.o is linked into the kernel.
Optional: Your own test program that invokes the syscall from userspace, under
user/test/.
To submit this part, push the hw4p3handin tag with the following:
$ git tag -a -m "Completed hw4 part3." hw4p3handin
$ git push origin master
$ git push origin hw4p3handin
By now, you must be tired of rebooting your VM every time you make a small
change in system call code. In this part, we will move the code for
inspect_table() into a dynamically loadable kernel module. Our goal is to be
able to make modifications to the system call code without having to reboot the
machine.
There are two ways to implement a system call using a module. You can try to modify the system call table from the module initialization code. Changes in recent kernels to the mechanics of setting up system calls make this method more cumbersome than it used to be, so we are not going to do this.
Another way is to leave the system call definition in the static kernel code, but have it call another function defined in a module. This is our approach.
Move your implementation of the inspect_table() syscall to the provided module
skeleton code:
The system call will be activated when you call sudo insmod inspect_table.ko.
When it is activated, user programs should be able to make the syscall as usual.
The system call will be deactivated when you call sudo rmmod inspect_table.
When the inspect_table() syscall is not activated (before you call insmod or
after you call rmmod), it will return -1 and set errno to ENOSYS, indicating
that the function is not implemented.
Big hint: use function pointers.
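To illustrate the hint, here is a userspace model of the pattern, written as a sketch rather than as kernel code. All names (inspect_fn, inspect_hook, inspect_table_stub(), hook_install(), hook_remove(), dummy_impl()) are hypothetical. In the kernel, the pointer would live in the statically compiled code and would need to be exported for a module to set it; this sketch also ignores synchronization, such as a module being removed while a call is in progress.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook type: same arguments as the syscall, entries left opaque. */
typedef long (*inspect_fn)(int pid, void *entries, int max_entries);

/* NULL means "no module loaded". */
static inspect_fn inspect_hook;

/* Stand-in for the statically compiled syscall stub. */
long inspect_table_stub(int pid, void *entries, int max_entries)
{
	if (!inspect_hook)
		return -ENOSYS;	/* deactivated: function not implemented */
	return inspect_hook(pid, entries, max_entries);
}

/* Stand-ins for what the module's init and exit functions would do. */
void hook_install(inspect_fn fn)
{
	inspect_hook = fn;
}

void hook_remove(void)
{
	inspect_hook = NULL;
}

/* Dummy implementation used only to exercise the stub. */
long dummy_impl(int pid, void *entries, int max_entries)
{
	(void)pid;
	(void)entries;
	(void)max_entries;
	return 0;
}
```

Calling inspect_table_stub() before hook_install(dummy_impl) returns -ENOSYS, calling it afterwards returns whatever the installed function returns, and calling it after hook_remove() returns -ENOSYS again. This matches the deactivated, activated, deactivated sequence described above.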
Make sure you’re still able to run the test program you wrote in Part 3 before
you load your module (deactivated), after you load your module (activated), and
after you unload your module (deactivated again). You should check that errors
are gracefully handled (i.e. the appropriate errno values are set and checked
for).
Deliverables:
The inspect_table() system call should be stubbed out in
linux/kernel/tabletop.c, as described above.
The functionality of the inspect_table() system call should be implemented in
user/module/tabletop/inspect_table.c.
Any other modifications to kernel source code.
To submit this part, push the hw4p4handin tag with the following:
$ git tag -a -m "Completed hw4 part4." hw4p4handin
$ git push origin master
$ git push origin hw4p4handin
Now that we have a working syscall implemented in a kernel module, we will begin adding more functionality. Here, we will have the syscall print the target process’s open file descriptors to the kernel log buffer (in addition to the functionality specified in part 3).
Before getting started, we should understand the data structures related to the file descriptor table. This article implements a simple kernel module that prints the calling task’s open file descriptors with full paths. This code is almost sufficient for this part but it has a major bug. What will this module do if there is a hole in the file descriptor table? For example, a process may have opened file descriptors 3 and 4 and then closed 3.
Another problem with the code from the above article is that it completely ignores synchronization and resource management while accessing the data structures. This is fine for the sake of this assignment. This second article attempts to handle synchronization and resource management properly. It also provides a more in-depth explanation with diagrams of the data structures. We recommend reading this article or, at the very least, studying its diagrams.
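The hole scenario described above is easy to reproduce from userspace. The sketch below opens two descriptors, closes the lower-numbered one, and checks that the higher one is still open. A loop that stops at the first closed descriptor would therefore miss it. The helper names fd_is_open() and make_hole() are hypothetical, and /dev/null is used only as a convenient file to open.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd is currently open in this process, 0 otherwise. */
int fd_is_open(int fd)
{
	return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/*
 * Opens two descriptors and closes the first one, leaving a gap below a
 * descriptor that is still open. Returns 1 if the hole was observed, 0 if
 * not, and -1 if the descriptors could not be opened.
 */
int make_hole(void)
{
	int a = open("/dev/null", O_RDONLY);
	int b = open("/dev/null", O_RDONLY);
	int hole;

	if (a < 0 || b < 0) {
		if (a >= 0)
			close(a);
		if (b >= 0)
			close(b);
		return -1;
	}

	close(a);	/* the table now has a gap at index a */
	hole = !fd_is_open(a) && fd_is_open(b);
	close(b);
	return hole;
}
```

In kernel terms, this is why every index below the table's upper bound has to be checked individually instead of stopping at the first empty slot.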
In this part, you will add the following functionality to inspect_table() in
addition to the functionality implemented in part 3:
struct files_struct pointer may be NULL, specifically in the case where there
are no open file descriptors. In this case, the syscall should simply return 0.
Print Open fds for <pid>:, where <pid> is replaced with the target process’s
pid.
Use the max_fds field from struct fdtable as an upper bound. Print each open
file descriptor number to the kernel log buffer.
Here is some sample output from inserting the module, running table-inspector
for two processes, and then removing the module. Your output format must match
EXACTLY.
$ sudo dmesg -Hw
[Feb 7 22:37] Loading tabletop
[ +7.318509] Open fds for 36944:
[ +0.000010] 0
[ +0.000008] 1
[ +0.000008] 2
[Feb 7 22:38] Open fds for 36949:
[ +0.000020] 0
[ +0.000009] 1
[ +0.000009] 2
[ +0.000010] 4
[ +5.033228] Removing tabletop
Deliverables:
The inspect_table() system call should be stubbed out in
linux/kernel/tabletop.c, as described above.
The functionality of the inspect_table() system call with this part’s additions
should be implemented in user/module/tabletop/inspect_table.c.
Optional: Your own test program that invokes the syscall from userspace, under
user/test/.
To submit this part, push the hw4p5handin tag with the following:
$ git tag -a -m "Completed hw4 part5." hw4p5handin
$ git push origin master
$ git push origin hw4p5handin
In this final part, we complete the syscall implementation by making use of the
entries and max_entries parameters. In addition to the functionality described
in parts 3 and 5, your syscall should now do the following:
Instead of simply printing file descriptors to the kernel log buffer, you
should now collect all the file metadata specified in struct fd_info for a
given open file descriptor.
int fd is the file descriptor number.
unsigned int flags contains the flags for the file descriptor (O_RDWR,
O_APPEND, etc.).
long long pos is the file descriptor’s current offset into the file.
char path[TABLETOP_MAX_PATH_LENGTH] is the absolute path of the file that the
file descriptor refers to.
fdt->max_fds or max_entries, whichever is smaller. That is, max_entries
corresponds to the maximal number of struct fd_info’s that will be copied back
to userspace.
Copy struct fd_info for all open file descriptors, up to max_entries, into
userspace memory pointed to by the entries argument.
Do not call kmalloc() or copy_to_user() while holding a spin lock.
Use d_path() to retrieve the file path. Be sure to read its documentation very
carefully and perform necessary error-checking. If d_path() fails, the syscall
should return -1 and set errno to the error d_path() returned. Don’t worry
about copying to userspace if this case occurs.
cat /proc/<pid>/fdinfo, which will show information similar to the output of
our syscall. Read more about fdinfo in man 5 proc. Check out seq_show(),
defined in fs/proc/fd.c, to see how it retrieves the file descriptor flags,
including O_CLOEXEC, which is handled differently than other flags.
Here is a sample program and its table-inspector output:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
	int fd1 = open("/tmp/tabletop.tmp", O_WRONLY | O_CREAT | O_APPEND, 0644);
	write(fd1, "hello", 5);
	int fd2 = open("/tmp/tabletop.tmp", O_RDONLY);
	int fd3 = open("/tmp/tabletop.tmp", O_RDONLY | O_CLOEXEC);
	close(fd2);
	pause(); // table-inspector is run while the program is blocked on pause()
	close(fd1);
	close(fd3);
}
$ ./table-inspector 38302 10
inspect_table (5): Success
----------------------------
fd: 0
path: /dev/pts/1
pos: 0
flags: (02002) O_RDWR O_APPEND
----------------------------
fd: 1
path: /dev/pts/1
pos: 0
flags: (02002) O_RDWR O_APPEND
----------------------------
fd: 2
path: /dev/pts/1
pos: 0
flags: (02002) O_RDWR O_APPEND
----------------------------
fd: 3
path: /tmp/tabletop.tmp
pos: 5
flags: (0102001) O_WRONLY O_APPEND
----------------------------
fd: 5
path: /tmp/tabletop.tmp
pos: 0
flags: (02100000) O_RDONLY O_CLOEXEC
----------------------------
Deliverables:
The inspect_table() system call should be stubbed out in
linux/kernel/tabletop.c, as described above.
The functionality of the inspect_table() system call with this part’s additions
should be implemented in user/module/tabletop/inspect_table.c.
Optional: Your own test program that invokes the syscall from userspace, under
user/test/.
To submit this part, push the hw4p6handin tag with the following:
$ git tag -a -m "Completed hw4 part6." hw4p6handin
$ git push origin master
$ git push origin hw4p6handin
Good luck!
The Tabletop assignment was designed and implemented by the following TAs of COMS W4118 Operating Systems I, Spring 2022, Columbia University:
Last updated: 2022-02-22