Operating systems implement a lot of functionality that must be executed in kernel mode: device drivers, system services, file systems, etc. However, not every system uses all of these subsystems, so for most people, including them in the kernel image would only add bloat to the OS distribution. Instead, they are left out of the base kernel image, and each user may opt to link in only the code they need.
To avoid having to recompile the kernel each time a new piece of code is linked in, Linux exposes a kernel module interface. This feature allows users to dynamically link object files into a running kernel, as well as remove loaded modules. However, to allow the kernel to make sense of the otherwise arbitrary code, kernel modules must strictly abide by Linux’s module protocol, which defines entry points, exit points, and various pieces of metadata for module bookkeeping.
In this guide, we introduce how to create a simple kernel module, as well as ways to interact with it. Note that the module code will be running in kernel mode, meaning it has privileges that allow it to corrupt a running Linux kernel. If your code crashes, it may bring down the entire operating system! Since you are working within a virtual machine, you can recover from a module failure by simply rebooting your system. But do carefully consider the code you are about to load, and make sure to save any work you may have open in your VM.
A basic kernel module might look something like this:
#include <linux/module.h>
#include <linux/printk.h>

/* This function is called when the module is loaded. */
int hello(void)
{
	printk(KERN_INFO "Loading module... Hello World!\n");
	return 0;
}

/* This function is called when the module is removed. */
void goodbye(void)
{
	printk(KERN_INFO "Removing module... Goodbye World!\n");
}

/* Macros for registering module entry and exit points */
module_init(hello);
module_exit(goodbye);

/* Macros for declaring module metadata */
MODULE_DESCRIPTION("A basic Hello World module");
MODULE_AUTHOR("cs4118");
MODULE_LICENSE("GPL");
This module does nothing but print a “Hello World!” message when it is loaded,
and a “Goodbye World!” message when it is removed, by calling the hello() and
goodbye() functions, respectively.
The kernel knows to call these functions because we declared them as the entry
and exit points using the module_init and module_exit macros. These functions
may be named anything as long as their names are given to these macros. Neither
the entry point nor the exit point takes any arguments. The entry point must
return an error code, with 0 representing success; the exit point does not
return anything.
We declare some module metadata using the MODULE_DESCRIPTION, MODULE_AUTHOR,
and MODULE_LICENSE macros. These aren’t strictly necessary, but just like a
README, it is good practice to include them. The MODULE_DESCRIPTION can be a
short synopsis of what your module is trying to accomplish. For the purposes of
your assignments, MODULE_AUTHOR should always be your UNI for individual
assignments, or the team number and UNIs of each team member for team
assignments. You may leave MODULE_LICENSE set to GPL to keep Richard Stallman
happy.
Note that we #include <linux/module.h> at the top of this small program. This
header defines the module macros that we used. In a more complex module, you may
need to include more kernel header files.
Now that we have written our module, we must still compile it before it can be
loaded into a running Linux system. We shall name our example module hello, so
we save its source code in a file named hello.c.
Linux kernel modules are built using the GNU Make build system. More precisely,
they are built using rules defined in Makefiles that ship with the Linux kernel
headers. We may hook into these Makefiles from a Makefile of our own:
obj-m += hello.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
This Makefile should be located in the same directory as hello.c, your build
directory. That is:
./
|_ hello.c
|_ Makefile
To build the module, you may need to install the kernel headers if you have not done so already:
$ sudo apt-get install -y linux-headers-$(uname -r)
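Before building, it can help to confirm that the headers for the running kernel are actually in place. The sketch below checks for the same build directory that the Makefile above passes to make -C. The function names kbuild_dir and headers_installed are hypothetical helpers written for this example, not part of any tool.

```shell
# Hypothetical helper: print the module build directory for a kernel release.
# Defaults to the running kernel, as reported by `uname -r`.
kbuild_dir() {
    printf '/lib/modules/%s/build\n' "${1:-$(uname -r)}"
}

# Hypothetical helper: succeed only if that build directory exists.
headers_installed() {
    [ -d "$(kbuild_dir "${1:-}")" ]
}

if headers_installed; then
    echo "Kernel headers found in $(kbuild_dir)"
else
    echo "No headers in $(kbuild_dir); install linux-headers-$(uname -r) first"
fi
```

If the check reports that the directory is missing, run the apt-get command above and try again.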
It isn’t too important to understand the details of exactly how this Makefile
works, as long as it does. If you would like to, you can read more about how it
works in the Makefile breakdown later in this guide.
In your build directory, you may use make all (or just make) to build your
module, and use make clean to clean your build directory of any build
artifacts. You should be able to build without root privileges:
$ make
Doing so will produce several files; the kernel module object that we are
interested in will be named hello.ko.
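Once hello.ko exists, the metadata declared with the MODULE_* macros can be read back from it with modinfo. This is a minimal sketch, assuming modinfo (part of the kmod tools) is installed on your VM; module_field is a hypothetical wrapper name chosen for this example.

```shell
# Hypothetical wrapper: print one metadata field of a module file.
# `modinfo -F <field> <file>` prints only the value of that field.
module_field() {
    modinfo -F "$1" "$2"
}

if [ -f hello.ko ]; then
    for field in description author license; do
        printf '%s: %s\n' "$field" "$(module_field "$field" hello.ko)"
    done
else
    echo "hello.ko not found; run make first"
fi
```

Run it from the build directory after make; it should echo back the strings passed to the three metadata macros.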
Note that the obj-m variable is used to tell the Linux module build system
which module to build, so if you are building a module named foo (with its
source code in foo.c), you will need to change the first line of your Makefile
to:
obj-m += foo.o
To insert your kernel module, hello.ko, we run the following command, with root
privileges:
# insmod hello.ko
At this point, the function we passed to module_init (hello in the above
example) will be executed.
We can check that our module is present by running the following command:
$ lsmod
This will list all modules currently loaded in your system. The list may include
modules backing other services, depending on your system setup. At this point,
our hello module should be present.
To remove our running module, hello, we run the following command, with root
privileges:
# rmmod hello
Note that we do not need to include the .ko file extension here. Now, running
lsmod should no longer show the hello module.
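lsmod formats the contents of /proc/modules, so a script can check for a module there directly instead of scanning the lsmod listing by eye. The following is a minimal sketch of the load, check, remove cycle described above. Both function names are hypothetical, and the optional second argument to module_loaded exists only so the check can be pointed at a different file.

```shell
# Hypothetical helper: succeed if a module name appears in the first column
# of /proc/modules (the file that lsmod reads and pretty-prints).
# The optional second argument overrides the file to inspect.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Load, check, and remove the example module. This is not called
# automatically, since insmod and rmmod need root and a built hello.ko.
load_check_unload() {
    sudo insmod hello.ko || return 1
    module_loaded hello && echo "hello is loaded"
    sudo rmmod hello
    module_loaded hello || echo "hello is removed"
}
```

Calling load_check_unload from the build directory runs the whole cycle once.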
In our above example, the hello() entry point invokes printk(). This is the
Linux kernel equivalent of printf(): it is called with the format string as
the first argument, followed by a variable number of arguments used by the
format string. printk() supports most of the same formatting directives as
printf(), e.g. %d, %s, %x, etc., but not floating-point directives such as %f,
since kernel code generally avoids floating-point arithmetic. It is declared in
<linux/printk.h>.
printk() cannot output to stdout, since the module is not a user process.
Instead, its output goes to the kernel log buffer. You can read this log
buffer using the following command:
# dmesg
You will likely find a number of messages from other system services as well,
since these all share the same log buffer, ordered from the earliest message to
the latest. If you just loaded the above example module, you should find the
message Loading module... Hello World! at the bottom; if you also removed it,
you should find Removing module... Goodbye World! there too.
You may also find it helpful to keep dmesg open, and see kernel output as it
is being produced by printk() (similar to the behavior of tail -f). You may
do this by running dmesg with the -w flag.
Since the kernel log buffer is shared amongst many services, it is often full of verbose, noisy messages. This can get rather unwieldy, so to clear the log buffer, we can use the following command, which prints the current contents one last time and then clears the buffer:
# dmesg -c
To help you sift through the verbosity, printk() also supports logging
priorities. These may be specified by prefixing the format string with a macro
such as KERN_INFO; these macros are made available by <linux/printk.h>. You may
filter the output of dmesg by priority using the -l flag. Note that the macros
pr_info(), pr_warn(), etc., which have the logging priority built in, are now
preferred over using printk() directly.
For more detailed usage of dmesg, please check its man pages.
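To pull only this module’s messages out of a busy log, the dmesg flags above can be combined with grep. This is a minimal sketch: hello_messages is a hypothetical filter that matches the two strings printed by the example module, and it reads from stdin so it works on live dmesg output or on a saved log.

```shell
# Hypothetical filter: keep only the lines printed by the example module.
# Reads log text on stdin so it can be fed from dmesg or from a saved file.
hello_messages() {
    grep -E 'Loading module\.\.\. Hello World!|Removing module\.\.\. Goodbye World!'
}

# Typical use (needs permission to read the kernel log):
#   sudo dmesg -l info | hello_messages
```

The -l info filter applies here because the example module logs with KERN_INFO; a module that logs at a different priority would need a different level name.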
Be mindful of memory usage while developing kernel modules – your code shouldn’t leak memory.
We highly recommend using KEDR to check for leaks. This framework works especially well for us since we will often be developing only with kernel modules. Check out the linked wiki to see how to use it. There are also many StackOverflow posts on it, like this one.
Here’s how you can set up KEDR on your VM:
1. Make sure you can successfully build kernel modules as explained above.
2. Install CMake: sudo apt-get install -y cmake
3. Run this one-liner to clone down the KEDR source, build it, and install it:
cd ~ && rm -rf kedr && git clone https://github.com/cs4118/kedr && cd kedr/sources && mkdir build && cd build && cmake .. && make -j$(nproc) && sudo make install
Note that we’ll be using our own fork of KEDR instead of the official repo because KEDR’s maintainer hasn’t had time to add support for Linux 5.x+ and the arm64 architecture. Our fork of KEDR (linked in the command above) has been tested to work on both x86 and arm64 Debian 11+. Do not use the official KEDR repo, as it does not build on all platforms.
Note that if you’re on arm64, you’ll need to restart your VM to complete the installation.
Previously, we provided you with a kernel module makefile that looks like the following:
obj-m += hello.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
This Makefile actually hooks into another, much more sophisticated Makefile: the top-level Linux kernel Makefile.
Let’s break down lines 3 and 5 first, where we’re (recursively) calling make.
The -C option changes the working directory to
/lib/modules/$(shell uname -r)/build before make looks for a Makefile to read.
The shell keyword in $(shell uname -r) indicates that what follows should be
run as a shell command rather than expanded as a Makefile variable. Try running
the command in your VM; it will print the kernel release version like so:
$ uname -r
5.10.0-20-arm64
Plugging the kernel release version back into the original path, the resulting directory consists of a reduced kernel source tree. This directory acts as the workbench for building modules: it contains all the information a module needs for compilation and for hooking into your VM’s kernel. If you look at the Makefile under this directory, you can see it points to the actual kernel Makefile:
$ cd /lib/modules/5.10.0-20-arm64/build
$ ls
arch include Makefile Module.symvers scripts tools
$ cat Makefile
include /usr/src/linux-headers-5.10.0-20-common/Makefile
Lines 3 and 5 of our original Makefile also set a special variable, M=$(PWD),
which stores the working directory that make was originally run from. Searching
for M=dir in the kernel Makefile, we can find what this variable is used for:
#Use make M=dir or set the environment variable KBUILD_EXTMOD to
#specify the directory of external module to build.
ifeq ("$(origin M)", "command line")
KBUILD_EXTMOD := $(M)
endif
Ultimately, this variable allows the Makefile to move from the previous
“workbench” directory back to the module source directory before trying to build
the modules or clean target.
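To see what the recipe on line 3 expands to, the same pieces can be assembled by hand in the shell. The sketch below only prints the resulting command and does not run it; module_build_cmd is a hypothetical helper name, and its defaults mirror the Makefile’s $(shell uname -r) and $(PWD).

```shell
# Hypothetical helper: print the recursive make command that the `all`
# target runs, given a kernel release and a module source directory.
# With no arguments it uses the running kernel and the current directory.
module_build_cmd() {
    release="${1:-$(uname -r)}"
    srcdir="${2:-$PWD}"
    printf 'make -C /lib/modules/%s/build M=%s modules\n' "$release" "$srcdir"
}

module_build_cmd
```

Running it in your build directory shows the directory that -C switches to and the directory that M= points back to.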
Finally, let’s go back and break down line 1 of our original Makefile, which
appends an object file name to a variable called obj-m. This variable is part
of Linux’s Makefile that specifies object files to be built as loadable kernel
modules. There is also another variable, obj-y, which is used when the object
file should be linked into the main kernel image. Since we want our module to be
dynamically loaded, obj-m is used here.
Adapted from Chapter 2 of Operating Systems Concepts by Abraham Silberschatz, Peter B. Galvin, and Greg Gagne; Programming Projects; Linux Kernel Modules.
KEDR was ported for Linux 5.x+ x86 and arm64 by the following TAs of COMS W4118 Operating Systems, Fall 2021, Columbia University: Hans Montero and Kent Hall.