Linux Kernel Modules

Kernel Modules Primer

Operating systems implement a lot of functionality that must be executed in kernel mode: device drivers, system services, file systems, etc. However, not every system uses all of these subsystems, so for most people, including all of them in the kernel image would only add bloat to the OS distribution. Instead, they are left out of the base kernel image, and each user may opt to link in only the code they need.

To avoid having to recompile the kernel each time to link in a new piece of code, Linux exposes a kernel module interface. This feature allows users to dynamically link object files into a running kernel, as well as remove loaded modules. However, to allow the kernel to make sense of the otherwise arbitrary code, kernel modules must strictly abide by Linux’s module protocol, which defines entry points, exit points, and various pieces of metadata for module bookkeeping.

In this guide, we introduce how to create a simple kernel module, as well as ways to interact with it. Note that the module code runs in kernel mode, meaning it has privileges that allow it to corrupt a running Linux kernel. If your code crashes, it may bring down the entire operating system! Since you are working within a virtual machine, you can recover from a module failure by simply rebooting your system. But do carefully consider the code you are about to load, and make sure to save any work you may have open in your VM.

Creating a Module

A basic kernel module might look something like this:

#include <linux/module.h>
#include <linux/printk.h>

/* This function is called when the module is loaded. */
static int hello(void)
{
        printk(KERN_INFO "Loading module... Hello World!\n");

        return 0;
}

/* This function is called when the module is removed. */
static void goodbye(void)
{
        printk(KERN_INFO "Removing module... Goodbye World!\n");
}

/* Macros for registering module entry and exit points */
module_init(hello);
module_exit(goodbye);

/* Macros for declaring module metadata */
MODULE_DESCRIPTION("A basic Hello World module");
MODULE_AUTHOR("cs4118");
MODULE_LICENSE("GPL");

This module does nothing but print a “Hello World!” message when it is loaded, and a “Goodbye World!” message when it is removed, by respectively calling the hello() and goodbye() functions.

The kernel knows to call these functions because we declared them as the module’s entry and exit points using the module_init and module_exit macros. These functions may be named anything as long as their names are given to these macros. Neither the entry point nor the exit point takes any arguments. The entry point must return an error code, with 0 representing success; the exit point does not return anything.

We declare some module metadata using the MODULE_DESCRIPTION, MODULE_AUTHOR, and MODULE_LICENSE macros. The description and author aren’t strictly necessary, but as with a README, it is good practice to include them. The MODULE_DESCRIPTION can be a short synopsis of what your module is trying to accomplish. For the purposes of your assignments, MODULE_AUTHOR should always be your UNI for individual assignments, or the team number and UNIs of each team member for team assignments. MODULE_LICENSE matters more: a module that does not declare a GPL-compatible license taints the kernel when it is loaded and cannot use symbols that are exported only to GPL modules. You may leave MODULE_LICENSE set to GPL, which also keeps Richard Stallman happy.

Note that we #include <linux/module.h> at the top of this small program. This header defines the module macros used above. In a more complex module, you may need to include more kernel header files.

Building a Module

Now that we have written our module, we must still compile it before it can be loaded into a running Linux system. We shall name our example module hello, so we save its source code in a file named hello.c.

Linux kernel modules are built using the GNU Make build system. More precisely, they are built using rules defined in the Makefiles of the Linux kernel build system, which are installed along with the kernel headers. We may hook into these Makefiles from a Makefile of our own:

obj-m += hello.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

This Makefile should be located in the same directory as hello.c, your build directory. That is:

./
|_ hello.c
|_ Makefile

To build the module, you may need to install the kernel headers if you have not done so already:

$ sudo apt-get install -y linux-headers-$(uname -r)

It isn’t too important to understand the details of exactly how this Makefile works, as long as it does. If you would like to, you can read more about how it works in the section Understanding Kernel Module Makefiles (Optional) below.

In your build directory, you may use make all (or just make) to build your module, and use make clean to clean your build directory of any build artifacts. You should be able to build without root privileges:

$ make

Doing so will produce several files – the kernel module object that we are interested in will be named hello.ko.

Note that the obj-m variable is used to tell the Linux module build system which module to build, so if you are building a module named foo (with its source code in foo.c), you will need to change the first line of your Makefile to:

obj-m += foo.o

Loading and Removing Modules

To insert your kernel module, hello.ko, we run the following command, with root privileges:

# insmod hello.ko

At this point, the function we passed to module_init (hello in the above example) will be executed.

We can check that our module is present by running the following command:

$ lsmod

This will list all modules currently loaded in your system. The list may include other modules, such as device drivers and file systems, depending on your system setup. At this point, our hello module should be present.
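The same check can be made from a program. As far as we know, lsmod gets its list from /proc/modules, where each line starts with a module name followed by a space. The following user-space sketch looks for a module in that list; the function names module_listed() and module_loaded() are our own, chosen for illustration, and are not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if a module called `name` appears in the module list read
 * from `list`. Each line is expected to start with the module name
 * followed by a space, as in /proc/modules.
 */
bool module_listed(FILE *list, const char *name)
{
        char chunk[512];
        bool at_line_start = true;
        size_t len;

        assert(list != NULL && name != NULL);
        len = strlen(name);

        while (fgets(chunk, sizeof(chunk), list)) {
                /* Only compare at the start of a line, not mid-line. */
                if (at_line_start && strncmp(chunk, name, len) == 0 &&
                    chunk[len] == ' ')
                        return true;
                /* A chunk without '\n' means the line continues. */
                at_line_start = strchr(chunk, '\n') != NULL;
        }
        return false;
}

/* Convenience wrapper: open /proc/modules and look for `name`. */
bool module_loaded(const char *name)
{
        FILE *f = fopen("/proc/modules", "r");
        bool found;

        if (!f)
                return false;
        found = module_listed(f, name);
        fclose(f);
        return found;
}
```

Calling module_loaded("hello") from a small main() should return true after insmod hello.ko and false again after rmmod hello. Note that the kernel lists module names with underscores, even when the .ko file name contains hyphens.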

To remove our running module, hello, we run the following command, with root privileges:

# rmmod hello

Note that we do not need to include the .ko file extension here. Now, running lsmod should no longer show the hello module.

Reading the Kernel Log Buffer

In our above example, the hello() entry point invokes printk(). This is the Linux kernel equivalent of printf(): it is called with the format string as the first argument, followed by a variable number of arguments used by the format string. printk() supports most of the formatting directives of printf(), e.g. %d, %s, %x, etc., but not floating-point conversions such as %f, since kernel code generally avoids floating-point arithmetic. It is declared in <linux/printk.h>.
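Kernel code generally avoids floating-point arithmetic, and printk() does not implement %f. A common workaround is to keep fractional values as scaled integers and print them with integer conversions only. The sketch below shows the idea in user-space C with snprintf(), so that it can be tried outside the kernel; format_milli() is a hypothetical helper, and the scaling by thousandths is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a value stored in thousandths (e.g. 3141 means 3.141) using only
 * integer conversions. Returns whatever snprintf() returns.
 */
int format_milli(char *buf, size_t size, long milli)
{
        /* Work on the magnitude so that the sign is printed exactly once. */
        unsigned long mag = milli < 0 ? 0UL - (unsigned long)milli
                                      : (unsigned long)milli;

        assert(buf != NULL && size > 0);

        return snprintf(buf, size, "%s%lu.%03lu",
                        milli < 0 ? "-" : "", mag / 1000, mag % 1000);
}
```

With this helper, 3141 is formatted as 3.141 and -500 as -0.500. Inside a module, the same "%s%lu.%03lu" format string and arguments can be passed to printk() directly.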

printk() cannot output to stdout, since the module is not a user process. Instead, its output goes to the kernel log buffer. You can read this log buffer using the following command:

# dmesg

You will likely find a number of messages from other system services as well, since they all share the same log buffer. Messages are listed from the earliest to the latest. If you just loaded the above example module, you should find the message Loading module... Hello World! at the bottom; if you also removed it, you should find Removing module... Goodbye World! there too.

You may also find it helpful to keep dmesg open, and see kernel output as it is being produced by printk() (similar to the behavior of tail -f). You may do this by running dmesg with the -w flag.

Since the kernel log buffer is shared amongst many services, it is often full of verbose, noisy messages. This can get rather unwieldy, so to clear the log buffer, we can use the following command, which prints the current contents one last time and then clears the buffer:

# dmesg -c

To help you sift through the verbosity, printk() also supports logging priorities. These are specified by prefixing the format string with a macro such as KERN_INFO, with no comma in between; these macros are made available by <linux/printk.h>. You may filter the output of dmesg by priority using the -l flag. Note that the macros pr_info(), pr_warn(), etc., which have the logging priority built in, are now preferred over using printk() directly.
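A minimal sketch of the Hello World module rewritten with the pr_*() helpers follows. The pr_fmt() definition is an optional convention that prefixes every message with the module name; the function names and message text are placeholders of our own. It should build with the same Makefile as before, provided the file is still named hello.c.

```c
/* Must come before the includes so that the pr_*() helpers pick it up. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>

/* Entry point: logs one message at info level and one at warning level. */
static int __init hello_init(void)
{
        pr_info("loaded\n");                  /* same level as KERN_INFO */
        pr_warn("this is only a warning\n");  /* same level as KERN_WARNING */

        return 0;
}

/* Exit point: logs one message at info level. */
static void __exit hello_exit(void)
{
        pr_info("removed\n");
}

module_init(hello_init);
module_exit(hello_exit);

MODULE_DESCRIPTION("Hello World using the pr_*() helpers");
MODULE_AUTHOR("cs4118");
MODULE_LICENSE("GPL");
```

After loading and removing this module, dmesg -l warn should show the warning but not the two informational messages, while dmesg -l info should show the opposite.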

For more detailed usage of dmesg, please check its man pages.

Memory Leak Checking for Linux Kernel Modules

Be mindful of memory usage while developing kernel modules – your code shouldn’t leak memory.
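As an illustration of what a leak looks like in a module, the sketch below allocates a buffer in its entry point and releases it in its exit point. The function names, buffer size, and messages are made up for this example. Deleting the kfree() call would leave the buffer allocated after rmmod, which is the kind of bug a leak checker is meant to catch.

```c
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/slab.h>         /* kmalloc(), kfree() */

#define BUF_SIZE 128            /* hypothetical size, for illustration only */

static char *buf;

/* Entry point: allocate the buffer, or fail to load if memory is short. */
static int __init buf_init(void)
{
        buf = kmalloc(BUF_SIZE, GFP_KERNEL);
        if (!buf)
                return -ENOMEM; /* a nonzero return aborts the load */

        pr_info("allocated %d bytes\n", BUF_SIZE);

        return 0;
}

/* Exit point: every allocation made by the module is released here. */
static void __exit buf_exit(void)
{
        kfree(buf);             /* omitting this line leaks BUF_SIZE bytes */
        pr_info("buffer freed\n");
}

module_init(buf_init);
module_exit(buf_exit);

MODULE_DESCRIPTION("Pairs each kmalloc() with a kfree()");
MODULE_AUTHOR("cs4118");
MODULE_LICENSE("GPL");
```

Unlike a user process, the kernel does not reclaim a module's allocations when the module is removed, so each kmalloc() needs a matching kfree() on every path.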

We highly recommend using KEDR. This framework works especially well for us since we will often be developing only with kernel modules. Check out the KEDR wiki to see how to use it. There are also many StackOverflow posts on it.

Here’s how you can set up KEDR on your VM:

  1. Make sure you can successfully build kernel modules as explained above.

  2. Install CMake: sudo apt-get install -y cmake

  3. Run this one-liner to clone down the KEDR source, build it, and install it:

    cd ~ && rm -rf kedr && git clone https://github.com/cs4118/kedr && cd kedr/sources && mkdir build && cd build && cmake .. && make -j$(nproc) && sudo make install
    

Note that we’ll be using our own fork of KEDR instead of the official repo because KEDR’s maintainer hasn’t had time to add support for Linux 5.x+ and the arm64 architecture. Our fork of KEDR (linked in the command above) has been tested to work on both x86 and arm64 Debian 11+. Do not use the official KEDR repo, as it does not build on all platforms.

Note that if you’re on arm64, you’ll need to restart your VM to complete the installation.

Understanding Kernel Module Makefiles (Optional)

Previously, we provided you with a kernel module makefile that looks like the following:

obj-m += hello.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

This Makefile actually hooks into another, much more sophisticated Makefile: the top-level Linux kernel Makefile.

Let’s break down lines 3 and 5 first, where we’re (recursively) calling make.

The -C option changes the working directory to /lib/modules/$(shell uname -r)/build before make looks for a Makefile to read. The shell function in $(shell uname -r) tells make to run what follows as a shell command and substitute its output, rather than treating it as a Makefile variable. Try running the command in your VM; it will print the kernel release version like so:

$ uname -r
5.10.0-20-arm64

Plugging the kernel release version back into the original path, the resulting directory consists of a reduced kernel source tree. This directory acts as the workbench for building modules: it contains all the information a module needs for compilation and for hooking into your VM’s kernel. If you look at the Makefile under this directory, you can see it points to the actual kernel Makefile:

$ cd /lib/modules/5.10.0-20-arm64/build
$ ls
arch  include  Makefile  Module.symvers  scripts  tools
$ cat Makefile
include /usr/src/linux-headers-5.10.0-20-common/Makefile

Lines 3 and 5 of our original Makefile also have a special variable, M=$(PWD), which passes along the working directory that make was originally run from. Searching for M=dir in the kernel Makefile, we can find what this variable is used for:

#Use make M=dir or set the environment variable KBUILD_EXTMOD to 
#specify the directory of external module to build.
ifeq ("$(origin M)", "command line")
  KBUILD_EXTMOD := $(M)
endif

Ultimately, this variable tells the kernel Makefile to return from the “workbench” directory to the module source directory before building the modules or clean target.

Finally, let’s go back and break down line 1 of our original Makefile, which appends an object file name to a variable called obj-m. This variable belongs to the kernel build system and lists the object files to be built as loadable kernel modules. There is also another variable, obj-y, which is used when the object file should be linked into the main kernel image. Since we want our module to be dynamically loaded, obj-m is used here.


Acknowledgements

Adapted from Chapter 2 of Operating System Concepts by Abraham Silberschatz, Peter B. Galvin, and Greg Gagne; Programming Projects; Linux Kernel Modules.

KEDR was ported for Linux 5.x+ x86 and arm64 by the following TAs of COMS W4118 Operating Systems, Fall 2021, Columbia University: Hans Montero and Kent Hall.