Core Knowledge That a Modern Linux Kernel Developer Should Have

By Contributor, September 27, 2023

Languages

The Linux kernel is written in C, so C is the most important language for a kernel developer. More precisely, the kernel is written in GNU C, which extends standard C with additional keywords, builtins, and attributes. For a long time it could only be built with GCC, but it can now also be built with Clang/LLVM. I would recommend learning a modern C standard such as C11 and, in addition, the GNU extensions, so that you can read kernel code effectively. Small, architecture-specific parts of the kernel and some highly optimized parts of several drivers are written in assembly language, which makes assembly the second language of choice. The three main architectures today are x86, ARM, and RISC-V; which assembly language to learn depends on your hardware platform.
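To make the GNU extensions concrete, here is a small userspace sketch of three that kernel code relies on heavily: `typeof` combined with statement expressions for type-generic macros, `__builtin_expect()` behind `likely()`/`unlikely()`, and `__attribute__((packed))`. The `likely`/`unlikely` names mirror the kernel's, but `gnu_max`, `struct wire_hdr`, and `clamp_non_negative` are hypothetical illustrations, and the kernel's real `min()`/`max()` macros add type checking that is omitted here. It assumes GCC or Clang.

```c
#include <stdint.h>

/* Branch-prediction hints modeled on the kernel's likely()/unlikely(),
 * built on the GCC/Clang builtin __builtin_expect(). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical type-generic max(): __typeof__ (spelled typeof in kernel
 * code) copies each argument's type, and the ({ ... }) statement
 * expression yields the value of its last expression. Each argument is
 * evaluated exactly once, unlike a naive ((a) > (b) ? (a) : (b)). */
#define gnu_max(a, b) ({            \
        __typeof__(a) _a = (a);     \
        __typeof__(b) _b = (b);     \
        _a > _b ? _a : _b;          \
})

/* Hypothetical on-the-wire header: packed removes the padding the
 * compiler would otherwise insert before the 32-bit field. */
struct __attribute__((packed)) wire_hdr {
        uint8_t  type;
        uint32_t len;
};

/* Clamp negative values to zero, hinting that negatives are rare. */
static int clamp_non_negative(int v)
{
        if (unlikely(v < 0))
                return 0;
        return v;
}
```

Without `packed`, `sizeof(struct wire_hdr)` would typically be 8 on common ABIs because of alignment padding after `type`.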

You should definitely look at Rust as well: it is gaining popularity in the kernel community as a safer and more reliable alternative to C, and initial support for writing kernel code in Rust was merged in Linux 6.1.

Linux is a highly configurable system, and its configurability is built on the kernel build system, Kbuild, together with the Kconfig configuration language. Every developer should know the basics of Kbuild and Make to be able to extend or modify the kernel code successfully. Last, but not least, is shell scripting. It is hard to imagine kernel development without the command line, and a developer inevitably ends up writing shell scripts that automate repetitive tasks.

Software environment

Linux kernel development is inextricably linked to the Git source control system, which Linus Torvalds originally created in 2005 to manage the kernel source. Today's kernel development workflow is unimaginable without it, so Git knowledge is a requirement.

Unless a developer's work is tied to specific or custom hardware, emulation is their best friend. The most popular platform for this is QEMU/KVM. A typical workflow looks like this: a developer changes the kernel or a driver, builds it, copies it into a virtual machine, and tests it there. If everything works, the changes are then tested on real hardware; if something goes wrong, only the kernel inside the virtual machine crashes. In that case it is easy to shut down the VM, fix the error, and repeat the development/debug cycle. Without virtualization, the developer would have to reboot a physical machine after every kernel crash, and development time would increase by an order of magnitude.

Unlike userspace, the kernel has limited debugging capabilities. The most popular kernel debugging method was (and often still is) inserting printk() calls into the code in question. printk() stores its output in the kernel's circular log buffer, which can then be read and followed from userspace with the dmesg -kw command. The in-kernel ftrace framework, introduced in kernel 2.6.27, has been developed continuously since then and is now comprehensive and robust. It offers many ways to trace the kernel and many output formats. Its most popular use is tracing kernel function calls, for the whole kernel or only for selected functions or modules, with the output presented in a special file under tracefs (usually mounted at /sys/kernel/tracing). It saves developers hours and hours of debugging, and it has almost no overhead when inactive. Every modern kernel developer should be aware of ftrace.

Developers often find that their kernel module is simply slow. This is where perf helps: it pairs an in-kernel profiling framework with a userspace tool for analyzing kernel performance. The most sophisticated and flexible tool for gathering kernel runtime information is the eBPF framework, which lets user-supplied programs run inside the kernel (after they pass the in-kernel verifier) and pass information back to userspace. By enabling user-defined kernel telemetry, eBPF has revolutionized kernel observability.

One of the main domains of kernel development is embedded development. Indeed, most embedded devices, from smart-home IoT gadgets to Android smartphones, carry a flavor of the Linux kernel on board. There are two main build systems in the embedded world: Buildroot and Yocto. The former is simpler and more straightforward; the latter is more flexible and sophisticated. Both are intended to build a highly customized Linux distribution, that is, the Linux kernel plus a set of userspace software, tailored to a particular hardware board. An embedded developer must also be able to create and update device tree source (.dts) files, which describe the hardware components on a board. The main bootloader in the embedded world is U-Boot, and knowledge of it is also a requirement. On the userspace side, one of the simplest and best-known minimalistic toolsets is BusyBox, which contains only the minimal set of necessary utilities. It is very small and therefore convenient for both embedded and emulated (QEMU/KVM) development.

Last but not least, the development environment. Many Linux kernel developers use the vim (or Emacs) text editor in the terminal, tmux as a modern and convenient terminal multiplexer, and cscope to build a cross-reference of the kernel source code (the kernel's make cscope target generates the database).

Linux Kernel Core Concepts

The technical skills of Linux kernel development fall into two categories: general and domain-specific. Every kernel developer should have the general skills, while the domain-specific ones are needed by developers working in a particular area, such as networking, storage, virtualization, cryptography, or embedded systems. The Linux kernel is huge, and it is impossible to know every part at the same level of detail.

Let’s start with general skills:

Kernel coding style – The Linux kernel has its own coding style (Documentation/process/coding-style.rst), which can vary slightly from one subsystem to another. It is always a good idea to check your code periodically with the scripts/checkpatch.pl script in the kernel source tree.
Kernel coding patterns – The kernel has a set of recommended coding patterns. The best known is using goto to release already-acquired resources when a multi-step initialization fails partway through.
Kernel internal data structures – Several data structures are used throughout the kernel, and every developer must know them: singly and doubly linked lists, queues, hash tables, binary trees, red-black trees, maple trees, and so on.
Synchronization primitives – Since multi-core CPUs became commodity hardware in the 2000s, every kernel developer must write code with concurrency in mind. The Linux kernel has many synchronization primitives, each for a different purpose: atomic operations, spinlocks, semaphores, mutexes, RCU (read-copy-update, a lockless synchronization mechanism for read-mostly data), and so on.
Interrupt handling: top and bottom halves – Linux splits interrupt handling into two parts, the top half and the bottom half. The top half handles the interrupt as quickly as possible and returns, while the bottom half is deferred work that further processes the results delivered by the top half. For example, a network driver's top half quickly moves a newly arrived packet from the network card into main memory and returns, ready for the next one. The bottom half, which runs later as deferred work, examines the received packet, populates some fields, creates the appropriate data structures, and passes it to the kernel's networking stack. Every developer must be aware of this scheme and design their interrupt handlers accordingly.
Deferred work – Postponing part of a job until later is a common situation in kernel development, and interrupts, described in the previous point, are a good example. The kernel offers several deferred-work mechanisms for different situations: softirqs, tasklets, workqueues, and so on (the older task queues were removed during the 2.5 development series).
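Memory management – Kernel developers should be aware of two layers of memory management. The lowest layer is the page (buddy) allocator, which hands out physically contiguous pages through functions such as alloc_pages() and __free_pages(). The slab layer is built on top of it and keeps objects of the same size in dedicated caches (created with kmem_cache_create()) to reduce fragmentation and allocation overhead; the general-purpose kmalloc()/kfree() functions are themselves largely served from a set of slab caches of different sizes.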
Virtual File System – Regardless of the type of the underlying filesystem (ext3, ext4, ZFS, Lustre, XFS, etc.), the kernel maintains a uniform interface on top of it, the VFS. It is worth acquiring a general knowledge of the VFS, since filesystem interaction is one of the most common means of communication between the kernel and userspace.
Scheduler – The scheduler manages all the tasks in the system, both kernel threads and userspace processes, deciding which one runs on each CPU and when. Every developer must know its basics.
System call interface – The main channel of communication between the kernel and userspace is the system call interface. The C library (libc) wraps it and provides more convenient functions, but sometimes a system call must be invoked directly, for example when libc has no wrapper for it, so both userspace and kernel programmers should know how to do this. In rare cases, a kernel developer may want to add a new system call. Tracing system calls and their arguments with ftrace or strace is useful knowledge as well.
/sys and /proc directories – The second most popular way for the kernel and userspace to interact is through filesystems: a great deal of information and many settings are exposed in the /sys (sysfs) and /proc (procfs) directories. It is worth learning their structure.
Loadable Kernel Modules – One of a kernel developer's main occupations is writing drivers for new hardware devices. Linux drivers are usually built as loadable kernel modules: binaries in a special format (.ko files) compiled from source code that follows a particular structure. Modules can be loaded and unloaded without rebooting the system. The developer should know the general structure of kernel modules, as well as the additional rules for character, block, and network devices. They should also know the ways a driver can communicate with userspace: sysfs attributes, memory mapping (mmap), module parameters, and so on.
Udev – Driver developers should be aware of udev, the userspace device manager that reacts to kernel events and can run user-defined rules and scripts when a device is hotplugged.
Fault injection framework – Lets you exercise rarely executed error paths by making normally reliable operations fail on purpose, for example by forcing memory allocations such as kmalloc() to fail.
Kernel sanitizers – KASAN, KMSAN, and others are dynamic tools that catch bugs such as memory corruption (KASAN) and use of uninitialized memory (KMSAN) at runtime. It is always worth loading a new kernel module into a sanitizer-enabled kernel, running a workload, and trying to catch subtle dynamic bugs.
Locking correctness validator – Parts of the kernel and of modules implement sophisticated locking schemes, which often lead to deadlocks and livelocks that are hard to debug. The lockdep runtime validator reports lock-ordering violations and other locking mistakes, often before they cause an actual deadlock, saving hours of debugging effort.
Kdump/Kexec – Some situations are almost impossible to debug interactively, for example a kernel crash that takes down the whole system. This is where kdump and kexec help: kexec boots a new kernel directly from the running one, and kdump uses it to start a pre-loaded second (capture) kernel when the main kernel crashes. The capture kernel then saves a dump of the crashed kernel's memory for further analysis.
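The goto-based cleanup pattern from the coding-patterns item is easiest to see in code. The following userspace sketch mimics a driver's multi-step initialization; `struct demo_dev`, `demo_alloc()`, `demo_free()`, and the `fail_at` knob are hypothetical stand-ins, with `fail_at` loosely playing the role the kernel's fault injection framework plays for real allocators (forcing a chosen allocation to fail so the error path actually runs). The labels are ordered so that each failure jumps to the point that releases exactly what has already been acquired, in reverse order.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device state built up in three steps. */
struct demo_dev {
        char *buf;
        int  *table;
        char *name;
};

static int fail_at;      /* 1-based allocation attempt to fail; 0 = never */
static int alloc_calls;  /* allocation attempts made so far */
static int live_allocs;  /* allocations made but not yet freed */

/* malloc() wrapper with a crude, hypothetical fault-injection hook. */
static void *demo_alloc(size_t size)
{
        void *p;

        if (++alloc_calls == fail_at)
                return NULL;
        p = malloc(size);
        if (p)
                live_allocs++;
        return p;
}

static void demo_free(void *p)
{
        if (p) {
                free(p);
                live_allocs--;
        }
}

/* Multi-step init: each failure jumps to the label that undoes exactly
 * the steps that already succeeded, in reverse order. */
static int demo_dev_init(struct demo_dev *dev)
{
        dev->buf = demo_alloc(4096);
        if (!dev->buf)
                goto out;

        dev->table = demo_alloc(64 * sizeof(*dev->table));
        if (!dev->table)
                goto out_free_buf;

        dev->name = demo_alloc(16);
        if (!dev->name)
                goto out_free_table;

        strcpy(dev->name, "demo0");
        return 0;

out_free_table:
        demo_free(dev->table);
out_free_buf:
        demo_free(dev->buf);
out:
        return -ENOMEM;   /* kernel style: negative errno on failure */
}

/* Teardown releases everything in the reverse order of acquisition. */
static void demo_dev_exit(struct demo_dev *dev)
{
        demo_free(dev->name);
        demo_free(dev->table);
        demo_free(dev->buf);
}
```

Setting `fail_at` to 1, 2, or 3 drives each error path in turn; in every case `live_allocs` should return to zero, which is exactly the property the pattern is meant to guarantee.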
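The doubly linked list from the data-structures item is worth seeing in miniature, because kernel lists are intrusive: a `struct list_head` link is embedded inside each object, and `container_of()` recovers the enclosing object from a pointer to that link. The sketch below is a simplified userspace re-creation that borrows the kernel's names; the real include/linux/list.h provides many more helpers (such as `list_for_each_entry()`) and poisons deleted entries instead of setting them to NULL. `struct packet` and `sum_ids()` are hypothetical.

```c
#include <stddef.h>

/* Simplified, userspace-only version of the kernel's list link. */
struct list_head {
        struct list_head *next, *prev;
};

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty list is a head that points to itself in both directions. */
static void INIT_LIST_HEAD(struct list_head *head)
{
        head->next = head;
        head->prev = head;
}

/* Insert @new just before @head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
        new->prev = head->prev;
        new->next = head;
        head->prev->next = new;
        head->prev = new;
}

/* Unlink @entry; the kernel writes poison values here instead of NULL. */
static void list_del(struct list_head *entry)
{
        entry->prev->next = entry->next;
        entry->next->prev = entry->prev;
        entry->next = entry->prev = NULL;
}

/* Hypothetical payload type: the link lives inside the object itself. */
struct packet {
        int id;
        struct list_head node;
};

/* Walk the embedded links and sum the ids of the enclosing packets. */
static int sum_ids(struct list_head *head)
{
        int sum = 0;

        for (struct list_head *pos = head->next; pos != head; pos = pos->next)
                sum += container_of(pos, struct packet, node)->id;
        return sum;
}
```

Because the link is embedded, adding an object to a list needs no extra allocation, and one object can sit on several lists at once by embedding several `list_head` members.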
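For the system call item, here is a minimal userspace sketch of invoking system calls directly through glibc's `syscall()` function, using the `SYS_*` numbers from <sys/syscall.h>. gettid is a classic case for this, since glibc only added a `gettid()` wrapper in version 2.30. The helper names `raw_getpid()` and `raw_gettid()` are hypothetical, and the sketch assumes Linux with glibc.

```c
#define _GNU_SOURCE             /* make sure glibc declares syscall() */
#include <sys/syscall.h>        /* SYS_getpid, SYS_gettid */
#include <sys/types.h>
#include <unistd.h>

/* Same result as getpid(), but entering the kernel without the libc
 * getpid() wrapper. */
static pid_t raw_getpid(void)
{
        return (pid_t)syscall(SYS_getpid);
}

/* Kernel thread ID of the calling thread; for a process's initial
 * thread it equals the process ID. */
static pid_t raw_gettid(void)
{
        return (pid_t)syscall(SYS_gettid);
}
```

Called from a program's main thread, both helpers should return the same value as `getpid()`; from other threads, `raw_gettid()` returns a distinct thread ID.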

When it comes to specific, domain-related skills, the only right answer is: it depends. The Linux kernel contains many specialized frameworks. Just one example: for embedded development, it is worth learning the I2C (SMBus), SPI, and GPIO frameworks.

Userspace tools

Kernel developers should have some userspace knowledge and use common tools such as:

bash (or an alternative shell) – Building the kernel, scripting routine actions, and so on.
ssh (Secure Shell) – Used to log in and work on the target machine, be it a remote network machine, a VM, or an embedded device.
tmux (terminal multiplexer) – Supports multi-terminal configurations, which makes it a convenient tool for kernel development: one window can show the kernel build log, a second can hold a remote ssh shell, and a third can have the vim editor open.
minicom – The main tool for working with embedded boards that have no Ethernet/Wi-Fi/Bluetooth modules. The Linux kernel can be configured to create a UART console during early boot. A developer with a UART-to-USB adapter can connect its RX and TX wires (plus ground) to the appropriate pins on the board, plug the adapter into their laptop, run minicom, and get a console connection.
vim – The editor of choice for many kernel developers. Besides, many servers have no GUI, and a terminal editor such as Vim is often the only option there.
gdb – Useful for analyzing kernel oops errors. Given the faulting instruction address from an oops and the uncompressed kernel image with debugging symbols (vmlinux), you can map the address back to source code and walk the stack trace of the error.

Soft skills

Passion – Kernel development is hard and painstaking work. It is impossible to succeed at it without being inspired.
Patience – This type of work requires patience in every sense. First, it is hard work that demands careful code design and debugging, sometimes hours of effort to find and fix a small bug. Second, the ever-changing nature of the kernel requires keeping up with the latest changes and being ready to update your code accordingly, including code that has already been accepted into the mainline kernel. Third, working with the community is a demanding and sometimes contentious process. Getting your ideas across is not easy.
Persistence – Kernel developers need persistence, both in constant learning and in communicating with the community, if they want their code accepted into the mainline kernel.

Illuminating Your Console: Enhancing Your Linux Command Line Experience with ccat

by George Whittaker

Introducing ccat

ccat stands for “colorized cat.” It’s a simple yet powerful tool that, like the traditional cat command, reads files sequentially, writing them to standard output. However, the ccat command adds a visual advantage – color-coding. It makes your command-line experience more user-friendly, improving the readability and understanding of your code.

Installing `ccat`

Before diving in, you need to ensure you have ccat installed on your system. This process varies based on the Linux distribution you’re using, but here are the most common methods:

For Ubuntu, Debian, and derivatives, the process begins by downloading the latest .deb package from the official ccat GitHub repository, which can be found at: https://github.com/jingweno/ccat. After downloading the package, you can install it using the dpkg command:

sudo dpkg -i /path/to/downloaded_file.deb

For Arch Linux and Manjaro, use the following commands to download, build, and install the ccat package from the AUR:

git clone https://aur.archlinux.org/ccat.git
cd ccat
makepkg -si

For other distributions, you can build ccat from source. To do so, ensure you have Go installed on your system, clone the ccat repository, then build and install:

git clone https://github.com/jingweno/ccat.git
cd ccat
go build
sudo mv ccat /usr/local/bin/

Using `ccat`

Now that you have ccat installed, let’s see it in action. The usage of ccat follows the same pattern as the cat command, replacing cat with ccat:

ccat file_name

You will notice that different types of text (such as comments, keywords, and strings) are colorized differently, providing more visually pleasing and organized output. For example, comments might be displayed in blue, keywords in bold yellow, and strings in green.

If you want to use ccat as your default cat command, you can create an alias. Add the following line to your .bashrc or .zshrc file:

alias cat='ccat'

Remember to source the .bashrc/.zshrc file after updating it or simply close and reopen your terminal.

Customizing `ccat`

Customization is a key benefit of ccat. You can adjust color settings for different types of text in your output, tailoring them to your preference.

How to Extend or Resize KVM Virtual Machine Disk Size

The post How to Extend or Increase KVM Virtual Machine (VM) Disk Size first appeared on Tecmint: Linux Howtos, Tutorials & Guides.

KVM virtualization technology supports various disk image formats. Two of the most popular and widely used disk formats are qcow2 and raw disk images. The

How to Install VLC Media Player in Fedora

The post How to Install VLC Media Player in Fedora 38 first appeared on Tecmint: Linux Howtos, Tutorials & Guides.

VLC is a free and open source, popular, and cross-platform multimedia player and framework that plays files, discs, webcams, devices as well as streams. VLC

24 Best Open Source Linux Text Editors in 2023 (GUI + CLI)

The post 24 Best Open Source Text Editors for Linux in 2023 first appeared on Tecmint: Linux Howtos, Tutorials & Guides.

Text editors can be used for writing code, editing text files such as configuration files, creating user instruction files, and many more. In Linux operating

How to Create a LXC Container in Proxmox

The post How to Create Containers in Proxmox first appeared on Tecmint: Linux Howtos, Tutorials & Guides.

In the previous lectures, we learned how to install Proxmox on Debian and also how to create virtual machines. In this tutorial, we will see

Linux Threat Report: Earth Lusca Deploys Novel SprySOCKS Backdoor in Attacks on Government Entities

The threat actor Earth Lusca, linked to Chinese state-sponsored hacking groups, has been observed utilizing a new Linux backdoor dubbed SprySOCKS to target government organizations globally. As initially reported in January 2022 by Trend Micro, Earth Lusca has been active since at least 2021 conducting cyber espionage campaigns against public and private sector targets in…
