System functionality

User space interfaces, System calls
Driver Model
buses, PCI
hardware interfaces, [re]booting

This article describes the infrastructure used to support and manage other kernel functionality. The functionality is named after system calls and sysfs.

User space communication

⚲ APIs:

kernel space API for user space
uapi inc
arch/x86/include/uapi src
man 2 ioctl
System calls
Device files
user space API for kernel space
linux/uaccess.h inc:
copy_to_user id
copy_from_user id
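
The ioctl interface listed above can be tried from user space without any special device. The following is a minimal sketch, assuming a Linux system where pipes answer the FIONREAD request; the helper names pending_bytes and pipe_pending_after_write are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>   /* ioctl(), FIONREAD */
#include <sys/types.h>
#include <unistd.h>      /* pipe(), write(), close() */

/* Ask the kernel how many unread bytes are queued on fd.
 * Returns 0 and stores the count on success, -1 on error.
 * (pending_bytes is a hypothetical helper for this sketch.) */
static int pending_bytes(int fd, int *count)
{
    assert(count != NULL);
    if (ioctl(fd, FIONREAD, count) != 0)
        return -1;
    return 0;
}

/* Write len bytes into a fresh pipe and report how many bytes the
 * kernel says are waiting on the read end. Returns -1 on any error. */
static int pipe_pending_after_write(const char *data, size_t len)
{
    int p[2];
    int count = -1;

    if (pipe(p) != 0)
        return -1;
    if (write(p[1], data, len) != (ssize_t)len ||
        pending_bytes(p[0], &count) != 0)
        count = -1;
    close(p[0]);
    close(p[1]);
    return count;
}
```

The request code and the pointer argument cross the user/kernel boundary through the system call; on the kernel side the handler typically uses the uaccess helpers to store the result in the caller's memory.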

📚 References

User-space API guides doc
User space
Linux kernel interfaces
ULK3 Chapter 11. Signals

System calls

Table of syscalls

⚙️ Internals:

linux/syscalls.h inc
syscall_init id
entry_SYSCALL_64 id
do_syscall_64 id
man 2 syscall
man 2 syscalls
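
As an illustration of the entry path listed above, a system call can be issued by number through the generic wrapper described in man 2 syscall. This is a minimal user-space sketch; raw_getpid is a hypothetical helper name, and the example assumes a Linux libc that provides SYS_getpid.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>  /* SYS_getpid */
#include <sys/types.h>
#include <unistd.h>       /* syscall(), getpid() */

/* Invoke the getpid system call by number instead of through the
 * dedicated libc wrapper. (raw_getpid is a hypothetical name.) */
static pid_t raw_getpid(void)
{
    long ret = syscall(SYS_getpid);

    assert(ret > 0);  /* getpid cannot fail */
    return (pid_t)ret;
}
```

Both raw_getpid() and the libc getpid() end up in the same kernel entry point, so they are expected to return the same value.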

📚 References

System call
Directory of system calls, man section 2
Anatomy of a system call, part 1 and part 2
syscalls ltp

💾 Historical

ULK3 Chapter 10. System Calls

Device files

Classic UNIX devices are character (char) devices, accessed as byte streams and controlled with man 2 ioctl.

⚲ API:

ls /dev
cat /proc/devices 
cat /proc/misc
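
The major and minor numbers shown by the commands above can also be read programmatically with man 2 stat. The following is a minimal sketch; struct dev_info and query_device are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor() */

/* Summary of a device node (hypothetical type for this sketch). */
struct dev_info {
    int is_char;            /* 1 for a character device, 0 otherwise */
    unsigned int major_num; /* identifies the driver */
    unsigned int minor_num; /* identifies the instance within the driver */
};

/* Fill *out for the node at path.
 * Returns 0 on success, -1 if the node cannot be examined. */
static int query_device(const char *path, struct dev_info *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    out->is_char = S_ISCHR(st.st_mode) ? 1 : 0;
    out->major_num = major(st.st_rdev);
    out->minor_num = minor(st.st_rdev);
    return 0;
}
```

For example, on Linux query_device("/dev/null", &info) reports a character device with major 1 and minor 3, which matches the "mem" entry in /proc/devices.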

Examples: misc_fops id, usb_fops id, memory_fops id

Allocated devices doc
drivers/char src - actually byte stream devices
ULK3 Chapter 13. I/O Architecture and Device Drivers


⚠️ Warning: confusion. hiddev is not a real human interface device! It reuses the USBHID infrastructure. hiddev is used, for example, for monitor controls and uninterruptible power supplies. The module supports these devices through a separate event interface on /dev/usb/hiddevX (char 180:96 to 180:111) (⚙️ HIDDEV_MINOR_BASE id).

⚲ API:

uapi/linux/hiddev.h inc

⚙️ Internals:

linux/hiddev.h inc
hiddev_event id
drivers/hid/usbhid/hiddev.c src, hiddev_fops id

📚 References:

HIDDEV - Care and feeding of your Human Interface Devices doc

📚 References:

Device file



📚 References

man 7 netlink
The Linux kernel user’s and administrator’s guide doc


procfs

The proc filesystem (procfs) is a special filesystem that presents information about processes and other system information in a hierarchical file-like structure. It provides a more convenient and standardized method for dynamically accessing process data held in the kernel than traditional tracing methods or direct access to kernel memory. Typically, it is mounted at /proc at boot time. The proc file system acts as an interface to internal data structures in the kernel. It can be used to obtain information about the system and to change certain kernel parameters at runtime.

/proc includes a directory for each running process, including kernel threads, named /proc/PID, where PID is the process number. Each directory contains information about one process, including, among others:

/proc/PID/cmdline - the command that originally started the process
/proc/PID/environ - the names and values of its environment variables
/proc/PID/cwd - a symlink to its working directory
/proc/PID/exe - a symlink to the original executable file, if it still exists
/proc/PID/fd - a directory with a symlink for each open file descriptor
/proc/PID/fdinfo - a directory describing the status (position, flags, ...) of each open file descriptor
/proc/PID/maps - information about mapped files and blocks like heap and stack
/proc/PID/mem - a binary image representing the process's virtual memory
/proc/PID/root - a symlink to the root path as seen by the process
/proc/PID/task - a directory containing one subdirectory per thread of the process
/proc/PID/status - basic information about the process, including its run state and memory usage
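
Because the per-process entries are plain text, they can be parsed with ordinary file I/O. The following is a minimal sketch that looks up one field in /proc/PID/status; proc_status_field is a hypothetical helper name, and the "Key:<tab>value" line layout is assumed from man 5 proc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Copy the value of `key` (for example "Pid" or "State") from
 * /proc/<pid>/status into buf. A pid of 0 or less means the calling
 * process (/proc/self). Returns 0 on success, -1 if the file or the
 * key is missing. (Hypothetical helper for this sketch.) */
static int proc_status_field(pid_t pid, const char *key, char *buf, size_t size)
{
    char path[64];
    char line[256];
    size_t klen;
    FILE *f;

    assert(key != NULL && buf != NULL && size > 0);
    klen = strlen(key);
    if (pid > 0)
        snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    else
        snprintf(path, sizeof path, "/proc/self/status");

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Each line is expected to look like "Key:\tvalue\n". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            const char *val = line + klen + 1;

            val += strspn(val, " \t");         /* skip the separator */
            snprintf(buf, size, "%s", val);
            buf[strcspn(buf, "\n")] = '\0';    /* drop the newline */
            fclose(f);
            return 0;
        }
    }
    fclose(f);
    return -1;
}
```

For example, proc_status_field(0, "Pid", buf, sizeof buf) stores the caller's own process number as a decimal string.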

📚 References

man 5 procfs
man 7 namespaces
man 7 capabilities
linux/proc_fs.h inc
fs/proc src


sysfs

sysfs is a pseudo-file system that exports information about various kernel subsystems, hardware devices, and associated device drivers from the kernel's device model to user space through virtual files. Besides providing this information, the exported virtual files are also used to configure those devices and subsystems. Sysfs is designed to export the information present in the device tree, so that it no longer clutters up procfs.

Sysfs is mounted under the /sys mount point.
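
Most sysfs attributes hold a single short text value, so reading one takes only a few lines. The following is a minimal sketch; read_sysfs_attr is a hypothetical helper name, and the path in the usage note is an assumption about a typical Linux system rather than something every kernel configuration guarantees.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read a single-value sysfs attribute into buf and strip the trailing
 * newline. Returns the length of the value, or -1 if the file cannot
 * be opened or is empty. (Hypothetical helper for this sketch.) */
static int read_sysfs_attr(const char *path, char *buf, size_t size)
{
    FILE *f;

    assert(path != NULL && buf != NULL && size > 1);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return (int)strlen(buf);
}
```

For example, read_sysfs_attr("/sys/devices/system/cpu/online", buf, sizeof buf) usually yields a CPU range such as the list of online CPUs. Writable attributes are configured the same way in reverse, by writing a text value to the file.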

⚲ API:

linux/sysfs.h inc

📚 References

man 5 sysfs
sysfs - The filesystem for exporting kernel objects doc
fs/sysfs src


devtmpfs

devtmpfs is a hybrid kernel/user space device filesystem that provides device nodes before udev runs for the first time.

📚 References

Device file
drivers/base/devtmpfs.c src

Driver Model

Also called the Device Model, or just DM. The DM core structure consists of DM classes, DM buses, DM drivers and DM devices.

⚲ Infrastructure API:

linux/kobject.h inc


Classes

A class is a higher-level view of a device that abstracts out low-level implementation details. Drivers may see NVMe storage or SATA storage, but at the class level they are all simply block_class id devices. Classes allow user space to work with devices based on what they do, rather than how they are connected or how they work. The general structure of DM classes matches the composite pattern.

⚲ API:

ls /sys/class/
class_register id registers class id
linux/device/class.h inc

👁 Examples: input_class id, block_class id, net_class id


Buses

A peripheral bus is a channel between the processor and one or more peripheral devices. A DM bus is a proxy for a peripheral bus. The general structure of DM buses matches the composite pattern. For the purposes of the device model, all devices are connected via a bus, even if it is an internal, virtual platform_bus_type id. Buses can plug into each other: a USB controller is usually a PCI device, for example. The device model represents the actual connections between buses and the devices they control. A bus is represented by the bus_type id structure. It contains the name, the default attributes, the bus's methods, PM operations, and the driver core's private data.

⚲ API:

ls /sys/bus/
bus_register id registers bus_type id
linux/device/bus.h inc

👁 Examples: usb_bus_type id, hid_bus_type id, pci_bus_type id, scsi_bus_type id, platform_bus_type id

Peripheral buses


Drivers

⚲ API:

ls /sys/bus/*/drivers/
module_driver id - simple common driver initializer, 👁 for example used in module_pci_driver id
driver_register id registers device_driver id - the basic device driver structure, one shared by all device instances
linux/device/driver.h inc

👁 Examples: hid_generic id, usb_register_device_driver id

Platform drivers

module_platform_driver id registers platform_driver id (platform wrapper of device_driver id) with platform_bus_type id
linux/platform_device.h inc

👁 Examples: gpio_mouse_device_driver id


Devices

⚲ API:

ls /sys/devices/
device_register id registers device id - the basic device structure, one per device instance
linux/device.h inc
linux/dev_printk.h inc

👁 Examples: platform_bus id, mousedev_create

Platform devices

platform_device id - platform wrapper of struct device (the basic device structure doc); contains the resources associated with the device
It can be created dynamically by platform_device_register_simple id or platform_device_alloc id, or registered with platform_device_register id.
platform_device_unregister id - releases the device and its associated resources

👁 Examples: add_pcspkr id

⚲ API: 🔧 TODO

platform_device_info platform_device_id platform_device_register_full platform_device_add
platform_device_add_data platform_device_register_data platform_device_add_resources
attribute_group dev_pm_ops

⚙️ Internals:

linux/dev_printk.h inc
lib/kobject.c src
drivers/base/platform.c src
drivers/base/core.c src

📚 References

Device drivers infrastructure doc
Everything you never wanted to know about kobjects, ksets, and ktypes doc
Driver Model doc
The Linux Kernel Device Model doc
Platform Devices and Drivers doc
Linux Device Model, by linux-kernel-labs


Modules

Article about modules
cat /proc/modules
kernel/kmod.c src

📚 References

LDD3: Building and Running Modules
The Linux Kernel Module Programming Guide

Peripheral buses

⚲ API:

Shell interface: ls /proc/bus/ /sys/bus/

See also Buses of Driver Model

See Input: keyboard, mouse etc


PCI

⚲ Shell API:

lspci -vv
column -t /proc/bus/pci/devices

Main article: PCI


USB

⚲ Shell API:

lsusb -v
ls /sys/bus/usb/
cat /proc/bus/usb/devices

⚙️ Internals:

drivers/usb src

📚 References:

USB doc
LDD3:USB Drivers

Other buses

drivers/bus src

Buses for 🤖 embedded devices:

linux/gpio/driver.h inc linux/gpio.h inc drivers/gpio src tools/gpio src
drivers/i2c src


SPI

⚲ API:

linux/spi/spi.h inc
tools/spi src

⚙️ Internals:

drivers/spi src
spi_register_controller id
spi_controller_list id 🚧

📚 References:

SPI doc

Hardware interfaces

I/O ports and registers

⚲ API:

linux/regmap.h inc - register map access API

asm-generic/io.h inc - generic I/O port emulation.

ioport_map id
ioread32 id / iowrite32 id ...
readl id/ writel id ...
The {in,out}[bwl] macros are for emulating x86-style PCI/ISA IO space:
inl id/ outl id ...

linux/ioport.h inc - definitions of routines for detecting, reserving and allocating system resources.

request_mem_region id

arch/x86/include/asm/io.h src

Functions for memory mapped registers:

ioremap id ...

Hardware Device Drivers

Keywords: firmware, hotplug, clock, mux, pin

⚙️ Internals:

drivers/acpi src
drivers/base src
drivers/sdio src - Secure Digital Input Output
drivers/virtio src
drivers/hwmon src
drivers/thermal src
drivers/pinctrl src
drivers/clk src

📚 References

Pin control subsystem doc
Linux Hardware Monitoring doc
Firmware guide doc
Devicetree doc
LDD3:The Linux Device Model

Booting and halting

Kernel booting

The kernel is loaded in two stages. In the first stage the kernel (as a compressed image file) is loaded into memory and decompressed, and a few fundamental functions such as essential hardware and basic memory management (memory paging) are set up. Control is then switched one final time to the main kernel start process, which calls start_kernel id. That function performs the majority of system setup (interrupts, the rest of memory management, device and driver initialization, etc.) before separately spawning the idle process and scheduler, and the init process (which is executed in user space).

Kernel loading stage

The kernel as loaded is typically an image file in either zImage or bzImage format, compressed with zlib. A routine at the head of it does a minimal amount of hardware setup, decompresses the image fully into high memory, and takes note of any RAM disk if configured. It then starts the kernel via startup_64 (for the x86_64 architecture).

arch/x86/boot/compressed/ src - linker script defines entry startup_64 id in
arch/x86/boot/compressed/head_64.S src - assembly of extractor
extract_kernel id - extractor written in C; it prints the following console messages:
Decompressing Linux... done.
Booting the kernel.

Kernel startup stage

The startup function for the kernel (also called the swapper or process 0) establishes memory management (paging tables and memory paging), detects the type of CPU and any additional functionality such as floating point capabilities, and then switches to non-architecture specific Linux kernel functionality via a call to start_kernel id.

↯ Startup call hierarchy:

arch/x86/kernel/ src - linker script
arch/x86/kernel/head_64.S src - assembly of uncompressed startup code
arch/x86/kernel/head64.c src - platform-dependent startup:
  x86_64_start_kernel id
    x86_64_start_reservations id
      init/main.c src - main initialization code
        start_kernel id 200 SLOC
          mm_init id
            mem_init id
            vmalloc_init id
          sched_init id
          rcu_init id - Read-copy-update
          rest_init id
            kernel_init id - deferred kernel thread #1
              kernel_init_freeable id - this and the following functions are defined with attribute __init id
                prepare_namespace id
                  initrd_load id
                  mount_root id
              run_init_process id - obviously runs the first process man 1 init
            kthreadd id - deferred kernel thread #2
            cpu_startup_entry id
              do_idle id

start_kernel id executes a wide range of initialization functions. It sets up interrupt handling (IRQs), further configures memory, starts the man 1 init process (the first user-space process), and then starts the idle task via cpu_startup_entry id. Notably, the kernel startup process also mounts the initial ramdisk (initrd) that was loaded previously as the temporary root file system during the boot phase. The initrd allows driver modules to be loaded directly from memory, without reliance upon other devices (e.g. a hard disk) and the drivers that are needed to access them (e.g. a SATA driver). This split of some drivers statically compiled into the kernel and other drivers loaded from initrd allows for a smaller kernel. The root file system is later switched via a call to man 8 pivot_root / man 2 pivot_root which unmounts the temporary root file system and replaces it with the use of the real one, once the latter is accessible. The memory used by the temporary root file system is then reclaimed.

⚙️ Internals:

arch/x86/Kconfig.debug src

📚 References:

Article about booting of the kernel
Initial RAM disk doc
Linux startup process
Linux (U)EFI boot process
The kernel’s command-line parameters doc
Boot time memory management doc
Kernel booting process
Kernel initialization process

💾 Historical

Halting or rebooting

⚲ API:

linux/reboot.h inc
reboot_mode id
sys_reboot id calls
machine_restart id or
machine_halt id or
machine_power_off id
linux/reboot-mode.h inc
reboot_mode_driver id
devm_reboot_mode_register id

⚙️ Internals:

kernel/reboot.c src
arch/x86/kernel/reboot.c src
Softdog Driver

Power management

Keywords: suspend, alarm, hibernation.

⚲ API:

⌨ hands-on:
sudo awk '{gsub("^ ","?")} NR>1 {if ($6) {print $1}}' /sys/kernel/debug/wakeup_sources
linux/pm.h inc
linux inc
linux/pm_qos.h inc
linux/pm_clock.h inc
linux/pm_domain.h inc
linux/pm_wakeirq.h inc
linux/pm_wakeup.h inc
wakeup_source id
wakeup_source_register id
linux/suspend.h inc
pm_suspend id suspends the system
Suspend and wakeup depend on:
man 2 timer_create and man 2 timerfd_create with clock ids CLOCK_REALTIME_ALARM id or CLOCK_BOOTTIME_ALARM id will wake the system if it is suspended.
man 2 epoll_ctl with flag EPOLLWAKEUP id blocks suspend
See also man 7 capabilities CAP_WAKE_ALARM id, CAP_BLOCK_SUSPEND id
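
The alarm clocks mentioned above can be exercised with man 2 timerfd_create. The following is a minimal sketch, assuming a Linux libc that exposes CLOCK_BOOTTIME_ALARM; arm_wake_timer and wait_timer are hypothetical helper names. Without CAP_WAKE_ALARM the alarm clock is refused, so the sketch falls back to CLOCK_MONOTONIC, which cannot wake a suspended system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Create a one-shot timer fd that fires after `ms` milliseconds.
 * CLOCK_BOOTTIME_ALARM is tried first; if the kernel refuses it
 * (for example because CAP_WAKE_ALARM is missing), CLOCK_MONOTONIC
 * is used instead. *wakes is set to 1 only when the alarm clock was
 * obtained. Returns the fd, or -1 on failure.
 * (Hypothetical helper for this sketch.) */
static int arm_wake_timer(long ms, int *wakes)
{
    struct itimerspec its = {0};
    int fd;

    assert(ms > 0 && wakes != NULL);
    fd = timerfd_create(CLOCK_BOOTTIME_ALARM, TFD_CLOEXEC);
    *wakes = (fd >= 0);
    if (fd < 0)
        fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
    if (fd < 0)
        return -1;

    its.it_value.tv_sec = ms / 1000;
    its.it_value.tv_nsec = (ms % 1000) * 1000000L;
    if (timerfd_settime(fd, 0, &its, NULL) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until the timer expires.
 * Returns the expiration count, or 0 on error. */
static uint64_t wait_timer(int fd)
{
    uint64_t expirations = 0;

    if (read(fd, &expirations, sizeof expirations) !=
        (ssize_t)sizeof expirations)
        return 0;
    return expirations;
}
```

A caller would arm the timer, let the system suspend, and then block in wait_timer(); whether the machine is actually woken depends on the clock that was obtained and on the platform's RTC support.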

⚙️ Internals:

kernel/power src
alarm_init id
kernel/time/alarmtimer.c src
drivers/base/power src: wakeup_sources id

📚 References:

PM administration doc
CPU and Device PM doc
Power Management doc
tlp - apply laptop power management settings
ACPI - Advanced Configuration and Power Interface

Runtime PM

Keywords: runtime power management, device power management, opportunistic suspend, autosuspend, autosleep.

⚲ API:

Per-device sysfs attributes under /sys/devices/.../power/: async autosuspend_delay_ms control runtime_active_kids runtime_active_time runtime_enabled runtime_status runtime_suspended_time runtime_usage
linux/pm_runtime.h inc
pm_runtime_mark_last_busy id
pm_runtime_enable id
pm_runtime_disable id
pm_runtime_get id - asynchronous get
pm_runtime_get_sync id
pm_runtime_resume_and_get id - preferable synchronous get
pm_runtime_put id
pm_runtime_put_noidle id - just decrement usage counter
pm_runtime_put_sync id
pm_runtime_put_autosuspend id

👁 Example: ac97_pm id

⚙️ Internals:

📚 References:

Runtime Power Management Framework for I/O Devices doc
Sysfs devices PM API
Power Management for USB
Opportunistic suspend

Building and Updating