101.2 Boot the System
Candidates should be able to guide the system through the booting process.
Key Knowledge Areas
- Provide common commands to the boot loader and options to the kernel at boot time.
- Demonstrate knowledge of the boot sequence from BIOS to boot completion.
- Check boot events in the log files.
When an x86 computer starts up, it follows a predefined set of steps to boot the operating system. On start-up the CPU begins executing at a fixed address that points into the BIOS firmware. The BIOS performs some checks (the power-on self-test, POST) and initializes hardware before locating the configured boot device. The boot device is configured in the BIOS user interface by setting the boot order of attached mass storage devices.
Once the boot device is located, the BIOS loads the Master Boot Record (MBR) of the boot device. The MBR is the first sector (512 bytes) of the boot device and contains the 1st stage boot loader together with a partition table. The 1st stage boot loader can in some cases boot the operating system directly, but usually it is responsible for locating and loading the 2nd stage boot loader which, in the case of Linux boot loaders, allows more flexibility in selecting the kernel and operating system to boot.
The partition table is required for the 1st stage boot loader to be able to locate the 2nd stage boot loader. Due to the limited size of the MBR, the partition table only has room for four entries, the primary partitions, so the 2nd stage boot loader must be located on a primary partition.
The 1st stage boot loader locates the partition holding the 2nd stage boot loader by reading the boot sector of the partition marked as active (bootable). (We will cover marking partitions, as opposed to entire devices, as bootable later when looking at disk partitioning.)
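To make the MBR layout concrete, here is a minimal Python sketch that decodes the partition table from a 512-byte sector. It assumes the classic MBR layout: boot code in bytes 0–445, four 16-byte partition entries starting at offset 446, and the 0x55AA signature in the last two bytes, with a status byte of 0x80 marking the active (bootable) partition. Only the LBA start and length fields are decoded; the legacy CHS fields are skipped. The names parse_mbr and PartitionEntry are illustrative, not part of any library.

```python
import struct
from typing import List, NamedTuple

SECTOR_SIZE = 512
PART_TABLE_OFFSET = 446      # boot code occupies bytes 0-445
PART_ENTRY_SIZE = 16
# status, (CHS start skipped), type, (CHS end skipped), LBA start, sector count
ENTRY_FORMAT = "<B3xB3xII"
SIGNATURE = b"\x55\xaa"      # bytes 510-511 of a valid MBR


class PartitionEntry(NamedTuple):
    slot: int          # primary partition number, 1-4
    bootable: bool     # status byte 0x80 marks the active partition
    type_id: int       # partition type, e.g. 0x83 for a Linux partition
    start_lba: int     # first sector of the partition
    num_sectors: int   # partition length in sectors


def parse_mbr(sector: bytes) -> List[PartitionEntry]:
    """Decode the four primary-partition slots of a 512-byte MBR."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("an MBR is 512 bytes long")
    if sector[510:512] != SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    entries: List[PartitionEntry] = []
    for i in range(4):
        offset = PART_TABLE_OFFSET + i * PART_ENTRY_SIZE
        status, type_id, start, count = struct.unpack_from(ENTRY_FORMAT, sector, offset)
        if type_id == 0:     # type 0 means the slot is unused
            continue
        entries.append(PartitionEntry(i + 1, status == 0x80, type_id, start, count))
    return entries
```

On a real system you could pass it the first 512 bytes read from a disk device such as /dev/sda (device name assumed; reading it normally requires root). The entry with bootable set to True is the one the 1st stage boot loader would chain to.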
The 2nd stage boot loader has the task of loading the operating system; for Linux, this means the Linux kernel and the initial RAM disk. The 2nd stage boot loader may present the user with a menu to select which kernel to boot, and may even allow the user to boot other operating systems (aka dual boot).
Once the 2nd stage boot loader has loaded the kernel, it passes control to it. The kernel starts and configures the CPU, interrupt handling, the rest of memory management (such as page tables and paging), device initialization, drivers, etc.
If the boot loader has loaded an initial RAM disk image into memory, the kernel mounts it as a temporary root filesystem in RAM. The initial RAM disk (initrd) contains an image of system configuration files and the modules that the kernel needs to access system hardware. It is here that the various filesystem and disk drivers are loaded that enable the kernel to find and mount the real root partition. For example, if the real root partition is on a RAID 1 device, the raid1 module must be loaded before the kernel can mount and read the root filesystem. Once the kernel has access to the real root filesystem, the temporary root filesystem is swapped out for it.
Once the kernel is fully operational, it starts an initial programme, which by default is /sbin/init. The init programme sets up user space and starts the login shells and/or the graphical login. Which services are started, and the state of the machine after init has completed its initialization, depend on the default runlevel and its configuration.
Figure 101.2.1: The Bootup Process
There are two widely used boot loaders for Linux, namely GRUB and LILO. Both boot loaders are broken into at least two stages. The first stage is a small machine code binary on the MBR. Its sole job is to locate and load the second stage boot loader. The 2nd stage boot loader then locates and loads the Linux kernel passing in any parameters with which it has been provided.
When the 2nd stage boot loader is running, you are presented with an opportunity to pass additional parameters to the Linux kernel via a text console that is part of the boot loader. Typical parameters that you may pass to the kernel include:
init – overrides the process that is run by the kernel after it has finished loading. “init=/bin/bash” can be used to bypass the login prompt in cases where the root password has been forgotten. Because this gives access to a root shell, it also highlights why it is important to have good physical access controls to the computer console and why it is a good idea to secure your boot loader with a password, which prevents users from modifying the boot-time kernel parameters.
root – tells the kernel which device to use as the root filesystem. It is often used when troubleshooting an incorrectly configured boot loader. For example, root=/dev/hda1 tells the kernel to use /dev/hda1 as the root filesystem rather than the device it has been configured to use.
noapic/nolapic – tell the kernel not to use the I/O Advanced Programmable Interrupt Controller (APIC) or the local APIC, respectively, for assigning IRQs. These options are often used to work around interrupt problems caused by buggy hardware or BIOS.
noacpi – turns off the Advanced Configuration and Power Interface (ACPI) support of the Linux kernel; the form documented for current kernels is acpi=off. Often needed in the case of a buggy BIOS.
There are many more parameters that can be passed to the kernel at boot time. Please consult the kernel documentation for more options.
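To check which parameters the running kernel was actually booted with, Linux exposes its command line in /proc/cmdline. The Python sketch below splits such a line into a dictionary: key=value parameters map to their value and bare flags such as noapic map to True. Using shlex for the quoting rules is an approximation of the kernel's own parser, and parse_cmdline is a hypothetical helper name.

```python
import shlex
from typing import Dict, Union


def parse_cmdline(cmdline: str) -> Dict[str, Union[str, bool]]:
    """Split a kernel command line (e.g. the contents of /proc/cmdline) into a dict.

    key=value parameters map to their value string; bare flags map to True.
    If a key appears more than once, this sketch simply keeps the last one.
    """
    params: Dict[str, Union[str, bool]] = {}
    for token in shlex.split(cmdline):      # approximates the kernel's quoting
        key, sep, value = token.partition("=")   # split on the first '=' only
        params[key] = value if sep else True
    return params
```

For example, parse_cmdline("ro root=/dev/hda1 init=/bin/bash noapic") returns {'ro': True, 'root': '/dev/hda1', 'init': '/bin/bash', 'noapic': True}. On a live system you could pass it the text read from /proc/cmdline to confirm that a parameter typed at the boot loader prompt really reached the kernel.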
Init Process Overview (SysV init style)
Once the kernel has finished loading, it starts the init process. Init is responsible for checking and mounting file systems and for starting configured services, such as network, mail and web servers. It does this by entering its default runlevel, which is configured in init's configuration file, /etc/inittab. An example inittab file is given below:
```
# inittab This file describes how the INIT process should set up
# the system in a certain run-level.


id:3:initdefault:

# System initialization.
si::sysinit:/etc/rc.d/rc.sysinit

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

# Trap CTRL-ALT-DELETE
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

# When our UPS tells us power has failed, assume we have a few minutes
# of power left. Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.
pf::powerfail:/sbin/shutdown -f -h +2 "Power Failure; System Shutting Down"

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c "Power Restored; Shutdown Cancelled"


# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon
```
A line in the inittab file has the format id:runlevels:action:process, where:
- id – 1-4 characters that identify the entry,
- runlevels – the runlevels for which the process will be executed,
- action – one of a defined list of events for which the process should be executed, or an instruction to init on what to do when the process is executed. The most commonly used actions are:
  - wait – init waits until the process it has started has completed before continuing,
  - respawn – init restarts the process whenever it terminates; this is useful for login processes,
  - ctrlaltdel – traps the Ctrl-Alt-Delete key combination,
  - initdefault – sets the default runlevel,
  - powerfail and powerokwait – respond to notifications from an attached UPS in the event of power failure and restoration,
- process – the command to execute.
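The four-field format above can be illustrated with a small Python parser. It is a sketch, not init's own code: InittabEntry and parse_inittab_line are hypothetical names, and it assumes that only the first three colons separate fields, so a colon inside the process command is kept.

```python
from typing import NamedTuple, Optional


class InittabEntry(NamedTuple):
    ident: str      # 1-4 character identifier, e.g. "l3"
    runlevels: str  # e.g. "2345"; may be empty (ignored for sysinit, for example)
    action: str     # e.g. "wait", "respawn", "initdefault"
    process: str    # command to execute; may itself contain ':'


def parse_inittab_line(line: str) -> Optional[InittabEntry]:
    """Split one inittab line into its four fields; None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split(":", 3)   # maxsplit=3 leaves any ':' in the process intact
    if len(fields) != 4:
        raise ValueError(f"malformed inittab line: {line!r}")
    return InittabEntry(*fields)
```

For example, parse_inittab_line("l3:3:wait:/etc/rc.d/rc 3") gives ident "l3", runlevels "3", action "wait" and process "/etc/rc.d/rc 3", while the initdefault line yields an empty process field.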
Line 5 (id:3:initdefault:) sets the default runlevel for init. To determine the default runlevel, init searches for the initdefault entry; if there is no such entry (or no /etc/inittab at all), a runlevel must be entered at the system console. There are seven runlevels, numbered 0 to 6. On a Red Hat-style system, the default runlevel is 3 for a server (no GUI) or 5 for a desktop machine.
Line 8 (si::sysinit:/etc/rc.d/rc.sysinit) uses the sysinit action; this process is run once during system boot, before any runlevel-specific entries, regardless of the runlevel.
Lines 10–16 (l0 to l6) run the /etc/rc.d/rc script for each runlevel, passing the runlevel number as its argument.
Lines 19, 25 and 28 define the processes to be run when the specified events occur:
- Line 19: when the Ctrl-Alt-Delete combination is pressed, the computer is rebooted (shutdown -t3 -r now),
- Line 25: when a power failure signal from the UPS is received, a shutdown is scheduled for 2 minutes later,
- Line 28: when power is restored and the machine has not yet shut down, the scheduled shutdown is cancelled.
Lines 32–37 spawn the console logins (mingetty on tty1 to tty6) in runlevels 2 to 5. Because their action is respawn, they are restarted automatically whenever the process dies.
Line 40 starts the graphical login manager (/etc/X11/prefdm) in runlevel 5.
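The start-up walk-through above can be approximated in a short Python sketch: find the initdefault entry, then list what init would start on its way into that runlevel, the sysinit entry first and then the wait, once and respawn entries whose runlevels field contains the level. This is a simplified model under stated assumptions: it ignores boot and bootwait entries and event-driven actions such as ctrlaltdel and powerfail (which only fire when the event happens), and the helper names are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical helpers: a rough model of how SysV init reads /etc/inittab.


def _fields(line: str) -> Optional[List[str]]:
    """Return [id, runlevels, action, process], or None for comments/blank/malformed lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split(":", 3)
    return fields if len(fields) == 4 else None


def default_runlevel(inittab: str) -> Optional[str]:
    """Runlevel from the initdefault entry; None means init would ask on the console."""
    for line in inittab.splitlines():
        f = _fields(line)
        if f and f[2] == "initdefault":
            return f[1]
    return None


def boot_plan(inittab: str, runlevel: str) -> List[Tuple[str, str]]:
    """(action, process) pairs init would start when booting into `runlevel`."""
    sysinit: List[Tuple[str, str]] = []
    per_level: List[Tuple[str, str]] = []
    for line in inittab.splitlines():
        f = _fields(line)
        if not f:
            continue
        _, levels, action, process = f
        if action == "sysinit":
            sysinit.append((action, process))    # runlevels field is ignored
        elif action in ("wait", "once", "respawn") and runlevel in levels:
            per_level.append((action, process))
    return sysinit + per_level                   # sysinit entries come first
```

Applied to the example inittab above, default_runlevel returns "3" and boot_plan yields rc.sysinit, then rc 3, then the six mingetty entries; the prefdm entry is left out because its runlevels field is 5.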
After it has spawned all of the processes specified, init goes dormant and waits for one of three events: a process it started ends or dies, a power failure signal arrives, or a request is made via /sbin/telinit to change the runlevel.
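The respawn behaviour, sleeping until a child ends and then starting a fresh copy, can be sketched in a few lines of Python with the standard subprocess module. This is a toy model, not how init is implemented: real init loops indefinitely and disables entries that respawn too fast, whereas the hypothetical max_starts argument here just keeps the sketch finite.

```python
import subprocess
import sys
from typing import List


def respawn(cmd: List[str], max_starts: int) -> List[int]:
    """Toy version of init's respawn action: start cmd, sleep until it exits,
    then start a fresh copy. Returns the exit code of each run."""
    exit_codes: List[int] = []
    for _ in range(max_starts):
        child = subprocess.Popen(cmd)
        exit_codes.append(child.wait())   # blocks until the child terminates
    return exit_codes


if __name__ == "__main__":
    # Stand-in for a getty: a child Python process that exits immediately.
    print(respawn([sys.executable, "-c", "print('login: ')"], max_starts=3))
```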
Troubleshooting The Boot Process
During the boot process, the kernel and the modules it loads probe for their supported hardware, producing a copious amount of log output in the process. These are the console messages that fly by during system boot.
Since the logging service has not yet been started, the kernel logs its messages to an in-memory ring buffer. It is called a ring buffer because, once the log has reached a set memory size, the oldest messages are overwritten by newer ones.
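The overwrite behaviour of a ring buffer can be modelled in a few lines. The Python sketch below keeps a fixed number of slots and a write index that wraps around, so once the buffer is full each new message replaces the oldest one. It is a simplification: the kernel's buffer is sized in bytes rather than in messages, and RingLog is a hypothetical name, not a kernel interface.

```python
from typing import List


class RingLog:
    """Fixed-capacity message log; the oldest entry is overwritten when full."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots: List[str] = [""] * capacity
        self._next = 0    # index of the slot the next message is written to
        self._count = 0   # number of slots currently holding messages

    def log(self, msg: str) -> None:
        self._slots[self._next] = msg
        self._next = (self._next + 1) % len(self._slots)   # wrap around
        self._count = min(self._count + 1, len(self._slots))

    def read(self) -> List[str]:
        """Return surviving messages from oldest to newest, as dmesg would list them."""
        cap = len(self._slots)
        start = (self._next - self._count) % cap
        return [self._slots[(start + i) % cap] for i in range(self._count)]
```

With a capacity of 3, logging five messages leaves only the last three readable, which is why early boot messages can be lost on a busy system.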
Because the information in the ring buffer could be lost on reboot or by being overwritten, most distributions write the ring buffer entries to disk, in /var/log/dmesg, /var/log/messages or /var/log/syslog, depending on their logging service configuration.
The contents of the ring buffer can be read with the dmesg command. This is usually combined with a pagination utility, such as less, as the amount of data contained in the ring buffer is usually too large to fit on one screen.
dmesg | less
If you have hardware that is not being configured properly during boot, the ring buffer is a good place to look for clues as to what could be wrong, as the driver will usually write out an error message that can help in resolving the issue.
Besides the ring buffer, you can also examine the system log files written by the logging service. These files are located under /var/log and may be /var/log/syslog or /var/log/messages.
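When the ring buffer or a log file is long, it can help to filter it for likely error messages before paging through it. The Python sketch below does roughly what dmesg | grep -iE 'error|fail' would do. The keyword list is an assumption (a heuristic, not something the kernel defines), and boot_clues is a hypothetical name.

```python
import re
from typing import Iterable, List

# Assumed keyword heuristic: driver messages about problems often contain these words.
CLUE_PATTERN = re.compile(
    r"\b(error|errors|fail|failed|failure|timeout|timed out|not found|unable)\b",
    re.IGNORECASE,
)


def boot_clues(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like error reports, in their original order."""
    return [line.rstrip("\n") for line in lines if CLUE_PATTERN.search(line)]
```

The lines can come from a file object opened on /var/log/messages, or from subprocess.run(["dmesg"], capture_output=True, text=True).stdout.splitlines(); on some systems reading the kernel ring buffer requires root.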
Failure to boot to command prompt
In some cases you may be dropped to a command prompt during the boot process, with a message that the root device could not be found, or with a prompt to enter the root password for maintenance (or press Ctrl-D to continue). Usually this means that the root device has not been configured properly, and you may be able to fix this from the command line.
In other cases you may get a “kernel panic” message, after which the machine either reboots automatically or waits in an unusable state until it is rebooted. This kind of problem can have many causes, from an incorrectly configured initrd image to a missing root device, and the output on the screen before the failure can provide a clue to the configuration error. This may be due to:
- a boot loader configuration problem, e.g. it is configured with the wrong root device,
- missing modules in the initrd image,
- configuration error on the root file system.
If the boot loader is incorrectly configured, you can enter the boot loader editor during the 2nd stage (by pressing 'e' in GRUB, or at the LILO prompt), pass in the correct parameters to get the computer to boot, and then fix the configuration file once the system has started successfully. In other cases it may be necessary to boot from a live CD and troubleshoot the root file system from there, without mounting it.
Used files, terms and utilities: