The Linux Kernel/Softdog Driver
A watchdog timer is a device that triggers a system reset if it detects that the system has hung. A program running on the system must periodically service the watchdog timer by writing a "service pulse" to it. If the watchdog is not serviced within its timeout period, it assumes that the system has hung and triggers a system reset.
Watchdog timers are usually implemented as add-on cards or as on-chip peripherals in microcontrollers. If no hardware watchdog is available, the Linux kernel can provide a software watchdog, the softdog driver, implemented with kernel timers.
In Linux, a watchdog driver exposes a character device interface (normally /dev/watchdog) to user space. When data is written to the device, the driver services the watchdog hardware. A user-space application writes to the device periodically, at an interval shorter than the watchdog timeout. If the application hangs for any reason, the watchdog is no longer serviced and triggers a system reset.
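The write-based interface can be illustrated with a minimal user-space sketch. The helper name `feed_watchdog`, the byte written, and the bounded loop are assumptions for illustration; a real daemon loops indefinitely. Before closing, the sketch writes the character 'V', the "magic close" that, on drivers supporting it (softdog does, unless it was loaded with `nowayout=1`), stops the watchdog instead of leaving it running.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open the watchdog device at `path`, write one
 * byte `count` times with `interval_s` seconds between writes, then
 * perform a magic close. On a real system `path` is normally
 * "/dev/watchdog" and `interval_s` must be well below the timeout.
 * Returns the number of successful service writes, or -1 if the
 * device could not be opened. */
int feed_watchdog(const char *path, int count, unsigned interval_s)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    int fed = 0;
    for (int i = 0; i < count; i++) {
        /* Any write services the watchdog; the byte 'k' is arbitrary. */
        if (write(fd, "k", 1) == 1)
            fed++;
        if (i + 1 < count)
            sleep(interval_s);
    }

    /* Magic close: writing 'V' just before close() asks drivers that
     * support it to stop the watchdog. Without it, the timer keeps
     * running after close() and the system resets when it expires,
     * unless the device is reopened and serviced in time. */
    if (write(fd, "V", 1) != 1)
        perror("write");
    close(fd);
    return fed;
}
```

Opening /dev/watchdog normally requires root privileges. Pointed at an ordinary file, as in a test, the helper just records the bytes it writes.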
The application that writes to the watchdog device is usually a watchdog daemon, which monitors processes in the system as well as other parameters such as CPU and memory utilization.
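One way a daemon can tie servicing to its monitoring is to write to the device only when every health check passes. The sketch below assumes each check is a plain function returning `bool`; the names `health_check_fn` and `first_failing_check` are hypothetical and not part of any real daemon's API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical health check: returns true when the monitored
 * property (a process, CPU load, free memory, ...) looks healthy. */
typedef bool (*health_check_fn)(void);

/* Return the index of the first failing check, or -1 if all pass.
 * The daemon services the watchdog only when this returns -1; on a
 * failure it simply stops writing, so the watchdog times out and
 * resets the system. */
int first_failing_check(const health_check_fn *checks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!checks[i]())
            return (int)i;
    }
    return -1;
}
```

Stopping the writes, rather than rebooting directly, leaves the reset decision with the watchdog itself, so the system is still reset if the daemon hangs partway through a check.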
When the softdog device is opened, the driver schedules a kernel timer to expire after a configurable timeout, the timer margin (set by the soft_margin module parameter, 60 seconds by default). Each write to the device reschedules the timer. As long as the watchdog daemon keeps writing, the timer is continually pushed forward and its callback never runs. If the daemon stops writing, the timer expires and the callback restarts the system.
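The open/write/expire sequence can be modeled in user space to show why regular writes keep the callback from running. This is a simulation, not softdog's source: time is an abstract tick counter, `margin` stands in for the timer margin, and all names (`sim_softdog`, `sim_open`, `sim_write`, `sim_tick`) are hypothetical.

```c
#include <stdbool.h>

/* User-space model of the softdog timer logic (not kernel code). */
struct sim_softdog {
    unsigned long margin;   /* timeout, in ticks */
    unsigned long expires;  /* tick at which the timer fires */
    bool armed;             /* is the timer scheduled? */
    bool rebooted;          /* has the "callback" run? */
};

/* Opening the device: schedule the timer `margin` ticks from now. */
void sim_open(struct sim_softdog *d, unsigned long now)
{
    d->expires = now + d->margin;
    d->armed = true;
}

/* Writing to the device: reschedule the timer, pushing expiry forward. */
void sim_write(struct sim_softdog *d, unsigned long now)
{
    if (d->armed)
        d->expires = now + d->margin;
}

/* Advance the clock; if the deadline is reached, run the "callback". */
void sim_tick(struct sim_softdog *d, unsigned long now)
{
    if (d->armed && now >= d->expires) {
        d->armed = false;
        d->rebooted = true;  /* the real callback restarts the system */
    }
}
```

For example, with a margin of 10 ticks and a write every 5 ticks, `rebooted` stays false; once the writes stop, it becomes true 10 ticks after the last write.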