Kdump is a reliable kernel crash dumping mechanism in which the crash dump is captured by a second kernel (the crash kernel) that boots when the main kernel crashes.
The crash kernel needs only a small amount of memory to boot and to capture the dump image (vmcore) file. It boots from a region of memory that the main kernel reserves in advance. The main kernel’s memory contents are preserved because the kexec mechanism boots the crash kernel directly, without rebooting the system and without passing through the BIOS procedures.
The time needed to capture the vmcore file depends on the amount of memory in use at the time of the crash. Capturing a 5 GB vmcore file takes approximately 20-25 minutes on average.
When kdump captures the dump successfully, a vmcore file is created in the dump location (usually the /var/crash/ directory) and the system then reboots. After the crash, the vmcore file should be analyzed to determine the root cause of the failure.
Below we present how to configure and manage the kdump service on CentOS 7 / RHEL 7.
1. Memory Requirements
The main factor which influences the amount of memory to be reserved is the total amount of installed system RAM.
For x86_64 architecture:
Physical RAM | Amount of reserved memory |
---|---|
2 GB and more | 160 MB + 2 bits for every 4 KB of RAM |
1 TB (example) | minimum: 224 MB (160 MB + 64 MB) |
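The rule in the table can be checked with a little arithmetic. The sketch below is only an illustration: the function name kdump_reserved_mb is hypothetical, RAM is assumed to be given in binary gigabytes, and the bitmap part is rounded down to whole megabytes.

```shell
# Estimate the x86_64 reservation from the table above:
# 160 MB base + 2 bits for every 4 KB page of installed RAM.
# $1: installed RAM in GB (assumption: 1 GB = 1024 MB).
kdump_reserved_mb() {
    local ram_gb=$1
    local ram_kb=$(( ram_gb * 1024 * 1024 ))
    local pages=$(( ram_kb / 4 ))              # number of 4 KB pages
    local bitmap_bytes=$(( pages * 2 / 8 ))    # 2 bits per page
    local bitmap_mb=$(( bitmap_bytes / 1024 / 1024 ))
    echo $(( 160 + bitmap_mb ))
}

kdump_reserved_mb 1024   # 1 TB of RAM, prints 224
```

For 1 TB this gives 160 MB + 64 MB = 224 MB, which matches the second row of the table.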
2. Kdump Configuration
The table below presents the default kdump configuration shipped with the system:
Parameter | Value | Description |
---|---|---|
dump file location | /var/crash | value set in /etc/kdump.conf file |
memory settings | automatic | parameter crashkernel=auto set in the /etc/default/grub file (GRUB_CMDLINE_LINUX line), 128MB assigned automatically by the system (if RAM >= 2GB) |
compression | enabled | value set in /etc/kdump.conf file to decrease vmcore size (compression performed before writing the vmcore to disk) |
dump level | 31 | value set in /etc/kdump.conf file to dump the kernel only (unnecessary pages are excluded) |
message level | 1 | value set in the /etc/kdump.conf file to display the progress indicator |
default action (if dump fails) | reboot | value set in the /etc/kdump.conf file |
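To see which of these settings are actually active on a given machine, the comment and blank lines of /etc/kdump.conf can be filtered out. This is a minimal sketch; the function name kdump_active_settings is hypothetical. On a default installation the output would be expected to include a path directive and a core_collector directive, where the makedumpfile options -d and --message-level carry the dump level and message level, but the exact contents may differ between releases.

```shell
# Print the active (non-comment, non-blank) directives of a kdump config file.
# $1: config file, defaults to /etc/kdump.conf; pass another path to inspect a copy.
kdump_active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/kdump.conf}"
}
```

Calling kdump_active_settings without arguments reads the system configuration file.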
3. Install and enable kdump during system installation
Kdump can be enabled and partly configured at the end of the CentOS 7 / RHEL 7 installation process. When the installation is performed through the graphical interface, the Anaconda installer provides a screen for kdump configuration.
Memory reservation options:
- Automatic – 128MB assigned automatically by the system (if physical RAM >= 2GB)
- Manual – manual assignment
4. Install kdump on an already installed system
Some installation options, such as custom Kickstart installations, may not install or enable kdump by default.
To install the kdump service manually, execute the following command:
[root@tuxfixer ~]# yum install kexec-tools
Install the graphical kdump configuration tool, if needed (optional):
[root@tuxfixer ~]# yum install system-config-kdump
5. Configure kdump service
The amount of memory reserved for kdump is specified in the configuration of the system’s boot loader, GRUB2.
To configure or modify the memory reservation for kdump, edit the GRUB configuration file:
[root@tuxfixer ~]# vim /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
Within the GRUB_CMDLINE_LINUX line, change the crashkernel parameter to the desired value, for example:
crashkernel=160M
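The same change can be made without an editor. The following is a sketch only: the function name set_crashkernel is hypothetical, it assumes GNU sed and that a crashkernel= entry is already present in the file, and it keeps a backup copy with the .bak suffix.

```shell
# Replace the crashkernel= value in a GRUB defaults file.
# $1: new value (e.g. 160M); $2: file, defaults to /etc/default/grub.
# A backup of the original file is written next to it with the suffix .bak.
set_crashkernel() {
    local value=$1
    local file=${2:-/etc/default/grub}
    sed -i.bak "s/crashkernel=[^ \"]*/crashkernel=${value}/" "$file"
}
```

For example, set_crashkernel 160M turns crashkernel=auto into crashkernel=160M and leaves the remaining kernel options untouched.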
Next, regenerate the GRUB2 configuration (the new reservation takes effect after the next reboot):
[root@tuxfixer ~]# grub2-mkconfig -o /boot/grub2/grub.cfg
On UEFI systems, execute the following instead (on CentOS the directory is /boot/efi/EFI/centos rather than /boot/efi/EFI/redhat):
[root@tuxfixer ~]# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
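After the system has been rebooted, the kernel command line should contain the new value. A minimal sketch for checking it follows; the function name crashkernel_value is hypothetical, and the optional file argument exists only so the function can be tried on a saved copy of a command line.

```shell
# Print the crashkernel= value found on a kernel command line.
# $1: file holding the command line, defaults to /proc/cmdline.
crashkernel_value() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^crashkernel=//p'
}
```

Running crashkernel_value with no argument reads /proc/cmdline of the running kernel. An empty result means that no crashkernel parameter was passed at boot.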
6. Disk space reservation for /var/crash directory
It is good practice to mount the /var directory on a separate partition or logical volume and to reserve enough disk space there for the /var/crash directory to hold several vmcore files in case a series of crashes occurs. If the server usually runs with high memory consumption and several crashes occur in a row, the vmcore files may occupy several gigabytes of disk space in the /var directory.
Red Hat Support suggests sizing the /var/crash directory according to the following scheme: RAM + 2%. For example, a server with 32 GB of RAM needs ~33 GB (32.64 GB) of disk space for the /var/crash directory. This is the space required for one vmcore file at the full dump level (all pages included in the dump) without compression. With compression enabled and the dump level set to 31, the vmcore file is significantly smaller (by a factor of several).
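The RAM + 2% rule is easy to compute for any machine. The helper below is a sketch with a hypothetical name, crash_space_gb; it covers a single uncompressed full dump, so it should be multiplied by the number of dumps you want to be able to keep.

```shell
# Suggested /var/crash size for one uncompressed full dump: RAM + 2%.
# $1: installed RAM in GB; prints the result in GB with two decimals.
crash_space_gb() {
    LC_ALL=C awk -v ram="$1" 'BEGIN { printf "%.2f\n", ram * 1.02 }'
}

crash_space_gb 32   # prints 32.64
```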
7. Manage kdump service
The commands below are used to manage the kdump service.
Check kdump service status:
[root@tuxfixer ~]# systemctl status kdump
Start kdump service:
[root@tuxfixer ~]# systemctl start kdump
Stop kdump service:
[root@tuxfixer ~]# systemctl stop kdump
Enable kdump service (persistent after reboot):
[root@tuxfixer ~]# systemctl enable kdump
Disable kdump service (persistent after reboot):
[root@tuxfixer ~]# systemctl disable kdump
8. Test kdump mechanism
Before we start testing kdump, ensure that the service is running:
[root@tuxfixer ~]# systemctl status kdump
● kdump.service - Crash recovery kernel arming
Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
Active: active (exited) since Tue 2016-07-05 10:46:11 CEST; 3 weeks 3 days ago
Process: 1401 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
Main PID: 1401 (code=exited, status=0/SUCCESS)
CGroup: /system.slice/kdump.service
Jul 05 10:46:10 tuxfixer dracut[4024]: drwxr-xr-x 3 root root 0 Jul 5 10:45 usr/share/zoneinfo
Jul 05 10:46:10 tuxfixer dracut[4024]: drwxr-xr-x 2 root root 0 Jul 5 10:45 usr/share/zoneinfo/Europe
Jul 05 10:46:10 tuxfixer dracut[4024]: -rw-r--r-- 1 root root 2679 Mar 23 23:40 usr/share/zoneinfo/Europe/Warsaw
Jul 05 10:46:10 tuxfixer dracut[4024]: drwxr-xr-x 2 root root 0 Jul 5 10:45 var
Jul 05 10:46:10 tuxfixer dracut[4024]: lrwxrwxrwx 1 root root 11 Jul 5 10:45 var/lock -> ../run/lock
Jul 05 10:46:10 tuxfixer dracut[4024]: lrwxrwxrwx 1 root root 6 Jul 5 10:45 var/run -> ../run
Jul 05 10:46:10 tuxfixer dracut[4024]: ========================================================================
Jul 05 10:46:11 tuxfixer kdumpctl[1401]: kexec: loaded kdump kernel
Jul 05 10:46:11 tuxfixer kdumpctl[1401]: Starting kdump: [OK]
Jul 05 10:46:11 tuxfixer systemd[1]: Started Crash recovery kernel arming.
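Independently of the systemctl output, the kernel reports whether a crash kernel is loaded in /sys/kernel/kexec_crash_loaded (1 when loaded, 0 otherwise). A minimal sketch of a check based on that file; the function name crash_kernel_loaded is hypothetical, and the optional argument is there only for trying the function on a test file.

```shell
# Succeed (exit status 0) when a crash kernel is currently loaded.
# $1: status file, defaults to /sys/kernel/kexec_crash_loaded.
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}
```

A typical use is: crash_kernel_loaded && echo "kdump is armed" || echo "kdump is NOT armed".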
The easiest (and probably the fastest) way to test the kdump mechanism is to trigger a crash using the following commands:
[root@tuxfixer ~]# echo 1 > /proc/sys/kernel/sysrq
[root@tuxfixer ~]# echo c > /proc/sysrq-trigger
Note: the above commands will cause the kernel to crash! Don’t use these commands in a production environment!
During the crash, the kdump mechanism creates a crash directory named in time stamp format (address-YYYY-MM-DD-HH:MM:SS) in the previously set dump file location (usually /var/crash). After the crash, when the system has rebooted, we can inspect the crash directories and the vmcore files generated inside them:
[root@tuxfixer ~]# ls -l /var/crash
...
drwxr-xr-x 2 root root 4096 Feb 20 12:05 127.0.0.1-2016-02-20-12:00:28
drwxr-xr-x 2 root root 4096 Feb 20 12:27 127.0.0.1-2016-02-20-12:22:17
drwxr-xr-x 2 root root 4096 Feb 20 14:22 127.0.0.1-2016-02-20-14:17:11
[root@tuxfixer ~]# cd /var/crash/127.0.0.1-2016-02-20-14\:17\:11/
[root@tuxfixer 127.0.0.1-2016-02-20-14:17:11]# ls -l
total 1237748
-rw------- 1 root root 1267339537 Feb 20 14:22 vmcore
-rw-r--r-- 1 root root 105830 Feb 20 14:17 vmcore-dmesg.txt
[root@tuxfixer 127.0.0.1-2016-02-20-14:17:11]# du -m vmcore
1209 vmcore
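When several crashes have accumulated, all dumps can be listed together with their sizes in one step. This is a sketch under the assumption that the dumps are stored as files named vmcore below the dump directory, as in the listing above; the function name list_vmcores is hypothetical.

```shell
# List every vmcore file below a dump directory with its size in MB.
# $1: dump directory, defaults to /var/crash.
list_vmcores() {
    find "${1:-/var/crash}" -type f -name vmcore -exec du -m {} +
}
```

Running list_vmcores without arguments scans /var/crash, which helps to judge how much of the reserved disk space is already in use.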