Manage and Configure Kdump Service on CentOS 7 / Red Hat 7

Kdump is a reliable kernel crash dumping mechanism: when the main kernel crashes, a second kernel (the crash kernel) is booted and captures the crash dump.

The crash kernel boots from a small amount of memory reserved in advance by the main kernel and uses it to capture the dump image (vmcore) file. The main kernel's memory is preserved because the crash kernel is started through the kexec mechanism, which boots the second kernel without rebooting the system and without going through the BIOS initialization.

The time needed to capture the vmcore file depends on the amount of memory in use at the time of the crash. On average, capturing a 5 GB vmcore file takes approximately 20-25 minutes.

When kdump succeeds during a system crash, a vmcore file is written to the dump location (usually the /var/crash/ directory) and the system then reboots. After the reboot, the vmcore file should be analyzed to determine the root cause of the failure.

Below we show how to configure and manage the kdump service on CentOS 7 / RHEL 7.

1. Memory Requirements

The main factor which influences the amount of memory to be reserved is the total amount of installed system RAM.

For x86_64 architecture:

Physical RAM	Reserved memory
2 GB and more	160 MB + 2 bits for every 4 KB of RAM
1 TB	224 MB minimum (160 MB + 64 MB)
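As a worked example of the x86_64 rule above: 2 bits per 4 KB page is 1/16384 of the RAM size, so every 16 GB of RAM adds 1 MB, and 1 TB (1048576 MB) adds 64 MB. The sketch below encodes that arithmetic. The function name kdump_reserve_mb is hypothetical; it takes the RAM size in MB, rounds the extra amount up, and only covers the "2 GB and more" row of the table.

```sh
# Hypothetical helper: estimate the kdump reservation on x86_64 for RAM >= 2 GB,
# using the rule "160 MB + 2 bits for every 4 KB of RAM".
# Argument: installed RAM in MB. Prints the estimate in MB.
kdump_reserve_mb() {
    local ram_mb=$1
    if [ "$ram_mb" -lt 2048 ]; then
        echo "rule only covers systems with 2 GB of RAM or more" >&2
        return 1
    fi
    # 2 bits per 4 KB page = RAM / 16384; in MB that is ram_mb / 16384 (rounded up)
    local extra_mb=$(( (ram_mb + 16383) / 16384 ))
    echo $(( 160 + extra_mb ))
}

# Examples:
#   kdump_reserve_mb 32768     -> 162  (32 GB)
#   kdump_reserve_mb 1048576   -> 224  (1 TB)
```

To use the current machine's memory, `awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo` prints MemTotal in MB; MemTotal is slightly lower than the installed RAM, so the estimate may come out marginally low.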

2. Kdump Configuration

The table below shows the default kdump configuration shipped with the system:

Parameter	Default value	Description
Dump file location	/var/crash	Set with the path option in /etc/kdump.conf
Memory reservation	automatic	crashkernel=auto on the kernel command line (GRUB_CMDLINE_LINUX in /etc/default/grub); the amount is calculated automatically by the system (requires at least 2 GB of RAM)
Compression	enabled	Set in /etc/kdump.conf to reduce the vmcore size (the dump is compressed before it is written to disk)
Dump level	31	Set in /etc/kdump.conf to dump kernel pages only (unnecessary pages are excluded)
Message level	1	Set in /etc/kdump.conf to display only the progress indicator
Default action (if dump fails)	reboot	Set in /etc/kdump.conf
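The dump level is a bitmask passed to makedumpfile with the -d option: each bit excludes one type of page, and 31 (1+2+4+8+16) sets all five bits described in the makedumpfile(8) man page shipped with RHEL 7. On a default install these settings usually come from a core_collector line in /etc/kdump.conf such as `core_collector makedumpfile -l --message-level 1 -d 31`, where -l selects LZO compression; check your own file, as it may differ. The hypothetical helper below decodes a dump level into the page types it excludes.

```sh
# Hypothetical helper: list the page types excluded by a makedumpfile dump level (-d).
# Bit meanings follow makedumpfile(8): 1, 2, 4, 8, 16.
decode_dump_level() {
    local level=$1
    [ $(( level & 1 ))  -ne 0 ] && echo "zero pages"
    [ $(( level & 2 ))  -ne 0 ] && echo "cache pages (without private)"
    [ $(( level & 4 ))  -ne 0 ] && echo "cache pages (with private)"
    [ $(( level & 8 ))  -ne 0 ] && echo "user process data pages"
    [ $(( level & 16 )) -ne 0 ] && echo "free pages"
    return 0
}

# Example: decode_dump_level 31   (prints all five page types)
```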

3. Install and enable kdump during system installation

Kdump can be enabled and partly configured during the CentOS 7 / Red Hat 7 installation. When installing with the graphical interface, the Anaconda installer provides a dedicated kdump configuration screen:

[Screenshot: enabling kdump during CentOS 7 / RHEL 7 installation]

Memory reservation options:

Automatic: the system calculates the amount of memory to reserve (requires at least 2 GB of physical RAM)
Manual: the amount of memory to reserve is set manually

4. Install kdump on already installed system

Some installation options, such as custom Kickstart installations, may not install or enable kdump by default.
To install the kdump service manually, install the kexec-tools package:

[root@tuxfixer ~]# yum install kexec-tools

Optionally, install the graphical kdump configuration tool:

[root@tuxfixer ~]# yum install system-config-kdump

5. Configure kdump service

The amount of memory reserved for kdump is set by the crashkernel= parameter on the kernel command line, which is configured through the GRUB2 boot loader.

To change the memory reservation for kdump, edit the GRUB2 defaults file:

[root@tuxfixer ~]# vim /etc/default/grub

GRUB_TIMEOUT=5
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"
GRUB_DISABLE_RECOVERY="true"

Set the crashkernel parameter to the desired value. For example, to reserve 160 MB, replace crashkernel=auto with:

crashkernel=160M
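If you prefer to script this edit (for example across many servers), the sketch below rewrites the crashkernel= value inside GRUB_CMDLINE_LINUX, or appends it if it is missing. The function name set_crashkernel is hypothetical; it assumes the single-line, double-quoted GRUB_CMDLINE_LINUX format shown above and a sed that supports -i (such as the GNU sed on CentOS 7), and it keeps a .bak copy of the file. The GRUB2 configuration still has to be regenerated afterwards.

```sh
# Hypothetical helper: set crashkernel=<value> in GRUB_CMDLINE_LINUX.
# Usage: set_crashkernel <value> [file]   (file defaults to /etc/default/grub)
set_crashkernel() {
    local value=$1
    local file=${2:-/etc/default/grub}
    cp -p "$file" "$file.bak" || return 1   # keep a backup of the original file
    if grep -q '^GRUB_CMDLINE_LINUX=.*crashkernel=' "$file"; then
        # Replace the existing value (everything up to the next space or quote)
        sed -i '/^GRUB_CMDLINE_LINUX=/ s/crashkernel=[^ "]*/crashkernel='"$value"'/' "$file"
    else
        # No crashkernel= yet: append it before the closing double quote
        sed -i '/^GRUB_CMDLINE_LINUX=/ s/"$/ crashkernel='"$value"'"/' "$file"
    fi
}

# Example (as root): set_crashkernel 160M && grub2-mkconfig -o /boot/grub2/grub.cfg
```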

Next, regenerate the GRUB2 configuration file (BIOS-based systems):

[root@tuxfixer ~]# grub2-mkconfig -o /boot/grub2/grub.cfg

On UEFI-based systems, run the following instead (on CentOS 7 the file is /boot/efi/EFI/centos/grub.cfg):

[root@tuxfixer ~]# grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

Reboot the system so that the new memory reservation takes effect.
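Once the system has been rebooted with the new setting, you can confirm what the running kernel actually reserved. `grep -o 'crashkernel=[^ ]*' /proc/cmdline` shows the parameter the kernel booted with, and /sys/kernel/kexec_crash_size holds the reserved size in bytes (0 if nothing was reserved). The hypothetical helper below converts that value to MB; its optional argument exists only so it can be pointed at another file. With crashkernel=160M it should report 160.

```sh
# Hypothetical helper: print the crash kernel reservation of the running kernel in MB.
# Reads /sys/kernel/kexec_crash_size (bytes; 0 means no memory is reserved).
crash_reserved_mb() {
    local size_file=${1:-/sys/kernel/kexec_crash_size}
    local bytes
    bytes=$(cat "$size_file") || return 1
    echo $(( bytes / 1024 / 1024 ))
}

# Example: crash_reserved_mb
```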

6. Disk space reservation for /var/crash directory

It is good practice to mount /var on a separate partition or logical volume and to reserve enough space on it for the /var/crash directory to hold several vmcore files, in case a series of crashes occurs in quick succession. If the server usually runs with high memory usage and several crashes occur in a row, vmcore files may take up several gigabytes of disk space in /var.
Red Hat Support recommends sizing the /var/crash directory as RAM plus 2%: for example, a server with 32 GB of RAM needs about 33 GB (32.64 GB) of disk space for /var/crash. This is the space required for a single uncompressed vmcore file at dump level 0 (all pages included in the dump). With compression enabled and the dump level set to 31, the vmcore file is typically several times smaller.
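The sizing rule above can be turned into a quick check. The sketch below (hypothetical function names) computes RAM plus 2%, rounded up, for a given number of uncompressed full dumps, and reads the free space of the filesystem holding /var/crash from POSIX df output. Both values are in MB; treat the estimate as an upper bound, since compressed vmcores at dump level 31 are usually much smaller.

```sh
# Hypothetical helper: space in MB for N uncompressed, dump level 0 vmcores,
# following the "RAM + 2%" rule. Arguments: RAM in MB, number of dumps (default 1).
crash_space_needed_mb() {
    local ram_mb=$1
    local dumps=${2:-1}
    echo $(( (ram_mb * 102 + 99) / 100 * dumps ))
}

# Hypothetical helper: free space in MB on the filesystem that holds a directory.
free_space_mb() {
    local dir=${1:-/var/crash}
    df -Pk "$dir" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Example: is there room for three full dumps on a 32 GB server?
#   [ "$(free_space_mb)" -ge "$(crash_space_needed_mb 32768 3)" ] && echo "enough space"
```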

7. Manage kdump service

The following commands are used to manage the kdump service.

Check kdump service status:

[root@tuxfixer ~]# systemctl status kdump

Start kdump service:

[root@tuxfixer ~]# systemctl start kdump

Stop kdump service:

[root@tuxfixer ~]# systemctl stop kdump

Enable kdump service (persistent after reboot):

[root@tuxfixer ~]# systemctl enable kdump

Disable kdump service (persistent after reboot):

[root@tuxfixer ~]# systemctl disable kdump
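systemctl reports the state of the kdump service; to check directly whether a crash kernel is loaded, the kernel exposes /sys/kernel/kexec_crash_loaded, which contains 1 while a crash kernel is loaded and 0 otherwise. The hypothetical helper below wraps that check, which can be handy in scripts run after starting the service.

```sh
# Hypothetical helper: succeed if a crash (capture) kernel is currently loaded.
# /sys/kernel/kexec_crash_loaded reads 1 when kdump is armed, 0 otherwise.
kdump_armed() {
    local flag_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ "$(cat "$flag_file" 2>/dev/null)" = "1" ]
}

# Example (as root):
#   systemctl start kdump && kdump_armed && echo "kdump is armed"
```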

8. Test kdump mechanism

Before we start testing kdump, ensure that the service is running:

[root@tuxfixer ~]# systemctl status kdump
● kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; enabled; vendor preset: enabled)
   Active: active (exited) since Tue 2016-07-05 10:46:11 CEST; 3 weeks 3 days ago
  Process: 1401 ExecStart=/usr/bin/kdumpctl start (code=exited, status=0/SUCCESS)
 Main PID: 1401 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/kdump.service

Jul 05 10:46:10 tuxfixer dracut[4024]: drwxr-xr-x   3 root     root            0 Jul  5 10:45 usr/share/zoneinfo
Jul 05 10:46:10 tuxfixer dracut[4024]: drwxr-xr-x   2 root     root            0 Jul  5 10:45 usr/share/zoneinfo/Europe
Jul 05 10:46:10 tuxfixer dracut[4024]: -rw-r--r--   1 root     root         2679 Mar 23 23:40 usr/share/zoneinfo/Europe/Warsaw
Jul 05 10:46:10 tuxfixer dracut[4024]: drwxr-xr-x   2 root     root            0 Jul  5 10:45 var
Jul 05 10:46:10 tuxfixer dracut[4024]: lrwxrwxrwx   1 root     root           11 Jul  5 10:45 var/lock -> ../run/lock
Jul 05 10:46:10 tuxfixer dracut[4024]: lrwxrwxrwx   1 root     root            6 Jul  5 10:45 var/run -> ../run
Jul 05 10:46:10 tuxfixer dracut[4024]: ========================================================================
Jul 05 10:46:11 tuxfixer kdumpctl[1401]: kexec: loaded kdump kernel
Jul 05 10:46:11 tuxfixer kdumpctl[1401]: Starting kdump: [OK]
Jul 05 10:46:11 tuxfixer systemd[1]: Started Crash recovery kernel arming.

The easiest (and probably the fastest) way to test the kdump mechanism is to trigger a kernel crash through the magic SysRq interface:

[root@tuxfixer ~]# echo 1 > /proc/sys/kernel/sysrq
[root@tuxfixer ~]# echo c > /proc/sysrq-trigger

Note: the above commands will crash the kernel! Don't use them in a production environment!

During the crash, kdump creates a directory named after the host address and a timestamp (address-YYYY-MM-DD-HH:MM:SS) in the configured dump location (usually /var/crash). After the system reboots, we can list the crash directories and the vmcore files generated inside them:

[root@tuxfixer ~]# ls -l /var/crash
...
drwxr-xr-x 2 root root 4096 Feb 20 12:05 127.0.0.1-2016-02-20-12:00:28
drwxr-xr-x 2 root root 4096 Feb 20 12:27 127.0.0.1-2016-02-20-12:22:17
drwxr-xr-x 2 root root 4096 Feb 20 14:22 127.0.0.1-2016-02-20-14:17:11
[root@tuxfixer ~]# cd /var/crash/127.0.0.1-2016-02-20-14\:17\:11/
[root@tuxfixer 127.0.0.1-2016-02-20-14:17:11]# ls -l
total 1237748
-rw------- 1 root root 1267339537 Feb 20 14:22 vmcore
-rw-r--r-- 1 root root     105830 Feb 20 14:17 vmcore-dmesg.txt
[root@tuxfixer 127.0.0.1-2016-02-20-14:17:11]# du -m vmcore
1209    vmcore
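When several crashes have been captured, the newest directory can be picked out by the timestamp at the end of its name. The sketch below (hypothetical function name) assumes the address-YYYY-MM-DD-HH:MM:SS naming shown above; it sorts on the 19-character timestamp suffix rather than the full name, so the address prefix does not affect the order.

```sh
# Hypothetical helper: print the newest kdump directory under a dump location.
# Assumes names ending in YYYY-MM-DD-HH:MM:SS (19 characters), e.g.
# 127.0.0.1-2016-02-20-14:17:11; prints nothing if no such directory exists.
latest_crash_dir() {
    local base=${1:-/var/crash}
    local d
    for d in "$base"/*-????-??-??-??:??:??; do
        [ -d "$d" ] && printf '%s\n' "$d"
    done | awk '{ print substr($0, length($0) - 18) " " $0 }' | sort | tail -n 1 | cut -d' ' -f2-
}

# Example: show the end of the kernel log saved from the most recent crash
#   dir=$(latest_crash_dir) && [ -n "$dir" ] && tail -n 20 "$dir/vmcore-dmesg.txt"
```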

Grzegorz Juszczak
