Last updated: March 18, 2024
With default settings, Linux overcommits memory. On one hand, this improves resource utilization; on the other hand, it is dangerous and cannot be directly controlled by applications. That's why we need to focus on analysis and recovery rather than trying to prevent the error.
We can configure Linux to use different policies to allocate memory. We can find the effective policy by mounting the proc filesystem and reading from /proc/sys/vm/. The files overcommit_ratio and overcommit_memory govern how much memory the system may commit that is not backed by physical memory.
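For instance, we can read the current settings directly (the paths are standard on Linux, but the values differ between systems):

```shell
# 0 = heuristic overcommit (the default), 1 = always overcommit, 2 = never overcommit
cat /proc/sys/vm/overcommit_memory

# Percentage of physical RAM counted toward the commit limit in mode 2
cat /proc/sys/vm/overcommit_ratio
```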
The OOM (out of memory) killer will only run if we configure the system to overcommit memory. If, however, we wanted to prevent the system from overcommitting, we'd set overcommit_memory to 2. In this case, the system would commit no more than the swap space plus total_memory * overcommit_ratio / 100. In other words, if we set overcommit_memory to 2, allocations beyond that limit fail upfront, and the OOM killer is effectively disabled.
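The kernel exposes the resulting limit, together with the memory currently committed, in /proc/meminfo; comparing the two fields shows how close the system is to refusing allocations in mode 2 (the values are, of course, system-specific):

```shell
# CommitLimit: the maximum committable memory under overcommit_memory=2
# Committed_AS: the memory currently committed to all processes
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
```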
Unfortunately, we need to rely on logs to find out about OOM errors. Linux has a whole zoo of logging tools. In this article, we'll use journald to examine the system logs. journald reads kernel events via the syslog interface, typically from /proc/kmsg or the /dev/kmsg device.
Another popular program that reads this interface is dmesg. We'll use journald because, unlike dmesg, it keeps the information available after a system reboot.
journald is a daemon responsible for collecting and storing system logs. It has a sister program: journalctl for controlling it. Since journald stores structured logs, journalctl gives us plenty of tools to sift through them quickly.
If we wanted to find the process id and the program name of the process that was killed by the OOM killer, we could run:
journalctl --list-boots | \
awk '{ print $1 }' | \
xargs -I{} journalctl --utc --no-pager -b {} -kqg 'killed process' -o verbose --output-fields=MESSAGE
And it may output something like:
Sun 2021-06-20 18:49:15.791688 UTC [s=f81666ba69664fb88b545f20e7a3eea6;i=4dc1ac1;b=b7a40e823b4a4bd0bdf9d
f5fc0dd9b10;m=92c6bf1db5;t=5c537037a9548;x=18ca447c19414dd8]
MESSAGE=Killed process 3029 (Web Content) total-vm:10206696kB, anon-rss:6584572kB, file-rss:0kB, shm
em-rss:8732kB
Tue 2021-07-06 08:58:35.087345 UTC [s=bc64b16638964d46b0df6e203765c3f7;i=4dd75a3;b=20f10519157b4f1a9b244
352ffdb9577;m=5d93e0be98;t=5c670a08259f1;x=ca08824178edc0de]
MESSAGE=Killed process 131641 (bash) total-vm:11257404kB, anon-rss:11243152kB, file-rss:12kB, shmem-
rss:0kB
Let's take a closer look at the journalctl command options we used:
- --utc displays timestamps in UTC
- --no-pager prints the output directly instead of invoking a pager
- -b {} limits the output to the given boot
- -k shows only kernel messages
- -q suppresses informational messages about the journal itself
- -g 'killed process' searches for the pattern (case-insensitively, since the pattern is all lowercase)
- -o verbose combined with --output-fields=MESSAGE restricts the fields shown to MESSAGE, plus each entry's timestamp and cursor
In addition, we used xargs with the -I option to substitute the boot offset printed by awk into the final journalctl call.
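To illustrate the substitution mechanism in isolation (a toy example, unrelated to the journal itself):

```shell
# xargs -I{} runs the command once per input line,
# replacing every occurrence of {} with that line
printf '0\n-1\n-2\n' | xargs -I{} echo "inspecting boot {}"
# inspecting boot 0
# inspecting boot -1
# inspecting boot -2
```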
The non-interactive example outputs very little valuable information. It might be used to trigger an alert, but it’s not enough to analyze the incident. Once we can access the system that reported the error, we should examine its log in more detail.
To start examining the OOM log interactively, we may launch journalctl like:
$ journalctl --utc -b -X -ke
Here, X is an integer, and the minus sign is part of the literal value rather than option syntax: journalctl interprets zero and negative integers as counting backward from the current boot, while positive integers count forward from the first boot.
In interactive mode, navigation follows the key bindings of less.
The -e option fast-forwards the pager to the end of the log. Next, we'll use one of the available search commands, for example, by typing "?" followed by "Killed". Once the cursor moves to the error message of interest, we can search again, now for "Mem-Info", which rewinds the log to the summary of the virtual memory state at the time the OOM killer ran, followed by the list of processes and statistics of their memory usage.
Some Linux systems don't use journald. A popular tool to manage system logs on Linux is rsyslogd. This tool has a rich system of modules and several configuration languages, and covering all its functionality would require substantial effort, so we'll focus on the essentials. The entry point to rsyslog configuration is the file /etc/rsyslog.conf, usually accompanied by the /etc/rsyslog.d directory, which adds more configuration files.
Conceptually, rsyslog consists of these kinds of modules:
- input modules (im*), which collect messages from various sources
- output modules (om*), which deliver messages to files, databases, or remote hosts
- parser modules (pm*), which parse the raw message format
- message modification modules (mm*), which alter message content
- library and string generator modules, which support the others
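Whichever output modules are in play, kernel messages only reach rsyslog if an input module collects them; a minimal excerpt (assuming rsyslog's RainerScript syntax) might look like:

```
# Load the kernel log input module so kernel messages enter rsyslog
module(load="imklog")
```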
Let’s take a look at some common patterns in rsyslog configuration, especially those that might help us find OOM-related messages. A configuration rule for matching kernel messages will typically look like this:
kern.* -/var/log/kern.log
This instructs rsyslog to store messages coming from the kernel in a file called /var/log/kern.log (the leading dash traditionally disables syncing the file after each write). When this file grows too large, it is typically rotated and archived, for example, by logrotate. Subsequently, grep helps us search the current log, while zgrep handles the archived ones:
$ find /var/log -name 'kern*' -exec grep -PnHe 'Killed process' {} + 2>/dev/null
Here’s an example of the output we’ll see:
/var/log/kern.log:5207:Jul 6 11:58:34 cheetah kernel: [401912.220043] Killed process 131641 (bash) total-vm:11257404kB, anon-rss:11243152kB, file-rss:12kB, shmem-rss:0kB
And for archived logs:
$ find /var/log -name 'kern*.gz' -exec zgrep -PnHe 'Killed process' {} + 2>/dev/null
We’ll see output like:
/var/log/kern.log.3.gz:1967:Jun 13 14:41:02 cheetah kernel: [347531.586344] Killed process 98254 (chrome) total-vm:889684kB, anon-rss:113068kB, file-rss:0kB, shmem-rss:1784kB
/var/log/kern.log.3.gz:2861:Jun 13 14:41:22 cheetah kernel: [347551.430284] Killed process 102503 (chrome) total-vm:911152kB, anon-rss:104748kB, file-rss:0kB, shmem-rss:1968kB
/var/log/kern.log.3.gz:3781:Jun 13 14:41:52 cheetah kernel: [347582.087504] Killed process 96187 (chrome) total-vm:873296kB, anon-rss:103512kB, file-rss:0kB, shmem-rss:4540kB
/var/log/kern.log.2.gz:1593:Jun 20 21:49:15 cheetah kernel: [630398.445681] Killed process 3029 (Web Content) total-vm:10206696kB, anon-rss:6584572kB, file-rss:0kB, shmem-rss:8732kB
The modular structure of rsyslog is both a curse and a blessing. With great flexibility comes a great number of possible destinations for our data: it may be sent to a centralized log collector, a local database server, or simply a pseudoterminal, to name a few.
Of course, the choice of loggers is far greater than journald and rsyslogd. To find all the programs potentially logging kernel events, we may try:
$ sudo lsof | grep /dev/kmsg
This will list all the programs that are reading this file, and, potentially, storing its contents somewhere.
In this article, we looked at how the operating system logs OOM events and where to find information about recent OOM errors.