The Linux oom-killer

2020.3.1 (Sun)

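When a Linux machine is about to go down because it has run out of memory, the kernel detects the shortage and forcibly kills a memory-consuming process to try to keep the machine itself alive. The underlying idea seems to be that losing an individual process is still better for the system than losing the whole machine.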

Our DNS server went down for no obvious reason a few times, so I looked into it.

# systemctl status named-chroot
-- snip --
Active: failed (Result: exit-code) since Sun 2020-03-01 02:13:59 JST; 11h ago
-- snip --
Process: 7389 ExecStart=/usr/sbin/named -u named -c ${NAMEDCONF} -t /var/named/chroot $OPTIONS (code=exited, status=0/SUCCESS)
  Process: 7387 ExecStartPre=/bin/bash -c if [ ! "$DISABLE_ZONE_CHECKING" == "yes" ]; then /usr/sbin/named-checkconf -t /var/named/chroot -z "$NAMEDCONF"; else echo "Checking of zone files is disabled"; fi (code=exited, status=0/SUCCESS)
-- snip --

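The unit went into the failed state at 02:13:59, and at first it looked as if it had tripped over the chroot mount. /var/log/messages had the following entry from around the same time.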

named invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0

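The log shows named invoking the oom-killer, followed by a dump of every process on the machine. "named invoked oom-killer" only means that a memory allocation requested by named is what tipped the system into the out-of-memory path; it does not mean named was trying to protect itself. The kernel then walks the running processes from top to bottom, scores each one by how much memory it uses, and kills the highest scorer. This time that was named itself (score 327, about 190 MB resident) on a machine with roughly 600 MB of RAM and no swap, so the DNS service went down.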

# grep killed /var/log/messages
Feb 28 04:10:12 apw-dns1 systemd: named-chroot.service: main process exited, code=killed, status=9/KILL
Mar  1 02:13:58 apw-dns1 systemd: named-chroot.service: main process exited, code=killed, status=9/KILL
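
The systemd lines above only say that the main process received SIGKILL. To tie each kill to the oom-killer, one option is to look for the kernel's own "Killed process" lines instead. The following is a minimal sketch, assuming the syslog line format that appears in the log excerpt in this post; the function name find_oom_kills and the regular expression are my own and may need adjusting for other kernels or log formats.

```python
import re

# Matches the kernel's "Killed process" line in the format seen in this post's log.
KILLED_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+"
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\).*?"
    r"anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(lines):
    """Return one dict per 'Killed process' line found in the given log lines."""
    kills = []
    for line in lines:
        m = KILLED_RE.search(line)
        if m:
            kills.append({
                "when": m.group("when"),
                "pid": int(m.group("pid")),
                "name": m.group("name"),
                "anon_rss_kb": int(m.group("anon_rss_kb")),
            })
    return kills
```

Passing an open file object for /var/log/messages as lines should work, since the function only iterates over it line by line.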

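Which of the memory-consuming processes gets killed is a matter of kernel policy, but you can check the current candidate with the following command. If the command is not available, install it with yum -y install dstat. It shows the process the oom-killer would currently pick first, together with its score.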

# dstat --top-oom
--out-of-memory---
    kill score
named         179
named         179
named         179
...
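
If dstat is not installed, roughly the same information can be read straight from /proc, where the kernel exposes the current score of each process in /proc/<pid>/oom_score. The sketch below is my own illustration, not how dstat is implemented; the function name top_oom_scores and its proc_root parameter are hypothetical, and proc_root exists only so the function can be pointed at a test directory.

```python
import os


def top_oom_scores(proc_root="/proc", limit=5):
    """Return up to `limit` (score, pid, name) tuples, highest oom_score first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited while being read, or file unreadable
        results.append((score, int(entry), name))
    # Highest score first; ties broken by lower pid for a stable order.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results[:limit]
```

Calling top_oom_scores(limit=3) on the server lists the three processes the kernel currently considers the best candidates for killing.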

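The oom-killer can be disabled, but then the Linux machine itself goes down under memory pressure and you cannot even log in over ssh, so there is no need to disable it and doing so is actually dangerous. Linux handles this area quite well, so leave it as it is.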
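
Before changing anything, it may help to confirm what the machine is currently set to. The sketch below only reads values and changes nothing. It assumes the usual procfs locations (/proc/sys/vm/panic_on_oom, /proc/sys/vm/overcommit_memory and /proc/<pid>/oom_score_adj); the function name read_oom_settings and its parameters are hypothetical. For reference, an oom_score_adj of -1000 exempts a process from the oom-killer, which is the value sshd, auditd and systemd-udevd carry in the process table from this incident.

```python
def read_oom_settings(pid="self", proc_root="/proc"):
    """Read OOM-related knobs; a value is None if its file cannot be read."""
    paths = {
        "panic_on_oom": f"{proc_root}/sys/vm/panic_on_oom",
        "overcommit_memory": f"{proc_root}/sys/vm/overcommit_memory",
        "oom_score_adj": f"{proc_root}/{pid}/oom_score_adj",
    }
    settings = {}
    for key, path in paths.items():
        try:
            with open(path) as f:
                settings[key] = int(f.read().strip())
        except (OSError, ValueError):
            settings[key] = None  # missing file or unexpected content
    return settings
```

Passing the pid of named instead of the default "self" shows whether the DNS daemon has any adjustment applied.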

The relevant part of /var/log/messages follows.

Mar  1 02:13:58 foobar-dns kernel: named invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
Mar  1 02:13:58 foobar-dns kernel: named cpuset=/ mems_allowed=0
Mar  1 02:13:58 foobar-dns kernel: CPU: 0 PID: 7392 Comm: named Not tainted 3.10.0-1062.9.1.el7.x86_64 #1
Mar  1 02:13:58 foobar-dns kernel: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Mar  1 02:13:58 foobar-dns kernel: Call Trace:
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff8517ac23>] dump_stack+0x19/0x1b
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff85175ce9>] dump_header+0x90/0x229
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84b06142>] ? ktime_get_ts64+0x52/0xf0
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bc1714>] oom_kill_process+0x254/0x3e0
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84b32e71>] ? cpuset_mems_allowed_intersects+0x21/0x30
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bc11bd>] ? oom_unkillable_task+0xcd/0x120
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bc1266>] ? find_lock_task_mm+0x56/0xc0
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bc1f66>] out_of_memory+0x4b6/0x4f0
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bc8a6f>] __alloc_pages_nodemask+0xacf/0xbe0
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84c16b28>] alloc_pages_current+0x98/0x110
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bbd617>] __page_cache_alloc+0x97/0xb0
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bc01d8>] filemap_fault+0x298/0x490
Mar  1 02:13:58 foobar-dns kernel: [<ffffffffc029c55e>] __xfs_filemap_fault+0x7e/0x1d0 [xfs]
Mar  1 02:13:58 foobar-dns kernel: [<ffffffffc029c75c>] xfs_filemap_fault+0x2c/0x30 [xfs]
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bec15a>] __do_fault.isra.61+0x8a/0x100
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bec70c>] do_read_fault.isra.63+0x4c/0x1b0
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bf11ba>] handle_pte_fault+0x22a/0xe20
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff84bf3ecd>] handle_mm_fault+0x39d/0x9b0
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff85188653>] __do_page_fault+0x213/0x500
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff85188975>] do_page_fault+0x35/0x90
Mar  1 02:13:58 foobar-dns kernel: [<ffffffff85184778>] page_fault+0x28/0x30
Mar  1 02:13:58 foobar-dns kernel: Mem-Info:
Mar  1 02:13:58 foobar-dns kernel: active_anon:119093 inactive_anon:3722 isolated_anon:0#012 active_file:951 inactive_file:1462 isolated_file:0#012 unevictable:0 dirty:0 writeback:0 unstable:0#012 slab_reclaimable:3155 slab_unreclaimable:3357#012 mapped:309 shmem:6276 pagetables:2885 bounce:0#012 free:7984 free_pcp:30 free_cma:0
Mar  1 02:13:58 foobar-dns kernel: Node 0 DMA free:3052kB min:788kB low:984kB high:1180kB active_anon:10764kB inactive_anon:528kB active_file:4kB inactive_file:196kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:32kB shmem:840kB slab_reclaimable:100kB slab_unreclaimable:256kB kernel_stack:64kB pagetables:240kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:1674 all_unreclaimable? yes
Mar  1 02:13:58 foobar-dns kernel: lowmem_reserve[]: 0 568 568 568
Mar  1 02:13:58 foobar-dns kernel: Node 0 DMA32 free:28884kB min:28884kB low:36104kB high:43324kB active_anon:465608kB inactive_anon:14360kB active_file:3800kB inactive_file:5652kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:612340kB managed:585256kB mlocked:0kB dirty:0kB writeback:0kB mapped:1204kB shmem:24264kB slab_reclaimable:12520kB slab_unreclaimable:13172kB kernel_stack:1728kB pagetables:11300kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:14182 all_unreclaimable? yes
Mar  1 02:13:58 foobar-dns kernel: lowmem_reserve[]: 0 0 0 0
Mar  1 02:13:58 foobar-dns kernel: Node 0 DMA: 24*4kB (UEM) 34*8kB (UE) 22*16kB (UE) 19*32kB (UEM) 7*64kB (EM) 10*128kB (EM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3056kB
Mar  1 02:13:58 foobar-dns kernel: Node 0 DMA32: 213*4kB (UEM) 232*8kB (UEM) 356*16kB (UEM) 186*32kB (UEM) 43*64kB (UEM) 4*128kB (UE) 2*256kB (M) 9*512kB (M) 6*1024kB (M) 0*2048kB 0*4096kB = 28884kB
Mar  1 02:13:58 foobar-dns kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar  1 02:13:58 foobar-dns kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar  1 02:13:58 foobar-dns kernel: 8706 total pagecache pages
Mar  1 02:13:58 foobar-dns kernel: 0 pages in swap cache
Mar  1 02:13:58 foobar-dns kernel: Swap cache stats: add 0, delete 0, find 0/0
Mar  1 02:13:58 foobar-dns kernel: Free swap  = 0kB
Mar  1 02:13:58 foobar-dns kernel: Total swap = 0kB
Mar  1 02:13:58 foobar-dns kernel: 157083 pages RAM
Mar  1 02:13:58 foobar-dns kernel: 0 pages HighMem/MovableOnly
Mar  1 02:13:58 foobar-dns kernel: 6792 pages reserved
Mar  1 02:13:58 foobar-dns kernel: [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
Mar  1 02:13:58 foobar-dns kernel: [  243]     0   243     8841      154      22        0             0 systemd-journal
Mar  1 02:13:58 foobar-dns kernel: [  288]     0   288    13882      110      28        0         -1000 auditd
Mar  1 02:13:58 foobar-dns kernel: [  373]    81   373    14589      192      32        0          -900 dbus-daemon
Mar  1 02:13:58 foobar-dns kernel: [  375]     0   375     1097       35       8        0             0 acpid
Mar  1 02:13:58 foobar-dns kernel: [  378]   998   378    24067       77      16        0             0 chronyd
Mar  1 02:13:58 foobar-dns kernel: [  409]   999   409   153089     1657      61        0             0 polkitd
Mar  1 02:13:58 foobar-dns kernel: [  411]     0   411   136982      528      84        0             0 NetworkManager
Mar  1 02:13:58 foobar-dns kernel: [  413]     0   413     6595       74      19        0             0 systemd-logind
Mar  1 02:13:58 foobar-dns kernel: [  424]     0   424    31573      155      19        0             0 crond
Mar  1 02:13:58 foobar-dns kernel: [  431]     0   431    27527       34      11        0             0 agetty
Mar  1 02:13:58 foobar-dns kernel: [  432]     0   432    27527       34      10        0             0 agetty
Mar  1 02:13:58 foobar-dns kernel: [  534]     0   534    25724      530      51        0             0 dhclient
Mar  1 02:13:58 foobar-dns kernel: [  772]     0   772   143550     2858      95        0             0 tuned
Mar  1 02:13:58 foobar-dns kernel: [  774]     0   774    58733      847      52        0             0 rsyslogd
Mar  1 02:13:58 foobar-dns kernel: [  813]     0   813    53795     2249      59        0             0 google_network_
Mar  1 02:13:58 foobar-dns kernel: [  815]     0   815    54500     2405      62        0          -999 google_accounts
Mar  1 02:13:58 foobar-dns kernel: [  816]     0   816    53824     2284      61        0             0 google_clock_sk
Mar  1 02:13:58 foobar-dns kernel: [  819]     0   819    28230      256      59        0         -1000 sshd
Mar  1 02:13:58 foobar-dns kernel: [ 1135]     0  1135    11157      108      23        0         -1000 systemd-udevd
Mar  1 02:13:58 foobar-dns kernel: [24639]     0 24639    56016      481     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [24640]    48 24640    56056      486     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [24643]    48 24643    56056      485     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [24644]    48 24644    56056      486     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [24645]    48 24645    56056      485     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [24824]    48 24824    56056      486     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [24826]    48 24826    56056      486     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [ 7391]    25  7391    76624    49131     138        0             0 named
Mar  1 02:13:58 foobar-dns kernel: [26434]     0 26434    39195      331      79        0             0 sshd
Mar  1 02:13:58 foobar-dns kernel: [26436]  1001 26436    39195      334      77        0             0 sshd
Mar  1 02:13:58 foobar-dns kernel: [26437]  1001 26437    28898      110      13        0             0 bash
Mar  1 02:13:58 foobar-dns kernel: [26542]     0 26542    60321      289      73        0             0 sudo
Mar  1 02:13:58 foobar-dns kernel: [26543]     0 26543    47950      143      51        0             0 su
Mar  1 02:13:58 foobar-dns kernel: [26544]     0 26544    28863      115      14        0             0 bash
Mar  1 02:13:58 foobar-dns kernel: [28560]    48 28560    56056      510     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [28561]    48 28561    56056      486     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [28562]    48 28562    56056      486     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [30644]    48 30644    56056      486     113        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: [31167]     0 31167    45594      231      47        0             0 crond
Mar  1 02:13:58 foobar-dns kernel: [31168]     0 31168    28296       53      12        0             0 run-parts
Mar  1 02:13:58 foobar-dns kernel: [31182]     0 31182   154577    50275     221        0             0 yum-cron
Mar  1 02:13:58 foobar-dns kernel: [31183]     0 31183    28386       37      11        0             0 awk
Mar  1 02:13:58 foobar-dns kernel: [31341]     0 31341    56016      462     109        0             0 httpd
Mar  1 02:13:58 foobar-dns kernel: Out of memory: Kill process 7391 (named) score 327 or sacrifice child
Mar  1 02:13:58 foobar-dns kernel: Killed process 7391 (named), UID 25, total-vm:306496kB, anon-rss:196524kB, file-rss:0kB, shmem-rss:0kB
Mar  1 02:13:58 foobar-dns systemd: named-chroot.service: main process exited, code=killed, status=9/KILL
Mar  1 02:13:59 foobar-dns sh: Usage:
Mar  1 02:13:59 foobar-dns sh: kill [options] <pid|name> [...]
Mar  1 02:13:59 foobar-dns sh: Options:
Mar  1 02:13:59 foobar-dns sh: -a, --all              do not restrict the name-to-pid conversion to processes
Mar  1 02:13:59 foobar-dns sh: with the same uid as the present process
Mar  1 02:13:59 foobar-dns sh: -s, --signal <sig>     send specified signal
Mar  1 02:13:59 foobar-dns sh: -q, --queue <sig>      use sigqueue(2) rather than kill(2)
Mar  1 02:13:59 foobar-dns sh: -p, --pid              print pids without signaling them
Mar  1 02:13:59 foobar-dns sh: -l, --list [=<signal>] list signal names, or convert one to a name
Mar  1 02:13:59 foobar-dns sh: -L, --table            list signal names and numbers
Mar  1 02:13:59 foobar-dns sh: -h, --help     display this help and exit
Mar  1 02:13:59 foobar-dns sh: -V, --version  output version information and exit
Mar  1 02:13:59 foobar-dns sh: For more details see kill(1).
Mar  1 02:13:59 foobar-dns systemd: named-chroot.service: control process exited, code=exited status=1
Mar  1 02:13:59 foobar-dns systemd: Stopped Berkeley Internet Name Domain (DNS).
Mar  1 02:13:59 foobar-dns systemd: Unit named-chroot.service entered failed state.
Mar  1 02:13:59 foobar-dns systemd: named-chroot.service failed.
Mar  1 02:13:59 foobar-dns systemd: Stopping Set-up/destroy chroot environment for named (DNS)...
Mar  1 02:13:59 foobar-dns systemd: Removed slice User Slice of root.
Mar  1 02:13:59 foobar-dns systemd: Stopped Set-up/destroy chroot environment for named (DNS).
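
The process table in the log above is easier to read once it is sorted. The following is a rough sketch that parses the "[ pid ] uid tgid total_vm rss ..." rows and ranks the killable processes by resident memory. It assumes 4 kB pages (consistent with named's rss of 49131 pages and the reported anon-rss of 196524kB), and ranking by rss alone is my simplification: the kernel's real score takes more than rss into account, so this is a guide to who was using memory, not a reproduction of the kernel's decision. The names parse_oom_table and rank_by_rss are hypothetical.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as on typical x86_64 systems

# One data row of the oom-killer process table. The header row "[ pid ]" and
# the call-trace lines "[<ffff...>]" do not match because a numeric pid is required.
ROW_RE = re.compile(
    r"kernel:\s+\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+"
    r"(?P<swapents>\d+)\s+(?P<adj>-?\d+)\s+(?P<name>\S+)"
)


def parse_oom_table(lines):
    """Extract pid, name, rss in kB and oom_score_adj from each table row."""
    rows = []
    for line in lines:
        m = ROW_RE.search(line)
        if m:
            rows.append({
                "pid": int(m.group("pid")),
                "name": m.group("name"),
                "rss_kb": int(m.group("rss")) * PAGE_KB,
                "oom_score_adj": int(m.group("adj")),
            })
    return rows


def rank_by_rss(rows):
    """Largest resident set first, skipping processes exempted with -1000."""
    killable = [r for r in rows if r["oom_score_adj"] != -1000]
    return sorted(killable, key=lambda r: r["rss_kb"], reverse=True)
```

Run against this log, rss alone puts yum-cron (started from cron) and named nearly level at the top, together using roughly 390 MB on a machine with about 610 MB of RAM and no swap.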