I'm running services inside Docker containers on a t4g.nano instance (512 MB of RAM). My services typically use around 65% of memory and seem to run fine most of the time, but every so often I see spikes in memory and CPU usage. Checking journalctl, I can see the following:

Sep 23 06:31:10 ip-xxxxx kernel: Tasks state (memory values in pages):
Sep 23 06:31:10 ip-xxxxx kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Sep 23 06:31:10 ip-xxxxx kernel: [    150]     0   150    12106      928   102400        0          -250 systemd-journal
Sep 23 06:31:10 ip-xxxxx kernel: [    190]     0   190    72415     6464   114688        0         -1000 multipathd
Sep 23 06:31:10 ip-xxxxx kernel: [    193]     0   193     2736      956    65536        0         -1000 systemd-udevd
Sep 23 06:31:10 ip-xxxxx kernel: [    351]   100   351     4102      832    81920        0             0 systemd-network
Sep 23 06:31:10 ip-xxxxx kernel: [    353]   101   353     6307     1616    94208        0             0 systemd-resolve
Sep 23 06:31:10 ip-xxxxx kernel: [    397]     0   397      557      384    40960        0             0 acpid
Sep 23 06:31:10 ip-xxxxx kernel: [    402]     0   402     1727      480    57344        0             0 cron
Sep 23 06:31:10 ip-xxxxx kernel: [    403]   102   403     2214      832    69632        0          -900 dbus-daemon
Sep 23 06:31:10 ip-xxxxx kernel: [    410]     0   410    20520      640    61440        0             0 irqbalance
Sep 23 06:31:10 ip-xxxxx kernel: [    411]     0   411     8238     2816   114688        0             0 networkd-dispat
Sep 23 06:31:10 ip-xxxxx kernel: [    413]   104   413    55505      992    90112        0             0 rsyslogd
Sep 23 06:31:10 ip-xxxxx kernel: [    414]     0   414   457784     2090   200704        0             0 amazon-ssm-agen
Sep 23 06:31:10 ip-xxxxx kernel: [    417]     0   417   348223     3230   262144        0          -900 snapd
Sep 23 06:31:10 ip-xxxxx kernel: [    418]     0   418     3889      800    77824        0             0 systemd-logind
Sep 23 06:31:10 ip-xxxxx kernel: [    448]     0   448   446710     3840   262144        0          -999 containerd
Sep 23 06:31:10 ip-xxxxx kernel: [    454]     0   454     1408      448    49152        0             0 agetty
Sep 23 06:31:10 ip-xxxxx kernel: [    457]     0   457     1397      352    49152        0             0 agetty
Sep 23 06:31:10 ip-xxxxx kernel: [    481]   115   481     5072     1426    73728        0             0 pgbouncer
Sep 23 06:31:10 ip-xxxxx kernel: [    567]   114   567     4654      579    61440        0             0 chronyd
Sep 23 06:31:10 ip-xxxxx kernel: [    585]   114   585     2557      427    61440        0             0 chronyd
Sep 23 06:31:10 ip-xxxxx kernel: [    636]     0   636    27483     2688   122880        0             0 unattended-upgr
Sep 23 06:31:10 ip-xxxxx kernel: [    676]     0   676    58833     1006    98304        0             0 polkitd
Sep 23 06:31:10 ip-xxxxx kernel: [    772]     0   772   489411     6641   348160        0          -500 dockerd
Sep 23 06:31:10 ip-xxxxx kernel: [    807]     0   807     3785     1024    73728        0         -1000 sshd
Sep 23 06:31:10 ip-xxxxx kernel: [    978]     0   978   309497      862   118784        0          -998 containerd-shim
Sep 23 06:31:10 ip-xxxxx kernel: [    998] 65534   998   308899     1608   122880        0             0 pgbouncer_expor
Sep 23 06:31:10 ip-xxxxx kernel: [   1753]     0  1753    74364     1408   176128        0             0 packagekitd
Sep 23 06:31:10 ip-xxxxx kernel: [   3689]     0  3689      576      352    45056        0             0 apt.systemd.dai
Sep 23 06:31:10 ip-xxxxx kernel: [   3693]     0  3693      576      352    45056        0             0 apt.systemd.dai
Sep 23 06:31:10 ip-xxxxx kernel: [   3721]     0  3721    85333    15613   458752        0             0 unattended-upgr
Sep 23 06:31:10 ip-xxxxx kernel: [   3913]     0  3913   102053    31125   479232        0             0 unattended-upgr
Sep 23 06:31:10 ip-xxxxx kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=init.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/apt-daily>
Sep 23 06:31:10 ip-xxxxx kernel: Out of memory: Killed process 3913 (unattended-upgr) total-vm:408212kB, anon-rss:121788kB, file-rss:2712kB, shmem-rss:0kB, UID:0 p>
Sep 23 06:31:10 ip-xxxxx systemd[1]: apt-daily-upgrade.service: A process of this unit has been killed by the OOM killer.
Sep 23 06:31:10 ip-xxxxx systemd[1]: apt-daily-upgrade.service: Main process exited, code=killed, status=15/TERM
Sep 23 06:31:10 ip-xxxxx systemd[1]: apt-daily-upgrade.service: Failed with result 'oom-kill'

This doesn't always cause the instance to stop responding, but sometimes it does. It also sometimes causes the instance to be terminated and replaced (I'm using an Auto Scaling group), which is bad for our use case. PS: please don't be too harsh on me, I'm still fairly new to Linux and server administration.


Best Answer

Your t4g.nano instance appears to be hitting its memory limit, causing the out-of-memory (OOM) killer to terminate processes. The instance has only 512 MB of RAM, and although your services normally use around 65% of memory, occasional spikes exhaust what's left.

In the specific case you posted, apt-daily-upgrade.service was killed under memory pressure. This most likely happens during the automatic package updates (unattended-upgrades), which can be quite resource-hungry.

Here are a few potential solutions:

  1. Disable or reschedule the updates: You can disable apt-daily-upgrade.service so it no longer runs automatically, or reschedule it to a less critical time when memory usage is lower (see the timer sketch after the commands). To disable it:
sudo systemctl disable apt-daily-upgrade.service apt-daily.service
sudo systemctl mask apt-daily-upgrade.service apt-daily.service
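
If you would rather keep automatic updates but move them to a quieter time, a drop-in override on the systemd timer is one way to do it. This is only a minimal sketch, assuming the default Ubuntu timer names (apt-daily.timer and apt-daily-upgrade.timer) and an arbitrary 03:00 schedule; adjust it to your own quiet window:

# Open a drop-in override for the upgrade timer in an editor
sudo systemctl edit apt-daily-upgrade.timer

# Put something like this in the override:
# [Timer]
# OnCalendar=
# OnCalendar=*-*-* 03:00
# RandomizedDelaySec=10m

# Restart the timer so the new schedule takes effect
sudo systemctl restart apt-daily-upgrade.timer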
  2. Reduce the memory usage of background services:
    Consider shrinking the memory footprint of other background services (for example, by disabling or trimming logging, and removing services you don't need, such as packagekitd or polkitd); a sketch follows below.
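
For instance, packagekitd and snapd both show up with a few MB of RSS in your log; if you don't use them, stopping and disabling them frees that memory for good. A hedged sketch, assuming the standard Ubuntu unit names:

# PackageKit is mainly used by desktop package front-ends and is usually not needed on a server
sudo systemctl disable --now packagekit.service

# If you don't use snaps, snapd can be disabled as well
sudo systemctl disable --now snapd.service snapd.socket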

  3. Optimize the Docker containers' memory usage:
    Make sure the containers use as little memory as possible. You can set memory limits on the Docker containers so they never use more than necessary (a quick way to verify the limit is shown after the command):

docker run --memory=128m --memory-swap=128m <your_container>
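
To pick a sensible limit and confirm it was applied, you can look at what the containers actually use. These are standard Docker CLI calls; <your_container> is just the placeholder from above:

# Current per-container memory usage, one-off snapshot
docker stats --no-stream

# Verify the memory limit (in bytes) applied to a running container
docker inspect --format '{{.HostConfig.Memory}}' <your_container>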
  4. Swap space: Add swap space to the instance to absorb occasional memory spikes. This helps when RAM runs out (see the swappiness note after the commands):
sudo fallocate -l 1G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
sudo sh -c 'echo "/swapfile swap swap defaults 0 0" >> /etc/fstab'
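
With swap in place, you may also want to lower vm.swappiness so the kernel only swaps under real pressure instead of proactively. A minimal sketch; 10 is a common value for small servers, and the file name under /etc/sysctl.d/ is just a suggestion:

# Apply immediately
sudo sysctl vm.swappiness=10

# Persist across reboots
echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf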

These steps should help relieve the memory pressure and reduce how often the OOM killer fires.
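
After applying the changes, it's worth watching memory for a while to confirm the spikes are gone. A few quick checks:

# Overall memory and swap usage on the host
free -h

# Live per-container memory usage
docker stats

# Recent OOM-killer activity in the kernel log
journalctl -k | grep -i 'out of memory'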