我正在安装新安装的postgres 10.4,在负载测试期间,我不断收到内存不足的错误。
配置:
$ cat /etc/security/limits.conf
postgres hard memlock 508559360
postgres soft memlock 508559360
$ cat /etc/sysctl.conf
vm.nr_hugepages = 248320
vm.hugetlb_shm_group = 118
vm.overcommit_memory=2
vm.swappiness=1
vm.vfs_cache_pressure=50
kernel.sem = 250 32000 32 128
系统规范:
$ cat /proc/cpuinfo | grep "core id" | wc -l
32
每个CPU都是2.10GHz @ Intel Xeon(R)CPU E5-2695 v4。如上所示,有16个物理核心,32个逻辑核心。
$ free -m
free -m
total used free shared buff/cache available
Mem: 510969 498048 2434 3098 10485 8343
Swap: 1071 1071 0
注意:我们将memlock设置为大约485 GB,直接用于postgres,显示在“使用过”的高栏中。
$ df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
tmpfs 250G 64K 250G 1% /dev/shm
$ cat /proc/meminfo
MemTotal: 523232496 kB
MemFree: 1216648 kB
MemAvailable: 8348808 kB
Buffers: 65220 kB
Cached: 10211780 kB
SwapCached: 7620 kB
Active: 5556092 kB
Inactive: 5432428 kB
Active(anon): 2208064 kB
Inactive(anon): 1593084 kB
Active(file): 3348028 kB
Inactive(file): 3839344 kB
Unevictable: 24128 kB
Mlocked: 24128 kB
SwapTotal: 1097724 kB
SwapFree: 0 kB
Dirty: 52 kB
Writeback: 0 kB
AnonPages: 728044 kB
Mapped: 62904 kB
Shmem: 3085592 kB
Slab: 1454276 kB
SReclaimable: 1345008 kB
SUnreclaim: 109268 kB
KernelStack: 11984 kB
PageTables: 22856 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 15770860 kB
Committed_AS: 7920136 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 71680 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 248320
HugePages_Free: 245883
HugePages_Rsvd: 1800
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 329600 kB
DirectMap2M: 534444032 kB
Postgres信息:
SELECT version();
PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit
max_connections=600
shared_buffers= '8192 MB'
work_mem = '20 MB'
maintenance_work_mem = 2GB
max_parallel_workers = 8
wal_buffers = 16MB
max_wal_size = 20GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = '364 GB'
default_statistics_target = 1000
log_timezone = 'US/Eastern'
track_activities = on
track_counts = on
track_io_timing = on
stats_temp_directory = 'pg_stat_tmp'
datestyle = 'iso, mdy'
timezone = 'US/Eastern'
default_text_search_config = 'pg_catalog.english'
transform_null_equals = on
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 16384
track_functions = all
track_io_timing = true
pg_stat_statements.track = all
session_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '3s'
auto_explain.log_nested_statements='on'
auto_explain.log_analyze=true
注意:我已经开始使用设置为250 MB的高工作内存,并将其缓慢降低到20 MB,但仍然收到错误。我还验证了连接数不会超过120。在 session 模式下,我们在实例前面使用PGBouncer。
该错误对于堆栈溢出而言太大,但是在这里,它们是链接在一起:
https://codepad.co/snippet/derPU4E8
突出的错误是:
2018-06-18 15:02:22 EDT,28197,mydb,ERROR: could not resize shared memory segment "/PostgreSQL.1552129380" to 192088 bytes: No space left on device
我不明白它在谈论什么设备。我没有什么似乎与OOM接近的,更不用说它谈论的很小的192088字节了。
2018-06-18 15:02:22 EDT,19708,mydb,ERROR: out of memory
2018-06-18 15:02:22 EDT,19708,mydb,DETAIL: Failed on request of size 7232.
2018-06-18 15:02:22 EDT,16688,,LOG: could not fork worker process: Cannot allocate memory
2018-06-18 15:02:22 EDT,4555,,ERROR: out of memory
2018-06-18 15:02:22 EDT,4555,,DETAIL: Failed on request of size 78336.
2018-06-18 15:02:22 EDT,4552,,LOG: could not open directory "/usr/lib/postgresql/10/share/timezone": Cannot allocate memory
2018-06-18 15:02:22 EDT,19935,mydb,ERROR: could not load library "/usr/lib/postgresql/10/lib/auto_explain.so": /usr/lib/postgresql/10/lib/auto_explain.so: failed to map segment from shared object
2018-06-18 15:02:22 EDT,28193,mydb,ERROR: out of memory
2018-06-18 15:02:22 EDT,28193,mydb,DETAIL: Failed on request of size 8192.
2018-06-18 15:02:22 EDT,26927,mydb,ERROR: could not resize shared memory segment "/PostgreSQL.1101931262" to 192088 bytes: No space left on device
问题:如何调试此问题,更重要的是如何解决此问题?
最佳答案
我遵循了Laurenz的建议,不再运行OOM。vm.overcommit_ratio = 100
。
此外,我删除了sysctl/limits中的memlock和大页面定义。
关于postgresql - Postgres内存不足,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/50916940/