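Problem description
We run an open-source file service, and one of its processes kept growing in memory. Right after startup, top showed 17.8G of virtual memory and 500M of physical memory for the process. After running for a while, physical memory reached 3G, available memory on the host dropped to 0, and memory allocations began to fail. Restarting the process released the memory.
Steadily growing memory usually points to a memory leak, so the first step was to look for one with a leak-detection tool.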
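The virtual and physical sizes that top reports can also be sampled from /proc/<pid>/status on Linux, which makes it easier to log the growth over time. The following is a minimal sketch, not part of the original investigation: the function names are made up for illustration, and the sample values are invented to resemble the numbers above.

```python
import re

# Invented sample resembling /proc/<pid>/status for the process described above.
SAMPLE_STATUS = (
    "Name:\tffff\n"
    "VmSize:\t18664652 kB\n"
    "VmRSS:\t  512000 kB\n"
)

def parse_status_kib(status_text):
    """Extract VmSize (virtual) and VmRSS (resident) in KiB from status text."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"(VmSize|VmRSS):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def read_status_kib(pid):
    """Read the same two fields for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_status_kib(status_file.read())

sizes = parse_status_kib(SAMPLE_STATUS)
print(sizes["VmSize"], sizes["VmRSS"])
```

Calling read_status_kib(pid) at intervals and logging the result shows whether VmRSS keeps climbing while VmSize stays flat, which is the pattern described here.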
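Analysis
1. First pass: checking for memory leaks with valgrind
A. Start the process with the following command
valgrind --tool=memcheck --leak-check=full --leak-resolution=high --num-callers=40 --show-reachable=yes --log-file=/home/xxxxx/memcheck.log /usr/local/bin/ffff /etc/ffff/fffd.conf start
B. Check the valgrind results. Sure enough, there are two leaks, at sched_thread.c:400 and in ffft_binlog_read (sync.c:1517), where memory is allocated with malloc and never freed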
==21056== LEAK SUMMARY:
==21056==    definitely lost: 1,317 bytes in 2 blocks
==21056==    indirectly lost: 0 bytes in 0 blocks
==21056==      possibly lost: 0 bytes in 0 blocks
==21056==    still reachable: 17,188,006,888 bytes in 65,564 blocks
==21056==         suppressed: 0 bytes in 0 blocks
==21056== 288 bytes in 1 blocks are definitely lost in loss record 5 of 10
==21056==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==21056==    by 0x40D397: sched_dup_array (sched_thread.c:400)
==21056==    by 0x40D4CA: sched_start (sched_thread.c:487)
==21056==    by 0x403629: main (fffd.c:499)
==21056== 1,029 bytes in 1 blocks are definitely lost in loss record 6 of 10
==21056==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==21056==    by 0x41A793: ffft_binlog_read (sync.c:1517)
==21056==    by 0x41C707: ffft_sync_thread_entrance (sync.c:1785)
==21056==    by 0x51AAA50: start_thread (in /lib64/libpthread-2.12.so)
==21056==    by 0x41E76FF: ???
C. Locate the corresponding code and fix both leaks
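When memcheck is run more than once, it helps to pull the LEAK SUMMARY numbers out of each log so the runs can be compared side by side. The sketch below is one way to do that. The function name and regular expression are my own, and it assumes the summary lines follow the usual "kind: N bytes in M blocks" layout shown in the output above.

```python
import re

# LEAK SUMMARY from the first run, as logged by valgrind.
SAMPLE_LOG = """\
==21056== LEAK SUMMARY:
==21056==    definitely lost: 1,317 bytes in 2 blocks
==21056==    indirectly lost: 0 bytes in 0 blocks
==21056==      possibly lost: 0 bytes in 0 blocks
==21056==    still reachable: 17,188,006,888 bytes in 65,564 blocks
==21056==         suppressed: 0 bytes in 0 blocks
"""

SUMMARY_RE = re.compile(
    r"==\d+==\s+(definitely lost|indirectly lost|possibly lost|"
    r"still reachable|suppressed):\s*([\d,]+) bytes in ([\d,]+) blocks"
)

def parse_leak_summary(log_text):
    """Map each leak kind to a (bytes, blocks) tuple."""
    summary = {}
    for kind, nbytes, nblocks in SUMMARY_RE.findall(log_text):
        summary[kind] = (int(nbytes.replace(",", "")),
                         int(nblocks.replace(",", "")))
    return summary

for kind, (nbytes, nblocks) in parse_leak_summary(SAMPLE_LOG).items():
    print(f"{kind}: {nbytes} bytes in {nblocks} blocks")
```

Even in this first run, "still reachable" is several orders of magnitude larger than "definitely lost", which is worth noticing before chasing the two small leaks.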
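2. Second pass: running valgrind again after the fix
A. The valgrind results show no definite leaks any more, yet memory still keeps growing. Why? Could it be cache?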
==14069== LEAK SUMMARY:
==14069==    definitely lost: 0 bytes in 0 blocks
==14069==    indirectly lost: 0 bytes in 0 blocks
==14069==      possibly lost: 544 bytes in 2 blocks
==14069==    still reachable: 17,189,181,517 bytes in 65,780 blocks
==14069==         suppressed: 0 bytes in 0 blocks
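B. Force the kernel to drop its caches by running echo 3 > /proc/sys/vm/drop_caches, then check with free -g. Available memory did not increase, so cache is not what is using the memory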
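Before dropping caches, the same question can be checked without side effects by reading /proc/meminfo. This is a rough sketch under stated assumptions: the helper names are hypothetical, the sample values are invented, and summing Cached and Buffers only approximates what drop_caches could release, since part of Cached may not be reclaimable.

```python
# Invented sample of /proc/meminfo on a host that is nearly out of memory.
SAMPLE_MEMINFO = """\
MemTotal:        3924504 kB
MemFree:           61248 kB
Buffers:            1024 kB
Cached:            20480 kB
"""

def parse_meminfo_kib(meminfo_text):
    """Parse 'Key:   value kB' lines into a dict of integers."""
    info = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info

def page_cache_kib(info):
    """Approximate cache size: Cached plus Buffers (an assumption, see above)."""
    return info.get("Cached", 0) + info.get("Buffers", 0)

info = parse_meminfo_kib(SAMPLE_MEMINFO)
print(f"free: {info['MemFree']} kB, cache: {page_cache_kib(info)} kB")
```

If the cache figure is already tiny compared with total memory, as in this made-up sample, dropping caches cannot free much and the memory must be held elsewhere.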
Resolution
1. Going back over the valgrind results, one allocation stands out as very large, about 17G, which roughly matches the virtual memory size shown by top
==14069== 17,179,607,040 bytes in 65,535 blocks are still reachable in loss record 51 of 51
==14069==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==14069==    by 0x4112E8: malloc_mpool (fff_task_queue.c:86)
==14069==    by 0x4115BE: free_queue_init (ffft_task_queue.c:211)
==14069==    by 0x41972B: work_thread_init (work_thread.c:95)
==14069==    by 0x403273: main (hhtf.c:227)
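2. The corresponding code shows that this is the initialization of the queue that handles incoming messages, and the amount of memory it allocates depends on the configured maximum number of concurrent connections.
3. The configuration file has max_connections=65535, while netstat -an | grep 11411 shows fewer than 300 connections actually in use
4. Set max_connections=512 in the configuration file and restart the process. After running for a while, physical memory usage is lower and stays at a normal level.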
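The numbers in the still-reachable record make the effect of this change easy to estimate. The sketch below divides the pool size by the block count to get a per-block size, then scales it to the new limit. It assumes every block has the same size and that the pool grows linearly with max_connections, which the valgrind record suggests but which I have not confirmed in the source. The function names are illustrative only.

```python
POOL_BYTES = 17_179_607_040   # from the still-reachable loss record
POOL_BLOCKS = 65_535          # matches max_connections=65535

def bytes_per_block(pool_bytes, blocks):
    """Per-block size, assuming all blocks in the pool are equal."""
    per_block, remainder = divmod(pool_bytes, blocks)
    if remainder:
        raise ValueError("pool size is not a whole multiple of the block count")
    return per_block

def estimated_pool_bytes(max_connections, per_block):
    """Pool size for a given limit, assuming linear scaling (an assumption)."""
    return max_connections * per_block

per_block = bytes_per_block(POOL_BYTES, POOL_BLOCKS)
print(per_block)                                       # 262144, i.e. 256 KiB
print(estimated_pool_bytes(512, per_block) / 2**20)    # 128.0 (MiB)
```

Under these assumptions each connection slot reserves 256 KiB, so lowering the limit from 65535 to 512 shrinks the pool from roughly 16 GiB to 128 MiB. A plausible reading of the earlier symptom, again unconfirmed, is that physical memory grew as more of the pre-allocated blocks were touched over time.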
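Summary
1. Make good use of tools to analyze and solve problems
2. When using open-source software, understand what each configuration option means and set it to a sensible value. Ideally, read the source code as well, since that makes problems easier to analyze and solve
3. When a service is moved to a different host, it needs tuning again, with configuration values adjusted to the new machine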
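For the last point, one simple check after a migration is to compare the configured limit with the connections actually observed. The sketch below counts established TCP connections on the service port from netstat -an output. It assumes the Linux net-tools column layout (local address in the fourth column, state in the sixth). The function names and the sample addresses are invented for illustration.

```python
# Invented sample of `netstat -an` output; only the port comes from the text.
SAMPLE_NETSTAT = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:11411           0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:11411          10.0.0.21:50312         ESTABLISHED
tcp        0      0 10.0.0.5:11411          10.0.0.22:41877         ESTABLISHED
tcp        0      0 10.0.0.5:22             10.0.0.9:11411          ESTABLISHED
"""

def count_established(netstat_text, port):
    """Count ESTABLISHED TCP connections whose local port is `port`."""
    suffix = f":{port}"
    count = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        if (len(fields) >= 6 and fields[0].startswith("tcp")
                and fields[3].endswith(suffix)
                and fields[5] == "ESTABLISHED"):
            count += 1
    return count

def utilization(active, max_connections):
    """Fraction of the configured limit that is actually in use."""
    return active / max_connections

active = count_established(SAMPLE_NETSTAT, 11411)
print(active, utilization(active, 512))
```

With fewer than 300 connections against a limit of 65535, utilization was below half a percent, which is the kind of mismatch this check is meant to surface.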