python - Looking for a way to profile a Python process that runs out of RAM on an Amazon EC2 Ubuntu instance

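I am running a Python process on an Amazon EC2 Ubuntu instance that processes large data files. At first everything is fine, and I do not see any sustained growth in RAM or CPU usage. Then, after part of the input data has been processed, the process runs out of memory and dies. dmesg -T produces the following, but it doesn't tell me anything: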

[Thu Jan  3 17:47:27 2013] python invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
[Thu Jan  3 17:47:27 2013] python cpuset=/ mems_allowed=0
[Thu Jan  3 17:47:27 2013] Pid: 1108, comm: python Not tainted 3.2.0-25-virtual #40-Ubuntu
[Thu Jan  3 17:47:27 2013] Call Trace:
[Thu Jan  3 17:47:27 2013]  [<ffffffff810bdb9d>] ? cpuset_print_task_mems_allowed+0x9d/0xb0
[Thu Jan  3 17:47:27 2013]  [<ffffffff81118231>] dump_header+0x91/0xe0
[Thu Jan  3 17:47:27 2013]  [<ffffffff811185b5>] oom_kill_process+0x85/0xb0
[Thu Jan  3 17:47:27 2013]  [<ffffffff8111895a>] out_of_memory+0xfa/0x220
[Thu Jan  3 17:47:27 2013]  [<ffffffff8111e38a>] __alloc_pages_nodemask+0x7ea/0x800
[Thu Jan  3 17:47:27 2013]  [<ffffffff810063dd>] ? pte_mfn_to_pfn+0x8d/0x110
[Thu Jan  3 17:47:27 2013]  [<ffffffff811569fa>] alloc_pages_vma+0x9a/0x150
[Thu Jan  3 17:47:27 2013]  [<ffffffff8113705c>] do_anonymous_page.isra.38+0x7c/0x2f0
[Thu Jan  3 17:47:27 2013]  [<ffffffff8113acc1>] handle_pte_fault+0x1e1/0x200
[Thu Jan  3 17:47:27 2013]  [<ffffffff8100647e>] ? xen_pmd_val+0xe/0x10
[Thu Jan  3 17:47:27 2013]  [<ffffffff810052d9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[Thu Jan  3 17:47:27 2013]  [<ffffffff8113b098>] handle_mm_fault+0x1f8/0x350
[Thu Jan  3 17:47:27 2013]  [<ffffffff81659f9b>] do_page_fault+0x14b/0x520
[Thu Jan  3 17:47:27 2013]  [<ffffffff811425fd>] ? mprotect_fixup+0x17d/0x2b0
[Thu Jan  3 17:47:27 2013]  [<ffffffff81142920>] ? sys_mprotect+0x1f0/0x250
[Thu Jan  3 17:47:27 2013]  [<ffffffff81656bf5>] page_fault+0x25/0x30
[Thu Jan  3 17:47:27 2013] Mem-Info:
[Thu Jan  3 17:47:27 2013] Node 0 DMA per-cpu:
[Thu Jan  3 17:47:27 2013] CPU    0: hi:    0, btch:   1 usd:   0
[Thu Jan  3 17:47:27 2013] Node 0 DMA32 per-cpu:
[Thu Jan  3 17:47:27 2013] CPU    0: hi:  186, btch:  31 usd:   0
[Thu Jan  3 17:47:27 2013] active_anon:142435 inactive_anon:14 isolated_anon:0
[Thu Jan  3 17:47:27 2013]  active_file:0 inactive_file:11 isolated_file:0
[Thu Jan  3 17:47:27 2013]  unevictable:0 dirty:0 writeback:0 unstable:0
[Thu Jan  3 17:47:27 2013]  free:1389 slab_reclaimable:1528 slab_unreclaimable:1686
[Thu Jan  3 17:47:27 2013]  mapped:2 shmem:45 pagetables:793 bounce:0
[Thu Jan  3 17:47:27 2013] Node 0 DMA free:2460kB min:72kB low:88kB high:108kB active_anon:12296kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:14524kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:8kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:16kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Thu Jan  3 17:47:27 2013] lowmem_reserve[]: 0 597 597 597
[Thu Jan  3 17:47:27 2013] Node 0 DMA32 free:3096kB min:3088kB low:3860kB high:4632kB active_anon:557444kB inactive_anon:56kB active_file:0kB inactive_file:44kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:611856kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:180kB slab_reclaimable:6104kB slab_unreclaimable:6744kB kernel_stack:1024kB pagetables:3156kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:27445 all_unreclaimable? yes
[Thu Jan  3 17:47:27 2013] lowmem_reserve[]: 0 0 0 0
[Thu Jan  3 17:47:27 2013] Node 0 DMA: 1*4kB 2*8kB 1*16kB 0*32kB 0*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2468kB
[Thu Jan  3 17:47:27 2013] Node 0 DMA32: 151*4kB 10*8kB 23*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 3100kB
[Thu Jan  3 17:47:27 2013] 55 total pagecache pages
[Thu Jan  3 17:47:27 2013] 0 pages in swap cache
[Thu Jan  3 17:47:27 2013] Swap cache stats: add 0, delete 0, find 0/0
[Thu Jan  3 17:47:27 2013] Free swap  = 0kB
[Thu Jan  3 17:47:27 2013] Total swap = 0kB
[Thu Jan  3 17:47:27 2013] 159472 pages RAM
[Thu Jan  3 17:47:27 2013] 8383 pages reserved
[Thu Jan  3 17:47:27 2013] 261 pages shared
[Thu Jan  3 17:47:27 2013] 149349 pages non-shared
[Thu Jan  3 17:47:27 2013] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[Thu Jan  3 17:47:27 2013] [  238]     0   238     4306       47   0       0             0 upstart-udev-br
[Thu Jan  3 17:47:27 2013] [  242]     0   242     5396      119   0     -17         -1000 udevd
[Thu Jan  3 17:47:27 2013] [  287]     0   287     5362       99   0     -17         -1000 udevd
[Thu Jan  3 17:47:27 2013] [  288]     0   288     5362       99   0     -17         -1000 udevd
[Thu Jan  3 17:47:27 2013] [  361]     0   361     3795       48   0       0             0 upstart-socket-
[Thu Jan  3 17:47:27 2013] [  419]     0   419     1814      123   0       0             0 dhclient3
[Thu Jan  3 17:47:27 2013] [  643]     0   643    12487      151   0     -17         -1000 sshd
[Thu Jan  3 17:47:27 2013] [  657]   101   657    63427      102   0       0             0 rsyslogd
[Thu Jan  3 17:47:27 2013] [  663]   102   663     5981       89   0       0             0 dbus-daemon
[Thu Jan  3 17:47:27 2013] [  725]     0   725     3624       42   0       0             0 getty
[Thu Jan  3 17:47:27 2013] [  732]     0   732     3624       41   0       0             0 getty
[Thu Jan  3 17:47:27 2013] [  741]     0   741     3624       42   0       0             0 getty
[Thu Jan  3 17:47:27 2013] [  743]     0   743     3624       41   0       0             0 getty
[Thu Jan  3 17:47:27 2013] [  747]     0   747     3624       41   0       0             0 getty
[Thu Jan  3 17:47:27 2013] [  755]     0   755     1080       37   0       0             0 acpid
[Thu Jan  3 17:47:27 2013] [  756]     0   756     4776       50   0       0             0 cron
[Thu Jan  3 17:47:27 2013] [  757]     0   757     4225       39   0       0             0 atd
[Thu Jan  3 17:47:27 2013] [  787]     0   787     3624       41   0       0             0 getty
[Thu Jan  3 17:47:27 2013] [  790]   103   790    46895      300   0       0             0 whoopsie
[Thu Jan  3 17:47:27 2013] [  797]     0   797    20467      216   0       0             0 sshd
[Thu Jan  3 17:47:27 2013] [  800]     0   800   146074      260   0       0             0 console-kit-dae
[Thu Jan  3 17:47:27 2013] [  867]     0   867    46645      154   0       0             0 polkitd
[Thu Jan  3 17:47:27 2013] [  983]  1000   983    20467      213   0       0             0 sshd
[Thu Jan  3 17:47:27 2013] [  984]  1000   984     6557     1766   0       0             0 bash
[Thu Jan  3 17:47:27 2013] [ 1108]  1000  1108   163815   138085   0       0             0 python
[Thu Jan  3 17:47:27 2013] Out of memory: Kill process 1108 (python) score 915 or sacrifice child
[Thu Jan  3 17:47:27 2013] Killed process 1108 (python) total-vm:655260kB, anon-rss:552336kB, file-rss:4kB

Is there a way to profile the process to find out what is happening and what causes the sudden spike in RAM usage? Thanks

Best answer

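I used Dowser to help track memory usage in one of my projects. It runs as a simple web interface and produces a lot of information that can help you find the problem.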

Dowser Blog giving an example.

Dowser Wiki
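
If installing Dowser is not an option, a similar question ("which code keeps allocating memory?") can be approached with the standard library. The sketch below is not Dowser and is not taken from the links above; it uses `tracemalloc`, which is only available in Python 3.4 and later. The names `process_chunk`, `top_growth`, and `retained` are hypothetical stand-ins for the real data-processing loop, and the chunk sizes are arbitrary.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose traced memory changed most between two snapshots."""
    # compare_to() returns StatisticDiff objects, largest change first.
    return after.compare_to(before, "lineno")[:limit]


def process_chunk(chunk, sink):
    # Hypothetical stand-in for the real per-chunk work. It deliberately
    # keeps every record alive in `sink` so the growth is visible.
    sink.extend(str(x) * 10 for x in chunk)


if __name__ == "__main__":
    tracemalloc.start()
    retained = []
    baseline = tracemalloc.take_snapshot()

    for i in range(5):
        process_chunk(range(i * 10000, (i + 1) * 10000), retained)
        # get_traced_memory() returns (current, peak) in bytes.
        current, peak = tracemalloc.get_traced_memory()
        print(f"chunk {i}: current={current / 1e6:.1f} MB, peak={peak / 1e6:.1f} MB")

    # Report which source lines account for the growth since the baseline.
    for stat in top_growth(baseline, tracemalloc.take_snapshot()):
        print(stat)

    tracemalloc.stop()
```

Printing the current and peak figures after each chunk shows roughly where in the input the usage jumps, and the snapshot comparison points at the file and line responsible. Note that `tracemalloc` only sees allocations made through Python's allocator after `start()` is called, so memory held by C extensions may not appear.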