问题描述
我们调整我们的GC以达到最小的停止世界停顿。烫发和寿命长的一代表现得很好。年轻人大部分时间都能正常工作,暂停时间通常不会超过500毫秒(注意[Times:user = 0.35 sys = 0.02,real = 0.06 secs]): {Heap before GC invocations = 11603(full 60):
par new generation total 3640320K,used 3325226K [0x0000000600000000,0x00000006f6e00000,0x00000006f6e00000)
eden space 3235840K,100使用%[0x0000000600000000,0x00000006c5800000,0x00000006c5800000)
从空间404480K,22%使用[0x00000006de300000,0x00000006e3a4a898,0x00000006f6e00000)
到空间404480K,使用0%[0x00000006c5800000,0x00000006c5800000,0x00000006de300000)
并发标记扫描生成总计4147200K,使用1000363K [0x00000006f6e00000,0x00000007f4000000,0x00000007f4000000)
concurrent-mark-sweep生成电容总计196608K,使用133030K [0x00000007f4000000,0x0000000800000000,0x0000000800000000)
2012-07-16T14:36 :05.641 + 0200:427048.412:[GC 427048.412:[ParNew
希望的幸存者大小207093760字节,新的阈值4(最多4)
- 年龄1:8688880字节,8688880总计
- 年龄2:12432832字节,21121712总计
- 年龄3:18200488字节,39322200总计
- 年龄4:20473336字节,59795536共
:3325226K-> 75635K(3640320K),0.0559610秒] 4325590K-> 1092271K(7787520K),0.0562630秒] [时间:用户= 0.35 sys = 0.02,实时= 0.06秒]
GC调用后的堆= 11604(满60):
par新一代总计3640320K,使用75635K [0x0000000600000000,0x00000006f6e00000,0x00000006f6e00000)
eden空间3235840K,使用0%[0x0000000600000000,0x0000000600000000 ,0x00000006c5800000)
来自空间404480K,使用了18%[0x00000006c5800000,0x00000006ca1dcf40,0x00000006de300000)
到空间404480K,使用0%[0x00000006de300000,0x00000006de300000,0x00000006f6e00000)
并发标记扫描生成总计4147200K ,使用1016635K [0x00000006f6e00000,0x00000007f4000000,0x00000007f4000000)
并发标记扫描perm生成总计19660 8K,使用133030K [0x00000007f4000000,0x0000000800000000,0x0000000800000000)
}
然而,一天一次,我们得到非常长的年轻垃圾收集时间(注意[Times:user = 0.41 sys = 0.01,real = 5.51 secs]):
{Heap before GC invocations = 7884(full 37):
par new generation total 3640320K,used 3304448K [0x0000000600000000,0x00000006f6e00000,0x00000006f6e00000)
eden space 3235840K,100%used [0x0000000600000000, 0x00000006c5800000,0x00000006c5800000)
来自空间404480K,16%使用[0x00000006c5800000,0x00000006c9b00370,0x00000006de300000)
到空间404480K,使用0%[0x00000006de300000,0x00000006de300000,0x00000006f6e00000)
并发标记扫描生成总数4147200K,使用1967225K [0x00000006f6e00000,0x00000007f4000000,0x00000007f4000000)
并发标记扫描转发器总计189100K,已使用112825K [0x00000007f4000000,0x00000007ff8ab000,0x000000080 0000000)
2012-07-15T01:23:25.474 + 0200:293088.245:[GC293093.636:[ParNew
希望的幸存者大小207093760字节,新的阈值4(最大值4)
- 年龄1 :30210472 bytes,30210472 total
- age 2:11614600 bytes,41825072 total
- age 3:8591680 bytes,50416752 total
- age 4:7779600 bytes,58196352 total
:在GC调用之后的堆= 7885(时间:用户= 0.41sys = 0.01,实际= 5.51秒)
在GC调用之后的堆= 7885(全部37):
par新一代总计3640320K,使用73854K [0x0000000600000000,0x00000006f6e00000,0x00000006f6e00000)
eden space 3235840K,0%used [0x0000000600000000,0x0000000600000000,0x00000006c5800000)
from space 404480K,18使用%[0x00000006de300000,0x00000006e2b1fb40,0x00000006f6e00000)
到空间404480K,使用0%[0x00000006c5800000,0x00000006c5800000,0x00000006de300000)
并发标记扫描生成tot
使用1972599K [0x00000006f6e00000,0x00000007f4000000,0x00000007f4000000)
concurrent-mark-sweep perm gen total 189100K,used 112825K [0x00000007f4000000,0x00000007ff8ab000,0x0000000800000000)
}
以下是jstat -gccause的相关输出:
时间戳S0 S1 EOP YGC YGCT FGC FGCT GCT LGCC GCC
293083.2 16.96 0.00 96.26 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause否GC
293083.2 16.96 0.00 96.26 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause否GC
293084.2 16.96 0.00 97.69 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause否GC
293085.3 16.96 0.00 98.32 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause否GC
293086.3 16.96 0.00 99.32 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause否GC
时间戳S0 S1 EOP YGC YGCT FGC FGCT GCT LGCC GCC
293087.3 16.96 0.00 99.55 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause否GC
293088.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause分配失败
293089.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause分配失败
293090.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause分配失败
293091.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause分配失败
293092.3 16.96 0.00 100.00 47.44 59.66 788 4 772.326 74 10.893 783.219未知GCCause分配失败
293093.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause分配失败
293084.2 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219未知GCCause分配失败
293094.3 0.00 18.26 6.23 47.56 59.66 7885 772.442 74 10.893 783.334未知GCCause无GC
293094.6 0.00 18.26 6.71 47.56 59.66 7885 772.442 74 10.893 783.334未知GCCause无GC
293095.3 0.00 18.26 6.85 47.56 59.66 7885 772.442 74 10.893 783.334未知GCCause否GC
293095.6 0.00 18.26 6.92 47.56 59.66 7885 772.442 74 10.893 783.334未知GCCause否GC
293096.2 0.00 18.26 9.84 47.56 59.66 7885 772.442 74 10.893 783.334未知GCCause否GC
293096.6 0.00 18.26 10.11 47.56 59.66 7885 772.442 74 10.893 783.334未知GCCause否GC
Allocation失败在其他地方也作为GC原因出现,但始终作为单一条目出现。当它像这样的序列出现时,它与长时间的GC暂停相关联。我研究了Oracle JVM来源,并且分配失败看起来像是一种非常自然的情况:Young没有足够的空间存储新数据,现在是时候清理了。在暂停发生之前,我在系统中检查了任何内存密集,意外的行为,但没有发现任何可疑物。
请注意,年轻垃圾收集时间不会增加在暂停时间很多。我的记忆和GC设置如下(记录设置省略): -Xms6000m
-Xmn2950m
-Xmx6000m
-XX:MaxPermSize = 192m
-XX:+ UseConcMarkSweepGC
-XX:+ UseParNewGC
-Dsun.rmi.dgc.client.gcInterval = 9223372036854775807
-Dsun.rmi.dgc.server.gcInterval = 9223372036854775807
-XX:CMSInitiatingOccupancyFraction = 50
-XX:+ CMSScavengeBeforeRemark
-XX:+ CMSClassUnloadingEnabled
还测试了8000米和12000米的堆。机器:
- 8核16GB内存
- 24核50GB内存
所以我最基本的问题是:为什么ParNewGC偶尔会以这种方式表现?
在执行GC之前,必须让每个线程都安全点(它不会立即停止每个线程)。如果您长时间运行JNI调用或系统调用,则可能需要很长时间才能达到安全点。在这种情况下你看到的是一个很长的停顿,即使这个集合本身的时间长度正常。
We tune our GC for minimum "stop-the-world" pauses. Perm and Tenured generations behave well. Young works fine most of the time, and the pauses usually don't exceed 500ms (note [Times: user=0.35 sys=0.02, real=0.06 secs]):
{Heap before GC invocations=11603 (full 60):
par new generation total 3640320K, used 3325226K [0x0000000600000000, 0x00000006f6e00000, 0x00000006f6e00000)
eden space 3235840K, 100% used [0x0000000600000000, 0x00000006c5800000, 0x00000006c5800000)
from space 404480K, 22% used [0x00000006de300000, 0x00000006e3a4a898, 0x00000006f6e00000)
to space 404480K, 0% used [0x00000006c5800000, 0x00000006c5800000, 0x00000006de300000)
concurrent mark-sweep generation total 4147200K, used 1000363K [0x00000006f6e00000, 0x00000007f4000000, 0x00000007f4000000)
concurrent-mark-sweep perm gen total 196608K, used 133030K [0x00000007f4000000, 0x0000000800000000, 0x0000000800000000)
2012-07-16T14:36:05.641+0200: 427048.412: [GC 427048.412: [ParNew
Desired survivor size 207093760 bytes, new threshold 4 (max 4)
- age 1: 8688880 bytes, 8688880 total
- age 2: 12432832 bytes, 21121712 total
- age 3: 18200488 bytes, 39322200 total
- age 4: 20473336 bytes, 59795536 total
: 3325226K->75635K(3640320K), 0.0559610 secs] 4325590K->1092271K(7787520K), 0.0562630 secs] [Times: user=0.35 sys=0.02, real=0.06 secs]
Heap after GC invocations=11604 (full 60):
par new generation total 3640320K, used 75635K [0x0000000600000000, 0x00000006f6e00000, 0x00000006f6e00000)
eden space 3235840K, 0% used [0x0000000600000000, 0x0000000600000000, 0x00000006c5800000)
from space 404480K, 18% used [0x00000006c5800000, 0x00000006ca1dcf40, 0x00000006de300000)
to space 404480K, 0% used [0x00000006de300000, 0x00000006de300000, 0x00000006f6e00000)
concurrent mark-sweep generation total 4147200K, used 1016635K [0x00000006f6e00000, 0x00000007f4000000, 0x00000007f4000000)
concurrent-mark-sweep perm gen total 196608K, used 133030K [0x00000007f4000000, 0x0000000800000000, 0x0000000800000000)
}
However, sometimes, typically once a day, we get exceptionally long Young garbage collection time (note [Times: user=0.41 sys=0.01, real=5.51 secs]):
{Heap before GC invocations=7884 (full 37):
par new generation total 3640320K, used 3304448K [0x0000000600000000, 0x00000006f6e00000, 0x00000006f6e00000)
eden space 3235840K, 100% used [0x0000000600000000, 0x00000006c5800000, 0x00000006c5800000)
from space 404480K, 16% used [0x00000006c5800000, 0x00000006c9b00370, 0x00000006de300000)
to space 404480K, 0% used [0x00000006de300000, 0x00000006de300000, 0x00000006f6e00000)
concurrent mark-sweep generation total 4147200K, used 1967225K [0x00000006f6e00000, 0x00000007f4000000, 0x00000007f4000000)
concurrent-mark-sweep perm gen total 189100K, used 112825K [0x00000007f4000000, 0x00000007ff8ab000, 0x0000000800000000)
2012-07-15T01:23:25.474+0200: 293088.245: [GC 293093.636: [ParNew
Desired survivor size 207093760 bytes, new threshold 4 (max 4)
- age 1: 30210472 bytes, 30210472 total
- age 2: 11614600 bytes, 41825072 total
- age 3: 8591680 bytes, 50416752 total
- age 4: 7779600 bytes, 58196352 total
: 3304448K->73854K(3640320K), 0.1158280 secs] 5271674K->2046454K(7787520K), 0.1181990 secs] [Times: user=0.41 sys=0.01, real=5.51 secs]
Heap after GC invocations=7885 (full 37):
par new generation total 3640320K, used 73854K [0x0000000600000000, 0x00000006f6e00000, 0x00000006f6e00000)
eden space 3235840K, 0% used [0x0000000600000000, 0x0000000600000000, 0x00000006c5800000)
from space 404480K, 18% used [0x00000006de300000, 0x00000006e2b1fb40, 0x00000006f6e00000)
to space 404480K, 0% used [0x00000006c5800000, 0x00000006c5800000, 0x00000006de300000)
concurrent mark-sweep generation total 4147200K, used 1972599K [0x00000006f6e00000, 0x00000007f4000000, 0x00000007f4000000)
concurrent-mark-sweep perm gen total 189100K, used 112825K [0x00000007f4000000, 0x00000007ff8ab000, 0x0000000800000000)
}
Below is the relevant output from jstat -gccause:
Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC
293083.2 16.96 0.00 96.26 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause No GC
293083.2 16.96 0.00 96.26 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause No GC
293084.2 16.96 0.00 97.69 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause No GC
293085.3 16.96 0.00 98.32 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause No GC
293086.3 16.96 0.00 99.32 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause No GC
Timestamp S0 S1 E O P YGC YGCT FGC FGCT GCT LGCC GCC
293087.3 16.96 0.00 99.55 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause No GC
293088.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause Allocation Failure
293089.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause Allocation Failure
293090.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause Allocation Failure
293091.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause Allocation Failure
293092.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause Allocation Failure
293093.3 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause Allocation Failure
293084.2 16.96 0.00 100.00 47.44 59.66 7884 772.326 74 10.893 783.219 unknown GCCause Allocation Failure
293094.3 0.00 18.26 6.23 47.56 59.66 7885 772.442 74 10.893 783.334 unknown GCCause No GC
293094.6 0.00 18.26 6.71 47.56 59.66 7885 772.442 74 10.893 783.334 unknown GCCause No GC
293095.3 0.00 18.26 6.85 47.56 59.66 7885 772.442 74 10.893 783.334 unknown GCCause No GC
293095.6 0.00 18.26 6.92 47.56 59.66 7885 772.442 74 10.893 783.334 unknown GCCause No GC
293096.2 0.00 18.26 9.84 47.56 59.66 7885 772.442 74 10.893 783.334 unknown GCCause No GC
293096.6 0.00 18.26 10.11 47.56 59.66 7885 772.442 74 10.893 783.334 unknown GCCause No GC
"Allocation Failure" appears as the GC cause in other places as well, but always as a single entry. When it comes in the sequence like this, it is associated with a long GC pause. I looked into Oracle JVM sources, and "Allocation Failure" looks like a pretty natural situation: Young is out of space for a new data and it's time to clean up. I checked for any memory intensive, unexpected actions in the system before the pause happened, but found nothing suspicious.
Please note, that the Young Garbage Collection Time doesn't rise much during the pause time. My memory and GC setting are as following (logging settings omitted):
-Xms6000m
-Xmn2950m
-Xmx6000m
-XX:MaxPermSize=192m
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-Dsun.rmi.dgc.client.gcInterval=9223372036854775807
-Dsun.rmi.dgc.server.gcInterval=9223372036854775807
-XX:CMSInitiatingOccupancyFraction=50
-XX:+CMSScavengeBeforeRemark
-XX:+CMSClassUnloadingEnabled
Tested also with 8000m and 12000m heaps. Machines:
- 8-core with 16GB of memory
- 24-core with 50GB of memory
So my basic question is: why ParNewGC incidentally behaves this way?
Before a GC can be performed, it has to get every thread to a safe point (it doesn't just stop every thread immediately). If you have long running JNI calls or system calls, it can take a long time to reach a safe point. What you see in this situation is a long pause even though the collection itself was of a normal length in time.
这篇关于偶然的年轻垃圾收集暂停的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!