Problem Description
As always, a lengthy problem description.
We are currently stress testing our product, and now we face a strange problem. After one to two hours, heap usage begins to grow, and the application dies some time later.
Profiling the application shows a very large number of Finalizer objects filling the heap. Well, we thought it might be the well-known "finalizer thread too slow" issue and reviewed the code to reduce the number of objects that need to be finalized (JNA native handles in this case). A good idea anyway, and it eliminated thousands of new objects...
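One way to reduce the number of finalizable objects is to release native memory explicitly instead of leaving it to finalize(). A minimal sketch of that idea follows; it uses JNA's raw Native.malloc/Native.free only as a stand-in, since our real handles are more involved:

```java
import com.sun.jna.Native;

// Sketch: wrap a raw native allocation in AutoCloseable so it is freed
// deterministically via try-with-resources instead of via finalize().
// Native.malloc/Native.free are stand-ins for our real JNA handles.
final class NativeBuffer implements AutoCloseable {

    private final long peer;   // raw native address
    private boolean freed;

    NativeBuffer(long size) {
        this.peer = Native.malloc(size);
    }

    long address() {
        return peer;
    }

    @Override
    public void close() {
        if (!freed) {
            freed = true;
            Native.free(peer);   // released immediately, nothing left for the finalizer thread
        }
    }

    public static void main(String[] args) {
        // The memory is released as soon as the block exits, so it never
        // queues up behind the single finalizer thread.
        try (NativeBuffer buf = new NativeBuffer(1024)) {
            System.out.println("native address: " + buf.address());
        }
    }
}
```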
The next tests showed the same pattern, only one hour later and not so steep. This time the Finalizers originated from the FileInput- and FileOutput streams that are heavily used in the testbed. All resources are closed, but the Finalizers are no longer cleaned up.
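For illustration, this is the general shape of the stream handling. Note that FileInputStream/FileOutputStream override finalize(), so a Finalizer entry is registered for every stream at construction time even when it is closed promptly; the NIO variant (Java 7+) is our assumption of how to sidestep that, since the channel-backed streams do not appear to override finalize():

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamUsage {

    // Classic streams: closed reliably via try-with-resources, but each
    // FileInputStream/FileOutputStream still registers a finalizer when it
    // is constructed, because those classes override finalize().
    static void copyClassic(Path src, Path dst) throws IOException {
        try (InputStream in = new FileInputStream(src.toFile());
             OutputStream out = new FileOutputStream(dst.toFile())) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    // NIO alternative (Java 7+): the channel-backed streams returned by
    // Files do not appear to override finalize(), so they add no work for
    // the finalizer thread. (Our assumption, not verified on every JDK.)
    static void copyNio(Path src, Path dst) throws IOException {
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```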
I have no idea why, after 1 or 2 hours (with no exceptions), the FinalizerThread suddenly seems to stop working. If we force System.runFinalization() by hand from one of our threads, the profiler shows that the finalizers are cleaned up. Resuming the test immediately causes new heap allocations for Finalizers.
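Roughly, the manual workaround looks like the sketch below: watch the pending-finalization count exposed by the MemoryMXBean and force runFinalization() above some threshold. The threshold and interval are made-up example values, and this only papers over the problem:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Sketch of the manual workaround: periodically check how many objects are
// waiting for finalization and, above an arbitrary threshold, run the
// finalizers from our own thread. Threshold and interval are example values.
public class FinalizationWatchdog implements Runnable {

    private static final int THRESHOLD = 10000;      // example value
    private static final long INTERVAL_MS = 30000;   // example value

    @Override
    public void run() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (!Thread.currentThread().isInterrupted()) {
            int pending = memory.getObjectPendingFinalizationCount();
            if (pending > THRESHOLD) {
                System.out.println("pending finalizations: " + pending
                        + " - forcing System.runFinalization()");
                System.runFinalization();
            }
            try {
                Thread.sleep(INTERVAL_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Started from the server's startup code, e.g.:
    // Thread t = new Thread(new FinalizationWatchdog(), "finalization-watchdog");
    // t.setDaemon(true);
    // t.start();
}
```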
The FinalizerThread is still there; according to jConsole it is WAITING.
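The same information jConsole shows can also be dumped from inside the VM; a small self-contained check that looks the thread up by name (the name "Finalizer" is what HotSpot uses, other VMs may differ):

```java
import java.util.Map;

// Print the state and current stack of the JVM's finalizer thread,
// mirroring what jConsole shows. The thread name "Finalizer" is the one
// HotSpot uses; other VMs may name it differently.
public class FinalizerThreadDump {

    public static void main(String[] args) {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread t = entry.getKey();
            if ("Finalizer".equals(t.getName())) {
                System.out.println("Finalizer thread state: " + t.getState());
                for (StackTraceElement frame : entry.getValue()) {
                    System.out.println("    at " + frame);
                }
            }
        }
    }
}
```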
Edit
First, inspecting the heap with HeapAnalyzer revealed nothing new/strange. HeapAnalyzer has some nice features, but I had my difficulties with it at first. I am using jProfiler, which comes with nice heap inspection tools, and will stay with it.
Maybe I'm missing some killer features in HeapAnalyzer?
Second, today we set up the tests with a debug connection instead of the profiler - the system has been stable for nearly 5 hours now. This seems to be a very strange combination of too many Finalizers (which were already reduced in the first review), the profiler, and the VM's GC strategy. As everything runs fine at the moment, no real insights...
Thanks for the input so far - maybe you'll stay tuned and interested (now that you have more reason to believe we are not talking about a simple programming fault).
Recommended Answer
I want to close this question with a summary of the current state.
The last test has now been running for over 60 hours without any problems. That leads us to the following summary/conclusions:
- We have a high-throughput server using lots of objects that ultimately implement "finalize". These objects are mostly JNA memory handles and file streams. Finalizers were being created faster than the GC and the finalizer thread could clean them up, and after ~3 hours this process fails. This is a well-known phenomenon (-> google); a minimal demonstration follows after this list.
- We did some optimizations so the server got rid of nearly all the JNA Finalizers. This version was tested with jProfiler attached.
- The server still died, just some hours later than in our initial attempt...
- The profiler showed a huge number of finalizers, this time caused mostly by file streams. This queue was not cleaned up, even after pausing the server for some time.
- Only after manually triggering "System.runFinalization()" was the queue emptied. Resuming the server started to refill it...
- This is still inexplicable. Our current guess is some interaction between the profiler and GC/finalization.
- To debug what could be the reason for the inactive finalizer thread, we detached the profiler and attached the debugger this time.
- The system was running without noticeable defect... FinalizerThread and GC all "green".
- We resumed the test (now for the first time again without any agents besides jConsole attached) and it's been up and fine for over 60 hours now. So apparently the initial JNA refactoring solved the issue; only the profiling session added some indeterminism (greetings from Heisenberg).
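As mentioned in the first bullet, the backlog effect itself is easy to reproduce in isolation: one slow finalize() throttles the single finalizer thread while allocation keeps feeding the queue. A minimal, deliberately exaggerated demonstration:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Minimal reproduction of the "finalizer thread too slow" effect: objects
// become garbage faster than the single finalizer thread can process them,
// so the pending-finalization count (and with it the heap) keeps growing.
public class FinalizerBacklogDemo {

    static class SlowlyFinalized {
        @Override
        protected void finalize() throws Throwable {
            try {
                Thread.sleep(1);   // exaggerated "expensive" cleanup
            } finally {
                super.finalize();
            }
        }
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        for (int i = 1; i <= 1000000; i++) {
            new SlowlyFinalized();       // becomes garbage immediately
            if (i % 100000 == 0) {
                System.gc();             // encourage collection so entries reach the queue
                System.out.println("allocated " + i + ", pending finalization: "
                        + memory.getObjectPendingFinalizationCount());
            }
        }
    }
}
```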
Other strategies for managing finalizers are discussed, for example, in http://cleversoft.wordpress.com/2011/05/14/out-of-memory-exception-from-finalizer-object-overflow/ (besides the not overly clever "don't use finalizers"..). A reference-queue-based sketch of one such alternative is shown below.
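One of the alternatives along those lines is reference-queue-based cleanup, where the application drains a ReferenceQueue with its own thread instead of depending on the finalizer thread. A minimal sketch; releaseNative() is a hypothetical stand-in for the real cleanup call:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of finalizer-free cleanup: one PhantomReference per resource,
// drained from a ReferenceQueue by a thread the application controls.
// releaseNative() is a hypothetical stand-in for the real native cleanup.
public class PhantomCleanup {

    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<Object>();
    // Keep the references themselves strongly reachable until processed.
    private static final Set<HandleReference> PENDING =
            Collections.newSetFromMap(new ConcurrentHashMap<HandleReference, Boolean>());

    static final class HandleReference extends PhantomReference<Object> {
        final long nativeHandle;   // cleanup data copied out of the referent

        HandleReference(Object referent, long nativeHandle) {
            super(referent, QUEUE);
            this.nativeHandle = nativeHandle;
        }
    }

    // Call this when a resource is created.
    static void register(Object resource, long nativeHandle) {
        PENDING.add(new HandleReference(resource, nativeHandle));
    }

    // Start once at application startup.
    static void startCleanerThread() {
        Thread cleaner = new Thread(new Runnable() {
            public void run() {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        HandleReference ref = (HandleReference) QUEUE.remove(); // blocks
                        PENDING.remove(ref);
                        releaseNative(ref.nativeHandle);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }, "resource-cleaner");
        cleaner.setDaemon(true);
        cleaner.start();
    }

    private static void releaseNative(long handle) {
        // hypothetical: free the native resource identified by 'handle'
    }
}
```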
Thanks for all your input.