Problem description

My application is multithreaded with intensive String processing. We are experiencing excessive memory consumption and profiling has demonstrated that this is due to String data. I think that memory consumption would benefit greatly from using some kind of flyweight pattern implementation or even cache (I know for sure that Strings are often duplicated, although I don't have any hard data in that regard).

I have looked at Java Constant Pool and String.intern, but it seems that it can provoke some PermGen problems.

What would be the best alternative for implementing an application-wide, multithreaded pool of Strings in Java?

Also see my previous, related question: How does java implement flyweight pattern for string under the hood?

Recommended answer

Note: This answer uses examples that may no longer apply to modern JVM runtime libraries. In particular, the substring example is no longer an issue in OpenJDK/Oracle 7+.

I know it goes against what people often tell you, but sometimes explicitly creating new String instances can be a significant way to reduce your memory usage.

Because Strings are immutable, several methods leverage that fact and share the backing character array to save memory. However, this can occasionally increase memory usage instead, because it prevents garbage collection of the unused parts of those arrays.

For example, assume you were parsing the message IDs of a log file to extract warning IDs. Your code would look something like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Format:
// ID: [WARNING|ERROR|DEBUG] Message...
String testLine = "5AB729: WARNING Some really really really long message";

// Group 1 captures the message ID that precedes ": WARNING"
Matcher matcher = Pattern.compile("([A-Z0-9]*): WARNING.*").matcher(testLine);
if (matcher.matches()) {
    String id = matcher.group(1);
    // ...do something with id...
}

But look at the data actually being stored:

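    //...
    String id = matcher.group(1);
    // Read String's private backing array via reflection (needs java.lang.reflect.Field
    // and java.util.Arrays). On Java 9+ the field is a byte[], so the cast below fails.
    Field valueField = String.class.getDeclaredField("value");
    valueField.setAccessible(true);

    char[] data = (char[]) valueField.get(id);
    System.out.println("Actual data stored for string \"" + id + "\": " + Arrays.toString(data));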

The output is the whole test line, because the matcher just wraps a new String instance around the same character data. Compare the result when you replace String id = matcher.group(1); with String id = new String(matcher.group(1));.
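
For the application-wide, multithreaded pool the question asks about, one possible approach is a ConcurrentHashMap-backed pool that stores a fresh copy of each distinct value. The following is only a minimal sketch and is not part of the original answer: the class name StringPool and the methods canonical and size are hypothetical. It holds strong references, so entries are never evicted; a real application would likely need a bounded or weakly referenced variant.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical application-wide String pool (illustrative sketch only). */
final class StringPool {
    // Maps each distinct value to its single canonical instance.
    private static final ConcurrentMap<String, String> POOL = new ConcurrentHashMap<>();

    private StringPool() {}

    /** Returns the pooled instance equal to s, adding a copy if none exists yet. */
    static String canonical(String s) {
        if (s == null) {
            return null;
        }
        String existing = POOL.get(s);
        if (existing != null) {
            return existing;
        }
        // Copy before pooling so that, on older JVMs, the pooled instance
        // does not pin a larger shared backing array.
        String copy = new String(s);
        // putIfAbsent is atomic: if another thread won the race, use its instance.
        String raced = POOL.putIfAbsent(copy, copy);
        return raced != null ? raced : copy;
    }

    /** Number of distinct values currently pooled. */
    static int size() {
        return POOL.size();
    }
}
```

With this sketch, the example above would become String id = StringPool.canonical(matcher.group(1)); and every thread that parses the same ID would then share a single instance.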
