


I hope this question is not considered too basic for this forum, but we'll see. I'm wondering how to refactor some code that gets run many times so that it performs better.


Say I'm creating a word frequency list, using a Map (probably a HashMap), where each key is a String with the word that's being counted and the value is an Integer that's incremented each time a token of the word is found.


In Perl, incrementing such a value would be trivially easy:

$map{$word}++;

But in Java, it's much more complicated. Here's the way I'm currently doing it:

int count = map.containsKey(word) ? map.get(word) : 0;
map.put(word, count + 1);


Which of course relies on the autoboxing feature in the newer Java versions. I wonder if you can suggest a more efficient way of incrementing such a value. Are there even good performance reasons for eschewing the Collections framework and using something else instead?


Update: I've done a test of several of the answers. See below.



Some test results

I've gotten a lot of good answers to this question--thanks folks--so I decided to run some tests and figure out which method is actually fastest. The five methods I tested are these:

  • the "ContainsKey" method that I presented in the question
  • the "TestForNull" method suggested by Aleksandar Dimitrov
  • the "AtomicLong" method suggested by Hank Gay
  • the "Trove" method suggested by jrudolph
  • the "MutableInt" method suggested by phax.myopenid.com


  1. created five classes that were identical except for the differences shown below. Each class had to perform an operation typical of the scenario I presented: opening a 10MB file and reading it in, then performing a frequency count of all the word tokens in the file. Since this took an average of only 3 seconds, I had it perform the frequency count (not the I/O) 10 times.
  2. timed the loop of 10 iterations but not the I/O operation and recorded the total time taken (in clock seconds) essentially using Ian Darwin's method in the Java Cookbook.
  3. performed all five tests in series, and then did this another three times.
  4. averaged the four results for each method.
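
The timing steps above can be sketched roughly as follows. This is only an illustration, not the actual test code: the class name `FrequencyTimer`, the method names, and the tiny in-memory token array are all made up, and it assumes wall-clock timing via `System.currentTimeMillis()`, which may or may not match the Java Cookbook approach exactly. The counting method shown is the ContainsKey variant; each of the other variants would swap in its own loop body.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical harness; names are illustrative, not from the original tests.
class FrequencyTimer {

    // Counts word tokens using the ContainsKey approach from the question.
    static Map<String, Integer> countWords(String[] tokens) {
        Map<String, Integer> freq = new HashMap<String, Integer>();
        for (String word : tokens) {
            int count = freq.containsKey(word) ? freq.get(word) : 0;
            freq.put(word, count + 1);
        }
        return freq;
    }

    // Runs the frequency count `iterations` times and returns the
    // elapsed wall-clock time in seconds. File I/O is deliberately
    // outside this method so that it is not timed.
    static double timeCounting(String[] tokens, int iterations) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            countWords(tokens);
        }
        return (System.currentTimeMillis() - start) / 1000.0;
    }

    public static void main(String[] args) {
        // Stand-in for the tokens that would be read from the 10MB file.
        String[] tokens = "the cat and the hat and the bat".split("\\s+");
        System.out.println("Elapsed seconds: " + timeCounting(tokens, 10));
    }
}
```

With an input this small the elapsed time will be close to zero; the point is only to show which part of the work sits inside the timed loop.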




I'll present the results first and the code below for those who are interested.

The ContainsKey method was, as expected, the slowest, so I'll give the speed of each method in comparison to the speed of that method.

  • ContainsKey: 30.654 seconds (baseline)
  • AtomicLong: 29.780 seconds (1.03 times as fast)
  • TestForNull: 28.804 seconds (1.06 times as fast)
  • Trove: 26.313 seconds (1.16 times as fast)
  • MutableInt: 25.747 seconds (1.19 times as fast)

It would appear that only the MutableInt method and the Trove method are significantly faster, in that only they give a performance boost of more than 10%. However, if threading is an issue, AtomicLong might be more attractive than the others (I'm not really sure). I also ran TestForNull with final variables, but the difference was negligible.
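
To make the threading remark a bit more concrete, here is a minimal sketch of how the AtomicLong approach could be shared between threads. The wrapper class `ConcurrentWordCount` and its `add`/`get` methods are hypothetical names introduced for illustration, and this variant was not part of the timed tests. It only allocates a new `AtomicLong` when the word is not yet in the map.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper class; not part of the benchmarked code.
class ConcurrentWordCount {
    private final ConcurrentMap<String, AtomicLong> freq =
        new ConcurrentHashMap<String, AtomicLong>();

    // Safe to call from several threads at once.
    void add(String word) {
        AtomicLong counter = freq.get(word);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong(0);
            // putIfAbsent returns the existing counter if another
            // thread inserted one first, or null if ours went in.
            counter = freq.putIfAbsent(word, fresh);
            if (counter == null) {
                counter = fresh;
            }
        }
        counter.incrementAndGet();
    }

    // Returns the current count, or 0 for a word never seen.
    long get(String word) {
        AtomicLong counter = freq.get(word);
        return counter == null ? 0L : counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        final ConcurrentWordCount counts = new ConcurrentWordCount();
        Runnable task = new Runnable() {
            public void run() {
                for (int i = 0; i < 1000; i++) {
                    counts.add("word");
                }
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(counts.get("word")); // prints 2000
    }
}
```

A plain HashMap with Integer or MutableInt values would need external synchronization to give the same guarantee, which is the trade-off to weigh against AtomicLong's small speed gain in the single-threaded test.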


Note that I haven't profiled memory usage in the different scenarios. I'd be happy to hear from anybody who has good insights into how the MutableInt and Trove methods would be likely to affect memory usage.


Personally, I find the MutableInt method the most attractive, since it doesn't require loading any third-party classes. So unless I discover problems with it, that's the way I'm most likely to go.

ContainsKey

import java.util.HashMap;
import java.util.Map;
Map<String, Integer> freq = new HashMap<String, Integer>();
// For each word: two lookups (containsKey, then get) plus one put.
int count = freq.containsKey(word) ? freq.get(word) : 0;
freq.put(word, count + 1);



TestForNull

import java.util.HashMap;
import java.util.Map;
Map<String, Integer> freq = new HashMap<String, Integer>();
// For each word: a single get, then a null test instead of containsKey.
Integer count = freq.get(word);
if (count == null) {
    freq.put(word, 1);
} else {
    freq.put(word, count + 1);
}



AtomicLong

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
final ConcurrentMap<String, AtomicLong> map =
    new ConcurrentHashMap<String, AtomicLong>();
// For each word: make sure a counter exists, then increment it in place.
map.putIfAbsent(word, new AtomicLong(0));
map.get(word).incrementAndGet();



Trove

import gnu.trove.TObjectIntHashMap;
TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
// For each word: add 1 to the existing count, or insert 1 if the word is new.
freq.adjustOrPutValue(word, 1, 1);



MutableInt

import java.util.HashMap;
import java.util.Map;
class MutableInt {
  int value = 1; // note that we start at 1 since we're counting
  public void increment () { ++value;      }
  public int  get ()       { return value; }
}
Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
// For each word: a single get; existing counters are incremented in place.
MutableInt count = freq.get(word);
if (count == null) {
    freq.put(word, new MutableInt());
} else {
    count.increment();
}
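
For anyone who wants to try the MutableInt approach outside the test harness, here is a self-contained sketch. The wrapper class `MutableIntDemo`, the `countWords` helper, and the sample sentence are illustrative additions and not part of the benchmarked code; the counting logic itself follows the MutableInt snippet.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class wrapping the MutableInt technique.
class MutableIntDemo {

    // Mutable counter; starts at 1 because it is only created
    // when a word is seen for the first time.
    static class MutableInt {
        int value = 1;
        void increment() { ++value; }
        int get() { return value; }
    }

    static Map<String, MutableInt> countWords(String[] tokens) {
        Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
        for (String word : tokens) {
            MutableInt count = freq.get(word);
            if (count == null) {
                // First occurrence: one put, no later puts for this word.
                freq.put(word, new MutableInt());
            } else {
                // Later occurrences: mutate in place, no boxing or put.
                count.increment();
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        String[] tokens = "to be or not to be".split(" ");
        Map<String, MutableInt> freq = countWords(tokens);
        System.out.println("to = " + freq.get("to").get()); // prints "to = 2"
        System.out.println("or = " + freq.get("or").get()); // prints "or = 1"
    }
}
```

The likely reason this does well is that each word costs one hash lookup after its first occurrence, and no new Integer objects are created for increments.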

