Problem Description
Not a real question, because I already found out the answer, but it's still an interesting thing.
I always thought a hash table was the fastest associative container, as long as you hash properly.
However, the following code is terribly slow. It performs only about 1 million iterations, yet takes more than 2 minutes on a Core 2 CPU.
The code does the following: it maintains a collection todo of items it needs to process. At each iteration it takes an item from this collection (it doesn't matter which item), deletes it, processes it if it hasn't been processed yet (possibly adding more items to process), and repeats until there are no items left to process.
The culprit seems to be the Dictionary.Keys.First() operation.
The question is: why is it so slow?
Stopwatch watch = new Stopwatch();
watch.Start();
HashSet<int> processed = new HashSet<int>();
Dictionary<int, int> todo = new Dictionary<int, int>();
todo.Add(1, 1);
int iterations = 0;
int limit = 500000;
while (todo.Count > 0)
{
    iterations++;
    var key = todo.Keys.First();
    var value = todo[key];
    todo.Remove(key);
    if (!processed.Contains(key))
    {
        processed.Add(key);
        // process item here
        if (key < limit) { todo[key + 13] = value + 1; todo[key + 7] = value + 1; }
        // doesn't matter much how
    }
}
Console.WriteLine("Iterations: {0}; Time: {1}.", iterations, watch.Elapsed);
This results in:
Iterations: 923007; Time: 00:02:09.8414388.
Simply changing Dictionary to SortedDictionary yields:
Iterations: 499976; Time: 00:00:00.4451514.
That is about 300 times faster, while performing only about half as many iterations.
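For reference, a minimal sketch of that change (only the declaration of todo differs; the loop body above stays the same):

// SortedDictionary<TKey, TValue> is a balanced search tree, so Keys.First()
// returns the smallest key without scanning an array of buckets.
SortedDictionary<int, int> todo = new SortedDictionary<int, int>();
todo.Add(1, 1);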
The same happens in Java, using HashMap instead of Dictionary and keySet().iterator().next() instead of Keys.First().
Recommended Answer
Dictionary<TKey, TValue> maintains a hash table.
Its enumerator loops through the buckets in the hash table until it finds a non-empty bucket, then returns the value in that bucket. Once the dictionary grows large, this operation becomes expensive. In addition, removing an item from the dictionary doesn't shrink the buckets array, so the First() call gets slower as you remove items (because it has to loop further to find a non-empty bucket).
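To make that scan concrete, here is a simplified sketch of what "get the first item" amounts to inside a hash table. This is only an illustration of the linear walk, not the actual Dictionary<TKey, TValue> source; the Entry struct and FirstLive method are made up for the example.

using System;

// Simplified model: removed or never-used slots stay in the array,
// and "first" means scanning from index 0 until a live slot is found.
struct Entry
{
    public bool HasValue; // false for empty or removed slots
    public int Key;
    public int Value;
}

static class FirstScanDemo
{
    // The more dead slots precede the first live one, the longer this loop runs;
    // removing items makes the dead prefix grow, and it never shrinks.
    public static Entry FirstLive(Entry[] entries)
    {
        for (int i = 0; i < entries.Length; i++)
        {
            if (entries[i].HasValue)
                return entries[i];
        }
        throw new InvalidOperationException("The collection is empty.");
    }
}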
Therefore, repeatedly calling First() and removing is O(n²).
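A rough way to observe the effect (an assumed test harness, not part of the original post; absolute timings will vary by runtime and hardware) is to fill a Dictionary, remove most of it, and then time repeated First() calls:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class FirstAfterRemoves
{
    static void Main()
    {
        var dict = new Dictionary<int, int>();
        for (int i = 0; i < 1000000; i++) dict[i] = i;

        // Removing entries does not shrink the internal arrays,
        // so First() still has to scan past all the removed slots.
        for (int i = 0; i < 999000; i++) dict.Remove(i);

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < 1000; i++)
        {
            var first = dict.First(); // each call re-scans the dead prefix
        }
        watch.Stop();
        Console.WriteLine("1000 First() calls on 1000 remaining items: {0}", watch.Elapsed);
    }
}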
By the way, you can avoid the value lookup like this (it won't make it noticeably faster, though):

var kvp = todo.First();
// Use kvp.Key and kvp.Value