Question
In search of a fast composite key for a Dictionary, I came upon an anomaly I can neither understand nor justify.
In limited testing,
Dictionary<KeyValuePair<UInt32, UInt32>, string>
is significantly slower (200:1) than
Dictionary<KeyValuePair<UInt16, UInt16>, string>
Test on two nested loops from 0 to 1000: populate, then ContainsKey (times in ms):
        Populate   ContainsKey
UInt32  92085      86578
UInt16  2201       431
The problem is that
new KeyValuePair<UInt32, UInt32>(i, j).GetHashCode();
yields MANY duplicates.
When looping i and j up to 1024, only 1024 unique hash values are created.
Based on an avalanche comment from CasperOne, I tried i*31 and j*97 (two prime numbers), and that resulted in 105280 unique values on 1024 × 1024. Still a lot of duplicates. CasperOne, I know that is not the same as random, but it is not my job to randomize the input. GetHashCode() is supposed to randomize the output.
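A small harness makes it easy to repeat this kind of count for different key shapes. The sketch below is an illustration, not the code used in the question: `HashSpread` and `CountDistinct` are hypothetical names, and the range of 1024 mirrors the 1024 × 1024 experiment. It feeds every (i, j) pair through a caller-supplied hash function and counts the distinct results. The KeyValuePair counts depend on the runtime, so they are printed rather than predicted.

```csharp
using System;
using System.Collections.Generic;

public static class HashSpread
{
    // Counts how many distinct hash codes `hash` produces over every (i, j)
    // pair with 0 <= i, j < range.
    public static int CountDistinct(uint range, Func<uint, uint, int> hash)
    {
        var seen = new HashSet<int>();
        for (uint i = 0; i < range; i++)
            for (uint j = 0; j < range; j++)
                seen.Add(hash(i, j));
        return seen.Count;
    }

    public static void Main()
    {
        const uint range = 1024;

        // The built-in struct hash; the count is runtime-dependent.
        Console.WriteLine("KVP<uint, uint>:  " + CountDistinct(range,
            (i, j) => new KeyValuePair<uint, uint>(i, j).GetHashCode()));

        // The i*31 / j*97 variant from the question, still hashed by the KVP.
        Console.WriteLine("KVP(i*31, j*97):  " + CountDistinct(range,
            (i, j) => new KeyValuePair<uint, uint>(i * 31, j * 97).GetHashCode()));

        // Packing two values below 65536 into one int cannot collide.
        Console.WriteLine("packed (i<<16)|j: " + CountDistinct(range,
            (i, j) => (int)((i << 16) | j)));
    }
}
```

Trying another combination is a one-line change to the lambda passed to `CountDistinct`.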
Why so many duplicates?
In the same loop,
new KeyValuePair<UInt16, UInt16>(i, j).GetHashCode();
yields 1024 X 1024 unique hash codes (perfect).
Int32 has the same problem.
These duplicate hash values kill the performance of
Dictionary<KeyValuePair<UInt32, UInt32>, string>
Tuple also generates a lot of duplicates, but it does not degrade at Int32 compared to Int16.
The time to generate the raw KVPs and to call GetHashCode on them is similar for both types.
The same anomaly occurs with HashSet.
using System;
using System.Collections.Generic;
using System.Diagnostics;

Dictionary<KeyValuePair<UInt32, UInt32>, string> dKVPu32 = new Dictionary<KeyValuePair<UInt32, UInt32>, string>();
Dictionary<KeyValuePair<UInt16, UInt16>, string> dKVPu16 = new Dictionary<KeyValuePair<UInt16, UInt16>, string>();
KeyValuePair<UInt32, UInt32> kvpUint32;
KeyValuePair<UInt16, UInt16> kvpUint16;
int range = 1000;
Int32 hashCode;
HashSet<Int32> kvpUint32Hash = new HashSet<Int32>();
HashSet<Int32> kvpUint16Hash = new HashSet<Int32>();
Stopwatch sw = new Stopwatch();
sw.Start();
for (UInt32 i = 0; i < range; i++)
{
    for (UInt32 j = 0; j < range; j++)
    {
        kvpUint32 = new KeyValuePair<UInt32, UInt32>(i, j);
    }
}
Console.WriteLine("UInt32 raw " + sw.ElapsedMilliseconds.ToString());
// 7
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
    for (UInt16 j = 0; j < range; j++)
    {
        kvpUint16 = new KeyValuePair<UInt16, UInt16>(i, j);
    }
}
Console.WriteLine("UInt16 raw " + sw.ElapsedMilliseconds.ToString());
// 6
sw.Restart();
for (UInt32 i = 0; i < range; i++)
{
    for (UInt32 j = 0; j < range; j++)
    {
        hashCode = new KeyValuePair<UInt32, UInt32>(i, j).GetHashCode();
        kvpUint32Hash.Add(hashCode);
    }
}
Console.WriteLine("UInt32 GetHashCode " + sw.ElapsedMilliseconds.ToString() + " unique count " + kvpUint32Hash.Count.ToString());
// 285 1024
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
    for (UInt16 j = 0; j < range; j++)
    {
        hashCode = new KeyValuePair<UInt16, UInt16>(i, j).GetHashCode();
        kvpUint16Hash.Add(hashCode);
    }
}
Console.WriteLine("UInt16 GetHashCode " + sw.ElapsedMilliseconds.ToString() + " unique count " + kvpUint16Hash.Count.ToString());
// 398 1000000
// Pause before the Dictionary tests; calling ReadLine before Restart keeps the wait out of the timing.
Console.ReadLine();
sw.Restart();
for (UInt32 i = 0; i < range; i++)
{
    for (UInt32 j = 0; j < range; j++)
    {
        dKVPu32.Add(new KeyValuePair<UInt32, UInt32>(i, j), String.Format("{0} {1}", i.ToString(), j.ToString()));
    }
}
Console.WriteLine("hsKVPu32 pop " + sw.ElapsedMilliseconds.ToString());
// 92085
sw.Restart();
for (UInt32 i = 0; i < range; i++)
{
    for (UInt32 j = 0; j < range; j++)
    {
        if (!dKVPu32.ContainsKey(new KeyValuePair<UInt32, UInt32>(i, j))) Debug.WriteLine("Oops");
    }
}
Console.WriteLine("hsKVPu32 find " + sw.ElapsedMilliseconds.ToString());
// 86578
dKVPu32.Clear();
dKVPu32 = null;
GC.Collect();
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
    for (UInt16 j = 0; j < range; j++)
    {
        dKVPu16.Add(new KeyValuePair<UInt16, UInt16>(i, j), String.Format("{0} {1}", i.ToString(), j.ToString()));
    }
}
Console.WriteLine("hsKVPu16 pop " + sw.ElapsedMilliseconds.ToString());
// 2201
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
    for (UInt16 j = 0; j < range; j++)
    {
        if (!dKVPu16.ContainsKey(new KeyValuePair<UInt16, UInt16>(i, j))) Debug.WriteLine("Oops");
    }
}
sw.Stop();
Console.WriteLine("hsKVPu16 find " + sw.ElapsedMilliseconds.ToString());
// 431
P.S. The fastest approach is to pack both values into a single key, e.g. ((UInt32)int1 << 16) | int2;
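The P.S. idea can be sketched as follows, assuming both values fit in 16 bits (true for the 0..999 range tested above). `PackedKeys`, `Pack16` and `Pack32` are hypothetical helpers. UInt32.GetHashCode() returns the value itself, so packed 16-bit pairs hash without collisions. Widening to a 64-bit key is less helpful than it looks: UInt64.GetHashCode() XORs its upper and lower 32 bits, so ((ulong)i << 32) | j hashes to i ^ j, and the pairs (8, 992) and (992, 8) collide.

```csharp
using System;
using System.Collections.Generic;

public static class PackedKeys
{
    // Packs two values that each fit in 16 bits into one UInt32, as in the P.S.
    // UInt32.GetHashCode() returns the value itself, so distinct pairs get distinct hashes.
    public static uint Pack16(uint a, uint b)
    {
        if (a > ushort.MaxValue || b > ushort.MaxValue)
            throw new ArgumentOutOfRangeException(a > ushort.MaxValue ? nameof(a) : nameof(b),
                "Both values must fit in 16 bits.");
        return (a << 16) | b;
    }

    // Widening to 64 bits keeps the keys distinct, but UInt64.GetHashCode()
    // XORs the upper and lower 32-bit halves, so the hash is (int)(a ^ b).
    public static ulong Pack32(uint a, uint b) => ((ulong)a << 32) | b;

    public static void Main()
    {
        var dict = new Dictionary<uint, string>();
        for (uint i = 0; i < 1000; i++)
            for (uint j = 0; j < 1000; j++)
                dict.Add(Pack16(i, j), i + " " + j);

        Console.WriteLine(dict.Count);           // 1000000
        Console.WriteLine(dict[Pack16(8, 992)]); // 8 992

        // Two different 64-bit keys, same hash: 8 ^ 992 == 992 ^ 8.
        Console.WriteLine(Pack32(8, 992).GetHashCode() == Pack32(992, 8).GetHashCode()); // True
    }
}
```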
In the dump below, the first column is the hash (shown as UInt32) of the KVP whose key and value are the next two columns.
2281371105 8 992
2281371104 8 993
2281371107 8 994
2281371145 0 0
2281371147 0 2
2281371149 0 4
2281371151 0 6
2281371137 0 8
2281371144 0 1
2281371146 0 3
2281371148 0 5
2281371150 0 7
2281371136 0 9
2281371144 1 0
2281371145 1 1
2281371146 1 2
2281371147 1 3
2281371148 1 4
2281371149 1 5
2281371150 1 6
2281371151 1 7
2281371136 1 8
2281371137 1 9
2281371147 2 0
2281371146 2 1
2281371144 2 3
2281371151 2 4
2281371150 2 5
2281371149 2 6
2281371148 2 7
2281371139 2 8
The only pattern I have found is that either the sum or the difference of the KVP's key and value matches.
But I could not find a pattern for when it adds and when it subtracts.
It is a bad hash, so knowing exactly what it is has little value.
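The sum-or-difference pattern is the signature of XOR: x ^ y = x + y - 2(x & y), so it equals the sum when the two values share no set bits and falls short of it when they do. Treating this as a hypothesis rather than documented behaviour, the sketch below models the hash as C ^ key ^ value, where C (a hypothetical per-type constant) is read off the (0, 0) row of the dump. Every row it checks is copied from the dump above; `XorModel` and `Predict` are illustrative names.

```csharp
using System;

public static class XorModel
{
    // Hypothesis: hash(key, value) == C ^ key ^ value for KeyValuePair<UInt32, UInt32>.
    // C is the hash printed for (0, 0) in the question's dump.
    public const uint C = 2281371145;

    public static uint Predict(uint key, uint value) => C ^ key ^ value;

    public static void Main()
    {
        // Rows copied from the dump: { hash, key, value }.
        uint[,] rows =
        {
            { 2281371105, 8, 992 }, { 2281371104, 8, 993 }, { 2281371107, 8, 994 },
            { 2281371147, 0, 2 },   { 2281371136, 0, 9 },   { 2281371145, 1, 1 },
            { 2281371144, 2, 3 },   { 2281371139, 2, 8 },
        };
        for (int r = 0; r < rows.GetLength(0); r++)
        {
            uint predicted = Predict(rows[r, 1], rows[r, 2]);
            Console.WriteLine($"{rows[r, 1],4} {rows[r, 2],4}  dump={rows[r, 0]}  model={predicted}  match={predicted == rows[r, 0]}");
        }

        // XOR cancels equal operands and ignores order, so these pairs collide.
        Console.WriteLine(Predict(1, 1) == Predict(0, 0)); // True
        Console.WriteLine(Predict(2, 3) == Predict(3, 2)); // True
    }
}
```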
Answer
Firstly, we can dispense with the timing aspect of this - it feels to me like this is really just about hash collisions, as obviously those will kill the performance.
So, the question is really why there are more hash collisions for KeyValuePair<uint, uint> than KeyValuePair<ushort, ushort>. To help find out a bit more about that, I've written the following short program:
using System;
using System.Collections.Generic;

class Program
{
    const int Sample1 = 100;
    const int Sample2 = 213;

    public static void Main()
    {
        Display<uint, ushort>();
        Display<ushort, ushort>();
        Display<uint, uint>();
        Display<ushort, uint>();
    }

    static void Display<TKey, TValue>()
    {
        TKey key1 = (TKey) Convert.ChangeType(Sample1, typeof(TKey));
        TValue value1 = (TValue) Convert.ChangeType(Sample1, typeof(TValue));
        TKey key2 = (TKey) Convert.ChangeType(Sample2, typeof(TKey));
        TValue value2 = (TValue) Convert.ChangeType(Sample2, typeof(TValue));

        Console.WriteLine("Testing {0}, {1}", typeof(TKey).Name, typeof(TValue).Name);
        Console.WriteLine(new KeyValuePair<TKey, TValue>(key1, value1).GetHashCode());
        Console.WriteLine(new KeyValuePair<TKey, TValue>(key1, value2).GetHashCode());
        Console.WriteLine(new KeyValuePair<TKey, TValue>(key2, value1).GetHashCode());
        Console.WriteLine(new KeyValuePair<TKey, TValue>(key2, value2).GetHashCode());
        Console.WriteLine();
    }
}
The output on my machine is:
Testing UInt32, UInt16
-1888265981
-1888265981
-1888265806
-1888265806
Testing UInt16, UInt16
-466800447
-459525951
-466800528
-459526032
Testing UInt32, UInt32
958334947
958334802
958334802
958334947
Testing UInt16, UInt32
-1913331935
-1913331935
-1913331935
-1913331935
You can obviously try varying the sample values to see where there are collisions.
The results of KeyValuePair<ushort, uint> are particularly worrying, and the results of KeyValuePair<ushort, ushort> are surprisingly good.
In fact, KeyValuePair<ushort, uint> isn't just bad - it's ludicrously bad. As far as I can see, I haven't found any value which doesn't have the same hash code of -1913331935 when running the 64-bit CLR. Running the 32-bit CLR I get a different hash code, but still the same hash code for all values.
It appears that in .NET 4.5 (which is what I'm running) the default implementation of GetHashCode doesn't just take the first instance field of the struct, as previously documented. I suspect that for at least some types, it just uses the first 4 bytes of memory beyond the header in the boxed value (and there will be boxing for every call here), and that ends up sometimes being just the first field (if that field is a uint), sometimes being more than one field (e.g. for ushort, ushort where both fields can fit "inside" 4 bytes) and sometimes being no fields at all (ushort, uint).
(Actually, this doesn't explain why you get 1024 different hash codes in the uint, uint case instead of just 1000. I'm still unsure on that.)
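One possible explanation, assuming the C ^ key ^ value pattern visible in the question's dump is the whole story for uint, uint: every loop value below 1000 fits in 10 bits, so i ^ j can only land in [0, 1023], and all 1024 of those results are reachable (t = t ^ 0 for t < 1000, and t = (t ^ 512) ^ 512 for 1000 <= t < 1024). XORing with a constant does not change how many distinct values there are. The brute-force check below (the `XorRange` name is hypothetical) confirms the count.

```csharp
using System;
using System.Collections.Generic;

public static class XorRange
{
    // Number of distinct values of i ^ j for 0 <= i, j < range.
    public static int DistinctXor(uint range)
    {
        var seen = new HashSet<uint>();
        for (uint i = 0; i < range; i++)
            for (uint j = 0; j < range; j++)
                seen.Add(i ^ j);
        return seen.Count;
    }

    public static void Main()
    {
        Console.WriteLine(DistinctXor(1000)); // 1024
        Console.WriteLine(DistinctXor(1024)); // 1024
        Console.WriteLine(DistinctXor(1025)); // 2048 (1024 needs an 11th bit)
    }
}
```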
Ultimately, using a value type which doesn't override GetHashCode as a dictionary key seems like it's just a bad idea, unless you've tested to ensure that it's suitable for your specific requirements. There's just too much which is black magic to be confident about it, IMO.
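If KeyValuePair<UInt32, UInt32> must stay the key type, one way to take the default struct hash out of the picture is to pass an IEqualityComparer to the Dictionary (or HashSet) constructor. The comparer below is a sketch: the class names are hypothetical, and it assumes System.HashCode is available (.NET Core 2.1 or later; on .NET Framework it needs the Microsoft.Bcl.HashCode package). Because the comparer works on the struct directly, it should also avoid boxing each key in the default equality and hash paths. It has not been benchmarked against the numbers in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: hashes both fields with System.HashCode.Combine instead of
// relying on the default ValueType.GetHashCode() of KeyValuePair<uint, uint>.
public sealed class UIntPairComparer : IEqualityComparer<KeyValuePair<uint, uint>>
{
    public static readonly UIntPairComparer Instance = new UIntPairComparer();

    public bool Equals(KeyValuePair<uint, uint> x, KeyValuePair<uint, uint> y)
        => x.Key == y.Key && x.Value == y.Value;

    public int GetHashCode(KeyValuePair<uint, uint> pair)
        => HashCode.Combine(pair.Key, pair.Value);
}

public static class ComparerDemo
{
    public static void Main()
    {
        var dict = new Dictionary<KeyValuePair<uint, uint>, string>(UIntPairComparer.Instance);
        for (uint i = 0; i < 1000; i++)
            for (uint j = 0; j < 1000; j++)
                dict.Add(new KeyValuePair<uint, uint>(i, j), i + " " + j);

        Console.WriteLine(dict.Count);                                              // 1000000
        Console.WriteLine(dict.ContainsKey(new KeyValuePair<uint, uint>(8, 992)));  // True
    }
}
```

The same instance can be passed to a HashSet<KeyValuePair<uint, uint>> constructor.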