问题描述
我有一些布尔数组,它们的大小不是恒定的,并且我需要一个强大而快速的哈希算法,以使它们发生哈希冲突的机会降到最低。
I have some boolean arrays that their sizes are not constant, And I need a strong and fast hash algorithm to give minimum chance of hash collision for them.
我的自己的想法是计算每个布尔数组的整数值,但是例如,这2个数组将给出3的相同哈希值:
[0,1,1]和[1,1]
My own idea was calculating the integer value of each boolean array but for example these 2 arrays will give same hash of 3:
[0 , 1, 1] and [1, 1]
在计算整数值后,我想乘以数组的大小,但是这个想法也很烂,因为哈希冲突的可能性很高。
I thought to multiply the size of array after calculating integer value, but this idea also sucks, because there is a high chance of hash collision.
有人有个好主意吗?
推荐答案
插入前哨 true
元素放在数组的开头,然后将数组解释为二进制数。对于少于32个元素的数组,这是完美的散列(无冲突)。对于较大的数组,我建议对小于2 的大质数进行模运算。
Insert a sentinel true
element at the start of the array, then interpret the array as a binary number. This is a perfect hash (no collisions) for arrays with less than 32 elements. For larger arrays I suggest doing the arithmetic modulo a large prime less than 2.
示例:
Array | Binary | Decimal
------------+--------+---------
[ 0, 1, 1 ] | 01011 | 11
[ 1, 1 ] | 00111 | 7
这与将数组解释为二进制数并使用<$进行按位或c $ c> 1<< n ,其中 n
是数组的大小。
This is the same as interpreting the array as a binary number and taking the bitwise OR with 1 << n
, where n
is the size of the array.
实现:
int hash(int[] array)
{
int h = (1 << array.length);
for (int i = 0; i < array.length; i++)
{
h = h | (array[i] << (array.length - i - 1));
}
return h;
}
这篇关于可变大小布尔数组的哈希算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!