问题描述
我试图编写一个小程序来演示只有等于被覆盖而不是hashcode()方法的java中的哈希冲突。这是为了证明两个不相等的对象可以具有相同的哈希码的理论。这是针对面试问题的问题。
我创建了200,000个对象,将它们存储在一个数组中,然后将它们进行比较以查看哪些是重复的。 (为此,我使用嵌套for循环在对象创建阶段之后迭代对象数组。)对于大约200,000个对象,我得到9次冲突。第一个是索引196和121949的对象。然后我打印这些哈希码来显示这两个值是相同的。
然而,我得到了一些非常令人惊讶的行为。如果我迭代嵌套for循环并打印hashcode的第一个冲突,我会得到相同的散列码值
1867750575
1867750575
对于索引为196和121949的两个对象。
但是,如果我注释掉检测所有冲突的嵌套for循环,并直接打印索引为196和121949的元素的哈希码,我会得到
1829164700
366712642
请注意,我没有评论创建这些元素,只是我检查碰撞的部分。
为什么会发生这种情况,即使我没有迭代它们,不应该哈希码是否一致?附录1:据我所知,是否有一个背后的原因,按照生日原则,如果我创建了200,000个对象,我必须得到一个碰撞,如何迭代在每个hascode或不改变任何东西?
附录2:我试着添加另一个数组200000大小只是为了查看碰撞索引是否改变,他们没有,所以显然在未提交循环的情况下对二进制文件进行更改不会做任何更改。因此,改变二进制变化散列码的假设不成立。
这是我的代码
import java.util.HashMap;
或
public class EmployeeFactory {
private static int counter = 0;
public int id;
public String empName;
EmployeeFactory(){
id = counter;
empName =employee_+ id;
counter ++;
}
@Override
public boolean equals(Object o){
//如果对象与自身进行比较,则返回true
if(o == this){
return true;
}
if(o == null || o.getClass()!= this.getClass()){
return false;
}
EmployeeFactory emp =(EmployeeFactory)o;
//比较数据成员并相应返回
返回this.id == emp.id;
public static void main(String [] args){
int Obj_Count = 200000;
EmployeeFactory objs [] = new EmployeeFactory [Obj_Count];
for(int i = 0; i< Obj_Count; ++ i){
EmployeeFactory temp = new EmployeeFactory();
objs [i] = temp;
}
//请在下面的循环中注释一次,并在保留注释的同时尝试一次。 (int i = 0; i< Obj_Count; ++ i)
/ *
(int j = i + 1; j< Obj_Count; + + j)
{
if(objs [i] .hashCode()== objs [j] .hashCode())
{
System.out.println(Objects ID为+ objs [i] .id
+和+ objs [j] .id +相撞。);
System.out.println(Object Is+ i +and Obj ID is+ objs [i] .id +Has Hashcode+ objs [i] .hashCode());
System.out.println(Object Is+ j +and Obj ID is+ objs [j] .id +Has Hashcode+ objs [j] .hashCode());
System.out.println();
}
}
}
* /
HashMap< EmployeeFactory,EmployeeFactory> hm = new HashMap< EmployeeFactory,EmployeeFactory>();
objs [121949] .id = objs [196] .id;
hm.put(objs [196],objs [196]);
hm.put(objs [121949],objs [121949]);
System.out.println(hm.get(objs [121949])。empName);
System.out.println(hm.get(objs [196])。empName);
//检查散列映射
System.out.println(hm.get(objs [121949])。hashCode());
System.out.println(hm.get(objs [196])。hashCode());
//检查数组
System.out.println(objs [121949] .hashCode());
System.out.println(objs [196] .hashCode());
$ b code
$ b出现输出:
employee_121949
employee_196
1829164700
366712642
1829164700
366712642
未评论的循环输出
ID为196和121949的对象发生冲突。
对象是196和对象ID是196有散列码1867750575
对象是121949和对象ID是121949有散列码1867750575
ID为62082和145472的对象发生冲突。
对象是62082和Obj ID是62082有哈希码2038112324
对象是145472and对象ID是145472有哈希码2038112324
ID为62354和105841的对象发生冲突。
对象是62354和Obj ID是62354有Hashcode 2134400190
对象是105841和Obj ID是105841有Hashcode 2134400190
ID 68579和186938碰撞的对象。
Object是68579和Obj ID是68579有Hashcode 1872358815
Object是186938and Obj ID是186938有Hashcode 1872358815
ID为105219和111288的对象发生冲突。
对象是105219和Obj ID是105219有哈希码651156501
对象是111288and对象ID是111288有哈希码651156501
ID为107634和152385的对象发生冲突。
对象是107634和Obj ID是107634有哈希码273791087
对象是152385和Obj ID是152385有哈希码273791087
ID为108007和146405的对象发生冲突。
对象是108007和Obj ID是108007有Hashcode 1164664992
对象是146405和Obj ID是146405有Hashcode 1164664992
ID为135275和180997的对象发生冲突。
对象是135275和Obj ID是135275有Hashcode 996371445
对象是180997和Obj ID是180997有Hashcode 996371445
ID为153749和184310的对象发生冲突。
对象是153749和Obj ID是153749有哈希码254720071
对象是184310和Obj ID是184310有哈希码254720071
employee_121949
employee_121949
1867750575
1867750575
1867750575
1867750575
解决方案Matt Timmermans的回答很好地涵盖了基本问题,特别是你不可能期望在不同运行之间有任何一致性......。 (+1)
默认
Object.hashCode()
实现(也称为身份哈希码因为它与System.identityHashCode(obj)
)相同,在Hotspot中,它只是一个带有线程本地种子的伪随机数。一段时间以来,对象的内存地址一直没有任何依赖性。如果你的程序执行是完全确定性的,那么哈希将很可能是可重复的。
另外请注意,标识哈希码是在第一次调用 Object.hashCode()
System.identityHashCode()
,并将该值存储在对象中,以便后续对该对象的调用将返回相同的价值。如果在另一个线程中运行碰撞检测器循环,则会得到完全不同的哈希值,从而导致不同的碰撞。I tried to write a small program to demonstrate hash collisions in java when only equals is overridden and not hashcode() method. This was to prove the theory that two unequal objects can have the same hashcode. This was for an Interview question where the behaviour was asked.
I created 200,000 objects, stored them in an array and then compared them to see which are duplicates. (For this I am using a nested for loop iterating over an array of objects after the object creation phase.) For around 200,000 objects I get 9 collisions. First one being object at index 196 and 121949. I then go on to print these hash codes to show that both values are the same.
However I am getting some very surprising behavior. If I iterate over the nested for loop and print the first collision of hashcodes, I get same hashcode value
1867750575 1867750575
for both object at index 196 and 121949.
But if I comment out the nested for loop for detecting all the collisions and directly print hashcode for elements at index 196 and 121949 I get
1829164700 366712642
Note, I am not commenting out the creation of these elements, just the part where I check for collisions.
Why is this happening, even if I dont iterate over them, shouldn't the hashcode be consistent ?
Addendum 1: Is there a source behind this, as far as I know, going by the Birthday principle, if I create 200,000 objects, I must get a collision, how is iterating over each hascode or not changing anything?
Addendum 2: I tried adding another array of 200000 size just to see if the colliding indexes change, they did not, so apparently making changes to the binary with the loop uncommitted does not make any changes. So the hypothesis that changing binary changes hashcode doesn't hold.
Here is my code
import java.util.HashMap; public class EmployeeFactory { private static int counter = 0; public int id; public String empName; EmployeeFactory() { id = counter; empName = "employee_" + id; counter++; } @Override public boolean equals(Object o) { // If the object is compared with itself then return true if (o == this) { return true; } if (o == null || o.getClass() != this.getClass()) { return false; } EmployeeFactory emp = (EmployeeFactory) o; // Compare the data members and return accordingly return this.id == emp.id; } public static void main(String[] args) { int Obj_Count = 200000; EmployeeFactory objs[] = new EmployeeFactory[Obj_Count]; for (int i = 0; i < Obj_Count; ++i) { EmployeeFactory temp = new EmployeeFactory(); objs[i] = temp; } //Please try code once un commenting the loop below and once while keeping it commented. /* for (int i = 0; i < Obj_Count; ++i) { for (int j = i + 1; j < Obj_Count; ++j) { if (objs[i].hashCode() == objs[j].hashCode()) { System.out.println("Objects with IDs " + objs[i].id + " and " + objs[j].id + " collided."); System.out.println("Object Is " + i + "and Obj ID is "+ objs[i].id + " Has Hashcode " + objs[i].hashCode()); System.out.println("Object Is " + j + "and Obj ID is "+ objs[j].id + " Has Hashcode " + objs[j].hashCode()); System.out.println(""); } } } */ HashMap<EmployeeFactory, EmployeeFactory> hm = new HashMap<EmployeeFactory, EmployeeFactory>(); objs[121949].id = objs[196].id; hm.put(objs[196], objs[196]); hm.put(objs[121949], objs[121949]); System.out.println(hm.get(objs[121949]).empName); System.out.println(hm.get(objs[196]).empName); // checking the hashmap System.out.println(hm.get(objs[121949]).hashCode()); System.out.println(hm.get(objs[196]).hashCode()); // Checking the array System.out.println(objs[121949].hashCode()); System.out.println(objs[196].hashCode()); } }
Comemented Output:
employee_121949 employee_196 1829164700 366712642 1829164700 366712642
Un-commented loop Output
Objects with IDs 196 and 121949 collided. Object Is 196and Obj ID is 196 Has Hashcode 1867750575 Object Is 121949and Obj ID is 121949 Has Hashcode 1867750575 Objects with IDs 62082 and 145472 collided. Object Is 62082and Obj ID is 62082 Has Hashcode 2038112324 Object Is 145472and Obj ID is 145472 Has Hashcode 2038112324 Objects with IDs 62354 and 105841 collided. Object Is 62354and Obj ID is 62354 Has Hashcode 2134400190 Object Is 105841and Obj ID is 105841 Has Hashcode 2134400190 Objects with IDs 68579 and 186938 collided. Object Is 68579and Obj ID is 68579 Has Hashcode 1872358815 Object Is 186938and Obj ID is 186938 Has Hashcode 1872358815 Objects with IDs 105219 and 111288 collided. Object Is 105219and Obj ID is 105219 Has Hashcode 651156501 Object Is 111288and Obj ID is 111288 Has Hashcode 651156501 Objects with IDs 107634 and 152385 collided. Object Is 107634and Obj ID is 107634 Has Hashcode 273791087 Object Is 152385and Obj ID is 152385 Has Hashcode 273791087 Objects with IDs 108007 and 146405 collided. Object Is 108007and Obj ID is 108007 Has Hashcode 1164664992 Object Is 146405and Obj ID is 146405 Has Hashcode 1164664992 Objects with IDs 135275 and 180997 collided. Object Is 135275and Obj ID is 135275 Has Hashcode 996371445 Object Is 180997and Obj ID is 180997 Has Hashcode 996371445 Objects with IDs 153749 and 184310 collided. Object Is 153749and Obj ID is 153749 Has Hashcode 254720071 Object Is 184310and Obj ID is 184310 Has Hashcode 254720071 employee_121949 employee_121949 1867750575 1867750575 1867750575 1867750575
解决方案Matt Timmermans' answer covers the basic issues fairly well, particularly "You can't expect to have any consistency...between different runs." (+1)
The default
Object.hashCode()
implementation (also called the identity hash code because it's the same asSystem.identityHashCode(obj)
) is currently, in Hotspot, just a pseudo-random number with a thread-local seed. There hasn't been any dependency on the object's memory address for quite some time. If your program execution is completely deterministic, the hashes will most likely be repeatable.Also note that the identity hashcode is lazily generated on the first call to
Object.hashCode()
orSystem.identityHashCode()
and the value is stored in the object so that subsequent calls on this object will return the same value. If you run your collision detector loop in another thread, you'll get completely different hash values, and thus different collisions.这篇关于为什么Java hashcodes在一种情况下会发生碰撞,而另一种则不会发生碰撞? (代码如下)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!