I have two different programs that need to hash the same string using Murmur3, one in Python and one in Java.
Python version 2.7.9:
mmh3.hash128('abc')
Gives 79267961763742113019008347020647561319L.
Java is Guava 18.0:
HashCode hashCode = Hashing.murmur3_128().newHasher().putString("abc", StandardCharsets.UTF_8).hash();
Gives the string "6778ad3f3f3f96b4522dca264174a23b"; converting it to a BigInteger gives 137537073056680613988840834069010096699.
How can I get the same result from both?
Thanks
Here's how to get the same result from both:
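// Guava returns the 16 hash bytes in little-endian order.
byte[] mm3_le = Hashing.murmur3_128().hashString("abc", UTF_8).asBytes();
// Reverse them, because BigInteger reads a byte array as big endian.
byte[] mm3_be = Bytes.toArray(Lists.reverse(Bytes.asList(mm3_le)));
// Signum 1 keeps the value non-negative even when the top bit is set.
assertEquals("79267961763742113019008347020647561319",
    new BigInteger(1, mm3_be).toString());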
The hash code's bytes need to be treated as little endian, but BigInteger interprets bytes as big endian. You were presumably using new BigInteger(hex, 16) to create the BigInteger, but the output of HashCode.toString() is actually a series of pairs of hexadecimal digits representing the hash bytes in the same order they're returned by asBytes() (little endian). (You can also reverse those pairs of hexadecimal digits to get a hex number that does produce the same result when passed to new BigInteger(reversedHex, 16).)
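The pair reversal described above can also be tried from the Python side. The following is a minimal sketch, assuming Python 3 (bytes.fromhex and int.from_bytes behave differently or are missing in 2.7); the function names reverse_hex_pairs and guava_hex_to_int are hypothetical and not part of mmh3 or Guava.

```python
def reverse_hex_pairs(hex_digest):
    """Reverse a hex string one byte (two hex digits) at a time."""
    pairs = [hex_digest[i:i + 2] for i in range(0, len(hex_digest), 2)]
    return "".join(reversed(pairs))


def guava_hex_to_int(hex_digest):
    """Read a hex string whose bytes are in little-endian order
    (the order HashCode.toString() prints them) as an unsigned integer."""
    raw = bytes.fromhex(hex_digest)
    return int.from_bytes(raw, byteorder="little", signed=False)


if __name__ == "__main__":
    sample = "0a0b0c"  # made-up bytes, not a real Murmur3 digest
    # Both routes yield the same integer.
    print(int(reverse_hex_pairs(sample), 16) == guava_hex_to_int(sample))
```

Passing the string printed by HashCode.toString() to guava_hex_to_int should then yield a number that can be compared directly with the value from mmh3.hash128.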
I think the documentation of toString() is somewhat confusing because of the way it refers to "big endian"; it doesn't actually mean that the output of the method is the hexadecimal number representing the bytes interpreted as big endian.
We have an open issue for adding asBigInteger() to HashCode.
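The comparison can also be made in the other direction, by rendering the Python integer in the byte order Guava prints. This is only a sketch, assuming Python 3 (int.to_bytes does not exist in 2.7) and an unsigned 128-bit value such as the one mmh3.hash128 returned in the question; int_to_guava_hex is a hypothetical helper name.

```python
def int_to_guava_hex(value, size=16):
    """Render an unsigned integer as hex with its bytes in little-endian
    order, matching the byte order of HashCode.asBytes()."""
    return value.to_bytes(size, byteorder="little").hex()


if __name__ == "__main__":
    python_hash = 79267961763742113019008347020647561319  # value from the question
    # Compare this string with the output of HashCode.toString() on the Java side.
    print(int_to_guava_hex(python_hash))
```

Comparing hex strings this way avoids BigInteger entirely on the Java side, at the cost of doing the byte-order conversion in Python.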