This article covers how to handle the different results produced by the Python and Java implementations of the Murmur3 hash.

Problem description

I have two different programs that need to hash the same string using Murmur3, one in Python and one in Java.

Python version 2.7.9:

import mmh3
mmh3.hash128('abc')

Gives 79267961763742113019008347020647561319L.

Java, using Guava 18.0:

HashCode hashCode = Hashing.murmur3_128().newHasher().putString("abc", StandardCharsets.UTF_8).hash();

Gives the string "6778ad3f3f3f96b4522dca264174a23b"; converting that to a BigInteger gives 137537073056680613988840834069010096699.

How can I get the same result from both?

Thanks

Solution

Here's how to get the same result from both:

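// asBytes() returns the hash bytes in little-endian order.
byte[] mm3_le = Hashing.murmur3_128().hashString("abc", UTF_8).asBytes();
// Reverse them, because BigInteger expects big-endian input.
byte[] mm3_be = Bytes.toArray(Lists.reverse(Bytes.asList(mm3_le)));
// Signum 1 makes BigInteger treat the bytes as an unsigned magnitude.
assertEquals("79267961763742113019008347020647561319",
    new BigInteger(1, mm3_be).toString());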

The hash code's bytes need to be treated as little-endian, but BigInteger interprets bytes as big-endian. You were presumably using new BigInteger(hex, 16) to create the BigInteger, but the output of HashCode.toString() is actually a series of pairs of hexadecimal digits representing the hash bytes in the same order they're returned by asBytes() (little-endian). You can also reverse those pairs of hexadecimal digits to get a hex number that does produce the same result when passed to new BigInteger(reversedHex, 16).
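
The hex-pair reversal mentioned above can be sketched with the JDK alone. This is only an illustration: the class name HexEndianSketch and the helper reverseHexPairs are made up for this example and are not part of Guava; the input string is the HashCode.toString() output quoted in the question.

```java
import java.math.BigInteger;

final class HexEndianSketch {
    // Hypothetical helper: reverses a hex string two digits (one byte) at a time.
    static String reverseHexPairs(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of hex digits");
        }
        StringBuilder out = new StringBuilder(hex.length());
        // Walk the string from the last byte to the first, copying each pair unchanged.
        for (int i = hex.length() - 2; i >= 0; i -= 2) {
            out.append(hex, i, i + 2);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // HashCode.toString() output from the question (bytes in little-endian order).
        String hex = "6778ad3f3f3f96b4522dca264174a23b";
        String reversed = reverseHexPairs(hex);
        System.out.println(reversed);
        // Parsing the reversed string as base 16 should match what mmh3.hash128 reported.
        System.out.println(new BigInteger(reversed, 16));
    }
}
```

Because new BigInteger(String, int) treats a string without a minus sign as non-negative, this route does not need an explicit signum argument.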

I think the documentation of toString() is somewhat confusing because of the way it refers to "big endian"; it doesn't actually mean that the output of the method is the hexadecimal number representing the bytes interpreted as big endian.

We have an open issue for adding asBigInteger() to HashCode.
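
Until such a method is available, a small helper can do the conversion on the array returned by asBytes(). The following is a minimal sketch using only the JDK; the class LittleEndianBigInteger and the method fromLittleEndian are hypothetical names chosen for this example, not Guava API, and the byte values are copied from the hex string in the question.

```java
import java.math.BigInteger;

final class LittleEndianBigInteger {
    // Hypothetical helper, not part of Guava:
    // interprets little-endian bytes as an unsigned integer.
    static BigInteger fromLittleEndian(byte[] littleEndian) {
        byte[] bigEndian = new byte[littleEndian.length];
        for (int i = 0; i < littleEndian.length; i++) {
            bigEndian[i] = littleEndian[littleEndian.length - 1 - i];
        }
        // Signum 1: the array is a magnitude, so a set top bit
        // does not turn the result negative.
        return new BigInteger(1, bigEndian);
    }

    public static void main(String[] args) {
        // The 16 hash bytes from the question, in the order shown by HashCode.toString().
        byte[] hashBytes = {
            0x67, 0x78, (byte) 0xad, 0x3f, 0x3f, 0x3f, (byte) 0x96, (byte) 0xb4,
            0x52, 0x2d, (byte) 0xca, 0x26, 0x41, 0x74, (byte) 0xa2, 0x3b
        };
        System.out.println(fromLittleEndian(hashBytes));
    }
}
```

In a real program the array would come from Hashing.murmur3_128().hashString("abc", UTF_8).asBytes() rather than being typed in by hand.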

09-06 22:44