本文介绍了为什么 Python dict 可以有多个具有相同哈希的键?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图了解 Python hash 函数的底层.我创建了一个自定义类,其中所有实例都返回相同的哈希值.

I am trying to understand the Python hash function under the hood. I created a custom class where all instances return the same hash value.

class C:
    def __hash__(self):
        return 42

我只是假设上述类的一个实例在任何时候都可以在一个dict中,但实际上一个dict可以有多个具有相同hash的元素.

I just assumed that only one instance of the above class can be in a dict at any time, but in fact a dict can have multiple elements with the same hash.

c, d = C(), C()
x = {c: 'c', d: 'd'}
print(x)
# {<__main__.C object at 0x7f0824087b80>: 'c', <__main__.C object at 0x7f0823ae2d60>: 'd'}
# note that the dict has 2 elements

我做了更多的实验,发现如果我重写 __eq__ 方法使得类的所有实例比较相等,那么 dict 只允许一个实例.

I experimented a little more and found that if I override the __eq__ method such that all the instances of the class compare equal, then the dict only allows one instance.

class D:
    def __hash__(self):
        return 42
    def __eq__(self, other):
        return True

p, q = D(), D()
y = {p: 'p', q: 'q'}
print(y)
# {<__main__.D object at 0x7f0823a9af40>: 'q'}
# note that the dict only has 1 element

所以我很想知道 dict 如何拥有多个具有相同哈希的元素.

So I am curious to know how a dict can have multiple elements with the same hash.

推荐答案

有关 Python 哈希如何工作的详细描述,请参阅我对 Why is早退比别的慢?

For a detailed description of how Python's hashing works see my answer to Why is early return slower than else?

基本上,它使用哈希来选择表中的一个槽.如果槽中有一个值并且哈希匹配,它会比较项目以查看它们是否相等.

Basically it uses the hash to pick a slot in the table. If there is a value in the slot and the hash matches, it compares the items to see if they are equal.

如果哈希匹配但项目不相等,那么它会尝试另一个插槽.有一个公式可以选择这个(我在参考答案中描述过),它会逐渐引入哈希值的未使用部分;但是一旦它用完它们,它最终会通过哈希表中的所有槽.这保证最终我们要么找到匹配的项目,要么找到一个空槽.当搜索找到一个空槽时,它会插入值或放弃(取决于我们是添加还是获取值).

If the hash matches but the items aren't equal, then it tries another slot. There's a formula to pick this (which I describe in the referenced answer), and it gradually pulls in unused parts of the hash value; but once it has used them all up, it will eventually work its way through all slots in the hash table. That guarantees eventually we either find a matching item or an empty slot. When the search finds an empty slot, it inserts the value or gives up (depending whether we are adding or getting a value).

需要注意的重要一点是没有列表或桶:只有一个具有特定槽数的哈希表,每个哈希用于生成候选槽序列.

The important thing to note is that there are no lists or buckets: there is just a hash table with a particular number of slots, and each hash is used to generate a sequence of candidate slots.

这篇关于为什么 Python dict 可以有多个具有相同哈希的键?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-01 08:07