问题描述
我试图了解Python的hash
功能.我创建了一个自定义类,其中所有实例都返回相同的哈希值.
I am trying to understand the Python hash
function under the hood. I created a custom class where all instances return the same hash value.
class C:
def __hash__(self):
return 42
我只是假设任何时候以上类的一个实例都可以位于dict
中,但实际上dict
可以有多个具有相同散列的元素.
I just assumed that only one instance of the above class can be in a dict
at any time, but in fact a dict
can have multiple elements with the same hash.
c, d = C(), C()
x = {c: 'c', d: 'd'}
print(x)
# {<__main__.C object at 0x7f0824087b80>: 'c', <__main__.C object at 0x7f0823ae2d60>: 'd'}
# note that the dict has 2 elements
我做了一些实验,发现如果我重写__eq__
方法,以使该类的所有实例都相等,那么dict
仅允许一个实例.
I experimented a little more and found that if I override the __eq__
method such that all the instances of the class compare equal, then the dict
only allows one instance.
class D:
def __hash__(self):
return 42
def __eq__(self, other):
return True
p, q = D(), D()
y = {p: 'p', q: 'q'}
print(y)
# {<__main__.D object at 0x7f0823a9af40>: 'q'}
# note that the dict only has 1 element
因此,我很想知道dict
如何具有相同散列的多个元素.
So I am curious to know how a dict
can have multiple elements with the same hash.
推荐答案
有关Python哈希的工作原理的详细说明,请参见我对的回答.提前返回比其他慢吗?
For a detailed description of how Python's hashing works see my answer to Why is early return slower than else?
基本上,它使用哈希在表中选择一个插槽.如果广告位中有一个值且哈希值匹配,它将比较项目以查看它们是否相等.
Basically it uses the hash to pick a slot in the table. If there is a value in the slot and the hash matches, it compares the items to see if they are equal.
如果哈希不匹配或项目不相等,则尝试另一个槽.有一个公式可以选择(我在引用的答案中对此进行了描述),它会逐渐提取哈希值的未使用部分;但是一旦将其全部用尽,它最终将在哈希表的所有插槽中工作.这样可以保证最终我们找到匹配的商品或空的广告位.当搜索发现一个空插槽时,它会插入值或放弃(取决于我们要添加还是获取值).
If the hash doesn't match or the items aren't equal, then it tries another slot. There's a formula to pick this (which I describe in the referenced answer), and it gradually pulls in unused parts of the hash value; but once it has used them all up, it will eventually work its way through all slots in the hash table. That guarantees eventually we either find a matching item or an empty slot. When the search finds an empty slot, it inserts the value or gives up (depending whether we are adding or getting a value).
要注意的重要一点是,没有列表或存储桶:只有一个具有特定数量的插槽的哈希表,每个哈希用于生成一系列候选插槽.
The important thing to note is that there are no lists or buckets: there is just a hash table with a particular number of slots, and each hash is used to generate a sequence of candidate slots.
这篇关于为什么Python字典可以有多个具有相同哈希值的键?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!