Question
In [30]: import numpy as np
In [31]: d = np.dtype(np.float64)
In [32]: d
Out[32]: dtype('float64')
In [33]: d == np.float64
Out[33]: True
In [34]: hash(np.float64)
Out[34]: -9223372036575774449
In [35]: hash(d)
Out[35]: 880835502155208439
Why do these dtypes compare equal but hash differently?
Note that Python's data model does promise that objects which compare equal have the same hash value.
My workaround for this problem is to call np.dtype on everything, after which hash values and comparisons are consistent.
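A minimal sketch of that workaround, assuming the goal is to use dtypes as set or dictionary keys. The helper name as_dtype_key is hypothetical and not part of NumPy:

```python
import numpy as np

def as_dtype_key(spec):
    """Normalize anything np.dtype accepts into an actual dtype instance."""
    return np.dtype(spec)

# The same dtype written three different ways.
specs = [np.float64, "float64", np.dtype("f8")]
keys = [as_dtype_key(s) for s in specs]

# After normalization, equality and hashing agree, so all three
# spellings collapse into a single set element.
unique = set(keys)
print(len(unique))
```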
Recommended answer
They shouldn't behave this way, but __eq__ and __hash__ for numpy.dtype objects are broken on an essentially unfixable design level. I'll be pulling heavily from njsmith's comments on a dtype-related bug report for this answer.
np.float64 isn't actually a dtype. It's a type, in the ordinary sense of the Python type system. Specifically, if you retrieve a scalar from an array of float64 dtype, np.float64 is the type of the resulting scalar.
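The distinction can be checked directly. This is only an illustrative sketch, and the variable names are made up for the example:

```python
import numpy as np

arr = np.zeros(3, dtype=np.float64)
scalar = arr[0]

# np.float64 is an ordinary Python class ...
is_class = isinstance(np.float64, type)
# ... and it is the class of scalars pulled out of a float64 array.
scalar_is_float64 = type(scalar) is np.float64
# It is not itself an instance of numpy.dtype.
is_dtype_instance = isinstance(np.float64, np.dtype)
# np.dtype(np.float64), by contrast, is a dtype instance.
wrapped_is_dtype = isinstance(np.dtype(np.float64), np.dtype)
```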
np.dtype(np.float64) is a dtype, an instance of numpy.dtype. dtypes are how NumPy records the structure of the contents of a NumPy array. They are particularly important for structured arrays, which can have very complex dtypes. While ordinary Python types could have filled much of the role of dtypes, creating new types on the fly for new structured arrays would be highly awkward, and it would probably have been impossible in the days before type-class unification.
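As a small illustration of a dtype describing structure, here is a sketch of a structured array. The field names "x" and "count" are invented for this example:

```python
import numpy as np

# A structured dtype describing records with two named fields.
# The field names "x" and "count" are made up for this illustration.
point = np.dtype([("x", np.float64), ("count", np.int32)])

records = np.zeros(2, dtype=point)
records["x"] = [1.5, 2.5]
records["count"] = [10, 20]

# The dtype object itself carries the layout: field names and field dtypes.
field_names = point.names
x_dtype = records["x"].dtype
```

The layout lives in one dtype instance, with no new Python class created for the record type.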
numpy.dtype implements __eq__ basically like this:
def __eq__(self, other):
    if isinstance(other, numpy.dtype):
        return regular_comparison(self, other)
    return self == numpy.dtype(other)
This is pretty broken. Among other problems, it's not transitive, it raises TypeError when it should return NotImplemented, and its output is really bizarre at times because of how dtype coercion works:
>>> x = numpy.dtype(numpy.float64)
>>> x == None
True
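The lack of transitivity can be seen with two different spellings of the same dtype. This sketch only uses comparisons that go through the coercion described above:

```python
import numpy as np

d = np.dtype(np.float64)

# d compares equal to both spellings, because each is coerced to a dtype ...
d_eq_type = (d == np.float64)
d_eq_str = (d == "float64")
# ... but the two spellings are not equal to each other: this is a plain
# type-versus-str comparison that never involves numpy.dtype.__eq__.
type_eq_str = (np.float64 == "float64")
```

So np.float64 == d and d == "float64" both hold, while np.float64 == "float64" does not.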
numpy.dtype.__hash__ isn't any better. It makes no attempt to be consistent with the __hash__ methods of all the other types numpy.dtype.__eq__ accepts (and with so many incompatible types to deal with, how could it?). Heck, it shouldn't even exist, because dtype objects are mutable! They aren't merely mutable like modules or file objects, where mutability is harmless because __eq__ and __hash__ work by identity; dtype objects are mutable in ways that will actually change their hash value:
>>> x = numpy.dtype([('f1', float)])
>>> hash(x)
-405377605
>>> x.names = ['f2']
>>> hash(x)
1908240630
When you try to compare d == np.float64, d.__eq__ builds a dtype out of np.float64 and finds that d == np.dtype(np.float64) is True. When you take their hashes, though, np.float64 uses the regular (identity-based) hash for type objects and d uses the hash for dtype objects. Normally, equal objects of different types should have equal hashes, but the dtype implementation doesn't care about that.
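The practical consequence shows up in hash-based containers. A minimal sketch, where the dictionary and its label are invented for the example:

```python
import numpy as np

d = np.dtype(np.float64)
labels = {d: "double precision"}

# Equal according to ==, yet the lookup uses hash(np.float64), which is the
# identity-based hash of the type object, so the key is not found.
equal = (d == np.float64)
found_raw = np.float64 in labels

# Converting to a dtype first makes the lookup succeed.
found_normalized = np.dtype(np.float64) in labels
```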
Unfortunately, it's impossible to fix the problems with dtype __eq__ and __hash__ without breaking APIs people are relying on. People are counting on things like x.dtype == 'float64' or x.dtype == np.float64, and fixing dtypes would break that.