Why do these dtypes compare equal but hash different?

Question

In [30]: import numpy as np

In [31]: d = np.dtype(np.float64)

In [32]: d
Out[32]: dtype('float64')

In [33]: d == np.float64
Out[33]: True

In [34]: hash(np.float64)
Out[34]: -9223372036575774449

In [35]: hash(d)
Out[35]: 880835502155208439

Why do these dtypes compare equal but hash different?

Note that Python does promise this: the documentation for object.__hash__ requires that objects which compare equal have the same hash value.

My workaround for this problem is to call np.dtype on everything, after which hash values and comparisons are consistent.
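A minimal sketch of that workaround, assuming the goal is a dictionary keyed by dtype. The helper name as_dtype and the handlers table are hypothetical, introduced only for illustration:

```python
import numpy as np


def as_dtype(spec):
    """Coerce a scalar type, string, or dtype to a numpy.dtype instance.

    Hypothetical helper: it only wraps np.dtype so every key goes
    through the same normalization step.
    """
    return np.dtype(spec)


# Key the dictionary by normalized dtype objects only.
handlers = {as_dtype(np.float64): "double precision"}

# Each spelling normalizes to the same dtype, so equality and hashing
# agree and every lookup reaches the same entry.
for spec in (np.float64, "float64", np.dtype("float64")):
    print(spec, "->", handlers[as_dtype(spec)])
```

Normalizing at the boundary, when a key is stored and when it is looked up, keeps the inconsistent comparison out of the dictionary logic altogether.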

Recommended answer

They shouldn't behave this way, but __eq__ and __hash__ for numpy.dtype objects are broken on an essentially unfixable design level. I'll be pulling heavily from njsmith's comments on a dtype-related bug report for this answer.

np.float64 isn't actually a dtype. It's a type, in the ordinary sense of the Python type system. Specifically, if you retrieve a scalar from an array of float64 dtype, np.float64 is the type of the resulting scalar.

np.dtype(np.float64) is a dtype, an instance of numpy.dtype. dtypes are how NumPy records the structure of the contents of a NumPy array. They are particularly important for structured arrays, which can have very complex dtypes. While ordinary Python types could have filled much of the role of dtypes, creating new types on the fly for new structured arrays would be highly awkward, and it would probably have been impossible in the days before type-class unification.
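The difference between the scalar type and the dtype instance can be checked directly. This is only an illustrative sketch; the names arr and scalar are not from the original answer:

```python
import numpy as np

arr = np.zeros(3, dtype=np.float64)
scalar = arr[0]

# np.float64 is an ordinary Python class, and array scalars are its instances.
print(isinstance(np.float64, type))
print(type(scalar) is np.float64)

# The class itself is not a dtype instance, but arr.dtype is one.
print(isinstance(np.float64, np.dtype))
print(isinstance(arr.dtype, np.dtype))

# The dtype instance points back to its scalar type through .type.
print(arr.dtype.type is np.float64)
```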

numpy.dtype implements __eq__ basically like this:

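def __eq__(self, other):
    # Two real dtypes: compare their contents directly.
    if isinstance(other, numpy.dtype):
        return regular_comparison(self, other)
    # Anything else is first coerced to a dtype, then compared.
    return self == numpy.dtype(other)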

which is pretty broken. Among other problems, it's not transitive, it raises TypeError when it should return NotImplemented, and its output is really bizarre at times because of how dtype coercion works:

>>> x = numpy.dtype(numpy.float64)
>>> x == None
True
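The lack of transitivity can be seen with one dtype and two things that coerce to it. This is a small sketch rather than an exhaustive list of the odd cases:

```python
import numpy as np

d = np.dtype(np.float64)

# Both operands are coerced to a dtype before comparison, so both match d.
print(d == "float64")
print(d == np.float64)

# The two operands never go through dtype coercion when compared to each
# other, so a str and a class are simply unequal.
print("float64" == np.float64)
```

Both the string and the scalar type equal d, yet they do not equal each other, so no single hash value can be consistent with every comparison that dtype equality accepts unless unrelated types such as str cooperate.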

numpy.dtype.__hash__ isn't any better. It makes no attempt to be consistent with the __hash__ methods of all the other types numpy.dtype.__eq__ accepts (and with so many incompatible types to deal with, how could it?). Heck, it shouldn't even exist, because dtype objects are mutable! Not just mutable like modules or file objects, where it's okay because __eq__ and __hash__ work by identity. dtype objects are mutable in ways that will actually change their hash value:

>>> x = numpy.dtype([('f1', float)])
>>> hash(x)
-405377605
>>> x.names = ['f2']
>>> hash(x)
1908240630

When you try to compare d == np.float64, d.__eq__ builds a dtype out of np.float64 and finds that d == np.dtype(np.float64) is True. When you take their hashes, though, np.float64 uses the regular (identity-based) hash for type objects and d uses the hash for dtype objects. Normally, equal objects of different types should have equal hashes, but the dtype implementation doesn't care about that.
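The practical consequence shows up in hash-based containers. The sketch below uses a hypothetical table dictionary. No expected output is given for the hash comparison or the raw-type lookup, because the hash of a type object depends on its identity and is not a fixed value:

```python
import numpy as np

d = np.dtype(np.float64)
table = {d: "found"}

# Equality coerces np.float64 to a dtype, so the comparison succeeds.
print(d == np.float64)

# The two hashes are computed by unrelated functions, so there is no
# reason for them to agree.
print(hash(d) == hash(np.float64))

# A dict checks hashes before it checks equality, so looking up the raw
# scalar type will normally miss even though it compares equal to the key.
print(table.get(np.float64))

# Looking up a dtype built from the scalar type hits the entry.
print(table.get(np.dtype(np.float64)))
```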

Unfortunately, it's impossible to fix the problems with dtype __eq__ and __hash__ without breaking APIs people are relying on. People are counting on things like x.dtype == 'float64' or x.dtype == np.float64, and fixing dtypes would break that.
