

In [30]: import numpy as np

In [31]: d = np.dtype(np.float64)

In [32]: d
Out[32]: dtype('float64')

In [33]: d == np.float64
Out[33]: True

In [34]: hash(np.float64)
Out[34]: -9223372036575774449

In [35]: hash(d)
Out[35]: 880835502155208439


Why do these dtypes compare equal but hash differently?


Note that Python's data model does require that objects which compare equal have the same hash value.


My workaround for this problem is to call np.dtype on everything, after which hash values and comparisons are consistent.
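A minimal sketch of that workaround, assuming the dtypes end up as dictionary keys. The helper name `normalize` and the `sizes` table are hypothetical and exist only for illustration:

```python
import numpy as np

def normalize(key):
    """Coerce a dtype-like object (type, string, or dtype) to a real np.dtype."""
    return np.dtype(key)

# Hypothetical lookup table keyed by normalized dtypes.
sizes = {normalize(np.float64): 8, normalize(np.int32): 4}

# Different spellings of the same dtype now reach the same entry.
for spelling in (np.float64, "float64", np.dtype("float64")):
    print(repr(spelling), "->", sizes[normalize(spelling)])
```

The point of the design is that every key passes through np.dtype exactly once, on both insertion and lookup, so only dtype-to-dtype comparisons and hashes are ever involved.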



They shouldn't behave this way, but __eq__ and __hash__ for numpy.dtype objects are broken at an essentially unfixable design level. I'll be pulling heavily from njsmith's comments on a dtype-related bug report for this answer.

np.float64 isn't actually a dtype. It's a type, in the ordinary sense of the Python type system. Specifically, if you retrieve a scalar from an array of float64 dtype, np.float64 is the type of the resulting scalar.
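The distinction can be checked directly. This is only a small sketch; the variable names `arr` and `scalar` are illustrative:

```python
import numpy as np

arr = np.zeros(3, dtype=np.float64)
scalar = arr[0]

# np.float64 is the class of the scalars pulled out of the array.
print(type(scalar) is np.float64)

# It is an ordinary Python class, not an instance of numpy.dtype.
print(isinstance(np.float64, type))
print(isinstance(np.float64, np.dtype))

# The array's dtype attribute, by contrast, is a numpy.dtype instance.
print(isinstance(arr.dtype, np.dtype))
```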

np.dtype(np.float64) is a dtype, an instance of numpy.dtype. dtypes are how NumPy records the structure of the contents of a NumPy array. They are particularly important for structured arrays, which can have very complex dtypes. While ordinary Python types could have filled much of the role of dtypes, creating new types on the fly for new structured arrays would be highly awkward, and it would probably have been impossible in the days before type/class unification (Python 2.2).
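As a rough illustration of what a structured dtype records, here is a sketch with a made-up record layout; the field names `id` and `weight` are hypothetical:

```python
import numpy as np

# Hypothetical record: a 4-byte integer followed by an 8-byte float.
point = np.dtype([("id", np.int32), ("weight", np.float64)])
arr = np.zeros(2, dtype=point)

print(point.names)          # the field names stored in the dtype
print(point.itemsize)       # bytes per array element (packed layout)
print(arr["weight"].dtype)  # each field has its own dtype
```

No new Python class is created for this layout; the whole description lives in one numpy.dtype instance.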


numpy.dtype implements __eq__ basically like this:

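def __eq__(self, other):
    # Two real dtypes: compare them structurally.
    if isinstance(other, numpy.dtype):
        return regular_comparison(self, other)
    # Anything else is first coerced to a dtype, then compared.
    return self == numpy.dtype(other)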


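which is pretty broken. Among other problems, it's not transitive, it raises TypeError when it should return NotImplemented, and its output is really bizarre at times because of how dtype coercion works. For example, numpy.dtype(None) is the default float64 dtype, so on older NumPy releases this comparison comes out True: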

>>> x = numpy.dtype(numpy.float64)
>>> x == None
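
The non-transitivity mentioned above can be seen with a string and a type that both coerce to the same dtype. This is a small sketch, not a statement about every NumPy version's coercion rules:

```python
import numpy as np

d = np.dtype(np.float64)

# Both right-hand sides are coerced to a dtype by dtype.__eq__ ...
print(d == "float64")
print(d == np.float64)

# ... but str and type know nothing about dtypes, so the two things
# that are each "equal" to d are not equal to one another.
print("float64" == np.float64)
```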

numpy.dtype.__hash__ isn't any better. It makes no attempt to be consistent with the __hash__ methods of all the other types numpy.dtype.__eq__ accepts (and with so many incompatible types to deal with, how could it?). Heck, it shouldn't even exist, because dtype objects are mutable! Not just mutable like modules or file objects, where it's okay because __eq__ and __hash__ work by identity. dtype objects are mutable in ways that will actually change their hash value:

>>> x = numpy.dtype([('f1', float)])
>>> hash(x)
>>> x.names = ['f2']
>>> hash(x)

When you try to compare d == np.float64, d.__eq__ builds a dtype out of np.float64 and finds that d == np.dtype(np.float64) is True. When you take their hashes, though, np.float64 uses the regular (identity-based) hash for type objects and d uses the hash for dtype objects. Normally, equal objects of different types should have equal hashes, but the dtype implementation doesn't care about that.
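
The practical consequence shows up in hash-based containers. The following is a sketch with an illustrative set named `seen`; the exact hash values differ between runs and platforms:

```python
import numpy as np

d = np.dtype(np.float64)
seen = {d}

# Equal according to dtype.__eq__ ...
print(d == np.float64)

# ... but hashed by two unrelated implementations.
print(hash(d) == hash(np.float64))

# Set and dict lookups check the hash first, so the type only finds
# the dtype if the two hashes happen to coincide.
print(np.float64 in seen)

# Converting to a dtype before the lookup avoids the mismatch.
print(np.dtype(np.float64) in seen)
```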

Unfortunately, it's impossible to fix the problems with dtype __eq__ and __hash__ without breaking APIs people are relying on. People are counting on things like x.dtype == 'float64' or x.dtype == np.float64, and fixing dtypes would break that.


