I've created a subclass of numpy ndarray following the numpy documentation. In particular, I have added a custom attribute by modifying the code provided.
I'm manipulating instances of this class within a parallel loop, using Python multiprocessing. As I understand it, the way that the scope is essentially 'copied' to the multiple worker processes is using pickle.
The problem I am now coming up against relates to the way that numpy arrays are pickled. I can't find any comprehensive documentation about this, but some discussions between the dill developers suggest that I should be focusing on the __reduce__ method, which is being called upon pickling.
Can anyone shed any more light on this? The minimal working example is really just the numpy example code I linked to above, copied here for completeness:
import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)
Now here is the problem:
import pickle
obj = RealisticInfoArray([1, 2, 3], info='foo')
print obj.info # 'foo'
pickle_str = pickle.dumps(obj)
new_obj = pickle.loads(pickle_str)
print new_obj.info # raises AttributeError
Thanks.
np.ndarray uses __reduce__ to pickle itself. We can take a look at what it actually returns when you call that function to get an idea of what's going on:
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> obj.__reduce__()
(<built-in function _reconstruct>, (<class 'pick.RealisticInfoArray'>, (0,), 'b'), (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'))
So, we get a 3-tuple back. The docs for __reduce__ describe what each element is doing:

    When a tuple is returned, it must be between two and five elements long. Optional elements can either be omitted, or None can be provided as their value. The contents of this tuple are pickled as normal and used to reconstruct the object at unpickling time. The semantics of each element are:

    A callable object that will be called to create the initial version of the object. The next element of the tuple will provide arguments for this callable, and later elements provide additional state information that will subsequently be used to fully reconstruct the pickled data.

    In the unpickling environment this object must be either a class, a callable registered as a "safe constructor" (see below), or it must have an attribute __safe_for_unpickling__ with a true value. Otherwise, an UnpicklingError will be raised in the unpickling environment. Note that as usual, the callable itself is pickled by name.

    A tuple of arguments for the callable object.

    Optionally, the object's state, which will be passed to the object's __setstate__() method as described in the section Pickling and unpickling normal class instances. If the object has no __setstate__() method, then, as above, the value must be a dictionary and it will be added to the object's __dict__.
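To make that protocol concrete before applying it to ndarray, here is a minimal self-contained sketch on a hypothetical toy class (Python 3 syntax; the Point class and its attributes are illustrative, not part of the question's code):

```python
import pickle

class Point:
    """Toy class that customizes pickling via __reduce__/__setstate__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.label = None  # extra state that __init__ does not receive

    def __reduce__(self):
        # (callable, args for the callable, state handed to __setstate__)
        return (Point, (self.x, self.y), {'label': self.label})

    def __setstate__(self, state):
        # Called after Point(x, y) has been rebuilt at unpickling time
        self.label = state['label']

p = Point(1, 2)
p.label = 'origin-ish'
q = pickle.loads(pickle.dumps(p))
print(q.x, q.y, q.label)  # → 1 2 origin-ish
```

Unpickling first calls the callable with the argument tuple to rebuild the object, then hands the third element to __setstate__, which is exactly the hook exploited below.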
So, _reconstruct is the function called to rebuild the object, (<class 'pick.RealisticInfoArray'>, (0,), 'b') are the arguments passed to that function, and (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00') gets passed to the class's __setstate__. This gives us an opportunity: we can override __reduce__ and provide our own tuple to __setstate__, and then additionally override __setstate__ to set our custom attribute when we unpickle. We just need to make sure we preserve all the data the parent class needs, and call the parent's __setstate__, too:
class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Get the parent's __reduce__ tuple
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # Create our own tuple to pass to __setstate__
        new_state = pickled_state[2] + (self.info,)
        # Return a tuple that replaces the parent's __setstate__ tuple with our own
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.info = state[-1]  # Set the info attribute
        # Call the parent's __setstate__ with the other tuple elements.
        super(RealisticInfoArray, self).__setstate__(state[0:-1])
Usage:
>>> obj = pick.RealisticInfoArray([1, 2, 3], info='foo')
>>> pickle_str = pickle.dumps(obj)
>>> pickle_str
"cnumpy.core.multiarray\n_reconstruct\np0\n(cpick\nRealisticInfoArray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'i8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\nS'foo'\np14\ntp15\nb."
>>> new_obj = pickle.loads(pickle_str)
>>> new_obj.info
'foo'
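The snippets above are in Python 2 syntax. As a sanity check, here is the same fix rendered in Python 3 (zero-argument super(), print as a function), round-tripped through every pickle protocol; this is a sketch of the answer's approach and requires numpy to be installed:

```python
import pickle
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Append our attribute to the state tuple the parent already builds
        pickled_state = super().__reduce__()
        new_state = pickled_state[2] + (self.info,)
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.info = state[-1]           # restore our attribute
        super().__setstate__(state[:-1])  # let ndarray restore the rest

obj = RealisticInfoArray([1, 2, 3], info='foo')
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    new_obj = pickle.loads(pickle.dumps(obj, protocol=proto))
    assert new_obj.info == 'foo'
    assert (new_obj == obj).all()
print('info survives all pickle protocols')
```

Since multiprocessing serializes task arguments and results with pickle, an array subclass that round-trips cleanly like this will also carry its attribute into and out of worker processes.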