Problem description
I am trying to use composition with pandas.DataFrame in the following way, but I get an error when I try to copy the object.
import copy

import numpy as np
import pandas as pd


class Foo(object):
    """
    Foo is composed mostly of a pd.DataFrame, and behaves like it too.
    """

    def __init__(self, df, attr_custom):
        self._ = df
        self.attr_custom = attr_custom

    # The following code allows Foo objects to behave like pd.DataFrame,
    # and I want to keep this behavior.
    def __getattr__(self, attr):
        return getattr(self._, attr)


df = pd.DataFrame(np.random.randint(0, 2, (3, 2)), columns=['A', 'B'])
foo = Foo(df, 'custom')  # placeholder value for the required attr_custom
foo_cp = copy.deepcopy(foo)
The error I get:
---> 16 foo_cp = copy.deepcopy(foo)

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.pyc in deepcopy(x, memo, _nil)
    188 raise Error(
    189 "un(deep)copyable object of type %s" % cls)
--> 190 y = _reconstruct(x, rv, 1, memo)
    191
    192 memo[d] = y

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.pyc in _reconstruct(x, info, deep, memo)
    341 slotstate = None
    342 if state is not None:
--> 343 y.__dict__.update(state)
    344 if slotstate is not None:
    345 for key, value in slotstate.iteritems():

TypeError: 'BlockManager' object is not iterable
My questions:
- Any idea what is going on here?
- What is the "recommended" way of using composition with pandas.DataFrame?
- If for some reason it is a bad idea to use _ as the name of the dummy attribute, please let me know.
Recommended answer
The standard way to do this is to subclass pd.DataFrame and define a _constructor property:
class Foo(pd.DataFrame):
    @property
    def _constructor(self):
        return Foo
Then most DataFrame methods should work and return a Foo.
In [11]: df = pd.DataFrame([[1, 2], [3, 4]])

In [12]: foo = Foo(df)

In [13]: foo.copy()
Out[13]:
   0  1
0  1  2
1  3  4

In [14]: type(foo.copy())
Out[14]: __main__.Foo
This includes copy.deepcopy:
In [15]: copy.deepcopy(foo)
Out[15]:
   0  1
0  1  2
1  3  4

In [16]: type(copy.deepcopy(foo))
Out[16]: __main__.Foo
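The subclass above drops the attr_custom attribute from the question. A minimal sketch of one way to carry it along, assuming the _metadata class attribute described in the pandas subclassing documentation: names listed there are propagated to the results of most methods. The value 'bar' is only an illustration, and as far as I can tell the attribute is carried over by reference rather than deep-copied.

```python
import copy

import pandas as pd


class Foo(pd.DataFrame):
    # Attribute names listed here are propagated by pandas to new objects
    # produced from this one (for example by copy()).
    _metadata = ["attr_custom"]

    @property
    def _constructor(self):
        return Foo


foo = Foo([[1, 2], [3, 4]])
foo.attr_custom = "bar"  # illustrative value, set after construction

foo_cp = copy.deepcopy(foo)
print(type(foo_cp).__name__, foo_cp.attr_custom)
```

Setting the attribute after construction avoids overriding __init__, which pandas itself calls internally with its own arguments.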
Aside: I wouldn't use _ as a variable/method name; it's not descriptive at all. You can prefix a name with _ to show that it should be considered "private", but give it a descriptive name, e.g. _df.
_ is often used in Python to mean "discard this variable", so you might write:
sum(1 for _ in x)  # this is basically the same as len!
Although it would be perfectly valid Python to read the value of _, e.g.:
sum(_ ** 2 for _ in x)
this would generally be frowned upon (use i or something similar instead).
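To make the two uses concrete, here is a small sketch with an illustrative list x (the values are made up). It counts items with a throwaway _, including on a generator where len() is not available, and then uses a named loop variable where the value is actually read.

```python
x = [3, 1, 4, 1, 5]  # illustrative data

# "_" as a throwaway: the loop value is never read, only counted.
count = sum(1 for _ in x)

# The same idiom works on a generator, which has no len().
even_count = sum(1 for _ in (n for n in x if n % 2 == 0))

# When the value is read, give it a real name instead of "_".
sum_of_squares = sum(i ** 2 for i in x)

print(count, even_count, sum_of_squares)
```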
In IPython, _ holds the previously returned value.
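If you would rather keep composition, as in the question, here is a minimal sketch. My reading of the traceback is that __getattr__ forwards every missing attribute to the DataFrame, including the hooks that the copy module probes for (such as __getstate__), so the copy gets rebuilt from the DataFrame's internal state instead of Foo's own. The sketch assumes it is acceptable to stop forwarding any name that starts with an underscore; the _df attribute name and the 'bar' value are illustrative.

```python
import copy

import numpy as np
import pandas as pd


class Foo:
    """Wraps a DataFrame and forwards public attribute lookups to it."""

    def __init__(self, df, attr_custom):
        self._df = df
        self.attr_custom = attr_custom

    def __getattr__(self, attr):
        # Only called when normal lookup fails. Do not forward private or
        # dunder names: copy/pickle probe for hooks like __getstate__ and
        # __deepcopy__, and _df itself is missing while a copy is rebuilt.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return getattr(self._df, attr)


df = pd.DataFrame(np.random.randint(0, 2, (3, 2)), columns=["A", "B"])
foo = Foo(df, attr_custom="bar")
foo_cp = copy.deepcopy(foo)

print(type(foo_cp).__name__, foo_cp.shape, foo_cp.attr_custom)
```

With the guard in place the default deepcopy machinery copies Foo's own __dict__, so both the wrapped DataFrame and attr_custom are duplicated. Defining an explicit __deepcopy__ would be an alternative if more control is needed.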