This Q&A covers an error raised when copying a composite object that is built mainly around a pandas.DataFrame.

Problem description

I am trying to use composition with pandas.DataFrame in the following way, but I get an error when I try to copy the object.

import numpy as np
import pandas as pd
import copy


class Foo(object):
    """
    Foo is composed mostly of a pd.DataFrame, and behaves like it too.
    """

    def __init__(self, df, attr_custom):
        self._ = df
        self.attr_custom = attr_custom

    # The following method lets Foo objects behave like a pd.DataFrame,
    # and I want to keep this behavior.
    def __getattr__(self, attr):
        return getattr(self._, attr)


df = pd.DataFrame(np.random.randint(0,2,(3,2)), columns=['A','B'])
foo = Foo(df, 'custom')  # 'custom' is a placeholder value for attr_custom
foo_cp = copy.deepcopy(foo)

The error I get:

---> 16 foo_cp = copy.deepcopy(foo)

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.pyc in deepcopy(x, memo, _nil)
    188                             raise Error(
    189                                 "un(deep)copyable object of type %s" % cls)
--> 190                 y = _reconstruct(x, rv, 1, memo)
    191
    192     memo[d] = y

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/copy.pyc in _reconstruct(x, info, deep, memo)
    341                 slotstate = None
    342             if state is not None:
--> 343                 y.__dict__.update(state)
    344             if slotstate is not None:
    345                 for key, value in slotstate.iteritems():

TypeError: 'BlockManager' object is not iterable

My questions:

  1. Any idea what is going on here?
  2. What is the "recommended" way of using composition with pandas.DataFrame?
  3. If for some reason it is a bad idea to use _ as the name of the dummy attribute, please let me know.

Recommended answer

The standard way to do this is to subclass pd.DataFrame and define a _constructor property:

class Foo(pd.DataFrame):
    @property
    def _constructor(self):
        return Foo

Then most DataFrame methods should work and return a Foo.

In [11]: df = pd.DataFrame([[1, 2], [3, 4]])

In [12]: foo = Foo(df)

In [13]: foo.copy()
Out[13]:
   0  1
0  1  2
1  3  4

In [14]: type(foo.copy())
Out[14]: __main__.Foo

This includes copy.deepcopy:

In [15]: copy.deepcopy(foo)
Out[15]:
   0  1
0  1  2
1  3  4

In [16]: type(copy.deepcopy(foo))
Out[16]: __main__.Foo
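
The subclass above no longer carries the attr_custom attribute from the question. One way to keep such an attribute across copies is to list its name in the class attribute _metadata, which pandas documents as a hook for subclasses. The sketch below assumes that hook behaves the same way in your pandas version; the value "demo" is a placeholder, not something from the original post.

```python
import copy

import pandas as pd


class Foo(pd.DataFrame):
    # Attribute names listed here are carried over to results
    # (including copies) by pandas' __finalize__ step.
    _metadata = ["attr_custom"]

    @property
    def _constructor(self):
        return Foo


foo = Foo(pd.DataFrame([[1, 2], [3, 4]]))
foo.attr_custom = "demo"  # placeholder value, not from the original post

foo_cp = copy.deepcopy(foo)
print(type(foo_cp).__name__, foo_cp.attr_custom)
```

Setting the attribute after construction, rather than in a custom __init__, keeps the constructor signature compatible with the calls pandas makes internally through _constructor.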


Aside: I wouldn't use _ as a variable or method name; it is not descriptive at all. You can prefix a name with _ to show that it should be considered "private", but give it a descriptive name, e.g. _df.

_ is often used in Python to mean "discard this variable", so you might write:

sum(1 for _ in x)  # this is basically the same as len!

Although it would be perfectly valid Python to use the value of _, e.g.:

sum(_ ** 2 for _ in x)

This would generally be frowned upon (use i or something similar instead).

In IPython, _ refers to the previously returned value.
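
Returning to composition: if wrapping the DataFrame is preferred over subclassing it, a minimal sketch might look like the following. A plausible reading of the traceback is that the catch-all __getattr__ forwards the copy protocol's special-method lookups (such as __getstate__) to the DataFrame, so the copy machinery receives the frame's internal state instead of Foo's. The guard on names starting with an underscore and the explicit __deepcopy__ are assumptions added for illustration and are not part of the answer above; "demo" is a placeholder value.

```python
import copy

import numpy as np
import pandas as pd


class Foo(object):
    """Wraps a DataFrame and forwards unknown public attributes to it."""

    def __init__(self, df, attr_custom):
        self._df = df
        self.attr_custom = attr_custom

    def __getattr__(self, attr):
        # Only reached when normal lookup fails. Private and special names
        # are not forwarded, so copy/pickle protocol probes stay on Foo and
        # a half-built instance (without _df) cannot recurse endlessly.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return getattr(self._df, attr)

    def __deepcopy__(self, memo):
        # Copy the wrapped frame and the custom attribute explicitly.
        return Foo(
            copy.deepcopy(self._df, memo),
            copy.deepcopy(self.attr_custom, memo),
        )


df = pd.DataFrame(np.random.randint(0, 2, (3, 2)), columns=["A", "B"])
foo = Foo(df, attr_custom="demo")  # placeholder value
foo_cp = copy.deepcopy(foo)
print(type(foo_cp).__name__, foo_cp.shape)
```

The trade-off is that methods returning new frames (for example foo.head()) still return plain DataFrames here, which is the gap the _constructor approach is meant to close.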
