How to overload __eq__ to compare pandas DataFrames and Series?

Problem description

For clarity, I will extract an excerpt from my code and use generic names. I have a class Foo that stores a DataFrame in an attribute.

import pandas as pd
import pandas.testing as pdt                               # pandas.util.testing is deprecated

class Foo():

    def __init__(self, bar):
        self.bar = bar                                     # dict of dicts
        self.df = pd.DataFrame(bar)                        # pandas object

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

However, when I try to compare two instances of Foo, I get an exception related to the ambiguity of comparing two DataFrames (the comparison should work fine without the 'df' key in Foo.__dict__).

d1 = {'A' : pd.Series([1, 2], index=['a', 'b']),
      'B' : pd.Series([1, 2], index=['a', 'b'])}
d2 = d1.copy()

foo1 = Foo(d1)
foo2 = Foo(d2)

foo1.bar                                                   # dict
foo1.df                                                    # pandas DataFrame

foo1 == foo2                                               # ValueError

[Out] ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
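The error comes from how dict equality works: `self.__dict__ == other.__dict__` compares each pair of values with `==` and then needs a single truth value, but `==` between two DataFrames is element-wise and returns another DataFrame. The sketch below reproduces this in isolation (`raises_value_error` is a hypothetical helper). It also shows why the 'bar' entry does not trigger the error in the question: `d1.copy()` is a shallow copy, so both `bar` dicts hold the very same Series objects, and built-in containers treat identical objects as equal without calling their `__eq__`.

```python
import pandas as pd

# Two distinct DataFrame objects holding identical data.
df1 = pd.DataFrame({"A": [1, 2], "B": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"A": [1, 2], "B": [1, 2]}, index=["a", "b"])


def raises_value_error(func):
    """Hypothetical helper: return True if calling func() raises ValueError."""
    try:
        func()
    except ValueError:
        return True
    return False


# `==` on DataFrames is element-wise and returns a DataFrame of booleans.
elementwise = df1 == df2
print(type(elementwise).__name__)                                # DataFrame

# Reducing that result to a single bool is what fails.
print(raises_value_error(lambda: bool(elementwise)))             # True

# Dict equality needs a truth value for each value comparison, so two dicts
# holding *different* DataFrame objects raise the same error...
print(raises_value_error(lambda: {"df": df1} == {"df": df2}))    # True

# ...while the *same* object compares equal, because built-in containers
# check identity first and never call DataFrame.__eq__.
print({"df": df1} == {"df": df1})                                # True
```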

Fortunately, pandas has utility functions for asserting whether two DataFrames or Series are equal. I'd like to use their comparison logic if possible.

pdt.assert_frame_equal(pd.DataFrame(d1), pd.DataFrame(d2)) # does not raise
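Because `assert_frame_equal` signals a mismatch by raising `AssertionError` rather than returning a value, one way to reuse it inside `__eq__` is to wrap it in a small predicate. The sketch below assumes that behaviour; `frames_equal` is a hypothetical name, and the same pattern would work for Series with `pdt.assert_series_equal`.

```python
import pandas as pd
import pandas.testing as pdt


def frames_equal(left, right, **kwargs):
    """Hypothetical helper: True if assert_frame_equal passes, else False.

    Keyword arguments (e.g. check_dtype=False) are passed through unchanged.
    """
    try:
        pdt.assert_frame_equal(left, right, **kwargs)
    except AssertionError:
        return False
    return True


d1 = {'A': pd.Series([1, 2], index=['a', 'b']),
      'B': pd.Series([1, 2], index=['a', 'b'])}

print(frames_equal(pd.DataFrame(d1), pd.DataFrame(d1.copy())))   # True
print(frames_equal(pd.DataFrame(d1), pd.DataFrame(d1) * 2))      # False
```

Forwarding `**kwargs` keeps the tolerance options of `assert_frame_equal` available, which is the main reason to prefer it over a stricter built-in check.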

There are a few options to resolve the comparison of two Foo instances:

compare a copy of __dict__ (new_dict) that lacks the df key
delete the df key from __dict__ (not ideal)
don't compare __dict__ as a whole, only selected parts of it collected in a tuple
overload __eq__ so that it handles pandas DataFrame comparisons
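A minimal sketch of the first option, using a hypothetical `FooWithoutDf` class, shows a caveat: skipping 'df' only helps while the values in `bar` are shared objects. `bar` itself holds Series, and comparing two *distinct* Series with `==` is just as ambiguous. The same limitation applies to the tuple-based option, since tuple comparison also falls back to `==` on each element.

```python
import pandas as pd


class FooWithoutDf:
    """Hypothetical variant of Foo implementing option 1: skip the 'df' key."""

    def __init__(self, bar):
        self.bar = bar
        self.df = pd.DataFrame(bar)

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        # Compare copies of both attribute dicts with the DataFrame left out.
        mine = {k: v for k, v in self.__dict__.items() if k != "df"}
        theirs = {k: v for k, v in other.__dict__.items() if k != "df"}
        return mine == theirs


d1 = {'A': pd.Series([1, 2], index=['a', 'b']),
      'B': pd.Series([1, 2], index=['a', 'b'])}

# Works here only because d1.copy() is shallow: both dicts share the same Series.
print(FooWithoutDf(d1) == FooWithoutDf(d1.copy()))               # True

# With independent Series objects the comparison is ambiguous again.
d3 = {k: v.copy() for k, v in d1.items()}
try:
    FooWithoutDf(d1) == FooWithoutDf(d3)
except ValueError as exc:
    print("ValueError:", exc)
```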

The last option seems the most robust in the long run, but I am not sure of the best approach. In the end, I would like to refactor __eq__ to compare all items in Foo.__dict__, including DataFrames (and Series). Any ideas on how to accomplish this?
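One possible shape for the last option is sketched below. It assumes that the semantics of `DataFrame.equals` / `Series.equals` are acceptable: same shape, same elements and same dtypes, with NaNs in matching positions counted as equal. `values_equal` is a hypothetical helper that walks dicts, lists and tuples so that Series nested inside `bar` are handled as well. If the looser checks of `pandas.testing.assert_frame_equal` (for example `check_dtype=False`) are preferred, the pandas branch could instead call it inside a `try`/`except AssertionError`. `__ne__` is omitted because Python 3 derives it from `__eq__` automatically.

```python
import pandas as pd

PANDAS_TYPES = (pd.DataFrame, pd.Series)


def values_equal(a, b):
    """Hypothetical helper: equality that tolerates pandas objects in containers."""
    # pandas objects: .equals returns one bool instead of an element-wise result.
    if isinstance(a, PANDAS_TYPES) or isinstance(b, PANDAS_TYPES):
        return type(a) is type(b) and a.equals(b)
    # dicts: same keys, and every pair of values equal (recursively).
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(values_equal(a[k], b[k]) for k in a)
    # lists/tuples: same type and length, element-wise equal (recursively).
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return (type(a) is type(b) and len(a) == len(b)
                and all(values_equal(x, y) for x, y in zip(a, b)))
    # Anything else: fall back to the ordinary == operator.
    return bool(a == b)


class Foo:
    def __init__(self, bar):
        self.bar = bar                   # dict of column data (here: Series)
        self.df = pd.DataFrame(bar)      # pandas object built from bar

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        # Compare every attribute, including DataFrames and nested Series.
        return values_equal(self.__dict__, other.__dict__)


d1 = {'A': pd.Series([1, 2], index=['a', 'b']),
      'B': pd.Series([1, 2], index=['a', 'b'])}
d3 = {k: v.copy() for k, v in d1.items()}        # independent Series objects
d4 = {'A': pd.Series([1, 3], index=['a', 'b']),  # one value differs
      'B': pd.Series([1, 2], index=['a', 'b'])}

print(Foo(d1) == Foo(d1.copy()))   # True
print(Foo(d1) == Foo(d3))          # True
print(Foo(d1) == Foo(d4))          # False
```

Dispatching on type inside one helper keeps `__eq__` itself generic, so new attributes are compared automatically as long as they are plain values, containers, or pandas objects.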

Recommended answer