How to overload __eq__ to compare pandas DataFrames and Series?

Problem description

For clarity, I will take an excerpt from my code and use generic names. I have a class Foo() that stores a DataFrame in an attribute.

import pandas as pd
import pandas.testing as pdt                               # pandas.util.testing is deprecated

class Foo():

    def __init__(self, bar):
        self.bar = bar                                     # dict of dicts
        self.df = pd.DataFrame(bar)                        # pandas object

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

However, when I try to compare two instances of Foo, I get an exception caused by the ambiguity of comparing two DataFrames (the comparison should work fine without the 'df' key in Foo.__dict__).

d1 = {'A' : pd.Series([1, 2], index=['a', 'b']),
      'B' : pd.Series([1, 2], index=['a', 'b'])}
d2 = d1.copy()

foo1 = Foo(d1)
foo2 = Foo(d2)

foo1.bar                                                   # dict
foo1.df                                                    # pandas DataFrame

foo1 == foo2                                               # ValueError

[Out] ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
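
To see where the error comes from, here is a minimal sketch (the names `left`, `right` and `elementwise` are placeholders, not part of the original code). Using `==` on two DataFrames returns an element-wise DataFrame of booleans rather than a single bool, and the dict comparison then asks for the truth value of that result, which pandas refuses to give.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2]})
right = pd.DataFrame({'A': [1, 2]})

# Element-wise comparison: the result is a DataFrame of booleans, not one bool
elementwise = left == right
print(type(elementwise).__name__)

# Comparing two __dict__ objects needs the truth value of that result
try:
    bool(elementwise)
    raised = False
except ValueError:
    raised = True
print(raised)

# Collapsing the booleans explicitly is one way to get a single answer
collapsed = bool(elementwise.all().all())
print(collapsed)
```

This is why the comparison only fails once `__dict__` contains two distinct DataFrame objects.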

Fortunately, pandas has utility functions for asserting that two DataFrames or Series are equal. I'd like to use this function's comparison logic if possible.

pdt.assert_frame_equal(pd.DataFrame(d1), pd.DataFrame(d2)) # does not raise
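
As a small illustration of how these helpers behave (the frames below are made up for the example), `assert_frame_equal` returns nothing on success and raises `AssertionError` on a mismatch, so it has to be wrapped before it can serve as a yes/no comparison. `DataFrame.equals` is shown alongside as a possible alternative that returns a bool directly, though it is stricter about dtypes and offers no tolerance options.

```python
import pandas as pd
import pandas.testing as pdt

base = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
same = base.copy()
different = pd.DataFrame({'A': [1, 3]}, index=['a', 'b'])

# Passes silently when the frames match
pdt.assert_frame_equal(base, same)

# Raises AssertionError when they do not
try:
    pdt.assert_frame_equal(base, different)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

# DataFrame.equals gives a plain bool without raising
same_by_equals = base.equals(same)
different_by_equals = base.equals(different)
print(mismatch_detected, same_by_equals, different_by_equals)
```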

There are a few options to resolve the comparison of two Foo instances:

  1. compare a copy of __dict__, where new_dict lacks the df key
  2. delete the df key from __dict__ (not ideal)
  3. don't compare __dict__, but only parts of it contained in a tuple
  4. overload the __eq__ to facilitate pandas DataFrame comparisons
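
For comparison, the first option might look roughly like the sketch below. The helper `_plain_state` is a hypothetical name introduced here, and the example assumes that every attribute other than `df` holds plain Python values (a dict of dicts for `bar`); it would still fail if `bar` contained distinct Series objects.

```python
import pandas as pd


class Foo:

    def __init__(self, bar):
        self.bar = bar                  # dict of dicts
        self.df = pd.DataFrame(bar)     # pandas object

    def _plain_state(self):
        # Hypothetical helper: a copy of __dict__ without the 'df' key
        return {k: v for k, v in self.__dict__.items() if k != 'df'}

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        # Plain attributes compare as usual; the DataFrame is handled separately
        return (self._plain_state() == other._plain_state()
                and self.df.equals(other.df))


foo1 = Foo({'A': {'a': 1, 'b': 2}, 'B': {'a': 1, 'b': 2}})
foo2 = Foo({'A': {'a': 1, 'b': 2}, 'B': {'a': 1, 'b': 2}})
foo3 = Foo({'A': {'a': 1, 'b': 2}, 'B': {'a': 1, 'b': 99}})
print(foo1 == foo2, foo1 == foo3)
```

The drawback is that the key name `'df'` is hard-coded, so every new pandas attribute needs its own special case.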

The last option seems the most robust in the long-run, but I am not sure of the best approach. In the end, I would like to refactor __eq__ to compare all items from Foo.__dict__, including DataFrames (and Series). Any ideas on how to accomplish this?

Recommended answer

The solution from these threads:

Comparing two pandas dataframes for differences

Pandas DataFrames with NaNs equality comparison

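from pandas.testing import assert_frame_equal

def df_equal(csvdata, csvdata_old):
    # assert_frame_equal raises AssertionError on a mismatch; turn that into a bool
    try:
        assert_frame_equal(csvdata, csvdata_old)
        return True
    except AssertionError:
        return False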

For a dict of dataframes:

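from pandas.testing import assert_frame_equal

def df_equal(df1, df2):
    # True if the frames match, False if assert_frame_equal reports a difference
    try:
        assert_frame_equal(df1, df2)
        return True
    except AssertionError:
        return False

# Method of the class; here self.df is a dict mapping keys to DataFrames
def __eq__(self, other):
    if not isinstance(other, self.__class__):
        return NotImplemented
    if self.df.keys() != other.df.keys():
        return False
    for k in self.df.keys():
        if not df_equal(self.df[k], other.df[k]):
            return False
    return True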
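
To cover the case asked about in the question, where every item in `Foo.__dict__` is compared and DataFrames or Series may appear at any level, one possible sketch is shown below. The helper `values_equal` and the `PANDAS_CHECKS` table are hypothetical names introduced for illustration, not pandas API. The sketch assumes that attributes are DataFrames, Series, dicts or plain Python values; other array-like types such as NumPy arrays would need their own branch.

```python
import pandas as pd
from pandas.testing import assert_frame_equal, assert_series_equal

# Hypothetical lookup table: pandas type -> matching assertion helper
PANDAS_CHECKS = {pd.DataFrame: assert_frame_equal,
                 pd.Series: assert_series_equal}


def values_equal(left, right):
    """Hypothetical helper: compare two values, pandas objects included."""
    for pandas_type, check in PANDAS_CHECKS.items():
        if isinstance(left, pandas_type) or isinstance(right, pandas_type):
            # A pandas object never equals a value of another type
            if not (isinstance(left, pandas_type)
                    and isinstance(right, pandas_type)):
                return False
            try:
                check(left, right)
                return True
            except AssertionError:
                return False
    if isinstance(left, dict) and isinstance(right, dict):
        # Recurse so that dicts holding Series or DataFrames also work
        return (left.keys() == right.keys()
                and all(values_equal(left[k], right[k]) for k in left))
    return left == right


class Foo:

    def __init__(self, bar):
        self.bar = bar
        self.df = pd.DataFrame(bar)

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return values_equal(self.__dict__, other.__dict__)


d1 = {'A': pd.Series([1, 2], index=['a', 'b']),
      'B': pd.Series([1, 2], index=['a', 'b'])}
d2 = {name: series.copy() for name, series in d1.items()}
d3 = {'A': pd.Series([1, 2], index=['a', 'b']),
      'B': pd.Series([1, 99], index=['a', 'b'])}

print(Foo(d1) == Foo(d2))
print(Foo(d1) == Foo(d3))
```

In Python 3 a separate `__ne__` is not required, because the default implementation inverts the result of `__eq__`. Note that defining `__eq__` without `__hash__` makes instances unhashable.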

08-14 07:03