Problem description
For clarity, I will extract an excerpt from my code and use generic names. I have a class Foo() that stores a DataFrame in an attribute.
import pandas as pd
import pandas.testing as pdt  # pandas.util.testing is deprecated

class Foo():
    def __init__(self, bar):
        self.bar = bar                 # dict of dicts
        self.df = pd.DataFrame(bar)    # pandas object

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result
However, when I try to compare two instances of Foo, I get an exception related to the ambiguity of comparing two DataFrames (the comparison works fine without the 'df' key in Foo.__dict__).
d1 = {'A': pd.Series([1, 2], index=['a', 'b']),
      'B': pd.Series([1, 2], index=['a', 'b'])}
d2 = d1.copy()
foo1 = Foo(d1)
foo2 = Foo(d2)
foo1.bar            # dict
foo1.df             # pandas DataFrame
foo1 == foo2        # ValueError
[Out] ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
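The error arises because comparing two dicts needs a single True/False for each pair of values, while == on two DataFrames returns an elementwise DataFrame of booleans. A minimal sketch of that behaviour, using small made-up frames (df1 and df2 are illustrative names, not part of the original code):

```python
import pandas as pd

# Two illustrative frames with identical contents (hypothetical data).
df1 = pd.DataFrame({"A": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"A": [1, 2]}, index=["a", "b"])

# == is elementwise: the result is a DataFrame of booleans, not a bool.
elementwise = df1 == df2

# Forcing that result into a single truth value is what raises the error.
try:
    bool(elementwise)
    raised = False
except ValueError:
    raised = True

# Reducing explicitly over both axes gives one unambiguous answer.
all_equal = bool(elementwise.all().all())
```

Dict equality performs that truth-value step implicitly for the 'df' entry, which is why the comparison fails only once a DataFrame is stored in __dict__.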
Fortunately, pandas has utility functions for asserting that two DataFrames or Series are equal. I'd like to use these functions for the comparison if possible.
pdt.assert_frame_equal(pd.DataFrame(d1), pd.DataFrame(d2))    # does not raise
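As a rough illustration of how these utilities behave: assert_frame_equal returns nothing on success and raises AssertionError on a mismatch, whereas the DataFrame.equals method returns a plain boolean. The frames below (left, same, different) are made-up examples, not taken from the original code:

```python
import pandas as pd
import pandas.testing as pdt

# Hypothetical frames for illustration only.
left = pd.DataFrame({"A": [1, 2]}, index=["a", "b"])
same = left.copy()
different = pd.DataFrame({"A": [1, 3]}, index=["a", "b"])

# Matching frames: the assertion passes silently.
pdt.assert_frame_equal(left, same)

# Mismatching frames: the assertion raises AssertionError.
try:
    pdt.assert_frame_equal(left, different)
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True

# DataFrame.equals answers the same question with a single boolean.
equals_same = left.equals(same)
equals_different = left.equals(different)
```

Because assert_frame_equal is written for test suites, using it inside __eq__ means wrapping it in try/except; equals may be the simpler fit when only a yes/no answer is needed.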
There are a few options for comparing two Foo instances:
- compare a copy of __dict__, where new_dict lacks the df key
- delete the df key from __dict__ (not ideal)
- don't compare __dict__, but only parts of it contained in a tuple
- overload __eq__ to facilitate pandas DataFrame comparisons
The last option seems the most robust in the long run, but I am not sure of the best approach. In the end, I would like to refactor __eq__ to compare all items from Foo.__dict__, including DataFrames (and Series). Any ideas on how to accomplish this?
Recommended answer
A solution from these threads:
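from pandas.testing import assert_frame_equal

def df_equal(csvdata, csvdata_old):
    # assert_frame_equal raises AssertionError on any mismatch
    try:
        assert_frame_equal(csvdata, csvdata_old)
        return True
    except AssertionError:
        return False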
For a dict of DataFrames:
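from pandas.testing import assert_frame_equal

def df_equal(df1, df2):
    try:
        assert_frame_equal(df1, df2)
        return True
    except AssertionError:
        return False

# Defined inside the class; self.df is assumed here to be a dict of DataFrames.
def __eq__(self, other):
    if self.df.keys() != other.df.keys():
        return False
    for k in self.df.keys():
        if not df_equal(self.df[k], other.df[k]):
            return False
    return True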
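To compare every item in Foo.__dict__, as the question asks, one possible approach is to walk the attribute dict and pick the comparison by type. The following is only a sketch under stated assumptions: values_equal and PANDAS_TYPES are hypothetical names introduced here, DataFrame.equals / Series.equals are swapped in for assert_frame_equal, and only pandas objects, dicts, and ordinary ==-comparable values are handled (pandas objects nested in lists or tuples are not).

```python
import pandas as pd

PANDAS_TYPES = (pd.DataFrame, pd.Series)  # hypothetical helper constant


def values_equal(a, b):
    """Hypothetical helper: pandas-aware equality for attribute values."""
    if isinstance(a, PANDAS_TYPES) or isinstance(b, PANDAS_TYPES):
        # .equals returns one bool; require the same type on both sides.
        return type(a) is type(b) and a.equals(b)
    if isinstance(a, dict) and isinstance(b, dict):
        # Same keys, then compare each pair of values recursively.
        return a.keys() == b.keys() and all(
            values_equal(a[k], b[k]) for k in a
        )
    return a == b


class Foo:
    def __init__(self, bar):
        self.bar = bar
        self.df = pd.DataFrame(bar)

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return values_equal(self.__dict__, other.__dict__)


d1 = {"A": pd.Series([1, 2], index=["a", "b"]),
      "B": pd.Series([1, 2], index=["a", "b"])}
d2 = d1.copy()
d3 = {"A": pd.Series([1, 2], index=["a", "b"]),
      "B": pd.Series([1, 3], index=["a", "b"])}

same = Foo(d1) == Foo(d2)        # True
differs = Foo(d1) != Foo(d3)     # True
```

In Python 3, __ne__ is derived from __eq__ by default, so the explicit __ne__ from the question is not needed in this sketch. Note that equals also compares dtypes and treats NaNs in the same position as equal, which may or may not match what assert_frame_equal with its default options would accept.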