Question
In the context of unit testing some functions, I'm trying to establish the equality of two DataFrames using pandas:
ipdb> expect
                             1   2
2012-01-01 00:00:00+00:00  NaN   3
2013-05-14 12:00:00+00:00    3 NaN
ipdb> df
identifier                   1   2
timestamp
2012-01-01 00:00:00+00:00  NaN   3
2013-05-14 12:00:00+00:00    3 NaN
ipdb> df[1][0]
nan
ipdb> df[1][0], expect[1][0]
(nan, nan)
ipdb> df[1][0] == expect[1][0]
False
ipdb> df[1][1] == expect[1][1]
True
ipdb> type(df[1][0])
<type 'numpy.float64'>
ipdb> type(expect[1][0])
<type 'numpy.float64'>
ipdb> (list(df[1]), list(expect[1]))
([nan, 3.0], [nan, 3.0])
ipdb> df1, df2 = (list(df[1]), list(expect[1])) ;; df1 == df2
False
Given that I'm trying to test all of expect against all of df, including NaN positions, what am I doing wrong?
What is the simplest way to compare equality of Series/DataFrames including NaNs?
Recommended answer
You can use assert_frame_equal with check_names=False (so that the index and column names are not checked); it raises an AssertionError if the frames are not equal:
In [11]: from pandas.testing import assert_frame_equal
In [12]: assert_frame_equal(df, expected, check_names=False)
You can wrap this in a function with something like:
try:
    assert_frame_equal(df, expected, check_names=False)
    return True
except AssertionError:
    return False
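As a self-contained sketch of that wrapper: the function name frames_equal and the sample frames below are made up for illustration and only approximate the data in the question.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


def frames_equal(left, right):
    """Return True if both frames match, with NaNs in the same position counted as equal.

    frames_equal is a hypothetical helper name, not part of pandas.
    """
    try:
        # check_names=False skips comparing the names of the index and columns
        assert_frame_equal(left, right, check_names=False)
        return True
    except AssertionError:
        return False


# Illustrative data: same values, but only df has named index/columns
df = pd.DataFrame({1: [np.nan, 3.0], 2: [3.0, np.nan]})
df.index.name = "timestamp"
df.columns.name = "identifier"
expected = pd.DataFrame({1: [np.nan, 3.0], 2: [3.0, np.nan]})
different = pd.DataFrame({1: [np.nan, 4.0], 2: [3.0, np.nan]})

same_result = frames_equal(df, expected)
different_result = frames_equal(df, different)
```

In a unit test it is usually more informative to call assert_frame_equal directly, since its error message reports where the frames differ; a boolean wrapper like this throws that detail away.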
In more recent versions of pandas, this functionality is available as .equals:
df.equals(expected)
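As for why the comparisons in the question return False: NaN never compares equal to itself, so any element-wise == check fails at the NaN positions. A small sketch with made-up frames that resemble the ones in the question contrasts this with .equals, which treats NaNs in the same location as equal:

```python
import numpy as np
import pandas as pd

# Illustrative data, not the exact frames from the question
df = pd.DataFrame({1: [np.nan, 3.0], 2: [3.0, np.nan]})
expected = pd.DataFrame({1: [np.nan, 3.0], 2: [3.0, np.nan]})

# NaN is not equal to itself under IEEE 754 floating-point rules
nan_equals_itself = np.nan == np.nan

# Element-wise comparison is therefore False wherever both frames hold NaN
elementwise_all_equal = bool((df == expected).all().all())

# .equals counts NaNs in matching positions as equal
frames_match = bool(df.equals(expected))

print(nan_equals_itself, elementwise_all_equal, frames_match)
```

Note that .equals also requires corresponding columns to have the same dtype, so a frame of integers and a frame of equal-valued floats would not be considered equal.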