Problem description
I have two DataFrames, as shown below:
df1:
ID col1 col2
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 B4
5 A5 B5
6 A6 B6
df2:
col1 col2
A1 B1
A2 O5
H3 B3
A4 B4
A5 66
A6 C6
Expected result: I would like to generate a result df based on this condition: each value in col1 and col2 of df1 should exist in the col1 and col2 values of df2, respectively.
Expected result df:
ID col1 col2 Error
1 A1 B1 No mismatch with df2
2 A2 B2 col2 mismatch with df2
3 A3 B3 col1 mismatch with df2
4 A4 B4 No mismatch with df2
5 A5 B5 col2 mismatch with df2
6 A6 B6 col2 mismatch with df2
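To make the question reproducible, the sample tables above can be rebuilt as follows. This is only a sketch: it assumes every value in col1 and col2 is a string (including the `66` in df2), which the original tables do not state.

```python
import pandas as pd

# Sample data copied from the tables above.
# Assumption: all col1/col2 values are strings, including "66".
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "col1": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "col2": ["B1", "B2", "B3", "B4", "B5", "B6"],
})
df2 = pd.DataFrame({
    "col1": ["A1", "A2", "H3", "A4", "A5", "A6"],
    "col2": ["B1", "O5", "B3", "B4", "66", "C6"],
})
```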
Recommended answer
Create a helper DataFrame with a dictionary comprehension, comparing each column with isin:
import pandas as pd

# True marks a df1 value that does not appear in the same-named column of df2
m = pd.DataFrame({c: ~df1[c].isin(df2[c]) for c in ['col1', 'col2']})
print(m)
col1 col2
0 False False
1 False True
2 True False
3 False False
4 False True
5 False True
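Note that isin tests membership in the whole df2 column, not equality with the value in the same row. The sketch below illustrates the difference on a small hypothetical pair of frames (`df2_swapped` is invented for illustration and is not part of the question); for the sample data in the question, both tests happen to flag the same cells.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": ["A1", "A2"], "col2": ["B1", "B2"]})
# Hypothetical df2 holding the same values as df1, but with the rows swapped
df2_swapped = pd.DataFrame({"col1": ["A2", "A1"], "col2": ["B2", "B1"]})

cols = ["col1", "col2"]

# Membership test: is each df1 value missing from the whole df2 column?
missing = pd.DataFrame({c: ~df1[c].isin(df2_swapped[c]) for c in cols})

# Positional test: does each df1 value differ from df2 in the same row?
differs = df1[cols].ne(df2_swapped[cols])

print(missing)  # no cell is flagged: every value exists somewhere in the column
print(differs)  # every cell is flagged: no row matches position by position
```

If row-by-row equality is what is actually wanted, the positional mask could replace the isin mask in the rest of the answer.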
Then use numpy.where, with a mask built by any to test for at least one True per row, and dot (matrix multiplication) to get the names of the mismatched columns:
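import numpy as np

# Rows with at least one flagged column get the joined column names;
# all other rows get the default message
df1['Error'] = np.where(m.any(axis=1),
                        m.dot(m.columns + ', ').str.rstrip(', ') + ' mismatch with df2',
                        'No mismatch with df2')
print(df1)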
ID col1 col2 Error
0 1 A1 B1 No mismatch with df2
1 2 A2 B2 col2 mismatch with df2
2 3 A3 B3 col1 mismatch with df2
3 4 A4 B4 No mismatch with df2
4 5 A5 B5 col2 mismatch with df2
5 6 A6 B6 col2 mismatch with df2
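For readers who find the dot trick opaque, here is a self-contained sketch that builds the same Error column with a plain Python join per row instead of matrix multiplication. It is an alternative, not the answer's method, and it assumes the sample values are all strings.

```python
import pandas as pd

# Sample data from the question (assumption: all col1/col2 values are strings)
df1 = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "col1": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "col2": ["B1", "B2", "B3", "B4", "B5", "B6"],
})
df2 = pd.DataFrame({
    "col1": ["A1", "A2", "H3", "A4", "A5", "A6"],
    "col2": ["B1", "O5", "B3", "B4", "66", "C6"],
})

cols = ["col1", "col2"]

# Same helper mask as in the answer: True where a df1 value is missing from df2
m = pd.DataFrame({c: ~df1[c].isin(df2[c]) for c in cols})

# For each row, join the names of the flagged columns (empty string if none)
flagged = [", ".join(c for c, bad in zip(cols, row) if bad) for row in m.to_numpy()]

df1["Error"] = [
    f"{names} mismatch with df2" if names else "No mismatch with df2"
    for names in flagged
]
print(df1)
```

The list comprehension is slower than the vectorized dot version on large frames, but it makes each step explicit.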