I have two DataFrames, df and df1. df is shown below:
Facility Category ID Part Text
Centennial History 11111 A Drain
Centennial History 11111 B Read
Centennial History 11111 C EKG
Centennial History 11111 D Assistant
Centennial History 11111 E Primary
df1 is shown below (only a small example for this question; the real data has 50,000 rows):
Facility Category ID Part Text
Centennial History 11111 D Assistant
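Basically, I want to compare the rows of the two DataFrames and, where a row matches in both, record that in a new column of the first DataFrame, df, with the column header 'MatchingFlag'.
My final result DataFrame should look like the following, because I care just as much about the rows that do not match.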
Facility Category ID Part Text MatchingFlag
Centennial History 11111 A Drain No
Centennial History 11111 B Read No
Centennial History 11111 C EKG No
Centennial History 11111 D Assistant Yes
Centennial History 11111 E Primary No
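Can anyone help? I tried merging the two DataFrames with
df = pd.merge(df1, df, how='left', on=['Facility', 'Category', 'ID', 'Part', 'Text'])
and then creating a flag based on blank or NaN values, but that did not give the result I expected.
Best answer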
You may need to set an index on the columns you want to match, then use that index to align the matching rows:
columns = ['Facility', 'Category', 'ID', 'Part', 'Text']
# It's always a good idea to sort after creating a MultiIndex like this.
# sort_index() is used here because sortlevel() was removed from pandas.
df = df.set_index(columns).sort_index()
df1 = df1.set_index(columns).sort_index()
# You don't have to use "Yes" here, anything will do.
# The boolean True might be more appropriate.
df['MatchingFlag'] = "Yes"
df1['MatchingFlag'] = "Yes"
# Add them together: rows are aligned on the index, so matching rows
# get the value "YesYes" and non-matching rows get NaN.
# Rows that exist only in df1 would also appear in the result.
result = df + df1
# If you'd rather not have NaNs
result['MatchingFlag'] = result['MatchingFlag'].replace('YesYes', 'Yes').fillna('No')
# Turn the index levels back into ordinary columns
result = result.reset_index()
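The merge the question attempted can also be made to work. The following is a minimal sketch, not part of the original answer, that assumes df should stay on the left side of the merge so that all of its rows are kept. It uses the indicator parameter of DataFrame.merge to tell matches apart. The sample frames are rebuilt from the tables in the question, and the names merged and result are only illustrative.

```python
import numpy as np
import pandas as pd

columns = ["Facility", "Category", "ID", "Part", "Text"]

# Sample data rebuilt from the tables in the question.
df = pd.DataFrame(
    [
        ["Centennial", "History", 11111, "A", "Drain"],
        ["Centennial", "History", 11111, "B", "Read"],
        ["Centennial", "History", 11111, "C", "EKG"],
        ["Centennial", "History", 11111, "D", "Assistant"],
        ["Centennial", "History", 11111, "E", "Primary"],
    ],
    columns=columns,
)
df1 = pd.DataFrame(
    [["Centennial", "History", 11111, "D", "Assistant"]],
    columns=columns,
)

# Left merge with df on the left, so every row of df survives.
# drop_duplicates() stops repeated rows in df1 from multiplying rows of df.
merged = df.merge(df1.drop_duplicates(), how="left", on=columns, indicator=True)

# indicator=True adds a "_merge" column: "both" for rows found in df1,
# "left_only" for rows that exist only in df.
merged["MatchingFlag"] = np.where(merged["_merge"] == "both", "Yes", "No")
result = merged.drop(columns="_merge")
print(result)
```

In the question's attempt df1 was the left frame, so with how='left' only the rows of df1 were kept, which may be why the NaN-based flag did not behave as expected.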
Regarding "python - Pandas: compare two DataFrames and flag what matches", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33024537/