This article covers how to match rows between DataFrames in pandas (Python).
Problem description
I have two DataFrames:
df1,
Names
one two three
Sri is a good player
Ravi is a mentor
Kumar is a cricketer
df2,
values
sri
NaN
sri, is
kumar,cricketer
I am trying to get, for each row of df2, the row in df1 that contains all of the items in that df2 row.
My expected output is:
values           Names
sri              Sri is a good player
NaN
sri, is          Sri is a good player
kumar,cricketer  Kumar is a cricketer
I tried df1["Names"].str.contains("|".join(df2["values"].values.tolist()))
but I cannot get my expected output because the values contain commas (","). Please help.
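The following sketch illustrates why the attempt above falls short. The way the two frames are constructed is an assumption (the question only shows their printed form), and the names `join_error`, `pattern`, and `mask` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frames printed in the question
df1 = pd.DataFrame({"Names": [
    "one two three",
    "Sri is a good player",
    "Ravi is a mentor",
    "Kumar is a cricketer",
]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer"]})

# 1) str.join fails outright: NaN is a float, not a string
try:
    "|".join(df2["values"].values.tolist())
    join_error = None
except TypeError as exc:
    join_error = exc

# 2) Even with NaN dropped, "|" builds a single OR pattern over the raw
#    strings, so "sri, is" is searched as one literal substring instead of
#    as two separate items, and the match is case-sensitive by default
pattern = "|".join(df2["values"].dropna().tolist())
mask = df1["Names"].str.contains(pattern)
```

In short, an alternation pattern answers "does a name contain any of these strings", whereas the question asks which name contains all comma-separated items of a given row.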
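Recommended answer
Use sets: turn every string into a set of lowercase tokens, then look for a name whose set is a superset of the value's set.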
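import numpy as np
import pandas as pd

# Turn each name into a set of lowercase words
s1 = df1.Names.dropna()
s1.loc[:] = [set(x.lower().split()) for x in s1.values.tolist()]
a1 = s1.values
# Turn each comma-separated value into a set of lowercase tokens
s2 = df2['values'].dropna()
s2.loc[:] = [set(x.replace(' ', '').lower().split(',')) for x in s2.values.tolist()]
a2 = s2.values
# For each value set, take the index of the first name set that is a superset;
# the extra all-True column sends unmatched rows to the appended NaN
i = np.column_stack([a1 >= a2[:, None], [True] * len(a2)]).argmax(1)
df2.assign(Names=pd.Series(
    np.append(df1.Names.values, np.nan)[i], s2.index
))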
            values                 Names
0              sri  Sri is a good player
1              NaN                   NaN
2          sri, is  Sri is a good player
3  kumar,cricketer  Kumar is a cricketer
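For readers who find the broadcasting step hard to follow, the same subset test can be sketched with plain Python sets and a loop. This is not the answer's code but an illustrative variant: the frame construction is assumed from the printed data, and `tokens`, `first_match`, `name_sets`, and `result` are hypothetical names. Unlike the answer, it strips whitespace around each token rather than removing every space, which gives the same result for this data.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frames printed in the question
df1 = pd.DataFrame({"Names": [
    "one two three",
    "Sri is a good player",
    "Ravi is a mentor",
    "Kumar is a cricketer",
]})
df2 = pd.DataFrame({"values": ["sri", np.nan, "sri, is", "kumar,cricketer"]})


def tokens(text, sep=None):
    """Split text on sep (whitespace if None) into a set of lowercase tokens."""
    return {t.strip().lower() for t in text.split(sep) if t.strip()}


# Pre-compute the word set of every name once
name_sets = [(name, tokens(name)) for name in df1["Names"].dropna()]


def first_match(value):
    """Return the first name whose words include every item in value."""
    if pd.isna(value):
        return np.nan
    wanted = tokens(value, ",")
    for name, words in name_sets:
        if wanted <= words:  # subset test: all wanted items are present
            return name
    return np.nan


result = df2.assign(Names=df2["values"].map(first_match))
```

The loop is slower than the vectorised comparison on large frames, but it makes the rule explicit: a row matches when its item set is a subset of the name's word set, and the first such name wins.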