I'm trying to select all rows of df where company1 is contained in company2. This is how I'm doing it:

df1 = df[['company1', 'company2']][df.apply(lambda x: x['company1'] in x['company2'], axis=1)]

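When I run the line above, it also reports "South" as matching "Southern". It also matches "South" when it appears somewhere other than the start of company2. I want to exclude both kinds of match. Company1 should only count as contained when it appears at the beginning of company2. In addition, company1 should not be part of a longer word in company2; for example, "South" (company1) should not match "Southern" (company2). How should I modify my code to meet these two requirements?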

Best answer

I think you need:

import pandas as pd

df = pd.DataFrame({'company1': {0: 'South', 1: 'South', 2: 'South'},
                   'company2': {0: 'Southern', 1: 'Route South', 2: 'South Route'}})

print(df)
  company1     company2
0    South     Southern
1    South  Route South
2    South  South Route

# One regex built from all company1 values: each must start company2 and be followed by a space
df1 = df[df['company2'].str.contains("|".join('^' + df['company1'] + ' '))]
print(df1)
  company1     company2
2    South  South Route
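
Note that the pattern above joins every company1 value into a single regex, so a row can match because of another row's company1, and regex metacharacters in a name (such as "." or "&") are not escaped. If that matters for your data, a row-by-row check may be safer. The following is only a minimal sketch using the standard re module: the helper name starts_with_company is made up for illustration and is not part of the original answer. Unlike the pattern above, it also accepts the case where company2 is exactly equal to company1.

```python
import re

def starts_with_company(company1: str, company2: str) -> bool:
    """Return True if company2 begins with company1 as a whole word or phrase."""
    # re.escape keeps characters such as '.' or '&' in a name literal.
    # (?!\w) requires that the next character is not a word character,
    # so "South" is rejected for "Southern" but accepted for "South Route".
    pattern = re.escape(company1) + r"(?!\w)"
    # re.match only matches at the start of the string.
    return re.match(pattern, company2) is not None

# Hypothetical sample pairs; the first three mirror the rows in the answer.
pairs = [
    ("South", "Southern"),
    ("South", "Route South"),
    ("South", "South Route"),
    ("A.B", "AxB Ltd"),
]
results = [starts_with_company(c1, c2) for c1, c2 in pairs]
print(results)
```

With the DataFrame from the answer, the same helper could be applied per row, for example df[df.apply(lambda row: starts_with_company(row['company1'], row['company2']), axis=1)]. This is slower than a vectorized str.contains call on large frames, but each row is compared only against its own company1.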

For more on "python - String containment in pandas", see the similar question on Stack Overflow: https://stackoverflow.com/questions/40117685/
