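I want to remove specific words from a column of strings, but only when they appear as whole words, not when they are part of another word.
Here is an example: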
data1
name
here is a this
company
there is no food
data2
words count
is 56
com 17
no 22
I wrote this function, but the problem is that it also removes a word when it is part of another word:
def drop(y):
    for x in data2.words.values:
        y['name'] = y['name'].str.replace(x, '')
    return y
Output:
name
here a th
pany
there food
What I expect:
name
here a this
company
there food
Best answer
To avoid multiple spaces, you can split each value on whitespace, filter out the tokens that match, and join the rest back together:
s = set(data2['words'])
data1['name'] = [' '.join(y for y in x.split() if y not in s) for x in data1['name']]
print(data1)
name
0 here a this
1 company
2 there food
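The split, filter, and join steps above can also be sketched without pandas, which makes the behaviour easy to check in isolation. The helper name drop_words and the plain list standing in for data1['name'] are assumptions made for illustration; they are not part of the original answer.

```python
def drop_words(text, words):
    """Remove whitespace-separated tokens of `text` that appear in `words`.

    Only exact token matches are dropped, so 'com' does not touch 'company'.
    """
    return ' '.join(tok for tok in text.split() if tok not in words)


# Plain-Python stand-ins for data1['name'] and data2['words'] (illustrative).
names = ['here is a this', 'company', 'there is no food']
stop = {'is', 'com', 'no'}

cleaned = [drop_words(n, stop) for n in names]
print(cleaned)
```

Because the comparison is per token, a word with punctuation attached (for example "is,") is a different token and would be kept. On a DataFrame, the same helper could presumably be applied with data1['name'].map(lambda x: drop_words(x, stop)).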
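Alternatively, str.replace with a regular expression and \b word boundaries also works, but it leaves multiple spaces behind:
pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
data1['name'] = data1['name'].str.replace('(' + pat + ')', '', regex=True)
print(data1)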
name
0 here  a this
1 company
2 there   food
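So the extra spaces have to be collapsed at the end:
pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# regex=True is required in pandas 2.0+, where str.replace defaults to literal matching
data1['name'] = data1['name'].str.replace('(' + pat + ')', '', regex=True).str.replace(' +', ' ', regex=True)
print(data1)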
name
0 here a this
1 company
2 there food
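If the words to remove might contain regex metacharacters such as '.' or '+', the pattern can be built with re.escape. The following is a minimal sketch of the word-boundary approach using only the standard library; the function names build_pattern and drop_words_regex are hypothetical, and the final strip() is an addition that the original answer does not perform.

```python
import re


def build_pattern(words):
    # re.escape keeps characters such as '.' or '+' from being read as regex syntax.
    alternatives = '|'.join(re.escape(w) for w in words)
    # \b on both sides restricts matches to whole words.
    return re.compile(r'\b(?:{})\b'.format(alternatives))


def drop_words_regex(text, pattern):
    removed = pattern.sub('', text)
    # Collapse the runs of whitespace left behind, then trim the ends.
    return re.sub(r'\s+', ' ', removed).strip()


# Plain-Python stand-ins for data2['words'] and data1['name'] (illustrative).
pattern = build_pattern(['is', 'com', 'no'])
names = ['here is a this', 'company', 'there is no food']

cleaned = [drop_words_regex(n, pattern) for n in names]
print(cleaned)
```

Note that \b only marks a boundary between a word character and a non-word character, so words that begin or end with punctuation may not match as expected. On a DataFrame, this could presumably be applied with data1['name'].map(lambda x: drop_words_regex(x, pattern)).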