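I want to remove specific words from a pandas column, but only as whole words, not where they appear as part of another word.
Here is an example: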

data1
    name

    here is a this
    company
    there is no food

data2
    words   count

    is       56
    com     17
    no      22
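To try the code in this thread, the two example frames can be rebuilt in pandas roughly like this. This is a sketch: the values are copied from the tables above, but how the original data was actually loaded is not stated.

```python
import pandas as pd

# data1: the texts to clean (values copied from the example)
data1 = pd.DataFrame({"name": ["here is a this", "company", "there is no food"]})

# data2: the words to remove, plus a count column that the cleaning ignores
data2 = pd.DataFrame({"words": ["is", "com", "no"], "count": [56, 17, 22]})

print(data1)
print(data2)
```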

I wrote this function, but the problem is that it also removes a word when it is only part of another word:

def drop(y):
    # Plain substring replacement: 'is' is also removed from inside 'this'
    for x in data2.words.values:
        y['name'] = y['name'].str.replace(x, '', regex=False)

    return y

Output:
    name

    here a th
    pany
    there food
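The problem can be reproduced with plain Python strings, without pandas: str.replace matches any substring, so "is" is removed from inside "this" and "com" from "company". naive_drop is a hypothetical stand-in for the question's drop() that works on single strings. Note that the real results also keep the extra spaces left where words were removed, which the output above does not show.

```python
texts = ["here is a this", "company", "there is no food"]
words = ["is", "com", "no"]

def naive_drop(text, words):
    # Same idea as drop(): remove every occurrence of each word as a substring
    for w in words:
        text = text.replace(w, "")
    return text

print([naive_drop(t, words) for t in texts])
# ['here  a th', 'pany', 'there   food']
```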

What I expected:
    name

    here a this
    company
    there food

Best answer

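To avoid leaving multiple spaces, you can split each value on whitespace, filter out the tokens that are in the word list, and then join the rest back together: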

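# A set makes each membership test O(1)
s = set(data2['words'])
# Keep only the whitespace-separated tokens that are not in the set
data1['name'] = [' '.join(y for y in x.split() if y not in s) for x in data1['name']]
print(data1)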
          name
0  here a this
1      company
2   there food
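The same split/filter/join idea can be wrapped in a small reusable function that accepts any iterable of strings, so it also works directly on a pandas column. drop_words is a hypothetical name, and this is a sketch: tokens are split only on whitespace, so a word with attached punctuation (for example "is,") is not treated as a match for "is".

```python
def drop_words(texts, words):
    """Remove whole whitespace-separated tokens that appear in `words`."""
    stop = set(words)  # O(1) membership tests
    return [" ".join(tok for tok in text.split() if tok not in stop) for text in texts]

result = drop_words(["here is a this", "company", "there is no food"], ["is", "com", "no"])
print(result)  # ['here a this', 'company', 'there food']
```

With the frames from the question, the call would be data1['name'] = drop_words(data1['name'], data2['words']).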

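Alternatively, you can use replace with a regular expression that wraps each word in word boundaries, \b...\b. This only matches whole words, but it can leave multiple spaces behind: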
pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# regex=True is required in pandas >= 2.0, where the default changed to False
data1['name'] = data1['name'].str.replace('(' + pat + ')', '', regex=True)
print(data1)
           name
0  here  a this
1       company
2  there   food
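One caveat with building the pattern by string formatting: if a word in the list contains a regex metacharacter such as ".", it is interpreted as regex syntax rather than literal text. A minimal sketch, assuming the word list might contain such characters, escapes each word with re.escape before wrapping the alternation in word boundaries; build_pattern is a hypothetical helper, not part of the answer.

```python
import re

def build_pattern(words):
    # Escape each word so characters like "." are matched literally,
    # then wrap the whole alternation in word boundaries
    return r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"

pat = build_pattern(["is", "com", "no"])
cleaned = re.sub(pat, "", "here is a this")  # 'here  a this'

# Why escaping matters: an unescaped "." matches any character, so "arm" is hit too
unescaped = re.sub(r"\b(?:a.m)\b", "", "arm a.m")       # ' '
escaped = re.sub(build_pattern(["a.m"]), "", "arm a.m")  # 'arm '
```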

So at the end the repeated spaces have to be collapsed into a single space:
pat = '|'.join(r"\b{}\b".format(x) for x in data2['words'])
# First remove the whole words, then collapse runs of spaces into one
data1['name'] = (data1['name'].str.replace('(' + pat + ')', '', regex=True)
                              .str.replace(' +', ' ', regex=True))
print(data1)
          name
0  here a this
1      company
2   there food
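Collapsing with ' +' still leaves a leading or trailing space when a removed word sits at the start or end of a value. A sketch of one way to handle that is to collapse any whitespace run and then call str.strip(). The first row below is a hypothetical extra example that is not in the question's data, added to show the leading-space case; note that \s+ also collapses tabs and newlines, which differs slightly from the answer's ' +'.

```python
import re
import pandas as pd

words = ["is", "com", "no"]
pat = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"

# Hypothetical first row starting with a word to be removed
s = pd.Series(["is here a this", "company", "there is no food"])

out = (s.str.replace(pat, "", regex=True)      # drop whole words only
        .str.replace(r"\s+", " ", regex=True)  # collapse whitespace runs
        .str.strip())                          # remove leading/trailing spaces

print(out.tolist())  # ['here a this', 'company', 'there food']
```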

10-06 05:23