I have a list like this:

stopwords = ['a', 'and', 'is']


And a sentence like this:

sentence = 'A Mule is Eating and drinking.'


Expected output:

reduced = ['mule', 'eating', 'drinking']


What I have so far:

reduced = filter(None, re.match(r'\W+', sentence.lower()))


How do I now filter out the stopwords (note the conversion from uppercase to lowercase and the removal of punctuation)?

Best answer

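The filter expression is wrong. Change it to:

>>> import re
>>> reduced = filter(lambda w: w and w not in stopwords, re.split(r'\W+', sentence.lower()))


The first argument to filter is the filtering criterion. Also note that to split the sentence you need re.split, not re.match. Because the sentence ends with a period, re.split returns a trailing empty string, so the predicate also checks that w is non-empty.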

>>> list(reduced)
['mule', 'eating', 'drinking']
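
As an alternative sketch of the same idea, the words can be extracted with re.findall instead of splitting on non-word characters, which avoids empty tokens at the ends of the sentence. The function name remove_stopwords is hypothetical and not part of the original answer, and converting the stopword list to a set is an optional choice that only matters for longer lists.

```python
import re

def remove_stopwords(sentence, stopwords):
    """Return the lowercased words of `sentence` that are not in `stopwords`.

    `remove_stopwords` is a hypothetical helper name used for illustration.
    """
    stop = set(stopwords)  # set membership tests are O(1) on average
    # \w+ matches runs of word characters, so punctuation never produces
    # empty tokens the way re.split(r'\W+', ...) can.
    words = re.findall(r'\w+', sentence.lower())
    return [w for w in words if w not in stop]

stopwords = ['a', 'and', 'is']
sentence = 'A Mule is Eating and drinking.'
reduced = remove_stopwords(sentence, stopwords)
print(reduced)  # ['mule', 'eating', 'drinking']
```

The list comprehension returns a list directly, so no list() call is needed as it is with filter in Python 3.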
