python - Problem removing punctuation/numbers from text

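I have some code that worked fine for removing punctuation and numbers using regular expressions in Python. I had to change the code slightly so that a stop list would work, which is not particularly important. Anyway, now the punctuation isn't being removed, and quite frankly I'm stumped as to why.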

import re
import nltk

# Quran subset
filename = raw_input('Enter name of file to convert to ARFF with extension, eg. name.txt: ')

# create list of lower case words
word_list = re.split('\s+', file(filename).read().lower())
print 'Words in text:', len(word_list)
# punctuation and numbers to be removed
punctuation = re.compile(r'[-.?!,":;()|0-9]')
for word in word_list:
    word = punctuation.sub("", word)
print word_list

Any pointers on why it's not working would be great. I'm no expert in Python, so it's probably something ridiculously simple. Thanks.

Best answer

Change

for word in word_list:
    word = punctuation.sub("", word)

to

word_list = [punctuation.sub("", word) for word in word_list]

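Assigning to word inside the for loop above only rebinds that temporary loop variable to a new string. It does not change word_list, so the list still holds the original, unstripped words when it is printed.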
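
To see the difference side by side, here is a minimal sketch. The sample words are made up for illustration and stand in for the contents of the file; the names unchanged, cleaned and non_empty are hypothetical and not part of the original code. The final filtering step is an optional extra, not something the question asked for.

```python
import re

# Same character class as in the question: punctuation and digits to strip.
punctuation = re.compile(r'[-.?!,":;()|0-9]')

# Hypothetical sample input, standing in for the words read from the file.
word_list = ['hello,', 'world!', '(42)', 'plain']

# Rebinding the loop variable: each result is discarded, the list is untouched.
for word in word_list:
    word = punctuation.sub("", word)
unchanged = list(word_list)

# Building a new list keeps the cleaned strings.
cleaned = [punctuation.sub("", word) for word in word_list]

# Tokens made up only of punctuation or digits turn into empty strings;
# dropping them is an assumption about what is wanted, not part of the answer.
non_empty = [word for word in cleaned if word]
```

Here unchanged is identical to the original word_list, while cleaned holds the stripped words. Strings are immutable in Python, so punctuation.sub always returns a new string; the result has to be stored somewhere (a new list, or word_list[i] by index) for the change to be visible.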