3   3   how are you doing???
2   5   dear, where abouts!!!!!!........
4   6   don't worry i'll be there for ya///

I want to remove the punctuation from these lines. How can I loop over them and strip it with a regular expression?
>>> import re
>>> a="what is. your. name?"
>>> b=re.findall(r'\w+',a)
>>> b
['what', 'is', 'your', 'name']
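
If the goal is to keep each sentence intact rather than split it into a list of words, re.sub with a negated character class is a common alternative. This is a swapped-in technique, not something from the original post. It is a minimal sketch that assumes "punctuation" means any character that is neither a word character nor whitespace:

```python
import re

a = "what is. your. name?"

# [^\w\s] matches any character that is neither a word character
# (letter, digit, underscore) nor whitespace, i.e. the punctuation here.
cleaned = re.sub(r'[^\w\s]', '', a)
print(cleaned)  # what is your name
```

Unlike findall, this keeps the original spacing. It also drops apostrophes outright, so "don't" becomes "dont". Underscores survive because \w includes them.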

...
  File "/usr/lib/python2.7/re.py", line 137, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 242, in _compile
    raise error, v # invalid expression
sre_constants.error: multiple repeat
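
The statement that produced this traceback is elided above, so its exact cause can't be recovered. In general, re raises "multiple repeat" when a quantifier directly follows another quantifier, as in a**. This is easy to trigger by pasting punctuation such as * or + into a pattern without escaping it. The following is a minimal sketch of a safer approach. It assumes that ASCII punctuation (string.punctuation) is what should be removed. PUNCT_RE and strip_punct are hypothetical names:

```python
import re
import string

# A quantifier applied directly to another quantifier is invalid; the
# re module rejects it with an error whose message starts with "multiple repeat".
try:
    re.compile(r'a**')
except re.error as exc:
    print(exc)

# re.escape turns every punctuation character into a literal, so this
# character class is always a valid pattern.
PUNCT_RE = re.compile('[' + re.escape(string.punctuation) + ']')

def strip_punct(text):
    """Remove every ASCII punctuation character from text."""
    return PUNCT_RE.sub('', text)

print(strip_punct("how are you doing???"))  # how are you doing
```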

Edit: the sentence is in the third column and the delimiter is a tab. How do I remove the punctuation from the third column only?

Best answer

Iterate over the lines with a for loop:

import re

with open('/path/to/file.txt') as f:
    for line in f:
        # \w+ keeps runs of letters, digits and underscores; punctuation is dropped
        words = re.findall(r'\w+', line)
        # do something with words
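
Note that \w also matches digits, so on the sample file this loop returns the numeric columns as words too. That is the issue the question's edit raises. The sketch below shows this using a plain list of strings as a stand-in for the open file. This is an assumption for the demo; any iterable of lines, including a file object, behaves the same way. words_per_line is a hypothetical helper name:

```python
import re

def words_per_line(lines):
    """Return the runs of word characters found in each line."""
    return [re.findall(r'\w+', line) for line in lines]

# A list of strings stands in for the open file in this sketch.
sample = [
    "3\t3\thow are you doing???\n",
    "2\t5\tdear, where abouts!!!!!!........\n",
]

for words in words_per_line(sample):
    print(words)
# ['3', '3', 'how', 'are', 'you', 'doing']
# ['2', '5', 'dear', 'where', 'abouts']
```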

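For the edited question, strip the punctuation from the third (tab-separated) column only:

import re

with open('/path/to/file.txt') as f:
    for line in f:
        col1, col2, rest = line.split('\t', 2)  # split into 3 columns
        words = re.findall(r'\w+', rest)  # words of the third column only
        # str.join takes a single iterable, so wrap the columns in a list
        line = '\t'.join([col1, col2, ' '.join(words)])
        # do something with words or line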
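
Below is a fuller sketch of the column-aware loop. It keeps each line's newline and passes through unchanged any line with fewer than three tab-separated columns. Both behaviours are assumptions, since the question does not specify them. clean_third_column and clean_file are hypothetical helper names, and the paths given to clean_file are placeholders:

```python
import re

def clean_third_column(line):
    """Strip punctuation from the third tab-separated column of one line.

    Lines with fewer than three columns are returned unchanged
    (an assumption; the question does not say how to handle them).
    """
    parts = line.rstrip('\n').split('\t', 2)
    if len(parts) < 3:
        return line
    col1, col2, rest = parts
    words = re.findall(r'\w+', rest)
    return '\t'.join([col1, col2, ' '.join(words)]) + '\n'

def clean_file(src_path, dst_path):
    """Write a cleaned copy of src_path to dst_path (placeholder paths)."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(clean_third_column(line))

print(repr(clean_third_column('3\t3\thow are you doing???\n')))
# '3\t3\thow are you doing\n'
```

Because findall splits on every non-word character, an apostrophe becomes a word boundary, so "don't worry" comes out as "don t worry". Replacing the findall/join pair with re.sub(r"[^\w\s]", "", rest) would give "dont worry" instead and keep the original spacing.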

Regarding "python - Remove punctuation from a file list", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/20569077/

10-16 07:36