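How can I write a regex that removes every comment that starts with # and runs to the end of the line, while leaving the first two lines below untouched?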

#!/usr/bin/python


#-*- coding: utf-8 -*-

Best answer

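Instead of a regex, you can parse the Python code with tokenize.generate_tokens and drop the comment tokens. Unlike a regex, the tokenizer knows when a # is inside a string literal. The following is a slightly modified version of an example from the tokenize docs: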

import tokenize
import io
import sys
if sys.version_info[0] == 3:
    StringIO = io.StringIO
else:
    StringIO = io.BytesIO

def nocomment(s):
    result = []
    g = tokenize.generate_tokens(StringIO(s).readline)
    for toknum, tokval, _, _, _  in g:
        # print(toknum,tokval)
        if toknum != tokenize.COMMENT:
            result.append((toknum, tokval))
    return tokenize.untokenize(result)

with open('script.py','r') as f:
    content=f.read()

print(nocomment(content))

For example, if script.py contains:
def foo(): # Remove this comment
    ''' But do not remove this #1 docstring
    '''
    # Another comment
    pass

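then nocomment prints the following. Because untokenize is given only (type, string) pairs here, it does not reproduce the original spacing exactly, hence "foo ()":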
def foo ():
    ''' But do not remove this #1 docstring
    '''

    pass
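
Note that nocomment above also strips the shebang and encoding lines, which the question wants to keep. Here is a minimal sketch of one way to handle that. It is not part of the original answer, it assumes Python 3, and the names strip_comments and keep_first are hypothetical. Instead of calling untokenize, it uses each COMMENT token's start position to cut the comment out of the original text, so the spacing of the remaining code is unchanged:

```python
import io
import tokenize


def strip_comments(source, keep_first=2):
    """Remove '#' comments, except those on the first `keep_first` lines.

    `strip_comments` and `keep_first` are illustrative names, not part of
    the tokenize module.
    """
    # Split on '\n' exactly as the tokenizer's readline does, so that
    # token row numbers index into this list (rows are 1-based).
    lines = io.StringIO(source).readlines()
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        row, col = tok.start
        if tok.type != tokenize.COMMENT or row <= keep_first:
            continue  # not a comment, or a protected header line
        line = lines[row - 1]
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # keep the original line ending
        # A comment always runs to the end of its line, so truncating
        # at its start column removes it completely.
        lines[row - 1] = body[:col].rstrip() + ending
    return "".join(lines)
```

With this sketch, print(strip_comments(content)) could stand in for print(nocomment(content)) in the script above. A line that contained only a comment becomes an empty line, as it does with nocomment.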
