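I am writing a script to clean up text files that were converted from PDF. For some reason, the anchors ^ and $ (which match the start and end of a string) do not seem to work in my regular expression. I am using Python 3.6.6 on Linux.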

Why does ^Credits$ fail to match the standalone line Credits in the code below?

>>> import re
>>> my_regex = r'^Credits$'
>>> my_string = "based upon extrinsic circumstances, as discussed in Serrano v. Priest, 20 Cal.3d 25, 49.\n\nCredits\n(Added by Stats.1977, c. 1197, p. 3979,  1. Amended by Stats.1993, c. 645 (S.B.764),  2.)"
>>> print(re.findall(my_regex,my_string))
[]


Here is the text fragment (my_string) as displayed by print():

based upon extrinsic circumstances, as discussed in Serrano v. Priest, 20 Cal.3d 25, 49.

Credits
(Added by Stats.1977, c. 1197, p. 3979,  1. Amended by Stats.1993, c. 645 (S.B.764),  2.)


Thanks for your help.

Best answer

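As @CertainPerformance said, pass the re.M flag as the last argument to findall. By default, ^ matches only at the start of the whole string and $ only at its end (or just before a trailing newline); with re.M (re.MULTILINE) they also match at the start and end of every line: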

print(re.findall(my_regex,my_string,re.M))


Demo:

>>> import re
>>> my_regex = r'^Credits$'
>>> my_string = "based upon extrinsic circumstances, as discussed in Serrano v. Priest, 20 Cal.3d 25, 49.\n\nCredits\n(Added by Stats.1977, c. 1197, p. 3979,  1. Amended by Stats.1993, c. 645 (S.B.764),  2.)"
>>> print(re.findall(my_regex,my_string,re.M))
['Credits']
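
To make the difference explicit, here is a minimal side-by-side sketch of the two modes. The `text` value is a shortened, hypothetical stand-in for my_string, not the original data:

```python
import re

# Hypothetical shortened sample standing in for my_string above.
text = "first line\n\nCredits\n(Added by Stats.1977)"

# Default mode: ^ and $ anchor to the whole string, so nothing matches.
default_matches = re.findall(r'^Credits$', text)

# re.MULTILINE (alias re.M): ^ and $ also anchor at every line boundary.
multiline_matches = re.findall(r'^Credits$', text, re.MULTILINE)

print(default_matches)    # []
print(multiline_matches)  # ['Credits']
```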


Alternatively, enable multiline mode from inside the pattern with the inline flag (?m), as in r'(?m)^Credits$':

>>> import re
>>> my_regex = r'(?m)^Credits$'
>>> my_string = "based upon extrinsic circumstances, as discussed in Serrano v. Priest, 20 Cal.3d 25, 49.\n\nCredits\n(Added by Stats.1977, c. 1197, p. 3979,  1. Amended by Stats.1993, c. 645 (S.B.764),  2.)"
>>> print(re.findall(my_regex,my_string))
['Credits']
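
If the same pattern is applied to many files in a cleanup script, one option is to compile it once with the flag attached. The following is only a sketch under that assumption; `CREDITS_LINE`, `find_credits_lines`, and `sample` are hypothetical names and data, not part of the original question:

```python
import re

# Compile once with re.MULTILINE so ^ and $ match at every line boundary.
CREDITS_LINE = re.compile(r'^Credits$', re.MULTILINE)

def find_credits_lines(text):
    """Return every line in text that consists solely of 'Credits'."""
    return CREDITS_LINE.findall(text)

# Hypothetical sample: only the third line is exactly 'Credits'.
sample = "intro text\n\nCredits\n(Added by Stats.1977)\nCredits line with more words"
result = find_credits_lines(sample)
print(result)  # ['Credits']
```

Because the flag is stored in the compiled pattern, callers do not need to remember to pass re.M on each call.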
