通知7文字冒险代码可以重点介绍北,南,西,东,西北,西南,东南和东北等方向。我正在开发一个代码验证脚本,其任务之一是查找这些单词的实例。我的第一次尝试使用蛮力:
import re
sample_line = 'The westerly barn is a room. The field is east of the barn. \
The stable is northeast of the field. The forest is northwest of the field.'
# note: this could be generated with zip and north/south'' and east/west/'', but that's another exercise.
x = [ 'north', 'south', 'east', 'west', 'northwest', 'southwest', 'southeast', 'northeast' ]
regstr = r'\b({0})\b'.format('|'.join(x))
print(re.findall(regstr, sample_line))
这有效并且给了我我想要的:
[ 'east', 'northeast', 'northwest' ]
而忽略了westerly
。我想使用一些对称性来减少正则表达式。但我注意到我偏爱的方式让零长度比赛成为可能。所以我想出了这个:
regstr2 = r'\b(north|south|(north|south)?(east|west))\b'
print(sample_line)
print([x[0] for x in re.findall(regstr2, sample_line)])
这行得通,但感觉不佳。
在this link的帮助下,我的第三次尝试是:
regstr3 = r'(?=.)(\b(north|south)?(east|west)?\b)'
print(sample_line)
print([x[0] for x in re.findall(regstr3, sample_line)])
这有我想要的三个方向,但即使有推荐的(?=。),也有很多我希望忽略的零长度匹配。
Python是否有办法使
regstr3
的变体起作用?尽管有很明显的解决方法,但要有一个整齐的正则表达式而不需要重复和类似的单词,这将是令人愉快的。 最佳答案
您可以限制单词边界:通过在其后添加(?<!\w)
,使初始单词边界仅与单词的开头匹配,而在单词的末尾,通过添加(?!\w)
使其仅在单词的末尾匹配:
\b(?<!\w)((?:north|south)?(?:east|west)?)\b(?!\w)
请参见regex demo
图案细节
\b(?<!\w)
-左侧没有单词char的单词边界((?:north|south)?(?:east|west)?)
-捕获组1:(?:north|south)?
-可选的子字符串,north
或south
(?:east|west)?
-可选的子字符串,east
或west
\b(?!\w)
-单词边界,右边没有单词char。Python demo:
import re
rx = r"\b(?<!\w)((?:north|south)?(?:east|west)?)\b(?!\w)"
s = "The westerly barn is a room. The field is east of the barn. The stable is northeast of the field. The forest is northwest of the field."
print( re.findall(rx, s) )