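Inform 7 text adventure code can feature directions such as north, south, west, east, northwest, southwest, southeast and northeast. I'm working on a code verification script, and one of its tasks is to find instances of these words. My first attempt used brute force: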

import re

sample_line = 'The westerly barn is a room. The field is east of the barn. \
  The stable is northeast of the field. The forest is northwest of the field.'

# note: this could be generated with zip and north/south'' and east/west/'', but that's another exercise.
x = [ 'north', 'south', 'east', 'west', 'northwest', 'southwest', 'southeast', 'northeast' ]

regstr = r'\b({0})\b'.format('|'.join(x))

print(re.findall(regstr, sample_line))


This works and gives me what I want: ['east', 'northeast', 'northwest'], while ignoring westerly.
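The comment in the code above notes that the word list could be generated rather than typed out. A minimal sketch of that idea follows, using `itertools.product` instead of `zip`; the name `directions` is an assumption for illustration and does not appear in the original script.

```python
import re
from itertools import product

sample_line = ('The westerly barn is a room. The field is east of the barn. '
               'The stable is northeast of the field. The forest is northwest of the field.')

# Pair an optional north/south prefix with an optional east/west suffix,
# then drop the one combination where both parts are empty.
directions = [ns + ew
              for ns, ew in product(('', 'north', 'south'), ('', 'east', 'west'))
              if ns + ew]

regstr = r'\b({0})\b'.format('|'.join(directions))
found = re.findall(regstr, sample_line)
print(found)  # ['east', 'northeast', 'northwest']
```

The generated list holds the same eight words as the hand-written one, only in a different order; the trailing `\b` forces backtracking, so `north` appearing before `northeast` in the alternation does not cause a partial match.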

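I wanted to use the symmetry of the words to shorten the regex, but I noticed that the form I preferred allowed zero-length matches. So I came up with this: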

regstr2 = r'\b(north|south|(north|south)?(east|west))\b'

print(sample_line)
print([x[0] for x in re.findall(regstr2, sample_line)])


This works, but it feels inelegant.

With the help of this link, my third attempt was:

regstr3 = r'(?=.)(\b(north|south)?(east|west)?\b)'

print(sample_line)
print([x[0] for x in re.findall(regstr3, sample_line)])


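This finds the three directions I want, but even with the recommended (?=.), it also returns many zero-length matches that I was hoping to ignore.

Is there a way in Python to make a variant of regstr3 work? There are obvious workarounds, but it would be pleasing to have a tidy regex that does not repeat similar words.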
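For reference, one of the obvious workarounds is to keep regstr3 unchanged and discard the empty matches afterwards. The sketch below assumes that approach; `all_matches` and `non_empty` are illustrative names, not part of the original script.

```python
import re

sample_line = ('The westerly barn is a room. The field is east of the barn. '
               'The stable is northeast of the field. The forest is northwest of the field.')

regstr3 = r'(?=.)(\b(north|south)?(east|west)?\b)'

# Group 1 holds the whole direction word, or '' for a zero-length match.
all_matches = [m.group(1) for m in re.finditer(regstr3, sample_line)]

# Empty strings are falsy, so a simple truthiness filter removes them.
non_empty = [text for text in all_matches if text]

print(len(all_matches) - len(non_empty), 'zero-length matches discarded')
print(non_empty)  # ['east', 'northeast', 'northwest']
```

This gives the desired list, but the filtering happens in Python rather than in the pattern, which is what the question is trying to avoid.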

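Best answer

You can restrict the word boundaries: add (?<!\w) after the initial word boundary so that it only matches at the start of a word, and add (?!\w) after the final word boundary so that it only matches at the end of a word. A word start has a word character to its right and a word end does not, so the two positions can never coincide and an empty match is impossible: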

\b(?<!\w)((?:north|south)?(?:east|west)?)\b(?!\w)


See the regex demo.

Pattern details


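\b(?<!\w) - a word boundary with no word character to its left (the start of a word)
((?:north|south)?(?:east|west)?) - capturing group 1:
    (?:north|south)? - an optional substring, north or south
    (?:east|west)? - an optional substring, east or west
\b(?!\w) - a word boundary with no word character to its right (the end of a word)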
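To see what the two lookarounds contribute, the following sketch compares the pattern with and without them on a short test string. The names `core`, `plain` and `restricted`, and the test string itself, are assumptions made for illustration.

```python
import re

s = "The field is east of the westerly barn."

# The shared middle part: both halves are optional, so it can match ''.
core = r"((?:north|south)?(?:east|west)?)"

# Plain word boundaries: \b ... \b can match at a single position with an
# empty group, so zero-length matches show up in the result.
plain = re.findall(r"\b" + core + r"\b", s)

# Restricted boundaries: the match must begin at a word start and finish
# at a word end, which rules out the empty string.
restricted = re.findall(r"\b(?<!\w)" + core + r"\b(?!\w)", s)

print('' in plain)   # True
print(restricted)    # ['east']
```

Note that westerly is rejected in the restricted version: after west there is no word boundary, and the empty alternative fails the (?!\w) check.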


Python demo

import re
rx = r"\b(?<!\w)((?:north|south)?(?:east|west)?)\b(?!\w)"
s = "The westerly barn is a room. The field is east of the barn.   The stable is northeast of the field. The forest is northwest of the field."
print( re.findall(rx, s) )
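For use inside a verification script, the pattern could be compiled once and wrapped in a small helper. This is only a sketch: `DIRECTION_RX` and `find_directions` are hypothetical names, and matching is case-sensitive as in the demo above.

```python
import re

# Same pattern as in the demo above, compiled once for reuse.
DIRECTION_RX = re.compile(r"\b(?<!\w)((?:north|south)?(?:east|west)?)\b(?!\w)")

def find_directions(line):
    """Return every whole-word compass direction found in line."""
    # With a single capturing group, findall returns the group text itself.
    return DIRECTION_RX.findall(line)

print(find_directions("Go north, then southwest."))   # ['north', 'southwest']
print(find_directions("The northeastern westerly."))  # []
```

Words that merely begin with a direction, such as northeastern, are skipped because the final boundary check fails for every prefix.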
