I'm having trouble with a regular expression: I can't capture a run of consecutive capitalized words.
Here is what I want the regex to capture:
"said Polly Pocket and the toys" -> Polly Pocket
This is the regex I'm using:
re.findall('said ([A-Z][\w-]*(\s+[A-Z][\w-]*)+)', article)
It returns the following:
[('Polly Pocket', ' Pocket')]
I want it to return:
['Polly Pocket']
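For context, the tuple in the output comes from how `re.findall` treats groups: when a pattern contains more than one capturing group, it returns one tuple per match with an entry for each group. The snippet below is a minimal sketch of that behaviour, using the sample sentence above as `article`. The variable names and the non-capturing variant `(?:...)` are illustrative additions, not part of the original post.

```python
import re

article = "said Polly Pocket and the toys"  # sample sentence from the question

# Two capturing groups (outer and inner), so findall yields a tuple per match.
with_inner_group = re.findall(r'said ([A-Z][\w-]*(\s+[A-Z][\w-]*)+)', article)

# Same pattern with the inner group made non-capturing via (?:...).
# Only one capturing group remains, so findall yields plain strings.
single_group = re.findall(r'said ([A-Z][\w-]*(?:\s+[A-Z][\w-]*)+)', article)

print(with_inner_group)
print(single_group)
```

The only difference between the two calls is the `?:` at the start of the inner group.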
Best answer
Use a positive lookahead:
([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)
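The lookahead asserts that the word about to be accepted must be immediately followed by another word that starts with an uppercase letter. Breakdown: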
(              # begin capture
  [A-Z]        # one uppercase letter     \ First Word
  [a-z]+       # 1+ lowercase letters     /
  (?=\s[A-Z])  # must be followed by whitespace and an uppercase letter
  (?:          # non-capturing group
    \s         # whitespace
    [A-Z]      # one uppercase letter     \ Additional Word(s)
    [a-z]+     # 1+ lowercase letters     /
  )+           # group can be repeated (more words)
)              # end capture
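As a rough sketch of how this could be used from Python, the same pattern can be compiled with `re.VERBOSE`, which lets the comments stay inside the pattern. The name `CAPITALIZED_RUN` and the test string are assumptions made for illustration; the string is the sample sentence from the question.

```python
import re

# The answer's pattern in verbose mode: whitespace and "#" comments
# inside the pattern string are ignored by the regex engine.
CAPITALIZED_RUN = re.compile(r"""
    (                       # begin capture
      [A-Z][a-z]+           # first word: one uppercase, then 1+ lowercase letters
      (?=\s[A-Z])           # must be followed by whitespace and an uppercase letter
      (?:\s[A-Z][a-z]+)+    # one or more additional capitalized words
    )                       # end capture
""", re.VERBOSE)

article = "said Polly Pocket and the toys"  # sample sentence from the question
matches = CAPITALIZED_RUN.findall(article)
print(matches)
```

Since the pattern has a single capturing group, `findall` returns a flat list of strings rather than tuples. Note that `[a-z]+` is narrower than the `[\w-]*` in the question: hyphenated or all-caps words would not be matched by this version.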