This article covers how to get consecutive capitalized words using a regular expression.
Problem description
I am having trouble with my regex for capturing consecutive capitalized words. Here is what I want the regex to capture:
"said Polly Pocket and the toys" -> Polly Pocket
Here is the regex I am using:
re.findall('said ([A-Z][\w-]*(\s+[A-Z][\w-]*)+)', article)
It returns the following:
[('Polly Pocket', ' Pocket')]
I want it to return:
['Polly Pocket']
Recommended answer
Use a positive lookahead:
([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)
The lookahead asserts that the first word is accepted only if it is followed by whitespace and another word that starts with an uppercase letter. Broken down:
( # begin capture
[A-Z] # one uppercase letter \ First Word
[a-z]+ # 1+ lowercase letters /
(?=\s[A-Z]) # must have a space and uppercase letter following it
(?: # non-capturing group
\s # space
[A-Z] # uppercase letter \ Additional Word(s)
[a-z]+ # 1+ lowercase letters /
)+ # group can be repeated (more words)
) # end capture
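As a minimal sketch of how this plays out in Python, the snippet below runs the question's pattern and the answer's pattern through re.findall. The variable names and the second sample sentence are illustrative assumptions, not part of the original post. re.findall returns a list of tuples when a pattern has more than one capturing group, which is why the question's pattern produced ('Polly Pocket', ' Pocket'); with exactly one capturing group it returns plain strings.

```python
import re

article = "said Polly Pocket and the toys"

# The question's pattern: two capturing groups, so findall yields tuples.
original = re.findall(r"said ([A-Z][\w-]*(\s+[A-Z][\w-]*)+)", article)
print(original)  # [('Polly Pocket', ' Pocket')]

# Same pattern with the inner group made non-capturing via (?:...).
# Only one capturing group remains, so findall yields plain strings.
original_fixed = re.findall(r"said ([A-Z][\w-]*(?:\s+[A-Z][\w-]*)+)", article)
print(original_fixed)  # ['Polly Pocket']

# The answer's pattern: lookahead plus a non-capturing repeated group.
pattern = re.compile(r"([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)")
matches = pattern.findall(article)
print(matches)  # ['Polly Pocket']

# Hypothetical extra input: a run of three words matches as one string,
# while a lone capitalized word ("Bob") is skipped.
more = pattern.findall("Mary Jane Watson met Bob")
print(more)  # ['Mary Jane Watson']
```

Note that the answer's pattern only accepts words made of one uppercase letter followed by lowercase letters, so names containing digits, hyphens, or inner capitals would need the character classes adjusted.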