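I want to use a regex to turn a string like 'HDMWhoSomeThing' into 'HDM Who Some Thing'.
So I want to extract words that either start with an uppercase letter or consist only of uppercase letters. Note that in the string 'HDMWho', the last uppercase letter is actually the first letter of the word Who and should not be included in the word HDM.
What is the correct regex to achieve this? I have tried many regexes similar to [A-Z][a-z]+, but without success. Of course, [A-Z][a-z]+ gives me 'Who Some Thing', without the 'HDM'.
Any ideas?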
Thanks,
Rukki

Best answer

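#!/usr/bin/env python3

import re
from collections import deque

# Match either a run of 2+ capitals followed by another capital or the end of
# the string (an acronym), or a single capital that starts a word.
pattern = r'([A-Z]{2,}(?=[A-Z]|$)|[A-Z](?=[a-z]|$))'
# The pattern is a capturing group, so re.split keeps the matched pieces.
chunks = deque(re.split(pattern, 'HDMWhoSomeMONKEYThingXYZ'))

result = []
while chunks:
    buf = chunks.popleft()
    if not buf:
        continue  # skip empty strings left between adjacent matches
    if re.match(r'^[A-Z]$', buf) and chunks:
        buf += chunks.popleft()  # rejoin a leading capital with the rest of its word
    result.append(buf)

print(' '.join(result))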

Output:
HDM Who Some MONKEY Thing XYZ
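
To see why the loop above has to skip empty strings and glue a lone capital back onto the following piece, it may help to look at the raw list that re.split returns for this pattern. The snippet below is only an illustrative sketch; the variable name `pieces` is not part of the original answer.

```python
import re

pattern = r'([A-Z]{2,}(?=[A-Z]|$)|[A-Z](?=[a-z]|$))'

# With a capturing group, re.split returns the separators (the matches)
# interleaved with the text between them, including empty strings.
pieces = re.split(pattern, 'HDMWhoSomeMONKEYThingXYZ')
print(pieces)
# ['', 'HDM', '', 'W', 'ho', 'S', 'ome', 'MONKEY', '', 'T', 'hing', 'XYZ', '']
```

Each single capital such as 'W' arrives separately from its lowercase tail 'ho', which is why the loop concatenates the two.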

Judging by lines of code, this task is a much more natural fit with re.findall:

pattern = r'([A-Z]{2,}(?=[A-Z]|$)|[A-Z][a-z]*)'
print(' '.join(re.findall(pattern, 'HDMWhoSomeMONKEYThingX')))

Output:
HDM Who Some MONKEY Thing X
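
Since the question asks for a replacement, another option is a single re.sub call that inserts a space at each word boundary using zero-width lookarounds. This is not taken from the answer above; it is a minimal sketch that assumes the input contains only ASCII letters, and the helper name `split_caps` is made up for illustration.

```python
import re

# A boundary is either lowercase -> uppercase ("o|S"), or the point inside a
# run of capitals where the last capital starts a new word ("M|Wh").
BOUNDARY = re.compile(r'(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')

def split_caps(text):
    """Insert a space at every boundary (hypothetical helper, ASCII only)."""
    return BOUNDARY.sub(' ', text)

print(split_caps('HDMWhoSomeThing'))         # HDM Who Some Thing
print(split_caps('HDMWhoSomeMONKEYThingX'))  # HDM Who Some MONKEY Thing X
```

Because the pattern matches only positions, not characters, nothing is consumed and no post-processing loop is needed.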
