Problem description
I'm looking for a regex to match hyphenated words in Python.
The closest I've managed to get is: '\w+-\w+[-\w+]*'
import re
text = "one-hundred-and-three- some text foo-bar some--text"
hyphenated = re.findall(r'\w+-\w+[-\w+]*', text)
which returns the list ['one-hundred-and-three-', 'foo-bar'].
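The trailing hyphen gets in because `[-\w+]` is a character class, not a group: it matches any single character that is a hyphen, a word character, or a literal `+`, so `[-\w+]*` can swallow a lone hyphen at the end of a word. A minimal sketch showing this (the sample strings are made up for illustration):

```python
import re

# The first attempt: [-\w+] is a character class that matches exactly one
# of '-', any word character, or a literal '+', repeated zero or more times.
pattern = r'\w+-\w+[-\w+]*'

# A trailing hyphen is consumed because '-' is a member of the class.
print(re.findall(pattern, "one-two-"))       # ['one-two-']

# A literal '+' is accepted too, which was probably not intended.
print(re.findall(pattern, "one-two+three"))  # ['one-two+three']
```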
This is almost perfect, except for the trailing hyphen after 'three'. I only want an additional hyphen if it is followed by a word. That is, instead of '[-\w+]*' I need something like '(-\w+)*', which I thought would work but doesn't (it returns ['-three', '']). In other words, I want something that matches a word, followed by a hyphen, followed by a word, followed by zero or more hyphen-words.
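The '(-\w+)*' attempt actually matches the right text; the surprise comes from `re.findall` itself. When a pattern contains exactly one capturing group, `findall` returns what that group captured instead of the whole match, and a repeated group keeps only its last repetition (or an empty string if it never took part in the match). A small sketch that makes this visible on a sample string like the question's:

```python
import re

text = "one-hundred-and-three- some text foo-bar some--text"
pattern = r'\w+-\w+(-\w+)*'

# With one capturing group, findall returns that group's last capture.
print(re.findall(pattern, text))  # ['-three', '']

# The full matches are still there; finditer exposes them via group(0).
print([m.group(0) for m in re.finditer(pattern, text)])
# ['one-hundred-and-three', 'foo-bar']
```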
Recommended answer
Try this:
re.findall(r'\w+(?:-\w+)+',text)
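The key change is `(?:...)`, which makes the group non-capturing, so `re.findall` returns whole matches again; folding the first `-\w+` into the repeated group with `+` still requires at least one hyphen. A runnable sketch on a sample string like the question's:

```python
import re

text = "one-hundred-and-three- some text foo-bar some--text"

# (?:...) is non-capturing, so findall returns the entire match.
hyphenated = re.findall(r'\w+(?:-\w+)+', text)
print(hyphenated)  # ['one-hundred-and-three', 'foo-bar']
```

Note that `some--text` is skipped, since every hyphen must be followed by at least one word character.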
Here we consider a hyphenated word to be:
- one or more word characters
- followed by one or more of:
  - a single hyphen
  - followed by one or more word characters
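The same pattern can be spelled out with `re.VERBOSE`, one line per item in the definition above; in this mode, whitespace and `#` comments inside the pattern are ignored. A sketch, where the constant name `HYPHENATED` and the sample string are purely illustrative:

```python
import re

# Each line corresponds to one item in the definition of a hyphenated word.
HYPHENATED = re.compile(r"""
    \w+        # one or more word characters
    (?:        # followed by one or more of (non-capturing group):
        -      #   a single hyphen
        \w+    #   followed by one or more word characters
    )+
""", re.VERBOSE)

print(HYPHENATED.findall("foo-bar baz x-y-z a-"))  # ['foo-bar', 'x-y-z']
```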