Problem description
I'm writing a Python regex that looks through a text document for quoted strings (quotes of airline pilots recorded from black boxes). I started by trying to write a regex with the following rules:
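Return what is between the quotes.
If it opens with a single quote, only return it if it closes with a single quote.
If it opens with a double quote, only return it if it closes with a double quote.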
For instance, I don't want to match "hi there' or 'hi there", but I do want to match "hi there" and 'hi there'.
I use a testing page which contains things like:
CA "Runway 18, wind 230 degrees, five knots, altimeter 30."
AA "Roger that"
18:24:10 [flap lever moving into detent]
ST: "Some passenger's pushing a switch. May I?"
So I decided to start simple:
re.findall('("|\').*?\\1', page)
########## /("|').*?\1/ <-- raw regex I think I'm going for.
This regex acts very unexpectedly.
I thought it would:
- ("|') match a single or double quote and save it as backreference \1.
- .*? match a non-greedy wildcard.
- \1 match whatever was captured in backreference \1 (the first step).
Instead, it returns an array of quotes but never anything else.
['"', '"', "'", "'"]
I'm really confused because the equivalent (afaik) regex works just fine in Vim.
\("\|'\).\{-}\1/)
My question is this:
Why does it return only what is inside the parentheses as the match? Is this a flaw in my understanding of backreferences? If so, then why does it work in Vim?
And how do I write the regex I'm looking for in Python?
Thanks for your help!
Recommended answer
Read the documentation. re.findall returns the groups, if there are any. If you want the entire match, you must wrap the whole pattern in a group, or use re.finditer. See this question.
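A minimal sketch of the difference, assuming a shortened copy of the test page from the question (the variable names here are illustrative and not from the original post). It shows re.findall returning only the captured quote character, then three ways to get the quoted text: re.finditer with group(0), an outer group around the whole pattern, and a second group that captures just the content.

```python
import re

# Shortened stand-in for the test page in the question.
page = '''CA "Runway 18, wind 230 degrees, five knots, altimeter 30."
AA "Roger that"
18:24:10 [flap lever moving into detent]
ST: "Some passenger's pushing a switch. May I?"
'''

pattern = re.compile(r"""("|').*?\1""")

# With one capturing group, findall returns only that group's text,
# i.e. the opening quote character of each match.
quotes_only = pattern.findall(page)

# finditer yields match objects; group(0) is always the entire match.
full_matches = [m.group(0) for m in pattern.finditer(page)]

# Alternative: wrap everything in an outer group. The quote group becomes
# group 2, so the backreference changes to \2, and findall returns tuples.
wrapped = re.findall(r"""(("|').*?\2)""", page)

# To get the text without the surrounding quotes, capture it separately.
contents = [m.group(2) for m in re.finditer(r"""("|')(.*?)\1""", page)]

print(quotes_only)
print(full_matches)
print(contents)
```

The backreference still enforces matching quote types in every variant, so the apostrophe in "passenger's" does not end the double-quoted string early.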