Problem description
I'm currently writing a regular expression to find the units and size (or it could work as dimensions) in a string. For example: "Product: A, 2 x 3.5 gallon bottles"
For simplicity, I'm removing all whitespace, so this becomes:
"Product:A,2x3.5gallonbottles"
My regex is as follows:
numAndSize = re.compile(r'\d+[xX]\d+(\.\d+)?')
But when I try to use findall, this happens:
In [47]: numAndSize.findall("Product:A,2x3.5gallonbottles")
Out[47]: ['.5']
I only get '.5' from this string, instead of the entire match.
Using search and group, however, works as expected:
In [50]: numAndSize.search("Product:A,2x3.5gallonbottles").group(0)
Out[50]: '2x3.5'
From there, I tried changing my regex to not include the optional decimal, and ran findall on that.
In [51]: numAndSize = re.compile(r'\d+[xX]\d+')
In [52]: numAndSize.findall("Product:A,2x3.5gallonbottles")
Out[52]: ['2x3']
Is there a reason behind this behavior? For my purposes I can certainly use .search().group(), but I personally like findall since the output gives back a lot more information in a clean format.
If the regular expression contains any capturing groups, re.findall() returns those groups instead of the entire match: a list of strings for a single group, a list of tuples for several. To get the entire match, use a non-capturing group:
>>> numAndSize = re.compile(r'\d+[xX]\d+(?:\.\d+)?')
>>> numAndSize.findall("Product:A,2x3.5gallonbottles")
['2x3.5']
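If the capturing group has to stay in the pattern, re.finditer() is another way to get whole matches, since every match object exposes group(0). A minimal sketch reusing the pattern from the question; the variable names are only illustrative:

```python
import re

# Pattern from the question, with the capturing group left in place.
num_and_size = re.compile(r'\d+[xX]\d+(\.\d+)?')
text = "Product:A,2x3.5gallonbottles"

# findall() returns only the contents of the single capturing group.
groups_only = num_and_size.findall(text)

# finditer() yields match objects; group(0) is always the whole match.
whole_matches = [m.group(0) for m in num_and_size.finditer(text)]

print(groups_only)    # ['.5']
print(whole_matches)  # ['2x3.5']
```

This keeps the list-style output of findall while still giving access to the individual groups through each match object.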
Or you could take advantage of this behavior to have it return a tuple of the dimensions (or units, or whatever they are):
>>> numAndSize = re.compile(r'(\d+)[xX](\d+(?:\.\d+)?)')
>>> numAndSize.findall("Product:A,2x3.5gallonbottles")
[('2', '3.5')]
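The tuples contain strings, so a conversion step is usually needed before doing any arithmetic with them. A small sketch, assuming the first group is a count and the second a size; the second product in the sample string and the variable names are made up for illustration:

```python
import re

# Two capturing groups: count, then size with an optional decimal part.
num_and_size = re.compile(r'(\d+)[xX](\d+(?:\.\d+)?)')
text = "Product:A,2x3.5gallonbottles;Product:B,10X12ouncecans"

# findall() yields one (count, size) tuple of strings per match;
# convert them to numbers while unpacking.
pairs = [(int(count), float(size)) for count, size in num_and_size.findall(text)]

print(pairs)  # [(2, 3.5), (10, 12.0)]
```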