I am trying to use a regular expression to capture groups of data from the log below. The pattern is
<item> : <key> = <value> , <key> = <value>, ..., <key> = <value>
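The expression
([#\w\d]*?)[\s]*=[\s]*([.\w\d]*)
captures the <key> group and the <value> group, but I also want to capture the <item> group, so I wrapped the expression above in a group and repeated it with {n}:
([\w]*):([\s]*(([#\w\d]*?)[\s]*=[\s]*([.\w\d]*)),*){1,}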
20141207,07:15:52,0,>>RATIO: Casher# = 30, Value = 2.579, Unit = Ratio, Error = N
20141207,07:15:52,0,>>RATIO: Casher# = 31, Value = 4.509, Unit = Ratio, Error = N
20141207,07:15:52,0,>>RATIO: Casher# = 32, Value = 3.735, Unit = Ratio, Error = N
20141207,07:15:52,0,>>RATIO: Casher# = 33, Value = 2.401, Unit = Ratio, Error = N
20141207,07:15:52,0,>>CUSTOMER: Casher# = 30, Value = 50, Unit = Count
20141207,07:15:52,0,>>CUSTOMER: Casher# = 31, Value = 6, Unit = Count
20141207,07:15:52,0,>>CUSTOMER: Casher# = 32, Value = 88, Unit = Count
20141207,07:15:52,0,>>CUSTOMER: Casher# = 33, Value = 33, Unit = Count
Clearly the result is not what I expected. Could anyone give me some hints? I will eventually implement this in Python. Thanks.
Best answer
(?<=>>)(\w+):|([\w#]+)\s*=\s*(\S+?)(?:,|\s)
Try this and grab the captures. See the demo:
https://regex101.com/r/fA6wE2/1
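Since the question mentions Python, here is a minimal sketch of how the captures from this pattern might be collected with `re.finditer`. The helper name `parse_log` and the two sample lines are assumptions made for illustration; only the pattern itself comes from the answer.

```python
import re

# The answer's pattern, unchanged: group 1 is <item>,
# groups 2 and 3 are <key> and <value>.
PATTERN = re.compile(r"(?<=>>)(\w+):|([\w#]+)\s*=\s*(\S+?)(?:,|\s)")

# Illustrative sample shaped like the log in the question (an assumption).
LOG = (
    "20141207,07:15:52,0,>>RATIO: Casher# = 30, Value = 2.579, Unit = Ratio, Error = N\n"
    "20141207,07:15:52,0,>>CUSTOMER: Casher# = 30, Value = 50, Unit = Count\n"
)


def parse_log(text):
    """Return a list of (item, {key: value}) tuples, one per log record."""
    records = []
    for m in PATTERN.finditer(text):
        item, key, value = m.groups()
        if item is not None:
            # First alternative matched: a new <item> starts a new record.
            records.append((item, {}))
        elif records:
            # Second alternative matched: attach the pair to the latest item.
            records[-1][1][key] = value
    return records


for item, pairs in parse_log(LOG):
    print(item, pairs)
```

One caveat with the pattern as written: each value must be followed by a comma or a whitespace character, so the last value in the text is only captured if the text ends with a newline (or other whitespace).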
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    >>                       '>>'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    [\w#]+                   any character of: word characters (a-z,
                             A-Z, 0-9, _), '#' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  =                        '='
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    \S+?                     non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             least amount possible))
--------------------------------------------------------------------------------
  )                        end of \3
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    ,                        ','
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  )                        end of grouping
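For context on why the repeated-group attempt in the question falls short: in Python's `re` module, a capturing group inside a repeated part of the pattern keeps only its last match, so the `{1,}` version reports just the final `<key> = <value>` pair. The sketch below shows that behavior and then a two-step alternative (split off the item, then run `findall` on the rest). This alternative is not the answer's method; the names `LINE`, `PAIR`, `parse_line`, and the sample line are assumptions for illustration.

```python
import re

# The repeated-group pattern from the question, unchanged.
ATTEMPT = re.compile(r"([\w]*):([\s]*(([#\w\d]*?)[\s]*=[\s]*([.\w\d]*)),*){1,}")

# Illustrative sample line shaped like the log in the question (an assumption).
line = "20141207,07:15:52,0,>>RATIO: Casher# = 30, Value = 2.579, Unit = Ratio, Error = N"

m = ATTEMPT.search(line)
# The item is captured, but groups 4 and 5 hold only the last repetition.
print(m.group(1), m.group(4), m.group(5))

# Two-step alternative: capture the item and the remainder of the line,
# then extract every key/value pair from the remainder.
LINE = re.compile(r">>(\w+):(.*)")
PAIR = re.compile(r"([#\w]+)\s*=\s*([.\w]+)")


def parse_line(text):
    """Return (item, {key: value}) for one log line, or None if no item is found."""
    found = LINE.search(text)
    if found is None:
        return None
    item, rest = found.groups()
    return item, dict(PAIR.findall(rest))


print(parse_line(line))
```

Because `PAIR` does not require a trailing comma or whitespace after the value, this variant also picks up the last pair on a line that has no trailing newline.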