I have some text files containing multi-line blocks, for example:

2011/01/01 13:13:13,<AB>, Some Certain Text,=,
[
certain text
         [
                  0: 0 0 0 0 0 0 0 0
                  8: 0 0 0 0 0 0 0 0
                 16: 0 0 0 9 343 3938 9433 8756
                 24: 6270 4472 3182 2503 1768 1140 836 496
                 32: 326 273 349 269 144 121 94 82
                 40: 64 80 66 59 56 47 50 46
                 48: 64 35 42 53 42 40 41 34
                 56: 35 41 39 39 47 30 30 39
                 Total count: 12345
        ]
    certain text
]
some text
2011/01/01 14:14:14,<AB>, Some Certain Text,=,
[
 certain text
   [
              0: 0 0 0 0 0 0 0 0
              8: 0 0 0 0 0 0 0 0
             16: 0 0 0 4 212 3079 8890 8941
             24: 6177 4359 3625 2420 1639 974 594 438
             32: 323 286 318 296 206 132 96 85
             40: 65 73 62 53 47 55 49 52
             48: 29 44 44 41 43 36 50 36
             56: 40 30 29 40 35 30 25 31
             64: 47 31 25 29 24 30 35 31
             72: 28 31 17 37 35 30 20 33
             80: 28 20 37 25 21 23 25 36
             88: 27 35 22 23 15 24 34 28
             Total count: 123456
    ]
    certain text
some text
]
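Blocks like these, of varying length, are scattered through the text. I want to read all the numbers after the ":" and save them in separate arrays, one array per block.
In this case, there would be two arrays.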

I found that LPeg might be a lightweight way to do this, but I am completely new to PEGs and LPeg. Please help!

Best answer

An LPeg version:

local lpeg            = require "lpeg"
local lpegmatch       = lpeg.match
local C, Ct, P, R, S  = lpeg.C, lpeg.Ct, lpeg.P, lpeg.R, lpeg.S
local Cg              = lpeg.Cg

local data_to_arrays

do
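  -- Lexical building blocks.
  local colon    = P":"
  local lbrak    = P"["
  local rbrak    = P"]"
  local digits   = R"09"^1
  local eol      = P"\n\r" + P"\r\n" + P"\n" + P"\r"
  local ws       = S" \t\v"
  local optws    = ws^0
  -- Capture one integer as a Lua number, then skip trailing blanks.
  local getnum   = C(digits) / tonumber * optws
  -- A block opens with "[" followed only by optional blanks up to the
  -- end of the line, and closes with "]".
  local start    = lbrak * optws * eol
  local stop     = optws * rbrak
  -- A data line: "<offset>:" followed by exactly eight integers.
  local line     = optws * digits * colon * optws
                 * getnum * getnum * getnum * getnum
                 * getnum * getnum * getnum * getnum
                 * eol
  -- Optional "Total count:" trailer.
  local count    = optws * P"Total count:" * optws * getnum * eol
  -- As written, the total count is appended as the last array element.
  -- Use the commented-out variant to store it in the named field "count"
  -- instead, which is what the ar.count check in the test script expects.
  local inner    = Ct(line^1 * count^-1)
--local inner    = Ct(line^1 * Cg(count, "count")^-1)
  local array    = start * inner * stop
  -- Scan the whole input, collecting every block and skipping anything else.
  local extract  = Ct((array + 1)^0)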

  data_to_arrays = function (data)
    return lpegmatch (extract, data)
  end
end

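This really only works when there are exactly eight integers on
each line of a data block.
Depending on how well-formed your input is, this may be a curse or a
blessing ;-)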
And a test file:
data = [[
some text
[
some text
         [
                  0: 0 0 0 0 0 0 0 0
                  8: 0 0 0 0 0 0 0 0
                 16: 0 0 0 9 343 3938 9433 8756
                 24: 6270 4472 3182 2503 1768 1140 836 496
                 32: 326 273 349 269 144 121 94 82
                 40: 64 80 66 59 56 47 50 46
                 48: 64 35 42 53 42 40 41 34
                 56: 35 41 39 39 47 30 30 39
                 Total count: 12345
        ]
    some text
]
some text
[
 some text
   [
              0: 0 0 0 0 0 0 0 0
              8: 0 0 0 0 0 0 0 0
             16: 0 0 0 4 212 3079 8890 8941
             24: 6177 4359 3625 2420 1639 974 594 438
             32: 323 286 318 296 206 132 96 85
             40: 65 73 62 53 47 55 49 52
             48: 29 44 44 41 43 36 50 36
             56: 40 30 29 40 35 30 25 31
             64: 47 31 25 29 24 30 35 31
             72: 28 31 17 37 35 30 20 33
             80: 28 20 37 25 21 23 25 36
             88: 27 35 22 23 15 24 34 28
    ]
    some text
some text
]
]]

local arrays = data_to_arrays (data)

for n = 1, #arrays do
  local ar   = arrays[n]
  local size = #ar
  io.write (string.format ("[%d] = { --[[size: %d items]]\n  ", n, size))
  for i = 1, size do
    io.write (string.format ("%d,%s", ar[i], (i % 5 == 0) and "\n  " or " "))
  end
  if ar.count ~= nil then
    io.write (string.format ("\n  [\"count\"] = %d,", ar.count))
  end
  io.write (string.format ("\n}\n"))
end
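
If the eight-integers-per-line restriction mentioned above turns out to be more curse than blessing, the line rule can be loosened. What follows is only a minimal sketch and not part of the original answer: it assumes every data line holds one or more integers after the offset, and the names extract_flexible and sample are made up for illustration. It also stores the total in the named field count rather than in the array part.

```lua
local lpeg = require "lpeg"
local C, Cg, Ct, P, R, S = lpeg.C, lpeg.Cg, lpeg.Ct, lpeg.P, lpeg.R, lpeg.S

local digits = R"09"^1
local eol    = P"\r\n" + P"\n\r" + P"\n" + P"\r"
local optws  = S" \t\v"^0
-- Capture one integer as a Lua number, then skip trailing blanks.
local getnum = (C(digits) / tonumber) * optws

-- A data line: "<offset>:" followed by one or more integers
-- (assumption: the per-line count may vary).
local line   = optws * digits * P":" * optws * getnum^1 * eol
-- Optional trailer; Cg with a name makes Ct store it as the field "count".
local count  = optws * P"Total count:" * optws * Cg(getnum, "count") * eol
local inner  = Ct(line^1 * count^-1)
local array  = P"[" * optws * eol * inner * optws * P"]"
-- Collect every block, skipping any other character.
local extract_flexible = Ct((array + 1)^0)

-- Hypothetical sample with three integers on one line and two on the next.
local sample = "text\n[\n  0: 1 2 3\n  3: 4 5\n  Total count: 15\n]\nmore text\n"
local result = lpeg.match(extract_flexible, sample)
print(#result, #result[1], result[1].count)
```

Because getnum can never match the empty string, getnum^1 is safe to repeat. The trade-off is that a truncated line no longer causes the block to be rejected, so malformed input is less likely to be noticed.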

For parsing multi-line blocks in Lua with LPeg, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19410454/
