Regex match in a log file, returning dynamic content above and below the match

Problem description

I have some catch-all log files in a format as follows:

timestamp event summary
foo details
account name: userA
bar more details
timestamp event summary
baz details
account name: userB
qux more details
timestamp etc.

I would like to search the log file for userB, and if found, echo from the preceding timestamp down to (but not including) the following timestamp. There will likely be several events matching my search. It would be nice to echo some sort of --- start --- and --- end --- surrounding each match.

This would be perfect for pcregrep -M, right? The problem is that GnuWin32's pcregrep crashes when searching large files with multiline regexps, and these catch-all logs can be 100 MB or more.

What I've tried

My hackish workaround thus far involves using grep -B15 -A30 to find matching lines and print the surrounding content, then piping that more manageable chunk into pcregrep for polishing. The problem is that some events are fewer than ten lines long while others are 30 or more, so a fixed amount of context doesn't fit every event, and I'm getting some unexpected results when the shorter events are encountered.

:parselog <username> <logfile>

set silent=1
set count=0
set deez=20\d\d-\d\d-\d\d \d\d:\d\d:\d\d
echo Searching %~2 for records containing %~1...

for /f "delims=" %%I in (
    'grep -P -i -B15 -A30 ":\s+\b%~1\b(@mydomain\.ext)?$" "%~2" ^| pcregrep -M -i "^%deez%(.|\n)+?\b%~1\b(@mydomain\.ext|\r?\n)(.|\n)+?\n%deez%" 2^>NUL'
) do (
    echo(%%I| findstr "^20[0-9][0-9]-[0-9][0-9]-[0-9][0-9].[0-9][0-9]:[0-9][0-9]:[0-9][0-9]" >NUL && (
        if defined silent (
            set silent=
            set found=1
            set /a "count+=1"
            echo;
            echo ---------------start of record !count!-------------
        ) else (
            set silent=1
            echo ----------------end of record !count!--------------
            echo;
        )
    )
    if not defined silent echo(%%I
)

goto :EOF

Is there a better way to do this? I've come across an awk command that looked interesting, something like:

awk "/start pattern/,/end pattern/" logfile

... but it would need to match a middle pattern as well. Unfortunately, I'm not that familiar with awk syntax. Any suggestions?
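For context on why a plain range pattern falls short here: in awk, if the line that starts a range also matches the end pattern, the range covers just that one line. Every record in this log both starts with a timestamp and ends just before the next timestamp, so a range like /^timestamp/,/^timestamp/ degenerates to printing the timestamp lines alone. A range also decides line by line as it reads, so it cannot hold a record back until it knows whether a middle pattern such as the user name appears. The demonstration below is a sketch on the simplified format from the top of the question, where the literal word "timestamp" stands in for a real timestamp; the temporary file is illustrative.

```shell
# Simplified sample from the top of the question (illustrative file).
sample=$(mktemp)
cat > "$sample" <<'EOF'
timestamp event summary
foo details
account name: userA
bar more details
timestamp event summary
baz details
account name: userB
qux more details
EOF

# The start and end patterns are identical, so every range opens and closes
# on the same line: only the two "timestamp event summary" lines are printed,
# and the userB details never appear.
awk '/^timestamp/,/^timestamp/' "$sample"

rm -f "$sample"
```

Buffering each record and testing it once the next timestamp arrives avoids both problems.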


Ed Morton suggested that I supply some example log entries and the expected output.

Example catch-all

2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730158    Mon Mar 25 08:02:28 2013    529 Security    NT AUTHORITY\SYSTEM N/A Audit Failure   dc3 2   Logon Failure:

    Reason:     Unknown user name or bad password

    User Name:  user5f

    Domain:     MYDOMAIN

    Logon Type: 3

    Logon Process:  Advapi

    Authentication Package: Negotiate

    Workstation Name:   dc3

    Caller User Name:   dc3$

    Caller Domain:  MYDOMAIN

    Caller Logon ID:    (0x0,0x3E7)

    Caller Process ID:  400

    Transited Services: -

    Source Network Address: 169.254.7.86

    Source Port:    40838
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730159    Mon Mar 25 08:02:29 2013    680 Security    NT AUTHORITY\SYSTEM N/A Audit Failure   dc3 9   Logon attempt by:   MICROSOFT_AUTHENTICATION_PACKAGE_V1_0

Logon account:  USER6Q

Source Workstation: dc3

Error Code: 0xC0000234
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730160    Mon Mar 25 08:02:29 2013    539 Security    NT AUTHORITY\SYSTEM N/A Audit Failure   dc3 2   Logon Failure:

    Reason:     Account locked out

    User Name:  [email protected]

    Domain: MYDOMAIN

    Logon Type: 3

    Logon Process:  Advapi

    Authentication Package: Negotiate

    Workstation Name:   dc3

    Caller User Name:   dc3$

    Caller Domain:  MYDOMAIN

    Caller Logon ID:    (0x0,0x3E7)

    Caller Process ID: 400

    Transited Services: -

    Source Network Address: 169.254.7.89

    Source Port:    55314
2013-03-25 08:02:32 Auth.Notice 169.254.5.62    Mar 25 08:36:38 DC4.mydomain.tld MSWinEventLog  5   Security    201326798   Mon Mar 25 08:36:37 2013    4624    Microsoft-Windows-Security-Auditing     N/A Audit Success   DC4.mydomain.tld    12544   An account was successfully logged on.

Subject:
    Security ID:        S-1-0-0
    Account Name:       -
    Account Domain:     -
    Logon ID:       0x0

Logon Type:         3

New Logon:
    Security ID:        S-1-5-21-606747145-1409082233-725345543-160838
    Account Name:       DEPTACCT16$
    Account Domain:     MYDOMAIN
    Logon ID:       0x1158e6012c
    Logon GUID:     {BCC72986-82A0-4EE9-3729-847BA6FA3A98}

Process Information:
    Process ID:     0x0
    Process Name:       -

Network Information:
    Workstation Name:
    Source Network Address: 169.254.114.62
    Source Port:        42183

Detailed Authentication Information:
    Logon Process:      Kerberos
    Authentication Package: Kerberos
    Transited Services: -
    Package Name (NTLM only):   -
    Key Length:     0

This event is generated when a logon session is created. It is generated on the computer that was accessed.

The subject fields indicate...
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730162    Mon Mar 25 08:02:30 2013    675 Security    NT AUTHORITY\SYSTEM N/A Audit Failure   dc3 9   Pre-authentication failed:

    User Name:  USER8Y

    User ID:        %{S-1-5-21-606747145-1409082233-725345543-3904}

    Service Name:   krbtgt/MYDOMAIN

    Pre-Authentication Type:    0x0

    Failure Code:   0x19

    Client Address: 169.254.87.158
2013-03-25 08:02:32 Auth.Critical   etc.

Example command

call :parselog user6q \\path\to\catch-all.log

Expected result

---------------start of record 1-------------
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730159    Mon Mar 25 08:02:29 2013    680 Security    NT AUTHORITY\SYSTEM N/A Audit Failure   dc3 9   Logon attempt by:   MICROSOFT_AUTHENTICATION_PACKAGE_V1_0

Logon account:  USER6Q

Source Workstation: dc3

Error Code: 0xC0000234
---------------end of record 1-------------


---------------start of record 2-------------
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730160    Mon Mar 25 08:02:29 2013    539 Security    NT AUTHORITY\SYSTEM N/A Audit Failure   dc3 2   Logon Failure:

    Reason:     Account locked out

    User Name:  [email protected]

    Domain: MYDOMAIN

    Logon Type: 3

    Logon Process:  Advapi

    Authentication Package: Negotiate

    Workstation Name:   dc3

    Caller User Name:   dc3$

    Caller Domain:  MYDOMAIN

    Caller Logon ID:    (0x0,0x3E7)

    Caller Process ID: 400

    Transited Services: -

    Source Network Address: 169.254.7.89

    Source Port:    55314
---------------end of record 2-------------
Solution

This is all you need with GNU awk (for IGNORECASE):

$ cat tst.awk
function prtRecord() {
    if (record ~ regexp) {
        printf "-------- start of record %d --------%s", ++numRecords, ORS
        printf "%s", record
        printf "--------- end of record %d ---------%s%s", numRecords, ORS, ORS
    }
    record = ""
}
BEGIN{ IGNORECASE=1 }
/^[[:digit:]]+-[[:digit:]]+-[[:digit:]]+/ { prtRecord() }
{ record = record $0 ORS }
END { prtRecord() }

or with any awk:

$ cat tst.awk
function prtRecord() {
    if (tolower(record) ~ tolower(regexp)) {
        printf "-------- start of record %d --------%s", ++numRecords, ORS
        printf "%s", record
        printf "--------- end of record %d ---------%s%s", numRecords, ORS, ORS
    }
    record = ""
}
/^[[:digit:]]+-[[:digit:]]+-[[:digit:]]+/ { prtRecord() }
{ record = record $0 ORS }
END { prtRecord() }

Either way you'd run it on UNIX as:

$ awk -v regexp=user6q -f tst.awk file

I don't know the Windows syntax but I expect it's very similar if not identical.
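To try the portable script without a separate tst.awk file, one option is to inline it in a shell function. The function name extract_records and the trimmed two-record sample below are hypothetical, chosen only for illustration. Since only the current record is ever buffered, memory use is bounded by the largest single event rather than by the size of the log, which matters for catch-all files of 100 MB or more. The sketch uses the tolower() version because IGNORECASE is GNU-specific: in other awks it is just an ordinary variable, so setting it has no effect and the match silently becomes case-sensitive.

```shell
# Hypothetical wrapper around the answer's portable ("any awk") script.
# Usage: extract_records PATTERN LOGFILE
extract_records() {
    awk -v regexp="$1" '
        function prtRecord() {
            # Case-insensitive test of the whole buffered record.
            if (tolower(record) ~ tolower(regexp)) {
                printf "-------- start of record %d --------%s", ++numRecords, ORS
                printf "%s", record
                printf "--------- end of record %d ---------%s%s", numRecords, ORS, ORS
            }
            record = ""
        }
        # A line starting with digits-digits-digits begins a new record,
        # so the previous record is tested and flushed first.
        /^[[:digit:]]+-[[:digit:]]+-[[:digit:]]+/ { prtRecord() }
        # Append every line, including the timestamp line, to the buffer.
        { record = record $0 ORS }
        # Test the final record at end of input.
        END { prtRecord() }
    ' "$2"
}

# Illustrative two-record sample, trimmed from the question's log.
log=$(mktemp)
cat > "$log" <<'EOF'
2013-03-25 08:02:32 Auth.Critical Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0
Logon account:  USER6Q
Error Code: 0xC0000234
2013-03-25 08:02:32 Auth.Critical Pre-authentication failed:
    User Name:  USER8Y
EOF

extract_records user6q "$log"
rm -f "$log"
```

Called as extract_records user6q catch-all.log, it should print the same start/end-delimited records as awk -v regexp=user6q -f tst.awk catch-all.log.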

Note the use of tolower() in the script to lowercase both sides of the comparison so the match is case-insensitive. If you can instead pass in a search regexp in the correct case, then you don't need to call tolower() on either side of the comparison. It's no big deal either way; dropping the calls might just speed the script up slightly.
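Following that note, a case-sensitive sketch simply drops the tolower() calls and otherwise keeps the record logic unchanged. It assumes the search term is passed in the same case as it appears in the log (USER6Q rather than user6q); the function name extract_records_cs and the sample lines are hypothetical.

```shell
# Hypothetical case-sensitive variant: the pattern must match the log's case.
# Usage: extract_records_cs PATTERN LOGFILE
extract_records_cs() {
    awk -v regexp="$1" '
        function prtRecord() {
            # No tolower(): the pattern is applied to the record exactly as given.
            if (record ~ regexp) {
                printf "-------- start of record %d --------%s", ++numRecords, ORS
                printf "%s", record
                printf "--------- end of record %d ---------%s%s", numRecords, ORS, ORS
            }
            record = ""
        }
        /^[[:digit:]]+-[[:digit:]]+-[[:digit:]]+/ { prtRecord() }
        { record = record $0 ORS }
        END { prtRecord() }
    ' "$2"
}

# USER6Q matches the first record; user6q would match nothing here.
log=$(mktemp)
printf '%s\n' \
    '2013-03-25 08:02:32 Auth.Critical Logon attempt by: MICROSOFT_AUTHENTICATION_PACKAGE_V1_0' \
    'Logon account:  USER6Q' \
    '2013-03-25 08:02:32 Auth.Critical Pre-authentication failed:' \
    '    User Name:  USER8Y' > "$log"
extract_records_cs USER6Q "$log"
rm -f "$log"
```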

$ awk -v regexp=user6q -f tst.awk file
-------- start of record 1 --------
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730159    Mon Mar 25 08:02:29 2013    680 Security    NT AUTHORITY\SYSTEM N/A Audit Failure   dc3 9   Logon attempt by:   MICROSOFT_AUTHENTICATION_PACKAGE_V1_0

Logon account:  USER6Q

Source Workstation: dc3

Error Code: 0xC0000234
--------- end of record 1 ---------

-------- start of record 2 --------
2013-03-25 08:02:32 Auth.Critical   169.254.8.110   Mar 25 08:02:32 dc3 MSWinEventLog   2   Security    11730160    Mon Mar 25 08:02:29 2013    539 Security    NT AUTHORITY\SYSTEM N/A Audit Failure   dc3 2   Logon Failure:

    Reason:     Account locked out

    User Name:  [email protected]

    Domain: MYDOMAIN

    Logon Type: 3

    Logon Process:  Advapi

    Authentication Package: Negotiate

    Workstation Name:   dc3

    Caller User Name:   dc3$

    Caller Domain:  MYDOMAIN

    Caller Logon ID:    (0x0,0x3E7)

    Caller Process ID: 400

    Transited Services: -

    Source Network Address: 169.254.7.89

    Source Port:    55314
--------- end of record 2 ---------


09-05 18:37