OK, I have some text:

=== Blah 1 ===
::Junk I wish: 2 Ignore <br/>
::More Junk: 1.2-2.7 <br/>
::ABC: [http://www.google.com (STUFF/I/Want)]<br/>
::More2: Ignore<br/>
::More Stuf 2 Ignore: N/A<br/>

=== Blah 2 ===
::Junk I wish: More 2 Ignore <br/>
::More Junk: 1.2-2.7 <br/>
::ABC: [http://www.google.com (Other/STUFF/I/Want)]<br/>
::More2: More Ignore<br/>
::More Stuf 2 Ignore: More N/A<br/>

I want to output:
Blah 1, (STUFF/I/Want)
Blah 2, (Other/STUFF/I/Want)

I figured out how to pull out the lines that contain the parts I want:
gawk '/===/ {print } /ABC/ {print $3}' file_name

This prints the following:
=== Blah 1 ===
(STUFF/I/Want)]<br/>
=== Blah 2 ===
(Other/STUFF/I/Want)]<br/>

What I can't figure out is how to strip the other characters I don't want and put each title and its value on a single line.

Best answer

One way.
Content of script.awk:

BEGIN {
    ## Characters to separate output fields
    OFS = ", "
}

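## When the first field consists only of equal signs, remove every equal
## sign (and the whitespace around it) from the line and save the rest as
## the title. [ \t] is used instead of \s, which is a GNU Awk extension.
$1 ~ /^=+$/ {
    gsub( /[ \t]*=[ \t]*/, "", $0 )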
    title = $0
    next
}

## For the ABC line, split the line at the opening and closing parentheses
## and print the title followed by the second piece (the text between them).
$1 ~ /ABC/ {

    ## For GNU Awk (the fourth argument of split() stores the matched separators)
    #split( $0, abc_line, /(\()|(\))/, seps )
    #printf "%s%s%s%s%s\n", title, OFS, seps[1], abc_line[2], seps[2]

    ## For any other Awk (the parentheses are added back by hand)
    split( $0, abc_line, /(\()|(\))/ )
    printf "%s%s(%s)\n", title, OFS, abc_line[2]

}
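
To see why the script prints abc_line[2], here is a small sketch of what split() does with the ABC line when the separator is either parenthesis. The function name split_demo and the variable names are hypothetical, chosen only for this illustration; the sample line is copied from the question. It uses the bracket expression [()] in place of the alternation in the script, which I believe matches the same two characters.

```shell
# Sample line copied from the question's input.
line='::ABC: [http://www.google.com (STUFF/I/Want)]<br/>'

# Hypothetical helper: split each input line at "(" or ")" and
# print every resulting piece with its index.
split_demo() {
    awk '{
        n = split($0, parts, /[()]/)
        for (i = 1; i <= n; i++)
            printf "%d: %s\n", i, parts[i]
    }'
}

printf '%s\n' "$line" | split_demo
```

The line is cut into three pieces, and the second one is STUFF/I/Want, which is why the script wraps abc_line[2] in parentheses again when printing.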

Run the script like this:
awk -f script.awk infile

It yields:
Blah 1, (STUFF/I/Want)
Blah 2, (Other/STUFF/I/Want)
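
A more compact variant is also possible. The sketch below is not the answer's script but an alternative under the same assumptions about the input: headings look like "=== Title ===" and the wanted value is the only parenthesized text on the ::ABC: line. It sets the field separator to either parenthesis, so the wanted text is simply $2. The function name extract_titles is hypothetical.

```shell
# Hypothetical helper: read the wiki-style text on stdin and print
# "Title, (value)" for every ::ABC: line.
extract_titles() {
    awk -F'[()]' '
        # Heading line: strip the leading and trailing runs of "=".
        /^=+ / {
            title = $0
            sub(/^=+[ \t]*/, "", title)
            sub(/[ \t]*=+[ \t]*$/, "", title)
            next
        }
        # ABC line: with "(" and ")" as separators, $2 is the text between them.
        /^::ABC:/ { print title ", (" $2 ")" }
    '
}

# Usage with a shortened copy of the input from the question.
extract_titles <<'EOF'
=== Blah 1 ===
::More Junk: 1.2-2.7 <br/>
::ABC: [http://www.google.com (STUFF/I/Want)]<br/>
=== Blah 2 ===
::ABC: [http://www.google.com (Other/STUFF/I/Want)]<br/>
EOF
```

This avoids both \s and the four-argument split(), so it should not depend on GNU Awk. It would need adjusting if the URL itself contained parentheses.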

Regarding "regex - gawk: extract text and put it on the same line", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/15326057/
