linux - How to parse words in awk?

I would like to know how to parse paragraphs that look like this:

Text Text Text Text Text Text Text Text Text Text Text Text Text Text Text Text
Text Text Text Text Text Text Text Text Text Text Text Text Text Text Text Text
And many other lines with text that I do not need

                                    * * * * * * *

Autolisp - Dialect of LISP used by the Autocad CAD package, Autodesk,
Sausalito, CA.

CPL -

  1. Combined Programming Language.  U Cambridge and U London.  A very
complex language, syntactically based on ALGOL-60, with a pure functional
subset.

Modula-3* - Incoprporation of Modula-2* ideas into Modula-3.  "Modula-3*:

so that an awk statement gives me the following output:

Autolisp
CPL
Modula-3*

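I tried the statements below because the file I want to filter is huge. It is a list of all programming languages that exist so far, and basically every entry follows the same pattern as the excerpt above.

Statements I have tried so far: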

BEGIN{$0 !~ /^ / && NF == 2 && $2 == "-"} { print $1 }

BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"} /^FLIP -/{print $1,$3}

BEGIN{RS=""; FS=OFS="\n"} {print $1 NF-1}

BEGIN{NF == 2 && $2 == "-" } { print $1 }

BEGIN { RS = "" } { print $1 }

The statements that have come closest to working so far are:

BEGIN { RS = "\n\n"; FS = " - " }
{ print $1 }

awk -F " - " "/ - /{ print $1 }" file.txt

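But they still print lines I do not need, or skip lines I do need.

Thanks for your help and replies!
I have been racking my brain over this for days, since I am new to AWK programming.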

Best answer

The default FS should work fine. To avoid any duplicate lines, you can pipe the output to sort -u:

$ gawk '$2 == "-"  { print $1 }' file | sort -u
Autolisp
CPL
Modula-3*

It may not filter out everything you want, but you can keep adding rules until the bad data is gone.
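As an illustration of adding one more rule, here is a minimal sketch that also rejects indented lines. Because the default FS discards leading whitespace, an indented continuation line can still have "-" as its second field. The sample input is hypothetical (the question's excerpt plus one made-up indented line), and the extra condition is only one possible rule, not necessarily the one the real file needs.

```shell
# Hypothetical sample: the question's excerpt plus one indented line
# whose second field also happens to be "-".
sample=$(mktemp)
cat > "$sample" <<'EOF'
Text Text Text Text
Autolisp - Dialect of LISP used by the Autocad CAD package, Autodesk,
Sausalito, CA.

CPL -

  1. Combined Programming Language.  U Cambridge and U London.
  see - this indented continuation line is not an entry

Modula-3* - Incoprporation of Modula-2* ideas into Modula-3.
EOF

# Rule 1: the second field is a lone "-".
# Rule 2 (added): the line must not start with a space or a tab.
result=$(awk '$2 == "-" && $0 !~ /^[ \t]/ { print $1 }' "$sample" | sort -u)
printf '%s\n' "$result"
rm -f "$sample"
```

With only the first rule, the indented line would contribute "see" to the output; the second rule drops it.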

Alternatively, you can avoid sort altogether by using an associative array:

$ gawk '$2=="-" { arr[$1] } END { for (key in arr) print key}' file
Autolisp
CPL
Modula-3*
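Note that awk does not specify the order in which for (key in arr) visits the keys, so the names may come out in a different order than shown. If first-seen order matters, a common variation is to print each name the first time it appears. The sketch below assumes a small hypothetical input with one repeated entry; the array name seen is arbitrary.

```shell
# Hypothetical sample with a repeated "CPL" entry.
sample=$(mktemp)
cat > "$sample" <<'EOF'
CPL -
Autolisp - Dialect of LISP used by the Autocad CAD package, Autodesk,
CPL - repeated entry
Modula-3* - Incoprporation of Modula-2* ideas into Modula-3.
EOF

# seen[$1]++ evaluates to 0 (false) only the first time a name occurs,
# so !seen[$1]++ is true exactly once per distinct name.
result=$(awk '$2 == "-" && !seen[$1]++ { print $1 }' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

This keeps the names in input order (CPL, Autolisp, Modula-3* for the sample above) and needs neither sort nor an END block.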

Regarding "linux - How to parse words in awk?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18246370/