Problem description
I've been looking at a lot of posts and haven't quite found what I'm looking for. I'm not sure how to go about taking the following sample data:
host1 input nic1 ip1 ip2 PROT 30000 10
host1 input nic1 ip1 ip2 PROT 40000 10
host1 input nic1 ip1 ip2 PROT 50000 10
host1 input nic1 ip1 ip2 PROT 60000 10
host1 input nic1 ip3 ip2 PROT 10 30000
host1 input nic1 ip3 ip2 PROT 10 40000
host1 input nic1 ip3 ip2 PROT 10 50000
host1 input nic1 ip3 ip2 PROT 10 60000
host1 output nic1 ip2 ip1 PROT 10 30000
host1 output nic1 ip2 ip1 PROT 10 40000
host1 output nic1 ip2 ip1 PROT 10 50000
host1 output nic1 ip2 ip1 PROT 10 60000
host1 output nic1 ip2 ip3 PROT 30000 10
host1 output nic1 ip2 ip3 PROT 40000 10
host1 output nic1 ip2 ip3 PROT 50000 10
host1 output nic1 ip2 ip3 PROT 60000 10
host1 output loc ip2 ip2 PROT 10 30000
host1 output loc ip2 ip2 PROT 10 50000
and merging it into:
host1 input nic1 ip1 ip2 PROT 30000:60000 10
host1 input nic1 ip3 ip2 PROT 10 30000:60000
host1 output nic1 ip2 ip1 PROT 10 30000:60000
host1 output nic1 ip2 ip3 PROT 30000:60000 10
host1 output loc ip2 ip2 PROT 10 30000:50000
I have a large amount of data like this and need to make ranges for multiple fields of a given line, but I think that if somebody can show me how to do it for one field, as above, I should be able to figure out the rest. If not, I'll follow up :). Thanks in advance for any help.
Recommended answer
Update
I have refactored the code from the answer below to make it more readable: the main body should now read almost like English prose.
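#!/usr/bin/awk -f
# main body
# (veryold = first line of the current group, old = last line read)
NR == 1 {
    copyRecordTo(veryold)
    copyRecordTo(old)
    next
}
{
    if (!inSameGroup()) {
        makeRangeForField(NF - 1)
        makeRangeForField(NF)
        nicePrint()
        copyRecordTo(veryold)
    }
    # always remember the last line read, also when it starts a new group
    copyRecordTo(old)
}
END {
    if (NR > 0) {
        makeRangeForField(NF - 1)
        makeRangeForField(NF)
        nicePrint()
    }
}
# functions
# (extra parameters such as i are the usual awk way to declare local variables)
function copyRecordTo(line,    i) {
    for (i = 1; i <= NF; ++i) line[i] = $i
}
# print old tab-separated, with an extra tab before the last field
function nicePrint(    i, fmt) {
    for (i = 1; i <= NF; ++i) {
        fmt = (i == NF - 1) ? "%s\t\t" : "%s\t"
        printf(fmt, old[i])
    }
    printf("\n")
}
# turn field f of old into "first:last" if it changed within the group
function makeRangeForField(f) {
    if (old[f] != veryold[f])
        old[f] = veryold[f] ":" old[f]
}
# true if every field except the last two matches the group's first line
function inSameGroup(    i) {
    for (i = 1; i <= NF - 2; ++i)
        if ($i != veryold[i]) return 0
    return 1
}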
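The question mentions needing ranges for several fields, while the script above hard-codes the range fields as the last two (NF - 1 and NF). A minimal sketch of one way to make them configurable, assuming every line has the same number of fields and that the lines of each group are adjacent, as in the sample: the hypothetical shell function merge_ranges takes the range field numbers as a space-separated list (passed to awk as the variable rangefields), treats all other fields as the group key, and prints space-separated output like the desired result. As in the answer, each range is first:last in input order, not min:max.

```shell
# Hypothetical helper: merge_ranges "FIELD_NUMBERS" < input
# FIELD_NUMBERS is a space-separated list of fields allowed to vary within
# a group (e.g. "7 8"); every other field is part of the group key.
merge_ranges() {
    awk -v rangefields="$1" '
    BEGIN {
        n = split(rangefields, rf, " ")
        for (k = 1; k <= n; ++k) isrange[rf[k]] = 1
    }
    # Build the group key from every field that is not a range field.
    function groupkey(    i, key) {
        key = ""
        for (i = 1; i <= NF; ++i)
            if (!(i in isrange)) key = key SUBSEP $i
        return key
    }
    # Print the saved group: a range field becomes first:last when it changed.
    function flush(    i, out, v) {
        out = ""
        for (i = 1; i <= nf; ++i) {
            v = last[i]
            if ((i in isrange) && first[i] != last[i]) v = first[i] ":" last[i]
            out = (i == 1) ? v : out " " v
        }
        print out
    }
    {
        key = groupkey()
        # A new key means the previous group is complete.
        if (NR > 1 && key != prevkey) flush()
        if (NR == 1 || key != prevkey)
            for (i = 1; i <= NF; ++i) first[i] = $i
        for (i = 1; i <= NF; ++i) last[i] = $i
        nf = NF
        prevkey = key
    }
    END { if (NR > 0) flush() }
    '
}
```

Usage would look like merge_ranges "7 8" < data.txt (data.txt being your input file). Joining the key fields with SUBSEP keeps adjacent field values from running together into an ambiguous key.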
Original answer
The following awk script generates almost exactly what you are looking for.
Essentially the script does the following:
- stores in veryold the first line of each set of lines that differ only in the 7th and/or 8th field
- stores in old the last line read
- uses the "boolean" b to detect when the last line of a set has been passed
- when that happens, joins each of the last two fields of veryold with the corresponding field of old, with a : in between, if they differ, and then prints old
- uses one extra tab (\t) between the last two fields to improve readability
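Because the script prints a tab after every field, plus the extra tab before the last field, its output is tab-separated rather than space-separated like the desired output in the question. If you need that exact layout, one possible approach is to pipe the output through the standard tr and sed utilities; to_spaces is only a hypothetical helper name.

```shell
# Hypothetical helper: to_spaces < tab-separated-output
# tr -s translates tabs to spaces and squeezes each run into one space;
# sed then drops the trailing space left by the tab after the last field.
to_spaces() {
    tr -s '\t' ' ' | sed 's/ $//'
}
```

For example, the script's output could be piped through to_spaces before being saved. This relies on the fields themselves never containing tabs or spaces, which holds here because awk splits the input on whitespace.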
Two more points:
- NR == 1 is a special case: the first line only needs to be stored, since there is nothing to compare it with yet
- after the last line has been read, END handles the special case of the final group, whose last line is stored in old
#!/usr/bin/awk -f
# first line: it starts the first group (veryold) and is also the last line read (old)
NR == 1 {
    for (i = 1; i <= NF; ++i) {
        veryold[i] = $i
        old[i] = $i
    }
    next
}
{
    # b stays 1 only while fields 2..NF-2 match the first line of the group
    b = 1
    for (i = 2; i <= NF - 2; ++i) {
        b *= $i == veryold[i]
    }
    if (b != 1) {
        # the group is over: join the last two fields as first:last where they differ
        if (old[NF - 1] != veryold[NF - 1]) {
            old[NF - 1] = veryold[NF - 1]":"old[NF - 1]
        }
        if (old[NF] != veryold[NF]) {
            old[NF] = veryold[NF]":"old[NF]
        }
        for (i = 1; i <= NF; ++i) {
            if (i == NF - 1) {
                fmt = "%s\t\t"
            } else {
                fmt = "%s\t"
            }
            printf(fmt, old[i])
        }
        printf("\n")
        # the current line starts the next group
        for (i = 2; i <= NF; ++i) {
            veryold[i] = $i
        }
    }
    # always remember the last line read, also when it starts a new group
    for (i = 1; i <= NF; ++i) {
        old[i] = $i
    }
}
END {
    if (old[NF - 1] != veryold[NF - 1]) {
        old[NF - 1] = veryold[NF - 1]":"old[NF - 1]
    }
    if (old[NF] != veryold[NF]) {
        old[NF] = veryold[NF]":"old[NF]
    }
    for (i = 1; i <= NF; ++i) {
        if (i == NF - 1) {
            fmt = "%s\t\t"
        } else {
            fmt = "%s\t"
        }
        printf(fmt, old[i])
    }
    printf("\n")
}
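Like the refactored version, this script assumes that the lines of each group are adjacent and already in order, because each range is built from the first and last line of a run. If that may not hold for your data, one alternative is to collect every group in memory and report the minimum and maximum of the last two fields instead. The sketch below does that; merge_unsorted is a hypothetical name, it assumes the last two fields are numeric, and it prints groups in the order they first appear. Memory use grows with the number of distinct groups, which may matter for a large file.

```shell
# Hypothetical helper: merge_unsorted < input
# Groups lines on fields 1..NF-2 wherever they appear and prints min:max
# of the last two fields (assumed numeric) for each group.
merge_unsorted() {
    awk '
    # Track the smallest and largest value of range field f for a group key.
    function update(key, f, v) {
        if (!((key, f) in lo) || v < lo[key, f]) lo[key, f] = v
        if (!((key, f) in hi) || v > hi[key, f]) hi[key, f] = v
    }
    # A single value if min equals max, otherwise min:max.
    function range(a, b) { return (a == b) ? a : a ":" b }
    {
        key = $1
        for (i = 2; i <= NF - 2; ++i) key = key " " $i
        # remember the order in which groups first appear
        if (!(key in seen)) { seen[key] = 1; order[++n] = key }
        update(key, 1, $(NF - 1) + 0)
        update(key, 2, $NF + 0)
    }
    END {
        for (j = 1; j <= n; ++j) {
            k = order[j]
            print k, range(lo[k, 1], hi[k, 1]), range(lo[k, 2], hi[k, 2])
        }
    }
    '
}
```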