Problem description
I have the following tab-separated file:
A1 A1 0 0 2 1 1 1 1 1 1 1 2 1 1 1
A2 A2 0 0 2 1 1 1 1 1 1 1 1 1 1 1
A3 A3 0 0 2 2 1 1 2 2 1 1 1 1 1 1
A5 A5 0 0 2 2 1 1 1 1 1 1 1 2 1 1
The idea is to summarise the information from column 7 (inclusive) to the end of each row in a single new column that is appended as the last column.
To do so, these are the rules:
- If the total number of "2"s in the row (between column 7 and the end) is 0: add "1 1" to the new last column
- If the total number of "2"s in the row (between column 7 and the end) is 1: add "1 2" to the new last column
- If the total number of "2"s in the row (between column 7 and the end) is 2 or more: add "2 2" to the new last column
I started by extracting the columns I want to work on using the command:
Then I counted the number of occurrences in each row using:
Which outputs:
1 1
2 0
3 2
4 1
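The two commands themselves are not shown in the post. Purely as a sketch, and not necessarily what was originally run, a listing like the one above (row number, then the count of 2s from column 7 onward) could be produced in a single awk pass. The file name `input.tsv` and the scratch directory are assumptions made for the example.

```shell
# Work in a scratch directory so the example does not touch existing files.
workdir=$(mktemp -d) && cd "$workdir"

# Recreate the sample data from the question (input.tsv is a hypothetical name).
printf '%s\n' \
  'A1 A1 0 0 2 1 1 1 1 1 1 1 2 1 1 1' \
  'A2 A2 0 0 2 1 1 1 1 1 1 1 1 1 1 1' \
  'A3 A3 0 0 2 2 1 1 2 2 1 1 1 1 1 1' \
  'A5 A5 0 0 2 2 1 1 1 1 1 1 1 2 1 1' | tr ' ' '\t' > input.tsv

# For each row, count the fields equal to 2 from column 7 to the end,
# then print the row number and that count.
counts=$(awk -F'\t' '{
    c = 0
    for (i = 7; i <= NF; i++)
        if ($i == 2) c++
    print NR, c
}' input.tsv)

printf '%s\n' "$counts" > tmp_occurences.txt
cat tmp_occurences.txt
```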
Then my idea was to write a loop that goes through the lines and adds the new summary column. I was thinking of this kind of structure, based on what I found here (http://www.thegeekstuff.com/2010/06/bash-if-statement-examples):
while read line ;
do
    set $line
    If ["$2"==0]
    then
        $3=="1 1"
    elif ["$2"==1 ]
    then
        $3=="1 2"
    elif ["$2">=2 ]
    then
        $3=="2 2"
    else
        print ["error"]
    fi
done < tmp_occurences.txt
But I am stuck here. Do I have to create the new column before starting the loop? Am I going in the right direction?
Ideally, the final output (after merging the first 6 columns from the initial file and the summary column) would be:
A1 A1 0 0 2 1 1 2
A2 A2 0 0 2 1 1 1
A3 A3 0 0 2 2 2 2
A5 A5 0 0 2 2 1 2
Thank you for your help!
Using gnu-awk you can do:
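awk -v OFS='\t' '{
   c = 0                      # number of 2s found from column 7 onward
   for (i=7; i<=NF; i++)
      if ($i == 2)
         c++
   if (c == 0)
      s = "1 1"
   else if (c == 1)
      s = "1 2"
   else
      s = "2 2"
   NF = 6                     # truncate the record to its first 6 fields
   print $0, s                # first 6 fields, then the summary column
}' file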
A1 A1 0 0 2 1 1 2
A2 A2 0 0 2 1 1 1
A3 A3 0 0 2 2 2 2
A5 A5 0 0 2 2 1 2
PS: If not using gnu-awk you can use:
awk -v OFS='\t' '{c=0; for (i=7; i<=NF; i++) {if ($i==2) c++; $i=""} if (c==0) s="1 1"; else if (c==1) s="1 2"; else s="2 2"; NF=6; print $0, s}' file
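For comparison, the loop from the question can also be made to work in plain bash. This is a sketch rather than a recommendation, since awk does the same job in one pass. It assumes `tmp_occurences.txt` holds a row number and a count per line, as in the question. The names `input.tsv`, `tmp_summary.txt`, `row`, `count` and `summary` are made up for the example. The main fixes are spaces inside `[ ]`, `-eq` and `-ge` for numeric comparison, and a plain `=` for assignment.

```shell
# Work in a scratch directory so the example does not touch existing files.
workdir=$(mktemp -d) && cd "$workdir"

# Sample data from the question, converted to tab-separated form.
printf '%s\n' \
  'A1 A1 0 0 2 1 1 1 1 1 1 1 2 1 1 1' \
  'A2 A2 0 0 2 1 1 1 1 1 1 1 1 1 1 1' \
  'A3 A3 0 0 2 2 1 1 2 2 1 1 1 1 1 1' \
  'A5 A5 0 0 2 2 1 1 1 1 1 1 1 2 1 1' | tr ' ' '\t' > input.tsv

# Row number and count of 2s, as listed in the question.
printf '%s\n' '1 1' '2 0' '3 2' '4 1' > tmp_occurences.txt

# Turn each count into its summary pair, one line per input row.
while read -r row count; do
    if [ "$count" -eq 0 ]; then
        summary="1 1"
    elif [ "$count" -eq 1 ]; then
        summary="1 2"
    elif [ "$count" -ge 2 ]; then
        summary="2 2"
    else
        echo "error: unexpected count '$count' on row $row" >&2
        exit 1
    fi
    printf '%s\n' "$summary"
done < tmp_occurences.txt > tmp_summary.txt

# Glue the first six columns of the original file to the summary column.
result=$(cut -f1-6 input.tsv | paste - tmp_summary.txt)
printf '%s\n' "$result"
```

No column has to exist before the loop starts: the summary values are written to their own file and attached to the first six columns afterwards with `paste`.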