问题描述
输入1
1 10611 2122 C :0.983607 :0.0163934
1 10611 2 122 C:0.983607 G:0.0163934
输入2
1 10611 rs146752890 C 摹 100 PASS AC=184;RSQ=0.8228;AVGPOST=0.9640;AN=2184;ERATE=0.0031;VT=SNP;AA=.;THETA=0.0127;LDAF=0.0902;SNPSOURCE=LOWCOV;AF=0.08;ASN_AF=0.08;AMR_AF=0.14;AFR_AF=0.08;EUR_AF=0.07
1 10611 rs146752890 C G 100 PASS AC=184;RSQ=0.8228;AVGPOST=0.9640;AN=2184;ERATE=0.0031;VT=SNP;AA=.;THETA=0.0127;LDAF=0.0902;SNPSOURCE=LOWCOV;AF=0.08;ASN_AF=0.08;AMR_AF=0.14;AFR_AF=0.08;EUR_AF=0.07
在这里
第一和第二栏前的匹配和价值观:第一档和第二档第4列第5列是equel和第6列(前值:)的第二个文件,第一个和第5栏是equel和输出创造在此基础上match.Will获得的输入和输出线清晰的思路和这两个文件都.gz文件解
here1st and 2nd column are matching and values before ':' of 5th column of first file and 4th column of 2nd files are equel and 6th column(values before ':') of first and 5th column of second files are equel and output is creating based on this match.Will get the clear idea from input and output line and both files are .gz files
输出
1 10611 rs146752890 C G 100 PASS AC=184;RSQ=0.8228;AVGPOST=0.9640;AN=2184;ERATE=0.0031;VT=SNP;AA=.;THETA=0.0127;LDAF=0.0902;SNPSOURCE=LOWCOV;AF=0.08;ASN_AF=0.08;AMR_AF=0.14;AFR_AF=0.08;EUR_AF=0.07;REF=0.983607;ALT=0.0163934;
1 10611 rs146752890 C G 100 PASS AC=184;RSQ=0.8228;AVGPOST=0.9640;AN=2184;ERATE=0.0031;VT=SNP;AA=.;THETA=0.0127;LDAF=0.0902;SNPSOURCE=LOWCOV;AF=0.08;ASN_AF=0.08;AMR_AF=0.14;AFR_AF=0.08;EUR_AF=0.07;REF=0.983607;ALT=0.0163934;
推荐答案
下面是一个使用 AWK
的一种方法:
Here's one way using awk
:
awk 'FNR==NR { split($5,a,":"); split($6,b,":"); c[$1,$2,a[1],b[1]]="REF=" a[2] ";ALT=" b[2] ";"; next } ($1,$2,$4,$5) in c { print $0 ";" c[$1,$2,$4,$5] }' input1 input2
结果:
1 10611 rs146752890 C G 100 PASS AC=184;RSQ=0.8228;AVGPOST=0.9640;AN=2184;ERATE=0.0031;VT=SNP;AA=.;THETA=0.0127;LDAF=0.0902;SNPSOURCE=LOWCOV;AF=0.08;ASN_AF=0.08;AMR_AF=0.14;AFR_AF=0.08;EUR_AF=0.07;REF=0.983607;ALT=0.0163934;
因此,对于COM pressed文件,请尝试:
So for compressed files, try:
awk 'FNR==NR { split($5,a,":"); split($6,b,":"); c[$1,$2,a[1],b[1]]="REF=" a[2] ";ALT=" b[2] ";"; next } ($1,$2,$4,$5) in c { print $0 ";" c[$1,$2,$4,$5] }' <(gzip -dc input1.gz) <(gzip -dc input2.gz) | gzip > output.gz
的编辑:的
,试试这个:
awk 'FNR==NR { split($5,a,":"); split($6,b,":"); c[$1,$2,a[1],b[1]]="REF=" a[2] ";ALT=" b[2] ";"; next } ($1,$2,$4,$5) in c { print $1, $2, $3, $4, $5, $6, $7, c[$1,$2,$4,$5] $8 ";" }' file1 file2
结果:
1 10611 rs146752890 C G 100 PASS REF=0.983607;ALT=0.0163934;AC=184;RSQ=0.8228;AVGPOST=0.9640;AN=2184;ERATE=0.0031;VT=SNP;AA=.;THETA=0.0127;LDAF=0.0902;SNPSOURCE=LOWCOV;AF=0.08;ASN_AF=0.08;AMR_AF=0.14;AFR_AF=0.08;EUR_AF=0.07;
这篇关于我需要合并两个文件,并创建一个新文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!