Problem description
Please only glance at the abundance of manual input in the code below, no need to understand it:
#!/bin/bash
paste A1.dat A2.dat A3.dat A4.dat A5.dat A6.dat > A.dat
awk '{print ($2 + $21 + $40 + $59 + $78 + $97), ($3 + $22 + $41 + $60 + $79 + $98), ($4 + $23 + $42 + $61 + $80 + $99) + ($6 + $25 + $44 + $63 + $82 + $101) + ($8 + $27 + $46 + $65 + $84 + $103), ($5 + $24 + $43 + $62 + $81 + $100) + ($7 + $26 + $45 + $64 + $83 + $102) + ($9 + $ 28 + $47 + $66 + $85 + $104), ($10 + $29 + $48 + $67 + $86 + $105) + ($12 + $31 + $50 + $69 + $88 + $107) + ($14 + $33 + $52 + $71 + $90 + $109) + ($16 + $35 + $54 + $73 + $92 + $111) + ($18 + $37 + $56 + $75 + $94 + $113), ($11 + $30 + $49 + $68 + $87 + $106) + ($13 + $32 + $51 + $70 + $89 + $108) + ($15 + $34 + $53 + $72 + $91 + $110) + ($17 + $36 + $55 + $74 + $93 + $112) + ($19 + $38 + $57 + $76 + $95 + $114)}' A.dat >> A_full.dat
Code objective: Take data stored in n input files, each containing 19 columns of data and an equal # of rows. Manipulate this data in a certain fashion to generate an output file with 7 columns of data and the same # of rows as each of the input files.
What I did in the code above: Used paste to merge all of the n input files (A?.dat) into 1 file (A.dat). Next, I use awk to manipulate the data in A.dat to get the output file (A_full.dat). This becomes unruly and cumbersome for a large value of n.
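(As an aside, the paste step itself can be generalized for any n with a small shell loop; the sketch below is only illustrative and assumes the inputs really are named A1.dat through An.dat.)
#!/bin/bash
# illustrative sketch: build the paste argument list for any n (assumes files A1.dat .. An.dat)
n=6                                   # example value
files=()
for ((i=1; i<=n; i++)); do
    files+=("A$i.dat")
done
paste "${files[@]}" > A.dat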
My request: Help me generalize the code for any value of n. The code I've posted above is for when n=6. To understand what data manipulation the code does, please look at the code below for n=2 (see explanation after the sample files):
#!/bin/bash
paste A1.dat A2.dat > A.dat
awk '{print $1, ($2 + $21), ($3 + $22), ($4 + $23) + ($6 + $25) + ($8 + $27), ($5 + $24) + ($7 + $26) + ($9 + $28), ($10 + $29) + ($12 + $31) + ($14 + $33) + ($16 + $35 ) + ($18 + $37), ($11 + $30) + ($13 + $32) + ($15 + $34) + ($17 + $36) + ($19 + $38)}' A.dat >> A_full.dat
Sample files:
A1.dat:
-0.908 0.3718E-03 0.2227E-02 0.1216E-05 0.6719E-05 0.1697E-05 0.1052E-04 0.1697E-05 0.1052E-04 0.5774E-07 0.3360E-06 0.5774E-07 0.3360E-06 0.5418E-06 0.3169E-05 0.1972E-06 0.1099E-05 0.1610E-05 0.9417E-05
-0.902 0.1042E-02 0.3365E-02 0.3427E-05 0.1021E-04 0.4837E-05 0.1619E-04 0.4837E-05 0.1619E-04 0.1623E-06 0.5093E-06 0.1623E-06 0.5093E-06 0.1522E-05 0.4803E-05 0.5530E-06 0.1661E-05 0.4522E-05 0.1427E-04
-0.895 0.1962E-02 0.4677E-02 0.6479E-05 0.1428E-04 0.9232E-05 0.2289E-04 0.9232E-05 0.2289E-04 0.3064E-06 0.7100E-06 0.3064E-06 0.7100E-06 0.2870E-05 0.6694E-05 0.1042E-05 0.2310E-05 0.8530E-05 0.1988E-04
-0.889 0.3067E-02 0.6167E-02 0.1019E-04 0.1893E-04 0.1470E-04 0.3064E-04 0.1470E-04 0.3064E-04 0.4806E-06 0.9388E-06 0.4806E-06 0.9388E-06 0.4500E-05 0.8850E-05 0.1629E-05 0.3047E-05 0.1337E-04 0.2629E-04
A2.dat:
-0.908 0.9081E-04 0.5463E-03 0.9126E-05 0.5564E-04 0.4880E-06 0.3004E-05 0.4880E-06 0.3004E-05 0.2218E-06 0.1311E-05 0.2218E-06 0.1311E-05 0.1433E-06 0.8079E-06 0.1452E-06 0.8808E-06 0.4262E-06 0.2402E-05
-0.902 0.2531E-03 0.8191E-03 0.2580E-04 0.8502E-04 0.1377E-05 0.4565E-05 0.1377E-05 0.4565E-05 0.6264E-06 0.2000E-05 0.6264E-06 0.2000E-05 0.3994E-06 0.1211E-05 0.4063E-06 0.1327E-05 0.1188E-05 0.3599E-05
-0.895 0.4742E-03 0.1130E-02 0.4894E-04 0.1194E-03 0.2604E-05 0.6378E-05 0.2604E-05 0.6378E-05 0.1187E-05 0.2805E-05 0.1187E-05 0.2805E-05 0.7483E-06 0.1670E-05 0.7638E-06 0.1839E-05 0.2225E-05 0.4963E-05
-0.889 0.7357E-03 0.1480E-02 0.7735E-04 0.1591E-03 0.4094E-05 0.8448E-05 0.4094E-05 0.8448E-05 0.1874E-05 0.3729E-05 0.1874E-05 0.3729E-05 0.1161E-05 0.2186E-05 0.1191E-05 0.2419E-05 0.3452E-05 0.6496E-05
A.dat:
-0.908 0.3718E-03 0.2227E-02 0.1216E-05 0.6719E-05 0.1697E-05 0.1052E-04 0.1697E-05 0.1052E-04 0.5774E-07 0.3360E-06 0.5774E-07 0.3360E-06 0.5418E-06 0.3169E-05 0.1972E-06 0.1099E-05 0.1610E-05 0.9417E-05 -0.908 0.9081E-04 0.5463E-03 0.9126E-05 0.5564E-04 0.4880E-06 0.3004E-05 0.4880E-06 0.3004E-05 0.2218E-06 0.1311E-05 0.2218E-06 0.1311E-05 0.1433E-06 0.8079E-06 0.1452E-06 0.8808E-06 0.4262E-06 0.2402E-05
-0.902 0.1042E-02 0.3365E-02 0.3427E-05 0.1021E-04 0.4837E-05 0.1619E-04 0.4837E-05 0.1619E-04 0.1623E-06 0.5093E-06 0.1623E-06 0.5093E-06 0.1522E-05 0.4803E-05 0.5530E-06 0.1661E-05 0.4522E-05 0.1427E-04 -0.902 0.2531E-03 0.8191E-03 0.2580E-04 0.8502E-04 0.1377E-05 0.4565E-05 0.1377E-05 0.4565E-05 0.6264E-06 0.2000E-05 0.6264E-06 0.2000E-05 0.3994E-06 0.1211E-05 0.4063E-06 0.1327E-05 0.1188E-05 0.3599E-05
-0.895 0.1962E-02 0.4677E-02 0.6479E-05 0.1428E-04 0.9232E-05 0.2289E-04 0.9232E-05 0.2289E-04 0.3064E-06 0.7100E-06 0.3064E-06 0.7100E-06 0.2870E-05 0.6694E-05 0.1042E-05 0.2310E-05 0.8530E-05 0.1988E-04 -0.895 0.4742E-03 0.1130E-02 0.4894E-04 0.1194E-03 0.2604E-05 0.6378E-05 0.2604E-05 0.6378E-05 0.1187E-05 0.2805E-05 0.1187E-05 0.2805E-05 0.7483E-06 0.1670E-05 0.7638E-06 0.1839E-05 0.2225E-05 0.4963E-05
-0.889 0.3067E-02 0.6167E-02 0.1019E-04 0.1893E-04 0.1470E-04 0.3064E-04 0.1470E-04 0.3064E-04 0.4806E-06 0.9388E-06 0.4806E-06 0.9388E-06 0.4500E-05 0.8850E-05 0.1629E-05 0.3047E-05 0.1337E-04 0.2629E-04 -0.889 0.7357E-03 0.1480E-02 0.7735E-04 0.1591E-03 0.4094E-05 0.8448E-05 0.4094E-05 0.8448E-05 0.1874E-05 0.3729E-05 0.1874E-05 0.3729E-05 0.1161E-05 0.2186E-05 0.1191E-05 0.2419E-05 0.3452E-05 0.6496E-05
A_full.dat:
-0.908 0.00046261 0.0027733 1.4712e-05 8.9407e-05 3.62278e-06 2.10697e-05
-0.902 0.0012951 0.0041841 4.1655e-05 0.00013674 1.01681e-05 3.18896e-05
-0.895 0.0024362 0.005807 7.9091e-05 0.000192216 1.91659e-05 4.4386e-05
-0.889 0.0038027 0.007647 0.000125128 0.000256206 3.00122e-05 5.86236e-05
More information regarding the 7 columns of the output file (A_full.dat):
- All of the input A?.dat files have the same values in col 1. A_full.dat must also have the same col 1.
- col 2 of A_full.dat should be the summation of col 2 of all A?.dat files.
- col 3 of A_full.dat should be the summation of col 3 of all A?.dat files.
- col 4 of A_full.dat should be the summation of cols 4, 6, and 8 of all A?.dat files.
- col 5 of A_full.dat should be the summation of cols 5, 7, and 9 of all A?.dat files.
- col 6 of A_full.dat should be the summation of cols 10, 12, 14, 16, and 18 of all A?.dat files.
- col 7 of A_full.dat should be the summation of cols 11, 13, 15, 17, and 19 of all A?.dat files.
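As a quick check of this mapping against the sample rows above: in row 1, col 2 of A_full.dat is 0.3718E-03 + 0.9081E-04 = 0.00046261, and col 4 is (0.1216E-05 + 0.1697E-05 + 0.1697E-05) + (0.9126E-05 + 0.4880E-06 + 0.4880E-06) = 1.4712e-05.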
At first, I posted this question in a confusing manner, but with the help of @markp-fuso's input, I've edited it to make it easier to comprehend.
Recommended answer
NOTE: Updated based on OP's latest changes (include field $1 in the output), and incorporating EdMorton's suggestion for the awk/for loop.
Based on OP's current awk command ...
awk '{print ($2 + $21 + $40 + $59 + $78 + $97), ($3 + $22 + $41 + $60 + $79 + $98), ($4 + $23 + $42 + $61 + $80 + $99) + ($6 + $25 + $44 + $63 + $82 + $101) + ($8 + $27 + $46 + $65 + $84 + $103), ($5 + $24 + $43 + $62 + $81 + $100) + ($7 + $26 + $45 + $64 + $83 + $102) + ($9 + $ 28 + $47 + $66 + $85 + $104), ($10 + $29 + $48 + $67 + $86 + $105) + ($12 + $31 + $50 + $69 + $88 + $107) + ($14 + $33 + $52 + $71 + $90 + $109) + ($16 + $35 + $54 + $73 + $92 + $111) + ($18 + $37 + $56 + $75 + $94 + $113), ($11 + $30 + $49 + $68 + $87 + $106) + ($13 + $32 + $51 + $70 + $89 + $108) + ($15 + $34 + $53 + $72 + $91 + $110) + ($17 + $36 + $55 + $74 + $93 + $112) + ($19 + $38 + $57 + $76 + $95 + $114)}' A.dat >> A_full.dat
... as well as an assortment of comments and edits, I come away with the following:
- all input files have 19 fields
- all input files have the same number of rows
- unsure what, if anything, is to be done with field #1 (due to question edits and confusing explanation)
- desired output consists of 7x columns (col1 to col7) for each set of input rows:
  - col1: copy of field #1 from first file (field #1 should be the same in all input files)
  - col2: summation of field #2 from all input files
  - col3: (negated) summation of field #3 from all input files
  - col4: summation of fields #4, #6 and #8 from all input files
  - col5: (negated) summation of fields #5, #7 and #9 from all input files
  - col6: summation of fields #10, #12, #14, #16 and #18 from all input files
  - col7: summation of fields #11, #13, #15, #17 and #19 from all input files
- for now I'm assuming we want the output rows ordered in the same order in which they're read from the input files (ie, input NR == output NR)
- OP needs a solution that can work with n number of input files
Instead of paste(ing) the n input files into a single big file (A.dat) and then having awk parse the n x 19 columns, I propose having awk read the individual data files (A?.dat) and accumulate the desired data values 'on the fly'.
One awk solution:
awk '
FNR==NR { col1[FNR]=$1 }                                    # first file only: remember column 1
        { col2[FNR]+=($2)                                   # every file: accumulate the per-row sums
          col3[FNR]-=($3)
          col4[FNR]+=($4 + $6 + $8)
          col5[FNR]-=($5 + $7 + $9)
          col6[FNR]+=($10 + $12 + $14 + $16 + $18)
          col7[FNR]+=($11 + $13 + $15 + $17 + $19)
        }
END     { for ( i=1 ; i <= FNR ; i++ )                      # one output row per input row
              printf "%s %7.5f %7.5f %8.6f %8.6f %d %d\n", col1[i], col2[i], col3[i], col4[i], col5[i], col6[i], col7[i]
        }
' A1.dat A2.dat A3.dat ... An.dat
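One possible way to invoke it without typing every filename (just a sketch; sum_cols.awk is a hypothetical name for a file holding the awk program above) is to pass the inputs via a glob. Note that A?.dat only matches single-digit suffixes (A1.dat through A9.dat); for larger n the file list would have to be built explicitly, e.g. with a shell loop.
#!/bin/bash
# hypothetical wrapper: sum_cols.awk holds the awk program shown above
awk -f sum_cols.awk A?.dat > A_full.dat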
NOTE: printf formats are based on the limited sample output provided by OP; may need to adjust these based on the desired results from a larger data set.
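For instance, the %d conversions on the last two columns only look right while those sums happen to be whole numbers (they are all zero in the sample run below); for data like the 4-row sample earlier in the question, where cols 6 and 7 hold small non-zero values, a more general conversion could be substituted (an illustrative guess, not part of the original answer):
printf "%s %.6g %.6g %.6g %.6g %.6g %.6g\n", col1[i], col2[i], col3[i], col4[i], col5[i], col6[i], col7[i]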
NOTE: One downside to this awk solution is that we have to store all (output) data in a set of arrays which, in turn, could lead to memory usage issues if we're dealing with a large volume of rows.
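If memory did become an issue, one alternative sketch (assuming the original paste step is kept, so that each line of A.dat holds n consecutive 19-column blocks) is to compute each output row on the fly with a for loop over the blocks; nothing is stored between rows:
awk -v n=6 '{
    c1 = $1                                     # column 1 comes from the first block
    c2 = c3 = c4 = c5 = c6 = c7 = 0
    for (f = 0; f < n; f++) {                   # one 19-column block per original input file
        o = f * 19
        c2 += $(o+2)
        c3 -= $(o+3)
        c4 += $(o+4) + $(o+6) + $(o+8)
        c5 -= $(o+5) + $(o+7) + $(o+9)
        c6 += $(o+10) + $(o+12) + $(o+14) + $(o+16) + $(o+18)
        c7 += $(o+11) + $(o+13) + $(o+15) + $(o+17) + $(o+19)
    }
    printf "%s %7.5f %7.5f %8.6f %8.6f %d %d\n", c1, c2, c3, c4, c5, c6, c7
}' A.dat > A_full.dat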
Parsing the OP's sample input file (A.dat) back out into the first 2x original data files:
$ cat A1.dat
4.429 0.3620E-01 0.3919E-01 0.1063E-01 0.9525E-02 0.9146E-02 0.7986E-02 0.9146E-02 0.7986E-02 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
4.436 0.3489E-01 0.3876E-01 0.1022E-01 0.9461E-02 0.8803E-02 0.7872E-02 0.8803E-02 0.7872E-02 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
4.442 0.3364E-01 0.3852E-01 0.9760E-02 0.9469E-02 0.8402E-02 0.7801E-02 0.8402E-02 0.7801E-02 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
4.449 0.3260E-01 0.3917E-01 0.9364E-02 0.9753E-02 0.8040E-02 0.8083E-02 0.8040E-02 0.8083E-02 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
$ cat A2.dat
4.429 0.4333E-01 0.3393E-01 0.6788E-02 0.6654E-02 0.8228E-02 0.7242E-02 0.8228E-02 0.7242E-02 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
4.436 0.4101E-01 0.3372E-01 0.6687E-02 0.6563E-02 0.7849E-02 0.7179E-02 0.7849E-02 0.7179E-02 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
4.442 0.3861E-01 0.3437E-01 0.6561E-02 0.6437E-02 0.7440E-02 0.7192E-02 0.7440E-02 0.7192E-02 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
4.449 0.3646E-01 0.3667E-01 0.6462E-02 0.6514E-02 0.7091E-02 0.7443E-02 0.7091E-02 0.7443E-02 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00
Running the proposed awk solution against these 2x input files generates:
$ awk '{ col1[FNR]+= .... }' A1.dat A2.dat
4.429 0.07953 -0.07312 0.052166 -0.046635 0 0
4.436 0.07590 -0.07248 0.050211 -0.046126 0 0
4.442 0.07225 -0.07289 0.048005 -0.045892 0 0
4.449 0.06906 -0.07584 0.046088 -0.047319 0 0