Problem description
I am trying to calculate the median (not the mean) of many columns in a file. I wrote the following, adapted from code that works for a single column.
sort -n <infile | awk '{for (i = 1; i <= NF; ++i); count[NR] = $i;}END {for (i = 1; i <= NF; ++i); if (NR % 2) {print count[(NR + 1) / 2];} else {print (count[(NR / 2)] + count[(NR / 2) + 1]) / 2;}}'
Composite cg00000029 cg00000108 cg00000109 cg00000165
TCGA-G4-6298-11A 0.309164840970903 0.108696904309357
TCGA-G4-6311-11A 0.284214936998384 0.192558185484861
TCGA-AA-3506-11A 0.293174399370542 0.12546425658397
TCGA-AA-3713-11A 0.225964654660289 0.150662194530275
Recommended answer
Consider using datamash:
$ cat input
Composite cg00000029 cg00000108 cg00000109 cg00000165
TCGA-G4-6298-11A 0.309164840970903 0.108696904309357
TCGA-G4-6311-11A 0.284214936998384 0.192558185484861
TCGA-AA-3506-11A 0.293174399370542 0.12546425658397
TCGA-AA-3713-11A 0.225964654660289 0.150662194530275
$ datamash --header-in -W median 2 < input
0.28869466818446
$ datamash --header-in -W median 3 < input
0.13806322555712
See datamash --help for the options used above.
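
If datamash is not available, the same medians can be computed in plain awk. The following is a minimal sketch, not the answerer's method. It assumes the first line is a header, the first column is a sample ID, and every remaining field is numeric. The helper name col_medians is hypothetical. Note that in the one-liner from the question, the `;` directly after each `for (...)` leaves the loop body empty, so values are never stored per column; here each column is collected and sorted separately.

```shell
#!/bin/sh
# Sample data from the question, header line included.
cat > input <<'EOF'
Composite cg00000029 cg00000108 cg00000109 cg00000165
TCGA-G4-6298-11A 0.309164840970903 0.108696904309357
TCGA-G4-6311-11A 0.284214936998384 0.192558185484861
TCGA-AA-3506-11A 0.293174399370542 0.12546425658397
TCGA-AA-3713-11A 0.225964654660289 0.150662194530275
EOF

# Hypothetical helper: prints "column_number median" for every data column.
col_medians() {
  awk '
    NR == 1 { next }                    # skip the header line
    {
      if (NF > maxnf) maxnf = NF
      for (i = 2; i <= NF; i++)         # column 1 holds the sample ID
        val[i, ++n[i]] = $i + 0         # store each value as a number
    }
    END {
      for (i = 2; i <= maxnf; i++) {
        m = n[i]
        # Insertion sort of column i (portable; no gawk-only asort).
        for (j = 2; j <= m; j++) {
          v = val[i, j]
          for (k = j - 1; k >= 1 && val[i, k] > v; k--)
            val[i, k + 1] = val[i, k]
          val[i, k + 1] = v
        }
        # Middle value for odd counts, mean of the two middle values otherwise.
        if (m % 2)
          med = val[i, (m + 1) / 2]
        else
          med = (val[i, m / 2] + val[i, m / 2 + 1]) / 2
        printf "%d %.14g\n", i, med
      }
    }
  ' "$@"
}

col_medians input
```

On the sample data this should print 0.28869466818446 for column 2 and 0.13806322555712 for column 3, matching the datamash results. The insertion sort is quadratic in the number of rows, so for very large files datamash is likely the better choice.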