Question
I have a dataset in which many missing values are recorded as -999. Part of the data is:
input.txt
30
-999
10
40
23
44
-999
-999
31
-999
54
-999
-999
-999
-999
-999
-999
-999 and so on
I would like to calculate the average over each 6-row interval without considering the missing values.
The desired output is:
ofile.txt
29.4
42.5
-999
Meanwhile, I am trying:
awk '!/\-999/{sum += $1; count++} NR%6==0{print count ? (sum/count) : count;sum=count=0}' input.txt
It gives:
29.4
42.5
0
Recommended answer
I'm not entirely sure why, if you're discounting -999 values, you'd think that -999 was a better choice than zero for the average of the third group. In the first two groups, the -999 values contribute to neither the sum nor the count, so an argument could be made that zero is a better choice.
However, it may be that you want -999 to represent a "lack of value" (which would certainly be the case where there were no values in a group). If that's the case, you can just output -999 rather than count in your original code:
awk '!/\-999/{sm+=$1;ct++} NR%6==0{print ct?(sm/ct):-999;sm=ct=0}' input.txt
Even if you decide that zero is a better answer, I'd still make that explicit rather than outputting the count variable itself:
awk '!/\-999/{sm+=$1;ct++} NR%6==0{print ct?(sm/ct):0;sm=ct=0}' input.txt
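Two details of the one-liners above may matter on real data. The pattern !/\-999/ skips any line that merely contains the text -999 (a value such as -9990 would also be skipped), and a trailing group of fewer than 6 rows is never printed. The following is only a sketch of one way to handle both, assuming one value per line in the first field. The function name group_mean and the variable names n and missing are made up for illustration and are not part of the original answer.

```shell
# group_mean FILE [GROUP_SIZE] [MISSING_MARKER]
# Hypothetical helper: prints the mean of each block of GROUP_SIZE rows,
# ignoring rows whose first field equals MISSING_MARKER.
group_mean() {
  awk -v n="${2:-6}" -v missing="${3:--999}" '
    # Compare the first field to the marker as a value, not as a substring.
    NF && $1 != missing { sum += $1; count++ }

    # At the end of every block of n rows, print the mean, or the marker
    # itself when the block held no usable values, then reset.
    NR % n == 0 { print (count ? sum / count : missing); sum = count = 0 }

    # Assumption: a trailing partial block should be reported as well.
    END { if (NR % n) print (count ? sum / count : missing) }
  ' "$1"
}
```

After defining the function, group_mean input.txt > ofile.txt would write one mean per 6-row block, and group_mean input.txt 6 0 would use 0 as the marker instead. Whether a partial final block should be printed at all is a design choice; drop the END rule if it should not be.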