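My CSV file contains many stations, and I don't know how to use a loop to count the NaN values for each station. So far I have only managed to count them one station at a time. Can anyone help? Thanks.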

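# Select the rows for a single station
station1 = train_df[train_df['station'] == 28079004]
station1 = station1[['date', 'O_3']]
# Total rows minus non-null rows gives the NaN count per column
count_nan = len(station1) - station1.count()
print(count_nan)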

Best answer

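I think you need to move the station column into the index with set_index, select the columns you want to check for missing values, and finally count them per station with sum: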

import numpy as np
import pandas as pd

train_df = pd.DataFrame({'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'date':pd.date_range('2015-01-01', periods=6),
                   'O_3':[np.nan,3,np.nan,9,2,np.nan],
                   'station':[28079004] * 2 + [28079005] * 4})

print (train_df)
   B  C       date  O_3   station
0  4  7 2015-01-01  NaN  28079004
1  5  8 2015-01-02  3.0  28079004
2  4  9 2015-01-03  NaN  28079005
3  5  4 2015-01-04  9.0  28079005
4  5  2 2015-01-05  2.0  28079005
5  4  3 2015-01-06  NaN  28079005

# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df = train_df.set_index('station')[['date', 'O_3']].isnull().groupby(level=0).sum().astype(int)
print (df)
          date  O_3
station
28079004     0    1
28079005     0    2


Another solution:

df = train_df[['date', 'O_3']].isnull().groupby(train_df['station']).sum().astype(int)
print (df)
          date  O_3
station
28079004     0    1
28079005     0    2
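
Since the question asks specifically about a loop, here is a minimal sketch of an explicit-loop version for comparison. It reuses the sample data from the answer; the helper name count_nan_per_station and its cols parameter are made up for illustration and are not part of pandas. The groupby solutions above are usually preferable on large data, so treat this only as a way to see what they compute.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
train_df = pd.DataFrame({
    'B': [4, 5, 4, 5, 5, 4],
    'C': [7, 8, 9, 4, 2, 3],
    'date': pd.date_range('2015-01-01', periods=6),
    'O_3': [np.nan, 3, np.nan, 9, 2, np.nan],
    'station': [28079004] * 2 + [28079005] * 4,
})


def count_nan_per_station(df, cols=('date', 'O_3')):
    """Hypothetical helper: count NaN values per station with a plain loop."""
    counts = {}
    for station in df['station'].unique():
        # Rows for this station, restricted to the columns being checked.
        subset = df.loc[df['station'] == station, list(cols)]
        # isna().sum() gives the number of missing values in each column.
        counts[station] = subset.isna().sum()
    # Keys of the dict become columns, so transpose to get one row per station.
    result = pd.DataFrame(counts).T.astype(int)
    result.index.name = 'station'
    return result


loop_counts = count_nan_per_station(train_df)
print(loop_counts)
```

The loop filters the frame once per station, which is what the question's one-at-a-time code does by hand; groupby does the same split in a single pass.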
