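NaN in a DataFrame: if the first observation of a time series is NaN, fill it with the first available value; otherwise carry the previous observation forward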

Problem description

I am performing an ADF test from statsmodels. The value series can have missing observations. In fact, I drop the analysis if the fraction of NaNs is larger than c. However, if the series makes it through that check, I run into the problem that adfuller cannot deal with missing data. Since this is training data with a minimum frame size, I would like to do:

1) if x(t=0) = NaN, then find the next non-NaN value (t > 0)
2) otherwise, if x(t) = NaN, then x(t) = x(t-1)
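To make the two rules concrete, here is a minimal plain-Python sketch that applies them literally to a list, without pandas. The helper name fill_series is hypothetical, and missing values are assumed to be either None or float('nan').

```python
import math
from typing import List, Optional


def fill_series(values: List[Optional[float]]) -> List[Optional[float]]:
    """Apply rules 1) and 2) to a plain list (None or NaN means missing)."""

    def missing(v: Optional[float]) -> bool:
        return v is None or (isinstance(v, float) and math.isnan(v))

    # Rule 1: if x(0) is missing, use the first non-missing value at t > 0.
    first_valid = next((v for v in values if not missing(v)), None)
    if first_valid is None:
        return list(values)  # nothing to fill from; leave the series as-is

    out: List[Optional[float]] = []
    for t, v in enumerate(values):
        if not missing(v):
            out.append(v)
        elif t == 0:
            out.append(first_valid)
        else:
            # Rule 2: carry the previous (already filled) observation forward.
            out.append(out[t - 1])
    return out


print(fill_series([None, None, 1.0, None, 2.0, None]))
# [1.0, 1.0, 1.0, 1.0, 2.0, 2.0]
```

Because rule 2 copies the already-filled x(t-1), every leading NaN ends up with the first valid value, which is exactly what a forward fill followed by a backward fill produces.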

So I am compromising my first value here, but making sure the input data always has the same dimension. Alternatively, if the first value is missing, I could fill it with 0, making use of the limit option of fillna.
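A sketch of the fill-with-0 alternative, which does not actually need limit: forward-fill first, then fill whatever is still missing with 0. After the forward fill, the only NaNs left are the leading ones (they have no earlier observation to copy), so fillna(0) touches nothing else. By contrast, fillna(0, limit=1) fills the first NaN anywhere along the axis, which is not necessarily x(0). The series values below are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 1.0, np.nan, 2.0, np.nan])

# Forward-fill first: after this, only the leading NaNs remain,
# because they have no previous observation to copy.
filled = s.ffill()

# Replace those remaining leading NaNs with 0 instead of back-filling them.
filled = filled.fillna(0)

print(filled.tolist())  # [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
```

Setting only x(0) to 0 and then forward-filling would give the same result, since the 0 would simply be carried into the following leading NaNs. Doing fillna(0) before the forward fill would be wrong, because it would also overwrite the interior gaps with 0.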

From the documentation, the options for method are not 100% clear to me: method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None

pad / ffill: does that mean I carry over the previous value?
backfill / bfill: does that mean the value is taken from a valid one in the future?
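For reference, the pandas fillna documentation describes pad / ffill as propagating the last valid observation forward, and backfill / bfill as using the next valid observation to fill the gap. A small sketch with Series.ffill() and Series.bfill(), the method-style equivalents, shows where each direction leaves NaNs; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 5.0, np.nan, np.nan, 7.0, np.nan])

# ffill / pad: each NaN takes the last valid value *before* it.
# Leading NaNs stay NaN (there is nothing earlier to copy).
forward = s.ffill()   # [nan, 5.0, 5.0, 5.0, 7.0, 7.0]

# bfill / backfill: each NaN takes the next valid value *after* it.
# Trailing NaNs stay NaN (there is nothing later to copy).
backward = s.bfill()  # [5.0, 5.0, 7.0, 7.0, 7.0, nan]

print(forward.tolist())
print(backward.tolist())
```

Note that bfill looks into the future: in a time series, x(2) and x(3) receive the later value 7.0, which is usually only acceptable for the leading NaNs.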

df.bfill(limit=1, inplace=True)  # i.e. df.fillna(method='bfill', limit=1)
df.ffill(inplace=True)           # i.e. df.fillna(method='ffill')

Would that work with limit? The documentation uses limit=1, but in combination with a predetermined fill value.
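A small sketch of what limit=1 does in the approach from the question, using Series.bfill / Series.ffill (equivalent to fillna with method='bfill' / 'ffill'). With a fill method, limit caps every run of consecutive NaNs, not only the leading one. So with two leading NaNs, x(0) stays NaN, and an interior NaN right before a valid value is filled from the future instead of from x(t-1). The data is made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, np.nan, 1.0, np.nan, np.nan, 2.0])

# The question's idea: back-fill with limit=1, then forward-fill.
# limit=1 applies to *each* gap, so x(4) is taken from the future value 2.0
# and x(0) is never reached.
attempt = s.bfill(limit=1).ffill()
print(attempt.tolist())  # [nan, 1.0, 1.0, 1.0, 2.0, 2.0]

# Forward-fill first, then back-fill: only the leading NaNs are left
# for bfill, so every x(t) ends up filled and x(4) = x(3).
answer = s.ffill().bfill()
print(answer.tolist())  # [1.0, 1.0, 1.0, 1.0, 1.0, 2.0]
```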

Recommended answer

To forward-fill all observations except (possibly) the leading ones, which should be back-filled instead, you can chain two calls to fillna, the first with method='ffill' and the second with method='bfill' (equivalently, df.ffill() followed by df.bfill()):

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [None, None, 1, None, 2, None]})
>>> df.ffill().bfill()  # same result as df.fillna(method='ffill').fillna(method='bfill')
     a
0  1.0
1  1.0
2  1.0
3  1.0
4  2.0
5  2.0
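Putting this together with the NaN-fraction check from the question, a minimal sketch of the preprocessing step might look like the following. The helper name prepare_for_adf and the threshold parameter c are hypothetical; c is assumed to be the maximum allowed fraction of missing observations.

```python
from typing import Optional

import numpy as np
import pandas as pd


def prepare_for_adf(s: pd.Series, c: float) -> Optional[pd.Series]:
    """Hypothetical helper: drop the series if it has too many NaNs, else fill it.

    c is the maximum allowed fraction of missing observations (0 <= c <= 1).
    Returns None when the series should be dropped from the analysis.
    """
    nan_mask = s.isna()
    if nan_mask.all() or nan_mask.mean() > c:
        return None  # too many gaps (or nothing to fill from): skip this series
    # Carry the previous observation forward; back-fill only the leading NaNs.
    return s.ffill().bfill()


s = pd.Series([np.nan, np.nan, 1.0, np.nan, 2.0, np.nan])  # 4 of 6 missing
print(prepare_for_adf(s, c=0.7).tolist())  # [1.0, 1.0, 1.0, 1.0, 2.0, 2.0]
print(prepare_for_adf(s, c=0.5))           # None
```

The filled series has no missing values and keeps its original length, so it can be passed on to statsmodels' adfuller; that call is left out here so the sketch runs without statsmodels installed. For a DataFrame holding one series per column, df.ffill().bfill() fills each column independently, since the default axis runs down the index.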
