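I have a dataset in which each period has a value only at its middle row, and I want to copy that value to the n rows before it and the n rows after it. In my case the period is 15 days, so that means the 7 days before the middle value and the 7 days after it. How can I do this?
I have checked many books and web pages, and there are plenty of references to fillna, but none of them solves my problem, so I have not tried any code yet.
Here is my dataset: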
DATE RAIN RR_MIDDLE CONDITION_RR CONDITION_PR SEASON
0 1983-07-22 0.000 0.00 Dry Dry Dry_Season
1 1983-07-23 NaN NaN NaN NaN NaN
2 1983-07-24 NaN NaN NaN NaN NaN
.....................................................................
15 1983-08-06 0.000 0.00 Wet Wet Wet_Season
I expect the filled table to carry, for every row of a period (season), the same values as that period's middle row:
DATE RAIN RR_MIDDLE CONDITION_RR CONDITION_PR SEASON
0 1983-07-22 0.000 0.00 Dry Dry Dry_Season
1 1983-07-23 0.000 0.00 Dry Dry Dry_Season
2 1983-07-24 0.000 0.00 Dry Dry Dry_Season
3 1983-07-25 0.000 0.00 Dry Dry Dry_Season
4 1983-07-26 0.000 0.00 Dry Dry Dry_Season
5 1983-07-27 0.000 0.00 Dry Dry Dry_Season
6 1983-07-28 0.000 0.00 Dry Dry Dry_Season
7 1983-07-29 0.000 0.00 Dry Dry Dry_Season
8 1983-07-30 0.000 0.00 Wet Wet Wet_Season
9 1983-07-31 0.000 0.00 Wet Wet Wet_Season
10 1983-08-01 0.000 0.00 Wet Wet Wet_Season
11 1983-08-02 0.000 0.00 Wet Wet Wet_Season
12 1983-08-03 0.000 0.00 Wet Wet Wet_Season
13 1983-08-04 0.000 0.00 Wet Wet Wet_Season
14 1983-08-05 0.000 0.00 Wet Wet Wet_Season
15 1983-08-06 0.000 0.00 Wet Wet Wet_Season
16 1983-08-07 0.000 0.00 Wet Wet Wet_Season
And so on.....
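Best answer
If you know in advance how many NaN values need to be filled, and that number is the same throughout the dataset, the simplest solution is the limit parameter of the two fill methods, ffill and bfill: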
df.ffill(limit=7).bfill(limit=7)
DATE RAIN RR_MIDDLE CONDITION_RR CONDITION_PR SEASON
0 1983-07-22 0.0 0.0 Dry Dry Dry_Season
1 1983-07-23 0.0 0.0 Dry Dry Dry_Season
2 1983-07-24 0.0 0.0 Dry Dry Dry_Season
3 1983-07-25 0.0 0.0 Dry Dry Dry_Season
4 1983-07-26 0.0 0.0 Dry Dry Dry_Season
5 1983-07-27 0.0 0.0 Dry Dry Dry_Season
6 1983-07-28 0.0 0.0 Dry Dry Dry_Season
7 1983-07-29 0.0 0.0 Dry Dry Dry_Season
8 1983-07-30 0.0 0.0 Wet Wet Wet_Season
9 1983-07-31 0.0 0.0 Wet Wet Wet_Season
10 1983-08-01 0.0 0.0 Wet Wet Wet_Season
11 1983-08-02 0.0 0.0 Wet Wet Wet_Season
12 1983-08-03 0.0 0.0 Wet Wet Wet_Season
13 1983-08-04 0.0 0.0 Wet Wet Wet_Season
14 1983-08-05 0.0 0.0 Wet Wet Wet_Season
15 1983-08-06 0.0 0.0 Wet Wet Wet_Season
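To make the limit approach easy to try, here is a minimal, self-contained sketch of the 15-day case described in the question. The frame is made-up sample data (two periods, with values only at the middle rows 7 and 22, and a non-zero RAIN value chosen purely for illustration), and the names half, period and filled are assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two 15-day periods, values present only at the middle rows.
period, half = 15, 7
n = 2 * period
rain = [np.nan] * n
season = [None] * n
rain[7], season[7] = 0.0, "Dry_Season"    # middle of the first period
rain[22], season[22] = 2.5, "Wet_Season"  # middle of the second period

df = pd.DataFrame({
    "DATE": pd.date_range("1983-07-22", periods=n, freq="D"),
    "RAIN": rain,
    "SEASON": season,
})

# Copy each middle value to the 7 rows after it, then to the 7 rows before it.
filled = df.ffill(limit=half).bfill(limit=half)
print(filled)
```

Because limit caps how far each fill may travel, a value never leaks more than 7 rows from its middle row; gaps longer than that would simply stay NaN.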
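Otherwise, you need to use interpolate with method='nearest'. That method only works on numeric types (and relies on SciPy), so each string column has to be mapped to integers, interpolated, and then mapped back.
import pandas as pd

str_cols = ['CONDITION_RR', 'CONDITION_PR', 'SEASON']
d = {}  # Holds, per column, the mapping from string values to integers
for col in str_cols:
    u = df[col].dropna().unique()
    d[col] = dict(zip(u, range(len(u))))
    df[col] = df[col].map(d[col])  # Map unique values to integers
# Fill every NaN with the value of the nearest non-missing row
df = df.apply(pd.Series.interpolate, method='nearest')
# Map back
for col in str_cols:
    rev_d = {v: k for k, v in d[col].items()}  # Integer -> original string
    df[col] = df[col].map(rev_d)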
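As an alternative to the encode, interpolate and decode round-trip, and not part of the original answer, one could sketch the same nearest-row fill with reindex(method='nearest') on the non-missing rows. This works for string columns directly and does not need SciPy. The sample data, the value_cols list and the variable names below are assumptions made for illustration; the known rows are placed at uneven positions (3 and 12) to mimic the case where the gap length is not constant.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 20 daily rows with values only at rows 3 and 12.
n = 20
rain = [np.nan] * n
season = [None] * n
rain[3], season[3] = 0.0, "Dry_Season"
rain[12], season[12] = 1.5, "Wet_Season"

df = pd.DataFrame({
    "DATE": pd.date_range("1983-07-22", periods=n, freq="D"),
    "RAIN": rain,
    "SEASON": season,
})

value_cols = ["RAIN", "SEASON"]      # columns to fill (assumed names)
known = df[value_cols].dropna()      # keep only the rows that carry values

# For every row label, take the values of the closest known row.
filled = df.copy()
filled[value_cols] = known.reindex(df.index, method="nearest")
print(filled)
```

This relies on the index being monotonic, which holds for the default integer index used here. Note that, unlike the limit approach, leading and trailing rows are also filled from their closest known row.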
For more on "python - Is there a simple Python way to fill the n rows before and after a missing value", see the similar question on Stack Overflow: https://stackoverflow.com/questions/56841247/