I have the following DataFrame:
S
2011-01-26 1
2011-01-27 0
2011-01-28 0
2011-01-29 0
2011-01-30 0
2011-01-31 0
2011-02-01 0
2011-02-02 0
2011-02-03 0
2011-02-04 0
2011-02-05 0
2011-02-06 0
2011-02-07 0
2011-02-08 0
2011-02-09 0
I am trying to generate the following DataFrame from df:
S S1 S2 S3
2011-01-26 1 0 0 0
2011-01-27 0 1 0 0
2011-01-28 0 1 0 0
2011-01-29 0 0 1 0
2011-01-30 0 0 1 0
2011-01-31 0 0 1 0
2011-02-01 0 0 1 0
2011-02-02 0 0 0 1
2011-02-03 0 0 0 1
2011-02-04 0 0 0 1
2011-02-05 0 0 0 1
2011-02-06 0 0 0 1
2011-02-07 0 0 0 1
2011-02-08 0 0 0 1
2011-02-09 0 0 0 1
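As you can see, the number of 1s going down each column increases by a factor of 2 from one column to the next. Is there a function in Pandas, such as fillna, that lets me specify how many rows to fill down?
Update
In fact, I have a more complex task.
If this is my df:
S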
2011-01-26 1
2011-01-27 0
2011-01-28 0
2011-01-29 0
2011-01-30 0
2011-01-31 0
2011-02-01 0
2011-02-02 0
2011-02-03 0
2011-02-04 0
2011-02-05 0
2011-02-06 0
2011-02-07 0
2011-02-08 0
2011-02-09 0
... (all zeros)
S
2011-04-26 1
2011-04-27 0
2011-04-28 0
2011-04-29 0
2011-04-30 0
2011-04-31 0
2011-05-01 0
2011-05-02 0
2011-05-03 0
2011-05-04 0
2011-05-05 0
2011-05-06 0
2011-05-07 0
2011-05-08 0
2011-05-09 0
I need this:
S S1 S2 S3
2011-01-26 1 0 0 0
2011-01-27 0 1 0 0
2011-01-28 0 1 0 0
2011-01-29 0 0 1 0
2011-01-30 0 0 1 0
2011-01-31 0 0 1 0
2011-02-01 0 0 1 0
2011-02-02 0 0 0 1
2011-02-03 0 0 0 1
2011-02-04 0 0 0 1
2011-02-05 0 0 0 1
2011-02-06 0 0 0 1
2011-02-07 0 0 0 1
2011-02-08 0 0 0 1
2011-02-09 0 0 0 1
... (all zeros everywhere)
S S1 S2 S3
2011-04-26 1 0 0 0
2011-04-27 0 1 0 0
2011-04-28 0 1 0 0
2011-04-29 0 0 1 0
2011-04-30 0 0 1 0
2011-04-31 0 0 1 0
2011-05-01 0 0 1 0
2011-05-02 0 0 0 1
2011-05-03 0 0 0 1
2011-05-04 0 0 0 1
2011-05-05 0 0 0 1
2011-05-06 0 0 0 1
2011-05-07 0 0 0 1
2011-05-08 0 0 0 1
2011-05-09 0 0 0 1
Best answer
As far as I know, there is no built-in function that does this. However, a similar result can be obtained with the following trick.
import pandas as pd
import numpy as np
# your data
# ========================================
df = pd.DataFrame(0, index=pd.date_range('2015-01-01', periods=100, freq='D'), columns=['col'])
df.iloc[[0, 71], 0] = 1
grouped = df.groupby(df.col.cumsum())
grouped.get_group(1)
Out[275]:
col
2015-01-01 1
2015-01-02 0
2015-01-03 0
2015-01-04 0
2015-01-05 0
2015-01-06 0
2015-01-07 0
2015-01-08 0
... ...
2015-03-05 0
2015-03-06 0
2015-03-07 0
2015-03-08 0
2015-03-09 0
2015-03-10 0
2015-03-11 0
2015-03-12 0
[71 rows x 1 columns]
grouped.get_group(2)
Out[276]:
col
2015-03-13 1
2015-03-14 0
2015-03-15 0
2015-03-16 0
2015-03-17 0
2015-03-18 0
2015-03-19 0
2015-03-20 0
... ...
2015-04-03 0
2015-04-04 0
2015-04-05 0
2015-04-06 0
2015-04-07 0
2015-04-08 0
2015-04-09 0
2015-04-10 0
[29 rows x 1 columns]
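The grouping key used above works because the running sum of a 0/1 column increases by one at every 1, so all rows of a segment share one label. A minimal sketch on a hypothetical toy series (the values are made up for illustration and are not part of the original data):

```python
import pandas as pd

# Hypothetical toy series: a 1 marks the start of each segment.
s = pd.Series([1, 0, 0, 1, 0, 0, 0, 1, 0])

# The running sum steps up at every 1, giving one label per segment.
labels = s.cumsum()
print(labels.tolist())  # [1, 1, 1, 2, 2, 2, 2, 3, 3]

# 0-based position of each row inside its own segment.
pos = s.groupby(labels).cumcount()
print(pos.tolist())  # [0, 1, 2, 0, 1, 2, 3, 0, 1]
```

The position inside the segment is what determines which of the doubling blocks a row belongs to.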
# processing
# ==================================
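def func(group):
    group = group.copy()  # avoid chained assignment on the group passed in
    # block start positions inside the group: 0, 1, 3, 7, ...
    starts = 2 ** np.arange(int(np.log2(len(group))) + 1) - 1
    group['temp'] = 0
    group.iloc[starts, group.columns.get_loc('temp')] = 1
    # the running sum turns the start markers into block numbers 1, 2, 3, ...
    group['new_col'] = group['temp'].cumsum()
    return pd.get_dummies(group['new_col'])

# group_keys=False keeps the flat date index on recent pandas versions
df.groupby(df.col.cumsum(), group_keys=False).apply(func)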
Out[281]:
1 2 3 4 5 6 7
2015-01-01 1 0 0 0 0 0 0
2015-01-02 0 1 0 0 0 0 0
2015-01-03 0 1 0 0 0 0 0
2015-01-04 0 0 1 0 0 0 0
2015-01-05 0 0 1 0 0 0 0
2015-01-06 0 0 1 0 0 0 0
2015-01-07 0 0 1 0 0 0 0
2015-01-08 0 0 0 1 0 0 0
... .. .. .. .. .. .. ..
2015-04-03 0 0 0 0 1 NaN NaN
2015-04-04 0 0 0 0 1 NaN NaN
2015-04-05 0 0 0 0 1 NaN NaN
2015-04-06 0 0 0 0 1 NaN NaN
2015-04-07 0 0 0 0 1 NaN NaN
2015-04-08 0 0 0 0 1 NaN NaN
2015-04-09 0 0 0 0 1 NaN NaN
2015-04-10 0 0 0 0 1 NaN NaN
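The NaN values in the last columns appear because the second group has only 29 rows, so it produces five dummy columns while the first group produces seven, and the missing columns are filled with NaN when the pieces are combined. One way to sidestep this, sketched below, is to compute the block number for all rows at once and call get_dummies a single time. This is an alternative to the answer's approach, not the answer's own code; the function name fill_down_dummies and the S, S1, S2, ... column naming are assumptions made for illustration, and the sketch assumes the series starts with a 1.

```python
import numpy as np
import pandas as pd

def fill_down_dummies(s):
    """Indicator columns whose run lengths double: 1, 2, 4, 8, ... rows.

    Assumes `s` is a 0/1 Series that starts with a 1; every 1 opens a
    new segment and the pattern restarts there.
    """
    segment = s.cumsum()                     # segment label per row
    pos = s.groupby(segment).cumcount()      # 0-based position inside the segment
    n_blocks = int(pos.max() + 1).bit_length()
    starts = 2 ** np.arange(n_blocks) - 1    # block start positions: 0, 1, 3, 7, ...
    # Index of the last start position that is <= pos, i.e. the block number.
    block = np.searchsorted(starts, pos.to_numpy(), side="right") - 1
    out = pd.get_dummies(pd.Series(block, index=s.index)).astype(int)
    out.columns = ["S" if b == 0 else f"S{b}" for b in out.columns]
    return out

# The 15-row example from the question.
idx = pd.date_range("2011-01-26", periods=15, freq="D")
s = pd.Series(0, index=idx)
s.iloc[0] = 1
result = fill_down_dummies(s)
print(result)
```

Because the dummies are built in one call, shorter segments get 0 rather than NaN in the columns they never reach. If the per-group version is kept instead, appending .fillna(0) to its result should give the same effect.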
Regarding python - Python: Pandas generating fill down variables in DataFrame, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31305769/