Problem description
I have a dataframe that contains, for each group, the number of observations during a certain period. Some groups don't contain all periods, and for these groups I want to append rows for the missing periods, so that each group has a row for all 6 periods.
My current df looks something like this:
ID PERIOD VAlUE
1 1 10
1 2 8
1 3 8
1 4 15
1 5 6
1 6 44
2 1 NONE
3 2 4
3 5 25
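To make the example reproducible, the frame above can be rebuilt as follows. This is a sketch under two assumptions: the column name VAlUE is kept exactly as posted, and the missing value for ID 2 is the literal string 'NONE' (it could just as well be NaN in the real data).

```python
import pandas as pd

# Sample data from the question (assumption: "NONE" is a literal string)
df = pd.DataFrame({
    'ID':     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    'PERIOD': [1, 2, 3, 4, 5, 6, 1, 2, 5],
    'VAlUE':  [10, 8, 8, 15, 6, 44, 'NONE', 4, 25],
})

# How many distinct periods each ID currently has (the goal is 6 for every ID)
print(df.groupby('ID')['PERIOD'].nunique().tolist())  # [6, 1, 2]
```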
I want a dataframe looking like this.
ID PERIOD VAlUE
1 1 10
1 2 8
1 3 8
1 4 15
1 5 6
1 6 44
2 1 NONE
2 2 NONE
2 3 NONE
2 4 NONE
2 5 NONE
2 6 NONE
3 1 NONE
3 2 4
3 3 NONE
3 4 NONE
3 5 25
3 6 NONE
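The result is:
- For ID == 1, nothing happens, because it already contains all 6 periods.
- For ID == 2, 5 rows are appended, one for each period that is not in the first df (periods 2, 3, 4, 5 and 6).
- For ID == 3, 4 rows are appended, one for each period that is not in the first df. So it adds rows for periods 1, 3, 4 and 6.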
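The per-group bookkeeping in the list above amounts to a set difference between the full range of periods and the periods each ID already has. A minimal sketch, assuming the periods are exactly 1 to 6 (ALL_PERIODS is a hypothetical name introduced here):

```python
import pandas as pd

df = pd.DataFrame({
    'ID':     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    'PERIOD': [1, 2, 3, 4, 5, 6, 1, 2, 5],
})

ALL_PERIODS = set(range(1, 7))  # assumption: the 6 periods are numbered 1..6

# For each ID, the periods that are absent and therefore need a new row
missing = {
    int(group_id): sorted(ALL_PERIODS - set(periods))
    for group_id, periods in df.groupby('ID')['PERIOD']
}
print(missing)  # {1: [], 2: [2, 3, 4, 5, 6], 3: [1, 3, 4, 6]}
```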
I really don't have a clue how to do it, so help would really be appreciated.
Recommended answer
You can set the index to 'ID' and 'PERIOD', then construct a new index from the product of the unique values of both columns and pass it as the new index to reindex. reindex has an optional fill_value parameter, which you can set to the string 'NONE':
In [158]:
import pandas as pd

iterables = [df['ID'].unique(), df['PERIOD'].unique()]
df = df.set_index(['ID', 'PERIOD'])
df = df.reindex(index=pd.MultiIndex.from_product(iterables, names=['ID', 'PERIOD']), fill_value='NONE').reset_index()
df
Out[158]:
ID PERIOD VAlUE
0 1 1 10
1 1 2 8
2 1 3 8
3 1 4 15
4 1 5 6
5 1 6 44
6 2 1 NONE
7 2 2 NONE
8 2 3 NONE
9 2 4 NONE
10 2 5 NONE
11 2 6 NONE
12 3 1 NONE
13 3 2 4
14 3 3 NONE
15 3 4 NONE
16 3 5 25
17 3 6 NONE
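For reference, here is the same approach as one self-contained script. One hedged change: it uses an explicit range of periods instead of df['PERIOD'].unique(), on the assumption that the periods are 1 to 6. unique() only sees periods that appear somewhere in the data (here ID 1 happens to cover all of them), so a period observed by no group at all would otherwise be silently left out.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    'PERIOD': [1, 2, 3, 4, 5, 6, 1, 2, 5],
    'VAlUE':  [10, 8, 8, 15, 6, 44, 'NONE', 4, 25],
})

# Every (ID, PERIOD) pair; assumption: the full period range is 1..6
full_index = pd.MultiIndex.from_product(
    [df['ID'].unique(), list(range(1, 7))], names=['ID', 'PERIOD']
)

result = (
    df.set_index(['ID', 'PERIOD'])
      .reindex(index=full_index, fill_value='NONE')  # new pairs get 'NONE'
      .reset_index()                                  # back to ordinary columns
)
print(result)
```

The output matches Out[158] above: 18 rows, 3 IDs times 6 periods.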
Breaking the above down step by step:
In [160]:
# create a list of the iterable index values we want to generate all product combinations from
iterables = [df['ID'].unique(),df['PERIOD'].unique()]
iterables
Out[160]:
[array([1, 2, 3], dtype=int64), array([1, 2, 3, 4, 5, 6], dtype=int64)]
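MultiIndex.from_product turns these two arrays into every (ID, PERIOD) pair, i.e. their Cartesian product, with the first iterable varying slowest. A small sketch that checks this against itertools.product, using plain lists in place of the two arrays shown above:

```python
import itertools

import pandas as pd

ids = [1, 2, 3]                # stands in for df['ID'].unique()
periods = [1, 2, 3, 4, 5, 6]   # stands in for df['PERIOD'].unique()

idx = pd.MultiIndex.from_product([ids, periods], names=['ID', 'PERIOD'])

# Same pairs, in the same order, as a plain Cartesian product
same_pairs = list(idx) == list(itertools.product(ids, periods))
print(len(idx), same_pairs)  # 18 True
```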
In [163]:
# set the index to ID and PERIOD
df = df.set_index(['ID','PERIOD'])
df
Out[163]:
VAlUE
ID PERIOD
1 1 10
2 8
3 8
4 15
5 6
6 44
2 1 NONE
3 2 4
5 25
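With ID and PERIOD in the index, reindex aligns on the (ID, PERIOD) tuples: rows whose label exists are kept unchanged, and labels that exist only in the new index get fill_value (NaN if fill_value is not given). A toy sketch of that behaviour, assuming a frame that holds only the two rows of ID 3 (the names indexed and target are hypothetical):

```python
import pandas as pd

# Toy frame: only (3, 2) and (3, 5) exist, as for ID 3 in the question
indexed = pd.DataFrame(
    {'VAlUE': [4, 25]},
    index=pd.MultiIndex.from_tuples([(3, 2), (3, 5)], names=['ID', 'PERIOD']),
)
target = pd.MultiIndex.from_product([[3], [1, 2, 5]], names=['ID', 'PERIOD'])

filled = indexed.reindex(index=target, fill_value='NONE')  # new label -> 'NONE'
default = indexed.reindex(index=target)                    # new label -> NaN

print(filled['VAlUE'].tolist())          # ['NONE', 4, 25]
print(default['VAlUE'].isna().tolist())  # [True, False, False]
```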
In [164]:
# reindex and pass the product from iterables as the new index
df.reindex(index=pd.MultiIndex.from_product(iterables, names=['ID', 'PERIOD']), fill_value='NONE').reset_index()
Out[164]:
ID PERIOD VAlUE
0 1 1 10
1 1 2 8
2 1 3 8
3 1 4 15
4 1 5 6
5 1 6 44
6 2 1 NONE
7 2 2 NONE
8 2 3 NONE
9 2 4 NONE
10 2 5 NONE
11 2 6 NONE
12 3 1 NONE
13 3 2 4
14 3 3 NONE
15 3 4 NONE
16 3 5 25
17 3 6 NONE
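If you prefer to keep ID and PERIOD as ordinary columns throughout, a left join onto a complete grid gives the same table. This swaps in a different technique (DataFrame.merge instead of reindex) and again assumes the periods are 1 to 6; it is a sketch, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'ID':     [1, 1, 1, 1, 1, 1, 2, 3, 3],
    'PERIOD': [1, 2, 3, 4, 5, 6, 1, 2, 5],
    'VAlUE':  [10, 8, 8, 15, 6, 44, 'NONE', 4, 25],
})

# Every (ID, PERIOD) pair as a two-column DataFrame (assumption: periods 1..6)
grid = pd.MultiIndex.from_product(
    [df['ID'].unique(), list(range(1, 7))], names=['ID', 'PERIOD']
).to_frame(index=False)

# A left join keeps the grid's row order; unmatched pairs come back as NaN
result = grid.merge(df, on=['ID', 'PERIOD'], how='left')
result['VAlUE'] = result['VAlUE'].fillna('NONE')
print(result)
```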