Problem description
Let's say I have a dataframe with the following structure:
observation
d1 1
d2 1
d3 -1
d4 -1
d5 -1
d6 -1
d7 1
d8 1
d9 1
d10 1
d11 -1
d12 -1
d13 -1
d14 -1
d15 -1
d16 1
d17 1
d18 1
d19 1
d20 1
Where d1:d20 is some datetime index (generalized here).
If I wanted to split d1:d2, d3:d6, d7:d10, etc. into their own respective "chunks", how would I do that pythonically?
Note:
df1 = df[(df.observation==1)]
df2 = df[(df.observation==-1)]
is not what I want.
I can think of brute force ways, which would work, but are not wildly elegant.
Recommended answer
You can create a group variable from the cumsum() of the diff() of the observation column: wherever diff() is not equal to zero the row is marked True, so each time the value changes, cumsum() produces a new group id. You can then either apply standard analysis after groupby(), as in df.groupby((df.observation.diff() != 0).cumsum())...(other chained analysis here), or split the frame into smaller data frames with a list comprehension:
lst = [g for _, g in df.groupby((df.observation.diff() != 0).cumsum())]
lst[0]
# observation
#d1 1
#d2 1
lst[1]
# observation
#d3 -1
#d4 -1
#d5 -1
#d6 -1
...
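To see why this grouping key works, the intermediate steps can be laid out one at a time. This is a minimal sketch that assumes the example frame from the question, with plain string labels d1..d20 standing in for the datetime index; the names changed, group_id and steps are illustrative only.

```python
import pandas as pd

# Rebuild the example frame; string labels stand in for the datetime index.
values = [1, 1, -1, -1, -1, -1, 1, 1, 1, 1,
          -1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
df = pd.DataFrame({"observation": values},
                  index=[f"d{i}" for i in range(1, 21)])

# diff() is NaN on the first row, and NaN != 0 evaluates to True,
# so the first row also starts a group.
changed = df.observation.diff() != 0

# Summing the booleans gives a running count of changes: 1, 1, 2, 2, 2, 2, 3, ...
group_id = changed.cumsum()

# Side-by-side view of the original values and the two derived columns.
steps = pd.DataFrame({"observation": df.observation,
                      "changed": changed,
                      "group_id": group_id})
print(steps.head(8))
```

Because the ids only ever increase, two separate runs of the same value (for example d1:d2 and d7:d10) never end up in the same group.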
The index chunks are here:
[i.index for i in lst]
#[Index(['d1', 'd2'], dtype='object'),
# Index(['d3', 'd4', 'd5', 'd6'], dtype='object'),
# Index(['d7', 'd8', 'd9', 'd10'], dtype='object'),
# Index(['d11', 'd12', 'd13', 'd14', 'd15'], dtype='object'),
# Index(['d16', 'd17', 'd18', 'd19', 'd20'], dtype='object')]
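As one possible example of the "other chained analysis" route, the same key can feed an aggregation directly instead of a list of frames. The sketch below again assumes the question's example data with string labels in place of the datetime index; the names key, runs and chunks are hypothetical.

```python
import pandas as pd

# Same example data as in the question (labels stand in for datetimes).
values = [1, 1, -1, -1, -1, -1, 1, 1, 1, 1,
          -1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
df = pd.DataFrame({"observation": values},
                  index=[f"d{i}" for i in range(1, 21)])

# Group id that increments every time the observation changes.
key = (df.observation.diff() != 0).cumsum()

# One row per run: the value of the run and how many rows it spans.
runs = df.groupby(key).observation.agg(["first", "count"])
print(runs)

# Alternative to the list: keep the chunks in a dict keyed by group id.
chunks = {gid: g for gid, g in df.groupby(key)}
print(chunks[2].index.tolist())  # labels of the second run
```

A dict keeps the group id attached to each chunk, which may be handy if the chunks need to be looked up later; the plain list from the answer is enough when only the order matters.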