Problem description
I have a dataframe, like df below. I want to create a new dataframe for every chunk of consecutive rows where the condition is True, so that the result is df_1, df_2, ..., df_n.
| df    |           |   | df_1  |   | df_2  |
| Value | Condition |   | Value |   | Value |
|-------|-----------|---|-------|---|-------|
| 2     | True      |   | 2     |   | 0     |
| 5     | True      |   | 5     |   | 5     |
| 4     | True      |   | 4     |   |       |
| 4     | False     |   |       |   |       |
| 2     | False     |   |       |   |       |
| 0     | True      |   |       |   |       |
| 5     | True      |   |       |   |       |
| 7     | False     |   |       |   |       |
| 8     | False     |   |       |   |       |
| 9     | False     |   |       |   |       |
My only idea is to loop through the dataframe and record the start and end index of every chunk of True values, then loop over those index pairs and create a new dataframe for each start/end pair, something like this:
newdf = df.iloc[start:end]
But doing that seems inefficient.
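For reference, the start/end idea described above could be sketched roughly as follows. The helper name true_runs and the list name chunks are made up for illustration, the sample data is copied from the table, and end positions are treated as exclusive so that they can be passed straight to iloc.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Value": [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
    "Condition": [True, True, True, False, False,
                  True, True, False, False, False],
})

def true_runs(mask):
    """Return (start, end) position pairs for each run of True; end is exclusive.

    Hypothetical helper, not part of pandas.
    """
    runs = []
    start = None
    for pos, flag in enumerate(mask):
        if flag and start is None:
            start = pos                 # a run of True values begins here
        elif not flag and start is not None:
            runs.append((start, pos))   # the run ended just before pos
            start = None
    if start is not None:               # the mask ended while inside a run
        runs.append((start, len(mask)))
    return runs

runs = true_runs(df["Condition"].tolist())
chunks = [df.iloc[start:end] for start, end in runs]
```

This makes one pass over the column in plain Python and then slices once per run, which is the work the question is hoping to avoid.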
Recommended answer
This is an alternative solution. Note that the consecutive_groups recipe comes from the more_itertools library.
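import pandas as pd
from itertools import groupby
from operator import itemgetter

# Sample data from the question.
df = pd.DataFrame({
    'Value': [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
    'Condition': [True, True, True, False, False, True, True, False, False, False],
})

def consecutive_groups(iterable, ordering=lambda x: x):
    # Consecutive items share the same (position - value) difference.
    for k, g in groupby(enumerate(iterable), key=lambda x: x[0] - ordering(x[1])):
        yield map(itemgetter(1), g)

# Index labels of the True rows, grouped into consecutive runs.
grps = consecutive_groups(df[df.Condition].index)
# iloc works here because df has the default RangeIndex (labels equal positions).
dfs = {i: df.iloc[list(j)] for i, j in enumerate(grps, 1)}

# {1:    Value  Condition
# 0      2       True
# 1      5       True
# 2      4       True,
# 2:    Value  Condition
# 5      0       True
# 6      5       True}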
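A pandas-only variant is also possible. The sketch below is not part of the original answer: it swaps the consecutive_groups recipe for a shift/cumsum block label plus groupby, and the names block_id and true_rows are made up for illustration. It assumes the same sample data as the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Value": [2, 5, 4, 4, 2, 0, 5, 7, 8, 9],
    "Condition": [True, True, True, False, False,
                  True, True, False, False, False],
})

# A new block starts wherever Condition differs from the previous row,
# so the running sum of those change points labels each block.
block_id = df["Condition"].ne(df["Condition"].shift()).cumsum()

# Keep only the True rows, then split them by their block label.
true_rows = df[df["Condition"]]
dfs = {
    i: group
    for i, (_, group) in enumerate(
        true_rows.groupby(block_id.loc[true_rows.index]), 1
    )
}
```

Unlike the iloc-based version, this does not depend on the index being a default RangeIndex, because the rows are grouped by a label computed from the Condition column itself.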