Problem description
I have a pandas DataFrame that needs to be fed in chunks of n rows into downstream functions (print in the example). The chunks may have overlapping rows.
Let's start from a dummy DataFrame:
import pandas as pd

d = {'A': list(range(1000)), 'B': list(range(1000))}
df = pd.DataFrame(d)
For 2-row chunks with a 1-row overlap, I have the following code:
a = df.index.values[:-1]
for i in a:
    print(df.iloc[i:i+2])
The output looks like this:
...
       A    B
996  996  996
997  997  997
       A    B
997  997  997
998  998  998
       A    B
998  998  998
999  999  999
This is exactly what I want.
Is there a better/faster way to iterate over chunks of n rows of a pandas.DataFrame?
Recommended answer
Use DataFrame.groupby with integer division on a helper 1-D array that has the same length as df. The resulting chunks do not overlap:
import numpy as np
import pandas as pd

d = {'A': list(range(5)), 'B': list(range(5))}
df = pd.DataFrame(d)
print(np.arange(len(df)) // 2)
[0 0 1 1 2]
for i, g in df.groupby(np.arange(len(df)) // 2):
    print(g)
   A  B
0  0  0
1  1  1
   A  B
2  2  2
3  3  3
   A  B
4  4  4
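The same grouping idea works for any chunk size. The following is a minimal sketch, assuming a 7-row frame and an illustrative chunk size `n = 3` (both values are made up for this example and are not from the question); it collects the groups into a list so their sizes can be inspected.

```python
import numpy as np
import pandas as pd

# Illustrative data: 7 rows, so the last chunk is shorter than n.
df = pd.DataFrame({'A': list(range(7)), 'B': list(range(7))})
n = 3  # rows per chunk (assumed value for this sketch)

# Row positions 0..n-1 get key 0, n..2n-1 get key 1, and so on.
keys = np.arange(len(df)) // n

# groupby yields (key, sub-frame) pairs; keep only the sub-frames.
chunks = [g for _, g in df.groupby(keys)]
sizes = [len(c) for c in chunks]
```

Note that the final chunk simply holds whatever rows are left over, so downstream code should not assume every chunk has exactly `n` rows.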
For overlapping chunks, the following generator, adapted from another answer, can be used:
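def chunker1(seq, size):
    # Slide a window of `size` rows forward one row at a time;
    # stop at the last position where a full window still fits.
    return (seq.iloc[pos:pos + size] for pos in range(0, len(seq) - size + 1))

for i in chunker1(df, 2):
    print(i)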
   A  B
0  0  0
1  1  1
   A  B
1  1  1
2  2  2
   A  B
2  2  2
3  3  3
   A  B
3  3  3
4  4  4
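If the amount of overlap also needs to vary, the window can advance by `size - overlap` rows instead of one. The sketch below is one possible generalization, not part of the original answer; the name `iter_chunks` and its `overlap` parameter are hypothetical, and it assumes that a trailing partial chunk should be dropped.

```python
import pandas as pd

def iter_chunks(frame, size, overlap=0):
    """Yield chunks of `size` rows; consecutive chunks share `overlap` rows.

    Hypothetical helper for illustration only.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    # Stop once a full window no longer fits; a short trailing chunk is dropped.
    for start in range(0, len(frame) - size + 1, step):
        yield frame.iloc[start:start + size]

df = pd.DataFrame({'A': list(range(5)), 'B': list(range(5))})

# 2-row chunks with a 1-row overlap, as in the question.
pairs = list(iter_chunks(df, size=2, overlap=1))

# 3-row chunks with a 1-row overlap.
triples = list(iter_chunks(df, size=3, overlap=1))
```

With `size=2, overlap=1` this visits the same row windows as the loop in the question. Because `iloc` slicing is positional, it also works when the index is not a default integer range.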