Problem description
How can I iterate over pairs of rows of a Pandas DataFrame?
For example:
import pandas as pd

content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
df = pd.DataFrame(content, columns=["a", "b", "interval"])
print(df)
Output:
   a  b interval
0  1  2   [1, 3]
1  3  4   [2, 4]
2  5  6   [6, 9]
3  7  8  [9, 10]
Now I want to do something like
for (indx1, row1), (indx2, row2) in df.?:
    print("row1:", row1, sep="\n")
    print("row2:", row2, sep="\n")
    print("\n")
which should output
row1:
a 1
b 2
interval [1,3]
Name: 0, dtype: int64
row2:
a 3
b 4
interval [2,4]
Name: 1, dtype: int64
row1:
a 3
b 4
interval [2,4]
Name: 1, dtype: int64
row2:
a 5
b 6
interval [6,9]
Name: 2, dtype: int64
row1:
a 5
b 6
interval [6,9]
Name: 2, dtype: int64
row2:
a 7
b 8
interval [9,10]
Name: 3, dtype: int64
Is there a built-in way to achieve this? I looked at df.groupby(df.index // 2) and df.itertuples, but neither of these methods seems to do what I want.
Edit: The overall goal is to get a list of bools indicating whether the intervals in column "interval" overlap. In the above example the list would be
overlaps = [True, False, False]
with one bool for each pair.
Recommended answer
If you want to keep the for loop, using zip and iterrows could be a way:
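# Pair each row (all but the last) with the row that follows it (all but the first)
for (indx1, row1), (indx2, row2) in zip(df[:-1].iterrows(), df[1:].iterrows()):
    print("row1:", row1, sep="\n")
    print("row2:", row2, sep="\n")
    print("\n")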
To access the next row at the same time, start the second iterrows one row later with df[1:].iterrows(), and you get the output the way you want:
row1:
a 1
b 2
Name: 0, dtype: int64
row2:
a 3
b 4
Name: 1, dtype: int64
row1:
a 3
b 4
Name: 1, dtype: int64
row2:
a 5
b 6
Name: 2, dtype: int64
row1:
a 5
b 6
Name: 2, dtype: int64
row2:
a 7
b 8
Name: 3, dtype: int64
But as @RafaelC said, a for loop might not be the best method for your general problem.
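One possible loop-free alternative, offered only as a sketch and not as the approach @RafaelC had in mind, is to unpack the interval column into NumPy arrays and compare each row with the next one by slicing. It assumes every interval is a two-element [start, end] list and, as in the question's expected result, that intervals touching at an endpoint do not overlap.

```python
import numpy as np
import pandas as pd

content = [(1, 2, [1, 3]), (3, 4, [2, 4]), (5, 6, [6, 9]), (7, 8, [9, 10])]
df = pd.DataFrame(content, columns=["a", "b", "interval"])

# Turn the column of [start, end] lists into a (n_rows, 2) integer array.
bounds = np.array(df["interval"].tolist())
starts, ends = bounds[:, 0], bounds[:, 1]

# Compare row i (slice [:-1]) with row i + 1 (slice [1:]) for all i at once.
overlaps = ((starts[:-1] < ends[1:]) & (starts[1:] < ends[:-1])).tolist()
print(overlaps)  # [True, False, False]
```

The slices [:-1] and [1:] play the same role as df[:-1] and df[1:] in the loop version: they line up each row with its successor without iterating in Python.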