数据框组内的计算

数据框组内的计算

本文介绍了Pandas 数据框组内的计算的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有 Pandas 数据框,如下所示.我想要做的是,partition (or groupby) by BlockID, LineID, WordID,然后在每个组内使用 current WordStartX - previous (WordStartX + WordWidth)派生出另一列,例如 WordDistance 表示这个词和前一个词之间的距离.

这篇文章

I've Pandas Dataframe as shown below. What I'm trying to do is, partition (or groupby) by BlockID, LineID, WordID, and then within each group use current WordStartX - previous (WordStartX + WordWidth) to derive another column, e.g., WordDistance to indicate the distance between this word and previous word.

This post Row operations within a group of a pandas dataframe is very helpful but in my case multiple columns involved (WordStartX and WordWidth).

 *BlockID  LineID  WordID  WordStartX  WordWidth     WordDistance
0        0       0       0         275        150                 0
1        0       0       1         431         96   431-(275+150)=6
2        0       0       2         642         90   642-(431+96)=115
3        0       0       3         746        104   746-(642+90)=14
4        1       0       0         273         69         ...
5        1       0       1         352        151         ...
6        1       0       2         510         92
7        1       0       3         647         90
8        1       0       4         752        105**
解决方案

The diff() and shift() functions are usually helpful for calculation referring to previous or next rows:

df['WordDistance'] = (df.groupby(['BlockID', 'LineID'])
        .apply(lambda g: g['WordStartX'].diff() - g['WordWidth'].shift()).fillna(0).values)

这篇关于Pandas 数据框组内的计算的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-01 03:26