Calculation within groups of a pandas DataFrame


Problem description

I have a pandas DataFrame as shown below. I want to partition (group) it by BlockID and LineID, with WordID giving the order of words within a line, and then within each group compute current WordStartX - previous (WordStartX + WordWidth) to derive another column, e.g. WordDistance, that gives the distance between each word and the previous word.

The post "Row operations within a group of a pandas dataframe" is very helpful, but in my case multiple columns are involved (WordStartX and WordWidth).

   BlockID  LineID  WordID  WordStartX  WordWidth      WordDistance
0        0       0       0         275        150                 0
1        0       0       1         431         96   431-(275+150)=6
2        0       0       2         642         90  642-(431+96)=115
3        0       0       3         746        104   746-(642+90)=14
4        1       0       0         273         69               ...
5        1       0       1         352        151               ...
6        1       0       2         510         92
7        1       0       3         647         90
8        1       0       4         752        105

Solution

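The diff() and shift() functions are usually helpful for calculations that refer to the previous or next row. Within each (BlockID, LineID) group, the distance is the difference in WordStartX between consecutive rows minus the previous row's WordWidth, and the first word of each group, which has no predecessor, is filled with 0: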

# .values assigns by position, so df must already be sorted by BlockID and LineID.
df['WordDistance'] = (df.groupby(['BlockID', 'LineID'])
                        .apply(lambda g: g['WordStartX'].diff() - g['WordWidth'].shift())
                        .fillna(0).values)
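
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It assumes the sample rows from the question's table and is a variation on the answer rather than its exact code: it calls diff() and shift() on the grouped columns directly instead of going through apply(), so the result stays aligned with the original index and does not rely on the row order.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "BlockID":    [0, 0, 0, 0, 1, 1, 1, 1, 1],
    "LineID":     [0, 0, 0, 0, 0, 0, 0, 0, 0],
    "WordID":     [0, 1, 2, 3, 0, 1, 2, 3, 4],
    "WordStartX": [275, 431, 642, 746, 273, 352, 510, 647, 752],
    "WordWidth":  [150, 96, 90, 104, 69, 151, 92, 90, 105],
})

# Group once and reuse the grouping for both columns.
grouped = df.groupby(["BlockID", "LineID"])

# Per group: (current start - previous start) - previous width.
# The first row of each group has no predecessor, so it is NaN until filled with 0.
df["WordDistance"] = (
    grouped["WordStartX"].diff() - grouped["WordWidth"].shift()
).fillna(0).astype(int)

print(df)
```

For the first group this reproduces the values worked out by hand in the question (0, 6, 115, 14). If the words within a line might not be in order, sorting by BlockID, LineID and WordID first would be a reasonable precaution.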
