This post covers how to calculate slopes for many groups using pandas groupby.



Some illustrative data in a DataFrame (MultiIndex) format:

|entity| year |value|
+------+------+-----+
|  a   | 1999 |  2  |
|      | 2004 |  5  |
|  b   | 2003 |  3  |
|      | 2007 |  2  |
|      | 2014 |  7  |

I would like to calculate the slope using scipy.stats.linregress for each entity a and b in the above example. I tried using groupby on the first column, following the split-apply-combine advice, but it seems problematic since it's expecting one Series of values (a and b), whereas I need to operate on the two columns on the right.


This is easily done in R via plyr, but I am not sure how to approach it in pandas.



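A function can be applied to each group of a groupby with the apply method. The function passed here is a lambda that calls linregress on the year and value columns of each group and keeps the first element of the result, which is the slope. Please see below: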

In [3]: import pandas as pd

In [4]: x = pd.DataFrame({'entity':['a','a','b','b','b'],
   ...:                   'value':[2,5,3,2,7],
   ...:                   'year':[1999,2004,2003,2007,2014]})

In [5]: x
  entity  value  year
0      a      2  1999
1      a      5  2004
2      b      3  2003
3      b      2  2007
4      b      7  2014

In [6]: from scipy.stats import linregress

In [7]: x.groupby('entity').apply(lambda v: linregress(v.year, v.value)[0])
a    0.600000
b    0.403226
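
The session above starts from flat columns, whereas the question's data is indexed by (entity, year). As a minimal sketch of how the same approach might be applied to that layout, the script below rebuilds the question's data with a MultiIndex, moves the index levels back into columns with reset_index, and then applies linregress per entity. The names df, flat, and slope are hypothetical helpers introduced here for illustration, and the result's slope attribute is used in place of indexing with [0].

```python
import pandas as pd
from scipy.stats import linregress

# Data from the question, indexed by (entity, year) as a MultiIndex.
df = pd.DataFrame(
    {"value": [2, 5, 3, 2, 7]},
    index=pd.MultiIndex.from_tuples(
        [("a", 1999), ("a", 2004), ("b", 2003), ("b", 2007), ("b", 2014)],
        names=["entity", "year"],
    ),
)


def slope(group):
    """Hypothetical helper: slope of value regressed on year for one group."""
    return linregress(group["year"], group["value"]).slope


# Turn the index levels into ordinary columns so each group carries
# both the year and value columns that linregress needs.
flat = df.reset_index()

# Select the two columns explicitly so the function only ever sees
# year and value, whatever the pandas version does with the group key.
slopes = flat.groupby("entity")[["year", "value"]].apply(slope)
print(slopes)
```

The resulting Series is indexed by entity and should agree with the values shown in the answer: 0.6 for a and roughly 0.403 for b.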


07-23 04:14