Problem description
Some illustrative data in a DataFrame (MultiIndex) format:
|entity| year |value|
+------+------+-----+
|  a   | 1999 |  2  |
|      | 2004 |  5  |
|  b   | 2003 |  3  |
|      | 2007 |  2  |
|      | 2014 |  7  |
I would like to calculate the slope using scipy.stats.linregress for each entity a and b in the above example. I tried using groupby on the first column, following the split-apply-combine advice, but that seems problematic because it expects a single Series of values per group (a and b), whereas I need to operate on the two columns on the right (year and value).
This is easily done in R via plyr, but I am not sure how to approach it in pandas.
Recommended answer
A function can be applied to a groupby with the apply function. Each group is passed to that function as a DataFrame, so both columns are available. The function passed in this case wraps linregress and keeps only the first element of its result, the slope. Please see below:
In [3]: import pandas as pd
In [4]: x = pd.DataFrame({'entity':['a','a','b','b','b'],
                          'year':[1999,2004,2003,2007,2014],
                          'value':[2,5,3,2,7]})
In [5]: x
Out[5]:
  entity  value  year
0      a      2  1999
1      a      5  2004
2      b      3  2003
3      b      2  2007
4      b      7  2014
In [6]: from scipy.stats import linregress
In [7]: x.groupby('entity').apply(lambda v: linregress(v.year, v.value)[0])
Out[7]:
entity
a    0.600000
b    0.403226
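The question describes the data as MultiIndex-ed by entity and year, while the session above uses plain columns. The following is a minimal sketch of the same groupby/apply idea for the indexed layout, assuming the index levels are named entity and year. The helper name fit_line is hypothetical, and returning a Series with both slope and intercept is an illustrative choice, not part of the original answer.

```python
import pandas as pd
from scipy.stats import linregress

# Same illustrative data as in the question, indexed by (entity, year).
# The level names "entity" and "year" are assumptions made for this sketch.
df = pd.DataFrame(
    {
        "entity": ["a", "a", "b", "b", "b"],
        "year": [1999, 2004, 2003, 2007, 2014],
        "value": [2, 5, 3, 2, 7],
    }
).set_index(["entity", "year"])


def fit_line(group):
    """Fit value ~ year for one entity (hypothetical helper)."""
    # Each group keeps the full (entity, year) index, so the x values
    # are read from the "year" index level.
    years = group.index.get_level_values("year")
    fit = linregress(years, group["value"])
    return pd.Series({"slope": fit.slope, "intercept": fit.intercept})


# Group on the "entity" index level; apply() stacks the returned Series
# into a DataFrame with one row per entity.
fits = df.groupby(level="entity").apply(fit_line)
print(fits)
```

If the data is already in flat columns, as in the session above, groupby('entity') works the same way and the years can be read from group['year'] instead of the index.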