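I am trying to apply a formula across the rows of a DataFrame to get the trend of the numbers in each row.

The example below works up to the part that uses .apply.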

import numpy as np
import pandas as pd
import scipy.stats

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
axisvalues = list(range(1, len(df.columns) + 1))  # 1..4, one value per column

def calc_slope(row):
    return scipy.stats.linregress(df.iloc[row, :], y=axisvalues)

calc_slope(1)  # this works

df["New"] = df.apply(calc_slope, axis=1)  # this fails: "too many values to unpack"


Thanks for any help.

Best answer

I think you need to return a single attribute of the result (here, slope):

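def calc_slope(row):
    # with axis=1, apply passes each row in as a Series, so use it directly
    a = scipy.stats.linregress(row, y=axisvalues)
    return a.slope  # return only the scalar slope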

df["slope"]=df.apply(calc_slope,axis=1)
print (df)
          A         B         C         D     slope
0  0.444640  0.024624 -0.016216  0.228935 -2.553465
1  1.226611  1.962481  1.103834  0.645562 -1.455239
2 -0.259415  0.971097  0.124538 -0.704115 -0.718621
3  1.938422  1.787310 -0.619745 -2.560187 -0.575519
4 -0.986231 -1.942930  2.677379 -1.813071  0.075679
5  0.611214 -0.258453  0.053452  1.223544  0.841865
6  0.685435  0.962880 -1.517077 -0.101108 -0.652503
7  0.368278  1.314202  0.748189  2.116189  1.350132
8 -0.322053 -1.135443 -0.161071 -1.836761 -0.987341
9  0.798461  0.461736 -0.665127 -0.247887 -1.610447
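
As a side note, with axis=1 `apply` hands the function each row as a Series rather than an integer position, which is why the function above takes `row` directly instead of indexing with `df.iloc[row, :]`. A minimal sketch to check this, assuming seeded random data; the `seen_types` list is only there for illustration and is not part of the original code:

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(0)  # arbitrary seed, chosen only to make the sketch repeatable
df = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))
axisvalues = list(range(1, len(df.columns) + 1))

seen_types = []  # hypothetical helper: records what apply passes in

def calc_slope(row):
    seen_types.append(type(row).__name__)
    # same orientation as above: the row values are x, axisvalues are y
    return stats.linregress(row, y=axisvalues).slope

slopes = df.apply(calc_slope, axis=1)

# the slope computed through apply matches the one computed for a single row by hand
manual = stats.linregress(df.iloc[1], axisvalues).slope
print(set(seen_types), np.isclose(slopes.iloc[1], manual))
```

Because the function returns one scalar per row, the result of `apply` is a Series that can be assigned straight to a new column.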

To get all the attributes, convert the namedtuple to a dict and then to a Series. The output is a new DataFrame, so join it to the original if needed:
np.random.seed(1997)

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
axisvalues=list(range(1,len(df.columns)+1))

def calc_slope(row):
    a = scipy.stats.linregress(row, y=axisvalues)
    return pd.Series(a._asdict())

print (df.apply(calc_slope,axis=1))
      slope  intercept    rvalue    pvalue    stderr
0 -2.553465   2.935355 -0.419126  0.580874  3.911302
1 -1.455239   4.296670 -0.615324  0.384676  1.318236
2 -0.718621   2.523733 -0.395862  0.604138  1.178774
3 -0.575519   2.578530 -0.956682  0.043318  0.123843
4  0.075679   2.539066  0.127254  0.872746  0.417101
5  0.841865   2.156991  0.425333  0.574667  1.266674
6 -0.652503   2.504915 -0.561947  0.438053  0.679154
7  1.350132   0.965285  0.794704  0.205296  0.729193
8 -0.987341   1.647104 -0.593680  0.406320  0.946311
9 -1.610447   2.639780 -0.828856  0.171144  0.768641

df = df.join(df.apply(calc_slope,axis=1))
print (df)
          A         B         C         D     slope  intercept    rvalue  \
0  0.444640  0.024624 -0.016216  0.228935 -2.553465   2.935355 -0.419126
1  1.226611  1.962481  1.103834  0.645562 -1.455239   4.296670 -0.615324
2 -0.259415  0.971097  0.124538 -0.704115 -0.718621   2.523733 -0.395862
3  1.938422  1.787310 -0.619745 -2.560187 -0.575519   2.578530 -0.956682
4 -0.986231 -1.942930  2.677379 -1.813071  0.075679   2.539066  0.127254
5  0.611214 -0.258453  0.053452  1.223544  0.841865   2.156991  0.425333
6  0.685435  0.962880 -1.517077 -0.101108 -0.652503   2.504915 -0.561947
7  0.368278  1.314202  0.748189  2.116189  1.350132   0.965285  0.794704
8 -0.322053 -1.135443 -0.161071 -1.836761 -0.987341   1.647104 -0.593680
9  0.798461  0.461736 -0.665127 -0.247887 -1.610447   2.639780 -0.828856

     pvalue    stderr
0  0.580874  3.911302
1  0.384676  1.318236
2  0.604138  1.178774
3  0.043318  0.123843
4  0.872746  0.417101
5  0.574667  1.266674
6  0.438053  0.679154
7  0.205296  0.729193
8  0.406320  0.946311
9  0.171144  0.768641
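
If you would rather not rely on the underscore-prefixed `_asdict()`, one possible variation is to pick the fields by name. This is only a sketch under my own assumptions: the `FIELDS` list and the name `calc_regression` are hypothetical, and the five fields are the ones shown in the output above.

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))
axisvalues = list(range(1, len(df.columns) + 1))

# hypothetical list of the result attributes to keep
FIELDS = ["slope", "intercept", "rvalue", "pvalue", "stderr"]

def calc_regression(row):
    res = stats.linregress(row, y=axisvalues)
    # build a Series by hand so each field becomes one output column
    return pd.Series({name: getattr(res, name) for name in FIELDS})

stats_df = df.apply(calc_regression, axis=1)  # one row of statistics per input row
result = df.join(stats_df)                    # original columns plus the statistics
print(result.columns.tolist())
```

Listing the fields explicitly also keeps the output columns fixed, whatever else the result object may carry.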

Regarding python - applying a formula across Pandas rows / regression line, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47635210/
