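I am trying to apply a formula across the rows of a DataFrame to get the trend of the numbers in each row.
The example below works up to the part that uses .apply.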
import numpy as np
import pandas as pd
import scipy.stats

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
axisvalues = list(range(1, len(df.columns) + 1))

def calc_slope(row):
    return scipy.stats.linregress(df.iloc[row, :], y=axisvalues)

calc_slope(1)  # this works
df["New"] = df.apply(calc_slope, axis=1)  # this fails: "too many values to unpack"
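Thanks for any help.
Best answer
I think you need to pass the row itself to linregress and return a single attribute of the result: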
def calc_slope(row):
    # apply(axis=1) passes each row in as a Series
    a = scipy.stats.linregress(row, y=axisvalues)
    return a.slope  # keep only the slope

df["slope"] = df.apply(calc_slope, axis=1)
print(df)
          A         B         C         D     slope
0  0.444640  0.024624 -0.016216  0.228935 -2.553465
1  1.226611  1.962481  1.103834  0.645562 -1.455239
2 -0.259415  0.971097  0.124538 -0.704115 -0.718621
3  1.938422  1.787310 -0.619745 -2.560187 -0.575519
4 -0.986231 -1.942930  2.677379 -1.813071  0.075679
5  0.611214 -0.258453  0.053452  1.223544  0.841865
6  0.685435  0.962880 -1.517077 -0.101108 -0.652503
7  0.368278  1.314202  0.748189  2.116189  1.350132
8 -0.322053 -1.135443 -0.161071 -1.836761 -0.987341
9  0.798461  0.461736 -0.665127 -0.247887 -1.610447
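Note that linregress takes x first and y second, so the call above treats the row values as x and the column positions as y. If the goal is the trend of the values across the columns, the arguments would probably go the other way round. A minimal sketch under that assumption (trend_slope and positions are illustrative names, not from the original post):

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))
positions = np.arange(1, len(df.columns) + 1)  # x-axis: 1, 2, 3, 4

def trend_slope(row):
    # Assumption: x = column position, y = the values in this row
    return stats.linregress(positions, row.to_numpy()).slope

slopes = df.apply(trend_slope, axis=1)
print(slopes)
```

With this argument order, each slope is the change in value per column step, which is usually what "trend" means.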
To get all of the attributes, convert the namedtuple to a dict and then to a Series. The output is a new DataFrame, so join it to the original if necessary:
np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
axisvalues = list(range(1, len(df.columns) + 1))

def calc_slope(row):
    a = scipy.stats.linregress(row, y=axisvalues)
    # _asdict() maps each field name of the result to its value
    return pd.Series(a._asdict())

print(df.apply(calc_slope, axis=1))
      slope  intercept    rvalue    pvalue    stderr
0 -2.553465   2.935355 -0.419126  0.580874  3.911302
1 -1.455239   4.296670 -0.615324  0.384676  1.318236
2 -0.718621   2.523733 -0.395862  0.604138  1.178774
3 -0.575519   2.578530 -0.956682  0.043318  0.123843
4  0.075679   2.539066  0.127254  0.872746  0.417101
5  0.841865   2.156991  0.425333  0.574667  1.266674
6 -0.652503   2.504915 -0.561947  0.438053  0.679154
7  1.350132   0.965285  0.794704  0.205296  0.729193
8 -0.987341   1.647104 -0.593680  0.406320  0.946311
9 -1.610447   2.639780 -0.828856  0.171144  0.768641
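Depending on the SciPy version, the result object may carry extra fields (newer releases add intercept_stderr, for example), so the columns produced by _asdict() can vary. If you would rather name the columns explicitly, here is a sketch that picks the attributes by name (FIELDS and calc_stats are illustrative names, not from the original post):

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))
axisvalues = list(range(1, len(df.columns) + 1))

# Attributes of the linregress result that should become columns
FIELDS = ["slope", "intercept", "rvalue", "pvalue", "stderr"]

def calc_stats(row):
    res = stats.linregress(row.to_numpy(), axisvalues)
    # Build one Series per row; its index becomes the output columns
    return pd.Series({name: getattr(res, name) for name in FIELDS})

stats_df = df.apply(calc_stats, axis=1)
print(stats_df)
```

Because the function returns a Series, apply expands the result into one column per field.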
# Join the regression statistics back onto the original frame
df = df.join(df.apply(calc_slope, axis=1))
print(df)
          A         B         C         D     slope  intercept    rvalue  \
0  0.444640  0.024624 -0.016216  0.228935 -2.553465   2.935355 -0.419126
1  1.226611  1.962481  1.103834  0.645562 -1.455239   4.296670 -0.615324
2 -0.259415  0.971097  0.124538 -0.704115 -0.718621   2.523733 -0.395862
3  1.938422  1.787310 -0.619745 -2.560187 -0.575519   2.578530 -0.956682
4 -0.986231 -1.942930  2.677379 -1.813071  0.075679   2.539066  0.127254
5  0.611214 -0.258453  0.053452  1.223544  0.841865   2.156991  0.425333
6  0.685435  0.962880 -1.517077 -0.101108 -0.652503   2.504915 -0.561947
7  0.368278  1.314202  0.748189  2.116189  1.350132   0.965285  0.794704
8 -0.322053 -1.135443 -0.161071 -1.836761 -0.987341   1.647104 -0.593680
9  0.798461  0.461736 -0.665127 -0.247887 -1.610447   2.639780 -0.828856

     pvalue    stderr
0  0.580874  3.911302
1  0.384676  1.318236
2  0.604138  1.178774
3  0.043318  0.123843
4  0.872746  0.417101
5  0.574667  1.266674
6  0.438053  0.679154
7  0.205296  0.729193
8  0.406320  0.946311
9  0.171144  0.768641
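For larger frames, calling linregress once per row can become slow. As a sketch only, the same slope (row values as x, column positions as y, matching calc_slope) can be computed for all rows at once with the closed-form least-squares formula in NumPy. This is a swapped-in technique rather than part of the original answer, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4), columns=list("ABCD"))
axisvalues = np.arange(1, len(df.columns) + 1)

x = df.to_numpy()                        # each row is one set of x values
xc = x - x.mean(axis=1, keepdims=True)   # centre x within each row
yc = axisvalues - axisvalues.mean()      # centre y (shared by all rows)

# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2), per row
fast_slopes = pd.Series((xc @ yc) / (xc ** 2).sum(axis=1), index=df.index)

# Cross-check against the per-row linregress call
slow_slopes = df.apply(
    lambda r: stats.linregress(r.to_numpy(), axisvalues).slope, axis=1
)
print(np.allclose(fast_slopes, slow_slopes))
```

This only reproduces the slope; the other statistics (intercept, rvalue, pvalue, stderr) would still need linregress or further formulas.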
Regarding "python - applying a formula across pandas rows / regression line", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/47635210/