Problem description
Here is an example of my DataFrame df:
Category Y X1 X2
0 Apple 0.083050996 0.164056482 0.519875358
1 Apple 0.411044939 0.774160332 0.002869499
2 Apple 0.524315907 0.422193005 0.97720091
3 Apple 0.721124638 0.645927536 0.750210715
4 Berry 0.134488729 0.299288214 0.522933484
5 Berry 0.733162132 0.608742944 0.957595544
6 Berry 0.113051075 0.641533175 0.19799635
7 Berry 0.275379123 0.249143751 0.049082766
8 Carrot 0.588121494 0.750480977 0.615399987
9 Carrot 0.878221581 0.021366296 0.069184879
Now I want the code to run a regression for each Category, i.e. a cross-sectional regression grouped by Category (Apple, Berry, Carrot, etc.).
Then I want to add a new column df['Y_hat'] holding the predicted value from the regression, along with the corresponding 2 beta and t-statistic values (the beta and t-stat values would be the same for all rows of the same category).
The final df would have 5 additional columns: Y_hat, beta 1, beta 2, t-stat 1 and t-stat 2.
Recommended answer
You want to do a lot of things for a "GroupBy" :)
I think it is better to slice the DataFrame by Category, store the results for each category in a dictionary, and then use that dictionary after the loop to build your DataFrame.
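import pandas as pd

result = {}
# loop over every category
for category in df['Category'].unique():
    # slice: keep only the rows belonging to this category
    df_slice = df[df['Category'] == category]
    # run all the stuff you want to do on df_slice and store the results
    result[category] = {
        'predicted_value': ...,  # placeholder: fill in your own computation
        'Y_hat': ...,            # placeholder: fill in your own computation
        # etc.: add further entries (betas, t-stats, ...)
    }
# build a DataFrame with all your results
# (one column per category; use final_df.T for one row per category)
final_df = pd.DataFrame(result)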
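As a minimal sketch of how the placeholders above might be filled in, the following runs an ordinary least squares fit per category using only NumPy and pandas. Several things here are my assumptions, not part of the question: the model includes an intercept (so "beta 1" and "beta 2" are the slopes on X1 and X2), the helper name fit_group and the column names beta_1, beta_2, t_stat_1, t_stat_2 are made up for illustration, and categories with too few rows to estimate standard errors (Carrot has only 2 rows in the sample) are left as NaN.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Category': ['Apple'] * 4 + ['Berry'] * 4 + ['Carrot'] * 2,
    'Y': [0.083050996, 0.411044939, 0.524315907, 0.721124638, 0.134488729,
          0.733162132, 0.113051075, 0.275379123, 0.588121494, 0.878221581],
    'X1': [0.164056482, 0.774160332, 0.422193005, 0.645927536, 0.299288214,
           0.608742944, 0.641533175, 0.249143751, 0.750480977, 0.021366296],
    'X2': [0.519875358, 0.002869499, 0.97720091, 0.750210715, 0.522933484,
           0.957595544, 0.19799635, 0.049082766, 0.615399987, 0.069184879],
})

# Hypothetical column names for the five new columns.
NEW_COLS = ['Y_hat', 'beta_1', 'beta_2', 't_stat_1', 't_stat_2']


def fit_group(g):
    """OLS of Y on an intercept, X1 and X2 for one category (assumed model)."""
    # Start with an all-NaN frame aligned to the rows of this category.
    out = pd.DataFrame(index=g.index, columns=NEW_COLS, dtype=float)
    X = np.column_stack([np.ones(len(g)), g['X1'].to_numpy(), g['X2'].to_numpy()])
    y = g['Y'].to_numpy()
    n, k = X.shape
    if n <= k:
        # Not enough observations for residual degrees of freedom: keep NaN.
        return out
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)           # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the coefficients
    t = beta / np.sqrt(np.diag(cov))           # t-statistics
    out['Y_hat'] = X @ beta
    out['beta_1'], out['beta_2'] = beta[1], beta[2]      # slopes on X1, X2
    out['t_stat_1'], out['t_stat_2'] = t[1], t[2]
    return out


# Fit each category separately, then align the results back on the row index.
pieces = [fit_group(g) for _, g in df.groupby('Category')]
final_df = df.join(pd.concat(pieces))
```

Because each piece keeps the original row index, df.join puts every value back on the right row, and the betas and t-stats repeat across all rows of a category as requested. If you would rather not compute the t-statistics by hand, a fitted statsmodels OLS result exposes the same quantities as params, tvalues and fittedvalues.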
It will also be much easier if you ever need to debug! Good luck! :)