Running a regression by group in a pandas DataFrame and adding columns with predicted values and betas/t-stats


Problem description

Here is an example of my DataFrame df:

   Category  Y            X1           X2
0  Apple     0.083050996  0.164056482  0.519875358
1  Apple     0.411044939  0.774160332  0.002869499
2  Apple     0.524315907  0.422193005  0.97720091
3  Apple     0.721124638  0.645927536  0.750210715
4  Berry     0.134488729  0.299288214  0.522933484
5  Berry     0.733162132  0.608742944  0.957595544
6  Berry     0.113051075  0.641533175  0.19799635
7  Berry     0.275379123  0.249143751  0.049082766
8  Carrot    0.588121494  0.750480977  0.615399987
9  Carrot    0.878221581  0.021366296  0.069184879

Now I want the code to run a regression for each Category, i.e. a cross-sectional regression grouped by Category (Apple, Berry, Carrot, etc.).

Then I want to add a new column df['Y_hat'] holding the predicted value from the regression, plus the corresponding two beta and t-statistic values (the beta and t-stat values would be the same for every row of the same category).

The final df would have 5 additional columns: Y_hat, beta 1, beta 2, t-stat 1 and t-stat 2.

Recommended answer

You want to do a lot of things for a "GroupBy" :)

I think it is better to slice the DataFrame by Category, store each category's results in a dictionary, and then use that dictionary at the end of the loop to build your DataFrame.

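import numpy as np
import pandas as pd

result = {}
# loop over every category
for category in df['Category'].unique():
    # slice out the rows that belong to this category
    df_slice = df[df['Category'] == category]
    # run everything you want to do on the slice. As one illustration
    # (the choice of estimator is left open here), fit Y on X1 and X2
    # by least squares with an intercept column
    X = np.column_stack([np.ones(len(df_slice)),
                         df_slice[['X1', 'X2']].to_numpy()])
    coef, *_ = np.linalg.lstsq(X, df_slice['Y'].to_numpy(), rcond=None)
    result[category] = {
        'beta 1': coef[1],
        'beta 2': coef[2],
        # add t-stats and anything else you need here
    }

# build a DataFrame with one row per category from all your results
final_df = pd.DataFrame.from_dict(result, orient='index')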
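
To make the slice-and-collect idea concrete, here is a minimal sketch that produces all five requested columns on the sample data from the question. It is only one way to do it and rests on several assumptions not stated in the original: the regression includes an intercept, ordinary least squares is computed directly with NumPy rather than a statistics library, and a category with too few rows to estimate standard errors (Carrot, in the sample) is filled with NaN. The helper name fit_group and the constant NEW_COLS are hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Category": ["Apple"] * 4 + ["Berry"] * 4 + ["Carrot"] * 2,
    "Y": [0.083050996, 0.411044939, 0.524315907, 0.721124638, 0.134488729,
          0.733162132, 0.113051075, 0.275379123, 0.588121494, 0.878221581],
    "X1": [0.164056482, 0.774160332, 0.422193005, 0.645927536, 0.299288214,
           0.608742944, 0.641533175, 0.249143751, 0.750480977, 0.021366296],
    "X2": [0.519875358, 0.002869499, 0.97720091, 0.750210715, 0.522933484,
           0.957595544, 0.19799635, 0.049082766, 0.615399987, 0.069184879],
})

NEW_COLS = ["Y_hat", "beta 1", "beta 2", "t-stat 1", "t-stat 2"]


def fit_group(group: pd.DataFrame) -> pd.DataFrame:
    """Fit Y ~ 1 + X1 + X2 by OLS on one category; return its rows plus NEW_COLS."""
    y = group["Y"].to_numpy(dtype=float)
    # Design matrix: an intercept column (an assumption) plus the two regressors.
    X = np.column_stack([np.ones(len(group)),
                         group[["X1", "X2"]].to_numpy(dtype=float)])
    n_obs, n_params = X.shape
    out = group.copy()

    if n_obs <= n_params:
        # No residual degrees of freedom, so t-stats are undefined: fill with NaN.
        for col in NEW_COLS:
            out[col] = np.nan
        return out

    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y                      # OLS coefficients
    fitted = X @ beta                             # predicted values
    resid = y - fitted
    sigma2 = resid @ resid / (n_obs - n_params)   # residual variance
    t_stat = beta / np.sqrt(sigma2 * np.diag(xtx_inv))

    # Index 0 is the intercept; indices 1 and 2 belong to X1 and X2.
    values = [fitted, beta[1], beta[2], t_stat[1], t_stat[2]]
    for col, val in zip(NEW_COLS, values):
        out[col] = val                            # scalars broadcast to every row
    return out


# Slice by category, fit each slice, then stitch the pieces back together.
pieces = [fit_group(g) for _, g in df.groupby("Category", sort=False)]
result = pd.concat(pieces).sort_index()
print(result)
```

Looping over the groups and concatenating keeps each category's fit easy to inspect on its own. With only four rows per category and three estimated parameters, the t-stats here have a single degree of freedom, so they are illustrative rather than meaningful.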

It will also be much easier if you ever need to debug it! Good luck! :)
