Creating a column of regression coefficients using data.table

Question

I'm struggling with what seems like it should be a simple extension of a previous question I'd asked here.

I'm trying to aggregate over (a) a range of dates and (b) a factor variable. Sample data might be:

Brand  Day  Rev       RVP
A      1    2535.00   195.00
B      1    1785.45   43.55
C      1    1730.87   32.66
A      2    920.00    230.00
B      2    248.22    48.99
C      3    16466.00  189.00
A      1    2535.00   195.00
B      3    1785.45   43.55
C      3    1730.87   32.66
A      4    920.00    230.00
B      5    248.22    48.99
C      4    16466.00  189.00

Thanks to helpful advice, I've figured out how to find the mean revenue for brands over a period of days using data.table:

new_df <- df[, (mean(Rev)), by = list(Brand, Day)]

Now, I'd like to create a new table where there is a column listing the coefficient estimate from an OLS regression of Rev by Day for each brand. I tried to do this as follows:

new_df2 <- df[, (lm(Rev ~ Day)), by = list(Brand)]

That doesn't seem quite right. Thoughts? I'm sure it's something obvious I've missed.

Recommended answer

I think this is what you want:

new_df2 <- df[, (lm(Rev ~ Day)$coefficients[["Day"]]), by = list(Brand)]

lm returns a full model object, not a single number. You need to drill down into it to pull out one value per group (here, the Day coefficient), which data.table can then turn into a column.
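For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and extracts both the intercept and the slope for each brand. The column names intercept and slope and the object name coefs are my own choices for illustration, and coef() is used as an equivalent accessor to $coefficients.

```r
library(data.table)

# Sample data copied from the question (rows cycle through brands A, B, C).
df <- data.table(
  Brand = rep(c("A", "B", "C"), times = 4),
  Day   = c(1, 1, 1, 2, 2, 3, 1, 3, 3, 4, 5, 4),
  Rev   = c(2535.00, 1785.45, 1730.87, 920.00, 248.22, 16466.00,
            2535.00, 1785.45, 1730.87, 920.00, 248.22, 16466.00),
  RVP   = c(195.00, 43.55, 32.66, 230.00, 48.99, 189.00,
            195.00, 43.55, 32.66, 230.00, 48.99, 189.00)
)

# Fit Rev ~ Day separately within each brand and keep only scalar values.
# 'intercept' and 'slope' are illustrative column names, not from the answer.
coefs <- df[, {
  fit <- coef(lm(Rev ~ Day))   # named numeric vector: "(Intercept)", "Day"
  list(intercept = fit[["(Intercept)"]], slope = fit[["Day"]])
}, by = Brand]

print(coefs)
```

Returning a named list from j gives one row per brand with readable column names instead of the default V1. If only the slope is needed, the one-liner in the answer does the same job.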
