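Using the plm package in R to fit a fixed-effects model, what is the correct syntax for adding a lagged variable to the model? I am looking for something similar to the "L1.variable" operator in Stata.

Here is my attempt at adding a lagged variable (this is a test model and may not make sense):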

library(foreign)
nlswork <- read.dta("http://www.stata-press.com/data/r11/nlswork.dta")
pnlswork <- plm.data(nlswork, c('idcode', 'year'))
ffe <- plm(ln_wage ~ ttl_exp+lag(wks_work,1)
           , model = 'within'
           , data = nlswork)
summary(ffe)

R output:
Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ ttl_exp + lag(wks_work), data = nlswork,
    model = "within")

Unbalanced Panel: n=3911, T=1-14, N=19619

Residuals :
    Min.  1st Qu.   Median  3rd Qu.     Max.
-1.77000 -0.10100  0.00293  0.11000  2.90000

Coefficients :
                Estimate Std. Error t-value  Pr(>|t|)
ttl_exp       0.02341057 0.00073832 31.7078 < 2.2e-16 ***
lag(wks_work) 0.00081576 0.00010628  7.6755 1.744e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    1296.9
Residual Sum of Squares: 1126.9
R-Squared:      0.13105
Adj. R-Squared: -0.085379
F-statistic: 1184.39 on 2 and 15706 DF, p-value: < 2.22e-16

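However, I get different results compared to what Stata produces.

In my actual model, I would like to instrument an endogenous variable with its lagged value.

Thanks!

For reference, here is the Stata code: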
webuse nlswork.dta
xtset idcode year
xtreg ln_wage ttl_exp L1.wks_work, fe

Stata output:
Fixed-effects (within) regression               Number of obs     =     10,680
Group variable: idcode                          Number of groups  =      3,671

R-sq:                                           Obs per group:
     within  = 0.1492                                         min =          1
     between = 0.2063                                         avg =        2.9
     overall = 0.1483                                         max =          8

                                                F(2,7007)         =     614.60
corr(u_i, Xb)  = 0.1329                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     ttl_exp |   .0192578   .0012233    15.74   0.000     .0168597    .0216558
             |
    wks_work |
         L1. |   .0015891   .0001957     8.12   0.000     .0012054    .0019728
             |
       _cons |   1.502879   .0075431   199.24   0.000     1.488092    1.517666
-------------+----------------------------------------------------------------
     sigma_u |  .40678942
     sigma_e |  .28124886
         rho |  .67658275   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(3670, 7007) = 4.71                  Prob > F = 0.0000

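Best answer

lag() in plm (as of the time of writing) lags the observations row-wise without "looking" at the time variable, i.e. it shifts the variable by rows within each individual. If there are gaps in the time dimension, you probably want to take the value of the time variable into account. There is the (as of the time of writing) unexported function plm:::lagt.pseries, which takes the time variable into account and hence handles gaps in the data as you expect.
Edit: Since plm version 1.7-0, the default behaviour of lag in plm is to shift time-wise, but the behaviour can be controlled with the argument shift (shift = c("time", "row")) to shift either time-wise or row-wise (the old behaviour).
The unexported function is used as follows: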

library(plm)
library(foreign)
nlswork <- read.dta("http://www.stata-press.com/data/r11/nlswork.dta")
pnlswork <- pdata.frame(nlswork, c('idcode', 'year'))
ffe <- plm(ln_wage ~ ttl_exp + plm:::lagt.pseries(wks_work,1)
           , model = 'within'
           , data = pnlswork)
summary(ffe)

Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ ttl_exp + plm:::lagt.pseries(wks_work,
    1), data = nlswork, model = "within")

Unbalanced Panel: n=3671, T=1-8, N=10680

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-1.5900 -0.0859  0.0000  0.0957  2.5600

Coefficients :
                                  Estimate Std. Error t-value  Pr(>|t|)
ttl_exp                         0.01925775 0.00122330 15.7425 < 2.2e-16 ***
plm:::lagt.pseries(wks_work, 1) 0.00158907 0.00019573  8.1186 5.525e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    651.49
Residual Sum of Squares: 554.26
R-Squared:      0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
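
For plm 1.7-0 or later, the shift argument mentioned in the edit above can be tried on a small panel with a gap. This is a minimal sketch, not part of the original answer: the toy data frame and the names df, pdf, lag_time and lag_row are made up for illustration, and only lag() with its k and shift arguments comes from plm.

```r
library(plm)

# Hypothetical toy panel: individual 1 has no observation for year 3.
df <- data.frame(id   = c(1, 1, 1, 2, 2, 2),
                 year = c(1, 2, 4, 1, 2, 3),
                 x    = c(10, 20, 40, 100, 200, 300))
pdf <- pdata.frame(df, index = c("id", "year"))

# Time-wise lag: uses the value of the time index (comparable to Stata's L1.x).
lag_time <- lag(pdf$x, k = 1, shift = "time")

# Row-wise lag: takes the previous row of the same individual,
# regardless of how far apart the time periods are (the old behaviour).
lag_row <- lag(pdf$x, k = 1, shift = "row")

# Put both versions next to the original data for comparison.
cbind(df,
      lag_time = as.numeric(lag_time),
      lag_row  = as.numeric(lag_row))
```

For individual 1 in year 4, the time-wise lag should be NA because year 3 is not observed, whereas the row-wise lag picks up the year 2 value. In a model formula the same call would read lag(wks_work, 1, shift = "time").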
By the way: it is better to use pdata.frame() instead of plm.data().
Also, you can check for gaps in your data with plm's is.pconsecutive():
is.pconsecutive(pnlswork)
all(is.pconsecutive(pnlswork))
You could also make the data consecutive first and then use lag(), like this:
pnlswork2 <- make.pconsecutive(pnlswork)
pnlswork2$wks_work_lag <- lag(pnlswork2$wks_work)
ffe2 <- plm(ln_wage ~ ttl_exp + wks_work_lag
           , model = 'within'
           , data = pnlswork2)
summary(ffe2)

Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ ttl_exp + wks_work_lag, data = pnlswork2,
    model = "within")

Unbalanced Panel: n=3671, T=1-8, N=10680

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-1.5900 -0.0859  0.0000  0.0957  2.5600

Coefficients :
               Estimate Std. Error t-value  Pr(>|t|)
ttl_exp      0.01925775 0.00122330 15.7425 < 2.2e-16 ***
wks_work_lag 0.00158907 0.00019573  8.1186 5.525e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    651.49
Residual Sum of Squares: 554.26
R-Squared:      0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
Or simply:
ffe3 <- plm(ln_wage ~ ttl_exp + lag(wks_work)
            , model = 'within'
            , data = pnlswork2) # note: it is the consecutive panel data set here
summary(ffe3)

Oneway (individual) effect Within Model

Call:
plm(formula = ln_wage ~ ttl_exp + lag(wks_work), data = pnlswork2,
    model = "within")

Unbalanced Panel: n=3671, T=1-8, N=10680

Residuals :
   Min. 1st Qu.  Median 3rd Qu.    Max.
-1.5900 -0.0859  0.0000  0.0957  2.5600

Coefficients :
                Estimate Std. Error t-value  Pr(>|t|)
ttl_exp       0.01925775 0.00122330 15.7425 < 2.2e-16 ***
lag(wks_work) 0.00158907 0.00019573  8.1186 5.525e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Total Sum of Squares:    651.49
Residual Sum of Squares: 554.26
R-Squared:      0.14924
Adj. R-Squared: -0.29659
F-statistic: 614.604 on 2 and 7007 DF, p-value: < 2.22e-16
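
Why filling the gaps makes the row-wise lag work can be seen on a small example. The following is only a sketch under assumptions: the toy data and the names df, pdf, pdf2, row_filled and time_filled are hypothetical, while is.pconsecutive(), make.pconsecutive() and lag() are the plm functions used in the answer.

```r
library(plm)

# Hypothetical toy panel: individual 1 has no observation for year 3.
df <- data.frame(id   = c(1, 1, 1, 2, 2, 2),
                 year = c(1, 2, 4, 1, 2, 3),
                 x    = c(10, 20, 40, 100, 200, 300))
pdf <- pdata.frame(df, index = c("id", "year"))

# One logical value per individual: are its time periods consecutive?
is.pconsecutive(pdf)

# Insert the missing (id, year) combinations as rows filled with NA.
pdf2 <- make.pconsecutive(pdf)
nrow(pdf2)  # one row more than the original panel

# On the gap-filled panel, row-wise and time-wise lags coincide.
row_filled  <- as.numeric(lag(pdf2$x, shift = "row"))
time_filled <- as.numeric(lag(pdf2$x, shift = "time"))
all.equal(row_filled, time_filled)
```

The inserted row carries NA for x, so the observation after the gap gets an NA lag and drops out of the estimation sample, just as it does with a time-wise lag on the original data.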

This question and answer come from the Stack Overflow question "R plm lag - what is the equivalent to L1.x in Stata?": https://stackoverflow.com/questions/43926625/
