Question

fastLm() is much slower than lm(). I call lm() and fastLm() with the same formula and the same data, yet fastLm() appears to be much slower than lm(). Is this possible? I don't understand how this could happen.

dim(dat)
#[1] 87462    90
##
library(Rcpp)
library(RcppEigen)
library(rbenchmark)

benchmark(fastLm(formula(mez),data=dat),lm(formula(mez),data=dat))
                              test replications elapsed relative user.self  sys.self user.child sys.child
1 fastLm(formula(mez), data = dat)          100  195.81    7.079    189.36     6.27         NA        NA
2     lm(formula(mez), data = dat)          100   27.66    1.000     24.52     3.02         NA        NA

summary(mez)

Call: lm(formula = totalActualVal ~ township + I(TotalFinishedSF^2) +
    mainfloorSF + nbrFullBaths + township + range + qualityCodeDscr +
    TotalFinishedSF:range + nbrBedRoom + PCT_HISP, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max
-2607622   -53820    -2893    40704  3116043

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                   2.418e+05  3.211e+03  75.307 < 2e-16 ***
township1S                    1.907e+04  1.239e+03  15.385 < 2e-16 ***
township2N                   -7.540e+04  1.467e+03 -51.383 < 2e-16 ***
township3N                   -9.482e+04  1.482e+03 -63.976 < 2e-16 ***
I(TotalFinishedSF^2)          1.415e-02  3.923e-04  36.063 < 2e-16 ***
mainfloorSF                   6.754e+01  1.233e+00  54.793 < 2e-16 ***
nbrFullBaths                  5.261e+03  7.542e+02   6.977 3.05e-12 ***
range71                      -2.802e+04  5.172e+03  -5.418 6.03e-08 ***
range72                      -5.599e+04  7.615e+03  -7.353 1.96e-13 ***
range73                      -6.414e+04  1.067e+04  -6.010 1.86e-09 ***
rangeothers                  -6.571e+04  2.662e+03 -24.687  < 2e-16 ***
qualityCodeDscrEXCELLENT      5.800e+05  4.170e+03 139.090   < 2e-16 ***
qualityCodeDscrEXCELLENT +    8.453e+05  9.713e+03  87.027   < 2e-16 ***
qualityCodeDscrEXCELLENT++    8.929e+05  1.013e+04  88.149   < 2e-16 ***
qualityCodeDscrEXCEPTIONAL 1  1.134e+06  8.336e+03 136.005   < 2e-16 ***
qualityCodeDscrEXCEPTIONAL 2  1.536e+06  1.411e+04 108.884   < 2e-16 ***
qualityCodeDscrEXCEPTIONAL 3  2.061e+06  4.679e+04  44.040   < 2e-16 ***
qualityCodeDscrFAIR          -3.288e+04  3.760e+03  -8.744   < 2e-16 ***
qualityCodeDscrGUT            5.931e+04  1.142e+03  51.941   < 2e-16 ***
qualityCodeDscrLOW           -1.394e+05  1.799e+04  -7.748 9.45e-15 ***
qualityCodeDscrVERY GOOD      2.106e+05  2.242e+03  93.925  < 2e-16 ***
qualityCodeDscrVERY GOOD +    3.126e+05  4.406e+03  70.942   < 2e-16 ***
qualityCodeDscrVERY GOOD ++   4.042e+05  3.839e+03 105.275   < 2e-16 ***
nbrBedRoom                    2.334e+04  5.874e+02  39.739   < 2e-16 ***
PCT_HISP                     -1.571e+03  5.162e+01 -30.426   < 2e-16 ***
range70 :TotalFinishedSF      3.997e+01  2.363e+00  16.919   < 2e-16 ***
range71 :TotalFinishedSF      1.300e+02  2.990e+00  43.490   < 2e-16 ***
range72 :TotalFinishedSF     -2.289e+01  4.598e+00  -4.978 6.42e-07 ***
range73 :TotalFinishedSF     -4.111e+01  6.797e+00  -6.048 1.47e-09 ***
rangeothers:TotalFinishedSF  -6.331e+00  2.215e+00  -2.859  0.00426 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 129100 on 87432 degrees of freedom
Multiple R-squared:  0.8296, Adjusted R-squared:  0.8295
F-statistic: 1.468e+04 on 29 and 87432 DF,  p-value: < 2.2e-16

Recommended answer

RcppArmadillo has a better example script, in which several versions are timed:

edd@max:~/git/rcpparmadillo/inst/examples(master)$ Rscript fastLm.r
                       test replications relative elapsed
4             fLmSEXP(X, y)         5000    1.000   0.174
2         fLmTwoCasts(X, y)         5000    1.017   0.177
3         fLmConstRef(X, y)         5000    1.029   0.179
1          fLmOneCast(X, y)         5000    1.069   0.186
6   fastLmPureDotCall(X, y)         5000    1.218   0.212
5          fastLmPure(X, y)         5000    1.908   0.332
8              lm.fit(X, y)         5000    2.207   0.384
7 fastLm(frm, data = trees)         5000   29.609   5.152
9     lm(frm, data = trees)         5000   36.977   6.434

edd@max:~/git/rcpparmadillo/inst/examples(master)$

The last two use a formula, and the timings clearly show that you do not want to use a formula if you are after speed: processing the formula takes a lot longer than actually running the regression. You could set up something similar for RcppEigen; the results will be similar.
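As a rough sketch of what avoiding the formula interface can look like, the snippet below builds the design matrix once and then calls the matrix-level fitters directly. It uses the built-in `trees` data, as the timing script above does; the formula itself is an illustrative assumption and not necessarily the one in that script. The RcppEigen part runs only if the package is installed.

```r
# Sketch: do the formula handling once, then fit on plain matrices.
# The formula is an assumption chosen for illustration.
frm <- log(Volume) ~ log(Girth)

mf <- model.frame(frm, data = trees)  # evaluates the formula terms, once
X  <- model.matrix(frm, mf)           # numeric design matrix, incl. intercept
y  <- model.response(mf)              # numeric response vector

fit_formula <- lm(frm, data = trees)  # formula interface: redoes the steps above
fit_matrix  <- lm.fit(X, y)           # matrix interface: goes straight to the fit

# Both routes should agree on the coefficients.
same_coef <- isTRUE(all.equal(unname(coef(fit_formula)),
                              unname(fit_matrix$coefficients)))
print(same_coef)

# Optional: RcppEigen's matrix-level fitter, if the package is available.
if (requireNamespace("RcppEigen", quietly = TRUE)) {
  fit_eigen <- RcppEigen::fastLmPure(X, y)
  print(all.equal(unname(coef(fit_formula)),
                  unname(fit_eigen$coefficients)))
}
```

To compare timings on your own data, the same `X` and `y` can be passed to `benchmark()` from rbenchmark, for example timing `lm.fit(X, y)` and `fastLmPure(X, y)` against `lm(frm, data = trees)`.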

08-13 12:37