I want to run lm() on a large dataset with 50M+ observations and 2 predictors. The analysis runs on a remote server with only 10 GB available for storing the data. I tested lm() on 10K observations sampled from the data, and the resulting object was larger than 2 GB. I need the object of class "lm" returned by lm() ONLY to produce the summary statistics of the model (summary(lm_object)) and to make predictions (predict(lm_object)). I have done some experiments with the options model, x, y, qr of lm.
If I set them all to FALSE, the size is reduced by 38%:

    library(MASS)
    fit1 <- lm(medv ~ lstat, data = Boston)
    size1 <- object.size(fit1)
    print(size1, units = "Kb")
    # 127.4 Kb

    fit2 <- lm(medv ~ lstat, data = Boston, model = FALSE, x = FALSE, y = FALSE, qr = FALSE)
    size2 <- object.size(fit2)
    print(size2, units = "Kb")
    # 78.5 Kb

    - ((as.integer(size1) - as.integer(size2)) / as.integer(size1)) * 100
    # -38.37994

but

    summary(fit2)
    # Error in qr.lm(object) : lm object does not have a proper 'qr' component.
    # Rank zero or should not have used lm(.., qr=FALSE).

    predict(fit2, newdata = Boston)
    # Error in qr.lm(object) : lm object does not have a proper 'qr' component.
    # Rank zero or should not have used lm(.., qr=FALSE).

Apparently I need to keep qr=TRUE, which reduces the object size by only 9% compared with the default object:

    fit3 <- lm(medv ~ lstat, data = Boston, model = FALSE, x = FALSE, y = FALSE, qr = TRUE)
    size3 <- object.size(fit3)
    print(size3, units = "Kb")
    # 115.8 Kb

    - ((as.integer(size1) - as.integer(size3)) / as.integer(size1)) * 100
    # -9.142752

How do I bring the size of the "lm" object down to a minimum, without dumping a lot of unneeded information into memory and storage?

Solution

The link here provides a relevant answer (for glm objects, which are very similar to lm output objects):

http://www.win-vector.com/blog/2014/05/trimming-the-fat-from-glm-models-in-r/

Basically, predict only uses the coefficients, which are a very small portion of the glm output. The function below (copied from the link) trims the information that predict will not use.

It does have a caveat, though. After trimming, the object can't be used by summary(fit) or other summary functions, since those functions need more than what predict requires.

    cleanModel1 = function(cm) {
      # just in case we forgot to set
      # y=FALSE and model=FALSE
      cm$y = c()
      cm$model = c()

      # per-observation components, each as long as the data
      cm$residuals = c()
      cm$fitted.values = c()
      cm$effects = c()
      # drop only the large QR matrix; the rest of cm$qr (e.g. the pivot) is kept
      cm$qr$qr = c()
      cm$linear.predictors = c()
      cm$weights = c()
      cm$prior.weights = c()
      cm$data = c()

      cm
    }
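Since the question also asks for summary(), one possible workaround is to pull out the summary statistics you need before trimming, and then keep only the trimmed object for prediction. The snippet below is a minimal sketch of that idea, not part of the linked post: the helper name trim_lm, the simulated data frame d, and the choice of which summary pieces to keep are all assumptions made for illustration. It passes newdata explicitly to predict(), which is the case the trimming is meant to support.

```r
# Hypothetical helper (name and exact field list are assumptions, adapted
# from the cleanModel1 idea above): drop per-observation parts of an lm fit.
trim_lm <- function(fit) {
  fit$model <- NULL
  fit$x <- NULL
  fit$y <- NULL
  fit$residuals <- NULL
  fit$fitted.values <- NULL
  fit$effects <- NULL
  fit$weights <- NULL
  fit$qr$qr <- NULL  # keep the remaining qr fields (pivot, rank, ...)
  fit
}

# Simulated data standing in for the real dataset (assumption).
set.seed(1)
d <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(1000)

full <- lm(y ~ x1 + x2, data = d)

# Extract the summary statistics BEFORE trimming; these are small objects.
s <- summary(full)
coef_table <- coef(s)        # estimates, std. errors, t values, p values
sigma_hat  <- s$sigma        # residual standard error
r_squared  <- s$r.squared

slim <- trim_lm(full)

# Predictions on new data should agree between the full and trimmed fits.
new_d <- data.frame(x1 = c(0, 1), x2 = c(1, 0))
same_predictions <- isTRUE(all.equal(predict(full, newdata = new_d),
                                     predict(slim, newdata = new_d)))
print(same_predictions)

# Compare in-memory sizes of the two objects.
print(object.size(full), units = "Kb")
print(object.size(slim), units = "Kb")
```

Only point predictions are covered here; options such as se.fit = TRUE or interval = "confidence" rely on components that the trimming removes, so they should not be expected to work on the trimmed object.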