Large-scale regression in R with a sparse feature matrix

Question

I'd like to do large-scale regression (linear/logistic) in R with many (e.g. 100k) features, where each example is relatively sparse in the feature space, e.g. ~1k non-zero features per example.

It seems like slm from the SparseM package should do this, but I'm having difficulty converting from the Matrix package's sparseMatrix format to an slm-friendly format.

I have a numeric vector of labels y and a sparseMatrix of features X \in {0,1}. When I try

model <- slm(y ~ X)

I get the following error:

Error in model.frame.default(formula = y ~ X) :
invalid type (S4) for variable 'X'

presumably because slm wants a SparseM object instead of a sparseMatrix.

Is there an easy way to either a) populate a SparseM object directly or b) convert a sparseMatrix to a SparseM object? Or perhaps there's a better/simpler way to do this?

(I suppose I could explicitly code the solutions for linear regression using X and y, but it would be nice to have slm working.)

Recommended answer

I don't know about SparseM, but the MatrixModels package has an unexported lm.fit.sparse function that you can use. See ?MatrixModels:::lm.fit.sparse. Here is an example:

Create the data:

y <- rnorm(30)
x <- factor(sample(letters, 30, replace=TRUE))
X <- as(x, "sparseMatrix")
class(X)
# [1] "dgCMatrix"
# attr(,"package")
# [1] "Matrix"
dim(X)
# [1] 18 30
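Note that the coercion above yields a levels-by-observations matrix (18 x 30 here), which is why it has to be transposed before fitting. As a hedged alternative, the design matrix can be built in observations-by-features orientation directly with sparse.model.matrix from the Matrix package. This is only a sketch: the seed and the name X2 are assumptions introduced for illustration and are not part of the original answer.

```r
library(Matrix)

set.seed(1)  # hypothetical seed, only so this sketch is reproducible
x <- factor(sample(letters, 30, replace = TRUE))

# One indicator column per factor level that occurs in x, no intercept.
# Rows are observations, so no transpose is needed afterwards.
X2 <- sparse.model.matrix(~ x - 1)

class(X2)  # a sparse matrix class from the Matrix package
dim(X2)    # 30 rows, nlevels(x) columns
```

With this orientation the matrix could be passed to a sparse fitting routine as is, instead of t(X).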

Run the regression:

MatrixModels:::lm.fit.sparse(t(X), y)
#  [1] -0.17499968 -0.89293312 -0.43585172  0.17233007 -0.11899582  0.56610302
#  [7]  1.19654666 -1.66783581 -0.28511569 -0.11859264 -0.04037503  0.04826549
# [13] -0.06039113 -0.46127034 -1.22106064 -0.48729092 -0.28524498  1.81681527

For comparison:

lm(y~x-1)

# Call:
# lm(formula = y ~ x - 1)
#
# Coefficients:
#       xa        xb        xd        xe        xf        xg        xh        xj
# -0.17500  -0.89293  -0.43585   0.17233  -0.11900   0.56610   1.19655  -1.66784
#       xm        xq        xr        xt        xu        xv        xw        xx
# -0.28512  -0.11859  -0.04038   0.04827  -0.06039  -0.46127  -1.22106  -0.48729
#       xy        xz
# -0.28524   1.81682
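The question also mentions coding the linear regression solution explicitly from X and y. A minimal sketch of that idea, assuming only the Matrix package, is to solve the normal equations (X'X) beta = X'y with sparse operations. The seed and the variable names XtX, Xty and beta are assumptions made for illustration. Normal equations can be less numerically stable than a QR-based fit, so treat this as a sanity check rather than a recommended solver.

```r
library(Matrix)

set.seed(1)  # hypothetical seed, only so this sketch is reproducible
n <- 30
y <- rnorm(n)
x <- factor(sample(letters, n, replace = TRUE))

# Sparse n x p design matrix: one indicator column per level, no intercept.
X <- sparse.model.matrix(~ x - 1)

XtX <- crossprod(X)     # t(X) %*% X, stays sparse
Xty <- crossprod(X, y)  # t(X) %*% y

# Solve the normal equations and flatten the p x 1 result to a plain vector.
beta <- unname(drop(as.matrix(solve(XtX, Xty))))

# Compare against the dense fit.
all.equal(beta, unname(coef(lm(y ~ x - 1))))
```

For a design with 100k columns the dense lm call would not be practical; it is included here only to check the sparse result on a toy problem.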
