Problem description
I'd like to do large-scale regression (linear/logistic) in R with many (e.g. 100k) features, where each example is relatively sparse in the feature space, e.g. ~1k non-zero features per example.
It seems like slm from the SparseM package should do this, but I'm having difficulty converting from the sparseMatrix format to an slm-friendly format.
I have a numeric vector of labels y and a sparseMatrix of features X with entries in {0,1}. When I try
model <- slm(y ~ X)
I get the following error:
Error in model.frame.default(formula = y ~ X) :
invalid type (S4) for variable 'X'
presumably because slm wants a SparseM object instead of a sparseMatrix.
Is there an easy way to either a) populate a SparseM object directly or b) convert a sparseMatrix to a SparseM object? Or perhaps there's a better/simpler way to do this?
(I suppose I could explicitly code the solution for linear regression using X and y, but it would be nice to have slm working.)
Recommended answer
I don't know about SparseM, but the MatrixModels package has an unexported lm.fit.sparse function that you can use. See ?MatrixModels:::lm.fit.sparse. Here is an example:
Create the data:
y <- rnorm(30)
x <- factor(sample(letters, 30, replace=TRUE))
X <- as(x, "sparseMatrix")
class(X)
# [1] "dgCMatrix"
# attr(,"package")
# [1] "Matrix"
dim(X)
# [1] 18 30
Run the regression:
# X is levels x observations (18 x 30), so transpose it to get the 30 x 18 design matrix
MatrixModels:::lm.fit.sparse(t(X), y)
# [1] -0.17499968 -0.89293312 -0.43585172 0.17233007 -0.11899582 0.56610302
# [7] 1.19654666 -1.66783581 -0.28511569 -0.11859264 -0.04037503 0.04826549
# [13] -0.06039113 -0.46127034 -1.22106064 -0.48729092 -0.28524498 1.81681527
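The coefficients above can be cross-checked by solving the normal equations X'X b = X'y with sparse operations from Matrix. This is only a minimal sketch, not necessarily what lm.fit.sparse does internally; the seed, the use of sparse.model.matrix to build the design matrix, and the names design and beta_hat are assumptions introduced for illustration.

```r
library(Matrix)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
y <- rnorm(30)
x <- factor(sample(letters, 30, replace = TRUE))

# Sparse one-hot design matrix: one row per observation, one column per level
design <- sparse.model.matrix(~ x - 1)

# Normal equations, kept sparse: (X'X) b = X'y
XtX <- crossprod(design)       # levels x levels
Xty <- crossprod(design, y)    # levels x 1
beta_hat <- drop(as.matrix(solve(XtX, Xty)))

# With a one-hot design and no intercept, each coefficient is a group mean
group_means <- as.numeric(tapply(y, x, mean))
all.equal(unname(beta_hat), group_means)
```

Forming X'X is fine for a small, well-conditioned example like this one; for a large or nearly collinear design, a sparse QR or a regularized fit is usually the safer choice.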
For comparison:
lm(y~x-1)
# Call:
# lm(formula = y ~ x - 1)
#
# Coefficients:
#        xa        xb        xd        xe        xf        xg        xh        xj
#  -0.17500  -0.89293  -0.43585   0.17233  -0.11900   0.56610   1.19655  -1.66784
#        xm        xq        xr        xt        xu        xv        xw        xx
#  -0.28512  -0.11859  -0.04038   0.04827  -0.06039  -0.46127  -1.22106  -0.48729
#        xy        xz
#  -0.28524   1.81682
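On the original question of getting a sparseMatrix into SparseM: one possible route, sketched here under assumptions, is to build a matrix.csr object by hand from the slots of the transposed dgCMatrix and pass it to slm.fit instead of going through the formula interface of slm. The helper to_csr is hypothetical (it is not part of either package), and the sketch assumes the input is a dgCMatrix with numeric entries.

```r
library(Matrix)
library(SparseM)

# Hypothetical helper: convert a Matrix dgCMatrix into a SparseM matrix.csr.
# The compressed-column arrays of t(M) are exactly the compressed-row arrays
# of M, shifted from 0-based (Matrix) to 1-based (SparseM) indexing.
to_csr <- function(M) {
  stopifnot(is(M, "dgCMatrix"))  # assumption: numeric, column-compressed input
  Mt <- t(M)
  new("matrix.csr",
      ra = Mt@x,                 # non-zero values, row by row
      ja = Mt@i + 1L,            # column index of each value
      ia = Mt@p + 1L,            # start of each row within ra/ja
      dimension = dim(M))
}

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
y <- rnorm(30)
x <- factor(sample(letters, 30, replace = TRUE))
design <- sparse.model.matrix(~ x - 1)   # observations x levels, dgCMatrix

X_csr <- to_csr(design)
fit <- slm.fit(X_csr, y)                 # fitter that takes a matrix.csr directly
coef_csr <- as.numeric(fit$coefficients)
```

If the sparseMatrix is a pattern matrix (no stored values) or in triplet form, it would need to be coerced to a numeric column-compressed matrix first; that step is left out here.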