Problem description
As a matter of best practice, I'm trying to determine whether it's better to create a function and apply() it across a matrix, or simply to loop a matrix through the function. I tried it both ways and was surprised to find that apply() is slower. The task is to take a vector, evaluate each element as positive or negative, and return a vector with 1 where it's positive and -1 where it's negative. The mash() function loops, and the squish() function is passed to apply().
million <- as.matrix(rnorm(100000))

mash <- function(x){
  for(i in 1:NROW(x)) {
    if(x[i] > 0) {
      x[i] <- 1
    } else {
      x[i] <- -1
    }
  }
  return(x)
}

squish <- function(x){
  if(x > 0) {
    return(1)
  } else {
    return(-1)
  }
}

ptm <- proc.time()
loop_million <- mash(million)
proc.time() - ptm

ptm <- proc.time()
apply_million <- apply(million, 1, squish)
proc.time() - ptm
loop_million result:
   user  system elapsed
  0.468   0.008   0.483
apply_million result:
   user  system elapsed
  1.401   0.021   1.423
What is the advantage of using apply() over a for loop if performance is degraded? Is there a flaw in my test? I compared the two resulting objects for a clue and found:
> class(apply_million)
[1] "numeric"
> class(loop_million)
[1] "matrix"
Which only deepens the mystery. The apply() function cannot accept a simple numeric vector, which is why I cast it with as.matrix() at the beginning. But then it returns a numeric. The for loop is fine with a simple numeric vector, and it returns an object of the same class as the one passed to it.
Recommended answer
As Chase said: use the power of vectorization. You're comparing two bad solutions here.
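A minimal sketch of what a vectorized version could look like. The choice of ifelse() and the names vec_million and set.seed(42) are assumptions made for illustration; they do not come from the original answer.

```r
# Vectorized sketch: classify every element at once, without an explicit loop.
set.seed(42)                          # assumed seed, only for reproducibility
million <- as.matrix(rnorm(100000))   # same construction as in the question

# ifelse() works element-wise and returns a value with the same shape as its
# test argument, so the result is still a one-column matrix.
vec_million <- ifelse(million > 0, 1, -1)

dim(vec_million)     # same dimensions as `million`
table(vec_million)   # counts of -1 and 1
```

Like mash() and squish(), this maps values that are exactly zero to -1. sign() would be shorter still, but it returns 0 for zero, so it is not a drop-in replacement.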
To clarify why your apply solution is slower:
Within the for loop, you index the matrix with a single vector-style index (x[i]), so no type conversion takes place. I'm glossing over the details here, but basically the internal calculation ignores the dimensions. They're just kept as an attribute and returned with the vector representing the matrix. To illustrate:
> x <- 1:10
> attr(x,"dim") <- c(5,2)
> y <- matrix(1:10,ncol=2)
> all.equal(x,y)
[1] TRUE
Now, when you use apply, the matrix is split up internally into 100,000 row vectors, every row vector (i.e. a single number) is put through the function, and in the end the results are combined into an appropriate form. The apply function reckons a vector is best in this case, and thus has to concatenate the results of all rows. This takes time.
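The effect on the result's shape can be seen on a tiny matrix. This is only a sketch: the three-row matrix m and the name res are made up for illustration, and squish() is a condensed version of the function from the question.

```r
# Tiny stand-in for `million`: a 3 x 1 numeric matrix (values are made up).
m <- matrix(c(-1.5, 2, 0.3), ncol = 1)

# Condensed version of squish() from the question.
squish <- function(x) if (x > 0) 1 else -1

# apply() calls squish() once per row and combines the length-1 results.
res <- apply(m, 1, squish)

is.null(dim(res))    # TRUE: the result is a plain numeric vector
dim(res) <- dim(m)   # the dimensions can be put back by hand
dim(res)
```

This is why class(apply_million) reports "numeric" in the question while the loop result stays a matrix.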
Also, the sapply function first uses as.vector(unlist(...)) to convert anything to a vector, and in the end tries to simplify the answer into a suitable form. This takes time as well, so sapply might also be slower here. Yet it isn't on my machine.
If apply were a solution here (and it isn't), you could compare:
> system.time(loop_million <- mash(million))
user system elapsed
0.75 0.00 0.75
> system.time(sapply_million <- matrix(unlist(sapply(million,squish,simplify=FALSE))))
user system elapsed
0.25 0.00 0.25
> system.time(sapply2_million <- matrix(sapply(million,squish)))
user system elapsed
0.34 0.00 0.34
> all.equal(loop_million,sapply_million)
[1] TRUE
> all.equal(loop_million,sapply2_million)
[1] TRUE
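For comparison, a vectorized version can be timed in the same way. This is a rough sketch rather than a careful benchmark: the ifelse() call and the names t_loop, t_vec and vec_million are assumptions for illustration, and the timings will vary by machine, so none are shown.

```r
million <- as.matrix(rnorm(100000))

# Loop version, equivalent to mash() from the question.
mash <- function(x) {
  for (i in seq_len(NROW(x))) {
    x[i] <- if (x[i] > 0) 1 else -1
  }
  x
}

t_loop <- system.time(loop_million <- mash(million))
t_vec  <- system.time(vec_million <- ifelse(million > 0, 1, -1))

# Elapsed seconds for each approach.
print(c(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))

# Both approaches should produce the same one-column matrix.
all.equal(loop_million, vec_million)
```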