Question
I am on the lookout for a faster alternative to R's hist(x, breaks=XXX, plot=FALSE)$count
function as I don't need any of the other output that is produced (as I want to use it in an sapply
call, requiring 1 million iterations in which this function would be called), e.g.
x = runif(100000000, 2.5, 2.6)
bincounts = hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count
Any ideas?
Recommended answer
A first attempt using table and cut:
table(cut(x, breaks=seq(0,3,length.out=100)))
It avoids the extra output, but takes about 34 seconds on my computer:
system.time(table(cut(x, breaks=seq(0,3,length.out=100))))
user system elapsed
34.148 0.532 34.696
Compared to about 3.5 seconds for hist:
system.time(hist(x, breaks=seq(0,3,length.out=100), plot=FALSE)$count)
user system elapsed
3.448 0.156 3.605
Using tabulate and .bincode runs a little bit faster than hist:
tabulate(.bincode(x, breaks=seq(0,3,length.out=100)), nbins=100)
system.time(tabulate(.bincode(x, breaks=seq(0,3,length.out=100)), nbins=100))
user system elapsed
3.084 0.024 3.107
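As a sanity check, the tabulate/.bincode approach can be compared against hist on a smaller vector. This is only a sketch under a few assumptions of mine: the names x_small, brks, ref and alt are made up for illustration, the sample size is reduced so the check runs quickly, include.lowest = TRUE is set to mirror hist's default, and nbins is set to length(brks) - 1 so the result has the same 99 entries that hist returns (with nbins=100 there is one extra trailing zero). hist also applies a small fuzz to the breaks, so a value lying almost exactly on a break could in principle land in a different bin.

```r
set.seed(1)                               # arbitrary seed, for reproducibility only
x_small <- runif(1e4, 2.5, 2.6)           # much smaller than the original 1e8
brks    <- seq(0, 3, length.out = 100)    # 100 break points define 99 bins

# reference counts from hist()
ref <- hist(x_small, breaks = brks, plot = FALSE)$counts

# .bincode() assigns each value a bin index, tabulate() counts the indices;
# include.lowest = TRUE mirrors hist()'s default handling of the first break
alt <- tabulate(.bincode(x_small, breaks = brks, right = TRUE, include.lowest = TRUE),
                nbins = length(brks) - 1)

identical(ref, alt)    # compare the two count vectors
sum(alt)               # every value falls in some bin, so this equals length(x_small)
```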
Using tabulate and findInterval provides a significant performance boost relative to table and cut and has an OK improvement relative to hist:
tabulate(findInterval(x, vec=seq(0,3,length.out=100)), nbins=100)
system.time(tabulate(findInterval(x, vec=seq(0,3,length.out=100)), nbins=100))
user system elapsed
2.044 0.012 2.055
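Since the question is about calling this inside sapply, here is a rough sketch of how that might look. The helper bin_counts and the stand-in data in samples are hypothetical names of mine, not from the question. Two things to note: the breaks are computed once outside the loop rather than on every call, and findInterval by default uses intervals closed on the left, whereas hist by default uses intervals closed on the right. Passing left.open = TRUE and rightmost.closed = TRUE (left.open needs R 3.3.0 or later, as far as I know) should mimic hist's defaults; for continuous random data the difference rarely matters.

```r
brks <- seq(0, 3, length.out = 100)   # computed once, outside the loop
nb   <- length(brks) - 1              # 99 bins, the length hist() would return

# hypothetical helper: bin counts only, with (a, b] intervals like hist()'s default
bin_counts <- function(x) {
  idx <- findInterval(x, vec = brks, left.open = TRUE, rightmost.closed = TRUE)
  tabulate(idx, nbins = nb)
}

set.seed(42)                                                       # arbitrary seed
samples <- replicate(5, runif(1000, 2.5, 2.6), simplify = FALSE)   # stand-in data

# one column of counts per sample
counts <- sapply(samples, bin_counts)
dim(counts)
```

With equal-length results, sapply simplifies to a matrix with one row per bin and one column per sample.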