Problem description
I have a nonzero symmetric matrix 'matr' that is 12000 x 12000. I need to find the indices of the top 10000 elements of 'matr' in R. The code I have written takes a long time, and I was wondering whether there are any pointers for making it faster.
listk <- numeric(0)
for (i in 1:10000) {
  idx <- which(matr == max(matr), arr.ind = TRUE)
  if (length(idx) != 0) {
    listk <- rbind(listk, idx[1, ])
    matr[idx[1, 1], idx[1, 2]] <- 0
    matr[idx[2, 1], idx[2, 2]] <- 0
  }
}
Recommended answer
A bit late to the party, but I came up with this, which avoids the sort.
Say you want the top 10k elements from your 12k x 12k matrix. The idea is to "clip" the data to the elements that lie above the quantile corresponding to that count.
find_n_top_elements <- function(x, n) {
  # set the quantile to correspond to the n top elements
  quant <- n / (dim(x)[1] * dim(x)[2])
  # select the cutpoint so that a fraction quant of the data lies above it
  lvl <- quantile(x, probs = 1.0 - quant)
  # return the elements above the cutpoint (the values, not their positions;
  # which(x > lvl[[1]], arr.ind = TRUE) would give row/column indices instead)
  x[x > lvl[[1]]]
}
# create a 12k x 12k matrix (about 1.1 GB!)
n <- 12000
x <- matrix( runif(n*n), ncol=n)
system.time( res <- find_n_top_elements( x, 10e3 ) )
This yields
system.time( res <- find_n_top_elements( x, 10e3 ) )
   user  system elapsed
   3.47    0.42    3.89
For comparison, just sorting x on my system takes
system.time(sort(x))
   user  system elapsed
  30.69    0.21   31.33
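Note that find_n_top_elements returns the values rather than their positions, while the question asks for indices. Below is a minimal sketch of one way to get (row, column) pairs. It swaps the quantile cut for a partial sort (sort(..., partial = k)), which still avoids a full sort and behaves predictably when values are tied. The name top_n_indices and its symmetric argument are hypothetical, introduced here for illustration, and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): returns the row/column
# positions and values of the n largest entries of matrix x, largest first.
# With symmetric = TRUE only the upper triangle (including the diagonal) is
# searched, so each off-diagonal pair of a symmetric matrix is counted once.
top_n_indices <- function(x, n, symmetric = FALSE) {
  # linear indices of the cells to consider
  idx <- if (symmetric) which(upper.tri(x, diag = TRUE)) else seq_along(x)
  vals <- x[idx]
  stopifnot(n >= 1, n <= length(vals))

  # partial sort: position k ends up holding the n-th largest value,
  # without fully ordering the rest of the data
  k <- length(vals) - n + 1
  thr <- sort(vals, partial = k)[k]

  # candidates at or above the threshold (ties can make this more than n)
  keep <- which(vals >= thr)
  # fully order only this small candidate set, then keep exactly n
  keep <- keep[order(vals[keep], decreasing = TRUE)][seq_len(n)]

  # convert linear indices back to (row, col) pairs
  rc <- arrayInd(idx[keep], dim(x))
  data.frame(row = rc[, 1], col = rc[, 2], value = vals[keep])
}
```

For the symmetric matrix in the question, a call such as top_n_indices(matr, 10000, symmetric = TRUE) would be the intended usage. I have not timed this on a 12k x 12k matrix, and building the upper-triangle index needs extra memory on top of the matrix itself.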