How to find the indices of the top 10,000 elements in a symmetric matrix (12k x 12k) in R

Question


I have a nonzero symmetric matrix 'matr' that is 12000 x 12000. I need to find the indices of the top 10000 elements of 'matr' in R. The code I have written takes a long time; I was wondering if there are any pointers to make it faster.

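listk <- numeric(0)
for (i in 1:10000) {
    # all cells holding the current maximum
    idx <- which(matr == max(matr), arr.ind = TRUE)
    if (length(idx) != 0) {
        # record the first hit as a (row, col) pair
        listk <- rbind(listk, idx[1, ])
        # zero the hit and its mirror; idx[2, ] need not exist (a diagonal
        # maximum appears once) or be the mirror (ties), so use idx[1, ] twice
        matr[idx[1, 1], idx[1, 2]] <- 0
        matr[idx[1, 2], idx[1, 1]] <- 0
    }
}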
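For reference, a vectorised alternative (not from the original post) avoids the 10,000 passes over the matrix, each of which scans all 144 million cells twice (once for max(), once for ==). The sketch keeps each symmetric pair once via the upper triangle, orders those values a single time, and converts the first k linear positions to (row, col) pairs with arrayInd(). It still performs one full sort of roughly 72 million values, so treat it as a baseline rather than the fastest option. The helper name top_k_pairs_sorted() is hypothetical, and it assumes a numeric symmetric matrix without NAs and k no larger than the number of upper-triangle cells.

```r
# Sketch (not from the original post): top_k_pairs_sorted() is a hypothetical
# helper name. Assumes `m` is a numeric symmetric matrix without NAs.
top_k_pairs_sorted <- function(m, k) {
  # linear indices of the upper triangle, diagonal included, so every
  # symmetric pair (i, j) / (j, i) is considered only once
  keep <- which(upper.tri(m, diag = TRUE))
  # one full ordering of those values, largest first; keep the first k
  top <- keep[order(m[keep], decreasing = TRUE)[seq_len(k)]]
  # turn linear indices into a k x 2 matrix of (row, col) positions
  arrayInd(top, dim(m))
}

# usage on a small symmetric matrix
m <- matrix(c(1, 5, 2, 5, 3, 9, 2, 9, 4), nrow = 3)
top_k_pairs_sorted(m, 3)   # rows: (2, 3), (1, 2), (3, 3)
```

Unlike the loop, this returns the positions in decreasing order of value in one step, and it does not modify the matrix.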

Recommended answer


A bit late to the party, but I came up with this, which avoids a full sort.


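Say you want the top 10k elements from your 12k x 12k matrix. The idea is to "clip" the data at the quantile whose upper tail holds that many elements, and keep only the values above it. Note that the function below returns the values rather than their indices, and it treats the matrix as a whole, so each off-diagonal value of a symmetric matrix is counted twice.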
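For the sizes in the question, the arithmetic is small enough to check by hand. Writing k for the number of wanted elements and m for the side of the matrix (symbols introduced here only for this calculation), the fraction and the probability passed to quantile() are:

```latex
q = \frac{k}{m^2} = \frac{10^4}{(1.2 \times 10^4)^2} = \frac{10^4}{1.44 \times 10^8} \approx 6.94 \times 10^{-5},
\qquad 1 - q \approx 0.999931
```

so the cutpoint sits above all but roughly 0.007% of the 144 million entries.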

find_n_top_elements <- function(x, n) {

  # fraction of all cells that should lie above the cutpoint
  quant <- n / length(x)

  # cutpoint: the (1 - quant) quantile of all entries
  lvl <- quantile(x, probs = 1.0 - quant, names = FALSE)

  # return the values above the cutpoint (about n of them)
  x[x > lvl]
}
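Since the question asks for indices of a symmetric matrix, here is a variation of the same cutpoint idea (an adaptation, not part of the original answer). It applies quantile() only to the upper triangle, so each symmetric pair is counted once, and returns (row, col) positions with which(..., arr.ind = TRUE). With distinct values and quantile()'s default type = 7 it should return n cells; ties at the cutpoint can change the count. The name top_n_indices_quantile() is hypothetical, and a numeric symmetric matrix without NAs is assumed.

```r
# Sketch (not from the original answer): top_n_indices_quantile() is a
# hypothetical name. Assumes a numeric symmetric matrix without NAs.
top_n_indices_quantile <- function(x, n) {
  # logical mask of the upper triangle (diagonal included): each
  # symmetric pair is counted once
  upper <- upper.tri(x, diag = TRUE)
  vals <- x[upper]
  # same cutpoint idea as the answer, applied to those values only
  lvl <- quantile(vals, probs = 1 - n / length(vals), names = FALSE)
  # (row, col) positions of the upper-triangle cells above the cutpoint
  which(upper & x > lvl, arr.ind = TRUE)
}

# usage: indices of the 2 largest distinct pairs
m <- matrix(c(1, 5, 2, 5, 3, 9, 2, 9, 4), nrow = 3)
top_n_indices_quantile(m, 2)   # rows: (1, 2) and (2, 3)
```

The positions come back in column-major order, not sorted by value; order them afterwards with x[res] if the ranking matters.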

# create a 12k x 12k matrix (about 1.1 GB of doubles!)
n <- 12000
x <- matrix( runif(n*n), ncol=n)

system.time( res <- find_n_top_elements( x, 10e3 ) )

which yields

system.time( res <- find_n_top_elements( x, 10e3 ) )
 user  system elapsed
 3.47    0.42    3.89


For comparison, just sorting x on my system takes

system.time(sort(x))
   user  system elapsed
  30.69    0.21   31.33
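Most of the gap comes from quantile() needing only a partial sort around the requested position rather than a full ordering. As a hedged variation (not from the original answer), the exact n-th largest value can be taken directly from sort(..., partial = k), which places the k-th smallest value at position k without ordering the rest; keeping everything at or above it then returns exactly n values when there are no ties. The name top_n_values_partial() is hypothetical, and the input is assumed to be numeric without NAs.

```r
# Sketch (not from the original answer): top_n_values_partial() is a
# hypothetical name. Assumes a numeric matrix or vector without NAs.
top_n_values_partial <- function(x, n) {
  v <- as.vector(x)
  k <- length(v) - n + 1
  # partial sort: only guarantees that position k holds the k-th smallest
  # value (everything before it is <=, everything after it is >=)
  cut <- sort(v, partial = k)[k]
  # every value at or above the n-th largest; more than n only if there are ties
  v[v >= cut]
}

x_small <- matrix(c(3, 8, 1, 6, 2, 9, 4, 7, 5), nrow = 3)
top_n_values_partial(x_small, 3)   # 8 9 7
```

Swapping v[v >= cut] for which(x >= cut, arr.ind = TRUE) would give positions instead of values.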

