Speeding up MATLAB code for FDR estimation

I have 2 input variables: a vector of p-values (p) with N elements (unsorted) and an N x M matrix of p-values obtained by random permutations (pr) with M iterations. N is quite large, 10K to 100K or more. M is, let's say, 100.

I'm estimating the False Discovery Rate (FDR) for each element of p: how many p-values from the random permutations would pass if the current p-value (from p) were used as the threshold.

I wrote the function with ARRAYFUN, but it takes a lot of time for large N (2 min for N=20K), comparable to a for-loop.

    function pfdr = fdr_from_random_permutations(p, pr)
    %# ... skipping arguments checks
    pfdr = arrayfun( @(x) mean(sum(pr<=x))./sum(p<=x), p);

Any ideas how to make it faster?

Comments about statistical issues here are also welcome.

The test data can be generated as p = rand(N,1); pr = rand(N,M);.

Solution

Well, the trick was indeed sorting the vectors. I give credit to @EgonGeerardyn for that. Also, there is no need to use mean. You can just divide everything by M afterwards.
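The point about mean can be checked directly. The snippet below is a minimal sketch with made-up sizes and an arbitrary threshold (N, M and x here are illustrative assumptions, not values from the question); it compares the mean of the per-column counts with the pooled count divided once by M.

```matlab
% Illustrative sizes and threshold (assumptions, not from the question)
N = 1000;
M = 50;
pr = rand(N, M);
x = 0.05;

a = mean(sum(pr <= x));     % mean of the M per-column counts
b = sum(pr(:) <= x) / M;    % pooled count over all entries, divided once by M

disp(abs(a - b));           % difference should be zero up to rounding
```

Since both expressions divide the same total count by M, the division can be postponed until all counts have been collected.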
When p is sorted, the number of values less than or equal to the current x is just a running index. pr is a more interesting case: I used a running index called place to find how many of its elements are less than or equal to x.

Edit(2): Here is the fastest version I came up with:

    function Speedup2()
        N = 10000/4 ;
        M = 100/4 ;
        p = rand(N,1);
        pr = rand(N,M);

        % Reference: the original arrayfun version
        tic
        pfdr = arrayfun( @(x) mean(sum(pr<=x))./sum(p<=x), p);
        toc

        % Sorted version
        tic
        out = zeros(numel(p),1);
        [p,sortIndex] = sort(p);
        pr = sort(pr(:));
        pr(end+1) = Inf;          % sentinel so the while loop always stops
        place = 1;
        N = numel(pr);            % not used below
        for i=1:numel(p)
            x = p(i);
            while pr(place)<=x
                place = place+1;
            end
            exp1a = place-1;      % number of pr values <= x
            exp2 = i;             % number of p values <= x (p is sorted)
            out(i) = exp1a/exp2;
        end
        out(sortIndex) = out/ M;  % restore the original order, divide by M
        toc

        disp(max(abs(pfdr-out)));
    end

And the benchmark results for N = 10000/4 ; M = 100/4 (the last number is the maximum absolute difference between the two results):

    Elapsed time is 0.898689 seconds.
    Elapsed time is 0.007697 seconds.
    2.220446049250313e-016

and for N = 10000 ; M = 100 ;

    Elapsed time is 39.730695 seconds.
    Elapsed time is 0.088870 seconds.
    2.220446049250313e-016
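The running-index idea can also be written without an explicit loop. The following is only a sketch of one possible variant, not the version benchmarked above: the function name fdr_sorted_sketch is hypothetical, and it relies on MATLAB's sort being stable (equal values keep their input order). Like the loop version, it uses the rank within sorted p as the denominator, which matches sum(p<=x) only when p has no tied values; NaN inputs are not handled.

```matlab
function pfdr = fdr_sorted_sketch(p, pr)
% FDR_SORTED_SKETCH  Hypothetical loop-free variant of the sorted approach.
% Assumes p has no ties and neither input contains NaN.
    M = size(pr, 2);                 % number of permutations
    n = numel(p);

    [ps, sortIndex] = sort(p(:));    % sorted thresholds and their origin
    prs = sort(pr(:));               % all permutation p-values, pooled

    % Merge both sets with one stable sort. pr entries are listed first,
    % so on ties they land before the matching p entry, which gives "<=".
    [~, order] = sort([prs; ps]);
    pos = find(order > numel(prs));  % merged positions of ps(1), ps(2), ...

    k = (1:n)';                      % rank of each threshold in sorted p
    countPr = pos - k;               % number of pr values <= ps(k)

    pfdr = zeros(n, 1);
    pfdr(sortIndex) = (countPr / M) ./ k;   % back to the original order
end
```

It can be called the same way as the original function, for example pfdr = fdr_sorted_sketch(p, pr) with p = rand(N,1); pr = rand(N,M);, and compared against the arrayfun result with max(abs(...)) as in Speedup2.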