问题描述
我正在寻找一种改善以下算法性能的方法.给定两个数组X和Y.
I'm looking for a way to improve the performance of the following algorithm. Given two arrays X and Y.
对于X的每个元素,找到Y中不超过X中元素值的最大值的索引.可以安全地假设X和Y单调递增(排序)并且Y(1)为小于X中的每个值.而且X通常比Y大得多.
For each element of X find the index of the largest value in Y that does not exceed the value of the element in X. It is safe to assume X and Y are monotonically increasing (sorted) and that Y(1) is less than every value in X.Also X is generally much larger than Y.
作为示例,给出以下内容.
As an example given the following.
X = [0.2, 1.5, 2.2, 2.5, 3.5, 4.5, 5.5, 5.8, 6.5];
Y = [0.0, 1.0, 3.0, 4.0, 6.0];
我希望输出是
idx = [1, 2, 2, 2, 3, 4, 4, 4, 5]
我想出的最快方法是下面的函数,该函数无法利用列表已排序并使用for循环遍历数组之一这一事实.这提供了一个有效的解决方案,但在我使用此功能进行的实验中,分析运行总共需要30分钟,而在这里花费了将近27分钟.
The fastest way I've come up with is the function below which fails to take advantage of the fact that the lists are sorted and uses a for loop to step through one of the arrays. This gives a valid solution but on the experiments I'm using this function for, nearly 27 minutes are spent here out of a total 30 minutes the analysis takes to run.
function idx = matchintervals(X,Y)
idx = zeros(size(X));
for i = 1:length(Y)-1
idx(X >= Y(i) & X < Y(i+1)) = i;
end
idx(X >= Y(end)) = length(Y);
end
非常感谢您的帮助.
推荐答案
如果您正在寻找最快的解决方案,那么它最终可能会成为一个简单的while循环,就像这样(它利用了对数组进行排序的事实) ):
If you're looking for the fastest solution, it might end up being a simple while loop like so (which takes advantage of the fact that the arrays are sorted):
X = [0.2, 1.5, 2.2, 2.5, 3.5, 4.5, 5.5, 5.8, 6.5];
Y = [0.0, 1.0, 3.0, 4.0, 6.0];
xIndex = 1;
nX = numel(X);
yIndex = 1;
nY = numel(Y);
index = zeros(size(X))+nY; % Prefill index with the largest index in Y
while (yIndex < nY) && (xIndex <= nX)
if X(xIndex) < Y(yIndex+1)
index(xIndex) = yIndex;
xIndex = xIndex+1;
else
yIndex = yIndex+1;
end
end
>> index
index =
1 2 2 2 3 4 4 4 5
此循环最多可重复numel(X)+numel(Y)-1
次,如果X
中的许多值大于Y
中的最大值,则循环次数可能会减少.
This loop will iterate a maximum of numel(X)+numel(Y)-1
times, potentially fewer if there are many values in X
that are greater than the largest value in Y
.
时间:我用注释中的示例数据运行了一些时间.以下是从最快到最慢排序的结果:
TIMINGS: I ran some timings with the sample data from a comment. Here are the results sorted from fastest to slowest:
X = 1:3:(4e5);
Y = 0:20:(4e5-1);
% My solution from above:
tElapsed =
0.003005977477718 seconds
% knedlsepp's solution:
tElapsed =
0.006939387719075 seconds
% Divakar's solution:
tElapsed =
0.011801273498343 seconds
% H.Muster's solution:
tElapsed =
4.081793325423575 seconds
这篇关于对于X中的每个元素,找到最大索引而不在Y中进行遍历的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!