Problem description
Several posts exist about efficiently calculating pairwise distances in MATLAB. These posts tend to focus on quickly calculating the Euclidean distance between large numbers of points.
I need to create a function that quickly calculates the pairwise distances between smaller numbers of points (typically fewer than 1000 pairs). Within the grander scheme of the program I am writing, this function will be executed many thousands of times, so even small gains in efficiency are important. The function needs to be flexible in two ways:
- On any given call, the distance metric can be either Euclidean or city-block.
- The dimensions of the data are weighted.
As far as I can tell, no solution to this particular problem has been posted. The Statistics Toolbox offers pdist and pdist2, which accept many different distance functions but not weighting. I have seen extensions of these functions that allow for weighting, but these extensions do not let users select different distance functions.
Ideally, I would like to avoid using functions from the Statistics Toolbox (I am not certain that users of the function will have access to it).
I have written two functions to accomplish this task. The first uses tricky calls to repmat and permute, and the second simply uses for-loops.
function [D] = pairdist1(A, B, wts, distancemetric)
    % get some information about the data
    numA = size(A,1);
    numB = size(B,1);
    if strcmp(distancemetric,'cityblock')
        r = 1;
    elseif strcmp(distancemetric,'euclidean')
        r = 2;
    else
        error('Function only accepts "cityblock" and "euclidean" distance')
    end
    % format weights for multiplication
    wts = repmat(wts,[numA,1,numB]);
    % get featural differences between A and B pairs
    A = repmat(A,[1 1 numB]);
    B = repmat(permute(B,[3,2,1]),[numA,1,1]);
    differences = abs(A-B).^r;
    % weigh difference values before combining them
    differences = differences.*wts;
    differences = differences.^(1/r);
    % combine features to get distance
    D = permute(sum(differences,2),[1,3,2]);
end
and
function [D] = pairdist2(A, B, wts, distancemetric)
    % get some information about the data
    numA = size(A,1);
    numB = size(B,1);
    if strcmp(distancemetric,'cityblock')
        r = 1;
    elseif strcmp(distancemetric,'euclidean')
        r = 2;
    else
        error('Function only accepts "cityblock" and "euclidean" distance')
    end
    % use for-loops to generate differences
    D = zeros(numA,numB);
    for i = 1:numA
        for j = 1:numB
            % raise to the power r (not 1/r) so the result matches pairdist1
            differences = abs(A(i,:) - B(j,:)).^r;
            differences = differences.*wts;
            differences = differences.^(1/r);
            D(i,j) = sum(differences,2);
        end
    end
end
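One detail worth checking in pairdist1: the 1/r power is applied to each weighted dimension before the sum. For r = 1 this makes no difference, but for r = 2 the result differs from taking the root of the weighted sum, which is the more common definition of a weighted Euclidean distance. The small sketch below uses made-up values (a, b, w are not from the post) to show the two orders side by side:

```matlab
% Toy values chosen for illustration only (not from the original post).
a = [0 0];
b = [3 4];
w = [0.5 0.5];
r = 2;   % Euclidean case

% Root applied per dimension, then summed (the order used in pairdist1).
perDim  = sum((w .* abs(a - b).^r).^(1/r));

% Root applied once, after summing the weighted terms.
rootSum = sum(w .* abs(a - b).^r).^(1/r);

fprintf('per-dimension root: %.4f, root of sum: %.4f\n', perDim, rootSum);
```

Which of the two is intended depends on the application; the point is only that they are not interchangeable when r = 2.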
Here is a performance test:
A = rand(10,3);
B = rand(80,3);
wts = [0.1 0.5 0.4];
distancemetric = 'cityblock';
tic
D1 = pairdist1(A,B,wts,distancemetric);
toc
tic
D2 = pairdist2(A,B,wts,distancemetric);
toc
Elapsed time is 0.000238 seconds.
Elapsed time is 0.005350 seconds.
It's clear that the repmat-and-permute version works much more quickly than the double-for-loop version, at least for smaller datasets. However, I also know that calls to repmat often slow things down. So I am wondering if anyone in the SO community has any advice for improving the efficiency of either function!
@Luis Mendo offered a nice cleanup of the repmat-and-permute function using bsxfun. I compared his function with my original on datasets of varying size:
As the data become larger, the bsxfun version becomes the clear winner!
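The timing data behind that comparison are not included here. A comparison along those lines could be sketched roughly as follows. The dataset sizes, the variable names, and the two inline city-block implementations are assumptions made for illustration, and timeit (available in MATLAB R2013b and later) is used instead of a single tic/toc pair because it averages over repeated runs:

```matlab
% Hypothetical benchmark sketch; sizes and names are illustrative only.
wts   = [0.1 0.5 0.4];
sizes = [10 50 100 200];
tRepmat = zeros(size(sizes));
tBsxfun = zeros(size(sizes));

for k = 1:numel(sizes)
    n = sizes(k);
    A = rand(n, 3);
    B = rand(n, 3);

    % City-block case (r = 1) with explicit replication via repmat.
    fRepmat = @() permute(sum(abs(repmat(A, [1 1 n]) - ...
        repmat(permute(B, [3 2 1]), [n 1 1])) .* repmat(wts, [n 1 n]), 2), [1 3 2]);

    % City-block case (r = 1) with bsxfun, no explicit replication.
    fBsxfun = @() permute(sum(bsxfun(@times, ...
        abs(bsxfun(@minus, A, permute(B, [3 2 1]))), wts), 2), [1 3 2]);

    tRepmat(k) = timeit(fRepmat);
    tBsxfun(k) = timeit(fBsxfun);
end

% Columns: number of points per set, repmat time (s), bsxfun time (s).
disp([sizes(:), tRepmat(:), tBsxfun(:)]);
```

Actual timings will depend on the machine and the MATLAB release, so no expected numbers are given.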
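I have finished writing the function, and it is available on GitHub. I eventually found a pretty good vectorized method for calculating Euclidean distance, so I use that method in the Euclidean case, and I took @Divakar's [...]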
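The vectorized Euclidean method mentioned here is not reproduced in this post. One common approach, which may or may not be the one the author used, expands the squared distance so that the bulk of the work is a single matrix multiplication. The sketch below is an assumption-laden illustration with toy data. Note that it takes the square root after summing over dimensions, i.e. sqrt(sum_k w_k*(a_k - b_k)^2), which is not the same order of operations as in pairdist1:

```matlab
% Sketch only: weighted Euclidean distance via the expansion
%   sum_k w_k*(a_k - b_k)^2
%     = sum_k w_k*a_k^2 + sum_k w_k*b_k^2 - 2*sum_k w_k*a_k*b_k
% Variable names and data are illustrative assumptions.
A   = [0 0; 1 1];     % numA-by-d
B   = [3 4];          % numB-by-d
wts = [0.5 0.5];      % 1-by-d, assumed non-negative

sqA = (A.^2) * wts';                      % numA-by-1 weighted squared norms
sqB = (B.^2) * wts';                      % numB-by-1 weighted squared norms
AB  = bsxfun(@times, A, wts) * B';        % numA-by-numB weighted inner products

D2 = bsxfun(@plus, sqA, sqB') - 2*AB;     % squared distances
D  = sqrt(max(D2, 0));                    % clamp tiny negatives caused by round-off
```

The expansion can lose some precision when points are very close together, which is why the clamp to zero is included before the square root.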
You can replace repmat by bsxfun. Doing so avoids explicit repetition, therefore it's more memory-efficient, and probably faster:
function D = pairdist1(A, B, wts, distancemetric)
    if strcmp(distancemetric,'cityblock')
        r = 1;
    elseif strcmp(distancemetric,'euclidean')
        r = 2;
    else
        error('Function only accepts "cityblock" and "euclidean" distance')
    end
    % differences between every row of A and every row of B (numA-by-d-by-numB)
    differences = abs(bsxfun(@minus, A, permute(B, [3 2 1]))).^r;
    % weigh difference values before combining them
    differences = bsxfun(@times, differences, wts).^(1/r);
    % combine features to get distance (numA-by-numB)
    D = permute(sum(differences,2),[1,3,2]);
end
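On MATLAB R2016b and later, arithmetic operators expand singleton dimensions implicitly, so the same computation can be written without bsxfun. The following is a minimal sketch under that version assumption, with toy inputs chosen for illustration and r fixed to the city-block case:

```matlab
% Sketch only: same steps as the bsxfun version, using implicit expansion
% (assumes MATLAB R2016b or newer). Inputs are illustrative toy data.
A   = [0 0 0; 1 2 3];          % 2 points, 3 dimensions
B   = [1 1 1; 0 2 0; 3 3 3];   % 3 points, 3 dimensions
wts = [0.1 0.5 0.4];           % one weight per dimension
r   = 1;                       % 1 = cityblock, 2 = euclidean

% A is numA-by-d and permute(B,[3 2 1]) is 1-by-d-by-numB, so the
% subtraction expands to numA-by-d-by-numB without repmat or bsxfun.
differences = abs(A - permute(B, [3 2 1])).^r;
differences = (differences .* wts).^(1/r);
D = permute(sum(differences, 2), [1 3 2]);   % numA-by-numB
```

On older releases this fails with a dimension-mismatch error, so the bsxfun form remains the portable choice.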