在我正在处理的问题中有这样一部分代码,如下所示。定义部分只是为了向您展示数组的大小。下面我粘贴了矢量化版本 - 它慢了 2 倍以上。为什么会这样?我知道如果矢量化需要大的临时变量,我就会发生这种情况,但是(似乎)这里不​​是真的。

一般来说,我可以做些什么(除了 parfor,我已经使用过)来加速这段代码?

maxN = 100;
levels = maxN+1;
xElements = 101;
umn = complex(zeros(levels, levels));
umn2 = umn;
bessels = ones(xElements, xElements, levels);    % 1.09 GB
posMcontainer = ones(xElements, xElements, maxN);

tic
for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            nn = n + 1;
            mm = 1;
            for m = 1 : 2 : n
                umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
                mm = mm + 1;
            end
        end
    end
end
toc % 0.520594 seconds


tic
for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            nn = n + 1;
            m = 1:2:n;
            numOfEl = ceil(n/2);
            umn2(nn, 1:numOfEl) = bessels(i, j, nn) * posMcontainer(i, j, m);
        end
    end
end
toc % 1.275926 seconds

sum(sum(umn-umn2)) % veryfying, if all done right

此致,
亚历克斯

从分析器:

编辑:

回复 @Jason answer ,这个替代方案需要相同的时间:
for n = 1:2:maxN
    nn(n) = n + 1;
    numOfEl(n) = ceil(n/2);
end

for j = 1 : xElements
    for i = 1 : xElements
        for n = 1 : 2 : maxN
            umn2(nn(n), 1:numOfEl(n)) = bessels(i, j, nn(n)) * posMcontainer(i, j, 1:2:n);
        end
    end
end

Edit2 :
回复@EBH:
关键是要做到以下几点:
parfor i = 1 : xElements
    for j = 1 : xElements
    umn = complex(zeros(levels, levels)); % cleaning
    for n = 0:maxN
        mm = 1;
        for m = -n:2:n
            nn = n + 1; % for indexing

            if m < 0
                umn(nn, mm) = bessels(i, j, nn) * negMcontainer(i, j, abs(m));
            end

            if m > 0
                umn(nn, mm) = bessels(i, j, nn) * posMcontainer(i, j, m);
            end

            if m == 0
                umn(nn, mm) = bessels(i, j, nn);
            end

            mm = mm + 1; % for indexing
        end % m
    end % n
    beta1 = sum(sum(Aj1.*umn));
    betaSumSq1(i, j) = abs(beta1).^2;

    beta2 = sum(sum(Aj2.*umn));
    betaSumSq2(i, j) = abs(beta2).^2;
    end % j
end % i

我尽可能地加快了速度。您所写的是仅采用最后的 besselsposMcontainer 值,因此它不会产生相同的结果。在实际代码中,这两个容器不是用 1 填充的,而是用一些预先计算的值填充的。

最佳答案

编辑后,我可以看到 umn 只是另一个计算的临时变量。它仍然可以主要是可矢量化的:

betaSumSq1 = zeros(xElements); % preallocating
betaSumSq2 = zeros(xElements); % preallocating
% an index matrix to fetch the right values from negMcontainer and
% posMcontainer:
indmat = tril(repmat([0 1;1 0],ceil((maxN+1)/2),floor(levels/2)));
indmat(end,:) = [];
% an index matrix to fetch the values in correct order for umn:
b_ind = repmat([1;0],ceil((maxN+1)/2),1);
b_ind(end) = [];
tempind = logical([fliplr(indmat) b_ind indmat+triu(ones(size(indmat)))]);

% permute the arrays to prevent squeeze:
PM = permute(posMcontainer,[3 1 2]);
NM = permute(negMcontainer,[3 1 2]);
B = permute(bessels,[3 1 2]);

for k = 1 : maxN+1 % third dim
    for jj = 1 : xElements % columns
        b = B(:,jj,k); % get one vector of B

        % perform b*NM for every row of NM*indmat, than flip the result:
        neg = fliplr(bsxfun(@times,bsxfun(@times,indmat,NM(:,jj,k).'),b));

        % perform b*PM for every row of PM*indmat:
        pos = bsxfun(@times,bsxfun(@times,indmat,PM(:,jj,k).'),b);

        temp = [neg mod(1:levels,2).'.*b pos].'; % concat neg and pos
        % assign them to the right place in umn:
        umn = reshape(temp(tempind.'),[levels levels]).';

        beta1 = Aj1.*umn;
        betaSumSq1(jj,k) = abs(sum(beta1(:))).^2;
        beta2 = Aj2.*umn;
        betaSumSq2(jj,k) = abs(sum(beta2(:))).^2;
    end
end

这将运行时间从 ~95 秒减少到 3 秒(两者都没有 parfor ),因此它几乎提高了 97%

关于performance - 矢量化代码比循环慢? MATLAB,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/39183125/

10-12 04:42