What is the fastest way to perform arithmetic operations on each element of a cell array?

Problem description

Say I want to multiply each element of a cell array A by a coefficient k. I can do that with:

A = cellfun(@(x) k*x, A, 'UniformOutput', false)

But this is extremely slow. Is there a faster and better way? The cell array elements are variable-length vectors, so cell2mat doesn't apply.

Edit: Based on fpe's recommendation of a for loop, here is an example benchmark. Starting with this data

A = arrayfun(@(n) rand(n,1), randi(5,1000,1000), 'UniformOutput',false);

The cellfun call above takes 9.45 seconds, while a for loop:

A2 = cell(size(A));
for i = 1:size(A,1), for j = 1:size(A,2), A2{i,j} = A{i,j}*k; end; end
A = A2;

takes 1.67 seconds, which is a significant improvement. I'd still prefer something a few orders of magnitude faster. (I also don't understand why the Matlab interpreter is unable to make the cellfun call as fast as the for loop. They are semantically identical.)

Edit 2: Amro's suggestion to make one single for loop is significantly faster:

for i = 1:numel(A), A{i} = A{i}*k; end

takes 1.11 seconds, and if I run pack prior to it to align the memory just 0.88 seconds.

Implementing a MEX function to do this is actually not much better: 0.73 seconds, (0.53 seconds after pack), which indicates that allocating many small matrices is slow in Matlab.

#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    if (nrhs != 2)
        mexErrMsgTxt("need 2 arguments (Cell, Coefficient)");

    mwSize const* size = mxGetDimensions(prhs[0]);
    int N = mxGetNumberOfDimensions(prhs[0]);

    if (mxGetNumberOfElements(prhs[1]) != 1)
        mexErrMsgTxt("second argument to multcell must be a scalar");

    double coefficient = *mxGetPr(prhs[1]);

    plhs[0] = mxCreateCellArray(N, size);

    int M = mxGetNumberOfElements(prhs[0]);

    for (int i = 0; i < M; i++) {
        // Allocate a fresh output array with the same shape as the input element.
        mxArray *r = mxGetCell(prhs[0], i);
        mxArray *l = mxCreateNumericArray(mxGetNumberOfDimensions(r),
                                          mxGetDimensions(r),
                                          mxDOUBLE_CLASS,
                                          mxREAL);
        double *rp = mxGetPr(r);
        double *lp = mxGetPr(l);
        int num_elements = mxGetNumberOfElements(r);
        // Scale element-wise; j indexes within this vector, i indexes the cell.
        for (int j = 0; j < num_elements; j++)
            lp[j] = rp[j] * coefficient;
        mxSetCell(plhs[0], i, l);
    }
}

Cheating a bit, however, and implementing a MEX function that actually edits the memory in place seems to be the only way to get reasonable performance out of the operation: 0.030 seconds. This uses the undocumented mxUnshareArray, as suggested by Amro.

#include "mex.h"

extern "C" bool mxUnshareArray(mxArray *array_ptr, bool noDeepCopy);

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    if (nrhs != 2)
        mexErrMsgTxt("need 2 arguments (Cell, Coefficient)");

    mwSize const* size = mxGetDimensions(prhs[0]);
    int N = mxGetNumberOfDimensions(prhs[0]);

    if (mxGetNumberOfElements(prhs[1]) != 1)
        mexErrMsgTxt("second argument to multcell must be a scalar");

    double coefficient = *mxGetPr(prhs[1]);

    mxUnshareArray(const_cast<mxArray *>(prhs[0]), false);
    plhs[0] = const_cast<mxArray *>(prhs[0]);

    int M = mxGetNumberOfElements(prhs[0]);

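    for (int i = 0; i < M; i++) {
        // No allocation here: each element's data is overwritten in place.
        mxArray *r = mxGetCell(prhs[0], i);
        double *rp = mxGetPr(r);
        int num_elements = mxGetNumberOfElements(r);
        // j indexes within this vector, i indexes the cell.
        for (int j = 0; j < num_elements; j++)
            rp[j] = rp[j] * coefficient;
    }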
}
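
The MEX timings above suggest that allocating one small array per cell dominates the cost. One way to sidestep that from plain MATLAB is to keep the values in a single flat vector and multiply once. The following is only a sketch, not something timed in this thread: it assumes every cell holds a column vector of doubles (as in the benchmark data), uses a smaller array, and the value of k is arbitrary. Converting back with mat2cell re-allocates every cell, so any gain would most likely come from keeping the data flat (with lens as the index), not from the round trip.

```matlab
% Sketch: scale every vector in a cell array with one flat multiply.
% Assumes each cell holds a column vector of doubles.
k = 3;                                   % example coefficient (assumed value)
A = arrayfun(@(n) rand(n,1), randi(5,100,100), 'UniformOutput', false);

lens = cellfun(@numel, A(:));            % length of each vector, in linear-index order
flat = vertcat(A{:});                    % one contiguous column holding all values
flat = k * flat;                         % single vectorized multiply

% Only needed if a cell array is required again; this step allocates every cell anew.
B = reshape(mat2cell(flat, lens, 1), size(A));
```

If later processing can work on flat together with cumsum(lens) as offsets, the final mat2cell step can be dropped entirely.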
Solution

Not exactly an answer, but here is a way to see the effect of the JIT compiler and accelerator on both approaches (cellfun vs. for loop):

feature('jit', 'off'); feature('accel', 'off');
tic, A = cellfun(@(x) k*x, A, 'UniformOutput', false); toc
tic, for i=1:numel(A), A{i} = A{i}*k; end, toc

feature('jit', 'on'); feature('accel', 'on');
tic, A = cellfun(@(x) k*x, A, 'UniformOutput', false); toc
tic, for i=1:numel(A), A{i} = A{i}*k; end, toc

I get the following

Elapsed time is 25.913995 seconds.
Elapsed time is 13.050288 seconds.

vs.

Elapsed time is 10.053347 seconds.
Elapsed time is 1.978974 seconds.

with optimization turned on in the second.
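
A single tic/toc pair can be noisy, so a slightly more robust comparison repeats each variant and reports the median. This is a minimal sketch under assumptions of my own: a smaller array than in the question, an arbitrary k, and five runs. Absolute times will differ by machine and MATLAB release.

```matlab
% Sketch: compare cellfun and a for loop over several runs.
k = 3;                                   % example coefficient (assumed value)
A = arrayfun(@(n) rand(n,1), randi(5,200,200), 'UniformOutput', false);

nRuns = 5;                               % number of repetitions (assumed value)
tCellfun = zeros(nRuns, 1);
tLoop = zeros(nRuns, 1);

for r = 1:nRuns
    % Variant 1: cellfun with an anonymous function.
    t0 = tic;
    B1 = cellfun(@(x) k*x, A, 'UniformOutput', false);
    tCellfun(r) = toc(t0);

    % Variant 2: single for loop over linear indices, on a copy so A stays unchanged.
    B2 = A;
    t0 = tic;
    for i = 1:numel(B2)
        B2{i} = B2{i} * k;
    end
    tLoop(r) = toc(t0);
end

fprintf('cellfun: %.4f s, for loop: %.4f s (median of %d runs)\n', ...
        median(tCellfun), median(tLoop), nRuns);
```

Working on a copy keeps the input identical for both variants and across runs, whereas the snippet above rescales A on every call.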

By the way, parallel parfor performed much worse (at least on my local test machine with a pool size of 2 processes).

Seeing the results you posted, MEX-function is the way to go :)
