Repeat copies of array elements: Run-length decoding in MATLAB

Problem Description

I'm trying to insert multiple values into an array using a 'values' array and a 'counter' array. For example, if:

a=[1,3,2,5]
b=[2,2,1,3]

I want the output of some function

c=somefunction(a,b)

to be

c=[1,1,3,3,2,5,5,5]

Where a(1) is repeated b(1) times, a(2) is repeated b(2) times, etc.

Is there a built-in function in MATLAB that does this? I'd like to avoid using a for loop if possible. I've tried variations of 'repmat()' and 'kron()' to no avail.

This is basically run-length decoding.
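A side note for readers on newer releases: MATLAB R2015a introduced the built-in repelem, which performs exactly this decoding when its second argument is a vector holding one repeat count per element. The benchmarks in this post were run on R2014b, which predates it. A minimal sketch with the question's a and b:

```matlab
% Inputs from the question: values and their repeat counts
a = [1,3,2,5];
b = [2,2,1,3];

% repelem repeats a(k) exactly b(k) times (requires MATLAB R2015a or newer)
c = repelem(a, b)
```

The cumsum-based approaches below remain useful on releases without repelem and as an illustration of the vectorization idea.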

Problem Statement

We have an array of values, vals, and an array of run lengths, runlens:

vals     = [1,3,2,5]
runlens  = [2,2,1,3]

We need to repeat each element of vals as many times as the corresponding element of runlens. Thus, the final output would be:

output = [1,1,3,3,2,5,5,5]

Proposed Approach

One of the fastest tools in MATLAB is cumsum, and it is very useful when vectorizing problems that work on irregular patterns. In the stated problem, the irregularity comes from the differing elements of runlens.

Now, to exploit cumsum, we need to do two things: initialize an array of zeros, and place "appropriate" values at "key" positions in that zeros array, such that after cumsum is applied we end up with each element of vals repeated runlens times.
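A hand-worked instance of this idea for the example input, with the key positions and the values placed there hard-coded (the steps that follow show how to compute them). This is only an illustration of the principle, not a general implementation:

```matlab
% Example: vals = [1,3,2,5], runlens = [2,2,1,3].
% The zeros array has sum(runlens) = 8 slots. The nonzero entries sit where
% each new run starts (indices 1, 3, 5, 6) and hold the jump from the
% previous value to the next one.
array = zeros(1, 8);
array([1 3 5 6]) = [1 2 -1 3];   % array is now [1 0 2 0 -1 3 0 0]

% The running sum turns each jump into a constant run of the new value
output = cumsum(array)           % [1 1 3 3 2 5 5 5]
```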

Steps: Let's number the steps mentioned above to make the proposed approach easier to follow:

1) Initialize zeros array: What must its length be? Since each element is repeated runlens times, the length of the zeros array must be the sum of all runlens (2+2+1+3 = 8 in the example).

2) Find key positions/indices: These key positions are the places along the zeros array where each element from vals starts to repeat. Thus, for runlens = [2,2,1,3], the key positions mapped onto the zeros array would be:

[X 0 X 0 X X 0 0], where the X's are the key positions (indices 1, 3, 5 and 6).

3) Find appropriate values: The final nail to hammer before using cumsum is to put "appropriate" values at those key positions. Since cumsum is applied right afterwards, these must be the differences between consecutive values, computed with diff, so that cumsum turns them back into the original values. Because the differences are placed on the zeros array at positions separated by the runlens distances, cumsum then yields each vals element repeated runlens times as the final output. For the example, diff([0 vals]) = [1 2 -1 3].

Solution Code

Here is the implementation that stitches together all of the above steps -

%// Calculate the cumulative sums of runlens.
%// We need them to size the zeros array and to find the key positions.
clens = cumsum(runlens)

%// Initialize zeros array
array = zeros(1,clens(end))

%// Find key positions/indices
key_pos = [1 clens(1:end-1)+1]

%// Find appropriate values
app_vals = diff([0 vals])

%// Map app_vals at key_pos on array
array(key_pos) = app_vals

%// cumsum array for final output
output = cumsum(array)
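A quick sanity check, not part of the original answer: compare the cumsum+diff decoding with a plain loop on random inputs. Run lengths are drawn from 1..5 so every run is positive, as the approach assumes. The values are integers so the comparison can be exact; with non-integer vals, differencing and re-summing can introduce floating-point round-off, so a tolerance-based comparison would be safer there.

```matlab
% Random test inputs (integer values, positive run lengths)
vals    = randi(100, 1, 50);
runlens = randi(5, 1, 50);

% Vectorized cumsum+diff decoding, as in the steps above
clens = cumsum(runlens);
array = zeros(1, clens(end));
array([1 clens(1:end-1)+1]) = diff([0 vals]);
output = cumsum(array);

% Reference implementation with an explicit loop
expected = [];
for k = 1:numel(vals)
    expected = [expected, repmat(vals(k), 1, runlens(k))]; %#ok<AGROW>
end

assert(isequal(output, expected))
```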

Pre-allocation Hack

As can be seen, the code above pre-allocates with zeros. According to this UNDOCUMENTED MATLAB blog post on faster pre-allocation, much faster pre-allocation can be achieved with -

`array(clens(end)) = 0` instead of `array = zeros(1,(clens(end)))`
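One caveat worth spelling out (an observation added here, not from the blog or the original answer): the indexing trick only produces an all-zeros 1-by-n array when the variable does not exist yet. Inside a function the variable is a fresh local, so it is safe there; in a script or at the command line, clear it first. A minimal sketch with n = 8:

```matlab
n = 8;

% Safe use: the variable does not exist, so MATLAB creates a 1-by-n
% row vector filled with zeros
clear array
array(n) = 0;
assert(isequal(array, zeros(1, n)))

% Pitfall: an existing variable is NOT reset, only grown if needed
stale = [7 7 7];
stale(n) = 0;     % elements 1-3 keep their old value 7
disp(stale)       % shows 7 7 7 0 0 0 0 0
```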

Wrapping Up: Function Code

To wrap everything up, here is a compact function that performs this run-length decoding -

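function out = rld_cumsum_diff(vals,runlens)
%// Run-length decoding via cumsum+diff; assumes row vectors and positive runlens
clens = cumsum(runlens);
idx(clens(end)) = 0;                          %// fast pre-allocation (idx is new here)
idx([1 clens(1:end-1)+1]) = diff([0 vals]);   %// value jumps at run starts
out = cumsum(idx);
return;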
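The function above assumes every element of runlens is positive. With a zero run length, two key positions coincide (or, for a trailing zero, one falls past the end of the array), so a difference is lost and the output is wrong. A minimal sketch, not part of the original answer, that drops zero-length runs before applying the same decoding; it assumes row vectors, nonnegative run lengths and at least one positive run:

```matlab
vals    = [1 3 2 5];
runlens = [2 0 1 3];             % the run for value 3 is empty

% Zero-length runs contribute nothing to the output, so remove them
keep    = runlens > 0;
vals    = vals(keep);
runlens = runlens(keep);

% Same cumsum+diff decoding as before
clens = cumsum(runlens);
idx = zeros(1, clens(end));
idx([1 clens(1:end-1)+1]) = diff([0 vals]);
out = cumsum(idx)                % [1 1 2 5 5 5]
```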

Benchmarking

Benchmarking Code

Listed next is the benchmarking code, which compares the runtimes of the cumsum+diff approach stated in this post against the other, cumsum-only approach and computes the speedup, on MATLAB R2014b -

datasizes = [reshape(linspace(10,70,4).'*10.^(0:4),1,[]) 10^6 2*10^6];
fcns = {'rld_cumsum','rld_cumsum_diff'}; %// approaches to be benchmarked
tsec = zeros(numel(fcns),numel(datasizes)); %// pre-allocate runtimes

for k1 = 1:numel(datasizes)
    n = datasizes(k1); %// Create random inputs
    vals = randi(200,1,n);
    runs = [5000 randi(200,1,n-1)]; %// 5000 acts as an aberration
    for k2 = 1:numel(fcns) %// Time approaches
        tsec(k2,k1) = timeit(@() feval(fcns{k2}, vals,runs), 1);
    end
end

figure,      %// Plot runtimes
loglog(datasizes,tsec(1,:),'-bo'), hold on
loglog(datasizes,tsec(2,:),'-k+')
set(gca,'xgrid','on'),set(gca,'ygrid','on'),
xlabel('Datasize ->'), ylabel('Runtimes (s)')
legend(upper(strrep(fcns,'_',' '))),title('Runtime Plot')

figure,      %// Plot speedups
semilogx(datasizes,tsec(1,:)./tsec(2,:),'-rx')
set(gca,'ygrid','on'), xlabel('Datasize ->')
legend('Speedup(x) with cumsum+diff over cumsum-only'),title('Speedup Plot')

The associated function code, rld_cumsum.m:

function out = rld_cumsum(vals,runlens)
index = zeros(1,sum(runlens));
index([1 cumsum(runlens(1:end-1))+1]) = 1;
out = vals(cumsum(index));
return;
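To see where the extra work comes from, here is the cumsum-only approach traced on the example input: its cumsum yields a run index for every output slot, and the final step gathers vals at all sum(runlens) of those indices. The name run_id is illustrative and not from the original code.

```matlab
vals    = [1,3,2,5];
runlens = [2,2,1,3];

index = zeros(1, sum(runlens));
index([1 cumsum(runlens(1:end-1))+1]) = 1;   % [1 0 1 0 1 1 0 0]: run starts
run_id = cumsum(index)                       % [1 1 2 2 3 4 4 4]: run of each slot
out = vals(run_id)                           % [1 1 3 3 2 5 5 5]
```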

Runtime and Speedup Plots

The proposed approach seems to give a noticeable speedup over the cumsum-only approach, of about 3x!

Why is this new cumsum+diff based approach better than the previous cumsum-only approach?

Well, the reason lies in the final step of the cumsum-only approach, which has to map the "cumsumed" indices into vals. In the new cumsum+diff based approach, we compute diff(vals) instead, for which MATLAB processes only n elements (where n is the number of run lengths), compared with the sum(runlens) elements mapped in the cumsum-only approach. Since sum(runlens) is typically many times larger than n, the new approach gives the noticeable speedup!

这篇关于数组元素的重复副本：在MATLAB运行长度译码的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！