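The operation multiplies the i-th element of an array (call it A) by the i-th element of a matrix of the same size (B), and stores the result back into the i-th element of A.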

As an arithmetic formula:
A'[i] = A[i] * B[i] for every index i of A
What is the best way to optimize this operation in a multi-core environment?

Here is my current code:

var learningRate = 0.001f;
var m = 20000;
var n = 40000;
var W = new float[m*n];
var C = new float[m*n];

//my current code ...[1]
Parallel.ForEach(Enumerable.Range(0, m), i =>
{
    for (int j = 0; j <= n - 1; j++)
    {
         W[i*n+j] *= C[i*n+j];
    }
});

//This is somehow far slower than [1], but I don't know why ... [2]
Parallel.ForEach(Enumerable.Range(0, n*m), i =>
{
    W[i] *= C[i];
});


//This is faster than [2], but not as fast as [1] ... [3]
for(int i = 0; i < m*n; i++)
{
    W[i] *= C[i];
}

I tested the following approach, but performance did not improve at all.
http://msdn.microsoft.com/en-us/library/dd560853.aspx

    public static void Test1()
    {
        Random rnd = new Random(1);

        var sw1 = new Stopwatch();
        var sw2 = new Stopwatch();
        sw1.Reset();
        sw2.Reset();

        int m = 10000;
        int n = 20000;
        int loops = 20;

        var W = DummyDataUtils.CreateRandomMat1D(m, n);
        var C = DummyDataUtils.CreateRandomMat1D(m, n);

        for (int l = 0; l < loops; l++)
        {
            var v = DummyDataUtils.CreateRandomVector(n);
            var b = DummyDataUtils.CreateRandomVector(m);

            sw1.Start();

            Parallel.ForEach(Enumerable.Range(0, m), i =>
            {
                for (int j = 0; j <= n - 1; j++)
                {
                    W[i*n+j] *= C[i*n+j];
                }
            });
            sw1.Stop();

            sw2.Start();
            // Partition the entire source array.
            var rangePartitioner = Partitioner.Create(0, n*m);

            // Loop over the partitions in parallel.
            Parallel.ForEach(rangePartitioner, (range, loopState) =>
            {
                // Loop over each range element without a delegate invocation.
                for (int i = range.Item1; i < range.Item2; i++)
                {
                    W[i] *= C[i];
                }
            });

            sw2.Stop();

            Console.Write("o");
        }

        var t1 = (double)sw1.ElapsedMilliseconds / loops;
        var t2 = (double)sw2.ElapsedMilliseconds / loops;

        Console.WriteLine("t1: " + t1);
        Console.WriteLine("t2: " + t2);
    }

Result (average milliseconds per loop):

t1: 119

t2: 120.4

Best answer

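The problem is that, although invoking a delegate is relatively fast, the cost adds up when it is invoked this many times, and the code inside the delegate is so simple that the call overhead dominates the actual work.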
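To make that overhead concrete, here is a minimal sketch that counts how many times the loop body delegate runs for a per-element loop like [2] versus a range-based loop. The class and method names (`DelegateCallCount`, `PerElement`, `PerRange`) and the array length are made up for illustration, and the `Interlocked.Increment` call is only there to count; it adds its own cost, so this is not a benchmark.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class DelegateCallCount
{
    // One delegate invocation per element, as in [2]. Returns the number of invocations.
    public static long PerElement(float[] W, float[] C)
    {
        long calls = 0;
        Parallel.ForEach(Enumerable.Range(0, W.Length), i =>
        {
            Interlocked.Increment(ref calls); // counting only, not part of the real work
            W[i] *= C[i];
        });
        return calls;
    }

    // One delegate invocation per index range. Returns the number of invocations.
    // Note: Partitioner.Create(0, length) throws if length is 0.
    public static long PerRange(float[] W, float[] C)
    {
        long calls = 0;
        Parallel.ForEach(Partitioner.Create(0, W.Length), range =>
        {
            Interlocked.Increment(ref calls); // counting only, not part of the real work
            for (int i = range.Item1; i < range.Item2; i++)
            {
                W[i] *= C[i];
            }
        });
        return calls;
    }

    public static void Main()
    {
        const int length = 1000000; // arbitrary size chosen for the illustration
        var W = new float[length];
        var C = new float[length];

        Console.WriteLine("per-element delegate calls: " + PerElement(W, C));
        Console.WriteLine("per-range delegate calls:   " + PerRange(W, C));
    }
}
```

The per-element version always reports one call per element. The per-range count depends on how the default partitioner splits the index space on your machine, but it is far smaller than the element count.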

You can try using a Partitioner to specify the ranges to iterate over, which lets each delegate invocation process many items (similar to what [1] does):

Parallel.ForEach(Partitioner.Create(0, n * m), partition =>
    {
        for (int i = partition.Item1; i < partition.Item2; i++)
        {
            W[i] *= C[i];
        }
    });
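If you want to experiment with the chunk size, `Partitioner.Create` also has an overload that takes an explicit range size. The following is only a sketch under assumptions: the helper name `MultiplyInPlace`, the test data, and the range size of 4096 are all hypothetical choices, not tuned values. It checks the parallel result against a plain sequential computation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class RangeMultiply
{
    // Multiplies W element-wise by C in place, handing each delegate call a
    // contiguous index range of at most rangeSize elements.
    public static void MultiplyInPlace(float[] W, float[] C, int rangeSize)
    {
        if (W.Length != C.Length)
            throw new ArgumentException("W and C must have the same length.");
        if (W.Length == 0)
            return; // Partitioner.Create rejects an empty range

        Parallel.ForEach(Partitioner.Create(0, W.Length, rangeSize), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                W[i] *= C[i];
            }
        });
    }

    public static void Main()
    {
        const int length = 100003; // deliberately not a multiple of the range size
        var W = new float[length];
        var C = new float[length];
        var expected = new float[length];

        for (int i = 0; i < length; i++)
        {
            W[i] = i % 7 + 1;
            C[i] = 0.5f;
            expected[i] = W[i] * C[i]; // sequential reference result
        }

        MultiplyInPlace(W, C, 4096); // 4096 is an arbitrary starting point

        bool same = true;
        for (int i = 0; i < length; i++)
        {
            if (W[i] != expected[i]) { same = false; break; }
        }
        Console.WriteLine(same ? "results match" : "results differ");
    }
}
```

Whether a particular range size helps depends on the hardware and the array size, so measure before settling on one.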

Regarding "c# - How can I maximize the performance of element-wise operations on large arrays in C#", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24695231/

10-10 00:45