Why won't GCC auto-vectorize this loop?



Problem description




I am attempting to optimize a loop that accounts for a lot of my program's computation time.

But when I turn on auto-vectorization with -O3 -ffast-math -ftree-vectorizer-verbose=6, GCC reports that it cannot vectorize the loop.

I am using GCC 4.4.5.

The code:

/// Find the point in the path with the largest v parameter
void prediction::find_knife_edge(
    const float * __restrict__ const elevation_path,
    float * __restrict__ const diff_path,
    const float path_res,
    const unsigned a,
    const unsigned b,
    const float h_a,
    const float h_b,
    const float f,
    const float r_e
) const
{
    float wavelength = (speed_of_light * 1e-6f) / f;

    float d_ab = path_res * static_cast<float>(b - a);

    for (unsigned n = a + 1; n <= b - 1; n++)
    {
        float d_an = path_res * static_cast<float>(n - a);
        float d_nb = path_res * static_cast<float>(b - n);

        float h = elevation_path[n] + (d_an * d_nb) / (2.0f * r_e) - (h_a * d_nb + h_b * d_an) / d_ab;
        float v = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));

        diff_path[n] = v;
    }
}

The messages from GCC:

note: not vectorized: number of iterations cannot be computed.
note: not vectorized: unhandled data-ref

The GCC page about auto-vectorization ( http://gcc.gnu.org/projects/tree-ssa/vectorization.html ) states that loops with unknown bounds are supported.

If I replace the for with

for (unsigned n = 0; n <= 100; n++)

then it vectorizes it.

What am I doing wrong?

The lack of detailed documentation on exactly what these messages mean and the ins/outs of GCC auto-vectorization is rather annoying.

EDIT:

Thanks to David I changed the loop to this:

 for (unsigned n = a + 1; n < b; n++)

Now GCC attempts to vectorize the loop but reports the following:

 note: not vectorized: unhandled data-ref
 note: Alignment of access forced using peeling.
 note: Vectorizing an unaligned access.
 note: vect_model_induction_cost: inside_cost = 1, outside_cost = 2 .
 note: not vectorized: relevant stmt not supported: D.76777_65 = (float) n_34;

What does "D.76777_65 = (float) n_34;" mean?
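
For readers unfamiliar with the notation: the quoted statement is GCC's internal (GIMPLE) form of a static_cast<float> applied to an unsigned value. n_34 is one SSA version of the loop counter n, and D.76777_65 is a compiler-generated temporary. A plausible reading, not confirmed anywhere in this thread, is that the target has no vector instruction for unsigned-to-float conversion (SSE2 only provides the signed packed conversion), so the vectorizer rejects that statement. The sketch below shows the two conversions side by side at source level. The helper names are hypothetical, and whether the signed form is enough for GCC 4.4.5 to vectorize is an assumption to verify with -ftree-vectorizer-verbose.

```cpp
#include <cassert>
#include <climits>

// Direct unsigned -> float conversion, as performed in the question's loop.
inline float to_float_unsigned(unsigned n)
{
    return static_cast<float>(n);
}

// Hypothetical alternative: convert through a signed int first.
// Only valid when n fits in an int, which is checked here.
inline float to_float_via_int(unsigned n)
{
    assert(n <= static_cast<unsigned>(INT_MAX));
    return static_cast<float>(static_cast<int>(n));
}
```

For every value up to INT_MAX both helpers return the same float, so switching the loop counter to a signed type does not change the computed distances as long as the path indices stay in that range.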

Solution

I may have slightly botched the details, but this is the way you need to restructure your loop to get it to vectorize. The trick is to precompute the number of iterations and iterate from 0 to one short of that number. Do not change the for statement. You may need to fix the two lines before it and the two lines at the top of the loop. They're approximately right. ;)

const unsigned it=(b-a)-1;
const unsigned diff=b-a;
for (unsigned n = 0; n < it; n++)
{
    float d_an = path_res * static_cast<float>(n);
    float d_nb = path_res * static_cast<float>(diff - n);

    float h = elevation_path[n] + (d_an * d_nb) / (2.0f * r_e) - (h_a * d_nb + h_b * d_an) / d_ab;
    float v = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));

    diff_path[n] = v;
}
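
The loop above is, by the answer's own admission, only approximately right. Starting at n = 0 without offsetting the array index touches elevation_path[0] instead of elevation_path[a + 1], and it makes d_an zero on the first iteration. What follows is a minimal sketch of one way to keep the zero-based, precomputed trip count while preserving the original indexing. It assumes b > a and that b - a fits in an int. The free-function name knife_edge_v, the wavelength parameter passed in directly, and the signed counter are assumptions of this sketch, not part of the original code. Whether a particular GCC version vectorizes it still has to be checked with -ftree-vectorizer-verbose.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical free-function version of the loop from the question.
// Writes v into diff_path[n] for every interior point a < n < b.
void knife_edge_v(const float* __restrict__ elevation_path,
                  float* __restrict__ diff_path,
                  float path_res, unsigned a, unsigned b,
                  float h_a, float h_b, float wavelength, float r_e)
{
    assert(b > a);
    const int diff = static_cast<int>(b - a);  // number of segments between a and b
    const int count = diff - 1;                // number of interior points
    const float d_ab = path_res * static_cast<float>(diff);

    // Shift the base pointers so that index 0 corresponds to n = a + 1.
    const float* in = elevation_path + a + 1;
    float* out = diff_path + a + 1;

    for (int i = 0; i < count; ++i)
    {
        // i + 1 plays the role of (n - a) in the original loop.
        const float d_an = path_res * static_cast<float>(i + 1);
        const float d_nb = path_res * static_cast<float>(diff - (i + 1));

        const float h = in[i] + (d_an * d_nb) / (2.0f * r_e) - (h_a * d_nb + h_b * d_an) / d_ab;
        out[i] = h * std::sqrt((2.0f * d_ab) / (wavelength * d_an * d_nb));
    }
}
```

The loop now runs from 0 to a bound computed once before the loop, which is the shape the answer recommends, and the signed counter avoids an unsigned-to-float conversion inside the body.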


08-11 16:02