问题描述
我正在研究C99中简单循环的不同实现如何基于函数签名自动矢量化.
I am exploring how different implementations of simple loops in C99 auto-vectorize based upon the function signature.
这是我的代码:
/* #define PRAGMA_SIMD _Pragma("simd") */
#define PRAGMA_SIMD
#ifdef __INTEL_COMPILER
#define ASSUME_ALIGNED(a) __assume_aligned(a,64)
#else
#define ASSUME_ALIGNED(a)
#endif
#ifndef ARRAY_RESTRICT
#define ARRAY_RESTRICT
#endif
void foo1(double * restrict a, const double * restrict b, const double * restrict c)
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < 2048; ++i) {
if (c[i] > 0) {
a[i] = b[i];
} else {
a[i] = 0.0;
}
}
}
void foo2(double * restrict a, const double * restrict b, const double * restrict c)
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < 2048; ++i) {
a[i] = ((c[i] > 0) ? b[i] : 0.0);
}
}
/* Undetermined size version */
void foo3(int n, double * restrict a, const double * restrict b, const double * restrict c)
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < n; ++i) {
if (c[i] > 0) {
a[i] = b[i];
} else {
a[i] = 0.0;
}
}
}
void foo4(int n, double * restrict a, const double * restrict b, const double * restrict c)
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < n; ++i) {
a[i] = ((c[i] > 0) ? b[i] : 0.0);
}
}
/* Static array versions */
void foo5(double ARRAY_RESTRICT a[2048], const double ARRAY_RESTRICT b[2048], const double ARRAY_RESTRICT c[2048])
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < 2048; ++i) {
if (c[i] > 0) {
a[i] = b[i];
} else {
a[i] = 0.0;
}
}
}
void foo6(double ARRAY_RESTRICT a[2048], const double ARRAY_RESTRICT b[2048], const double ARRAY_RESTRICT c[2048])
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < 2048; ++i) {
a[i] = ((c[i] > 0) ? b[i] : 0.0);
}
}
/* VLA versions */
void foo7(int n, double ARRAY_RESTRICT a[n], const double ARRAY_RESTRICT b[n], const double ARRAY_RESTRICT c[n])
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < n; ++i) {
if (c[i] > 0) {
a[i] = b[i];
} else {
a[i] = 0.0;
}
}
}
void foo8(int n, double ARRAY_RESTRICT a[n], const double ARRAY_RESTRICT b[n], const double ARRAY_RESTRICT c[n])
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < n; ++i) {
a[i] = ((c[i] > 0) ? b[i] : 0.0);
}
}
当我编译时
$ icc -O3 -std=c99 -opt-report5 -mavx -S foo.c
icc: remark #10397: optimization reports are generated in *.optrpt files in the output location
我看到VLA情况没有被自动向量化,但是当我添加标志以断言没有别名-fno-alias
时,它们是.因此,我得出结论,我应该在源代码中对此进行规定,因此我尝试通过使用
I see that the VLA cases are not auto-vectorized, but when I add the flag to assert no aliasing -fno-alias
, they are. Thus, I conclude that I should prescribe this in the source, so I attempt to do that by compiling with
$ icc -O3 -std=c99 -opt-report5 -mavx -DARRAY_RESTRICT=restrict -S foo.c
icc: remark #10397: optimization reports are generated in *.optrpt files in the output location
编译器错误输出包括
foo.c(98): error: "restrict" is not allowed
void foo7(int n, double ARRAY_RESTRICT a[n], const double ARRAY_RESTRICT b[n],
const double ARRAY_RESTRICT c[n])
^
但是如您所见,我的VLA参数上不允许使用限制.
but as you can see, restrict is not allowed on my VLA arguments.
所以我的问题是:没有办法在ISO C中断言VLA没有别名吗?
请注意,我无法使用编译指示在源代码中声明任何别名-例如simd
,omp simd
,ivdep
等-并获得我想要的自动矢量化功能,但这些不是ISO C.
Note that I can assert no aliasing in the source code using pragmas - e.g. simd
, omp simd
, ivdep
etc. - and get the auto-vectorization that I want but these aren't ISO C.
在这种情况下,ISO C表示C的最新版本,在撰写本文时,当然是C11.
In this context, ISO C means the most recent version of C, which of course is C11 as of the writing of this post.
推荐答案
您的原始代码对我来说很失败,并显示以下消息:
Your original code fails nicely for me with messages such as:
void foo7(int n, double ARRAY_RESTRICT a[n], const double ARRAY_RESTRICT b[n], const double ARRAY_RESTRICT c[n])
^
restrict.c:126:1: error: invalid use of ‘restrict’
restrict.c:126:1: error: invalid use of ‘restrict’
restrict.c:145:1: error: invalid use of ‘restrict’
§6.7.6.3函数声明器(包括原型)具有示例5,其中说以下函数原型声明符是等效的:
§6.7.6.3 Function declarators (including prototypes) has Example 5 which says the following function prototype declarators are equivalent:
void f(double (* restrict a)[5]);
void f(double a[restrict][5]);
void f(double a[restrict 3][5]);
void f(double a[restrict static 3][5]);
这是标准中唯一出现与数组类型直接相关的限制的地方. §6.7.6通常是在声明器上,而§6.7.6.2是在数组声明器上,在我看来,限制似乎必须出现在数组维的第一个组件内.在您的情况下,应为:
This is the only place in the standard where restrict appears associated directly with array types. §6.7.6 is on declarators generally, and §6.7.6.2 on array declarators, and it looks to me as though the restrict has to appear inside the first component of the array dimension. In your context, it should be:
void foo7(int n, double a[ARRAY_RESTRICT n],
const double b[ARRAY_RESTRICT n],
const double c[ARRAY_RESTRICT n])
如果不看标准中的示例,而您问这个问题,我将不会相信这种表示法!请注意,这适用于数组以及VLA.
I wouldn't have believed that notation without seeing the examples in the standard and you asking the question! Note that this applies to arrays as well as VLAs.
此修订后的代码基于注释,可以在相同的编译选项下干净地编译:
This revised code, based on the commentary, compiles cleanly under the same compilation options:
gcc -g -O3 -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
-Wold-style-definition -Wold-style-declaration -Werror -c new.restrict.c
编译选项要求事先声明非静态函数,因此声明位于文件顶部.我还强制在源代码中使用#define ARRAY_RESTRICT restrict
,而不是将其保留为编译选项.
The compilation options demand prior declarations of non-static functions, hence the declarations at the top of the file. I also forced #define ARRAY_RESTRICT restrict
in the source, rather than leaving it as a compilation option.
编译器是在Ubuntu 14.04衍生产品上运行的GCC 4.9.2.
The compiler is GCC 4.9.2 running on an Ubuntu 14.04 derivative.
/* #define PRAGMA_SIMD _Pragma("simd") */
#define PRAGMA_SIMD
#ifdef __INTEL_COMPILER
#define ASSUME_ALIGNED(a) __assume_aligned(a, 64)
#else
#define ASSUME_ALIGNED(a)
#endif
#define ARRAY_RESTRICT restrict
#ifndef ARRAY_RESTRICT
#define ARRAY_RESTRICT
#endif
void foo1(double *restrict a, const double *restrict b, const double *restrict c);
void foo2(double *restrict a, const double *restrict b, const double *restrict c);
void foo3(int n, double *restrict a, const double *restrict b, const double *restrict c);
void foo4(int n, double *restrict a, const double *restrict b, const double *restrict c);
void foo5(double a[ARRAY_RESTRICT 2048], const double b[ARRAY_RESTRICT 2048], const double c[ARRAY_RESTRICT 2048]);
void foo6(double a[ARRAY_RESTRICT 2048], const double b[ARRAY_RESTRICT 2048], const double c[ARRAY_RESTRICT 2048]);
void foo7(int n, double a[ARRAY_RESTRICT n], const double b[ARRAY_RESTRICT n], const double c[ARRAY_RESTRICT n]);
void foo8(int n, double a[ARRAY_RESTRICT n], const double b[ARRAY_RESTRICT n], const double c[ARRAY_RESTRICT n]);
void foo1(double *restrict a, const double *restrict b, const double *restrict c)
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < 2048; ++i)
{
if (c[i] > 0)
{
a[i] = b[i];
}
else
{
a[i] = 0.0;
}
}
}
void foo2(double *restrict a, const double *restrict b, const double *restrict c)
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < 2048; ++i)
{
a[i] = ((c[i] > 0) ? b[i] : 0.0);
}
}
/* Undetermined size version */
void foo3(int n, double *restrict a, const double *restrict b, const double *restrict c)
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < n; ++i)
{
if (c[i] > 0)
{
a[i] = b[i];
}
else
{
a[i] = 0.0;
}
}
}
void foo4(int n, double *restrict a, const double *restrict b, const double *restrict c)
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < n; ++i)
{
a[i] = ((c[i] > 0) ? b[i] : 0.0);
}
}
/* Static array versions */
void foo5(double a[ARRAY_RESTRICT 2048], const double b[ARRAY_RESTRICT 2048], const double c[ARRAY_RESTRICT 2048])
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < 2048; ++i)
{
if (c[i] > 0)
{
a[i] = b[i];
}
else
{
a[i] = 0.0;
}
}
}
void foo6(double a[ARRAY_RESTRICT 2048], const double b[ARRAY_RESTRICT 2048], const double c[ARRAY_RESTRICT 2048])
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < 2048; ++i)
{
a[i] = ((c[i] > 0) ? b[i] : 0.0);
}
}
/* VLA versions */
void foo7(int n, double a[ARRAY_RESTRICT n], const double b[ARRAY_RESTRICT n], const double c[ARRAY_RESTRICT n])
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < n; ++i)
{
if (c[i] > 0)
{
a[i] = b[i];
}
else
{
a[i] = 0.0;
}
}
}
void foo8(int n, double a[ARRAY_RESTRICT n], const double b[ARRAY_RESTRICT n], const double c[ARRAY_RESTRICT n])
{
ASSUME_ALIGNED(a);
ASSUME_ALIGNED(b);
ASSUME_ALIGNED(c);
PRAGMA_SIMD
for (int i = 0; i < n; ++i)
{
a[i] = ((c[i] > 0) ? b[i] : 0.0);
}
}
这篇关于将限制限定符与C99可变长度数组(VLA)结合使用的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!