Problem description
I am trying to add CUDA to an existing single-threaded C program that was written sometime in the late 90s.
To do this I need to mix two languages, C and C++ (nvcc is a C++ compiler).
The problem is that the C++ compiler sees a structure as a certain size, while the C compiler sees the same structure as a slightly different size. That's bad. I am really puzzled by this because I can't find a cause for a 4-byte discrepancy.
/usr/lib/gcc/i586-suse-linux/4.3/../../../../i586-suse-linux/bin/ld: Warning: size of symbol `tree' changed from 324 in /tmp/ccvx8fpJ.o to 328 in gpu.o
My C++ looks like
#include <stdio.h>
#include <stdlib.h>
#include "assert.h"
extern "C"
{
#include "structInfo.h" //contains the structure declaration
}
...
and my C files look like
#include "structInfo.h"
...
with structInfo.h looking like
struct TB {
int nbranch, nnode, root, branches[NBRANCH][2];
double lnL;
} tree;
...
My makefile looks like
PRGS = prog
CC = cc
CFLAGS=-std=gnu99 -m32
CuCC = nvcc
CuFlags =-arch=sm_20
LIBS = -lm -L/usr/local/cuda-5.0/lib -lcuda -lcudart
all : $(PRGS)
prog:
	$(CC) $(CFLAGS) prog.c gpu.o $(LIBS) -o prog
gpu.o:
	$(CuCC) $(CuFlags) -c gpu.cu
Some people asked me why I didn't use a different host compilation option. I think the host-compilation option was deprecated two releases ago. It also never appeared to do what it said it would do.
nvcc warning : option 'host-compilation' has been deprecated and is ignored
GPUs require natural alignment for all data, e.g. a 4-byte int needs to be aligned to a 4-byte boundary and an 8-byte double or long long needs 8-byte alignment. CUDA enforces this for host code as well, to make sure structs are as compatible as possible between the host and device portions of the code. x86 CPUs, on the other hand, do not generally require data to be naturally aligned (although a performance penalty may result from a lack of alignment).
In this case, CUDA needs to align the double component of the struct to an 8-byte boundary. Since an odd number of int components precede the double, this requires padding. Switching the order of components, i.e. putting the double component first, does not help: in an array of such structs each struct would have to be 8-byte aligned, so the size of the struct must be a multiple of 8 bytes, which also requires padding.
To force gcc to align doubles in the same way CUDA does, pass the flag -malign-double.