I am trying to understand the implications of the System V AMD64 ABI calling convention by looking at the following example:

    struct Vec3{
        double x, y, z;
    };

    struct Vec3 do_something(void);

    void use(struct Vec3 * out){
        *out = do_something();
    }

A Vec3 variable is of class MEMORY, and thus the caller (use) must allocate space for the returned variable and pass it as a hidden pointer to the callee (i.e. do_something).
That is what we see in the resulting assembly (on godbolt, compiled with -O2):

    use:
            pushq   %rbx
            movq    %rdi, %rbx      ;remember out
            subq    $32, %rsp       ;memory for returned object
            movq    %rsp, %rdi      ;hidden pointer to %rdi
            call    do_something
            movdqu  (%rsp), %xmm0   ;copy memory to out
            movq    16(%rsp), %rax
            movups  %xmm0, (%rbx)
            movq    %rax, 16(%rbx)
            addq    $32, %rsp       ;unwind/restore
            popq    %rbx
            ret

I understand that an alias of the pointer out (e.g. a global variable) could be used in do_something, and thus out cannot be passed as the hidden pointer to do_something: if it were, *out would be changed inside do_something rather than when do_something returns, so some calculations might become faulty. For example, this version of do_something would return faulty results:

    struct Vec3 global; //initialized somewhere

    struct Vec3 do_something(void){
        struct Vec3 res;
        res.x = 2*global.x;
        res.y = global.y+global.x;
        res.z = 0;
        return res;
    }

If out were an alias for the global variable global and were used as the hidden pointer passed in %rdi, then res would also be an alias of global, because the compiler would use the memory pointed to by the hidden pointer directly (a kind of RVO in C) instead of creating a temporary object and copying it on return. Then res.y would be 2*x+y (where x, y are the old values of global) and not x+y, as it would be for any other hidden pointer.

It was suggested to me that using restrict should solve the problem, i.e.

    void use(struct Vec3 *restrict out){
        *out = do_something();
    }

because now the compiler knows that there are no aliases of out which could be used in do_something, so the assembly could be as simple as this:

    use:
            jmp     do_something    ; %rdi is now the hidden pointer

However, this is the case for neither gcc nor clang: the assembly stays unchanged (see on godbolt).

What prevents the usage of out as the hidden pointer?

NB: The desired (or very similar) behavior would be achieved for a slightly different function signature:

    struct Vec3 use_v2(){
        return do_something();
    }

which results in (see on
godbolt):

    use_v2:
            pushq   %r12
            movq    %rdi, %r12
            call    do_something
            movq    %r12, %rax
            popq    %r12
            ret

Solution

A function is allowed to assume its return-value object (pointed to by a hidden pointer) is not the same object as anything else, i.e. that its output pointer (passed as a hidden first arg) doesn't alias anything.

You could think of this as the hidden first arg output pointer having an implicit restrict on it. (Because in the C abstract machine, the return value is a separate object, and the x86-64 System V ABI specifies that the caller provides space. x86-64 SysV doesn't give the caller license to introduce aliasing.)

Using an otherwise-private local as the destination (instead of separate dedicated space and then copying to a real local) is fine, but pointers that may point to something reachable another way must not be used. This requires escape analysis to make sure that a pointer to such a local hasn't been passed outside of the function.

I think the x86-64 SysV calling convention models the C abstract machine here by having the caller provide a real return-value object, not forcing the callee to invent that temporary if needed to make sure all the writes to the retval happened after any other writes.
That's not what "the caller provides space for the return value" means, IMO.

That's definitely how GCC and other compilers interpret it in practice, which is a big part of what matters in a calling convention that's been around this long (since a year or two before the first AMD64 silicon, so very early 2000s).

Here's a case where your optimization would break if it were done:

    struct Vec3{
        double x, y, z;
    };
    struct Vec3 glob3;

    __attribute__((noinline))
    struct Vec3 do_something(void) {  // copy glob3 to retval in some order
        return (struct Vec3){glob3.y, glob3.z, glob3.x};
    }

    __attribute__((noinline))
    void use(struct Vec3 * out){   // copy do_something() result to *out
        *out = do_something();
    }

    void caller(void) {
        use(&glob3);
    }

With the optimization you're suggesting, do_something's output object would be glob3. But it also reads glob3.

A valid implementation for do_something would be to copy elements from glob3 to (%rdi) in source order, which would do glob3.x = glob3.y before reading glob3.x as the 3rd element of the return value.

That is in fact exactly what gcc -O1 does (Godbolt compiler explorer):

    do_something:
            movq    %rdi, %rax               # tmp90, .result_ptr
            movsd   glob3+8(%rip), %xmm0     # glob3.y, glob3.y
            movsd   %xmm0, (%rdi)            # glob3.y, <retval>.x
            movsd   glob3+16(%rip), %xmm0    # glob3.z, _2
            movsd   %xmm0, 8(%rdi)           # _2, <retval>.y
            movsd   glob3(%rip), %xmm0       # glob3.x, _3
            movsd   %xmm0, 16(%rdi)          # _3, <retval>.z
            ret

Notice the glob3.y, <retval>.x store before the load of glob3.x.

So without restrict anywhere in the source, GCC already emits asm for do_something that assumes no aliasing between the retval and glob3.

I don't think using struct Vec3 *restrict out would help at all: that only tells the compiler that inside use() you won't access the *out object through any other name.
Since use() doesn't reference glob3, it's not UB to pass &glob3 as an arg to a restrict version of use.

I may be wrong here; @M.M argues in comments that *restrict out might make this optimization safe because the execution of do_something() happens during use(). (Compilers still don't actually do it, but maybe they would be allowed to for restrict pointers.)

Update: Richard Biener said in the GCC missed-optimization bug report that M.M is correct, and if the compiler can prove that the function returns normally (not exception or longjmp), the optimization is legal in theory (but still not something GCC is likely to look for):

> If so, restrict would make this optimization safe if we can prove that do_something is "noexcept" and doesn't longjmp.

Yes.

There's a noexcept declaration, but there isn't (AFAIK) a nolongjmp declaration you can put on a prototype.

So that means it's only possible (even in theory) as an inter-procedural optimization when we can see the other function's body. Unless noexcept also means no longjmp.