How to specify a clobbered bottom of the x87 FPU stack with extended GCC assembly?


Problem description

In a codebase of ours I found this snippet for fast, towards-negative-infinity rounding on x87:

```c
inline int my_int(double x)
{
  int r;
#ifdef _GCC_
  asm ("fldl %1\n"
       "fistpl %0\n"
       :"=m"(r)
       :"m"(x));
#else
  // ...
#endif
  return r;
}
```

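I'm not extremely familiar with GCC extended assembly syntax, but from what I gather from the documentation:

- `r` must be a memory location, where I'm writing back stuff;
- `x` must be a memory location too, whence the data comes from;
- there's no clobber specification, so the compiler can rest assured that at the end of the snippet the registers are as it left them.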
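Now, to come to my question: it's true that in the end the FPU stack is balanced, but what if all 8 locations were already in use and I'm overflowing it? How can the compiler know that it cannot trust ST(7) to be where it left it? Should some clobber be added?

Edit: I tried specifying st(7) in the clobber list and it seems to affect the codegen; now I'll wait for some confirmation of this fact.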

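As a side note: looking at the implementation of the barebones `lrint` in both glibc and MinGW, I see something like

```c
__asm__ __volatile__ ("fistpl %0" : "=m" (retval) : "t" (x) : "st");
```

where we are asking for the input to be placed directly in ST(0) (which avoids that potentially useless `fldl`). What is that `"st"` clobber? The docs seem to mention only `t` (i.e. the top of the stack).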

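Regarding rounding: yes, the result depends on the current rounding mode, which in our application should always be "towards negative infinity".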
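Because `fistpl` converts using whatever rounding mode the x87 control word currently selects, the snippet's result changes with that mode. The sketch below illustrates this with GCC-style inline asm on i386 or x86-64; the helpers `x87_set_rounding`, `x87_restore_cw` and `my_int_toward_neg_inf` are hypothetical, and the sketch assumes the process starts in the usual default round-to-nearest-even mode. Real code would more likely call `fesetround(FE_DOWNWARD)` from `<fenv.h>`, which on x86 glibc updates both the x87 and the SSE rounding modes.

```c
#include <stdint.h>

#if !defined(__GNUC__) || !(defined(__i386__) || defined(__x86_64__))
#error "this sketch needs GCC-style inline asm on i386 or x86-64"
#endif

/* x87 rounding-control (RC) field: bits 10-11 of the control word. */
enum { X87_RC_NEAREST = 0, X87_RC_DOWN = 1, X87_RC_UP = 2, X87_RC_ZERO = 3 };

/* The question's snippet. volatile is added here only so the compiler keeps it
   in order relative to the control-word asm statements below. */
static inline int my_int(double x)
{
    int r;
    __asm__ __volatile__ ("fldl %1\n\t"
                          "fistpl %0"
                          : "=m" (r)
                          : "m" (x));
    return r;
}

/* Hypothetical helper: select an x87 rounding mode, return the old control word. */
static inline uint16_t x87_set_rounding(unsigned rc)
{
    uint16_t old_cw, new_cw;
    __asm__ __volatile__ ("fnstcw %0" : "=m" (old_cw));
    new_cw = (uint16_t)((old_cw & ~0x0C00u) | ((rc & 3u) << 10));
    __asm__ __volatile__ ("fldcw %0" : : "m" (new_cw));
    return old_cw;
}

/* Hypothetical helper: restore a control word saved by x87_set_rounding(). */
static inline void x87_restore_cw(uint16_t cw)
{
    __asm__ __volatile__ ("fldcw %0" : : "m" (cw));
}

/* Hypothetical wrapper: convert with round-toward-negative-infinity, then
   restore whatever mode was active before. */
static inline int my_int_toward_neg_inf(double x)
{
    uint16_t saved = x87_set_rounding(X87_RC_DOWN);
    int r = my_int(x);
    x87_restore_cw(saved);
    return r;
}
```

In the default mode, `my_int(2.5)` yields 2 (round half to even); inside `my_int_toward_neg_inf`, 2.7 becomes 2 and -2.3 becomes -3.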

Solution
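The glibc/MinGW version (a `"t"` input plus an `"st"` clobber) is actually the correct way to represent the code you want as inline assembly.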

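To get the best possible generated code, you want to make use of the inputs and outputs. Rather than hard-coding the necessary load/store instructions, let the compiler generate them. Not only does this make it possible to elide potentially unnecessary instructions, it also means that the compiler can better schedule these instructions when they are required (that is, it can interleave them within a prior sequence of code, often minimizing their cost).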
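The `"st"` clobber refers to the `st(0)` register, i.e., the top of the x87 FPU stack. What Intel/MASM notation calls `st(0)`, AT&T/GAS notation generally refers to as simply `st`. And, as per GCC's documentation for clobbers, the items in the clobber list are "either register names or the special clobbers" (`"cc"` for condition codes/flags, and `"memory"`). So this just means that the inline assembly clobbers (overwrites) the `st(0)` register. The clobber is necessary because the `fistpl` instruction pops the top of the stack, thus destroying the original contents of `st(0)`.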
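Applied to the question's function, the glibc/MinGW form might look like the sketch below. The names `my_int_pop` and `my_int_nopop` are made up for illustration, and the code assumes GCC on i386 or x86-64. The second function is a contrast case based on the same x87 operand rules: `fistl` stores without popping, so `x` is still on the stack afterwards and GCC pops it itself, which is why no `"st"` clobber is listed there.

```c
#if !defined(__GNUC__) || !(defined(__i386__) || defined(__x86_64__))
#error "this sketch needs GCC-style inline asm on i386 or x86-64"
#endif

/* Popping form, as in glibc/MinGW lrint: "t" asks GCC to load x into st(0)
   itself (no hard-coded fldl), and because fistpl pops st(0) the input is
   also listed as clobbered via "st". Rounds per the current x87 mode. */
static inline int my_int_pop(double x)
{
    int r;
    __asm__ __volatile__ ("fistpl %0" : "=m" (r) : "t" (x) : "st");
    return r;
}

/* Non-popping contrast: fistl leaves x in st(0), so nothing is clobbered and
   GCC is responsible for popping the dead input after the asm. */
static inline int my_int_nopop(double x)
{
    int r;
    __asm__ ("fistl %0" : "=m" (r) : "t" (x));
    return r;
}
```

Under the default round-to-nearest-even mode both functions give 3 for 2.7; the popping one gives -2 for -2.5.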
The only thing that concerns me about this code is the passage in the documentation stating that clobber descriptions may not overlap in any way with an input or output operand. As you already know, the `t` constraint means the top of the x87 FPU stack. The problem is that this is the same register as `st`, and the documentation says very clearly that we cannot have a clobber that specifies the same register as one of the input/output operands. Furthermore, since the documentation states that the compiler is forbidden to use any of the clobbered registers to represent input/output operands, this inline assembly appears to make an impossible request: load this value at the top of the x87 FPU stack without putting it in `st`!

Now, I would assume that the authors of glibc know what they are doing and are more familiar with the compiler's implementation of inline assembly than you or I, so this code is probably legal and legitimate. In fact, it seems that the unusual case of the x87's stack-like registers forces an exception to the normal interaction between clobbers and operands. The official documentation on x86 floating-point asm operands says that an input register that is implicitly popped by the asm must be explicitly clobbered, unless it is constrained to match an output operand. That fits our case exactly.

Further confirmation is provided by an example at the bottom of that section of the official documentation: `asm ("fyl2xp1" : "=t" (result) : "0" (x), "u" (y) : "st(1)");`. Here, the clobber `st(1)` is the same register as the input constraint `u`, which seems to violate the documentation on clobbers mentioned above, but it is used and justified for precisely the same reason that `"st"` is used as the clobber in the glibc code you quoted: the instruction pops that input, just as `fistpl` does.

All of that said, now that you know how to write this code correctly in inline assembly, I have to echo previous commenters who suggested that the best solution is not to use inline assembly at all. Just call `lrint`, which not only has the exact semantics that you want, but can also be optimized better by the compiler under certain circumstances (e.g., transforming it into a single `cvtsd2si` instruction when the target architecture supports SSE).