Problem description
I'm new to assembly and am currently trying to write C++ code that contains an asm block. I'm using Cygwin to compile. Here is my code:
#include <iostream>
using namespace std;
int main()
{
    float flp1_num, flp2_num, flp_rslt1;
    cin >>flp1_num >>flp2_num;
    __asm
    {
        FLD flp1_num
        FLDPI
        FADD flp2_num
        FST flp_rslt1
    }
    cout << flp_rslt1;
}
I'm compiling with g++ arq.cpp -o arq.exe, which gives me the following errors:
arq.cpp: In function ‘int main()’:
arq.cpp:13:5: error: expected ‘(’ before ‘{’ token
{
^
arq.cpp:14:9: error: ‘FLD’ was not declared in this scope
FLD flp1_num
^
Then I tried changing __asm {} into __asm(), and it gave me a different error:
arq.cpp: In function ‘int main()’:
arq.cpp:14:9: error: expected string-literal before ‘FLD’
FLD flp1_num
I've searched around and found a few alternatives that might work, but they didn't work for me. For example, both __asm__("fld flp1_num"); and asm("fld flp1_num"); give me an error saying /tmp/cccDDfUP.o:arq.cpp:(.text+0x32): undefined reference to flp1_num.
How can I fix this error?
Recommended answer
As others have said, you are looking at the documentation for Microsoft's compiler, whose form of inline assembly is very different from the one GCC uses. You will need to consult the documentation for the GNU inline assembly syntax instead. For a gentler introduction, there is a good tutorial available, and I particularly like David Wohlferd's answer on the subject. Although it answers an unrelated question, it gives a very good introduction to the basics of inline assembly if you follow along with his explanation for its own sake.
Anyway, on to your specific problem. A couple of immediate issues:
- The code very likely does not do what you think it does. What your code actually does is add pi to flp2_num and then put that result into flp_rslt1. Apart from loading it onto the stack, it doesn't do anything at all with flp1_num.
  If I had to guess, I would imagine that you want to add flp1_num, pi, and flp2_num all together, and then return the result in flp_rslt1. (But maybe not; it isn't really clear, since you don't have any comments stating your intent, nor a descriptive function name.)
- Your code is also broken because it does not properly clean up the floating-point stack. You have two load instructions (FLD and FLDPI), but no pop instructions! Everything you push/load onto the floating-point stack must be popped/unloaded, or you imbalance the floating-point stack, which causes major problems.
Therefore, in the MSVC syntax, your code should have looked something like the following (wrapped up into a function for convenience and clarity):
float SumPlusPi(float flp1_num, float flp2_num)
{
    float flp_rslt1;
    __asm
    {
        fldpi                       ; load the constant PI onto the top of the FP stack
        fadd  DWORD PTR [flp2_num]  ; add flp2_num to PI, and leave the result on the top of the stack
        fadd  DWORD PTR [flp1_num]  ; add flp1_num to the top of the stack, again leaving the result there
        fstp  DWORD PTR [flp_rslt1] ; pop the top of the stack into flp_rslt1
    }
    return flp_rslt1;
}
I only pushed one time (fldpi), so I only popped one time (fstp). For the additions, I used the form of fadd that works on a memory operand; this causes the value to be implicitly loaded, but otherwise executes as a single instruction and does not push anything onto the stack. There are, however, many different ways you could have written this. The important thing is to balance the number of pushes with the number of pops. There are instructions that explicitly pop (fstp), and there are other instructions that perform an operation and then pop (e.g., faddp). Different combinations of instructions, in certain orders, are very likely more optimal than others, but my code above does work.
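To make the push/pop bookkeeping concrete, here is a minimal sketch in GCC's syntax of one of those "other ways": two explicit loads balanced by faddp and fstp. The function name AddBalanced is hypothetical, the "st" and "st(1)" clobbers reflect my assumption about how best to tell the compiler which x87 registers are used as scratch space, and the plain C++ fallback exists only so the sketch still builds on non-x86 targets.

```cpp
// Hypothetical helper (not part of the original code): adds two floats on the
// x87 stack while keeping the number of pushes and pops balanced.
float AddBalanced(float a, float b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    float result;
    __asm__("flds  %1 \n\t"    // push a                        (stack depth 1)
            "flds  %2 \n\t"    // push b                        (stack depth 2)
            "faddp    \n\t"    // st(1) += st(0), then pop      (stack depth 1)
            "fstps %0"         // store st(0) into result, pop  (stack depth 0)
            : "=m" (result)
            : "m" (a), "m" (b)
            : "st", "st(1)");  // assumption: mark the two scratch registers as clobbered
    return result;
#else
    return a + b;              // fallback for targets without an x87 FPU
#endif
}
```

Two pushes (the two flds) are matched by two pops (one from faddp, one from fstps), so the floating-point stack ends up exactly as deep as it started.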
And here is the MSVC code above translated into GAS syntax:
float SumPlusPi(float flp1_num, float flp2_num)
{
    float flp_rslt1;
    // Note: the "s" suffix selects a 32-bit (float) memory operand;
    // "faddl" would read a 64-bit double from a 4-byte float.
    __asm__("fldpi        \n\t"
            "fadds %[two] \n\t"
            "fadds %[one]"
            : [result] "=t" (flp_rslt1)  // tell compiler result is left at the top of the floating-point stack,
                                         // making an explicit pop unnecessary
            : [one] "m" (flp1_num),      // input operand from memory (inefficient)
              [two] "m" (flp2_num));     // input operand from memory (inefficient)
    return flp_rslt1;
}
Although this works, it is also sub-optimal because it does not take advantage of the advanced features of the GAS inline assembly syntax, particularly the ability to consume values already loaded onto the floating-point stack as inputs.
Most importantly, though, don't miss the reasons why you should not use inline assembly (also written up by David Wohlferd)! This is a truly pointless use of inline assembly. The compiler will generate better code, and it will require significantly less work on your part. Therefore, prefer to write the above function like this:
#include <cmath> // for M_PI constant
float SumPlusPi(float flp1_num, float flp2_num)
{
    return (flp1_num + flp2_num + static_cast<float>(M_PI));
}
Notice that if you actually want to implement different logic than I had been assuming, it is trivial to alter this code to do what you want.
In case you don't believe me that this produces code that is just as good as your inline assembly, if not better, here is the exact object code generated by GCC 6.2 for the above function (Clang emits the same code):
fld DWORD PTR [flp2_num] ; load flp2_num onto top of FPU stack
fadd DWORD PTR [flp1_num] ; add flp1_num to value at top of FPU stack
fadd DWORD PTR [M_PI] ; add constant M_PI to value at top of FPU stack
ret ; return, with result at top of FPU stack
There is no speed win in using fldpi versus loading the value from a constant like GCC does. If anything, forcing the use of this instruction is actually a pessimization, because it means your code can never take advantage of the SSE/SSE2 instructions, which manipulate floating-point values far more efficiently than the old x87 FPU. Enabling SSE/SSE2 for the above C++ code is as simple as throwing a compiler switch (with GCC, for example, -msse2 -mfpmath=sse) or specifying a target architecture that supports it, which will implicitly enable it. That will give you the following:
sub esp, 4 ; reserve space on the stack
movss xmm0, DWORD PTR [M_PI] ; load M_PI constant
addss xmm0, DWORD PTR [flp2_num] ; add flp2_num
addss xmm0, DWORD PTR [flp1_num] ; add flp1_num
movss DWORD PTR [esp], xmm0 ; store result in temporary space on stack
fld DWORD PTR [esp] ; load result from stack to top of FPU stack
add esp, 4 ; clean up stack space
ret ; return, with result at top of FPU stack