问题描述
我试图找出如何在汇编中计算模 10,所以我在 gcc 中编译了以下 c 代码以查看它的结果.
I was trying to work out how to calculate modulo 10 in assembly so i compiled the following c code in gcc to see what it came up with.
unsigned int i=999;
unsigned int j=i%10;
出乎我的意料
movl -4(%ebp), %ecx
movl $-858993459, %edx
movl %ecx, %eax
mull %edx
shrl $3, %edx
movl %edx, %eax
sall $2, %eax
addl %edx, %eax
addl %eax, %eax
movl %ecx, %edx
subl %eax, %edx
movl %edx, %eax
movl %eax, -12(%ebp)
其中 -4(%ebp) 或 "i" 是输入,-12(%ebp) 或 "j" 是答案.我已经对此进行了测试,无论您制作什么数字 -4(%ebp),它都可以工作.
Where -4(%ebp) or "i" is the input and -12(%ebp) or "j" is the answer. I've tested this and it does work no matter what number you make -4(%ebp).
我的问题是这段代码是如何工作的,它比使用 div 操作数有什么好处.
My question is how does this code work and how is it better than using the div operand.
推荐答案
第一个问题:div
是一条非常慢的指令(超过 20 个时钟周期).上面的序列包含更多的指令,但它们都相对较快,因此在速度方面是绝对胜利.
Second question first: div
is a very slow instruction (more than 20 clock cycles). The sequence above consists of more instructions, but they're all relatively fast, so it's a net win in terms of speed.
前五个指令(直到并包括 shrl
)计算 i/10(我将在一分钟内解释如何).
The first five instructions (up to and including the shrl
) compute i/10 (I'll explain how in a minute).
接下来的几条指令再次将结果乘以 10,但避免使用 mul
/imul
指令(这是否成功取决于您的确切处理器)重新定位 - 较新的 x86 具有非常快的乘法器,但较旧的则没有).
The next few instructions multiply the result by 10 again, but avoiding the mul
/imul
instructions (whether this is a win or not depends on the exact processor you're targeting - newer x86s have very fast multipliers, but older ones don't).
movl %edx, %eax ; eax=i/10
sall $2, %eax ; eax=(i/10)*4
addl %edx, %eax ; eax=(i/10)*4 + (i/10) = (i/10)*5
addl %eax, %eax ; eax=(i/10)*5*2 = (i/10)*10
然后再次从 i
中减去,得到 i - (i/10)*10
即 i % 10
(对于无符号数字).
This is then subtracted from i
again to obtain i - (i/10)*10
which is i % 10
(for unsigned numbers).
最后,关于i/10的计算:基本思想是用乘以1/10代替除以10.编译器通过乘以 (2**35/10 + 1) 来对此进行定点近似 - 这是加载到 edx
中的魔法值,尽管它输出为有符号值,即使它确实是无符号 - 并将结果右移 35.结果证明所有 32 位整数都能得到正确的结果.
Finally, on the computation of i/10: The basic idea is to replace division by 10 with multiplication by 1/10. The compiler does a fixed-point approximation of this by multiplying with (2**35 / 10 + 1) - that's the magic value loaded into edx
, though it's output as a signed value even though it's really unsigned - and right-shifting the result by 35. This turns out to give the right result for all 32-bit integers.
有算法可以确定这种近似值,保证误差小于 1(对于整数意味着它是正确的值),而 GCC 显然使用了一个 :)
There's algorithms to determine this kind of approximation which guarantee that the error is less than 1 (which for integers means it's the right value) and GCC obviously uses one :)
最后一句话:如果你想真正看到 GCC 计算模数,请制作除数变量(例如函数参数),这样它就不能做这种优化.无论如何,在 x86 上,您使用 div
计算模.div
期望 edx:eax
中的 64 位红利(edx 中的高 32 位,eax 中的低 32 位 - 如果您正在使用 32-bit 数)并除以您指定的任何操作数(例如 div ebx
将 edx:eax
除以 ebx
).它返回 eax
中的商和 edx
中的余数.idiv
对有符号的值做同样的事情.
Final remark: If you want to actually see GCC compute a modulo, make the divisor variable (e.g. a function parameter) so it can't do this kind of optimization. Anyway, on x86, you compute modulo using div
. div
expects the 64-bit dividend in edx:eax
(high 32 bits in edx, low 32 bits in eax - clear edx to zero if you're working with a 32-bit number) and divides that by whatever operand you specify (e.g. div ebx
divides edx:eax
by ebx
). It returns the quotient in eax
and the remainder in edx
. idiv
does the same for signed values.
这篇关于模(%)的GCC实现是如何工作的,为什么不使用div指令?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!