CS:APP example uses idivq with two operands?

Question

I am reading about x86-64 (and assembly in general) through the book "Computer Systems: A Programmer's Perspective" (3rd edition). The author, in agreement with other sources on the web, states that idivq takes only one operand. But then, some chapters later, the author gives an example with the instruction idivq $9, %rcx.

Two operands? I first thought this was a mistake, but it happens a lot in the book from that point on.

Also, the dividend should be given by the quantity in registers %rdx (high-order 64 bits) and %rax (low-order 64 bits), so if this is defined in the architecture then it does not seem possible that the second operand could be a specified dividend.

Here is an example of an exercise (too lazy to write it all down, so a picture is the way to go). It claims that GCC emits idivq $9, %rcx when compiling a short C function.

Recommended answer

That's a mistake. Only imul has immediate and 2-register forms.

mul, div, and idiv still exist only in the one-operand form introduced with 8086, using RDX:RAX as the implicit double-width operand for output (and as input for division).
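
As a minimal sketch of the one-operand form, the following C helper wraps idivq in GNU C inline asm. The name sdiv64 is made up for illustration, and the asm path assumes GCC or clang targeting x86-64 with the default AT&T syntax; other targets fall back to plain C. cqto sign-extends RAX into RDX:RAX, and idivq then names only the divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the book or the answer): signed 64-bit
 * division that shows the implicit RDX:RAX operand of one-operand idivq. */
static int64_t sdiv64(int64_t dividend, int64_t divisor, int64_t *remainder)
{
    /* idiv raises #DE on a zero divisor or on quotient overflow. */
    assert(divisor != 0);
    assert(!(dividend == INT64_MIN && divisor == -1));

    int64_t quot, rem;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("cqto\n\t"        /* sign-extend RAX into RDX:RAX */
            "idivq %[d]"      /* RDX:RAX / d: quotient -> RAX, remainder -> RDX */
            : "=a"(quot), "=&d"(rem)
            : "0"(dividend), [d] "rm"(divisor)
            : "cc");
#else
    quot = dividend / divisor;   /* portable fallback, same results */
    rem  = dividend % divisor;
#endif
    *remainder = rem;
    return quot;
}
```

Note that the dividend never appears as an explicit operand of idivq; the constraints pin it to RAX and reserve RDX, which is exactly why a form like idivq $9, %rcx has nowhere to go.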

Or EDX:EAX, DX:AX, or AH:AL, depending on operand-size of course. Consult an ISA reference like Intel's manual, not this book! https://www.felixcloutier.com/x86/idiv

See also "When and why do we sign extend and use cdq with mul/div?"

x86-64's only hardware division instructions are idiv and div. 64-bit mode removed aam, which does 8-bit division by an immediate. ("Dividing in Assembler x86" and "Displaying Time in Assembly" have examples of using aam in 16-bit mode.)

Of course, for division by constants, idiv and div (and aam) are very inefficient. Use shifts for powers of 2, or a multiplicative inverse otherwise, unless you're optimizing for code-size instead of performance.
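
To make the two alternatives concrete, here is a small sketch in C for unsigned 32-bit values. The function names are invented for this example; the constant 0x38E38E39 is ceil(2^33 / 9), which is exact for every 32-bit input because 9 * 0x38E38E39 exceeds 2^33 by only 1.

```c
#include <assert.h>
#include <stdint.h>

/* Power of two: an unsigned divide by 8 is just a right shift by 3. */
static uint32_t udiv8(uint32_t x)
{
    return x >> 3;
}

/* Divide by 9 with a fixed-point inverse: 0x38E38E39 == ceil(2^33 / 9). */
static uint32_t udiv9(uint32_t x)
{
    uint32_t q = (uint32_t)(((uint64_t)x * 0x38E38E39u) >> 33);
    assert(q == x / 9);   /* sanity check against the real division */
    return q;
}
```

The multiply and shift replace a div that would otherwise cost far more cycles; the trade-off is a few extra bytes of code, which is why a size-optimizing build might still pick div.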

CS:APP 3e Global Edition apparently has multiple serious x86-64 instruction-set mistakes like this in practice problems, claiming that GCC emits impossible instructions. Not just typos or subtle mistakes, but misleading nonsense that's very obviously wrong to people familiar with the x86-64 instruction set. It's not just a syntax mistake, it's trying to use instructions that aren't encodable. No syntax can exist to express them, other than a macro that expands to multiple instructions, and defining idivq as a pseudo-instruction using a macro would be pretty weird.

e.g. "I correctly guessed missing part of a function, but gcc generated assembly code doesn't match the answer" is another one, where it suggests that (%rbx, %rdi, %rsi) and (%rsi, %rsi, 9) are valid addressing modes! The scale factor is actually a 2-bit shift count, so these are total garbage and a sign of a serious lack of knowledge by the authors about the ISA they're teaching, not a typo.
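
A short C sketch of why a scale of 9 (or a register as the scale) cannot be encoded: the scale is stored as a 2-bit shift count, so only 1, 2, 4 and 8 exist. The function names below are made up for illustration. Multiplying by 9 is still possible with a valid addressing mode, since (%rsi, %rsi, 8) means rsi + rsi*8.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the 2-bit scale field for a scale factor, or -1 if unencodable. */
static int sib_scale_bits(unsigned scale)
{
    switch (scale) {
    case 1:  return 0;
    case 2:  return 1;
    case 4:  return 2;
    case 8:  return 3;
    default: return -1;   /* e.g. 9, or anything that is not a constant 1/2/4/8 */
    }
}

/* What lea (%rsi,%rsi,8), %rax computes: rsi + (rsi << 3) == rsi * 9. */
static uint64_t times9(uint64_t x)
{
    uint64_t r = x + (x << sib_scale_bits(8));
    assert(r == x * 9);
    return r;
}
```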

Their code won't assemble with any AT&T syntax assembler.

Also, "What does this x86-64 addq instruction mean, which only have one operand? (From CSAPP book 3rd Edition)" is another example, where they have a nonsensical addq %eax instead of inc %rdx, and a mismatched operand-size in a mov store.

It seems that they're just making stuff up and claiming it was emitted by GCC. IDK if they start with real GCC output and edit it into what they think is a better example, or actually write it by hand from scratch without testing it.

GCC's actual output would have used multiplication by a magic constant (fixed-point multiplicative inverse) to divide by 9 (even at -O0, but this is clearly not debug-mode code. They could have used -Os).
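
For the signed case, the multiply-and-shift needs one extra fix-up so that negative inputs round toward zero. The sketch below is a 32-bit analogue written in C, not a copy of GCC's output for the book's 64-bit function; the name sdiv9 is invented, and it assumes that >> on a negative value is an arithmetic shift, as it is with GCC and clang.

```c
#include <assert.h>
#include <stdint.h>

/* Signed divide by 9 without idiv, using 0x38E38E39 == ceil(2^33 / 9). */
static int32_t sdiv9(int32_t n)
{
    /* The 32 x 32 product always fits in 64 bits. */
    int64_t prod = (int64_t)n * 0x38E38E39;

    /* prod >> 33 rounds toward minus infinity; subtracting n >> 31
     * (which is -1 for negative n, 0 otherwise) rounds toward zero instead. */
    int32_t q = (int32_t)(prod >> 33) - (n >> 31);

    assert(q == n / 9);   /* sanity check against the real division */
    return q;
}
```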

Presumably they didn't want to talk about "Why does GCC use multiplication by a strange number in implementing integer division?" and replaced that block of code with their made-up instruction. From context you can probably figure out where they expect the output to go; perhaps they mean rcx /= 9.

See the errata page on the publisher's website: https://csapp.cs.cmu.edu/3e/errata.html

So CS:APP 3e is probably a good textbook, as long as you get the North American edition or ignore the practice / homework problems. This explains the huge disconnect between the textbook's reputation and wide use vs. the serious and obvious (to people familiar with x86-64 asm) errors like this one that go beyond sloppy into don't-know-the-language territory.

How a hypothetical idiv reg, reg or idiv $imm, reg would be designed

If Intel or AMD had introduced new convenient forms for div or idiv, they would have designed them to use a single-width dividend, because that's how compilers always use them.

Most languages are like C and implicitly promote both operands for + - * / to the same type and produce a result of that width. Of course, if the inputs are known to be narrow, that can be optimized away (e.g. using one imul r32 to implement a * (int64_t)b).

But div and idiv fault if the quotient overflows, so it's not safe to use a single 32-bit idiv when compiling int32_t q = (int64_t)a / (int32_t)b.
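
The condition a compiler would have to prove can be written out in C. This is only an illustrative check with made-up function names: it asks whether the quotient of a 64-bit dividend and a 32-bit divisor fits in 32 bits, which is what a single 32-bit idiv requires to avoid a #DE fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Would a 64-bit / 32-bit => 32-bit idiv complete without faulting? */
static bool narrow_idiv_is_safe(int64_t a, int32_t b)
{
    if (b == 0)
        return false;                  /* divide by zero faults */
    if (a == INT64_MIN && b == -1)
        return false;                  /* also avoids overflow in the C division below */
    int64_t q = a / b;                 /* full 64-bit quotient */
    return q >= INT32_MIN && q <= INT32_MAX;
}

/* Narrowing division, valid only when the check above passes. */
static int32_t narrow_idiv(int64_t a, int32_t b)
{
    assert(narrow_idiv_is_safe(a, b));
    return (int32_t)(a / b);
}
```

For example, (1LL << 40) / 1 has a quotient that needs 41 bits, so the 32-bit idiv would fault even though the C expression is well defined before the final conversion.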

Compilers always use xor edx,edx before DIV, or cdq or cqo before IDIV, to actually do n / n => n-bit division.
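
As a rough model in C of what those setup instructions leave in the high half of the dividend (32-bit case, with invented function names): xor edx,edx zero-extends EAX into EDX:EAX, while cdq fills EDX with copies of EAX's sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* High half (EDX) after xor edx,edx: always zero, for unsigned div. */
static uint32_t edx_for_div(uint32_t eax)
{
    (void)eax;            /* the low half does not influence the result */
    return 0;
}

/* High half (EDX) after cdq: every bit is a copy of EAX's sign bit. */
static uint32_t edx_for_idiv(int32_t eax)
{
    uint32_t edx = (uint32_t)((uint64_t)(int64_t)eax >> 32);
    assert(edx == (eax < 0 ? UINT32_MAX : 0u));
    return edx;
}
```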

Real full-width division using a dividend that isn't just zero- or sign-extended is only done by hand with intrinsics or asm (because gcc/clang and other compilers don't know when the optimization is safe), or in gcc helper functions that do e.g. 64-bit / 64-bit division in 32-bit code (or 128-bit division in 64-bit code).

So what would be most helpful is a div/idiv that avoids the extra instruction to set up RDX, too, as well as minimizing the number of implicit register operands. (Like imul r32, r/m32 and imul r32, r/m32, imm do: making the common case of non-widening multiplication more convenient with no implicit registers. That's Intel syntax like the manuals, destination first.)

The simplest way would be a 2-operand instruction that did dst /= src. Or maybe one that replaced both operands with quotient and remainder. Using a VEX encoding for 3 operands like BMI1 andn, you could maybe have idivx remainder_dst, dividend, divisor, with the 2nd operand also an output for the quotient. Or you could have the remainder written to RDX with a non-destructive destination for the quotient.

Or, more likely, optimize for the simple case where only the quotient is needed: idivx quot, dividend, divisor, without storing the remainder anywhere. You can always use regular idiv when you want the remainder.

BMI2 mulx uses an implicit rdx input operand because its purpose is to allow multiple dep chains of add-with-carry for extended-precision multiply. So it still has to produce 2 outputs. But this hypothetical new form of idiv would exist to save code-size and uops around normal uses of idiv that aren't widening. So 386 imul reg, reg/mem is the point of comparison, not BMI2 mulx.

IDK if it would make sense to introduce an immediate form of idivx as well; you'd only use it for code-size reasons. Multiplicative inverses are a more efficient way to divide by constants, so there's very little real-world use-case for such an instruction.

