Question
I've been writing x86 assembly lately (for fun) and was wondering whether rep-prefixed string instructions actually have a performance edge on modern processors, or whether they're only implemented for backward compatibility.
I can understand why Intel would have originally implemented the rep instructions back when processors only ran one instruction at a time, but is there a benefit to using them now?
With a loop that compiles to more instructions, there is more to fill up the pipeline and/or be issued out-of-order. Are modern processors built to optimize for these rep-prefixed instructions, or are rep instructions used so rarely in modern code that they're not important to the manufacturers?
Recommended answer
A lot of space is given to questions like this in both AMD's and Intel's optimization guides. The validity of advice in this area has a "half-life": different CPU generations behave differently. For example:
- AMD Software Optimization Guide (Sep/2005), section 8.3, pg. 167:
  Avoid using the REP prefix when performing string operations, especially when copying blocks of memory.
- AMD Software Optimization Guide (Apr/2011), section 9.3, pg. 148:
  Use the REP prefix judiciously when performing string operations.
The Intel Architecture Optimization Manual gives performance comparison figures for various block copy techniques (including rep stosd) in Table 7-2, "Relative Performance of Memory Copy Routines", pg. 7-37f., for different CPUs. Again, what's fastest on one might not be fastest on others.
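Since the ranking changes between CPUs, the most reliable answer is to measure on the machine you care about. Below is a minimal sketch of such a comparison in C. It is not taken from either vendor's guide: the names `copy_rep_movsb`, `copy_byte_loop`, `copy_memcpy`, and `report_copy_timings` are made up for illustration, the inline assembly assumes GCC or Clang on x86 (other targets fall back to `memcpy`), and `clock()` gives only a coarse timing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(void *, const void *, size_t);

/* Copy n bytes with `rep movsb` (GCC/Clang on x86); otherwise use memcpy. */
static void copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* RDI/EDI = destination, RSI/ESI = source, RCX/ECX = byte count.
       The ABI guarantees the direction flag is clear on function entry. */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* Naive byte-at-a-time loop; the volatile store keeps the compiler
   from replacing the loop with a memcpy call or vectorizing it. */
static void copy_byte_loop(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* Library baseline. */
static void copy_memcpy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* CPU time in seconds for `reps` copies. The volatile function pointer
   forces a real indirect call on every iteration. */
static double seconds_for(copy_fn fn, void *dst, const void *src,
                          size_t n, int reps)
{
    copy_fn volatile call = fn;
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        call(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time each routine on an n-byte buffer and check the copy is correct.
   Returns 0 on success, -1 if allocation fails. */
int report_copy_timings(size_t n, int reps)
{
    static const struct { const char *name; copy_fn fn; } routines[] = {
        { "rep movsb", copy_rep_movsb },
        { "byte loop", copy_byte_loop },
        { "memcpy",    copy_memcpy    },
    };
    unsigned char *src = malloc(n);
    unsigned char *dst = malloc(n);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        src[i] = (unsigned char)i;

    for (size_t r = 0; r < sizeof routines / sizeof routines[0]; r++) {
        memset(dst, 0, n);
        double t = seconds_for(routines[r].fn, dst, src, n, reps);
        assert(memcmp(dst, src, n) == 0);
        printf("%-10s %zu bytes x %d: %.6f s\n", routines[r].name, n, reps, t);
    }
    free(src);
    free(dst);
    return 0;
}
```

Calling, say, `report_copy_timings(1 << 20, 200)` from `main` prints one line per routine. The numbers depend heavily on CPU generation, buffer size, alignment, and compiler flags, so treat them as a rough indication rather than a benchmark.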
For many cases, recent x86 CPUs (which have the "string" SSE4.2 operations) can do string operations via the SIMD unit, see this investigation.
To follow up on all this (and/or keep yourself updated when things change again, inevitably), read Agner Fog's Optimization guides/blogs.