Microarchitectural zeroing of a register via the register renamer: performance versus a mov?

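Problem description

I read that recent x86 microarchitectures can also handle common register-zeroing idioms (such as xor-ing a register with itself) through register renaming; in the words of the author: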

Does anybody know how this works in practice? I know that some ISAs, like MIPS, contain an architectural register that is always set to zero in hardware; does this mean that the x86 microarchitecture has similar "zero" registers internally that architectural registers are mapped to when convenient? Or is my mental model of how this works microarchitecturally not quite correct?

The reason why I am asking is because (from some observation) it seems that a mov from one register containing zero to a destination, in a loop, is still substantially faster than zeroing the register via xor within the loop.

Basically, what is happening is that I would like to zero a register within a loop depending on a condition; this can either be done by allocating an architectural register ahead of time to store zero (%xmm3, in this case), which is not modified for the entire duration of the loop, and executing the following within it:

or instead with the xor trick:

(Both AT&T syntax).

In other words, the choice is between hoisting a constant zero out of the loop and rematerializing it inside the loop on each iteration. The latter reduces the number of live architectural registers by one, and, given the processor's supposed special-case awareness and handling of the xor idiom, it seems like it ought to be as fast as the former. These machines have more physical registers than architectural registers anyway, so the processor should be able to do internally the equivalent of what I've done in the assembly by hoisting out the constant zero, or even better, since it has full awareness of and control over its own resources. But it doesn't seem to be as fast, so I'm curious whether anyone with CPU architecture knowledge can explain if there's a good theoretical reason for that.
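For readers who want to try the comparison themselves, the two strategies can be sketched roughly as follows. This is only an illustration under assumptions: the original listings are not preserved here, so the instructions used (`movapd` from a register holding zero, and `xorpd` of a register with itself) are a guess at what the snippets contained, and the function names `tail_sum_mov` and `tail_sum_xor` as well as the reset-flag loop are hypothetical. The inline assembly uses GCC-style syntax in AT&T operand order and falls back to plain C on other targets.

```c
#include <stddef.h>

/* Use inline SSE2 assembly only where it is known to be available. */
#if defined(__x86_64__) && defined(__GNUC__)
#define HAVE_SSE2_ASM 1
#else
#define HAVE_SSE2_ASM 0
#endif

/* Variant A (assumed): a zero is kept in its own register and copied
 * into the accumulator with a register-to-register move when needed. */
double tail_sum_mov(const double *v, const int *reset, size_t n)
{
    double zero = 0.0;  /* the hoisted constant; the compiler picks the register */
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (reset[i]) {
#if HAVE_SSE2_ASM
            __asm__("movapd %1, %0" : "=x"(acc) : "x"(zero));
#else
            acc = zero;
#endif
        }
        acc += v[i];
    }
    return acc;
}

/* Variant B (assumed): the zero is rematerialized in place with the
 * xor idiom, so no extra register has to stay live across the loop. */
double tail_sum_xor(const double *v, const int *reset, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (reset[i]) {
#if HAVE_SSE2_ASM
            __asm__("xorpd %0, %0" : "=x"(acc));
#else
            acc = 0.0;
#endif
        }
        acc += v[i];
    }
    return acc;
}
```

Both functions return the sum of the elements from the last reset onward, so they can be swapped freely in a timing harness. Note that the compiler still decides where the zero for variant A lives, so inspecting the generated assembly is advisable before drawing conclusions from any measurement.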

Solution

Executive summary: You can run up to four xor ax, ax instructions per cycle as compared to the slower mov immediate, reg instructions.

Details and references:

Wikipedia has a nice overview of register renaming in general: http://en.wikipedia.org/wiki/Register_renaming

Torbjörn Granlund's timings for instruction latencies and throughput for AMD and Intel x86 processors are at: http://gmplib.org/~tege/x86-timing.pdf

Agner Fog nicely covers the specifics in his Micro-architecture study:

