Question

On modern Intel x86, are load uops freed from the RS (Reservation Station) at the point they dispatch, or when they complete, or somewhere in-between?

I am also interested in AMD Zen and sequels, so feel free to include that too, but for the purposes of making the question manageable I limit it to Intel. Also, AMD seems to have a somewhat different load pipeline from Intel which may make investigating this on AMD a separate task.

Dispatch here means leaving the RS for execution.

Complete here means when the load data returns and is ready to satisfy dependent uops.

Or even somewhere outside of the range of time defined by these two events, which seems unlikely but possible.

Recommended answer

The following experiments suggest that the uops are deallocated at some point before the load completes. While this is not a complete answer to your question, it might provide some interesting insights.

On Skylake, there is a 33-entry reservation station for loads (see https://stackoverflow.com/a/58575898/10461973). This should also be the case for the Coffee Lake i7-8700K, which is used for the following experiments.

We assume that R14 contains a valid memory address.

clflush [R14]
clflush [R14+512]
mfence

# start measuring cycles

mov RAX, [R14]
mov RAX, [R14]
...
mov RAX, [R14]

mov RBX, [R14+512]

# stop measuring cycles

mov RAX, [R14] is unrolled 35 times. A load from memory takes at least about 280 cycles on this system. If the load uops stayed in the 33-entry reservation station until completion, the last load could only start after more than 280 cycles and would need another ~280 cycles, for a total of at least about 560 cycles. However, the total measured time for this experiment is only about 340 cycles. This indicates that the load uops leave the RS at some time before completion.
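The arithmetic behind this argument can be written down as a rough sketch. The function name and the two-case model below are illustrative assumptions, not part of the measurement: the model ignores allocation width, the L1 hits of the later loads, and measurement overhead, and only uses the figures quoted above (35 unrolled loads, 33 RS entries, ~280 cycles per miss, ~340 cycles measured).

```python
def lower_bound_cycles(n_loads, rs_entries, miss_latency, held_until_completion):
    """Rough lower bound (in cycles) for n_loads independent loads to one
    flushed line, followed by a final load to a second flushed line.

    Hypothetical model: if RS entries are held until the load completes and
    the uops do not all fit in the RS, the final load can only dispatch after
    the first miss has returned, so the two misses are serialized.
    """
    total_uops = n_loads + 1  # the unrolled loads plus the load from [R14+512]
    if held_until_completion and total_uops > rs_entries:
        return 2 * miss_latency  # the two misses cannot overlap
    return miss_latency          # the two misses overlap almost fully


MISS_LATENCY = 280  # cycles, as stated above for this system
RS_ENTRIES = 33     # RS entries available to loads, as stated above
N_LOADS = 35        # unroll count of mov RAX, [R14]
MEASURED = 340      # cycles, measured total for the first experiment

held = lower_bound_cycles(N_LOADS, RS_ENTRIES, MISS_LATENCY, True)
freed = lower_bound_cycles(N_LOADS, RS_ENTRIES, MISS_LATENCY, False)
print(held, freed, MEASURED)
```

The measured ~340 cycles falls below the bound for the "held until completion" case and above the bound for the "freed earlier" case, which is the basis for the conclusion.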

In contrast, the following experiment shows a case where most uops are forced to stay in the reservation station until the first load completes:

mov RAX, R14
mov [RAX], RAX
clflush [R14]
clflush [R14+512]
mfence

# start measuring cycles

mov RAX, [RAX]
mov RAX, [RAX]
...
mov RAX, [RAX]

mov RBX, [R14+512]

# stop measuring cycles

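The first 35 loads now have dependencies on each other. The measured time for this experiment is about 600 cycles, roughly twice the ~280-cycle memory latency, which is consistent with the final load only being able to start after the first load has completed.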

The experiments were performed with all but one core disabled, and with the CPU governor set to performance (cpupower frequency-set --governor performance).

These are the nanoBench commands I used:

./nanoBench.sh -unroll 1 -basic -asm_init "clflush [R14]; clflush [R14+512]; mfence" -asm "mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RAX, [R14]; mov RBX, [R14+512]"

./nanoBench.sh -unroll 1 -basic -asm_init "mov RAX, R14; mov [RAX], RAX; clflush [R14]; clflush [R14+512]; mfence" -asm "mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RAX, [RAX]; mov RBX, [R14+512]"
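Since both commands repeat one instruction 35 times, it may be easier to generate them than to type them. The following is a minimal sketch; the helper name `build_nanobench_argv` is hypothetical, and it only uses the options that appear in the commands above (`-unroll`, `-basic`, `-asm_init`, `-asm`). It builds the argument list and prints it without running anything.

```python
import shlex


def build_nanobench_argv(body_insn, repeats, final_insn, init_insns):
    """Build the nanoBench.sh argument list for one experiment.

    body_insn is repeated `repeats` times, followed by final_insn;
    init_insns are joined into the -asm_init string.
    """
    asm = "; ".join([body_insn] * repeats + [final_insn])
    asm_init = "; ".join(init_insns)
    return ["./nanoBench.sh", "-unroll", "1", "-basic",
            "-asm_init", asm_init, "-asm", asm]


# First experiment: 35 independent loads from the same flushed line.
independent = build_nanobench_argv(
    "mov RAX, [R14]", 35, "mov RBX, [R14+512]",
    ["clflush [R14]", "clflush [R14+512]", "mfence"],
)

# Second experiment: 35 loads that each depend on the previous one.
dependent = build_nanobench_argv(
    "mov RAX, [RAX]", 35, "mov RBX, [R14+512]",
    ["mov RAX, R14", "mov [RAX], RAX",
     "clflush [R14]", "clflush [R14+512]", "mfence"],
)

if __name__ == "__main__":
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(independent))
    print(shlex.join(dependent))
```

Changing `repeats` makes it straightforward to vary the number of loads around the 33-entry limit and see where the measured time changes.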
