Question
Every modern high-performance x86/x86_64 CPU has a hierarchy of data caches: L1, L2, and sometimes L3 (and, in very rare cases, L4). Data loaded from or stored to main RAM is cached in some of them.
Sometimes the programmer may want some data not to be cached in some or all cache levels (for example, when memsetting 16 GB of RAM while keeping other data in the cache). There are non-temporal (NT) instructions for this, such as MOVNTDQA (https://stackoverflow.com/a/37092 http://lwn.net/Articles/255364/).
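As a rough illustration of the non-temporal approach, the sketch below fills a buffer with streaming stores in C. It is not taken from the linked posts: `memset_nt` is a hypothetical helper name, the buffer is assumed to be 16-byte aligned with a length that is a multiple of 16, and the code uses the SSE2 store MOVNTDQ (through `_mm_stream_si128`) rather than MOVNTDQA, which is a load.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_set1_epi8, _mm_stream_si128 (also pulls in _mm_sfence) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill dst with `value` using non-temporal stores.
   Assumes dst is 16-byte aligned and len is a multiple of 16. */
static void memset_nt(void *dst, int value, size_t len)
{
    assert(((uintptr_t)dst % 16) == 0 && (len % 16) == 0);

    __m128i fill = _mm_set1_epi8((char)value);  /* 16 copies of the byte */
    __m128i *p = (__m128i *)dst;

    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(&p[i], fill);          /* compiles to MOVNTDQ */

    _mm_sfence();  /* order the streaming stores before later stores */
}
```

The stores are only a hint to bypass the caches on write-back memory; they do not disable any cache level, which is what the rest of the question asks about.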
But is there a programmatic way (for some AMD or Intel CPU families such as P3, P4, Core, Core i*, ...) to completely but temporarily turn off some or all cache levels, changing how every memory access instruction uses the memory hierarchy, either globally or for some applications or regions of RAM? For example, turning off L1 only, or both L1 and L2? Or changing every memory access type to "uncached" (UC), perhaps with the CD and NW bits of CR0 (SDM vol. 3A, pages 423, 424, 425)? The SDM also describes the "Third-Level Cache Disable flag, bit 6 of the IA32_MISC_ENABLE MSR (Available only in processors based on Intel NetBurst microarchitecture)", which "Allows the L3 cache to be disabled and enabled, independently of the L1 and L2 caches."
I think such an action would help protect data from cache side-channel attacks and leaks such as stealing AES keys, covert cache channels, and Meltdown/Spectre, although disabling the caches would have an enormous performance cost.
PS: I remember such a program being posted many years ago on some technical news website, but I can't find it now. It was just a Windows exe that wrote some magic values into an MSR and made every Windows program run afterwards very slow. The caches stayed off until reboot or until the program was started with the "undo" option.
Answer
Intel's manual 3A, Section 11.5.3, provides an algorithm to globally disable the caches:
To disable the L1, L2, and L3 caches after they have been enabled and have received cache fills, perform the following steps:
1. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.)
2. Flush all caches using the WBINVD instruction.
3. Disable the MTRRs and set the default memory type to uncached or set all MTRRs for the uncached memory type (see the discussion of the TYPE field and the E flag in Section 11.11.2.1, "IA32_MTRR_DEF_TYPE MSR").
The caches must be flushed (step 2) after the CD flag is set to ensure system memory coherency. If the caches are not flushed, cache hits on reads will still occur and data will be read from valid cache lines.
The intent of the three separate steps listed above addresses three distinct requirements: (i) discontinue new data replacing existing data in the cache (ii) ensure data already in the cache are evicted to memory, (iii) ensure subsequent memory references observe UC memory type semantics. Different processor implementation of caching control hardware may allow some variation of software implementation of these three requirements. See note below.
NOTES Setting the CD flag in control register CR0 modifies the processor’s caching behaviour as indicated in Table 11-5, but setting the CD flag alone may not be sufficient across all processor families to force the effective memory type for all physical memory to be UC nor does it force strict memory ordering, due to hardware implementation variations across different processor families. To force the UC memory type and strict memory ordering on all of physical memory, it is sufficient to either program the MTRRs for all physical memory to be UC memory type or disable all MTRRs.
For the Pentium 4 and Intel Xeon processors, after the sequence of steps given above has been executed, the cache lines containing the code between the end of the WBINVD instruction and before the MTRRS have actually been disabled may be retained in the cache hierarchy. Here, to remove code from the cache completely, a second WBINVD instruction must be executed after the MTRRs have been disabled.
That's a long quote, but it boils down to this code:
;Runs at CPL 0 only. In 64-bit mode use rax for the CR0 moves.
IA32_MTRR_DEF_TYPE equ 0x2FF
;Step 1 - Enter no-fill mode
mov eax, cr0
or eax, 1<<30 ; Set bit CD
and eax, ~(1<<29) ; Clear bit NW
mov cr0, eax
;Step 2 - Write back and invalidate all the caches
wbinvd
;All memory accesses now go to/from memory, but UC memory ordering may not be enforced yet.
;For Atom processors, we are done, UC semantics are automatically enforced.
;Step 3 - Disable the MTRRs (E = 0, FE = 0) and set the default memory type to UC (0)
xor eax, eax
xor edx, edx
mov ecx, IA32_MTRR_DEF_TYPE ;MSR number is 2FFH
wrmsr
;P4 only, remove this code from the L1I
wbinvd
Most of this code is not executable from user mode: the moves to and from CR0, WBINVD, and WRMSR are privileged instructions.
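Since the privileged instructions cannot be tried from an ordinary program, the bit arithmetic of the Intel sequence above can at least be checked in isolation. The following C sketch only computes the register values; it does not touch CR0 or any MSR. The helper names `cr0_no_fill` and `mtrr_def_type_disabled` are made up for illustration, and the read-modify-write form of the second helper is a variation on the assembly, which simply writes zero.

```c
#include <stdint.h>

#define CR0_NW              (UINT64_C(1) << 29)  /* Not Write-through */
#define CR0_CD              (UINT64_C(1) << 30)  /* Cache Disable */

#define MTRR_DEF_TYPE_MASK  UINT64_C(0xFF)       /* default memory type, 0 = UC */
#define MTRR_DEF_FE         (UINT64_C(1) << 10)  /* fixed-range MTRRs enable */
#define MTRR_DEF_E          (UINT64_C(1) << 11)  /* MTRRs enable */

/* Step 1: value to load into CR0 for no-fill mode (CD = 1, NW = 0). */
static uint64_t cr0_no_fill(uint64_t cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;
}

/* Step 3: value to write to IA32_MTRR_DEF_TYPE (MSR 2FFH) so that the
   MTRRs are disabled and the default memory type is UC. */
static uint64_t mtrr_def_type_disabled(uint64_t def_type)
{
    return def_type & ~(MTRR_DEF_E | MTRR_DEF_FE | MTRR_DEF_TYPE_MASK);
}
```

For instance, a CR0 value of 0x80050033 becomes 0xC0050033, and an IA32_MTRR_DEF_TYPE value of 0xC06 (MTRRs enabled, default type WB) becomes 0, which matches the zero written by the assembly.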
AMD's manual 2 provides a similar algorithm in Section 7.6.2:
Cache Disable. Bit 30 of the CR0 register is the cache-disable bit, CR0.CD. Caching is enabled when CR0.CD is cleared to 0, and caching is disabled when CR0.CD is set to 1. When caching is disabled, reads and writes access main memory.
Software can disable the cache while the cache still holds valid data (or instructions). If a read or write hits the L1 data cache or the L2 cache when CR0.CD=1, the processor does the following:
1. Writes the cache line back if it is in the modified or owned state.
2. Invalidates the cache line.
3. Performs a non-cacheable main-memory access to read or write the data.
If an instruction fetch hits the L1 instruction cache when CR0.CD=1, some processor models may read the cached instructions rather than access main memory. When CR0.CD=1, the exact behavior of L2 and L3 caches is model-dependent, and may vary for different types of memory accesses.
The processor also responds to cache probes when CR0.CD=1. Probes that hit the cache cause the processor to perform Step 1. Step 2 (cache-line invalidation) is performed only if the probe is performed on behalf of a memory write or an exclusive read.
Writethrough Disable. Bit 29 of the CR0 register is the not writethrough disable bit, CR0.NW. In early x86 processors, CR0.NW is used to control cache writethrough behavior, and the combination of CR0.NW and CR0.CD determines the cache operating mode.
[...]
In implementations of the AMD64 architecture, CR0.NW is not used to qualify the cache operating mode established by CR0.CD.
This translates to the following code (very similar to Intel's):
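;Runs at CPL 0 only. In 64-bit mode use rax for the CR0 moves.
MTRRdefType equ 0x2FF
;Step 1 - Disable the caches
mov eax, cr0
or eax, 1<<30 ; Set bit CD
mov cr0, eax
;For some models we need to invalidate the L1I
wbinvd
;Step 2 - Disable speculative accesses (MTRRs off, default memory type UC)
xor eax, eax
xor edx, edx
mov ecx, MTRRdefType ;MSR number is 2FFH
wrmsr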
Caches can also be selectively disabled at:
- Page level, with the attribute bits PCD (Page Cache Disable) and PWT (Page Write-Through) [Only for Pentium Pro and Pentium II].
When both are clear, the relevant MTRR is used; if PCD is set, caching is disabled.
- Page level, with the PAT (Page Attribute Table) mechanism.
By filling the IA32_PAT MSR with caching types and using the bits PAT, PCD, PWT as a 3-bit index, it's possible to select one of the six caching types (UC-, UC, WC, WT, WP, WB).
- Using the MTRRs (fixed or variable).
By setting the caching type to UC for specific physical ranges (UC- is available only through the PAT).
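The PAT lookup described in the list can be sketched in C as plain arithmetic. The code only decodes values; it does not read the real IA32_PAT MSR, which requires RDMSR at CPL 0. The names `pat_index` and `pat_entry` are hypothetical, and `IA32_PAT_RESET` is the power-up value documented by Intel, used here as sample data.

```c
#include <assert.h>
#include <stdint.h>

/* Memory type encodings used in IA32_PAT entries. */
enum pat_type {
    PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7
};

/* IA32_PAT after reset: entries 0..3 are WB, WT, UC-, UC, then repeated. */
#define IA32_PAT_RESET UINT64_C(0x0007040600070406)

/* Build the 3-bit index from the page-table bits: PAT is bit 2, PCD bit 1, PWT bit 0. */
static unsigned pat_index(unsigned pat, unsigned pcd, unsigned pwt)
{
    return ((pat & 1u) << 2) | ((pcd & 1u) << 1) | (pwt & 1u);
}

/* Extract the memory type of entry `index` (one byte per entry, low 3 bits used). */
static unsigned pat_entry(uint64_t ia32_pat, unsigned index)
{
    assert(index < 8);
    return (unsigned)((ia32_pat >> (8 * index)) & 0x7);
}
```

With the reset value, a page with PCD = 1 and PWT = 1 selects entry 3 (UC), while PCD = 1 and PWT = 0 selects entry 2 (UC-). An operating system may reprogram IA32_PAT, so the mapping on a running system can differ.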
Of these options, only the page attributes can be exposed to user-mode programs (see for example this).