Question
It's a well-documented fact that L2 is non-inclusive with respect to L1D, meaning that L2 does not have to contain all the lines that L1D has.
Can an L1d miss (read or RFO) that also misses L2 fill the L1d line without filling the corresponding L2 line? Is there any explanation of this in the Intel manuals? Update: There is, in Intel Vol. 3, in the section about memory types.
Or rephrasing the question in another way: Does a lookup missing L2 always cause its line to be filled?
After some digging, I found the answer myself: it is a property of the Write-back memory type, not of a cache level.
Recommended answer
The answer depends on the cache inclusion policy of the outer caches. We can safely assume that read-allocate happens in any cache level unless otherwise specified (exclusive or victim cache).
On Intel, NT prefetch can bypass L2 (just filling L1d and a single way of L3, for example, on Intel CPUs with inclusive L3), but normal demand loads are fetched through L2 and do allocate in L2 as well as L1d. (So does SW prefetch other than prefetchnta.)
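To make the distinction between the prefetch hints concrete, here is a minimal C sketch. It assumes an x86 compiler that provides `_mm_prefetch` in `<xmmintrin.h>`; the function name `sum_with_prefetch`, the 64-byte line size, and the prefetch distance are illustrative assumptions, not values taken from any manual. The hints are only requests: which cache levels actually get filled depends on the microarchitecture and cannot be observed from this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64)
#include <xmmintrin.h> /* _mm_prefetch, _MM_HINT_NTA, _MM_HINT_T0 */
#define HAVE_X86_PREFETCH 1
#endif

enum { CACHE_LINE = 64 }; /* assumption: 64-byte cache lines */

/* Sum a buffer while software-prefetching a fixed distance ahead.
 * use_nta != 0 requests prefetchnta (may bypass L2 on some Intel CPUs);
 * use_nta == 0 requests prefetcht0 (an ordinary prefetch into all levels).
 * The loads of buf[i] themselves are normal demand loads. */
uint64_t sum_with_prefetch(const uint8_t *buf, size_t len, int use_nta)
{
    const size_t ahead = 8 * CACHE_LINE; /* illustrative distance, not tuned */
    uint64_t sum = 0;

    assert(buf != NULL || len == 0);
    (void)use_nta; /* unused when the intrinsics are not available */

    for (size_t i = 0; i < len; i++) {
#ifdef HAVE_X86_PREFETCH
        /* Issue one prefetch per (assumed) cache line. */
        if (i % CACHE_LINE == 0 && i + ahead < len) {
            const char *p = (const char *)(buf + i + ahead);
            if (use_nta)
                _mm_prefetch(p, _MM_HINT_NTA);
            else
                _mm_prefetch(p, _MM_HINT_T0);
        }
#endif
        sum += buf[i]; /* demand load */
    }
    return sum;
}
```

Both variants return the same sum; only the cache-fill behaviour requested from the hardware differs, which is why performance counters or timing are needed to see any effect.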
The above applies to most CPUs (NINE L2, i.e. non-inclusive non-exclusive). But some microarchitectures have an L2 that is exclusive of L1d, and there the answer is no: the line is allocated only in L1d at first and moves to L2 later, when it is evicted from L1d. AMD has experimented more with exclusive caches than Intel.
AMD has built some CPUs with exclusive and/or victim caches, e.g. Zen's per-CCX L3 is a victim cache for the L2 caches in that complex of 4 cores (https://en.wikichip.org/wiki/amd/microarchitectures/zen#Memory_Hierarchy, https://www.anandtech.com/show/11170/the-amd-zen-and-ryzen-7-review-a-deep-dive-on-1800x-1700x-and-1700/9). Skylake-X / Cascade Lake's non-inclusive L3 is also a victim cache for L2.
In those CPUs, reads don't allocate in L3, only L2 and L1d. (Or L1i for code fetches).
Barcelona (aka K10) has a shared L3, and an L1 and L2 that are exclusive of each other (source: David Kanter's excellent writeup). So on K10, yes, a line allocated in L1d will definitely not be allocated in L2. The line evicted from L1d to make room for the new line will typically be moved to L2, evicting an older line from L2.
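The difference between the two fill policies can be sketched with a toy model in C. Everything here is a simplifying assumption for illustration: tiny fully-associative caches, FIFO replacement, reads only, and hypothetical names such as `toy_hierarchy` and `toy_read`. It is not a description of how any real CPU implements its caches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { L1_WAYS = 2, L2_WAYS = 4, NO_LINE = -1 };
typedef enum { POLICY_NINE, POLICY_EXCLUSIVE } policy_t;

typedef struct {
    long l1[L1_WAYS]; /* line addresses, newest first; NO_LINE = empty */
    long l2[L2_WAYS];
    policy_t policy;
} toy_hierarchy;

static bool holds(const long *set, size_t ways, long line)
{
    for (size_t i = 0; i < ways; i++)
        if (set[i] == line) return true;
    return false;
}

/* Insert at the front; return the line pushed out of the last slot. */
static long insert_fifo(long *set, size_t ways, long line)
{
    long victim = set[ways - 1];
    for (size_t i = ways - 1; i > 0; i--) set[i] = set[i - 1];
    set[0] = line;
    return victim;
}

/* Remove a line and compact, so empty slots stay at the end. */
static void remove_line(long *set, size_t ways, long line)
{
    for (size_t i = 0; i < ways; i++) {
        if (set[i] != line) continue;
        for (size_t j = i; j + 1 < ways; j++) set[j] = set[j + 1];
        set[ways - 1] = NO_LINE;
        return;
    }
}

void toy_init(toy_hierarchy *h, policy_t policy)
{
    for (size_t i = 0; i < L1_WAYS; i++) h->l1[i] = NO_LINE;
    for (size_t i = 0; i < L2_WAYS; i++) h->l2[i] = NO_LINE;
    h->policy = policy;
}

bool toy_in_l1(const toy_hierarchy *h, long line) { return holds(h->l1, L1_WAYS, line); }
bool toy_in_l2(const toy_hierarchy *h, long line) { return holds(h->l2, L2_WAYS, line); }

/* Model a demand read of one cache line. */
void toy_read(toy_hierarchy *h, long line)
{
    assert(line >= 0);
    if (toy_in_l1(h, line)) return; /* L1d hit: nothing moves */

    bool l2_hit = toy_in_l2(h, line);
    if (h->policy == POLICY_EXCLUSIVE) {
        /* Exclusive: the line lives in L1d only; the L1d victim moves to L2. */
        if (l2_hit) remove_line(h->l2, L2_WAYS, line);
        long victim = insert_fifo(h->l1, L1_WAYS, line);
        if (victim != NO_LINE) insert_fifo(h->l2, L2_WAYS, victim);
    } else {
        /* NINE: an L2 miss allocates in L2 as well as L1d;
         * the (clean) L1d victim is simply dropped. */
        if (!l2_hit) insert_fifo(h->l2, L2_WAYS, line);
        insert_fifo(h->l1, L1_WAYS, line);
    }
}
```

For example, after `toy_init(&h, POLICY_EXCLUSIVE)` and `toy_read(&h, 10)`, line 10 is in the model's L1d but not its L2; with `POLICY_NINE` the same read leaves it in both.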
K8 had the same L2 exclusive of L1d, but no shared L3.
Related: Which cache mapping technique is used in Intel Core i7 processor?
Intel's vol.3 manual only gives abstract guarantees that are future-proof. It only guarantees that the line will be cached somewhere in the cache hierarchy.
Any sane design will fill the line into L1d, in anticipation of other reads of the same line (immediate spatial locality is very common). But depending on the design, the line doesn't have to be allocated in L2 or even L3 right away; i.e. the guarantee doesn't mean all levels.
x86 doesn't guarantee anything on paper about having more than one level of cache. (Or even that there is a cache, except for the parts of the ISA docs about cache-as-RAM mode and stuff like that.) The docs are written assuming a CPU with at least 2 levels because that's been the case since P6 (and P5 with motherboards that provided an L2 cache), but anything like clflush should be read as "assuming there is a cache".
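As a small illustration of that last point, the C sketch below flushes a line with `_mm_clflush` and then reloads it. It assumes an x86 compiler providing `<emmintrin.h>`; the helper name `flush_then_reload` is hypothetical. The program-visible value is the same whether or not a cache exists; clflush only affects where (and how quickly) the data is found.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64)
#include <emmintrin.h> /* _mm_clflush, _mm_mfence (SSE2) */
#define HAVE_CLFLUSH 1
#endif

/* Flush the cache line containing *p from every cache level (if any),
 * then reload the value with an ordinary demand load. */
int flush_then_reload(int *p)
{
    assert(p != NULL);
#ifdef HAVE_CLFLUSH
    _mm_clflush(p); /* writes the line back if dirty, then invalidates it */
    _mm_mfence();   /* order the flush before the reload */
#endif
    return *(volatile int *)p; /* misses all caches on typical hardware */
}
```

Calling `flush_then_reload(&x)` always returns the current value of `x`; the only observable difference from a plain load is latency.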