Question
According to the "Intel 64 and IA-32 Architectures Optimization Reference Manual," April 2012, page 2-23
My computer is a 2-core Sandy Bridge with a 3 MB, 12-way set-associative LLC. That does not seem to be consistent with Intel's documentation, though. According to the data, it seems that I should have 24 ways. I can imagine there is something going on with the number of cores/cache slices, but I can't quite figure it out. If I have 2 cores and hence 2 cache slices of 1.5 MB each, I would have 12 ways per cache slice according to Intel, and that does not seem consistent with my CPU specs. Can someone clarify this for me?
If I wanted to evict an entire cache line, would I need to access the cache in strides of 128 KB or 256 KB? In fact, this is what I am trying to achieve.
Any suggested readings are very welcome.
Recommended answer
Associativity is orthogonal to the number of slices and to the mapping done by the hash function. If a given address is mapped to some cache slice (and a given set within it), it can only compete for ways with other lines that were mapped to the same place. Having 2 slices does not raise associativity; it only reduces contention, since lines end up distributed evenly over more sets.
Therefore, you have 12 ways per slice, but the overall associativity per set is still 12 ways.
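To relate this to the 128 KB vs. 256 KB stride in the question, here is a rough back-of-the-envelope sketch. It assumes 64-byte cache lines and that the sets are split evenly between the slices; neither is stated above, and the helper `cache_geometry` is just an illustrative name. It also ignores the slice hash entirely, so the stride only tells you which addresses share a set index, not which slice they land in.

```python
def cache_geometry(size_bytes, ways, line_bytes=64, slices=1):
    """Return (sets_per_slice, same_set_stride_bytes).

    Assumes 64-byte lines by default and an even split of sets
    across slices (both are assumptions, not taken from the manual).
    """
    lines_total = size_bytes // line_bytes   # number of cache lines
    sets_total = lines_total // ways         # each set holds `ways` lines
    sets_per_slice = sets_total // slices
    # Addresses this far apart share the same set index bits.
    stride = sets_per_slice * line_bytes
    return sets_per_slice, stride


llc_bytes = 3 * 1024 * 1024  # 3 MB LLC from the question

# Treated as one unsliced 12-way cache: 4096 sets, 256 KB stride.
print(cache_geometry(llc_bytes, ways=12, slices=1))  # (4096, 262144)

# Treated as two 1.5 MB, 12-way slices: 2048 sets each, 128 KB stride.
print(cache_geometry(llc_bytes, ways=12, slices=2))  # (2048, 131072)
```

Either way the associativity stays at 12; only the number of sets per slice, and therefore the stride, changes.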
If you were to test your associativity by accessing different lines mapped to the same set, you would just have a harder time picking such lines (you'd need to know the hash function), but you would still get thrashing once you go beyond 12 lines.
However, if you were to ignore the hashing and assume lines are simply mapped by their set bits, it could appear as if you have higher associativity, simply because the lines would divide uniformly between the slices, so thrashing would take longer to set in. This isn't real associativity, but it comes close for some practical purposes. It would only work if you're using a wide physical memory range, though, since the upper bits need to change for the hashing to have any impact.
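A toy simulation can illustrate the difference between the two cases. This is only a sketch under loud assumptions: true LRU replacement (the real LLC policy may differ) and a made-up "hash" that spreads lines round-robin over the slices (`line % slices`), which is not Intel's actual hash function. The name `miss_rate` is hypothetical.

```python
from collections import OrderedDict


def miss_rate(num_lines, ways=12, slices=1, rounds=50):
    """Cycle over `num_lines` distinct lines that share one set index.

    Assumptions: true LRU per set, and a hypothetical hash that sends
    line i to slice i % slices. Round 0 is a warm-up and is not counted.
    """
    sets = [OrderedDict() for _ in range(slices)]  # one set per slice
    misses = 0
    for r in range(rounds + 1):
        for line in range(num_lines):
            lru = sets[line % slices]
            if line in lru:
                lru.move_to_end(line)        # hit: mark most recently used
            else:
                if r > 0:
                    misses += 1              # count misses after warm-up only
                lru[line] = True
                if len(lru) > ways:
                    lru.popitem(last=False)  # evict least recently used
    return misses / (rounds * num_lines)


print(miss_rate(12, slices=1))  # 0.0: 12 lines fit in one 12-way set
print(miss_rate(13, slices=1))  # 1.0: 13 lines thrash a single set
print(miss_rate(24, slices=2))  # 0.0: looks like 24 ways, but is 2 x 12
print(miss_rate(26, slices=2))  # 1.0: both sets now hold 13 lines each
```

With the lines spread over two slices, thrashing only starts past 24 lines, which is the "apparent" higher associativity described above; each individual set still gives up after 12.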