This article looks at how multiple threads interact with CPU caches. We hope it serves as a useful reference for anyone facing the same problem.

Problem description

I am implementing an image filtering operation in C using multiple threads and trying to make it as optimized as possible. I have one question, though: if a memory location is accessed by thread-0 and, concurrently, the same location is accessed by thread-1, will thread-1 get it from the cache? This question stems from the possibility that the two threads could be running on two different CPU cores. So, to put it another way: do all the cores share the same common cache?

Suppose I have a memory layout like the following:

int output[100];

Assume there are 2 CPU cores, so I spawn two threads to work concurrently. One scheme could be to divide the memory into two chunks, 0-49 and 50-99, and let each thread work on one chunk. Another could be to let thread-0 work on the even indices (0, 2, 4, ...) while the other thread works on the odd indices (1, 3, 5, ...). The latter technique is easier to implement (especially for 3D data), but I am not sure whether I could use the cache efficiently this way.
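
The two schemes from the question can be sketched in C as per-thread loops. This is a minimal illustration under assumptions, not the asker's code: `block_range`, `work_blocked`, `work_interleaved` and `filter_element` are hypothetical names, and `filter_element` merely stands in for the real per-pixel filter. Each `work_*` function is the body one thread would run for its own `tid`.

```c
#include <assert.h>
#include <stddef.h>

/* Scheme 1 (blocked): thread `tid` of `nthreads` owns one contiguous
 * half-open range [*begin, *end). With n = 100 and 2 threads this gives
 * 0-49 and 50-99. Any remainder n % nthreads goes to the first threads,
 * so chunk sizes differ by at most one element. */
static void block_range(size_t n, size_t nthreads, size_t tid,
                        size_t *begin, size_t *end)
{
    assert(nthreads > 0 && tid < nthreads);
    size_t base = n / nthreads, rem = n % nthreads;
    *begin = tid * base + (tid < rem ? tid : rem);
    *end = *begin + base + (tid < rem ? 1 : 0);
}

/* Hypothetical per-element work standing in for the real filter. */
static int filter_element(int i) { return 2 * i; }

/* Thread body for scheme 1: walk this thread's own contiguous chunk. */
static void work_blocked(int *out, size_t n, size_t nthreads, size_t tid)
{
    size_t begin, end;
    block_range(n, nthreads, tid, &begin, &end);
    for (size_t i = begin; i < end; i++)
        out[i] = filter_element((int)i);
}

/* Thread body for scheme 2 (interleaved): thread `tid` handles
 * tid, tid + nthreads, tid + 2*nthreads, ... (0 2 4 ... / 1 3 5 ...). */
static void work_interleaved(int *out, size_t n, size_t nthreads, size_t tid)
{
    assert(nthreads > 0 && tid < nthreads);
    for (size_t i = tid; i < n; i += nthreads)
        out[i] = filter_element((int)i);
}
```

With n = 100 and two threads, `block_range` yields [0, 50) and [50, 100), i.e. 0-49 and 50-99, while `work_interleaved` gives thread 0 the even indices and thread 1 the odd ones. Both cover every index exactly once; they differ only in how the writes are laid out in memory.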

Recommended answer

In general it is a bad idea to have threads share interleaved memory regions, for example one thread processing indices 0, 2, 4, ... and the other processing 1, 3, 5, .... Although some architectures may handle this reasonably well, most will not, and you probably cannot control which machines your code will run on. Also, the OS is free to assign your threads to any cores it likes (a single core, two cores on the same physical processor, or two cores on separate processors). Finally, each core usually has its own first-level (L1) cache, even when the cores are on the same physical processor, so the cores do not all share one common cache.

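In most situations the 0,2,4.../1,3,5... scheme will degrade performance severely, possibly to the point of being slower than a single CPU. The reason is false sharing: neighbouring elements such as output[0] and output[1] sit in the same cache line (commonly 64 bytes), so every write by one core invalidates that line in the other core's cache and the line keeps bouncing between them. Herb Sutter's article "Eliminate False Sharing" demonstrates this very well.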
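
To see why interleaving hurts, it helps to count the cache lines that both threads write into. The sketch below assumes 64-byte cache lines and an `output` array that starts on a cache-line boundary (both are assumptions; line size varies between CPUs), and uses hypothetical helpers `line_of`, `owner_blocked`, `owner_interleaved` and `shared_lines`. It does not measure timing; it only shows which lines would be contended.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTIONS: 64-byte cache lines and an output[] array that starts on a
 * cache-line boundary. Real line sizes vary between CPUs. */
#define CACHE_LINE 64
#define OUTPUT_LEN 100

/* Index of the cache line that holds output[i] under those assumptions. */
static size_t line_of(size_t i)
{
    return i * sizeof(int) / CACHE_LINE;
}

/* Which of the two threads writes output[i] under each scheme. */
static int owner_blocked(size_t i)     { return i < OUTPUT_LEN / 2 ? 0 : 1; } /* 0-49 | 50-99 */
static int owner_interleaved(size_t i) { return (int)(i % 2); }               /* even | odd   */

/* Count the cache lines written by BOTH threads. Each such line has to
 * move back and forth between the two cores' caches (false sharing). */
static size_t shared_lines(int (*owner)(size_t))
{
    /* Bit t of written[l] is set once thread t writes into line l.
     * OUTPUT_LEN entries is a safe upper bound on the number of lines. */
    unsigned char written[OUTPUT_LEN] = {0};
    size_t count = 0;

    for (size_t i = 0; i < OUTPUT_LEN; i++) {
        int t = owner(i);
        assert(t == 0 || t == 1);
        written[line_of(i)] |= (unsigned char)(1u << t);
    }
    for (size_t l = 0; l < OUTPUT_LEN; l++)
        if (written[l] == 3)   /* both bits set: both threads wrote here */
            count++;
    return count;
}
```

With 4-byte ints (16 per line), `shared_lines(owner_blocked)` returns 1, because only the line holding output[48..63] is written by both threads, whereas `shared_lines(owner_interleaved)` returns 7: every line of the array is contended.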

Using the scheme [0...n/2-1] and [n/2...n-1] will scale much better on most systems. It may even lead to superlinear speedup, because the combined cache capacity of all the cores can be put to use. The number of threads should always be configurable and should default to the number of processor cores detected.
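
A minimal sketch of the recommended blocked scheme with POSIX threads follows. It assumes a 1-D signal and uses a 3-tap box filter purely as a stand-in for the real image filter; `struct chunk`, `filter_chunk` and `filter_parallel` are hypothetical names. Passing `nthreads <= 0` selects the number of online cores via `sysconf(_SC_NPROCESSORS_ONLN)`, which is widely available on Linux and macOS but is an extension rather than strict POSIX.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Arguments for one worker: the shared arrays plus this thread's chunk. */
struct chunk {
    const int *in;
    int *out;
    size_t n, begin, end;
};

/* Hypothetical stand-in for the real filter: a 3-tap box filter that
 * clamps at the borders. A thread may READ one element past its chunk
 * but WRITES only out[begin..end). */
static void *filter_chunk(void *arg)
{
    const struct chunk *c = arg;
    for (size_t i = c->begin; i < c->end; i++) {
        int left  = c->in[i > 0 ? i - 1 : i];
        int right = c->in[i + 1 < c->n ? i + 1 : i];
        c->out[i] = (left + c->in[i] + right) / 3;
    }
    return NULL;
}

/* Filter in[] into out[] using `nthreads` threads, one contiguous chunk
 * each; nthreads <= 0 means "one thread per online core".
 * Returns 0 on success, -1 on allocation or thread-creation failure. */
static int filter_parallel(const int *in, int *out, size_t n, long nthreads)
{
    assert(in != out);                   /* in-place filtering would race */
    if (n == 0)
        return 0;
    if (nthreads <= 0)
        nthreads = sysconf(_SC_NPROCESSORS_ONLN);
    if (nthreads <= 0)
        nthreads = 1;                    /* sysconf failed: fall back */
    if ((size_t)nthreads > n)
        nthreads = (long)n;              /* avoid empty chunks */

    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    struct chunk *ch = malloc((size_t)nthreads * sizeof *ch);
    if (!tid || !ch) { free(tid); free(ch); return -1; }

    size_t base = n / (size_t)nthreads, rem = n % (size_t)nthreads, start = 0;
    long started = 0;
    int rc = 0;
    for (long t = 0; t < nthreads; t++) {
        size_t len = base + ((size_t)t < rem ? 1 : 0);
        ch[t] = (struct chunk){ in, out, n, start, start + len };
        start += len;
        if (pthread_create(&tid[t], NULL, filter_chunk, &ch[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (long t = 0; t < started; t++)   /* wait for every started worker */
        pthread_join(tid[t], NULL);
    free(tid);
    free(ch);
    return rc;
}
```

For example, `filter_parallel(in, out, 100, 2)` has one thread write out[0..49] and another write out[50..99]; passing 0 uses one thread per online core. Because each thread reads slightly past its chunk but writes only inside it, `in` and `out` must be different arrays. For 3D data the same split can be applied along the outermost dimension (for example whole slices), which keeps each thread's writes contiguous in a row-major array.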

08-28 23:10