问题描述
一个矩阵 A [i] [j]的
中给出。如果我们想要添加矩阵,哪一种方法更好的元素,为什么?
A matrix A[i][j]
is given. If we want to add the elements of the matrix, which method is better and why?
- 智慧列
- 行明智
从我的角度来看,排明智的更好,因为阵列重新presentation元素存储在连续的内存位置,以便访问它们花费较少time.But因为在RAM中读取每个位置需要相同的时间,有什么关系?
From my point of view, row wise is better because in array representation elements are stored in contiguous memory locations so accessing them takes less time.But since in RAM fetching every location takes equal time, does it matter?
推荐答案
的局部性优势
Take advantage of Spatial Locality
在C,矩阵存储在r中。所以,如果你访问的元素 A [I] [J]
,以元素的访问 A [I] [J + 1]
将有可能击中高速缓存。将执行主内存中没有访问。缓存比主内存快,所以访问模式很重要。
In C, matrixes are stored in row-major order. So if you access element a[i][j]
, an access to element a[i][j+1]
will be likely to hit the cache. No access to main memory will be performed. Caches are faster than main memory, so the access pattern does matter.
当然,更多的因素必须考虑到户头,如写访问/读访问,写策略(直写,写回/写分配,无写分配),多级缓存,等等。但是,这似乎是这个问题矫枉过正。
Of course, more factors must be taken into accout, such as write access / read access, write policy (write through, write back / write allocate , no write allocate), multilevel caches, and so on. But that seems an overkill for this question.
有一些有趣的一个分析工具,如,看看它自己。
Have some fun with a profiling tool, such as cachegrind, and see it for yourself.
例如,考虑一个虚拟程序accesing 4MB矩阵。退房的命中率之间的差异上的每个访问模式。
For example, consider a dummy program accesing 4MB matrices. Check out the differences between the miss rates on each access pattern.
列访问
$ cat col_major.c
#include <stdio.h>
int main(){
size_t i,j;
const size_t dim = 1024 ;
int matrix [dim][dim];
for (i=0;i< dim; i++){
for (j=0;j <dim;j++){
matrix[j][i]= i*j;
}
}
return 0;
}
$ valgrind --tool=cachegrind ./col_major
==3228== D refs: 10,548,665 (9,482,149 rd + 1,066,516 wr)
==3228== D1 misses: 1,049,704 ( 968 rd + 1,048,736 wr)
==3228== L2d misses: 1,049,623 ( 893 rd + 1,048,730 wr)
==3228== D1 miss rate: 9.9% ( 0.0% + 98.3% )
==3228== L2d miss rate: 9.9% ( 0.0% + 98.3% )
一行访问
$ cat row_major.c
#include <stdio.h>
int main(){
size_t i,j;
const size_t dim = 1024 ;
int matrix [dim][dim];
for (i=0;i< dim; i++)
for (j=0;j <dim;j++)
matrix[i][j]= i*j;
return 0;
}
$ valgrind --tool=cachegrind ./row_major
==3524== D refs: 10,548,665 (9,482,149 rd + 1,066,516 wr)
==3524== D1 misses: 66,664 ( 968 rd + 65,696 wr)
==3524== L2d misses: 66,583 ( 893 rd + 65,690 wr)
==3524== D1 miss rate: 0.6% ( 0.0% + 6.1% )
==3524== L2d miss rate: 0.6% ( 0.0% + 6.1% )
这篇关于访问矩阵的元素在行与列明智的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!