#1, posted 2012-06-20 12:38

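For example, if I declare a 1 MB buffer inside a function, does it affect the CPU cache hit rate?

void test(char* data)
{
    char buf[1024][1024];
    //then do something with 'data' and 'buf'
}

If buf is only declared and barely used, does it still affect the CPU cache?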

oooooxxxxx

#23, posted 2012-06-23 12:12

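If you think you need to worry about the cache, then don't write it that way. Allocating a large block on the stack has several problems. First, there are the possible cache issues. Second, the OS's virtual memory page prediction may fail. Third, you can overflow the stack by accident, because it is hard to guarantee how big the stack really is or how deep your call chain goes. A heap allocation can fail too, but then you can choose to clean up and exit, or take some other action; when a stack allocation fails, you don't even know what killed you.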
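
A minimal sketch of the heap-based alternative described above, assuming the same 1024 x 1024 buffer as in post #1. The name test_heap, the BUF_ROWS/BUF_COLS constants, and the copy into the first row are hypothetical and only stand in for "do something with 'data' and 'buf'":

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { BUF_ROWS = 1024, BUF_COLS = 1024 };

/* Hypothetical heap-based variant of test() from post #1.
 * Returns 0 on success, -1 if the 1 MB allocation fails. */
int test_heap(const char *data)
{
    /* Pointer to rows of BUF_COLS chars, so buf[i][j] still works. */
    char (*buf)[BUF_COLS] = malloc(sizeof(char[BUF_ROWS][BUF_COLS]));
    if (buf == NULL) {
        /* Unlike a stack overflow, this failure is visible and recoverable. */
        fprintf(stderr, "test_heap: could not allocate buffer\n");
        return -1;
    }

    /* Placeholder work: copy the input into the first row. */
    strncpy(buf[0], data, BUF_COLS - 1);
    buf[0][BUF_COLS - 1] = '\0';

    free(buf);
    return 0;
}
```

The caller can check the return value and decide whether to clean up and exit or to retry with a smaller buffer, which is the choice a failed stack allocation does not offer.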

starwing83

#22, posted 2012-06-21 00:23

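Reply to #7 塑料袋

No, that sentence conflates two concepts. The "one-dimensional array" in the "cache optimization" claim is the general notion, i.e. a vector of scalars, whereas the one-dimensional array you used later in your proof is a C array, whose elements are allowed to be arrays themselves.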

塑料袋

#21, posted 2012-06-21 00:04

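I don't know how x86 prefetching works either, but it is certainly far more complex than on MIPS, ARM and the like.

Prefetching splits into two kinds: the CPU prefetching data from the cache, and the cache prefetching data from memory.
For the CPU prefetching from the cache, judging by the MIPS design documents, 1 dimension and 100 dimensions make no difference.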

wwwsq

#20, posted 2012-06-20 22:31

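Last edited by wwwsq on 2012-06-20 22:39

塑料袋 wrote on 2012-06-20 21:10:
Prefetching is far more complicated than you imagine; there are too many things to consider.

Simply put, there are two broad categories:

"Sequential reads, interleaved reads": is that reliable? Don't kid me...

Also, does the Intel Sandy Bridge Xeon CPU have complex read-prediction logic?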

塑料袋

#19, posted 2012-06-20 21:10

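wwwsq wrote on 2012-06-20 18:21:
A one-dimensional array, if it is read sequentially, is cache friendly; the CPU will prefetch it.

For example:

Prefetching is far more complicated than you imagine; there are too many things to consider.

Simply put, there are two broad categories:

1) The CPU has no complex read-prediction logic. Because of the pipeline, there is a window between the moment a load instruction is decoded and the moment the load should actually execute, and the prefetch happens in that window. The decoded load instruction is itself the hint for the prefetch, since it states which address to fetch. In this case even an 800-dimensional array gets prefetched, and this is the situation on the vast majority of RISC CPUs.

2) The CPU has very complex read-prediction logic. I don't know much about this case. I have heard that some CISC CPUs even do "pre-writes", which sounds outrageous; I don't know whether it is true. But if the prediction logic is that complex, it cannot support only simple sequential reads. At the very least it should handle what the kernel's disk-to-memory readahead handles: sequential reads, interleaved reads, ...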

wwwsq

#18, posted 2012-06-20 20:20

Last edited by wwwsq on 2012-06-20 20:20

folklore wrote on 2012-06-20 18:35:
Reply to #6 wwwsq

Got it. Thanks~~

folklore

#17, posted 2012-06-20 18:37

Reply to #15 wwwsq

It depends on the cache line size.

In many cases what you think is right, if I must answer "right" or "wrong" about it.

folklore

#16, posted 2012-06-20 18:35

Reply to #6 wwwsq

To the CPU, there is no difference between a 2-dimensional and a 1-dimensional array.

In many cases, the cache replacement policy is based on the actual memory address.
My English is poor, so I will use the following figure to describe what I mean.

Of course this is just an "as is" model (I use the model that is easiest to understand); not all CPUs behave like this.

The CPU cache is split into 128-byte blocks. The cache capacity is 128K, so there are 1K cache lines available:

+------+ cache line 0: 0~127
+------+ cache line 1: 128~255
+------+ cache line 2: 256~...

The cache replacement policy simply uses the following mapping function:
aa = actual address
ca = cache line address

ca = aa & ((1 << (10 + 7)) - 1) ; aa modulo 128K, the cache size
If an access misses the cache, the CPU reads the one cache line around the accessed memory.

Then, if memory address 0 is accessed, e.g. via the code int a=*(int*)0;
and the CPU finds that this memory is not in the cache, it loads ((char*)0)[0...127] into cache line 0.
The program may then access addresses near 0, such as 3, 4, 10, ...; all of these will be found already in
the cache, so the program is sped up by caching.

So if no memory access happens, nothing happens to the cache.
But if you access the memory at all, a cache line may be replaced (if the access misses the cache).

Don't worry about whether the object you access is a 2- or 1-dimensional array; the caching policy cannot tell what it is.
It is based only on the memory address (if no more complex caching policy is applied).
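
The toy model in this post can be sketched as a small simulation. This is only an illustration under the post's stated assumptions (128-byte lines, 128K capacity, one possible line per address); the struct and function names are hypothetical, and real CPUs typically use set-associative caches with hardware prefetchers on top:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { LINE_BITS = 7, INDEX_BITS = 10 };   /* 128-byte lines, 1K lines = 128K */
enum { NUM_LINES = 1 << INDEX_BITS };

/* Hypothetical direct-mapped cache that only tracks which block each line holds. */
struct toy_cache {
    uintptr_t block[NUM_LINES];   /* 128-byte block number stored in each line */
    int valid[NUM_LINES];         /* 0 until the line is first filled */
    unsigned long hits, misses;
};

static void toy_cache_init(struct toy_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Simulate one access to actual address aa. */
static void toy_cache_access(struct toy_cache *c, uintptr_t aa)
{
    uintptr_t block = aa >> LINE_BITS;                  /* which 128-byte block */
    size_t line = (size_t)(block & (NUM_LINES - 1));    /* line chosen by address only */

    if (c->valid[line] && c->block[line] == block) {
        c->hits++;
    } else {
        c->misses++;                                    /* load the whole line */
        c->valid[line] = 1;
        c->block[line] = block;
    }
}

/* Access count addresses starting at base, stride bytes apart. */
static void toy_cache_walk(struct toy_cache *c, uintptr_t base,
                           size_t count, size_t stride)
{
    for (size_t i = 0; i < count; i++)
        toy_cache_access(c, base + i * stride);
}
```

Calling toy_cache_walk(&c, 0, 10000, 1) models the sequential loop over a 10000-byte array, and toy_cache_walk(&c, 0, 100, 1000) models reading one byte from each 1000-byte row. The model never sees array dimensions, only addresses, and a buffer that is never accessed leaves both counters at zero.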

wwwsq

#15, posted 2012-06-20 18:21

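塑料袋 wrote on 2012-06-20 16:53:
Shouldn't we look at the question this way: our conclusion is that the claim "a 1-dimensional array is CPU cache friendly" does not hold

A one-dimensional array, if it is read sequentially, is cache friendly; the CPU will prefetch it.

For example:
    char ay[10000];
    for (i = 0; i < 10000; i++)
    {
        sum += ay[i];
    }
The CPU will prefetch ay, which raises the CPU cache hit rate.

A two-dimensional array is usually accessed in jumps, so it cannot be prefetched. Does that mean the CPU cache hit rate for two-dimensional array accesses is very low?
    char aa[100][1000];
    for (i = 0; i < 100; i++)
    {
        sum += aa[i][0];  /* one element from each row */
    }
Since aa[0] and aa[1] are not contiguous, the CPU's prefetch mechanism cannot help.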
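
For reference when reading the two loops above: C stores a two-dimensional array in row-major order, so the rows themselves sit back to back in memory and only the accessed elements are far apart. A small sketch, with hypothetical function names and the same 100 x 1000 shape, that measures the distance between consecutive rows and contrasts the two access patterns:

```c
#include <stddef.h>
#include <stdint.h>

enum { AA_ROWS = 100, AA_COLS = 1000 };

/* Byte distance from aa[0][0] to aa[1][0]; equals sizeof(aa[0]). */
size_t row_stride_bytes(char aa[AA_ROWS][AA_COLS])
{
    return (size_t)((uintptr_t)&aa[1][0] - (uintptr_t)&aa[0][0]);
}

/* Visits every element in address order: the same pattern as a 1-D loop. */
long sum_row_major(char aa[AA_ROWS][AA_COLS])
{
    long sum = 0;
    for (size_t i = 0; i < AA_ROWS; i++)
        for (size_t j = 0; j < AA_COLS; j++)
            sum += aa[i][j];
    return sum;
}

/* Visits one element per row: consecutive accesses are AA_COLS bytes apart. */
long sum_first_of_each_row(char aa[AA_ROWS][AA_COLS])
{
    long sum = 0;
    for (size_t i = 0; i < AA_ROWS; i++)
        sum += aa[i][0];
    return sum;
}
```

Whether the second pattern is prefetched well depends on the CPU, which is the point debated in the replies; the sketch only shows that the difference lies in the stride of the accesses, not in the number of dimensions.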

Chinaunix forum, C/C++ board: Does allocating a large block of memory on the stack affect performance?