Chinaunix

Title: Memory barrier question

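Author: yshwuxian Time: 2016-06-17 12:45
Title: Memory barrier question
On Intel CPUs, why can store-load reordering only be prevented with mfence, and not with sfence + lfence? What is the difference between the two?
My own understanding is:
store
sfence
lfence
load
In this sequence the CPU may reorder the lfence ahead of the store. Is that understanding correct?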

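Author: yshwuxian Time: 2016-06-17 13:01
A related question: I have a program with very demanding performance requirements. Two threads share a global data table (linked list, array), and access must be synchronized (both threads read, write, insert, delete, update, and search).
However, thread 1 accesses the table very frequently and thread 2 very rarely, so the probability of actually having to wait is tiny; most of the time thread 1 is taking the lock for nothing.
Is there a way to avoid this locking overhead?
I am currently using a spinlock, and the cost of taking it so often is still unacceptable.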

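Author: codechurch Time: 2016-06-17 13:18
For a release operation:
sfence
store A
the fence is there to keep stores before the sfence from moving past store A. If one did, the release would be incorrect.
However, as long as compiler reordering is prevented (with a compiler barrier), x86 does not perform this kind of reordering, so the fence is unnecessary.

For an acquire operation:
load A
lfence
the fence is there to keep later loads from being reordered before load A. If one did, that would also be incorrect.
Again, as long as compiler reordering is prevented (with a compiler barrier), x86 does not perform this kind of reordering, so the fence is unnecessary.

The only reordering x86 performs is
store A
load B
which the CPU may execute as
load B
store A
But this swap does not break release or acquire semantics.
So most of the time you do not need to care about it at all. (You only need to prevent compiler reordering!!!)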
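
The release/acquire pairing described above can be sketched with C++11 atomics. This is only an illustration under assumptions: the names `payload`, `ready`, `publish`, `consume`, and `message_passing_demo` are hypothetical and do not come from the thread. On x86-64, compilers typically turn the release store and acquire load below into ordinary mov instructions, so the ordering request mostly constrains the compiler, which matches the point made in the post.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
static int payload = 0;                  // plain data published by the writer
static std::atomic<bool> ready{false};   // flag that guards the payload

// Writer: fill in the data, then publish it with a release store.
// The release ordering keeps the payload write from sinking below the flag.
void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Reader: wait for the flag with an acquire load, then read the data.
// The acquire ordering keeps the payload read from hoisting above the flag.
int consume() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one reader and one writer and returns what the reader observed.
int message_passing_demo(int value) {
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread reader([&seen] { seen = consume(); });
    std::thread writer([value] { publish(value); });
    writer.join();
    reader.join();
    return seen;
}
```

Neither side needs a StoreLoad barrier here, which is why the store/load swap mentioned in the post is harmless for this pattern.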

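One more point to be aware of:
x86 has no "inter-core cache coherence protocol", so when synchronizing you must use instructions that force a cache flush, such as xchg or a memory-operation instruction with the lock prefix.
x64 does have an inter-core cache coherence protocol, so you do not even need to worry about that. As long as a write/read is naturally aligned, the operation is atomic, and the change is propagated atomically to the other cores.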

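Author: codechurch Time: 2016-06-17 13:27

yshwuxian posted on 2016-06-17 13:01
A related question: I have a program with very demanding performance requirements. Two threads share a global data table (linked list, array), and access must be ...

A pthread mutex will do. It already has a "yielding" optimization: when no waiting is needed (the resource is acquired immediately), taking the lock degenerates into a single memory operation.
That is already very efficient; there is no need to optimize further, and there is basically no room to.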

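Author: yshwuxian Time: 2016-06-19 17:25

codechurch posted on 2016-06-17 13:18
For a release operation:
sfence
store A

I basically know what you described. What I am asking is: leaving the compiler aside, is sfence + lfence equivalent to mfence?
I ran an experiment with the Peterson algorithm on multiple cores under x86_64 and found that only mfence works.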
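
For reference, a two-thread Peterson lock can be sketched in portable C++ as follows. This is not the poster's experiment: `PetersonLock` and `run_peterson` are hypothetical names, and the sketch uses default (sequentially consistent) atomics instead of hand-written fence instructions. On x86-64, compilers typically implement the sequentially consistent stores with xchg or with mov followed by mfence, which supplies the store-to-load ordering the algorithm depends on.

```cpp
#include <atomic>
#include <thread>

// Two-thread Peterson lock; all names are illustrative.
struct PetersonLock {
    std::atomic<bool> wants[2];
    std::atomic<int> turn;

    PetersonLock() : turn(0) {
        wants[0].store(false);
        wants[1].store(false);
    }

    void lock(int me) {
        const int other = 1 - me;
        // Sequentially consistent stores: they must be globally visible
        // before the loads in the wait loop below are performed.
        wants[me].store(true);
        turn.store(other);
        while (wants[other].load() && turn.load() == other) {
            std::this_thread::yield();
        }
    }

    void unlock(int me) { wants[me].store(false); }
};

// Two threads increment a plain counter under the lock.
// Returns the final counter value.
long run_peterson(int iterations) {
    PetersonLock pl;
    long counter = 0;
    auto worker = [&](int me) {
        for (int i = 0; i < iterations; ++i) {
            pl.lock(me);
            ++counter;          // critical section
            pl.unlock(me);
        }
    };
    std::thread t0(worker, 0);
    std::thread t1(worker, 1);
    t0.join();
    t1.join();
    return counter;
}
```

If the store to `turn` and the following load of `wants[other]` were allowed to swap, both threads could read stale values and enter the critical section together, which is the failure a weaker barrier would permit.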

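Author: yshwuxian Time: 2016-06-19 17:29

codechurch posted on 2016-06-17 13:27
A pthread mutex will do. It already has a "yielding" optimization: when no waiting is needed (the resource is acquired immediately), taking the lock degenerates into a single memory operation ...

In my program, the performance cost of any bus-locking operation inside a single loop iteration is unacceptable. I have now changed it to take the lock once and loop n times, which is acceptable, because the other thread's performance requirements are low.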
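
The "lock once, loop n times" change can be sketched like this. The table type, the function name, and the batch size are all assumptions made for illustration; the poster's actual data structure is not shown in the thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical shared table protected by one mutex.
struct SharedTable {
    std::mutex mtx;
    std::vector<int> items;
};

// Appends the values [0, total) to the table, taking the lock once per
// batch of `batch` items (batch must be positive) instead of once per item.
// Returns how many times the lock was acquired.
std::size_t insert_batched(SharedTable& table, int total, int batch) {
    std::size_t acquisitions = 0;
    for (int start = 0; start < total; start += batch) {
        const int end = std::min(start + batch, total);
        std::lock_guard<std::mutex> guard(table.mtx);  // one lock per batch
        ++acquisitions;
        for (int v = start; v < end; ++v) {
            table.items.push_back(v);
        }
    }
    return acquisitions;
}
```

The trade-off is latency for the other thread: it can wait for up to one whole batch, which fits the case described here where the second thread has low performance requirements.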

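Author: lxyscls Time: 2016-06-20 09:41
This post was last edited by lxyscls on 2016-10-24 17:19

Reply to 5# yshwuxian

I was wrong. x86 has no load-load, store-store, or load-store reordering, and lfence, sfence, and mfence do not map onto "preventing" those kinds of reordering.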

Author: lxyscls Time: 2016-10-24 17:32
Reply to 5# yshwuxian

lfence

Performs a serializing operation on all load-from-memory instructions that were issued prior the LFENCE instruction.
Specifically, LFENCE does not execute until all prior instructions have completed locally, and no later instruction
begins execution until LFENCE completes. In particular, an instruction that loads from memory and that
precedes an LFENCE receives data from memory prior to completion of the LFENCE. (An LFENCE that follows an
instruction that stores to memory might complete before the data being stored have become globally visible.)
Instructions following an LFENCE may be fetched from memory before the LFENCE, but they will not execute until
the LFENCE completes.

sfence

Performs a serializing operation on all store-to-memory instructions that were issued prior the SFENCE instruction.
This serializing operation guarantees that every store instruction that precedes the SFENCE instruction in program
order becomes globally visible before any store instruction that follows the SFENCE instruction. The SFENCE
instruction is ordered with respect to store instructions, other SFENCE instructions, any LFENCE and MFENCE
instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to load
instructions.

mfence

Performs a serializing operation on all load-from-memory and store-to-memory instructions that were issued prior
the MFENCE instruction. This serializing operation guarantees that every load and store instruction that precedes
the MFENCE instruction in program order becomes globally visible before any load or store instruction that follows
the MFENCE instruction. The MFENCE instruction is ordered with respect to all load and store instructions, other
MFENCE instructions, any LFENCE and SFENCE instructions, and any serializing instructions (such as the CPUID
instruction). MFENCE does not serialize the instruction stream.

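sfence only guarantees that stores before the sfence become globally visible before stores after it. It does not guarantee that they are visible before loads that follow the sfence, so adding an lfence after it does not help.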
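
The store-then-load case at issue is the classic "store buffering" litmus test, sketched below in C++. The function name and parameters are hypothetical. The sketch does not issue sfence or lfence directly; it contrasts no fence with a full `std::atomic_thread_fence(std::memory_order_seq_cst)`, which compilers typically implement on x86-64 with mfence or a locked instruction. Because a thread pair is created for every run, the two threads rarely overlap tightly, so the unfenced variant may well report 0 on a given machine even though the reordering is permitted.

```cpp
#include <atomic>
#include <thread>

// Store-buffering litmus test. Each run, thread A does "x = 1; r1 = y" and
// thread B does "y = 1; r2 = x". Returns how many runs ended with
// r1 == 0 && r2 == 0, i.e. both loads were performed ahead of the other
// thread's store becoming visible.
int count_both_zero(int runs, bool full_fence) {
    int both_zero = 0;
    for (int i = 0; i < runs; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread a([&] {
            x.store(1, std::memory_order_relaxed);
            if (full_fence) {
                // Full barrier between the store and the load.
                std::atomic_thread_fence(std::memory_order_seq_cst);
            }
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_relaxed);
            if (full_fence) {
                std::atomic_thread_fence(std::memory_order_seq_cst);
            }
            r2 = x.load(std::memory_order_relaxed);
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) {
            ++both_zero;
        }
    }
    return both_zero;
}
```

With `full_fence` set to true the both-zero outcome is forbidden, so the count must be 0. With it set to false the outcome is allowed, and any nonzero count is a direct observation of store-load reordering.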

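Author: lxyscls Time: 2016-10-24 17:49
Reply to 3# codechurch

AMD64, also called "x86-64" or "x64", is a 64-bit processor architecture. It builds on the existing 32-bit x86 architecture and was developed by AMD. AMD's own products that use the AMD64 instruction set include the Athlon 64, Athlon 64 FX, Athlon 64 X2, Turion 64, Opteron, Sempron, Phenom, and the latest Phenom II and Athlon II processors.

If what I found is right...
Memory ordering

AMD64 is likewise subject to StoreLoad reordering, and IA-64 is even more relaxed.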

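Author: codechurch Time: 2016-10-25 17:28
Reply to 9# lxyscls

What I said does have some problems.
If you look at the differences between AMD64 and x86, they differ only in the last item: incoherence of the instruction cache pipeline. I misstated it as incoherence of the per-core data caches.

My correction:

x86 and AMD64 both behave this way:
if the operand address of a load/store instruction is naturally aligned, the operation is atomic.

The difference between the two lies in the instruction-cache incoherence between cores caused by "self-modifying code".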
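
"Naturally aligned" means the address is a multiple of the operand size. A minimal C++ sketch of checking that, and of declaring a shared slot so that it holds, might look like this; `is_naturally_aligned`, `Slot`, `write_slot`, and `read_slot` are hypothetical names introduced only for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// True when the address p is a multiple of size (size must be nonzero).
bool is_naturally_aligned(const void* p, std::size_t size) {
    return reinterpret_cast<std::uintptr_t>(p) % size == 0;
}

// Hypothetical shared slot. alignas(8) states the natural alignment of the
// 8-byte value explicitly instead of relying on the platform default.
struct Slot {
    alignas(8) std::atomic<std::uint64_t> value{0};
};

// Relaxed accesses: atomic (no torn reads or writes), with no ordering
// guarantees relative to other memory operations.
void write_slot(Slot& s, std::uint64_t v) {
    s.value.store(v, std::memory_order_relaxed);
}

std::uint64_t read_slot(const Slot& s) {
    return s.value.load(std::memory_order_relaxed);
}
```

Using `std::atomic` rather than a plain aligned integer also keeps the compiler from splitting or caching the access, which the hardware guarantee alone does not cover.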

Author: mr_sev Time: 2016-11-04 17:49
rwlock

Welcome to Chinaunix (http://bbs.chinaunix.net/)