The barrier against compiler reordering is called an optimization barrier. It is really just an asm statement with volatile added, serving as a hint that the accesses before it form one group and the accesses after it another, and that instructions must not be reordered across the two groups.
As for the second question, I can only answer for ARM; I have not looked at x86 and the others. Here is the wording from the ARM manual:
Data Memory Barrier (DMB)
The DMB instruction is a data memory barrier. The processor that executes the DMB instruction is referred to as the executing processor, Pe. The DMB instruction takes the required shareability domain and required access types as arguments. If the required shareability is Full system then the operation applies to all observers within the system.
A DMB creates two groups of memory accesses, Group A and Group B:
Group A
Contains:
• All explicit memory accesses of the required access types from observers in the same required shareability domain as Pe that are observed by Pe before the DMB instruction. These accesses include any accesses of the required access types and required shareability domain performed by Pe.
• All loads of required access types from observers in the same required shareability domain as Pe that have been observed by any given observer, Py, in the same required shareability domain as Pe before Py has performed a memory access that is a member of Group A.
Group B
Contains:
• All explicit memory accesses of the required access types by Pe that occur in program order after the DMB instruction.
• All explicit memory accesses of the required access types by any given observer Px in the same required shareability domain as Pe that can only occur after Px has observed a store that is a member of Group B.
Any observer with the same required shareability domain as Pe observes all members of Group A before it observes any member of Group B to the extent that those group members are required to be observed, as determined by the shareability and cacheability of the memory locations accessed by the group members. Where members of Group A and Group B access the same memory-mapped peripheral, all members of Group A will be visible at the memory-mapped peripheral before any members of Group B are visible at that peripheral.
So, applying that barrier description, the semantics wmb produces is a guarantee that the writes before and after the barrier complete in program order. It does not guarantee that they are also observed in that order, and that is exactly why rmb is needed: rmb guarantees that the observation order matches program order. Therefore, if the change to flag can be observed, the change to data can certainly be observed too, because wmb guarantees that the change to data completes before the change to flag.
Suppose data and flag live in different banks of CPU 0's cache. If the bank holding data is busy, its update message may go out later than flag's, so CPU 1 receives the new value of flag first, and CPU 1's next step blows up.
From a MESI point of view, that description also seems questionable. I don't really understand MESI either; this is just a guess.
First, the writer is CPU 0. Before writing, CPU 0 sends an RFO message, which invalidates the line in the other CPUs' caches, and only then does it write. Before the invalidation completes, it presumably waits some delay to make sure every CPU has received the message. And once the value is written to its cache, it does not broadcast any update message; instead, the other CPUs actively query for the value before using it. Since wmb guarantees that in CPU 0's local cache the update to data completes before the update to flag, and rmb guarantees that the other CPUs query flag first, then once a query of flag returns the updated value, the subsequent query of data cannot possibly return the stale value.