mordorwww posted on 2016-11-04 11:41

Atomic memory operations: READ_ONCE / WRITE_ONCE

Last edited by mordorwww on 2016-11-04 11:43

kernel 4.7

#ifndef CONFIG_DEBUG_LIST
static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
      next->prev = new;
      new->next = next;
      new->prev = prev;
      WRITE_ONCE(prev->next, new);
}
#else
extern void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next);
#endif



kernel 3.14
#ifndef CONFIG_DEBUG_LIST
static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
      next->prev = new;
      new->next = next;
      new->prev = prev;
      prev->next = new;
}
#else
extern void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next);
#endif


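What on earth are READ_ONCE and WRITE_ONCE? Googling turns up claims that they exist for atomic operations. But isn't an assignment done with the bus locked already atomic?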
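
For readers who want to experiment, here is a minimal userspace sketch of the 4.7 variant quoted above. WRITE_ONCE_SKETCH and READ_ONCE_SKETCH are simplified stand-ins (scalar-only volatile casts, GCC/Clang `__typeof__` assumed), not the kernel's definitions. list_add_between and list_empty_lockless are names made up for this sketch, and the assert is an extra sanity check that is not in the quoted code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros: scalar types only. */
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE_SKETCH(x)       (*(volatile __typeof__(x) *)&(x))

struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Same statement order as the 4.7 __list_add quoted above. */
static void list_add_between(struct list_head *new,
                             struct list_head *prev,
                             struct list_head *next)
{
    /* Extra consistency check, added only for this sketch. */
    assert(prev->next == next && next->prev == prev);

    next->prev = new;
    new->next = next;
    new->prev = prev;
    /* Volatile store: the compiler must emit it exactly once and may
     * not merge it with, or split it into, other stores. */
    WRITE_ONCE_SKETCH(prev->next, new);
}

/* A reader that does not hold the writer's lock pairs the store above
 * with a volatile load of head->next. */
static int list_empty_lockless(const struct list_head *head)
{
    return READ_ONCE_SKETCH(head->next) == head;
}
```

Single-threaded, this behaves exactly like the 3.14 version. The difference only matters for what the compiler is allowed to do with the final store when another context may be reading prev->next at the same time.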

_nosay posted on 2016-11-04 12:36

What does the definition look like?

mordorwww posted on 2016-11-04 13:37

_nosay posted on 2016-11-04 12:36
What does the definition look like?

/*
* Prevent the compiler from merging or refetching reads or writes. The
* compiler is also forbidden from reordering successive instances of
* READ_ONCE, WRITE_ONCE and ACCESS_ONCE (see below), but only when the
* compiler is aware of some particular ordering. One way to make the
* compiler aware of ordering is to put the two invocations of READ_ONCE,
* WRITE_ONCE or ACCESS_ONCE() in different C statements.
*
* In contrast to ACCESS_ONCE these two macros will also work on aggregate
* data types like structs or unions. If the size of the accessed data
* type exceeds the word size of the machine (e.g., 32 bits or 64 bits)
* READ_ONCE() and WRITE_ONCE() will fall back to memcpy(). There's at
* least two memcpy()s: one for the __builtin_memcpy() and then one for
* the macro doing the copy of variable - '__u' allocated on the stack.
*
* Their two major use cases are: (1) Mediating communication between
* process-level code and irq/NMI handlers, all running on the same CPU,
* and (2) Ensuring that the compiler does not fold, spindle, or otherwise
* mutilate accesses that either do not require ordering or that interact
* with an explicit memory barrier or atomic instruction that provides the
* required ordering.
*/

#define __READ_ONCE(x, check)                                                \
({                                                                        \
        union { typeof(x) __val; char __c; } __u;                        \
        if (check)                                                        \
                __read_once_size(&(x), __u.__c, sizeof(x));                \
        else                                                                \
                __read_once_size_nocheck(&(x), __u.__c, sizeof(x));        \
        __u.__val;                                                        \
})
#define READ_ONCE(x) __READ_ONCE(x, 1)

mordorwww posted on 2016-11-04 13:40

It looks like this is related to memory barriers.

/*
* Following functions are taken from kernel sources and
* break aliasing rules in their original form.
*
* While kernel is compiled with -fno-strict-aliasing,
* perf uses -Wstrict-aliasing=3 which makes build fail
* under gcc 4.4.
*
* Using extra __may_alias__ type to allow aliasing
* in this case.
*/
typedef __u8  __attribute__((__may_alias__)) __u8_alias_t;
typedef __u16 __attribute__((__may_alias__)) __u16_alias_t;
typedef __u32 __attribute__((__may_alias__)) __u32_alias_t;
typedef __u64 __attribute__((__may_alias__)) __u64_alias_t;

static __always_inline void __read_once_size(const volatile void *p, void *res, int size)
{
        switch (size) {
        case 1: *(__u8_alias_t *) res = *(volatile __u8_alias_t *) p; break;
        case 2: *(__u16_alias_t *) res = *(volatile __u16_alias_t *) p; break;
        case 4: *(__u32_alias_t *) res = *(volatile __u32_alias_t *) p; break;
        case 8: *(__u64_alias_t *) res = *(volatile __u64_alias_t *) p; break;
        default:
                barrier();
                __builtin_memcpy((void *)res, (const void *)p, size);
                barrier();
        }
}
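
To try the size dispatch outside the kernel, something like the following userspace sketch should work with GCC or Clang (it relies on statement expressions, `__typeof__` and `__may_alias__`). All names ending in _sketch, the *_alias_t typedefs, and struct sample_big are made up for the example. The union is passed by address rather than through a char member, which is a simplification of the quoted macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Compiler-only barrier, used around the memcpy fallback. */
#define barrier_sketch() __asm__ __volatile__("" ::: "memory")

/* may_alias variants so the casts below do not break strict aliasing. */
typedef uint8_t  __attribute__((__may_alias__)) u8_alias_t;
typedef uint16_t __attribute__((__may_alias__)) u16_alias_t;
typedef uint32_t __attribute__((__may_alias__)) u32_alias_t;
typedef uint64_t __attribute__((__may_alias__)) u64_alias_t;

static inline void read_once_size_sketch(const volatile void *p,
                                         void *res, size_t size)
{
    switch (size) {
    /* Word-sized or smaller: one volatile load of matching width. */
    case 1: *(u8_alias_t *)res  = *(const volatile u8_alias_t *)p;  break;
    case 2: *(u16_alias_t *)res = *(const volatile u16_alias_t *)p; break;
    case 4: *(u32_alias_t *)res = *(const volatile u32_alias_t *)p; break;
    case 8: *(u64_alias_t *)res = *(const volatile u64_alias_t *)p; break;
    default:
        /* Anything else: plain copy fenced by compiler barriers.
         * This copy is NOT a single access. */
        barrier_sketch();
        memcpy(res, (const void *)p, size);
        barrier_sketch();
    }
}

/* The union lets the macro fill in a temporary even when x is const. */
#define READ_ONCE_SKETCH(x)                                   \
({                                                            \
    union { __typeof__(x) val; char c[1]; } u;                \
    read_once_size_sketch(&(x), &u, sizeof(x));               \
    u.val;                                                    \
})

/* 24 bytes on common 64-bit ABIs, so it takes the default branch. */
struct sample_big {
    uint64_t v[3];
};
```

Note that only the 1/2/4/8-byte cases give the "exactly one load" behaviour. For a struct like sample_big the macro still compiles, but the copy may be done in several pieces.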

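_nosay posted on 2016-11-04 19:43

volatile "solves" the synchronization problem at compile time (http://bbs.chinaunix.net/forum.php?mod=viewthread&tid=4255813).
barrier() seems to be the "memory barrier" you mentioned: as far as I can tell it executes some instruction that pushes whatever is in the CPU cache out to memory.

In short, it is about synchronization, but not the kind between threads or processes that a lock can solve. This one takes the hardware and multiple CPUs into account, I think.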
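
One detail worth checking here: in the kernel's GCC headers, barrier() is defined as an empty inline asm with a "memory" clobber, so it only constrains the compiler and emits no instruction. Flushing or ordering at the hardware level is the job of separate primitives such as smp_mb(). A small userspace sketch (barrier_sketch, shared_flag and read_twice are made-up names):

```c
#include <assert.h>

/* Empty asm with a "memory" clobber: tells the compiler that memory may
 * have changed, but generates no machine instruction. */
#define barrier_sketch() __asm__ __volatile__("" ::: "memory")

static int shared_flag;

static int read_twice(void)
{
    int first = shared_flag;

    /* Without this the compiler is free to reuse 'first' for the second
     * read. With it, shared_flag has to be loaded from memory again. */
    barrier_sketch();

    int second = shared_flag;
    return first + second;
}
```

The result is the same with or without the barrier in a single thread. What changes is whether the compiler is allowed to keep shared_flag cached in a register across that point.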

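nswcfd posted on 2016-11-05 12:32

In the list_add case it is just a plain assignment; the motivation is to keep the compiler from over-optimizing. It has no (direct) relation to multi-core synchronization?
Early on there was only ACCESS_ONCE, which is (*(volatile typeof(x) *)&(x)); it stops the compiler from merging successive accesses to the same variable. (2.6.32)

That said, according to an article I found, https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
it seems that introducing these macros at least marks more clearly the places that need attention under concurrency.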
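
The classic case for the volatile cast is a polling loop. A minimal userspace sketch, assuming GCC/Clang `__typeof__` (ACCESS_ONCE_SKETCH, stop_requested, spin_until_stopped and request_stop are made-up names; the max_spins bound exists only so the example terminates on its own):

```c
#include <assert.h>

/* Same shape as the old ACCESS_ONCE: a volatile-qualified lvalue. */
#define ACCESS_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

static int stop_requested;

/* Spins until stop_requested becomes nonzero or max_spins is reached.
 * Returns the number of spins performed. */
static unsigned long spin_until_stopped(unsigned long max_spins)
{
    unsigned long spins = 0;

    /* The volatile access forces a fresh load on every iteration. With a
     * plain read the compiler may load the flag once, before the loop. */
    while (!ACCESS_ONCE_SKETCH(stop_requested) && spins < max_spins)
        spins++;

    return spins;
}

static void request_stop(void)
{
    /* The macro yields an lvalue, so it works for stores too. */
    ACCESS_ONCE_SKETCH(stop_requested) = 1;
}
```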

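PS: aliasing is about accessing the same variable (address) through different (incompatible) types.
See -fstrict-aliasing (enabled at -O2), -Wstrict-aliasing, and __attribute__((__may_alias__)).
My understanding is that __attribute__((__may_alias__)) is a kind of "exemption" from the strict aliasing rule?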
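
A small sketch of that "exemption", assuming GCC or Clang and a 4-byte IEEE 754 float (u32_alias_t and float_bits are made-up names):

```c
#include <assert.h>
#include <stdint.h>

/* Accesses through this type are allowed to alias objects of any type,
 * much like accesses through char. */
typedef uint32_t __attribute__((__may_alias__)) u32_alias_t;

/* Assumption of this sketch: float is 4 bytes wide. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Returns the raw bit pattern of *f. Doing the same read through a plain
 * uint32_t pointer would violate strict aliasing, and with
 * -fstrict-aliasing the compiler could legally miscompile it. */
static uint32_t float_bits(const float *f)
{
    return *(const u32_alias_t *)f;
}
```

This is the same trick the quoted perf code uses so that __read_once_size can read an arbitrary object through __u8/__u16/__u32/__u64 pointers without tripping -Wstrict-aliasing.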

nswcfd posted on 2016-11-05 15:42

Last edited by nswcfd on 2016-11-05 16:00

A few excerpts from the linked page.

As the consequence C compilers 【stopped guaranteeing that "word accesses are atomic".】 There is a number of ways how compilers can miscompile non-race-free programs, see, for example, ACCESS_ONCE() article and How to miscompile programs with "benign" data races. Some particularly nasty compiler transformations include:
• If code makes a plain write to a variable, then compiler can conclude that there are no concurrent accesses to the variable and use it as scratch storage prior to the write to reduce stack usage. This will make concurrent reads observe garbage values.
• If x is a shared location, x = NULL; ...; x = NULL; can often be correctly "optimized" by removing one of those assignments if you omit the WRITE_ONCE(). Note that this is true even if there is a single critical section between the two. And this optimization 【can cross function boundaries】.
• If code writes to a bit-field, then compiler can introduce writes to adjacent bit-fields that are not present in the code.
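
The second bullet can be reproduced with a sketch like this one (WRITE_ONCE_SKETCH is a simplified scalar-only stand-in for the kernel macro; shared_ptr, payload and both functions are made-up names). Whether a given compiler actually drops a store depends on version and flags, so compare the output of `gcc -O2 -S` for the two functions rather than trusting the comments.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for WRITE_ONCE: scalar types only. */
#define WRITE_ONCE_SKETCH(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int *shared_ptr;  /* imagine another context also looks at this */
static int payload;

static void reset_twice_plain(void)
{
    shared_ptr = NULL;
    payload++;           /* unrelated work in between */
    shared_ptr = NULL;   /* the compiler may keep only one of the stores */
}

static void reset_twice_once(void)
{
    WRITE_ONCE_SKETCH(shared_ptr, NULL);
    payload++;
    WRITE_ONCE_SKETCH(shared_ptr, NULL);  /* both stores must be emitted */
}
```

In a single thread both functions leave shared_ptr and payload in the same state. The difference is only visible in the generated code, or to something that observes shared_ptr between the two stores.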

It in turn links to two more pages:
* ACCESS_ONCE(): http://lwn.net/Articles/508991/
It comes down to the fact that the C compiler will, if not given reasons to the contrary, 【assume that there is 【only one】 thread of execution】 in the address space of the program it is compiling. Concurrency is not built into the C language itself, so mechanisms for dealing with concurrent access must be built on top of the language; ACCESS_ONCE() is one such mechanism.

* How to miscompile programs https://www.usenix.org/legacy/ev ... nal_files/Boehm.pdf

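mordorwww posted on 2016-11-05 18:21

Last edited by mordorwww on 2016-11-05 18:26

nswcfd posted on 2016-11-05 12:32
In the list_add case it is just a plain assignment; the motivation is to keep the compiler from over-optimizing. It has no (direct) relation to multi-core synchronization? ...
But aren't there memory barriers for that? Can't a memory barrier already stop the compiler from over-optimizing? And there are qualifiers like volatile as well.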


nswcfd posted on 2016-11-14 18:07

Fair point.

Is there any significant difference between the assembly generated for the new and old versions? (new = with WRITE_ONCE)