Question: why is my code still optimized even with __volatile__?

netdoger posted on 2016-07-13 16:21

Question: why is my code still optimized even with volatile?

Here is the code:
#include <stdio.h>
int main()
{
    char input[30]={"This is a test message.\n"};
    char output[30];
    int length=25;
    __asm__ __volatile__("cld\n\t"
        "rep movsb"
        :
        :"S"(input),"D"(output),"c"(length)
        );
    printf("%s\n",output);
    return 0;
}

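I compiled it like this: gcc -O asm1.c -o asm1,
but when I run it, the string is not printed.
If I compile it with gcc asm1.c -o asm1 instead,
it prints This is a test message.
The book says that adding __volatile__ keeps the code from being optimized at compile time,
but even though I added __volatile__, it still gets optimized.
Why is that, and what should I do?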

nswcfd posted on 2016-07-13 20:50

Last edited by nswcfd on 2016-07-13 20:51

Comparing the disassembly of the two builds shows that the asm was not optimized away.

Unoptimized:

0x00000000004004c4 <+0>: push %rbp
0x00000000004004c5 <+1>: mov %rsp,%rbp
0x00000000004004c8 <+4>: sub $0x50,%rsp
0x00000000004004cc <+8>: movl $0x73696854,-0x30(%rbp)
0x00000000004004d3 <+15>: movl $0x20736920,-0x2c(%rbp)
0x00000000004004da <+22>: movl $0x65742061,-0x28(%rbp)
0x00000000004004e1 <+29>: movl $0x6d207473,-0x24(%rbp)
0x00000000004004e8 <+36>: movl $0x61737365,-0x20(%rbp)
0x00000000004004ef <+43>: movl $0xa2e6567,-0x1c(%rbp)
0x00000000004004f6 <+50>: movl $0x0,-0x18(%rbp)
0x00000000004004fd <+57>: movw $0x0,-0x14(%rbp)
0x0000000000400503 <+63>: movl $0x19,-0x4(%rbp)
0x000000000040050a <+70>: lea -0x30(%rbp),%rax
0x000000000040050e <+74>: lea -0x50(%rbp),%rdx
0x0000000000400512 <+78>: mov -0x4(%rbp),%ecx
0x0000000000400515 <+81>: mov %rax,%rsi
0x0000000000400518 <+84>: mov %rdx,%rdi
0x000000000040051b <+87>: cld
0x000000000040051c <+88>: rep movsb %ds:(%rsi),%es:(%rdi)
0x000000000040051e <+90>: lea -0x50(%rbp),%rax
0x0000000000400522 <+94>: mov %rax,%rdi
0x0000000000400525 <+97>: callq 0x4003b8 <puts@plt>

Optimized:

0x00000000004004c4 <+0>: sub $0x48,%rsp
0x00000000004004c8 <+4>: movl $0x73696854,0x20(%rsp)
0x00000000004004d0 <+12>: movl $0x20736920,0x24(%rsp)
0x00000000004004d8 <+20>: movl $0x65742061,0x28(%rsp)
0x00000000004004e0 <+28>: movl $0x6d207473,0x2c(%rsp)
0x00000000004004e8 <+36>: movl $0x61737365,0x30(%rsp)
0x00000000004004f0 <+44>: movl $0xa2e6567,0x34(%rsp)
0x00000000004004f8 <+52>: movl $0x0,0x38(%rsp)
0x0000000000400500 <+60>: movw $0x0,0x3c(%rsp)
0x0000000000400507 <+67>: lea 0x20(%rsp),%rsi
0x000000000040050c <+72>: mov %rsp,%rdi
0x000000000040050f <+75>: mov $0x19,%ecx
0x0000000000400514 <+80>: cld
0x0000000000400515 <+81>: rep movsb %ds:(%rsi),%es:(%rdi)
0x0000000000400517 <+83>: callq 0x4003b8 <puts@plt>

nswcfd posted on 2016-07-13 21:16

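rep movsb is present in both builds. The difference is that once it has executed, rdi points just past the end of the copied string.

In the unoptimized build, the destination address is loaded from the stack again before the call.
In the optimized build, rdi is handed to puts unchanged, because the operand was declared as input-only and the compiler assumes the asm left it alone. What gets printed is therefore whatever happens to follow output in memory.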
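The pointer movement described above can be observed directly. The following is only a sketch, assuming GCC or Clang on x86-64; rdi_advance_after_copy is a made-up name, and the memcpy branch is a stand-in added so the code still builds on other targets. Declaring the operands with "+" tells the compiler that the registers are both read and written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes from src to dst and returns how far
 * the destination pointer moved while doing so. */
static size_t rdi_advance_after_copy(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    char *d = dst;
#if defined(__GNUC__) && defined(__x86_64__)
    /* "+D", "+S", "+c": rdi, rsi and rcx are read AND modified by the asm,
     * so the compiler must not assume d still equals dst afterwards. */
    __asm__ __volatile__("cld\n\t"
                         "rep movsb"
                         : "+D"(d), "+S"(src), "+c"(n)
                         :
                         : "memory", "cc");
#else
    /* Assumption: portable stand-in that mimics the pointer advance. */
    memcpy(d, src, n);
    d += n;
#endif
    return (size_t)(d - dst);
}
```

Called with n = 25 on the buffers from the original program, the returned advance would be 25, which is why passing the leftover rdi to puts prints from the wrong address.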

nswcfd posted on 2016-07-13 21:50

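There are two fixes. The first is to pass the destination in through an intermediate register and declare rdi as clobbered: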
   __asm__ __volatile__("cld\n\t"
            "mov %1,%%rdi;"
            "rep movsb"
            :
            :"S"(input),"r"(output),"c"(length)
            :"rdi");

The other is to use the & (earlyclobber) modifier, which requires introducing a temporary variable:
   int di;
   __asm__ __volatile__("cld; rep movsb": "=&D"(di) : "S"(input), "0"(output), "c"(length));
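Both snippets above still declare rsi and rcx as plain inputs even though rep movsb changes them, and neither tells the compiler that memory was written. A more defensive variant of the second approach might look like the sketch below. It assumes GCC or Clang on x86-64; copy_bytes and the temporaries d, s and c are names invented here, with pointer-sized types so they match the inputs they are tied to, and the memcpy branch is only a fallback for other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes from src to dst and returns dst. */
static char *copy_bytes(char *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
#if defined(__GNUC__) && defined(__x86_64__)
    char *d;        /* receives the final rdi, otherwise unused */
    const char *s;  /* receives the final rsi, otherwise unused */
    size_t c;       /* receives the final rcx, otherwise unused */
    __asm__ __volatile__("cld\n\t"
                         "rep movsb"
                         : "=&D"(d), "=&S"(s), "=&c"(c)  /* registers the asm overwrites */
                         : "0"(dst), "1"(src), "2"(n)    /* tied to the same registers */
                         : "memory", "cc");              /* the asm writes through dst */
    (void)d; (void)s; (void)c;
#else
    /* Assumption: plain memcpy on targets without rep movsb. */
    memcpy(dst, src, n);
#endif
    return dst;
}
```

Because dst itself is never declared as modified, the compiler is free to reuse its original value, so something like printf("%s\n", copy_bytes(output, input, 25)) would print from the start of the buffer at any optimization level.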

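nswcfd posted on 2016-07-13 21:59

Going by the example at http://ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#s7, it looks as though putting %edi in the clobber list should be enough, but that does not seem to work. I don't know why.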

#define mov_blk(src, dest, numwords) \
__asm__ __volatile__ (                                        \
                  "cld\n\t"                            \
                  "rep\n\t"                            \
                  "movsl"                               \
                  :                                     \
                  : "S" (src), "D" (dest), "c" (numwords)\
                  : "%ecx", "%esi", "%edi"             \
                  )

nswcfd posted on 2016-07-14 09:49

According to section 6.44.2.5, Input Operands, of https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html:

When the compiler selects the registers to use to represent the input operands, it does not use any of the clobbered registers (see Clobbers).

So it is entirely reasonable that void test() { asm("nop"::"D"(1):"di"); } fails with the following errors:

<stdin>:1: error: can't find a register in class 'DIREG' while reloading 'asm'
<stdin>:1: error: 'asm' operand has impossible constraints

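The D class contains only one register, di, and the clobber excludes it, so there is nothing left to choose from.

I don't know why the example in my previous post is written that way; perhaps it reflects the behavior of older versions?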
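With a current compiler, one way to express what the HOWTO macro appears to intend is to drop the clobber list entries for ecx, esi and edi and declare those registers as read-write operands instead. The sketch below is a function form of mov_blk written for this thread, not the HOWTO's own code; it assumes GCC or Clang on x86-64 and falls back to memcpy elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical function form of the HOWTO's mov_blk macro: copies
 * numwords 32-bit words from src to dest. */
static void mov_blk(const uint32_t *src, uint32_t *dest, size_t numwords)
{
    assert(src != NULL && dest != NULL);
#if defined(__GNUC__) && defined(__x86_64__)
    /* The modified registers are "+" operands rather than clobbers, so
     * they never conflict with the "S"/"D"/"c" register classes. */
    __asm__ __volatile__("cld\n\t"
                         "rep movsl"
                         : "+S"(src), "+D"(dest), "+c"(numwords)
                         :
                         : "memory", "cc");
#else
    /* Assumption: plain memcpy on targets without rep movsl. */
    memcpy(dest, src, numwords * sizeof *src);
#endif
}
```

Only the local copies of src, dest and numwords inside the function are changed, so the caller's pointers are unaffected.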

-----------------------------------------------------------------------------

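PS: on the semantics of volatile, section 6.44.2.1, Volatile, of the link above has several good examples.

It seems that extended asm does not care what the assembly in the asm template actually does, and decides whether it may optimize purely from the declared input operand / output operand / clobber constraints?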

shang2010 posted on 2016-07-14 11:24

Welcome back, cpp compiler guru.


Chinaunix's Archiver
