About inline functions (inline)

allkillers posted on 2016-04-28 10:10

Last edited by allkillers on 2016-04-28 10:12

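Books keep saying that an inline function is substituted directly where it is used, so there is no call overhead and it runs more efficiently. I understand that literally, but I can't see where the higher efficiency actually shows up. Below I compare the assembly for two cases: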

Case 1, inline function:
static inline void test(int i, int j)
{
}

int main(void)
{
test(1,5);
return 0;
}

Assembly:

00000000004004ed <test>:
4004ed:    55                   push %rbp
4004ee:    48 89 e5             mov %rsp,%rbp
4004f1:    89 7d fc             mov %edi,-0x4(%rbp)
4004f4:    89 75 f8             mov %esi,-0x8(%rbp)
4004f7:    5d                   pop %rbp
4004f8:    c3                   retq

00000000004004f9 <main>:
4004f9:    55                   push %rbp
4004fa:    48 89 e5             mov %rsp,%rbp
4004fd:    be 05 00 00 00       mov $0x5,%esi
400502:    bf 01 00 00 00       mov $0x1,%edi
400507:    e8 e1 ff ff ff       callq  4004ed <test>
40050c:    b8 00 00 00 00       mov $0x0,%eax
400511:    5d                   pop %rbp
400512:    c3                   retq
400513:    66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40051a:    00 00 00
40051d:    0f 1f 00             nopl (%rax)

Case 2, ordinary function:
static void test(int i, int j)
{
}

int main(void)
{
test(1,5);
return 0;
}
Assembly:
00000000004004ed <test>:
4004ed:    55                   push %rbp
4004ee:    48 89 e5             mov %rsp,%rbp
4004f1:    89 7d fc             mov %edi,-0x4(%rbp)
4004f4:    89 75 f8             mov %esi,-0x8(%rbp)
4004f7:    5d                   pop %rbp
4004f8:    c3                   retq

00000000004004f9 <main>:
4004f9:    55                   push %rbp
4004fa:    48 89 e5             mov %rsp,%rbp
4004fd:    be 05 00 00 00       mov $0x5,%esi
400502:    bf 01 00 00 00       mov $0x1,%edi
400507:    e8 e1 ff ff ff       callq  4004ed <test>
40050c:    b8 00 00 00 00       mov $0x0,%eax
400511:    5d                   pop %rbp
400512:    c3                   retq
400513:    66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40051a:    00 00 00
40051d:    0f 1f 00             nopl (%rax)

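Conclusion:
Judging by the assembly there is no difference at all. No "inlining" happens; it is still an ordinary function call. So how exactly does inline improve efficiency? Any explanation appreciated.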

Buddy_Zhang1 posted on 2016-04-28 10:30

Reply to #1 allkillers

Take a look at the "preprocessing" stage. inline is handled at that stage, before it ever gets to assembly.

allkillers posted on 2016-04-28 10:46

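Reply to #2 Buddy_Zhang1

The result of preprocessing ought to show up in the assembly; if it doesn't show up there, it could only show up at run time. For example, with #define N (30), the effect of preprocessing is that you see the constant "30" in the assembly. My reasoning is that identical assembly necessarily means identical machine code, and identical machine code means the instruction sequence and the efficiency must be identical too. Is my understanding wrong?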

Buddy_Zhang1 posted on 2016-04-28 10:55

Reply to #3 allkillers

   You can run the following command on your demo code to see the preprocessing output:
   gcc xxx.c -E

   Four stages: preprocessing, compilation, assembly, linking

allkillers posted on 2016-04-28 11:03

Reply to #4 Buddy_Zhang1

OK, I'll look into it further. But the assembly I posted is already from the linked binary, that much is certain.

amarant posted on 2016-04-28 11:06

Last edited by amarant on 2016-04-29 10:05

You didn't specify an optimization level, and by default nothing gets inlined. If you need to force inlining, use the syntax below.
GCC does not inline any functions when not optimizing unless you specify the ‘always_inline’ attribute for the function, like this:

/* Prototype.*/
inline void foo (const char) __attribute__((always_inline));

Reference:
https://gcc.gnu.org/onlinedocs/gcc/Inline.html

xzko posted on 2016-04-28 14:19

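See the two inline-related passes in the GCC source, pass_early_inline and pass_ipa_inline. When no -Onum optimization option is given, pass_early_inline only inlines functions that carry the "always_inline" attribute (through inline_always_inline_functions), and pass_ipa_inline returns immediately without doing any inlining at all.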

我爱你我的菜 posted on 2016-04-28 16:18

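Inline functions are supposedly handled directly at the preprocessing stage, yet your assembly still shows the context-saving steps of a normal function call. Could it be that your test function is so simple that the compiler optimized it away?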

_nosay posted on 2016-04-28 16:41

Reply to #8 我爱你我的菜

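My first thought was also the optimization level: compile with -O0, -O1 and -O2 separately and see whether anything changes. Unlike a function-like macro, an inline function is treated by the compiler "case by case" either as an ordinary function or like a macro, and I'd guess "case by case" mostly means the optimization level. I haven't tried it, though.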

我爱你我的菜 posted on 2016-04-28 16:50

I misunderstood earlier. The issue above is that inline simply did not take effect.

Code where it does take effect:
#include<stdio.h>

/* Prototype: inline plus GCC's always_inline attribute forces inlining even without -O */
inline static void test(int i, int j) __attribute__((always_inline));
static void test(int i, int j)
{
int sum;
sum=i+j;
}

int main(void)
{
test(1,5);
return 0;
}

Assembly for comparison (without inlining):
.file "1.c"
.text
.type test, @function
test:
.LFB0:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
subl $16, %esp
movl 12(%ebp), %eax
movl 8(%ebp), %edx
addl %edx, %eax
movl %eax, -4(%ebp)
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE0:
.size test, .-test
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
subl $8, %esp
movl $5, 4(%esp)
movl $1, (%esp)
call test
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits

Assembly with inline:
.file "2.c"
.text
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
subl $16, %esp
movl $1, -12(%ebp)
movl $5, -8(%ebp)
movl -8(%ebp), %eax
movl -12(%ebp), %edx
addl %edx, %eax
movl %eax, -4(%ebp)
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",@progbits


Chinaunix's Archiver
