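Interrupts and exceptions in the Linux kernel
Interrupts:
• Maskable interrupts: every interrupt requested by an I/O device is maskable. A masked interrupt is ignored by the CPU until it is unmasked again.
• Nonmaskable interrupts: raised only by critical events such as hardware failures.
Exceptions:
• Processor-detected exceptions (faults, traps, aborts).
• Programmed exceptions (software interrupts): triggered by the programmer with an instruction such as INT or INT3 and usually handled as traps. Their main use is to implement system calls.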
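Interrupt Descriptor Table (IDT): 256 entries, each associating one interrupt or exception vector with its handler. There are three descriptor types:
1. Task Gate Descriptor. Linux hardly uses this type; on 32-bit x86 only the double fault exception (vector 8) goes through a task gate.
2. Interrupt Gate Descriptor. Used to handle interrupts.
3. Trap Gate Descriptor. Used to handle exceptions.
Linux uses the gates as follows:
• Interrupt gate: for hardware interrupts. Its DPL is 0, so user mode cannot reach it with an int instruction. The DPL check is skipped for hardware interrupts, so an interrupt can still be taken while the CPU runs in user mode. See set_intr_gate.
• DPL 3 trap gate: for system calls. Its DPL is 3, so user mode may invoke it with an int instruction, which is what makes int $0x80 work. Only vector 0x80 uses this kind of gate. See set_system_gate.
• DPL 0 trap gate: for CPU exceptions. User mode cannot reach it with an int instruction. The DPL check is skipped for exceptions raised by the hardware, so a CPU exception can still be taken while in user mode. See set_trap_gate.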
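The three kinds of gate differ only in a few bits of the descriptor. The sketch below builds the 16-bit attribute word of a present 32-bit gate from its type and DPL. The helper name gate_attr and the enum are made up for this illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Type codes of 32-bit gates (bits 8..11 of the descriptor's high dword). */
enum gate_type {
    GATE_TASK      = 0x5,
    GATE_INTERRUPT = 0xE,
    GATE_TRAP      = 0xF
};

/*
 * Build the attribute word of a present gate:
 * bit 15 = present, bits 13..14 = DPL, bits 8..11 = gate type.
 * gate_attr() is a hypothetical helper used only in this sketch.
 */
static uint16_t gate_attr(enum gate_type type, unsigned dpl)
{
    assert(dpl <= 3);
    return (uint16_t)(0x8000u | (dpl << 13) | ((unsigned)type << 8));
}
```

With this encoding a DPL 0 interrupt gate is 0x8E00, a DPL 0 trap gate is 0x8F00, and a DPL 3 trap gate such as the system call gate is 0xEF00.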
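While executing an instruction, the control unit checks whether an interrupt or exception has occurred. If one has, then after the current instruction finishes the hardware performs the following steps:
1. Determine the vector i associated with the interrupt or exception.
2. Read the i-th gate descriptor from the IDT (the idtr register points to the IDT).
3. Use the selector in that entry, together with gdtr, to find the segment descriptor in the GDT. This gives the base address of the segment that contains the handler; the offset comes from the gate descriptor.
4. Check privileges. Compare the CPL in cs with the DPL of the segment descriptor found in the GDT, to make sure the handler is not less privileged than the interrupted code. For a programmed exception, also compare the CPL with the DPL of the gate descriptor: the CPL must be numerically less than or equal to the gate's DPL, otherwise a general protection exception is raised. Why? The INT instruction lets a user-mode process raise any vector from 0 to 255. To stop user code from raising illegal interrupts this way, initialization sets the DPL of the gate for vector 0x80 (the system call gate) to 3 and the DPL of every gate that must stay inaccessible to 0, so the privilege check catches the illegal cases.
5. Check whether the privilege level changes, which normally means a transition from user mode to kernel mode. If it does, the control unit must start using the stack associated with the new privilege level:
a. Read the tr register to access the TSS of the running process. Why? Any process that traps from user mode into kernel mode must obtain its kernel stack pointer from the TSS.
b. Load ss and esp with the stack segment and stack pointer for the new privilege level. These values are found in the TSS.
c. Save the user-mode ss and esp on the new (kernel) stack. These values give the logical address of the user-mode stack.
6. If a fault occurred, load cs and eip with the address of the instruction that caused the exception, so that the instruction can be executed again after the handler finishes.
7. Save eflags, cs and eip on the stack.
8. If the exception carries a hardware error code, save it on the stack.
9. Load cs and eip with the segment selector and the offset fields of the i-th gate descriptor in the IDT. Together they give the logical address of the first instruction of the interrupt or exception handler.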
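The checks in steps 4 and 5 can be written down as a small decision function. This is only a sketch of the rules as described above, not hardware or kernel code; the struct delivery and check_delivery names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of the checks made before a handler is entered. */
struct delivery {
    bool allowed;       /* false: a general protection exception is raised instead */
    bool stack_switch;  /* true: ss and esp are reloaded from the TSS */
};

/*
 * cpl          - current privilege level (low two bits of cs), 0..3
 * gate_dpl     - DPL of the IDT gate descriptor
 * handler_dpl  - DPL of the handler's code segment descriptor in the GDT
 * software_int - true when the event comes from an INT instruction
 */
static struct delivery check_delivery(unsigned cpl, unsigned gate_dpl,
                                      unsigned handler_dpl, bool software_int)
{
    struct delivery d = { false, false };

    assert(cpl <= 3 && gate_dpl <= 3 && handler_dpl <= 3);

    /* The handler must not be less privileged than the interrupted code. */
    if (handler_dpl > cpl)
        return d;

    /* Only programmed exceptions are checked against the gate's DPL. */
    if (software_int && cpl > gate_dpl)
        return d;

    d.allowed = true;
    d.stack_switch = handler_dpl < cpl;  /* e.g. user mode (3) to kernel (0) */
    return d;
}
```

For example, check_delivery(3, 3, 0, true) models int $0x80 from user mode and is allowed with a stack switch, while check_delivery(3, 0, 0, true) models a user-mode INT aimed at a DPL 0 gate and is refused.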
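Returning from an interrupt or exception:
When the handler finishes, it executes an iret instruction, which does the following:
1) Load cs, eip and eflags with the values saved on the stack. If a hardware error code was pushed, it must be popped before iret executes.
2) Check whether the handler's privilege level equals the value in the two least significant bits of the saved cs (this tells whether the interrupted process was running in kernel mode or in user mode). If they are equal, the process was in kernel mode and iret finishes; otherwise go to 3.
3) Load ss and esp from the stack. This returns to the stack associated with the old privilege level.
4) Examine the ds, es, fs and gs segment registers. If any of them holds a selector that refers to a segment descriptor whose DPL is lower than the CPL (that is, more privileged), clear that register. This prevents a malicious user program from using these registers to access the kernel address space.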
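Steps 1) to 3) amount to popping a frame whose length depends on whether the privilege level changes. A minimal simulation, assuming a stack modelled as an array of 32-bit words; struct cpu and sim_iret are hypothetical names, and the segment register check of step 4) is left out.

```c
#include <assert.h>
#include <stdint.h>

struct cpu {
    uint32_t eip, cs, eflags, esp, ss;
    const uint32_t *stack_top;  /* models the current top of the kernel stack */
};

/* Pop in iret order: eip, cs, eflags, then esp and ss only when returning outward. */
static void sim_iret(struct cpu *c)
{
    unsigned handler_cpl;

    assert(c != 0 && c->stack_top != 0);
    handler_cpl = c->cs & 3;           /* privilege level of the handler */

    c->eip    = *c->stack_top++;
    c->cs     = *c->stack_top++;
    c->eflags = *c->stack_top++;

    /* A larger RPL in the restored cs means a return to a less privileged level. */
    if ((c->cs & 3) > handler_cpl) {
        c->esp = *c->stack_top++;
        c->ss  = *c->stack_top++;
    }
}
```

Returning to kernel code pops three words; returning to user mode pops five, because the user-mode ss and esp were saved when the stack was switched on entry.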
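In short, hardware interrupts and exceptions work as follows. When an interrupt arrives, the device asserts its interrupt pin, the pin number is mapped to an interrupt vector, and the vector selects an entry in the Interrupt Descriptor Table (IDT). The CPU reads the base address of the GDT from the gdtr register and looks up the segment descriptor identified by the selector in the IDT entry; this descriptor gives the base address of the segment that contains the interrupt or exception handler. The CPU then checks privileges, saves the context, and loads cs and eip with the segment selector and offset fields of the i-th gate descriptor in the IDT, which give the logical address of the handler's first instruction. Once the interrupt or exception has been handled, the handler must execute an iret instruction to hand control back to the interrupted process.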
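The lookup chain in this summary (vector, IDT entry, selector, GDT entry, base plus offset) can be sketched in a few lines. The structs below are deliberately simplified and do not match the hardware descriptor layout; handler_address is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified views of the two descriptors; not the real bit layout. */
struct gate    { uint16_t selector; uint32_t offset; };
struct segment { uint32_t base; };

/*
 * Resolve a vector to the linear address of its handler:
 * IDT[vector] gives (selector, offset), selector >> 3 indexes the GDT,
 * and segment base + offset is the handler's entry point.
 */
static uint32_t handler_address(const struct gate *idt,
                                const struct segment *gdt,
                                unsigned gdt_entries, uint8_t vector)
{
    const struct gate *g = &idt[vector];
    unsigned index = g->selector >> 3;  /* drop the RPL and TI bits */

    assert(index < gdt_entries);
    return gdt[index].base + g->offset;
}
```

Linux uses flat segments with a base of 0 for kernel code, so in practice the resulting address equals the offset stored in the gate.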
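Interrupt flow:
Initializing the interrupt descriptor table
During kernel initialization, the assembly routine setup_idt fills all 256 IDT entries with the same interrupt gate, which points to the ignore_int handler, and then installs early handlers for vectors 0, 6, 13 and 14: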
/*
 * setup_idt
 *
 * sets up a idt with 256 entries pointing to
 * ignore_int, interrupt gates. It doesn't actually load
 * idt - that can be done only after paging has been enabled
 * and the kernel moved to PAGE_OFFSET. Interrupts
 * are enabled elsewhere, when we can be relatively
 * sure everything is ok.
 *
 * Warning: %esi is live across this function.
 */
setup_idt:
    lea ignore_int,%edx
    movl $(__KERNEL_CS << 16),%eax
    movw %dx,%ax        /* selector = 0x0010 = cs */
    movw $0x8E00,%dx    /* interrupt gate - dpl=0, present */

    lea idt_table,%edi
    mov $256,%ecx
rp_sidt:
    movl %eax,(%edi)
    movl %edx,4(%edi)
    addl $8,%edi
    dec %ecx
    jne rp_sidt

.macro set_early_handler handler,trapno
    lea \handler,%edx
    movl $(__KERNEL_CS << 16),%eax
    movw %dx,%ax
    movw $0x8E00,%dx    /* interrupt gate - dpl=0, present */
    lea idt_table,%edi
    movl %eax,8*\trapno(%edi)
    movl %edx,8*\trapno+4(%edi)
.endm

    set_early_handler handler=early_divide_err,trapno=0
    set_early_handler handler=early_illegal_opcode,trapno=6
    set_early_handler handler=early_protection_fault,trapno=13
    set_early_handler handler=early_page_fault,trapno=14

    ret
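The register shuffling in setup_idt is easier to follow in C. The sketch below packs a gate into the same two 32-bit words that the assembly builds in %eax and %edx and fills a 256-entry table with it. The names gate_words, pack_gate and fill_idt are made up for this illustration, and the table is an ordinary array rather than the real idt_table.

```c
#include <assert.h>
#include <stdint.h>

#define IDT_ENTRIES 256

/* The two dwords of one 8-byte gate descriptor. */
struct gate_words { uint32_t lo, hi; };

/* Same packing as the %eax/%edx pair built in setup_idt. */
static struct gate_words pack_gate(uint32_t handler, uint16_t selector,
                                   uint16_t attr)
{
    struct gate_words g;

    /* low dword: selector in bits 16..31, handler offset bits 0..15 */
    g.lo = ((uint32_t)selector << 16) | (handler & 0xFFFFu);
    /* high dword: handler offset bits 16..31, attribute word in bits 0..15 */
    g.hi = (handler & 0xFFFF0000u) | attr;
    return g;
}

/* Point every entry at one default handler through a DPL 0 interrupt gate. */
static void fill_idt(struct gate_words *idt, uint32_t default_handler,
                     uint16_t selector)
{
    assert(idt != 0);
    for (int i = 0; i < IDT_ENTRIES; i++)
        idt[i] = pack_gate(default_handler, selector, 0x8E00);
}
```

The attribute word 0x8E00 is the one used in the assembly: present, DPL 0, interrupt gate.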
start_kernel then calls trap_init, which installs the real entries in the IDT (mainly the exception handlers):
void __init trap_init(void)
{
    int i;

#ifdef CONFIG_EISA
    void __iomem *p = early_ioremap(0x0FFFD9, 4);

    if (readl(p) == 'E' + ('I'<<8) + ('S'<<16) + ('A'<<24))
        EISA_bus = 1;
    early_iounmap(p, 4);
#endif

    set_intr_gate(0, &divide_error);
    set_intr_gate_ist(1, &debug, DEBUG_STACK);
    set_intr_gate_ist(2, &nmi, NMI_STACK);
    /* int3 can be called from all */
    set_system_intr_gate_ist(3, &int3, DEBUG_STACK);
    /* int4 can be called from all */
    set_system_intr_gate(4, &overflow);
    set_intr_gate(5, &bounds);
    set_intr_gate(6, &invalid_op);
    set_intr_gate(7, &device_not_available);
#ifdef CONFIG_X86_32
    set_task_gate(8, GDT_ENTRY_DOUBLEFAULT_TSS);
#else
    set_intr_gate_ist(8, &double_fault, DOUBLEFAULT_STACK);
#endif
    set_intr_gate(9, &coprocessor_segment_overrun);
    set_intr_gate(10, &invalid_TSS);
    set_intr_gate(11, &segment_not_present);
    set_intr_gate_ist(12, &stack_segment, STACKFAULT_STACK);
    set_intr_gate(13, &general_protection);
    set_intr_gate(14, &page_fault);
    set_intr_gate(15, &spurious_interrupt_bug);
    set_intr_gate(16, &coprocessor_error);
    set_intr_gate(17, &alignment_check);
#ifdef CONFIG_X86_MCE
    set_intr_gate_ist(18, &machine_check, MCE_STACK);
#endif
    set_intr_gate(19, &simd_coprocessor_error);

    /* Reserve all the builtin and the syscall vector: */
    for (i = 0; i < FIRST_EXTERNAL_VECTOR; i++)
        set_bit(i, used_vectors);

#ifdef CONFIG_IA32_EMULATION
    set_system_intr_gate(IA32_SYSCALL_VECTOR, ia32_syscall);
    set_bit(IA32_SYSCALL_VECTOR, used_vectors);
#endif

#ifdef CONFIG_X86_32
    if (cpu_has_fxsr) {
        printk(KERN_INFO "Enabling fast FPU save and restore... ");
        set_in_cr4(X86_CR4_OSFXSR);
        printk("done.\n");
    }
    if (cpu_has_xmm) {
        printk(KERN_INFO
            "Enabling unmasked SIMD FPU exception support... ");
        set_in_cr4(X86_CR4_OSXMMEXCPT);
        printk("done.\n");
    }

    set_system_trap_gate(SYSCALL_VECTOR, &system_call);
    set_bit(SYSCALL_VECTOR, used_vectors);
#endif

    /*
     * Should be a barrier for any external CPU state:
     */
    cpu_init();

    x86_init.irqs.trap_init();
}
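Besides installing gates, trap_init marks the exception vectors and the system call vector as taken in the used_vectors bitmap, so that they are never handed out to devices. A user-space sketch of that bookkeeping follows. The bitmap helpers are stand-ins for the kernel's set_bit and test_bit, and the value 0x20 for FIRST_EXTERNAL_VECTOR is an assumption stated here rather than something shown in the post.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define NR_VECTORS            256
#define FIRST_EXTERNAL_VECTOR 0x20  /* assumed: vectors 0..31 belong to exceptions */
#define SYSCALL_VECTOR        0x80

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per vector; a set bit means the vector is already in use. */
static unsigned long used[(NR_VECTORS + BITS_PER_WORD - 1) / BITS_PER_WORD];

static void mark_used(unsigned vec)
{
    assert(vec < NR_VECTORS);
    used[vec / BITS_PER_WORD] |= 1UL << (vec % BITS_PER_WORD);
}

static bool is_used(unsigned vec)
{
    assert(vec < NR_VECTORS);
    return (used[vec / BITS_PER_WORD] >> (vec % BITS_PER_WORD)) & 1UL;
}

/* Reserve the built-in exception vectors and the system call vector. */
static void reserve_builtin_vectors(void)
{
    for (unsigned i = 0; i < FIRST_EXTERNAL_VECTOR; i++)
        mark_used(i);
    mark_used(SYSCALL_VECTOR);
}
```

After reserve_builtin_vectors() runs, any vector for which is_used() returns false is still available for external interrupts.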
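Exception handling
Exception handlers have a standard structure made up of three parts:
1. Save the contents of most registers on the kernel stack (this part is written in assembly).
For example, the assembly stub for the divide error exception: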
ENTRY(divide_error)
    RING0_INT_FRAME
    pushl $0            # no error code
    CFI_ADJUST_CFA_OFFSET 4
    pushl $do_divide_error
    CFI_ADJUST_CFA_OFFSET 4
    jmp error_code
    CFI_ENDPROC
END(divide_error)
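The entry point divide_error is the handler address stored in the corresponding IDT entry, so execution jumps here first when the exception occurs. If the control unit does not automatically push a hardware error code for this exception, the stub contains a pushl $0 instruction that pads the stack with a null value. The stub then pushes the address of the high-level C function, whose name is the handler name prefixed with do_, and jumps to error_code: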
error_code:
    /* the function address is in %gs's slot on the stack */
    pushl %fs
    CFI_ADJUST_CFA_OFFSET 4
    /*CFI_REL_OFFSET fs, 0*/
    pushl %es
    CFI_ADJUST_CFA_OFFSET 4
    /*CFI_REL_OFFSET es, 0*/
    pushl %ds
    CFI_ADJUST_CFA_OFFSET 4
    /*CFI_REL_OFFSET ds, 0*/
    pushl %eax
    CFI_ADJUST_CFA_OFFSET 4
    CFI_REL_OFFSET eax, 0
    pushl %ebp
    CFI_ADJUST_CFA_OFFSET 4
    CFI_REL_OFFSET ebp, 0
    pushl %edi
    CFI_ADJUST_CFA_OFFSET 4
    CFI_REL_OFFSET edi, 0
    pushl %esi
    CFI_ADJUST_CFA_OFFSET 4
    CFI_REL_OFFSET esi, 0
    pushl %edx
    CFI_ADJUST_CFA_OFFSET 4
    CFI_REL_OFFSET edx, 0
    pushl %ecx
    CFI_ADJUST_CFA_OFFSET 4
    CFI_REL_OFFSET ecx, 0
    pushl %ebx
    CFI_ADJUST_CFA_OFFSET 4
    CFI_REL_OFFSET ebx, 0
    cld
    movl $(__KERNEL_PERCPU), %ecx
    movl %ecx, %fs
    UNWIND_ESPFIX_STACK
    GS_TO_REG %ecx
    movl PT_GS(%esp), %edi          # get the function address
    movl PT_ORIG_EAX(%esp), %edx    # get the error code
    movl $-1, PT_ORIG_EAX(%esp)     # no syscall to restart
    REG_TO_PTGS %ecx
    SET_KERNEL_GS %ecx
    movl $(__USER_DS), %ecx
    movl %ecx, %ds
    movl %ecx, %es
    TRACE_IRQS_OFF
    movl %esp,%eax                  # pt_regs pointer
    call *%edi
    jmp ret_from_exception
This assembly code mainly saves most of the registers and then uses call *%edi to invoke the C function whose address was pushed on the stack earlier.
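The last few instructions of error_code are an indirect call with two arguments: a pointer to the saved registers and the error code. A rough C equivalent, assuming a heavily trimmed register structure; struct regs, dispatch and sample_handler are hypothetical names, and the real pt_regs has many more fields.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed stand-in for the saved register frame. */
struct regs {
    long orig_eax;  /* slot holding the error code, real or the dummy 0 */
    long ip;
};

typedef void (*trap_handler)(struct regs *regs, long error_code);

/* Mirrors the tail of error_code: fetch the error code, mark the slot, call. */
static void dispatch(struct regs *regs, trap_handler handler)
{
    long error_code;

    assert(regs != NULL && handler != NULL);
    error_code = regs->orig_eax;
    regs->orig_eax = -1;         /* "no syscall to restart" */
    handler(regs, error_code);   /* the call *%edi step */
}

/* Example handler that only records what it was given. */
static long last_error_code;

static void sample_handler(struct regs *regs, long error_code)
{
    (void)regs;
    last_error_code = error_code;
}
```

Because every stub leaves the stack in the same shape, with a dummy 0 where the hardware pushed no error code, one shared routine can serve all exceptions.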
In the Linux 2.6 kernel, these do_ functions are defined with macros:
DO_ERROR_INFO(0, SIGFPE, "divide error", divide_error, FPE_INTDIV, regs->ip)
DO_ERROR(4, SIGSEGV, "overflow", overflow)
DO_ERROR(5, SIGSEGV, "bounds", bounds)
DO_ERROR_INFO(6, SIGILL, "invalid opcode", invalid_op, ILL_ILLOPN, regs->ip)
DO_ERROR(9, SIGFPE, "coprocessor segment overrun", coprocessor_segment_overrun)
DO_ERROR(10, SIGSEGV, "invalid TSS", invalid_TSS)
DO_ERROR(11, SIGBUS, "segment not present", segment_not_present)
#ifdef CONFIG_X86_32
DO_ERROR(12, SIGBUS, "stack segment", stack_segment)
#endif
Here is the definition of one of these macros:
#define DO_ERROR_INFO(trapnr, signr, str, name, sicode, siaddr)          \
dotraplinkage void do_##name(struct pt_regs *regs, long error_code)      \
{                                                                        \
    siginfo_t info;                                                      \
    info.si_signo = signr;                                               \
    info.si_errno = 0;                                                   \
    info.si_code = sicode;                                               \
    info.si_addr = (void __user *)siaddr;                                \
    if (notify_die(DIE_TRAP, str, regs, error_code, trapnr, signr)       \
                                                    == NOTIFY_STOP)      \
        return;                                                          \
    conditional_sti(regs);                                               \
    do_trap(trapnr, signr, str, regs, error_code, &info);                \
}
As the macro shows, every handler generated this way ends by calling do_trap.
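The token pasting trick (do_##name) can be tried outside the kernel. The sketch below generates two handlers from one macro and routes both to a common back end, the way the kernel macros route to do_trap. DEMO_DO_ERROR, record_trap and last_trap are invented for this example, and the handlers take only the error code instead of the kernel's (regs, error_code) pair.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>

/* What the common back end was last asked to handle. */
struct trap_record {
    int trapnr;
    int signr;
    const char *str;
    long error_code;
};

static struct trap_record last_trap;

/* Common back end shared by every generated handler (stand-in for do_trap). */
static void record_trap(int trapnr, int signr, const char *str, long error_code)
{
    assert(str != NULL);
    last_trap.trapnr = trapnr;
    last_trap.signr = signr;
    last_trap.str = str;
    last_trap.error_code = error_code;
}

/* Generate a function named do_<name> that forwards to the back end. */
#define DEMO_DO_ERROR(trapnr, signr, str, name)           \
static void do_##name(long error_code)                    \
{                                                         \
    record_trap(trapnr, signr, str, error_code);          \
}

DEMO_DO_ERROR(4, SIGSEGV, "overflow", overflow)
DEMO_DO_ERROR(5, SIGSEGV, "bounds", bounds)
```

Each macro invocation expands to a complete function definition, so do_overflow and do_bounds exist as ordinary functions whose addresses an assembly stub could push.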