Kernel virtualization with KVM/QEMU: the Guest OS, Qemu, and KVM workflow. Author: 李万鹏 (Li Wanpeng)
This post walks through the Guest OS, Qemu, and KVM workflow on the x86 platform. As the figure shows, Qemu passes its commands to KVM through the KVM APIs:
1. Create the VM
system_fd = open("/dev/kvm", xxx);
vm_fd = ioctl(system_fd, KVM_CREATE_VM, xxx);
2. Create the VCPU
vcpu_fd = kvm_vm_ioctl(vm_fd, KVM_CREATE_VCPU, xxx);
3. Run KVM
status = kvm_vcpu_ioctl(vcpu_fd, KVM_RUN, xxx);
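After Qemu enters KVM through the KVM APIs, KVM switches into the Guest OS. While the Guest OS runs, it may need to perform I/O, that is, to access a physical device, and at that point Qemu or KVM has to emulate the device. If the device is emulated by KVM, KVM performs the emulation and then switches back into the Guest OS. If the device is emulated by Qemu, control passes from KVM to Qemu; once the device model in Qemu has finished the emulation, Qemu calls kvm_vcpu_ioctl(vcpu_fd, KVM_RUN, xxx) again to enter KVM, which then switches back into the Guest OS.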
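(Figure erratum: if KVM can emulate the access, then after emulating it control should return layer by layer up to kvm_x86_ops->run(vcpu) and only then switch into the guest OS, rather than switching in directly. The figure was already finished and is hard to change.)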
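The kernel-side control flow described above, including the erratum, can be modelled as a small loop: enter the guest, and on every exit either emulate in the kernel and loop back to the entry point, or return to user space. This is a simulation under stated assumptions, not KVM code: every name here (sim_vcpu, sim_enter_guest, sim_handle_exit, sim_kvm_run, and the three exit reasons) is hypothetical, and the guest is replaced by a scripted list of exits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit reasons for the simulation. */
enum sim_exit {
    SIM_EXIT_CPUID,            /* always emulated in the "kernel" */
    SIM_EXIT_IO_KERNEL_DEVICE, /* device model lives in the "kernel" */
    SIM_EXIT_IO_USER_DEVICE    /* device model lives in "Qemu" */
};

struct sim_vcpu {
    const enum sim_exit *script; /* scripted sequence of guest exits */
    size_t len;                  /* number of entries in script */
    size_t pos;                  /* next scripted exit to deliver */
    int guest_entries;           /* how many times the guest was entered */
};

/* Stand-in for kvm_x86_ops->run(vcpu): "runs" the guest until it exits. */
static enum sim_exit sim_enter_guest(struct sim_vcpu *v)
{
    v->guest_entries++;
    return v->script[v->pos++];
}

/* Stand-in for the exit handler: 1 means the kernel emulated the event and
 * the guest may resume, 0 means user space has to handle it. */
static int sim_handle_exit(enum sim_exit reason)
{
    return reason != SIM_EXIT_IO_USER_DEVICE;
}

/* Stand-in for the kernel side of KVM_RUN. Returns 0 when control must go
 * back to user space, -1 when the script runs out of exits. */
int sim_kvm_run(struct sim_vcpu *v)
{
    while (v->pos < v->len) {
        enum sim_exit reason = sim_enter_guest(v);

        if (!sim_handle_exit(reason))
            return 0; /* leave the loop: Qemu must emulate this one */
        /* Handled in the kernel: fall through to the top of the loop and
         * re-enter the guest through the same entry point. */
    }
    return -1;
}
```

Note that a kernel-handled exit never jumps straight into the guest; it always goes back through sim_enter_guest, which is the point the erratum makes about the figure.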
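Qemu is an ordinary application, so its entry point is of course main, although functions marked with type_init are registered by constructor code that runs before main. The code analyzed here emulates an x86 i440FX board. main calls kvm_init to create a VM (virtual machine), then calls the machine's hardware initialization functions, which set up emulation of PCI, memory, and so on. It then calls qemu_thread_create, which in turn calls pthread_create to create a thread; each VCPU runs on its own thread.

In the thread function qemu_kvm_cpu_thread_fn, Qemu calls kvm_init_vcpu to create a VCPU (virtual CPU) and then calls kvm_vcpu_ioctl with the KVM_RUN argument, which enters KVM. The first function executed inside KVM has the same name, kvm_vcpu_ioctl, and it eventually calls kvm_x86_ops->run() to enter the Guest OS.

If the Guest OS writes to a port, it executes an I/O instruction, which causes an exit from the Guest OS. KVM then calls kvm_x86_ops->handle_exit, which is in fact set to vmx_handle_exit and eventually calls kvm_vmx_exit_handlers[exit_reason](vcpu). kvm_vmx_exit_handlers is an array of function pointers indexed by exit reason, so the type of event selects the handler. Because this exit was caused by an ioport access, handle_io is chosen.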
static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
    [EXIT_REASON_EXCEPTION_NMI]           = handle_exception,
    [EXIT_REASON_EXTERNAL_INTERRUPT]      = handle_external_interrupt,
    [EXIT_REASON_TRIPLE_FAULT]            = handle_triple_fault,
    [EXIT_REASON_NMI_WINDOW]              = handle_nmi_window,
    [EXIT_REASON_IO_INSTRUCTION]          = handle_io,
    [EXIT_REASON_CR_ACCESS]               = handle_cr,
    [EXIT_REASON_DR_ACCESS]               = handle_dr,
    [EXIT_REASON_CPUID]                   = handle_cpuid,
    [EXIT_REASON_MSR_READ]                = handle_rdmsr,
    [EXIT_REASON_MSR_WRITE]               = handle_wrmsr,
    [EXIT_REASON_PENDING_INTERRUPT]       = handle_interrupt_window,
    [EXIT_REASON_HLT]                     = handle_halt,
    [EXIT_REASON_INVD]                    = handle_invd,
    [EXIT_REASON_INVLPG]                  = handle_invlpg,
    [EXIT_REASON_VMCALL]                  = handle_vmcall,
    [EXIT_REASON_VMCLEAR]                 = handle_vmclear,
    [EXIT_REASON_VMLAUNCH]                = handle_vmlaunch,
    [EXIT_REASON_VMPTRLD]                 = handle_vmptrld,
    [EXIT_REASON_VMPTRST]                 = handle_vmptrst,
    [EXIT_REASON_VMREAD]                  = handle_vmread,
    [EXIT_REASON_VMRESUME]                = handle_vmresume,
    [EXIT_REASON_VMWRITE]                 = handle_vmwrite,
    [EXIT_REASON_VMOFF]                   = handle_vmoff,
    [EXIT_REASON_VMON]                    = handle_vmon,
    [EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
    [EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
    [EXIT_REASON_WBINVD]                  = handle_wbinvd,
    [EXIT_REASON_XSETBV]                  = handle_xsetbv,
    [EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
    [EXIT_REASON_MCE_DURING_VMENTRY]      = handle_machine_check,
    [EXIT_REASON_EPT_VIOLATION]           = handle_ept_violation,
    [EXIT_REASON_EPT_MISCONFIG]           = handle_ept_misconfig,
    [EXIT_REASON_PAUSE_INSTRUCTION]       = handle_pause,
    [EXIT_REASON_MWAIT_INSTRUCTION]       = handle_invalid_op,
    [EXIT_REASON_MONITOR_INSTRUCTION]     = handle_invalid_op,
};
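The dispatch pattern used by kvm_vmx_exit_handlers, an array of function pointers filled in with C designated initializers and indexed by exit reason, can be tried outside the kernel. The following is a reduced sketch, not the kernel's code: demo_vcpu, the DEMO_EXIT_* values, the two handlers, and demo_dispatch are all hypothetical names chosen for illustration. It also shows why a lookup should guard against indexes beyond the table and against slots left NULL by the sparse initializer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vCPU state: just counters so the effect is observable. */
struct demo_vcpu {
    int io_exits;
    int halts;
};

/* Hypothetical exit reasons; the gaps leave NULL slots in the table. */
enum {
    DEMO_EXIT_HLT = 2,
    DEMO_EXIT_IO = 5
};

typedef int (*demo_exit_handler)(struct demo_vcpu *vcpu);

static int demo_handle_halt(struct demo_vcpu *vcpu)
{
    vcpu->halts++;
    return 1; /* handled */
}

static int demo_handle_io(struct demo_vcpu *vcpu)
{
    vcpu->io_exits++;
    return 1; /* handled */
}

/* Designated initializers: unnamed indexes are implicitly NULL. */
static const demo_exit_handler demo_exit_handlers[] = {
    [DEMO_EXIT_HLT] = demo_handle_halt,
    [DEMO_EXIT_IO]  = demo_handle_io,
};

#define DEMO_NR_HANDLERS \
    (sizeof(demo_exit_handlers) / sizeof(demo_exit_handlers[0]))

/* Look up and call the handler for `reason`; -1 if there is none. */
int demo_dispatch(struct demo_vcpu *vcpu, unsigned int reason)
{
    if (reason < DEMO_NR_HANDLERS && demo_exit_handlers[reason])
        return demo_exit_handlers[reason](vcpu);
    return -1;
}
```

Calling demo_dispatch(&vcpu, DEMO_EXIT_IO) runs demo_handle_io, while a reason with no entry, such as 3 or 100, yields -1 instead of a call through a NULL or out-of-range pointer.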
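If the handle_io function in KVM can handle the access, KVM switches back into the Guest OS once it has done so. If the access has to be emulated in Qemu, then after the KVM code finishes, control returns to Qemu, which calls its kvm_handle_io function. If that function can handle the access, Qemu calls kvm_vcpu_ioctl with KVM_RUN again to enter KVM; otherwise it exits with an error.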
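The user-space half of this loop can be sketched with the real struct kvm_run layout from <linux/kvm.h> but without a real guest. This is an illustrative model, not Qemu's kvm_handle_io: loop_fake_kvm_run is a hypothetical stand-in for ioctl(vcpu_fd, KVM_RUN, 0) that fills in the exit information the way KVM reports a port write, and loop_device, loop_handle_io, loop_cpu_loop, and LOOP_PORT are likewise assumptions made for the example.

```c
#include <assert.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LOOP_PORT 0x3f8 /* hypothetical port claimed by the device model */

/* Hypothetical device model: records the bytes written to LOOP_PORT. */
struct loop_device {
    char log[16];
    size_t len;
};

/* Returns 0 if the access was emulated, -1 if no device model claims it. */
static int loop_handle_io(struct loop_device *dev, const struct kvm_run *run)
{
    /* The I/O payload sits at data_offset bytes from the start of kvm_run. */
    const uint8_t *data = (const uint8_t *)run + run->io.data_offset;
    size_t n = (size_t)run->io.size * run->io.count;

    if (run->io.port != LOOP_PORT || run->io.direction != KVM_EXIT_IO_OUT)
        return -1;
    if (dev->len + n > sizeof(dev->log))
        return -1;
    memcpy(dev->log + dev->len, data, n);
    dev->len += n;
    return 0;
}

/* Stand-in for ioctl(vcpu_fd, KVM_RUN, 0): the "guest" writes 'O' and 'K'
 * to LOOP_PORT, one byte per exit, and then halts. */
static int loop_fake_kvm_run(struct kvm_run *run, int step)
{
    static const char text[] = "OK";

    if (step < 2) {
        run->exit_reason = KVM_EXIT_IO;
        run->io.direction = KVM_EXIT_IO_OUT;
        run->io.size = 1;
        run->io.port = LOOP_PORT;
        run->io.count = 1;
        run->io.data_offset = sizeof(*run);
        ((uint8_t *)run)[sizeof(*run)] = (uint8_t)text[step];
    } else {
        run->exit_reason = KVM_EXIT_HLT;
    }
    return 0;
}

/* Returns 0 when the guest halts, -1 on an exit nobody can handle. */
int loop_cpu_loop(struct loop_device *dev)
{
    /* Extra bytes after the struct hold the simulated I/O payload. */
    struct kvm_run *run = calloc(1, sizeof(*run) + 16);
    int rc = -1;

    if (!run)
        return -1;
    for (int step = 0; ; step++) {
        if (loop_fake_kvm_run(run, step) < 0)
            break;
        if (run->exit_reason == KVM_EXIT_IO) {
            if (loop_handle_io(dev, run) < 0)
                break;   /* cannot emulate: give up with an error */
            continue;    /* emulated: enter "KVM" again */
        }
        if (run->exit_reason == KVM_EXIT_HLT)
            rc = 0;
        break;
    }
    free(run);
    return rc;
}
```

With a zero-initialized struct loop_device, loop_cpu_loop re-enters the fake KVM after each handled port write and stops at the halt, leaving the two written bytes in the device log.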