Google Summer of Code 2017 Proposal

Organization is FreeBSD

Project Information


NVMe Controller Emulator for bhyve.

Peter Grehan

A hypervisor is an important technology for cloud computing, malware analysis, and so on. There are many hypervisors, for example Hyper-V[1], VirtualBox[2], and KVM[3]. FreeBSD has one too, called bhyve[4]. bhyve is a type 2 hypervisor for FreeBSD; part of it lives in the FreeBSD kernel as a kernel module (vmm.ko) and uses hardware virtualization technology (Intel VT-x, AMD-V). bhyve was announced by NetApp[5] Inc. at BSDCan 2011.

 Currently, bhyve has several methods for accessing disks. The fastest is PCI pass-through, because it removes the virtualization overhead. However, with this method the disk is occupied exclusively by one guest OS; when several guest OSs must share the same disk, PCI pass-through is not available(*1). The same is true when a file is used as the disk image. To solve this problem, bhyve has disk controller emulators, and we can select virtio or AHCI. Virtio is a paravirtualized driver interface originally from Linux KVM. If the guest OS has a virtio driver, this is the better solution, although Windows usually does not ship one(*2). The other method is AHCI, which Intel defined for the Serial ATA bus. Because AHCI was designed for HDDs, it requires many more MMIO accesses than virtio or NVMe[6].

  This project implements a new disk controller emulator for bhyve: NVMe. NVMe is defined and optimized for non-volatile storage media attached via the PCIe bus. Its advantages are deep submission/completion queues, 64-byte commands that carry everything needed to perform an I/O, a doorbell interface that needs only a few register writes per command, etc.

The small number of register operations is what matters for the performance of controller emulation: it reduces the MMIO accesses coming from the guest OS. MMIO emulation is the main overhead of disk access, so if a guest OS uses the NVMe controller emulation, performance can improve. That is why I want to try this project.

The goal is that guest OSs (FreeBSD, Linux, Windows) can transfer data via the emulated NVMe controller on bhyve.

[Figure: nvme_command_processing.jpg]

The source of this image is section 7.2.1 "Command Processing" in the NVMe specification[6]. The image shows the flow of command processing. Before this flow can run, the host has to prepare the Admin Submission Queue and Admin Completion Queue and register them in the ASQ and ACQ fields of the Controller Registers; in addition, the host prepares the I/O Submission and Completion Queues by dispatching the Create I/O Submission Queue and Create I/O Completion Queue admin commands.

I have read the bhyve and FreeBSD code, and I try to contribute bug fixes, testing, and new functionality. I have made Plan 9 and some other OSs run on bhyve.

An AHCI controller on PCI is already implemented in bhyve, so I can use its code (pci_ahci.c) as a reference for my project.

  1. Dummy PCIe controller

bhyve already has a dummy PCI controller, and I will extend that code. struct pci_devemu, the core of a PCI device emulator, is defined in pci_emul.h. I have to set it up as below:

"""
struct pci_devemu pci_de_nvme = {
        .pe_emu = "nvme",
        .pe_init = pci_nvme_init,
        .pe_barwrite = pci_nvme_write,
        .pe_barread = pci_nvme_read,
};
PCI_EMUL_SET(pci_de_nvme);
"""

bhyve has several runtime options; "-s <slot,driver,configinfo>" configures a PCI slot. Internally, bhyve finds the device emulator by comparing the "driver" string with .pe_emu.
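For example, assuming the emulator takes the path of a disk image as its configinfo (this format is my assumption, mirroring the existing ahci-hd and virtio-blk emulators; the slot numbers and VM name are placeholders), a guest could be started like:

"""
bhyve -c 1 -m 1024 -s 0,hostbridge -s 3,nvme,/path/to/disk.img -s 31,lpc -l com1,stdio vm0
"""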

The Controller Registers have many fields. I'll implement them as in the code below:

"""
struct pci_nvme_softc {
        uint64_t cap;
        uint32_t vs;
        uint32_t intms;
        ...
        struct qpair_doorbell *queue_doorbells;
        struct qpair *admin_qpair;
        ...
};

static uint64_t
pci_nvme_read(struct pci_nvme_softc *sc, uint64_t offset)
{
        switch (offset) {
        case NVME_CAP:
                /* read the value from sc and return it */
                return (sc->cap);
        ...
        }
}
"""

  2. Dummy NVMe controller

The PCI ID 0x01118086 (device ID 0x0111, vendor ID 0x8086) is hard-coded in the FreeBSD nvme driver, so the dummy controller should advertise this ID.
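A minimal sketch of the init function under these assumptions (the BAR size below is a placeholder; the real size must cover the Controller Registers plus the doorbell area):

"""
static int
pci_nvme_init(struct vmctx *ctx, struct pci_devinst *pi, char *opts)
{
        struct pci_nvme_softc *sc;

        sc = calloc(1, sizeof(struct pci_nvme_softc));
        pi->pi_arg = sc;

        /* IDs the FreeBSD nvme driver probes for: device 0x0111, vendor 0x8086 */
        pci_set_cfgdata16(pi, PCIR_DEVICE, 0x0111);
        pci_set_cfgdata16(pi, PCIR_VENDOR, 0x8086);
        pci_set_cfgdata8(pi, PCIR_CLASS, PCIC_STORAGE);
        pci_set_cfgdata8(pi, PCIR_SUBCLASS, PCIS_STORAGE_NVM);

        /* BAR0: Controller Registers followed by the doorbells (placeholder size) */
        pci_emul_alloc_bar(pi, 0, PCIBAR_MEM64, 0x4000);

        return (0);
}
"""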

test: The FreeBSD guest runs and can detect the NVMe controller. (FreeBSD probably cannot complete initialization yet, because the emulator does not implement the Controller Registers, commands, interrupts, and so on.)

  3. NVMe Controller Register

The FreeBSD NVMe driver source code is in sys/dev/nvme/*. The nvme_ctrlr_construct() function in nvme_ctrlr.c initializes the controller. I'll read this code together with the NVMe specification and implement the controller emulator accordingly. For example, the driver does not accept a CAP (Controller Capabilities) register whose DSTRD (Doorbell Stride) field is not 0; after checking CAP, it creates the admin submission and completion queues and programs the other Controller Register fields.
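For example, the register values could be set once at reset. The shifts below follow the CAP layout in the specification; the concrete values (64-entry queues, NVM command set, 4 KiB minimum page size) are my assumptions:

"""
static void
pci_nvme_reset_regs(struct pci_nvme_softc *sc)
{
        sc->cap = (63ULL << 0) |        /* MQES: up to 64 queue entries (0's based) */
            (1ULL << 16) |              /* CQR: physically contiguous queues required */
            (0x20ULL << 24) |           /* TO: ready timeout, in 500 ms units */
            (0ULL << 32) |              /* DSTRD: 0, as the FreeBSD driver requires */
            (1ULL << 37) |              /* CSS: NVM command set supported */
            (0ULL << 48);               /* MPSMIN: 4 KiB minimum memory page size */
        sc->vs = 0x00010201;            /* VS: NVMe version 1.2.1 */
        sc->intms = 0;
}
"""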

test: Run FreeBSD as a guest and let it finish initialization (not including reading and writing data).

  4. NVMe commands

Commands are 64 bytes in size. The format is defined in Figure 11 of the NVMe specification[6].
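Expressed in C, the 64-byte submission queue entry looks like this (it mirrors struct nvme_command in the FreeBSD driver's sys/dev/nvme/nvme.h, as I read it; the emulator can define an equivalent layout):

"""
struct nvme_command {
        /* dword 0 */
        uint16_t opc : 8;       /* opcode */
        uint16_t fuse : 2;      /* fused operation */
        uint16_t rsvd1 : 6;
        uint16_t cid;           /* command identifier */
        /* dword 1 */
        uint32_t nsid;          /* namespace identifier */
        /* dwords 2-3 */
        uint32_t rsvd2;
        uint32_t rsvd3;
        /* dwords 4-5 */
        uint64_t mptr;          /* metadata pointer */
        /* dwords 6-9 */
        uint64_t prp1;          /* PRP entry 1: data buffer address */
        uint64_t prp2;          /* PRP entry 2 */
        /* dwords 10-15: command specific */
        uint32_t cdw10;
        uint32_t cdw11;
        uint32_t cdw12;
        uint32_t cdw13;
        uint32_t cdw14;
        uint32_t cdw15;
};
"""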

The controller implementation is sketched below:

"""
static void
pci_nvme_write(struct pci_nvme_softc *sc, uint64_t offset, uint64_t value)
{
        ...
        switch (offset) {
        ...
        case NVME_DOORBELL:
                /* TODO:
                 *  - read commands from the submission queue
                 *  - execute each command
                 *  - write an entry to the completion queue
                 */
                break;
        ...
        }
        ...
}
"""

When the guest writes a new tail pointer to a doorbell register, the controller performs these tasks.
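A sketch of the submission-queue side, assuming a hypothetical struct qpair that caches the queue's guest base address, size, and head index, a hypothetical pci_nvme_execute_cmd(), and a struct vmctx pointer saved in the softc at init time; paddr_guest2host() is the existing bhyve helper (vmmapi.h) for mapping guest physical memory:

"""
static void
pci_nvme_sq_doorbell(struct pci_nvme_softc *sc, struct qpair *sq,
    uint16_t new_tail)
{
        struct nvme_command *cmd;

        while (sq->head != new_tail) {
                /* fetch the 64-byte command from guest memory */
                cmd = paddr_guest2host(sc->ctx,
                    sq->base + sq->head * sizeof(struct nvme_command),
                    sizeof(struct nvme_command));

                /* execute it, post a completion entry, assert an interrupt */
                pci_nvme_execute_cmd(sc, sq, cmd);

                sq->head = (sq->head + 1) % sq->size;
        }
}
"""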

  5. The smallest command set

     a. Create I/O Submission Queue (admin command)

This command sets the queue's base address in PRP Entry 1, and the queue identifier, queue size, priority, completion queue binding, and physically-contiguous flag in CDW10/CDW11.

     b. Create I/O Completion Queue (admin command)

This command is almost the same as Create I/O Submission Queue, except that it binds the queue to an interrupt vector instead of a completion queue.

     c. Read & Write (NVM command)

These commands are what FreeBSD needs to actually access the disk; see the sketch below, after the test item. (I have to survey the FreeBSD NVMe driver.)

test: Run a FreeBSD guest with NVMe.
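As a sketch of item (c): a Read command (opcode 0x02) carries the starting LBA in CDW10/CDW11 and a 0's-based block count in CDW12 bits 15:0, with the data buffer described by the PRP entries. The backing file descriptor sc->bfd, the block size field, and the single-PRP-entry simplification are all my assumptions; a real implementation should reuse bhyve's blockif code for asynchronous I/O, as pci_ahci.c does:

"""
static int
pci_nvme_read_cmd(struct pci_nvme_softc *sc, struct nvme_command *cmd)
{
        uint64_t slba = ((uint64_t)cmd->cdw11 << 32) | cmd->cdw10;
        uint32_t nblocks = (cmd->cdw12 & 0xffff) + 1;   /* NLB is 0's based */
        size_t len = (size_t)nblocks * sc->blocksize;
        void *buf;

        /* map the guest data buffer named by PRP Entry 1 */
        buf = paddr_guest2host(sc->ctx, cmd->prp1, len);
        if (buf == NULL)
                return (-1);

        /* copy from the backing image into guest memory */
        if (pread(sc->bfd, buf, len, slba * sc->blocksize) != (ssize_t)len)
                return (-1);

        return (0);
}
"""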

  6. Other commands needed for Linux and Windows

The implementation is a loop of tasks 3 and 4 for each additional command.

Community Bonding

        Guest FreeBSD can detect my NVMe controller.

        Guest FreeBSD can transfer data via my controller.

        Guest OSs (Linux, FreeBSD, Windows) can transfer data via the implemented controller.

References


[1] Hyper-V overview, https://msdn.microsoft.com/en-us/library/hh831531(v=ws.11).aspx

[2] Oracle VM VirtualBox web page, https://www.virtualbox.org/

[3] Linux KVM web page, https://www.linux-kvm.org/page/Main_Page

[4] bhyve web page, http://bhyve.org/

[5] NetApp web page, http://www.netapp.com/us/index.aspx

[6] NVMe specification, http://www.nvmexpress.org/wp-content/uploads/NVM_Express_1_2_1_Gold_20160603.pdf

Note


*1: SR-IOV (Single Root I/O Virtualization) addresses this problem. https://msdn.microsoft.com/windows/hardware/drivers/network/single-root-i-o-virtualization--sr-iov-

*2: There is a Windows virtio driver project, but it is maintained outside Windows itself: https://fedoraproject.org/wiki/Windows_Virtio_Drivers