Showing posts with the label qemu.

Saturday, 18 April 2015

[qemu] Spice your qemu up

Spice is a free and open-source alternative to the VNC protocol. It is generally faster than VNC and supports encrypted connections. It lets you view and control the display of a virtual machine from a remote viewer.

In the host:
apt-get install virt-viewer
This installs remote-viewer. Don't use spicec, as it is deprecated.

In the guest VM:
apt-get install spice-vdagent
Now let's boot our VM using spice for the display.
qemu-system-x86_64 -drive file=debian_bootstrap2.qcow2,cache=none,if=virtio \
--enable-kvm \
-net nic,model=virtio,macaddr=52:54:00:12:34:58 \
-net vde,sock=/tmp/switch1.ctl -cpu host \
-smp 4 -m 2G \
-vga qxl -spice port=5900,addr=127.0.0.1,disable-ticketing \
-device virtio-serial-pci \
-device virtserialport,chardev=spicechannel0,name=com.redhat.spice.0 \
-chardev spicevmc,id=spicechannel0,name=vdagent
There is a lot to process here. All the options before -vga are my usual ones: they enable networking through vde (see my post on creating a subnet of virtual machines) and use virtio for everything.

The -smp 4 option gives the guest 4 virtual CPUs.

I use -cpu host because I don't want any CPU feature to disappear in the guest. It might also be faster.

-vga qxl is the recommended video device for use with the spice protocol.

The -spice option configures the spice server integrated into Qemu. Here we specify the port and the address the server listens on. As this is a local setup, we don't need password authentication, so I disabled ticketing.

If your VM is on a server, you might want to reconsider this choice, as anyone can hijack the connection if you are not careful. Hackers are hardcore people: they are waiting for you on the internet, wearing ski masks and typing in the most awkward positions. So encrypt your communications if you don't want to be a victim of this kind of people.






I will describe the use of spice with certificates in another article as you need to set up a CA to get it working.

With these options, you can already use the spice protocol to get the display of your virtual machine. In fact, you don't need to install spice-vdagent for that.

But you can get more: copy and paste between the host and the VM, and automatic resizing of the X display.

The -device virtio-serial-pci option adds the bus into which all virtserialport devices are plugged, so you need it before you can declare a virtserialport device.

The -device virtserialport,chardev=spicechannel0,name=com.redhat.spice.0 option is the actual port plugged into the virtio serial bus. Its name lets the guest agent find it and connect to it.

The -chardev spicevmc,id=spicechannel0,name=vdagent option is the character device to which the spice client connects in order to communicate with the guest agent over the virtio serial bus.

Now you can connect to your VM using:
remote-viewer spice://127.0.0.1:5900
Have fun :)

Links

http://www.linux-kvm.org/page/SPICE
https://wiki.archlinux.org/index.php/QEMU#Spice_support

Sunday, 12 April 2015

[qemu] Create a complete subnet of virtual machines

Let's create a virtual network with qemu/kvm and vde :)

First, create a virtual switch where you will plug your virtual machines.

vde_switch -s /tmp/switch1.ctl -m 666 --daemon --mgmt /tmp/switch1.mgmt 

The -m option sets the access mode of the switch's sockets (here 666) so that every user can use the emulated switch. This is useful if you declare a switch with a link to a tap device.

Without the --daemon option, vde_switch doesn't go into the background and gives us a shell to configure its internals. This shell has no history or readline conveniences, so a better option is to combine --daemon with --mgmt, which provides a socket where the virtual switch listens for configuration input.
 
vdeterm /tmp/switch1.mgmt 

This command will give you a fully fledged shell. (You know you deserve it)

Finally, the -s option is the most important, as it names the control socket you have to use to connect something to this virtual switch.

Congratulations, there is now an empty emulated switch running on your computer!

You can connect a vm to this switch with the following command:
qemu-system-x86_64 -drive file=your_image.qcow2,if=virtio --enable-kvm -net nic,model=virtio,macaddr=52:54:00:12:34:57 -net vde,sock=/tmp/switch1.ctl -cpu host -smp 4 -m 1G -vga qxl

The important thing here is the -net vde option.
Your VM should now be registered with the vde_switch.

You can check this with the port/print command in vdeterm, which lists the ports of the switch and what is connected to them.

The VMs you plug like this will be able to communicate with each other.

To give the VMs internet access, you can set up a NAT. This can be done using slirp.

slirpvde -s /tmp/switch1.ctl --dhcp

That's it: it handles DNS, DHCP, you name it :)
However, with this setup you won't be able to access the VMs from the host. If you try to add a tap interface to the switch, it will get its DHCP and DNS setup, including a default route, from slirpvde. This conflicts with the NAT, and IP forwarding won't work because the routes are not correct.

You can prevent that by not using the DHCP server at all with the tap interface. You can also use the tap interface as the gateway for the VMs and enable IP forwarding and masquerade NAT rules instead of using the slirp tool (I will describe this in another post).


Sources: 

  • http://wiki.v2.cs.unibo.it/wiki/index.php/VDE_Basic_Networking
  • https://wiki.archlinux.org/index.php/QEMU#Networking_with_VDE2
  • http://wiki.qemu.org/Documentation/Networkin
  • https://xkcd.com/350/

Monday, 9 February 2015

[Trivia] Debugging Qemu



Intro

The first time I tried debugging Qemu, I had some problems with signals, serials, etc.

Debug

configure

Configure Qemu with debugging enabled:
./configure --enable-debug --enable-fdt --target-list=arm-softmmu

 

signals

Add the following line to your .gdbinit:
handle SIGUSR1 SIGUSR2 noprint nostop 

Qemu uses those signals for timeouts and internal stuff, so we don't need to track them.

 

serials

Then I had problems with serial lines in gdb. Reading the serial worked well, but I couldn't write to it...

My workaround is to call:

socat -d -d pty pty 

You get two connected ptys (socat prints their names).
Call screen on the second one to be able to write to the serial port.

References

[0] http://lnotestoself.blogspot.fr/2014/01/arm-debugging-with-qemu-and-gdb.html

Saturday, 10 January 2015

[System] Emulate a PCI device with Qemu


I wanted to learn more about Linux drivers, but in order to write your own driver, you need to own the device you want to drive and if it doesn't exist (or if it's not stable yet), you are basically screwed. But do not panic, Qemu is here for you. You just need to create the device.

I will cover the following things:
  • A PCI device supporting interrupts, MMIO and PIO regions, DMA
  • A Linux PCI driver supporting this device (kernel 3.2) 
The code of the device can be found at https://github.com/grandemk/qemu_devices/blob/master/hello_tic.c

The code of the driver can be found at https://github.com/grandemk/qemu_devices/blob/master/driver_pci.c

The Qemu PCI Device

 

Let's walk through the code of a Qemu device (see [1] for a good explanation).



The first thing Qemu does is register our device inside its core. This is done with the type_init macro, whose function runs before Qemu's main. What's important here is that the TypeInfo struct representing our device is passed along.
static const TypeInfo pci_hello_info = {
    .name           = TYPE_PCI_HELLO_DEV,
    .parent         = TYPE_PCI_DEVICE,
    .instance_size  = sizeof(PCIHelloDevState),
    .class_init     = pci_hellodev_class_init,
};
As our device is a PCI peripheral, we extend the PCIDevice struct, so name, parent and instance_size are there for inheritance purposes and are implementation details.

The exciting thing here is the class_init function. It is called when the device type is initialized. Its goal is to let you override the virtual methods of PCIDeviceClass with those of your device and to set the PCI configuration bytes which represent your device.

The PCI configuration space is used by the BIOS, the bootloader or the kernel, depending on your configuration, to ease the probing of devices. It declares to the world how your device works and where to write in order to communicate with it.



static void pci_hellodev_class_init(ObjectClass *klass, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(klass);
    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
    k->init = pci_hellodev_init;
    k->exit = pci_hellodev_uninit;
    /* this identify our device */
    k->vendor_id = 0x1337;
    k->device_id = 0x0001;
    k->class_id = PCI_CLASS_OTHERS;
    set_bit(DEVICE_CATEGORY_MISC, dc->categories);

    k->revision  = 0x00;
    dc->desc = "PCI Hello World";
    /* qemu user things */
    dc->props = hello_properties;
    dc->reset = qdev_pci_hellodev_reset;
}
The vendor_id and device_id fields identify the PCI device. The Linux PCI code uses them to decide which driver is chosen to operate your device. PCIDeviceClass inherits from DeviceClass, which is the base class used to represent a device and the functions it provides to the Qemu core.
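To make these configuration bytes concrete, here is a small user-space sketch (it is not Qemu code) of where the identification fields end up in the standard PCI configuration header. The offsets are the standard ones; the helper names cfg_set_word, cfg_get_word and fill_hello_header are mine and only exist for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard offsets in the PCI configuration header. */
enum {
    CFG_VENDOR_ID   = 0x00, /* 16 bits */
    CFG_DEVICE_ID   = 0x02, /* 16 bits */
    CFG_REVISION_ID = 0x08, /*  8 bits */
    CFG_SPACE_SIZE  = 256
};

/* The configuration space is little-endian, whatever the host is. */
static void cfg_set_word(uint8_t *cfg, unsigned off, uint16_t val)
{
    assert(off + 1 < CFG_SPACE_SIZE);
    cfg[off]     = (uint8_t)(val & 0xff);
    cfg[off + 1] = (uint8_t)(val >> 8);
}

static uint16_t cfg_get_word(const uint8_t *cfg, unsigned off)
{
    assert(off + 1 < CFG_SPACE_SIZE);
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

/* Hypothetical helper: fill a blank header with the ids of our device. */
static void fill_hello_header(uint8_t cfg[CFG_SPACE_SIZE])
{
    memset(cfg, 0, CFG_SPACE_SIZE);
    cfg_set_word(cfg, CFG_VENDOR_ID, 0x1337);
    cfg_set_word(cfg, CFG_DEVICE_ID, 0x0001);
    cfg[CFG_REVISION_ID] = 0x00;
}
```

A driver (or lspci) reading the two words at offsets 0x00 and 0x02 gets back exactly the vendor_id and device_id values set in class_init.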

Init, exit and reset are the basic functions you will need to write for your device. Init is called when your device is plugged in and exit when it is unplugged.

Exit and reset aren't really important functions for us. Exit is basically used to free the memory of your device. Init is the function you want to look at.
static int pci_hellodev_init(PCIDevice *pci_dev)
{
    /* init the internal state of the device */
    PCIHelloDevState *d = PCI_HELLO_DEV(pci_dev);
    printf("d=%p\n", (void *) d);
    d->dma_size = 0x1ffff * sizeof(char);
    d->dma_buf = malloc(d->dma_size);
    d->id = 0x1337;
    d->threw_irq = 0;
    uint8_t *pci_conf;

    /* create the memory region representing the MMIO and PIO
     * of the device
     */

    memory_region_init_io(&d->mmio, OBJECT(d), &hello_mmio_ops, d, "hello_mmio", HELLO_MMIO_SIZE);
    memory_region_init_io(&d->io, OBJECT(d), &hello_io_ops, d, "hello_io", HELLO_IO_SIZE);

    /*
     * See linux device driver (Edition 3) for the definition of a bar
     * in the PCI bus.
     */
    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->io);
    pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, &d->mmio);

    pci_conf = pci_dev->config;

    /* also in ldd, a pci device has 4 pins for interrupts
     * here we use pin B.
     */
    pci_conf[PCI_INTERRUPT_PIN] = 0x02;

    /* this device support interrupt */
    // d->irq = pci_allocate_irq(pci_dev);

    printf("Hello World loaded\n");
    return 0;
}

You can see a cast to PCIHelloDevState at the beginning. This is our own extension of the PCIDevice class, in which we store the state of our device. So the first lines just initialize the internal state of the dummy PCI device we are writing.

pci_conf[] represents the configuration space of your PCI device. We just need to specify that our device supports interrupts on some pin (here pin B). The Linux kernel PCI bus driver will read this information for us.

You can check it with the command lspci -v. If you want more information, you can also look at the /sys filesystem. For more details, see the PCI chapter in [0].
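As a quick reminder of how the pin is encoded, here is a tiny user-space sketch. The encoding (0 for no pin, 1 to 4 for INTA# to INTD#) is the standard one; the function names pin_to_letter and letter_to_pin are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt Pin register (offset 0x3d of the configuration space):
 * 0 means "no interrupt pin", 1 to 4 mean INTA# to INTD#. */
static char pin_to_letter(uint8_t pin)
{
    if (pin < 1 || pin > 4)
        return '-';
    return (char)('A' + (pin - 1));
}

/* Inverse mapping, for a letter between 'A' and 'D'. */
static uint8_t letter_to_pin(char letter)
{
    assert(letter >= 'A' && letter <= 'D');
    return (uint8_t)(letter - 'A' + 1);
}
```

With this encoding, the 0x02 written into pci_conf[PCI_INTERRUPT_PIN] corresponds to pin B.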

Qemu represents MMIO and PIO regions as MemoryRegion structs. When the driver accesses a memory region, the callbacks we registered with memory_region_init_io are called.

static const MemoryRegionOps hello_mmio_ops = {
    .read = hello_mmioread,
    .write = hello_mmiowrite,
    .endianness = DEVICE_NATIVE_ENDIAN,
    .valid = {
        .min_access_size = 4,
        .max_access_size = 4,
    },
};

If the CPU tries to read the MemoryRegion, the read callback is called. One more step is necessary for our MemoryRegion to be accessible from outside the device: we have to register it in a PCI BAR.

A PCI device implements up to six I/O address regions. Each region consists of either memory or I/O locations. A BAR (base address register) is the PCI word for these regions. We call pci_register_bar for this purpose.
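To see what "either memory or I/O" means at the register level, here is a user-space sketch that decodes the raw value of a 32-bit BAR. The bit layout is the standard one, but struct bar_info, decode_bar and bar_config_offset are names I invented, and the sketch ignores 64-bit memory BARs (which span two registers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one 32-bit Base Address Register. */
struct bar_info {
    bool     is_io;        /* bit 0: 1 = I/O space, 0 = memory space */
    bool     prefetchable; /* bit 3, memory BARs only */
    uint32_t base;         /* address with the flag bits masked out */
};

static struct bar_info decode_bar(uint32_t raw)
{
    struct bar_info info = { false, false, 0 };

    if (raw & 0x1) {
        info.is_io = true;
        info.base  = raw & ~UINT32_C(0x3); /* I/O BARs keep 2 flag bits */
    } else {
        info.prefetchable = (raw & 0x8) != 0;
        info.base  = raw & ~UINT32_C(0xf); /* memory BARs keep 4 flag bits */
    }
    return info;
}

/* The six BARs live at offsets 0x10, 0x14, ... 0x24 of the header. */
static unsigned bar_config_offset(unsigned index)
{
    assert(index < 6);
    return 0x10 + 4 * index;
}
```

In our device, BAR 0 would decode as an I/O region and BAR 1 as a memory region, which is what the two pci_register_bar calls ask for.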

You can request an irq with pci_allocate_irq and get an irq object, or you can use the pci functions for irqs.

static void hello_iowrite(void *opaque, hwaddr addr, uint64_t value, unsigned size)
{
    int i;
    PCIHelloDevState *d = (PCIHelloDevState *) opaque;
    PCIDevice *pci_dev = (PCIDevice *) opaque;
    printf("Write Ordered, addr=%x, value=%" PRIu64 ", size=%u\n", (unsigned) addr, value, size);
    switch (addr) {
        case 0:
            if (value) {
                /* throw an interrupt */
                printf("irq assert\n");
                d->threw_irq = 1;
                pci_irq_assert(pci_dev);
            } else {
                /* ack interrupt */
                printf("irq deassert\n");
                pci_irq_deassert(pci_dev);
                d->threw_irq = 0;
            }
            break;
        case 4:
            /* throw a random DMA */
            for ( i = 0; i < d->dma_size; ++i)
                d->dma_buf[i] = rand();
            cpu_physical_memory_write(0xa0000, (void *) d->dma_buf, d->dma_size); 
            break;
        default:
            printf("Io not used\n");
    }
}

The iowrite and mmiowrite functions have the same prototype. opaque is our PCIHelloDevState structure, addr is the offset the driver accessed inside the region, value is what was written, and size is the width of the write: 1, 2 or 4 bytes. ioread and mmioread follow the same principles.

You can use pci_irq_assert and pci_irq_deassert to generate an interrupt. It's very easy.

You can use cpu_physical_memory_write to access the physical memory directly. This simulates a DMA but bypasses the IOMMU system of Qemu. Its use is now deprecated, but if you want to simulate a machine without an IOMMU, it is the fastest way.

I will cover the IOMMU DMA in an update of this article.

Linux PCI driver


The code of the driver can be found at https://github.com/grandemk/qemu_devices/blob/master/driver_pci.c

Most of the time, to understand a driver, you need to begin from the end of the file. So let's begin with the last lines.

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("hello world");
MODULE_AUTHOR("Kevin Grandemange");

These macros describe the module. The license is not a trivial thing, because some kernel APIs are only available to GPL-licensed drivers via EXPORT_SYMBOL_GPL. Moreover, non-GPL drivers are not likely to be accepted into the mainline branch of Linux.


module_pci_driver(hello_tic);
MODULE_DEVICE_TABLE(pci, hello_tic_ids);

There are two types of drivers in Linux: bus drivers and device drivers. What we commonly call drivers are the device drivers. A bus driver provides an API that device drivers use to communicate with the hardware over a specific bus, and PCI is one such bus. So here, module_pci_driver registers our driver with the bus driver that handles all the low-level pci stuff. MODULE_DEVICE_TABLE exports the table of devices we are able to drive, so that the module can be loaded automatically when one of them shows up. When such a device is found, our driver is called to handle the communication.

/* vendor and device (+ subdevice and subvendor) 
 * identifies a device we support
 */
static struct pci_device_id hello_tic_ids[] = {
    { PCI_DEVICE(PCI_TIC_VENDOR, PCI_TIC_DEVICE) },
    { 0, }, /* sentinel */
};
/* id_table describe the device this driver support
 * probe is called when a device we support exist and
 * when we are chosen to drive it.
 * remove is called when the driver is unloaded or
 * when the device disappears
 */
static struct pci_driver hello_tic = {
    .name = "hello_tic",
    .id_table = hello_tic_ids,
    .probe = hello_tic_probe,
    .remove = hello_tic_remove,
};

A driver is a service-based program. The pci_driver struct we give to the pci bus driver is the set of functions we support. Probe is called when the pci bus driver finds your device (PCI is really easy to handle: you don't need to search for your device, the BIOS or its equivalent has already set everything in its right place).

This function is used to read the pci configuration of the device and to request all the resources your driver will need (interrupts, virtual addresses for the mmio zone, etc.).

Remove is the opposite of probe and is called when the device disappears. The id table is a list of structs representing the devices we support. Devices are identified by their vendor id and device id.

You can use the PCI_DEVICE macro to ease the construction of the structure. You can also specify the subvendor and subdevice fields for more precision, but we don't need that here.
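To give an idea of what the bus driver does with the id table, here is a simplified user-space sketch of the matching. It is not the kernel's code: struct toy_pci_id only keeps two of the fields of struct pci_device_id, the names toy_ids and toy_match are mine, and I assume that PCI_TIC_VENDOR and PCI_TIC_DEVICE are the 0x1337 and 0x0001 values set in the Qemu device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct pci_device_id (the real one also has
 * subvendor, subdevice, class, class_mask and driver_data fields). */
struct toy_pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* The table ends with an all-zero sentinel entry, as in the driver. */
static const struct toy_pci_id toy_ids[] = {
    { 0x1337, 0x0001 },
    { 0, 0 }, /* sentinel */
};

/* Return the matching entry, or NULL if this vendor/device pair
 * is not supported by the driver. */
static const struct toy_pci_id *toy_match(const struct toy_pci_id *table,
                                          uint16_t vendor, uint16_t device)
{
    assert(table != NULL);
    for (; table->vendor != 0; table++) {
        if (table->vendor == vendor && table->device == device)
            return table;
    }
    return NULL;
}
```

When a match is found, the real bus driver calls probe with the device and the matching entry, which is why probe receives a const struct pci_device_id pointer.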

static int hello_tic_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
    struct tic_info *info;
    info = kzalloc(sizeof(struct tic_info), GFP_KERNEL);
    if (!info)
        return -ENOMEM;

    if (pci_enable_device(dev))
        goto out_free;

    pr_alert("enabled device\n");

    if (pci_request_regions(dev, "hello_tic"))
        goto out_disable;

    pr_alert("requested regions\n");

    /* BAR 0 has IO */
    info->port[0].name = "tic-io";
    info->port[0].start = pci_resource_start(dev, 0);
    info->port[0].size = pci_resource_len(dev, 0);

    /* BAR 1 has MMIO */
    info->mem[0].name = "tic-mmio";
    info->mem[0].start = pci_ioremap_bar(dev, 1);
    info->mem[0].size = pci_resource_len(dev, 1);

    if (!info->mem[0].start)
        goto out_unrequest;   

    pr_alert("remapped addr for kernel uses\n");
    /* get device irq number */
    if (pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &info->irq))
        goto out_iounmap;
    /* request irq */
    if (devm_request_irq(&dev->dev, info->irq, hello_tic_handler, IRQF_SHARED, hello_tic.name, (void *) info))
        goto out_iounmap;

    /* get a mmio reg value and change it */
    pr_alert("device id=%x\n", ioread32(info->mem[0].start + 4));
    iowrite32(0x4567, info->mem[0].start + 4);
    pr_alert("modified device id=%x\n", ioread32(info->mem[0].start + 4));

    /* assert an irq */
    outb(1, info->port[0].start);
    /* try dma without iommu */
    outl(1, info->port[0].start + 4);

    pci_set_drvdata(dev, info);

    return 0;

out_iounmap:
    pr_alert("tic:probe_out:iounmap\n");
    iounmap(info->mem[0].start);
out_unrequest:
    pr_alert("tic:probe_out_unrequest\n");
    pci_release_regions(dev);
out_disable:
    pr_alert("tic:probe_out_disable\n");
    pci_disable_device(dev);
out_free:
    pr_alert("tic:probe_out_free\n");
    kfree(info);
    return -ENODEV;
}


The first thing to do in a pci driver is to call the two functions pci_enable_device and pci_request_regions.

The first one asks the low-level code to enable I/O and memory, and the second one reserves all these I/O and memory regions for our driver. Both functions must succeed before you can continue.
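The goto labels at the end of probe implement the usual "undo in reverse order" pattern. Here is a minimal user-space sketch of that pattern alone. Nothing in it is kernel API: struct toy_state, toy_probe and the fail_at parameter are hypothetical and only serve to exercise each error path.

```c
#include <assert.h>
#include <stddef.h>

/* Flags standing in for "device enabled", "regions requested"
 * and "BAR mapped". */
struct toy_state {
    int enabled;
    int regions;
    int mapped;
};

/* fail_at selects which step fails (0 = none). */
static int toy_probe(struct toy_state *s, int fail_at)
{
    assert(s != NULL);

    if (fail_at == 1)
        goto out;
    s->enabled = 1;

    if (fail_at == 2)
        goto out_disable;
    s->regions = 1;

    if (fail_at == 3)
        goto out_release;
    s->mapped = 1;

    return 0;

    /* Each label undoes one step, then falls through to the next. */
out_release:
    s->regions = 0;
out_disable:
    s->enabled = 0;
out:
    return -1;
}
```

Because the labels fall through, jumping to the one matching the last successful step releases exactly what has been acquired so far, and nothing more.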

We then get the PIO base with pci_resource_start on BAR 0 (remember how we declared those in the qemu device).

The mmio part is trickier because the CPU only uses virtual addresses, so we need to remap the physical io address. This is done with the ioremap function. Here the pci_ioremap_bar helper does everything you need in one call.

In order to know which interrupt our device has been assigned, we use pci_read_config_byte to read where this information is stored (the kernel also exposes the interrupt number it actually uses in dev->irq). After that, we request the interrupt. This registers our interrupt handler with the kernel's interrupt handling code.

After this call, if the interrupt fires, all the interrupt handlers linked to it are called.

One important thing to notice is that the interrupt you are using can be shared. This means you need to verify by some means that your device really triggered this interrupt. If it didn't, you tell the kernel that you're not interested in this interrupt.

If you need some locking mechanism, better initialize it before calling this function, because after that your interrupt handler can be called whenever the device feels like it.

The devm part of devm_request_irq means that this function is device managed and that the irq will automagically be freed when your driver is detached from the device.

You can play with the remapped mmio memory using the functions iowrite{8,16,32} and ioread{8,16,32}, and talk to the ports of the device with the out{b,w,l} and in{b,w,l} functions.

These sizes are the same as the mmio operation sizes in the qemu device. This means you can do some weird stuff in qemu, where reading an 8-bit value does something radically different from reading the same address with a 32-bit granularity.
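Here is a toy illustration of that "weird stuff", written as plain user-space C. The callback has the same shape as a read handler in MemoryRegionOps, but I use uint64_t where Qemu uses hwaddr, and toy_read, the single register and the 0xab marker are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The access size arrives as a parameter, so the device is free
 * to answer differently for each width. */
static uint64_t toy_read(void *opaque, uint64_t addr, unsigned size)
{
    const uint32_t *reg = opaque; /* hypothetical single register */

    assert(opaque != NULL);
    if (addr != 0)
        return 0;

    switch (size) {
    case 1:
        return 0xab;  /* byte reads return a fixed marker */
    case 4:
        return *reg;  /* 32-bit reads return the register */
    default:
        return 0;
    }
}
```

Note that hello_mmio_ops sets both min_access_size and max_access_size to 4 in its .valid field, so for the MMIO region of this particular device only 32-bit accesses are declared valid.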

pci_set_drvdata allows us to store some relevant data inside the pci_dev struct for further use.

static void hello_tic_remove(struct pci_dev *dev)
{
    /* we get back the driver data we store in
     * the pci_dev struct */
    struct tic_info *info = pci_get_drvdata(dev);
    /* let's clean a little, in the reverse order of probe */
    iounmap(info->mem[0].start);
    pci_release_regions(dev);
    pci_disable_device(dev);
    kfree(info);
}

The remove function mirrors the initialization done in the probe function.

static irqreturn_t hello_tic_handler(int irq, void *dev_info)
{
    struct tic_info *info = (struct tic_info *) dev_info;
    pr_alert("IRQ %d handled\n", irq);
    /* is it our device throwing an interrupt ? */
    if (inl(info->port[0].start)) {
        /* deassert it */
        outl(0, info->port[0].start );
        return IRQ_HANDLED;
    } else {
        /* not mine, we have done nothing */
        return IRQ_NONE;
    }
}

This function is called when the interrupt we registered for fires. It is called in interrupt context, which means we may not sleep inside this function. If you need to sleep, you can defer the work to a workqueue or use a threaded interrupt, which are easy to use.

You can see that dev_info contains the structure we passed to devm_request_irq. Notice the IRQ_NONE return value, which signals to the kernel that this interrupt was not ours.
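To show why returning IRQ_NONE matters on a shared line, here is a user-space simulation of the dispatch. It is only a sketch of the idea, not the kernel's implementation: every toy_ name is made up, and the pending field plays the role of the status port our handler reads with inl.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins for the kernel's irqreturn_t values. */
enum toy_irqreturn { TOY_IRQ_NONE = 0, TOY_IRQ_HANDLED = 1 };

typedef enum toy_irqreturn (*toy_handler_t)(int irq, void *dev_info);

struct toy_action {
    toy_handler_t handler;
    void *dev_info; /* cookie given at registration time */
};

/* Hypothetical device state. */
struct toy_dev { int pending; };

static enum toy_irqreturn toy_dev_handler(int irq, void *dev_info)
{
    struct toy_dev *d = dev_info;

    (void)irq;
    if (!d->pending)
        return TOY_IRQ_NONE; /* not ours, we have done nothing */
    d->pending = 0;          /* acknowledge */
    return TOY_IRQ_HANDLED;
}

/* Call every handler sharing the line; count how many claimed it. */
static int toy_dispatch(int irq, struct toy_action *actions, size_t n)
{
    int handled = 0;

    for (size_t i = 0; i < n; i++) {
        assert(actions[i].handler != NULL);
        if (actions[i].handler(irq, actions[i].dev_info) == TOY_IRQ_HANDLED)
            handled++;
    }
    return handled;
}
```

Every handler registered on the line gets called, so each one has to check its own device before touching anything.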

Have fun ;)

 

Hope you found it useful. I plan to create devices and drivers for other buses as well (platform drivers, isa, usb, etc.).

Sources


[0] Linux Device Drivers (3rd Edition)
[1] http://ilevex.eu/post/88944209761/how-to-create-a-custom-pci-device-in-qemu
[2] Ctags of qemu and git log ;)
[3] http://free-electrons.com/doc/training/linux-kernel/linux-kernel-slides.pdf