Saturday, January 31, 2015

[write-up] While Not Challenge


Context

 

There was a little challenge proposed this week by a friend at Securimag.
The goal was to write an infinite loop in Python without the while statement.

You can see the original article here: https://securimag.org/wp/news/while-not-challenge/

I remember saying this challenge was pretty dumb and then spending the next three hours figuring out how to write the perfect infinite loop without a while, for, map, iter, an infinite /dev/urandom and all that stuff. Here is my write-up.

The final result targets Python 3. I had to pick a version because I was messing with Python internals, which differ from one version to the next.

The challenge 

 

No while, no for, no lambdas, no lists growing in memory, no infinite file like /dev/urandom, no infinite generator, what's left?













My intention was to find out how I could write bytecode and make Python evaluate it somehow. I found the great dis module, which lets you see the bytecode of a function.

import dis

def test_dis():
    a = 1 + 2
    b = 3 + 4
    return a + b


dis.dis(test_dis)

gives us

 4           0 LOAD_CONST              5 (3)
              3 STORE_FAST               0 (a)

  5          6 LOAD_CONST              6 (7)
              9 STORE_FAST               1 (b)

  6         12 LOAD_FAST                0 (a)
             15 LOAD_FAST                1 (b)
             18 BINARY_ADD
             19 RETURN_VALUE
Great, now we are talking!
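If you would rather work with the instructions than read the printed listing, dis.get_instructions gives you the same information as objects. This is just a small sketch: add_numbers is a name I made up for the example, and the opcode names and offsets you get depend on your CPython version.

```python
import dis


def add_numbers():
    # hypothetical sample function, same shape as test_dis above
    a = 1 + 2
    b = 3 + 4
    return a + b


# get_instructions yields one Instruction per bytecode instruction.
# Opcode names and offsets vary between CPython versions.
instructions = list(dis.get_instructions(add_numbers))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

opnames = [ins.opname for ins in instructions]
```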

Where is this bytecode stored? Reading the dis documentation, I found out that Python bytecode is held by code objects.

Let's find out where these code objects are hidden in the python function object.
>>> def func():
...   pass

>>> dir(func)
['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']

>>> dir(func.__code__)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno', 'co_flags', 'co_freevars', 'co_kwonlyargcount', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']

>>> list(func.__code__.co_code)
['d', '\x00', '\x00', 'S'] 

>>> func.__code__.co_code = '10'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: readonly attribute

Dammit!

There must be a workaround for this. We need to know how the function object really works and what we can override in it.

[0] tells us that __code__ (func_code in Python 2) represents the function's compiled code and can be overwritten. So we only need to create our own code object with the bytecode we want and we've won.
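Here is a quick sanity check of that claim, as a sketch: the two functions are hypothetical, and the exact exception type raised when writing to co_code differs between Python versions, so I catch both.

```python
def original():
    return "original"


def replacement():
    return "replacement"


# co_code itself is read-only on a code object...
try:
    original.__code__.co_code = b""
    co_code_writable = True
except (AttributeError, TypeError):
    co_code_writable = False

# ...but the function's __code__ attribute can be rebound to another
# code object (here, one borrowed from a second function).
original.__code__ = replacement.__code__
result = original()
print(co_code_writable, result)
```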

This is not complicated, because everything we need to create the new code object already exists in the function's current code object. Moreover, someone already did this! See [1] for more details on creating the new object.
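On Python 3.8 and later, code objects also have a replace() method that copies the object with some fields changed, which avoids calling the constructor by hand. This is not what I used at the time, just a sketch of the same idea: greet is a hypothetical function, and I patch a constant rather than the bytecode so that the example behaves the same across versions.

```python
def greet():
    return "original"


fco = greet.__code__

# Build a new constants tuple with one value swapped out.
patched_consts = tuple(
    "patched" if const == "original" else const
    for const in fco.co_consts
)

# replace() returns a new code object; the old one is left untouched.
greet.__code__ = fco.replace(co_consts=patched_consts)
result = greet()
print(result)
```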

The next step is to find Python's equivalent of an unconditional jump opcode in order to build our infinite loop.

import dis

def test_dis():
    while True:
        pass

dis.dis(test_dis)
print(list(test_dis.__code__.co_code))

We get the following result:

  4              0 SETUP_LOOP                   3 (to 6)

  5     >>    3 JUMP_ABSOLUTE            3
         >>    6 LOAD_CONST                  0 (None)
                  9 RETURN_VALUE
[120, 3, 0, 113, 3, 0, 100, 0, 0, 83]

So "JUMP_ABSOLUTE 3" is encoded as [113, 3, 0]: the opcode 113 followed by a 16-bit little-endian argument. An infinite loop can therefore be a single instruction, [113, 0, 0], which jumps to offset 0, that is, to itself.
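The raw numbers above are tied to the Python version I used. Newer CPython releases encode instructions differently and may name the jump differently, so here is a version-agnostic sketch that looks for the backward jump by name instead of by number. spin is a hypothetical function and is only disassembled, never called.

```python
import dis


def spin():
    # never called, only disassembled
    while True:
        pass


# Keep every jump instruction with its own offset and its target offset.
jumps = [
    (ins.offset, ins.opname, ins.argval)
    for ins in dis.get_instructions(spin)
    if "JUMP" in ins.opname
]

# A loop shows up as a jump whose target is not after the jump itself.
backward = [jump for jump in jumps if jump[2] <= jump[0]]
print(backward)
```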

The final code:
import dis

def func():
    this_code_is_never_executed
    return tralala


# the code object is read-only, so we build a new one with our bytecode
# code objects are an implementation detail and differ between Python versions
fco = func.__code__
# jump-absolute based infinite loop: JUMP_ABSOLUTE 0
func_code = [113, 0, 0]

# we define a new instance of code object
# similar to the previous one but
# with our modified bytecode
func.__code__ = type(fco)(
        fco.co_argcount,
        fco.co_kwonlyargcount,
        fco.co_nlocals,
        fco.co_stacksize,
        fco.co_flags,
        bytes(func_code),
        fco.co_consts,
        fco.co_names,
        fco.co_varnames,
        fco.co_filename,
        fco.co_name,
        fco.co_firstlineno,
        fco.co_lnotab,
        fco.co_freevars,
        fco.co_cellvars
)

# did it work ?
print(list(func.__code__.co_code))
dis.dis(func)

# infinite loop
func()

Challenge Complete :-)










I had fun poking around in Python internals, I hope you did too ;p

References

[0] https://docs.python.org/3/reference/datamodel.html
[1] http://www.jonathon-vogel.com/posts/patching_function_bytecode_with_python/
 

Saturday, January 10, 2015

[System] Emulate a PCI device with Qemu


I wanted to learn more about Linux drivers, but in order to write your own driver, you need to own the device you want to drive and if it doesn't exist (or if it's not stable yet), you are basically screwed. But do not panic, Qemu is here for you. You just need to create the device.

I will cover the following things:
  • A PCI device supporting interrupts, MMIO and PIO regions, DMA
  • A Linux PCI driver supporting this device (kernel 3.2) 
The code of the device can be found at https://github.com/grandemk/qemu_devices/blob/master/hello_tic.c

The code of the driver can be found at https://github.com/grandemk/qemu_devices/blob/master/driver_pci.c

The Qemu PCI Device

 

Here is the flow of a Qemu device's code. (See [1] for a good explanation.)



The first thing Qemu does is register our device inside its core. This is done with the type_init macro, which is called before the Qemu main. What's important here is that the TypeInfo struct representing our device is passed along.
static const TypeInfo pci_hello_info = {
    .name           = TYPE_PCI_HELLO_DEV,
    .parent         = TYPE_PCI_DEVICE,
    .instance_size  = sizeof(PCIHelloDevState),
    .class_init     = pci_hellodev_class_init,
};
As our device is a PCI peripheral, we extend the PCIDevice struct, so name, parent and instance_size are there for inheritance purposes and are implementation details.

The exciting thing here is the class_init function. It will be called when the device is registered. Its goal is to let you override the virtual methods of PCIDeviceClass with those of your device and to set the PCI configuration bytes that describe your device.

The PCI configuration space is used by the BIOS, the bootloader or the kernel, depending on your setup, to ease the probe phase of the device. It declares to the world how your device works and where to write in order to communicate with it.
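To make the configuration space a bit more concrete, here is a small sketch that decodes a few fields of the standard header: vendor id at offset 0x00, device id at 0x02, revision at 0x08, interrupt line at 0x3C and interrupt pin at 0x3D. The 64-byte buffer is fake data I build by hand with the ids of our device. On a real Linux machine you could read the same bytes from the config file of a device under /sys/bus/pci/devices.

```python
import struct


def parse_pci_header(config: bytes) -> dict:
    """Decode a few fields of the standard PCI configuration header."""
    vendor_id, device_id = struct.unpack_from("<HH", config, 0x00)
    revision = config[0x08]
    interrupt_line, interrupt_pin = struct.unpack_from("<BB", config, 0x3C)
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "revision": revision,
        "interrupt_line": interrupt_line,
        "interrupt_pin": interrupt_pin,  # 1 = pin A, 2 = pin B, ...
    }


# Fake header built by hand, using the ids of the device in this article.
fake_config = bytearray(64)
struct.pack_into("<HH", fake_config, 0x00, 0x1337, 0x0001)
fake_config[0x3D] = 0x02  # interrupt pin B

header = parse_pci_header(bytes(fake_config))
print(header)
```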



static void pci_hellodev_class_init(ObjectClass *klass, void *data)
{
    DeviceClass *dc = DEVICE_CLASS(klass);
    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
    k->init = pci_hellodev_init;
    k->exit = pci_hellodev_uninit;
    /* this identify our device */
    k->vendor_id = 0x1337;
    k->device_id = 0x0001;
    k->class_id = PCI_CLASS_OTHERS;
    set_bit(DEVICE_CATEGORY_MISC, dc->categories);

    k->revision  = 0x00;
    dc->desc = "PCI Hello World";
    /* qemu user things */
    dc->props = hello_properties;
    dc->reset = qdev_pci_hellodev_reset;
}
The vendor_id and device_id fields identify the PCI device. The Linux PCI subsystem uses them to decide which driver is chosen to operate your device. PCIDeviceClass inherits from DeviceClass, which is the base class used to represent a device and the functions it provides to the Qemu core.

Init, exit and reset are the basic functions you will need to write for your device. Init will be called when your device is plugged and exit when it's unplugged.

Exit and reset aren't really important functions for us. Exit is basically used to free the memory of your device. Init is the function you want to look at.
static int pci_hellodev_init(PCIDevice *pci_dev)
{
    /* init the internal state of the device */
    PCIHelloDevState *d = PCI_HELLO_DEV(pci_dev);
    printf("d=%lu\n", (unsigned long) &d);
    d->dma_size = 0x1ffff * sizeof(char);
    d->dma_buf = malloc(d->dma_size);
    d->id = 0x1337;
    d->threw_irq = 0;
    uint8_t *pci_conf;

    /* create the memory region representing the MMIO and PIO
     * of the device
     */

    memory_region_init_io(&d->mmio, OBJECT(d), &hello_mmio_ops, d, "hello_mmio", HELLO_MMIO_SIZE);
    memory_region_init_io(&d->io, OBJECT(d), &hello_io_ops, d, "hello_io", HELLO_IO_SIZE);

    /*
     * See linux device driver (Edition 3) for the definition of a bar
     * in the PCI bus.
     */
    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO, &d->io);
    pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, &d->mmio);

    pci_conf = pci_dev->config;

    /* also in LDD, a PCI device has 4 pins for interrupts;
     * here we use pin B.
     */
    pci_conf[PCI_INTERRUPT_PIN] = 0x02;

    /* this device support interrupt */
    // d->irq = pci_allocate_irq(pci_dev);

    printf("Hello World loaded\n");
    return 0;
}

You can see a cast to PCIHelloDevState at the beginning. This is our own extension of the PCIDevice class. We will store the logic of our device in it. So the first lines just initialize the internal state of the dummy PCI device we are writing.

pci_conf[] represents the configuration space of your PCI device. We just need to declare that our device supports interrupts on some pin (here pin B). The Linux kernel PCI bus driver will get this information for us.

You can check it with the command lspci -v. If you want more information, you can also look at the /sys filesystem. For more details, see the PCI chapter in [0].
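As an example of what /sys gives you, each directory under /sys/bus/pci/devices has vendor and device files holding the ids as hexadecimal text. The helper below is a sketch of how you could look for our device from the guest. find_pci_devices is a name I made up, and the root parameter only exists so the function can be pointed at another directory.

```python
from pathlib import Path


def find_pci_devices(vendor, device, root="/sys/bus/pci/devices"):
    """Return the sysfs names of PCI devices matching vendor/device ids."""
    matches = []
    root_path = Path(root)
    if not root_path.is_dir():
        # not on Linux, or sysfs is not mounted
        return matches
    for dev_dir in sorted(root_path.iterdir()):
        try:
            # the files contain text such as "0x1337"
            dev_vendor = int((dev_dir / "vendor").read_text().strip(), 16)
            dev_device = int((dev_dir / "device").read_text().strip(), 16)
        except (OSError, ValueError):
            continue
        if (dev_vendor, dev_device) == (vendor, device):
            matches.append(dev_dir.name)
    return matches


# ids of the device emulated in this article
print(find_pci_devices(0x1337, 0x0001))
```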

Qemu represents MMIO and PIO regions as MemoryRegion structs. When the driver accesses the memory region, the callbacks we registered with memory_region_init_io will be called.

static const MemoryRegionOps hello_mmio_ops = {
    .read = hello_mmioread,
    .write = hello_mmiowrite,
    .endianness = DEVICE_NATIVE_ENDIAN,
    .valid = {
        .min_access_size = 4,
        .max_access_size = 4,
    },
};

If the CPU tries to read the MemoryRegion, then the read callback will be called. There is one more step necessary for our MemoryRegion to be accessible from outside the device: we have to register it in a PCI BAR.

A PCI device implements up to six I/O address regions. Each region consists of either memory or I/O locations and is described by a BAR (base address register). We call pci_register_bar for this purpose.
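The low bits of a BAR tell you which kind of region it describes: bit 0 set means I/O space, otherwise it is memory, with bits 2:1 giving the width (0b10 means 64-bit) and bit 3 the prefetchable flag. Here is a sketch of a decoder for the low 32 bits of a BAR. decode_bar is a hypothetical helper and the values are made up for illustration.

```python
def decode_bar(value: int) -> dict:
    """Decode the low 32 bits of a PCI base address register."""
    if value & 0x1:
        # I/O space BAR: bits 1:0 are flags, the rest is the port base
        return {"space": "io", "address": value & ~0x3 & 0xFFFFFFFF}
    mem_type = (value >> 1) & 0x3
    return {
        "space": "memory",
        "is_64bit": mem_type == 0x2,
        "prefetchable": bool(value & 0x8),
        # bits 3:0 are flags, the rest is the base address
        "address": value & ~0xF & 0xFFFFFFFF,
    }


# made-up register values, one per kind of BAR
io_bar = decode_bar(0x0000C001)
mmio_bar = decode_bar(0xFEBF1000)
wide_bar = decode_bar(0xFE00000C)
print(io_bar, mmio_bar, wide_bar)
```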

You can request an irq with pci_allocate_irq, which gives you an irq object, or you can use the PCI helper functions for interrupts (pci_irq_assert and pci_irq_deassert) directly on the device.

static void hello_iowrite(void *opaque, hwaddr addr, uint64_t value, unsigned size)
{
    int i;
    PCIHelloDevState *d = (PCIHelloDevState *) opaque;
    PCIDevice *pci_dev = (PCIDevice *) opaque;
    printf("Write Ordered, addr=%x, value=%lu, size=%d\n", (unsigned) addr, value, size);
    switch (addr) {
        case 0:
            if (value) {
                /* throw an interrupt */
                printf("irq assert\n");
                d->threw_irq = 1;
                pci_irq_assert(pci_dev);
            } else {
                /* ack interrupt */
                printf("irq deassert\n");
                pci_irq_deassert(pci_dev);
                d->threw_irq = 0;
            }
            break;
        case 4:
            /* throw a random DMA */
            for ( i = 0; i < d->dma_size; ++i)
                d->dma_buf[i] = rand();
            cpu_physical_memory_write(0xa0000, (void *) d->dma_buf, d->dma_size); 
            break;
        default:
            printf("Io not used\n");
    }
}

The functions iowrite and mmiowrite have the same prototype. opaque is our PCIHelloDevState structure, addr is the address the driver accessed (relative to the start of the region), value is what was written and size is the granularity of the write: 1, 2 or 4 bytes. ioread and mmioread follow the same principles.

You can use pci_irq_assert and pci_irq_deassert to raise and lower an interrupt. It's very easy.

You can use cpu_physical_memory_write to directly access the physical memory. This simulates a DMA but bypasses the IOMMU system of Qemu. Its use is now deprecated, but if you want to simulate a system without an IOMMU, it is the fastest way.

I will cover the IOMMU DMA in an update of this article.

Linux PCI driver


The code of the driver can be found at https://github.com/grandemk/qemu_devices/blob/master/driver_pci.c

Most of the time, to understand a driver, you need to begin from the end of the file. So let's begin with the last lines.

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("hello world");
MODULE_AUTHOR("Kevin Grandemange");

These macros describe the module. The license is not a trivial thing because some kernel APIs are exported with EXPORT_SYMBOL_GPL and are only available to GPL-licensed drivers. Moreover, non-GPL drivers are not likely to be accepted into mainline Linux.


module_pci_driver(hello_tic);
MODULE_DEVICE_TABLE(pci, hello_tic_ids);

There are two types of drivers in Linux: bus drivers and device drivers. What we commonly call drivers are the device drivers. A bus driver provides an API that device drivers use to communicate with the hardware over a specific bus. PCI is one such bus. So here, module_pci_driver registers our driver with the bus driver, which handles all the low-level PCI stuff. MODULE_DEVICE_TABLE exports the table of devices we are able to drive, so that user space tools can load the module automatically when a matching device shows up. When such a device is found, our driver will be called to handle the communication.

/* vendor and device (+ subdevice and subvendor) 
 * identifies a device we support
 */
static struct pci_device_id hello_tic_ids[] = {
    { PCI_DEVICE(PCI_TIC_VENDOR, PCI_TIC_DEVICE) },
    { 0, }, /* sentinel */
};
/* id_table describe the device this driver support
 * probe is called when a device we support exist and
 * when we are chosen to drive it.
 * remove is called when the driver is unloaded or
 * when the device disappears
 */
static struct pci_driver hello_tic = {
    .name = "hello_tic",
    .id_table = hello_tic_ids,
    .probe = hello_tic_probe,
    .remove = hello_tic_remove,
};

A driver is a service-based program. The pci_driver struct we provide to the PCI bus driver is the set of functions we support. Probe will be called when the PCI bus driver finds your device (PCI is really easy to handle: you don't need to search for your device, the BIOS or its equivalent has already set everything in its right place).

This function is used to read the PCI configuration space of the device and to request all the resources your driver will need (interrupts, virtual addresses for the MMIO zone, etc.).

Remove is the opposite of probe and is called when the device disappears. The id table is a list of structs representing the devices we support. Devices are identified by their vendor id and device id.

You can use the PCI_DEVICE macro to ease the structure construction. You can also specify the subvendor and subdevice field for more precision but we don't need that here.
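To give an idea of how the matching works, here is a toy model of it. The PCI_DEVICE macro fills the subvendor and subdevice fields with PCI_ANY_ID, which acts as a wildcard. This sketch leaves out the class and class mask comparison that the real kernel code also does, and the ids are the ones from the Qemu device, which I assume PCI_TIC_VENDOR and PCI_TIC_DEVICE expand to.

```python
PCI_ANY_ID = 0xFFFFFFFF  # wildcard value used in id tables

ID_FIELDS = ("vendor", "device", "subvendor", "subdevice")


def pci_device(vendor, device):
    """Toy version of the PCI_DEVICE macro: wildcard the subsystem ids."""
    return {
        "vendor": vendor,
        "device": device,
        "subvendor": PCI_ANY_ID,
        "subdevice": PCI_ANY_ID,
    }


def id_matches(entry, dev):
    """An entry matches if every field is a wildcard or equals the device's."""
    return all(entry[f] in (PCI_ANY_ID, dev[f]) for f in ID_FIELDS)


id_table = [pci_device(0x1337, 0x0001)]

# hypothetical devices found on the bus
hello = {"vendor": 0x1337, "device": 0x0001, "subvendor": 0x1AF4, "subdevice": 0x1100}
nic = {"vendor": 0x8086, "device": 0x100E, "subvendor": 0x1AF4, "subdevice": 0x1100}

hello_matched = any(id_matches(entry, hello) for entry in id_table)
nic_matched = any(id_matches(entry, nic) for entry in id_table)
print(hello_matched, nic_matched)
```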

static int hello_tic_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
    struct tic_info *info;
    info = kzalloc(sizeof(struct tic_info), GFP_KERNEL);
    if (!info)
        return -ENOMEM;

    if (pci_enable_device(dev))
        goto out_free;

    pr_alert("enabled device\n");

    if (pci_request_regions(dev, "hello_tic"))
        goto out_disable;

    pr_alert("requested regions\n");

    /* BAR 0 has IO */
    info->port[0].name = "tic-io";
    info->port[0].start = pci_resource_start(dev, 0);
    info->port[0].size = pci_resource_len(dev, 0);

    /* BAR 1 has MMIO */
    info->mem[0].name = "tic-mmio";
    info->mem[0].start = pci_ioremap_bar(dev, 1);
    info->mem[0].size = pci_resource_len(dev, 1);

    if (!info->mem[0].start)
        goto out_unrequest;   

    pr_alert("remaped addr for kernel uses\n");
    /* get device irq number */
    if (pci_read_config_byte(dev, PCI_INTERRUPT_LINE, &info->irq))
        goto out_iounmap;
    /* request irq */
    if (devm_request_irq(&dev->dev, info->irq, hello_tic_handler, IRQF_SHARED, hello_tic.name, (void *) info))
        goto out_iounmap;

    /* get a mmio reg value and change it */
    pr_alert("device id=%x\n", ioread32(info->mem[0].start + 4));
    iowrite32(0x4567, info->mem[0].start + 4);
    pr_alert("modified device id=%x\n", ioread32(info->mem[0].start + 4));

    /* assert an irq */
    outb(1, info->port[0].start);
    /* try dma without iommu */
    outl(1, info->port[0].start + 4);

    pci_set_drvdata(dev, info);

    return 0;

out_iounmap:
    pr_alert("tic:probe_out:iounmap\n");
    iounmap(info->mem[0].start);
out_unrequest:
    pr_alert("tic:probe_out_unrequest\n");
    pci_release_regions(dev);
out_disable:
    pr_alert("tic:probe_out_disable\n");
    pci_disable_device(dev);
out_free:
    pr_alert("tic:probe_out_free\n");
    kfree(info);
    return -ENODEV;
}


The first thing to do in a PCI driver is to call the two functions pci_enable_device and pci_request_regions.

The first one asks the low-level code to enable I/O and memory, and the second one reserves all these I/O and memory regions for our driver. Both functions need to succeed before you can continue.

We then get the PIO range with pci_resource_start on BAR 0 (remember how we declared those in the Qemu device).

The MMIO part is trickier because the CPU only uses virtual addresses. We need to remap the physical I/O address. This is done with the ioremap function. Here the pci_ioremap_bar helper does everything you need in one call.

In order to know which interrupt our device has been assigned, we use pci_read_config_byte to read the interrupt line from the configuration space. After that we request the interrupt. This registers our interrupt handler for that interrupt line.

After this call, if the interrupt fires, all the interrupt handlers registered for it will be called.

One important thing to notice is that the interrupt you are using can be shared. This means you need to verify by some means that your device really triggered this interrupt. If it didn't, you tell the kernel that you're not interested in this interrupt.
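Here is a toy model of a shared interrupt line, just to show the contract: every handler on the line gets called, and each one reports whether the interrupt was its own. This is not kernel code. ToyDevice and dispatch are hypothetical names; only the IRQ_NONE and IRQ_HANDLED values mirror the kernel's.

```python
IRQ_NONE, IRQ_HANDLED = 0, 1  # same values as the kernel's irqreturn_t


class ToyDevice:
    """Stand-in for a device plus its interrupt handler."""

    def __init__(self, name):
        self.name = name
        self.pending = False  # plays the role of the device's irq status register

    def handler(self):
        if not self.pending:
            return IRQ_NONE  # not ours, we did nothing
        self.pending = False  # ack the interrupt on the device
        return IRQ_HANDLED


def dispatch(shared_line):
    """Call every handler on the line, return the names that claimed the irq."""
    return [dev.name for dev in shared_line if dev.handler() == IRQ_HANDLED]


hello = ToyDevice("hello_tic")
other = ToyDevice("some_other_device")
hello.pending = True

claimed = dispatch([other, hello])
claimed_again = dispatch([other, hello])
print(claimed, claimed_again)
```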

If you need some locking mechanism, better initialize it before calling devm_request_irq, because after that, your interrupt handler can be called whenever the device feels like it.

The devm part of the request_irq means that this function is device managed and that the irq will automagically be freed when your driver is unloaded.

You can play with the mmio remapped memory with the function iowrite{8,16,32} and ioread{8,16,32} and talk with the ports of the device with the out{b,w,l} and in{b,w,l} functions.

These sizes are the same as the access sizes seen by the device's MMIO and PIO callbacks in Qemu. This means you can do some weird stuff in Qemu where reading an 8-bit value does something radically different from reading the same address with 32-bit granularity.
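A toy model of that weird stuff, as a sketch: the read callback receives both the address and the access size, so nothing stops it from dispatching on both. ToyRegisterBank is hypothetical and is not how hello_tic.c behaves, it only illustrates the idea.

```python
class ToyRegisterBank:
    """Toy stand-in for a Qemu read callback; not Qemu code."""

    def __init__(self):
        self.device_id = 0x1337
        self.byte_reads = 0

    def read(self, addr: int, size: int) -> int:
        if addr == 4 and size == 4:
            # a 32-bit read returns the id register
            return self.device_id
        if addr == 4 and size == 1:
            # an 8-bit read at the same address counts accesses instead
            self.byte_reads += 1
            return self.byte_reads
        return 0


bank = ToyRegisterBank()
wide = bank.read(4, 4)
narrow_first = bank.read(4, 1)
narrow_second = bank.read(4, 1)
print(hex(wide), narrow_first, narrow_second)
```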

pci_set_drvdata allows us to store some relevant data inside the pci_dev struct for later use.

static void hello_tic_remove(struct pci_dev *dev)
{
    /* we get back the driver data we store in
     * the pci_dev struct */
    struct tic_info *info = pci_get_drvdata(dev);
    /* let's clean a little */
    pci_release_regions(dev);
    pci_disable_device(dev);
    iounmap(info->mem[0].start);
    kfree(info);
}

The remove function undoes what the probe function set up.

static irqreturn_t hello_tic_handler(int irq, void *dev_info)
{
    struct tic_info *info = (struct tic_info *) dev_info;
    pr_alert("IRQ %d handled\n", irq);
    /* is it our device throwing an interrupt ? */
    if (inl(info->port[0].start)) {
        /* deassert it */
        outl(0, info->port[0].start );
        return IRQ_HANDLED;
    } else {
        /* not mine, we have done nothing */
        return IRQ_NONE;
    }
}

This function is called when the interrupt we registered for fires. It is called in interrupt context. This means we may not sleep inside this function. If you need to sleep, you can defer the work to a workqueue or use a threaded interrupt, both of which are easy to use.

You can see that dev_info contains the structure we passed to devm_request_irq. Notice the IRQ_NONE return value, which signals to the kernel that this interrupt was not ours.

Have fun ;)

 

Hope you found it useful. I plan to create devices and drivers for other buses as well (platform driver, ISA, USB, etc.).

 Sources


[0] Linux Device Driver (3rd Edition)
[1] http://ilevex.eu/post/88944209761/how-to-create-a-custom-pci-device-in-qemu
[3] Ctags of qemu and git log ;)
[4] http://free-electrons.com/doc/training/linux-kernel/linux-kernel-slides.pdf