Saturday, January 31, 2015

[write-up] While Not Challenge


Context

 

A friend at Securimag proposed a little challenge this week.
The goal was to write an infinite loop in Python without the while statement.

You can see the original article here: https://securimag.org/wp/news/while-not-challenge/

I remember saying this challenge was pretty dumb, and then spending the next three hours working out how to write the perfect infinite loop without while, for, map, iter, an infinite /dev/urandom and all that stuff. Here is my write-up.

The final result targets Python 3. I had to pick one version because I was messing with Python's internals.

The challenge 

 

No while, no for, no lambdas, no lists growing in memory, no infinite file like /dev/urandom, no infinite generator. What's left?













My idea was to write the bytecode myself and get Python to execute it somehow. I found the great dis module, which lets you see the bytecode of a function.

import dis

def test_dis():
    a = 1 + 2
    b = 3 + 4
    return a + b


dis.dis(test_dis)

gives us

  4           0 LOAD_CONST               5 (3)
              3 STORE_FAST               0 (a)

  5           6 LOAD_CONST               6 (7)
              9 STORE_FAST               1 (b)

  6          12 LOAD_FAST                0 (a)
             15 LOAD_FAST                1 (b)
             18 BINARY_ADD
             19 RETURN_VALUE

Great, now we are talking!

Where is this bytecode stored? Reading the dis documentation, I found out that Python bytecode is held by code objects.

Let's find out where these code objects are hidden in the Python function object.
>>> def func():
...   pass

>>> dir(func)
['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']

>>> dir(func.__code__)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno', 'co_flags', 'co_freevars', 'co_kwonlyargcount', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']

>>> list(func.__code__.co_code)
[100, 0, 0, 83]

>>> func.__code__.co_code = '10'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: readonly attribute

Dammit!

There must be a workaround for this. We need to know how the function object really works and what we can override in it.

[0] tells us that __code__ (called func_code in Python 2) represents the compiled function body and is writable. So we only need to create our own code object with the bytecode we want, assign it to __code__, and we've won.
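To see that __code__ really is writable, here is a minimal sketch that rebinds it from one function to another. The names greet and shout are made up for the illustration, and both functions are deliberately closure-free, since the assignment requires matching free variables.

```python
def greet():
    return "hello"

def shout():
    return "HELLO!"

before = greet()

# Rebinding __code__ swaps the compiled body;
# the function object itself (and its name) stays the same.
greet.__code__ = shout.__code__

after = greet()
print(before, "->", after)
print(greet.__name__)
```

The function keeps its identity and only the code it runs changes, which is exactly the hook we need.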

This is not complicated, because everything we need to build the new code object is already available on the function's existing code object. Moreover, someone already did this! See [1] for more details on creating the new object.
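As a side note, the "copy everything, change one field" idea can be sketched without calling the constructor by hand. This assumes Python 3.8 or later, where code objects have a replace() method; it is not what I used for the challenge. The function and the "patched" string are just placeholders.

```python
def func():
    return "original"

fco = func.__code__

# Build a new constants tuple, swapping only the string we care about.
new_consts = tuple(
    "patched" if const == "original" else const
    for const in fco.co_consts
)

# replace() (Python 3.8+) returns a copy of the code object
# with only the named fields overridden.
func.__code__ = fco.replace(co_consts=new_consts)

result = func()
print(result)
```

The original code object is left untouched; the function simply points at the modified copy.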

The next step is to find Python's equivalent of an unconditional jump opcode in order to build our infinite loop.

import dis

def test_dis():
    while True:
        pass

dis.dis(test_dis)
print(list(test_dis.__code__.co_code))

We get the following result:

  4           0 SETUP_LOOP               3 (to 6)

  5     >>    3 JUMP_ABSOLUTE            3
        >>    6 LOAD_CONST               0 (None)
              9 RETURN_VALUE
[120, 3, 0, 113, 3, 0, 100, 0, 0, 83]

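So the "JUMP_ABSOLUTE 3" instruction is encoded as [113, 3, 0]: the opcode 113 followed by its target offset as a 16-bit little-endian argument. An infinite loop can therefore be the single instruction [113, 0, 0], placed at offset 0 so that it jumps to itself.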
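To make the encoding concrete, here is a small hand-rolled decoder for the byte lists above. It is only a sketch: it assumes the old CPython layout (3.5 and earlier) where opcodes of 90 and above carry a two-byte argument, and the OPNAMES table only covers the four opcodes seen in this post. decode, OPNAMES and HAVE_ARGUMENT are names I made up here, not the dis API.

```python
# CPython <= 3.5: opcodes >= 90 are followed by a 2-byte argument.
HAVE_ARGUMENT = 90

# Only the opcodes that appear in the listings of this post.
OPNAMES = {
    83: "RETURN_VALUE",
    100: "LOAD_CONST",
    113: "JUMP_ABSOLUTE",
    120: "SETUP_LOOP",
}

def decode(code):
    """Return (offset, name, arg) triples for an old-style bytecode list."""
    out = []
    i = 0
    while i < len(code):
        op = code[i]
        name = OPNAMES.get(op, str(op))
        if op >= HAVE_ARGUMENT:
            # The argument is stored as a little-endian 16-bit integer.
            arg = code[i + 1] | (code[i + 2] << 8)
            out.append((i, name, arg))
            i += 3
        else:
            out.append((i, name, None))
            i += 1
    return out

loop = decode([120, 3, 0, 113, 3, 0, 100, 0, 0, 83])
self_jump = decode([113, 0, 0])
print(loop)
print(self_jump)
```

The offsets it reports (0, 3, 6, 9) line up with the dis listing, and the three-byte sequence decodes to a JUMP_ABSOLUTE at offset 0 whose target is 0.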

The final code:
import dis

def func():
    this_code_is_never_executed
    return tralala


# code objects are read-only, so we build a new one with our own bytecode
# (code objects are an implementation detail and differ between Python versions)
fco = func.__code__
# JUMP_ABSOLUTE 0: a single instruction at offset 0 that jumps to itself
func_code = [113, 0, 0]

# we define a new instance of code object
# similar to the previous one but
# with our modified bytecode
func.__code__ = type(fco)(
        fco.co_argcount,
        fco.co_kwonlyargcount,
        fco.co_nlocals,
        fco.co_stacksize,
        fco.co_flags,
        bytes(func_code),
        fco.co_consts,
        fco.co_names,
        fco.co_varnames,
        fco.co_filename,
        fco.co_name,
        fco.co_firstlineno,
        fco.co_lnotab,
        fco.co_freevars,
        fco.co_cellvars
)

# did it work ?
print(list(func.__code__.co_code))
dis.dis(func)

# infinite loop
func()

Challenge Complete :-)










I had fun digging a bit into Python internals. I hope you did too ;p

References

[0] https://docs.python.org/3/reference/datamodel.html
[1] http://www.jonathon-vogel.com/posts/patching_function_bytecode_with_python/
 
