Context
A friend at Securimag proposed a little challenge this week.
The goal was to write an infinite loop in Python without using the while statement.
You can see the original article here: https://securimag.org/wp/news/while-not-challenge/
I remember calling this challenge pretty dumb, and then spending the next three hours searching for the perfect infinite loop without while, for, map, iter, an infinite /dev/urandom and all that stuff. Here is my write-up.
The final result targets Python 3. I had to pick a version because I was messing with Python's internals, which change between versions.
The challenge
No while, no for, no lambdas, no lists growing in memory, no infinite file like /dev/urandom, no infinite generator: what's left?
My idea was to find a way to write bytecode myself and make Python execute it. I found the great dis module, which lets you see the bytecode of a function.
import dis

def test_dis():
    a = 1 + 2
    b = 3 + 4
    return a + b

dis.dis(test_dis)
gives us
  4           0 LOAD_CONST               5 (3)
              3 STORE_FAST               0 (a)

  5           6 LOAD_CONST               6 (7)
              9 STORE_FAST               1 (b)

  6          12 LOAD_FAST                0 (a)
             15 LOAD_FAST                1 (b)
             18 BINARY_ADD
             19 RETURN_VALUE

Great, now we are talking!
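dis.dis prints a listing meant for humans. To work with the same information from code, dis.get_instructions (available since Python 3.4) yields one Instruction record per bytecode instruction. A small sketch of that; on a recent interpreter the opcodes will not match the listing above exactly (for instance, Python 3.11 replaced BINARY_ADD with BINARY_OP).

```python
import dis


def test_dis():
    a = 1 + 2
    b = 3 + 4
    return a + b


# One Instruction named tuple per bytecode instruction, in order.
instructions = list(dis.get_instructions(test_dis))
opnames = [ins.opname for ins in instructions]

for ins in instructions:
    # offset: byte position in co_code; argrepr: human-readable argument
    print(ins.offset, ins.opname, ins.argrepr)
```

Each Instruction also carries opcode, arg and argval, which is handy when hunting for a specific opcode number.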
Where is this bytecode stored? Reading the dis documentation, I found out that Python bytecode is held in code objects.
Let's find out where these code objects are hidden in a Python function object.
>>> def func():
...     pass
...
>>> dir(func)
['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']
>>> dir(func.__code__)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename', 'co_firstlineno', 'co_flags', 'co_freevars', 'co_kwonlyargcount', 'co_lnotab', 'co_name', 'co_names', 'co_nlocals', 'co_stacksize', 'co_varnames']
>>> list(func.__code__.co_code)
[100, 0, 0, 83]
>>> func.__code__.co_code = '10'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: readonly attribute
Dammit!
There must be a workaround. We need to understand how function objects really work and what we can override in them.
The Python data model documentation [0] tells us that a function's __code__ attribute (called func_code in Python 2) holds the compiled function body and is writable. So we only need to create our own code object containing the bytecode we want, assign it to __code__, and we've won.
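The distinction that matters here is between the code object, whose co_* fields are read-only, and the function, whose __code__ slot can be reassigned. A minimal sketch showing both, using two hypothetical functions greet and shout; swapping a whole code object in is legal here because neither function takes arguments or has free variables.

```python
def greet():
    return "hello"


def shout():
    return "HELLO!"


code = greet.__code__
print(type(code).__name__)          # the class of code objects is named "code"
print(isinstance(code.co_code, bytes))

# Code objects are immutable: their co_* attributes cannot be reassigned.
try:
    code.co_code = b"\x00"
except AttributeError as exc:
    print("co_code is read-only:", exc)
    co_code_was_read_only = True
else:
    co_code_was_read_only = False

# ...but the function's __code__ attribute is writable, so a whole code
# object can be swapped in. The function keeps its own __name__.
greet.__code__ = shout.__code__
print(greet.__name__, greet())
```

This is the same move the final code makes, except that there the replacement code object is built by hand instead of borrowed from another function.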
This is not complicated, because everything we need to build the new code object is already available on the function's existing code object: we can copy every field and change only the bytecode. Moreover, someone has already done this! See [1] for more details on creating the new object.
The next step is to find Python's opcode for an unconditional jump, so we can build our infinite loop with it.
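Passing every co_* field to the code constructor by hand ties a script to one Python version, because the constructor's signature changes between releases (Python 3.8 added a posonlyargcount parameter, for example). Since Python 3.8, code objects also offer a replace() method that copies all fields and overrides only the ones you name. A minimal sketch of that alternative, which is not what this write-up or [1] uses; it patches a constant rather than the bytecode so that it runs unchanged on any 3.8+ interpreter.

```python
def func():
    return "original"


old_code = func.__code__

# Build a new constants tuple with one string swapped; every other
# constant (None, nested code objects, ...) is kept as-is.
new_consts = tuple(
    "patched" if isinstance(c, str) and c == "original" else c
    for c in old_code.co_consts
)

# replace() (Python 3.8+) returns a new code object identical to the old
# one except for the fields passed as keyword arguments.
func.__code__ = old_code.replace(co_consts=new_consts)
print(func())
```

The same call with co_code=... would swap the bytecode instead, but the bytes then have to match the running interpreter's instruction format.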
import dis

def test_dis():
    while True:
        pass

dis.dis(test_dis)
print(list(test_dis.__code__.co_code))
We get the following result:
4 0 SETUP_LOOP 3 (to 6)
5 >> 3 JUMP_ABSOLUTE 3
>> 6 LOAD_CONST 0 (None)
9 RETURN_VALUE
[120, 3, 0, 113, 3, 0, 100, 0, 0, 83]
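So the instruction "JUMP_ABSOLUTE 3" is encoded as [113, 3, 0]: opcode 113 followed by a two-byte argument, low byte first. An infinite loop can therefore be a single instruction, [113, 0, 0], which jumps to offset 0, i.e. to itself.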
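The three-byte layout can be checked without a Python 3.5 interpreter at hand. Below is a small hand-written decoder for the pre-3.6 format, assuming (as in CPython before 3.6) that opcodes numbered HAVE_ARGUMENT = 90 or higher are followed by a 16-bit little-endian argument and the others take none. decode_pre36 and OPNAMES are hypothetical helpers, not part of dis, and the name table only covers the four opcodes from the listing above.

```python
# Hypothetical helper: decodes CPython bytecode in the pre-3.6 format
# (1-byte opcode, plus a 2-byte little-endian argument when
# opcode >= HAVE_ARGUMENT). It does not rely on the running interpreter.
HAVE_ARGUMENT = 90

# Only the opcodes from the listing above (Python 3.4/3.5 numbering).
OPNAMES = {83: "RETURN_VALUE", 100: "LOAD_CONST",
           113: "JUMP_ABSOLUTE", 120: "SETUP_LOOP"}


def decode_pre36(code):
    """Return a list of (offset, opname, arg) tuples; arg is None if absent."""
    result = []
    i = 0
    while i < len(code):
        op = code[i]
        name = OPNAMES.get(op, "<%d>" % op)
        if op >= HAVE_ARGUMENT:
            arg = code[i + 1] | (code[i + 2] << 8)  # low byte first
            result.append((i, name, arg))
            i += 3
        else:
            result.append((i, name, None))
            i += 1
    return result


loop_bytes = [120, 3, 0, 113, 3, 0, 100, 0, 0, 83]
for entry in decode_pre36(loop_bytes):
    print(entry)

# The one-instruction loop: JUMP_ABSOLUTE to offset 0, i.e. to itself.
print(decode_pre36([113, 0, 0]))
```

The raw arguments line up with the listing: SETUP_LOOP's argument is relative to the next instruction (3 + 3 = 6, the "to 6" shown by dis), while JUMP_ABSOLUTE's is an absolute offset.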
The final code:
import dis

def func():
    this_code_is_never_executed
    return tralala

# Code objects are immutable, so instead of editing this one we build a new
# one from its fields. Code objects are an implementation detail: the
# constructor signature and the 3-byte instruction format used here match
# the Python 3 releases of the time (before 3.6).
fco = func.__code__

# Infinite loop based on JUMP_ABSOLUTE: one instruction jumping to offset 0
func_code = [113, 0, 0]

# Create a new code object identical to the previous one,
# except for our modified bytecode
func.__code__ = type(fco)(
    fco.co_argcount,
    fco.co_kwonlyargcount,
    fco.co_nlocals,
    fco.co_stacksize,
    fco.co_flags,
    bytes(func_code),
    fco.co_consts,
    fco.co_names,
    fco.co_varnames,
    fco.co_filename,
    fco.co_name,
    fco.co_firstlineno,
    fco.co_lnotab,
    fco.co_freevars,
    fco.co_cellvars,
)

# Did it work?
print(list(func.__code__.co_code))
dis.dis(func)

# Infinite loop
func()

Challenge Complete :-)
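The hand-written [113, 0, 0] only makes sense for the bytecode format of Python 3.5 and earlier: Python 3.6 moved to two-byte instructions, and Python 3.11 replaced JUMP_ABSOLUTE with the relative JUMP_BACKWARD. Before porting the trick to a newer interpreter, one way to see which jump it emits for a plain while True loop is to disassemble one with dis.get_instructions. A sketch, with spin as a hypothetical function name; it only inspects the bytecode and never calls the looping function.

```python
import dis


def spin():
    while True:
        pass


# Collect the jump instructions the compiler generated for the loop.
jumps = [ins for ins in dis.get_instructions(spin) if "JUMP" in ins.opname]

for ins in jumps:
    # For jump instructions, argval is the byte offset of the jump target.
    print(ins.offset, ins.opname, "->", ins.argval)

# An infinite loop needs at least one jump to its own or an earlier offset.
backward = [ins for ins in jumps if ins.argval <= ins.offset]
print("backward jumps:", [ins.opname for ins in backward])
```

On older interpreters this typically reports a JUMP_ABSOLUTE, on 3.11 and later a JUMP_BACKWARD; the bytes to hand-craft change accordingly.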
I had fun poking around Python's internals; I hope you did too ;p
References
[0] https://docs.python.org/3/reference/datamodel.html
[1] http://www.jonathon-vogel.com/posts/patching_function_bytecode_with_python/