Load binaries in the Miasm VM

Miasm is a reverse engineering framework that, among other things, provides an emulation engine for many architectures. Miasm already ships loaders for PE and ELF binaries, which load them into its own VM (virtual machine). In this section, we’ll see how QBDL can be used to load MachO binaries into the Miasm VM, and then use the emulation engine to actually run the binary.

The full example can be downloaded here.

Instantiate the Miasm emulation engine

This step is mainly based on an example in Miasm’s repository:

from miasm.analysis.machine import Machine
from miasm.core.locationdb import LocationDB
from miasm.core.utils import pck64, upck64
from miasm.jitter.csts import PAGE_READ, PAGE_WRITE, EXCEPT_SYSCALL
from miasm.os_dep import linux_stdlib

# `args` comes from the command-line parsing of the full example;
# args.jitter names the Miasm jitter backend to use.
loc_db = LocationDB()
myjit = Machine("x86_64").jitter(loc_db, args.jitter)
myjit.init_stack()

This initialises an x86-64 Miasm emulator and allocates a simple stack. Calling print(myjit.vm) then shows the current memory mapping of the VM:

Addr               Size               Access Comment
0x1230000          0x10000            RW_    Stack

Define the target machine

We will now define a QBDL target machine to inject the loaded binary into the Miasm VM.

First, we need to define how memory is handled, through the creation of a TargetMemory-based class:

class MiasmVM(pyqbdl.TargetMemory):
    def __init__(self, vm):
        super().__init__()
        self.vm = vm

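    def mmap(self, ptr, size):
        # Back the requested range with a zero-filled, read/write Miasm page
        self.vm.add_memory_page(ptr, PAGE_READ | PAGE_WRITE, b"\x00"*size)
        return ptr

    def mprotect(self, ptr, size, access):
        # Protections are not applied in this example: pages stay read/write
        return True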

    def write(self, ptr, data):
        self.vm.set_mem(ptr, bytes(data))

    def read(self, ptr, size):
        return self.vm.get_mem(ptr, size)

This is what allows us to load the binary inside Miasm’s VM, and not in the running process.
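
To make the contract of these four methods concrete without needing Miasm or QBDL installed, here is a minimal sketch of the same interface backed by a plain Python dict. DictMemory is a hypothetical stand-in and is not part of pyqbdl or Miasm. It only assumes what the class above shows: mmap reserves zero-filled memory and returns its address, write and read copy bytes, and mprotect is accepted without being enforced.

```python
class DictMemory:
    """Hypothetical stand-in exposing the same four methods as MiasmVM."""

    def __init__(self):
        # Maps the start address of each mapped region to its contents.
        self.pages = {}

    def mmap(self, ptr, size):
        # Reserve a zero-filled region at the requested address.
        self.pages[ptr] = bytearray(size)
        return ptr

    def mprotect(self, ptr, size, access):
        # As in the example above, accept the request without enforcing it.
        return True

    def _find(self, ptr, size):
        # Locate the region that fully contains [ptr, ptr + size).
        for base, buf in self.pages.items():
            if base <= ptr and ptr + size <= base + len(buf):
                return base, buf
        raise ValueError("unmapped range at %#x" % ptr)

    def write(self, ptr, data):
        data = bytes(data)
        base, buf = self._find(ptr, len(data))
        start = ptr - base
        buf[start:start + len(data)] = data

    def read(self, ptr, size):
        base, buf = self._find(ptr, size)
        start = ptr - base
        return bytes(buf[start:start + size])


mem = DictMemory()
mem.mmap(0x100000000, 0x3000)
# Store a 64-bit little-endian pointer, as a loader does when binding a symbol.
mem.write(0x100001010, (0x700000000008).to_bytes(8, "little"))
print(mem.read(0x100001010, 8).hex())
```

A loader only talks to the target through this small interface, which is why swapping the dict for Miasm’s VM is enough to redirect the whole loading process.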

We now need to define the final target machine, with the resolution of external symbols, through the creation of a TargetSystem-based class:

class MiasmSystem(pyqbdl.TargetSystem):
    def __init__(self, jit, arch):
        super().__init__(MiasmVM(jit.vm))
        self.externs = ExternFuncs(jit)
        self.arch = arch

    def symlink(self, loader, sym):
        return self.externs.symlink(loader, sym)

    def supports(self, bin_):
        return pyqbdl.Arch.from_bin(bin_) == self.arch

    def base_address_hint(self, bin_ba, vsize):
        # Load the binary at its own preferred base address
        return bin_ba

ExternFuncs is a class that handles calls to external functions. For each external symbol, we add a breakpoint in Miasm’s VM, at an arbitrary address of our choice. Each of these breakpoints calls a Python function that mimics the behavior of the external function. Miasm already provides such implementations for part of the C library. This is far from complete, but it is sufficient for our example.

Here are the ExternFuncs methods that handle this part:

    def symlink(self, loader, sym) -> int:
        return self.add(sym.name)

    def add(self, name):
        ret = self.name2addr.get(name, None)
        if ret is not None:
            return ret
        addr = self.curAddr
        self.curAddr += 8
        self.name2addr[name] = addr

        self.jitter.add_breakpoint(addr, lambda jitter: self.call(jitter, name))
        return addr

When a breakpoint is hit, it will execute the ExternFuncs.call function, which simply does:

    def call(self, jitter, name):
        print("Calling %s" % name)
        getattr(linux_stdlib, "xxx%s" % name)(jitter)
        return True
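
The bookkeeping behind add and call can be tried in isolation. The sketch below is a Miasm-free model of it, under stated assumptions: StubTable, its handlers dict (standing in for the linux_stdlib module) and its breakpoints dict (standing in for the jitter’s breakpoint table) are hypothetical names, and the base address 0x700000000000 is simply the one that appears in the log at the end of this section.

```python
class StubTable:
    """Hypothetical, Miasm-free model of the ExternFuncs bookkeeping."""

    def __init__(self, handlers, base=0x700000000000):
        self.handlers = handlers    # stands in for the linux_stdlib module
        self.cur_addr = base        # next free stub address
        self.name2addr = {}         # symbol name -> stub address
        self.breakpoints = {}       # stub address -> callback

    def add(self, name):
        # Reuse the stub address if the symbol was already bound.
        addr = self.name2addr.get(name)
        if addr is not None:
            return addr
        addr = self.cur_addr
        self.cur_addr += 8          # one 8-byte slot per symbol
        self.name2addr[name] = addr
        self.breakpoints[addr] = lambda: self.call(name)
        return addr

    def call(self, name):
        # Same naming scheme as above: "_puts" is served by "xxx_puts".
        self.handlers["xxx%s" % name]()
        return True                 # tell the run loop to keep going


output = []
table = StubTable({"xxx_puts": lambda: output.append("coucou")})
binder_addr = table.add("dyld_stub_binder")
puts_addr = table.add("_puts")
resumed = table.breakpoints[puts_addr]()
print(hex(binder_addr), hex(puts_addr), output, resumed)
```

Because every symbol gets its own address, the address alone identifies which external function the emulated code is calling.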

Load the actual binary

It is now time to put all the pieces together and use QBDL to load the binary inside our initialized Miasm VM, using a MiasmSystem object. The loader object returned by QBDL gives access to the entry point of the binary, which we use in the last step.

After this initialization, we can see that a new mapping appeared in the Miasm VM, at 0x100000000 (which is the base address of our MachO example binary):

Addr               Size               Access Comment
0x1230000          0x10000            RW_    Stack
0x100000000        0x3000             RW_

Last but not least, run the code!

The last thing we need to do is to tell Miasm to run the code from the entry point:

def code_sentinelle(jitter):
    # Reached when the entry point returns: stop the emulation
    print("[+] End!")
    return False

# Push a fake return address and put a breakpoint on it
myjit.push_uint64_t(0x1337beef)
myjit.add_breakpoint(0x1337beef, code_sentinelle)
myjit.run(loader.entrypoint)
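
The callbacks in this section follow one convention: ExternFuncs.call returns True so that execution resumes, while code_sentinelle returns False to end the run. The toy loop below models that convention and the fake return address trick. It is only an illustration and not Miasm’s actual run loop; run, ENTRY and the one-instruction program are hypothetical.

```python
def run(program, breakpoints, pc, stack):
    """Toy run loop: `program` maps an address to a function returning the next address."""
    while True:
        callback = breakpoints.get(pc)
        # A callback that returns anything other than True stops the run.
        if callback is not None and callback() is not True:
            return pc
        pc = program[pc](stack)


SENTINEL = 0x1337beef   # fake return address, as in the example above
ENTRY = 0x1000          # hypothetical entry point address

events = []


def trace_entry():
    events.append("entry")
    return True          # keep running


def sentinel():
    events.append("end")
    return False         # stop the run


# The "entry point" returns immediately by popping its return address.
program = {ENTRY: lambda stack: stack.pop()}
breakpoints = {ENTRY: trace_entry, SENTINEL: sentinel}

stopped_at = run(program, breakpoints, ENTRY, [SENTINEL])
print(events, hex(stopped_at))
```

No code ever has to exist at the sentinel address: the breakpoint fires before anything is fetched from it.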

Using this example MachO binary, which is supposed to print the coucou string, here is the output:

Addr               Size               Access Comment
0x1230000          0x10000            RW_    Stack

Loading examples/macho-o-x86-64-hello.bin
Virtual size: 0x3000
Mapping __PAGEZERO - 0x0
Mapping __TEXT - 0x0
Mapping __DATA - 0x1000
Mapping __LINKEDIT - 0x2000
Symbol dyld_stub_binder resolves to address 0x700000000000, stored at address 0x100001000
Symbol _puts resolves to address 0x700000000008, stored at address 0x100001010
Addr               Size               Access Comment
0x1230000          0x10000            RW_    Stack
0x100000000        0x3000             RW_

Calling _puts
coucou
[+] End!
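
The addresses in this log can be cross-checked with a little arithmetic. The sketch below subtracts the base address from the two locations where symbol addresses were stored and looks up the segment they fall in. It assumes that the values printed after each segment name are offsets from the base address and that each segment extends up to the next one. __PAGEZERO is left out because the log lists it at the same offset as __TEXT. segment_of is a hypothetical helper.

```python
BASE = 0x100000000      # base address of the example binary
VIRTUAL_SIZE = 0x3000   # "Virtual size" line of the log

# Segment offsets as printed in the log (assumed relative to BASE).
SEGMENTS = [("__TEXT", 0x0), ("__DATA", 0x1000), ("__LINKEDIT", 0x2000)]


def segment_of(addr):
    """Return the name of the segment containing addr."""
    offset = addr - BASE
    if not 0 <= offset < VIRTUAL_SIZE:
        raise ValueError("address %#x is outside the mapping" % addr)
    name = None
    for seg_name, seg_offset in SEGMENTS:
        # Segments are sorted, so the last one starting at or before
        # the offset is the one containing it.
        if seg_offset <= offset:
            name = seg_name
    return name


slots = [("dyld_stub_binder", 0x100001000), ("_puts", 0x100001010)]
for symbol, slot in slots:
    print(symbol, hex(slot - BASE), segment_of(slot))
```

Under these assumptions, both pointers land in __DATA, which is consistent with the loader writing resolved symbol addresses into the data segment of the binary.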