Getting Started

Warning

This library is experimental. Most code has been writing toward satisfying PASTIS objectives. Thus it lacks many features and the API is subject to changes.

Tritondse works in pure emulation so it theoretically have to model all program side-effects syscalls etc. This is not possible so it works in a very opportunistic manner. As of now limitations are the following:

Limited to Linux ELF binaries
Only support a subset of libc function calls (no C++ runtime functions)
No modeling of syscalls, (they have to be modeled manually)
TritonDSE have the same weaknesses than Triton in model floating point, or some specific instruction sets.

I. Program

For this tutorial the following crackme is going to be used:

binary

We first need to create a Program object which is a thin wrapper on LIEF.

[119]:

from tritondse import Program

p = Program("crackme_xor")

print(p.architecture, p.endianness)
print(hex(p.entry_point))

Architecture.X86_64 ENDIANNESS.LITTLE
0x400460

By default Program only exposes the few fields required to peform the loading of the program. That includes segments or imported functions. The main utility is being able to retrieve a function object (as LIEF object).

Note

To load a program with its shared libraries dependencies, the cle loader (see: CleLoader) shall be used.

[120]:

p.find_function_addr("main")

[120]:

[121]:

for seg in p.memory_segments():
    print(f"0x{seg.address:x} size:{len(seg.content)}")

0x400000 size:2036
0x600e10 size:576

II. Single Execution

Now we need to load the program in the TritonDSE context representing containing both the concrete and the symbolic state. A context is a ProcessState object. It provides all primitives to read or write both registers and memory in a concrete or symbolic manner.

Thus, we need to load the program in such context and then starting the symbolic emulation at its entrypoint by providing it a concrete input. Hopefully the loading and execution steps are done transparently by the SymbolicExecutor object.

[123]:

from tritondse import SymbolicExecutor, Config, Seed, CompositeData

config = Config()
seed = Seed(CompositeData(argv=[b"Hello", b"world"]))

executor = SymbolicExecutor(config, seed)
executor.load(p)

WARNING:root:symbol __gmon_start__ imported but unsupported

This object provides features to perform a single execution using the given configuration on the process state using the given program.

The run methods will take care of loading the program, performing the dynamic relocations (PLT stuff etc) and then to start emulating from the entrypoint.

[124]:

executor.run()

We now have successfully performed a single run of our program. After execution, the ProcessState has been updated and represent the program after execution.

and a CoverageSingleRun as been produced which represent to coverage generated by the execution.

[ ]:

pstate = executor.pstate  # ProcessState object

The execution also produces a CoverageSingleRun which represent to coverage generated by the execution.

[125]:

executor.coverage.total_instruction_executed

[125]:

[126]:

executor.exitcode

[126]:

III. Concrete State Manipulation

Both the concrete state and symbolic state can be modified at any time during the execution.

[127]:

from tritondse.types import Architecture

[128]:

pstate = executor.pstate
print(f"arch: {pstate.architecture.name}  ptrsize:{pstate.ptr_bit_size}")

arch: X86_64  ptrsize:64

a. Reading, writing registers (function API)

Most of the API enables addressing register either by an enum identifier (triton one) or directly with theirs string.

[129]:

pstate.registers.rax

[129]:

rax:64 bv[63..0]

A ProcessState also provides some alias to access program counter, stack register, base pointer or return register in a portable way.

[130]:

pstate.program_counter_register, \
pstate.base_pointer_register, \
pstate.stack_pointer_register, \
pstate.return_register

[130]:

(rip:64 bv[63..0], rbp:64 bv[63..0], rsp:64 bv[63..0], rax:64 bv[63..0])

Then both concrete an symbolic values can be modified using a function-style API.

[131]:

pstate.write_register(pstate.registers.rax, 0xdeadbeef)

hex(pstate.read_register(pstate.registers.rax))

[131]:

'0xdeadbeef'

b. Reading, writing registers (Pythonic API)

To ease manipulation of the registers concrete values, the ProcessState introduces a cpu attributes that transparently updates the underlying triton context.

[132]:

pstate.cpu.rax

[132]:

3735928559

[133]:

pstate.cpu.rax += 4

pstate.cpu.rax

[133]:

3735928563

[134]:

print(f"RIP: 0x{pstate.cpu.program_counter:x}")

RIP: 0x400489

c. Reading, writing memory

The memory field of a ProcessState allows reading and writing memory. pstate.memory.read returns bytes and all the other return an int. pstate.memory.read_int and the others return an integer and take endianness into account.

[135]:

print(pstate.memory.read(p.entry_point, 10))  # Reads 10 bytes
print(hex(pstate.memory.read_ptr(p.entry_point)))  # Reads a pointer size
print(pstate.memory.read_int(p.entry_point))  # Reads a 4 bytes signed int
print(pstate.memory.read_uint(p.entry_point))  # Reads a 4 bytes unsigned int
print(pstate.memory.read_char(p.entry_point))  # Reads a 1 bytes signed char
print(pstate.memory.read_uchar(p.entry_point))  # Reads a 1 bytes unsigned char
print(pstate.memory.read_word(p.entry_point))  # Reads a 2 bytes unsigned integer
print(pstate.memory.read_dword(p.entry_point))  # Reads a 4 bytes unsigned integer
print(pstate.memory.read_qword(p.entry_point))  # Reads a 8 bytes unsigned integer

b'1\xedI\x89\xd1^H\x89\xe2H'
0x89485ed18949ed31
-1991643855
2303323441
49
49
60721
2303323441
9892260835563793713

Analogous functions exists for writing: write, write_int, write_ptr, write_long etc.

One can also use the slice interface of the memory to both read and write memory

[ ]:

data = pstate.memory[p.entry_point:10]   # Read 10 bytes in memory

A ProcessState object also enables checking whether an address is mapped in memory:

[136]:

pstate.memory.is_mapped(p.entry_point), pstate.memory.is_mapped(0)

[136]:

(True, False)

IV. Manipulating symbolic state

Both symbolic registers and symbolic memory can be manipulated in a similar fashion than the concrete state.

One should be cautious when manipulating the symbolic state to keep it consistent with the concrete state.

Symbolic values can be read written with a similar API than concrete state.

[137]:

new_sym = pstate.actx.bv(32, 64)  # new symbolic expression representing a constant

pstate.write_symbolic_register(pstate.registers.rax, new_sym)  # the expression can either be a AstNode or SymbolicExpression triton object

pstate.read_symbolic_register(pstate.registers.rax)

[137]:

(define-fun ref!116 () (_ BitVec 64) (_ bv32 64)) ; assign rax:

The same can be done on memory with read_symbolic_memory_byte, read_symbolic_memory_bytes, read_symbolic_memory_int and theirs equivalent for writing.

Disclaimer: Writing an arbitrary symbolic value in a register or memory might break soundness, and the dependency with previous definition of the variable. In standard usage a user, is usually not supposed to modify symbolic values but rather to concretize values or adding new constraints in the path predicate.

We also can push our own constraints in the symbolic state.

[139]:

sym_rax = pstate.read_symbolic_register(pstate.registers.rax)

constraint = sym_rax.getAst() == 4
print(constraint)

pstate.push_constraint(constraint)

(= (_ bv32 64) (_ bv4 64))

V. Configuration

As seen before, a SymbolicExecutor takes a a Config object as input. It tunes multiple parameters that will be used during execution and exploration. These parameters are the following:

seed_format (SeedFormat) : Indicates whether to use RAW or COMPOSITE seeds (see the seeds tutorial for more detail). For now, just note that to provide stdin input, a RAW seed can be used, for anything else (argv, file content etc.), a COMPOSITE seed is needed.
pipe_stdout (bool): Pipe the program stdout to Python’s stdout
pipe_stderr (bool): Pipe the program stderr to Python’s stderr
skip_sleep_routine (bool): Whether to emulate sleeps routine or to skip it
smt_timeout (int): Timeout for a single SMT query in milliseconds
execution_timeout (int): Timeout of a single execution (in secs)
exploration_timeout (int): Overall timeout of the exploration (in secs)*
exploration_limit (int): Number of execution iterations. (0 means unlimited)
thread_scheduling (int): Number of instructions to execute before switching to the next thread
smt_queries_limit (int): Limit of SMT queries to perform for a single execution
coverage_strategy (CoverageStrategy): Coverage strategy to apply for the whole exploration
branch_solving_strategy (BranchCheckStrategy): Branch solving strategy to apply for a single execution
workspace (str): Workspace directory to use
program_argv (List[str]): Concrete program argument as given on the command line
time_inc_coefficient (float): Execution time of each instruction (for rdtsc)

[140]:

from tritondse import SeedFormat
c = Config()
c.seed_format=SeedFormat.COMPOSITE

VI. Exploration

Now that we performed a single run, lets try to explore the program by symbolizing argv to see how many different paths can be taken.

[141]:

from tritondse import SymbolicExplorator, SymbolicExecutor, ProcessState, Seed, Config, CoverageStrategy, Program, SeedFormat, CompositeData
import tritondse.logging

import logging
logging.basicConfig(level=logging.DEBUG)
tritondse.logging.enable() # enable tritondse to print log info

# Load the program
p = Program("crackme_xor")

dse = SymbolicExplorator(Config(pipe_stdout=True, seed_format=SeedFormat.COMPOSITE), p)

# create a dummy seed representing argv and add it to inputs
seed = Seed(CompositeData(argv=[b"./crackme", b"AAAAAAAAAAAAAAA"]))
dse.add_input_seed(seed)

dse.explore()

WARNING:root:symbol __gmon_start__ imported but unsupported

fail
fail

[141]:

<ExplorationStatus.IDLE: 2>

[142]:

dse.execution_count

[142]:

We now have completed a very simple exploration, where we covered two distincts paths.

Question: Find the appropriate configuration parameter that enables solving automatically the challenge.

VII. Workspace & Corpus

All inputs, crashes and various metadata are stored in a workspace. Unless explicitely specified the workspace is created in /tmp/triton_workspace/[timestamp]. If a workspace directory is given via the Config this one is loaded (which enables restarting an interrupted run).

The whole corpus and crashes generated shall now be available in this directory.