Getting Started
Warning
This library is experimental. Most code has been writing toward satisfying PASTIS objectives. Thus it lacks many features and the API is subject to changes.
Tritondse works in pure emulation so it theoretically have to model all program side-effects syscalls etc. This is not possible so it works in a very opportunistic manner. As of now limitations are the following:
Limited to Linux ELF binaries
Only support a subset of libc function calls (no C++ runtime functions)
No modeling of syscalls, (they have to be modeled manually)
TritonDSE have the same weaknesses than Triton in model floating point, or some specific instruction sets.
I. Program
For this tutorial the following crackme is going to be used:
We first need to create a Program
object which is a thin wrapper on LIEF.
[119]:
from tritondse import Program
p = Program("crackme_xor")
print(p.architecture, p.endianness)
print(hex(p.entry_point))
Architecture.X86_64 ENDIANNESS.LITTLE
0x400460
By default Program
only exposes the few fields required to peform the loading of the program. That includes segments or imported functions. The main utility is being able to retrieve a function object (as LIEF object).
Note
To load a program with its shared libraries dependencies, the cle loader (see: CleLoader) shall be used.
[120]:
p.find_function_addr("main")
[120]:
4195763
[121]:
for seg in p.memory_segments():
print(f"0x{seg.address:x} size:{len(seg.content)}")
0x400000 size:2036
0x600e10 size:576
II. Single Execution
Now we need to load the program in the TritonDSE context representing containing both the concrete and the symbolic state. A context is a ProcessState
object. It provides all primitives to read or write both registers and memory in a concrete or symbolic manner.
Thus, we need to load the program in such context and then starting the symbolic emulation at its entrypoint by providing it a concrete input. Hopefully the loading and execution steps are done transparently by the SymbolicExecutor
object.
[123]:
from tritondse import SymbolicExecutor, Config, Seed, CompositeData
config = Config()
seed = Seed(CompositeData(argv=[b"Hello", b"world"]))
executor = SymbolicExecutor(config, seed)
executor.load(p)
WARNING:root:symbol __gmon_start__ imported but unsupported
This object provides features to perform a single execution using the given configuration on the process state using the given program.
The run
methods will take care of loading the program, performing the dynamic relocations (PLT stuff etc) and then to start emulating from the entrypoint.
[124]:
executor.run()
We now have successfully performed a single run of our program. After execution, the ProcessState
has been updated and represent the program after execution.
and a CoverageSingleRun
as been produced which represent to coverage generated by the execution.
[ ]:
pstate = executor.pstate # ProcessState object
The execution also produces a CoverageSingleRun
which represent to coverage generated by the execution.
[125]:
executor.coverage.total_instruction_executed
[125]:
23
[126]:
executor.exitcode
[126]:
255
III. Concrete State Manipulation
Both the concrete state and symbolic state can be modified at any time during the execution.
[127]:
from tritondse.types import Architecture
[128]:
pstate = executor.pstate
print(f"arch: {pstate.architecture.name} ptrsize:{pstate.ptr_bit_size}")
arch: X86_64 ptrsize:64
a. Reading, writing registers (function API)
Most of the API enables addressing register either by an enum identifier (triton one) or directly with theirs string.
[129]:
pstate.registers.rax
[129]:
rax:64 bv[63..0]
A ProcessState
also provides some alias to access program counter, stack register, base pointer or return register in a portable way.
[130]:
pstate.program_counter_register, \
pstate.base_pointer_register, \
pstate.stack_pointer_register, \
pstate.return_register
[130]:
(rip:64 bv[63..0], rbp:64 bv[63..0], rsp:64 bv[63..0], rax:64 bv[63..0])
Then both concrete an symbolic values can be modified using a function-style API.
[131]:
pstate.write_register(pstate.registers.rax, 0xdeadbeef)
hex(pstate.read_register(pstate.registers.rax))
[131]:
'0xdeadbeef'
b. Reading, writing registers (Pythonic API)
To ease manipulation of the registers concrete values, the ProcessState
introduces a cpu
attributes that transparently updates the underlying triton context.
[132]:
pstate.cpu.rax
[132]:
3735928559
[133]:
pstate.cpu.rax += 4
pstate.cpu.rax
[133]:
3735928563
[134]:
print(f"RIP: 0x{pstate.cpu.program_counter:x}")
RIP: 0x400489
c. Reading, writing memory
The memory
field of a ProcessState
allows reading and writing memory. pstate.memory.read
returns bytes and all the other return an int. pstate.memory.read_int
and the others return an integer and take endianness into account.
[135]:
print(pstate.memory.read(p.entry_point, 10)) # Reads 10 bytes
print(hex(pstate.memory.read_ptr(p.entry_point))) # Reads a pointer size
print(pstate.memory.read_int(p.entry_point)) # Reads a 4 bytes signed int
print(pstate.memory.read_uint(p.entry_point)) # Reads a 4 bytes unsigned int
print(pstate.memory.read_char(p.entry_point)) # Reads a 1 bytes signed char
print(pstate.memory.read_uchar(p.entry_point)) # Reads a 1 bytes unsigned char
print(pstate.memory.read_word(p.entry_point)) # Reads a 2 bytes unsigned integer
print(pstate.memory.read_dword(p.entry_point)) # Reads a 4 bytes unsigned integer
print(pstate.memory.read_qword(p.entry_point)) # Reads a 8 bytes unsigned integer
b'1\xedI\x89\xd1^H\x89\xe2H'
0x89485ed18949ed31
-1991643855
2303323441
49
49
60721
2303323441
9892260835563793713
Analogous functions exists for writing: write
, write_int
, write_ptr
, write_long
etc.
One can also use the slice interface of the memory to both read and write memory
[ ]:
data = pstate.memory[p.entry_point:10] # Read 10 bytes in memory
A ProcessState
object also enables checking whether an address is mapped in memory:
[136]:
pstate.memory.is_mapped(p.entry_point), pstate.memory.is_mapped(0)
[136]:
(True, False)
IV. Manipulating symbolic state
Both symbolic registers and symbolic memory can be manipulated in a similar fashion than the concrete state.
One should be cautious when manipulating the symbolic state to keep it consistent with the concrete state.
Symbolic values can be read written with a similar API than concrete state.
[137]:
new_sym = pstate.actx.bv(32, 64) # new symbolic expression representing a constant
pstate.write_symbolic_register(pstate.registers.rax, new_sym) # the expression can either be a AstNode or SymbolicExpression triton object
pstate.read_symbolic_register(pstate.registers.rax)
[137]:
(define-fun ref!116 () (_ BitVec 64) (_ bv32 64)) ; assign rax:
The same can be done on memory with read_symbolic_memory_byte
, read_symbolic_memory_bytes
, read_symbolic_memory_int
and theirs equivalent for writing.
We also can push our own constraints in the symbolic state.
[139]:
sym_rax = pstate.read_symbolic_register(pstate.registers.rax)
constraint = sym_rax.getAst() == 4
print(constraint)
pstate.push_constraint(constraint)
(= (_ bv32 64) (_ bv4 64))
V. Configuration
As seen before, a SymbolicExecutor
takes a a Config
object as input. It tunes multiple parameters that will be used during execution and exploration. These parameters are the following:
seed_format (SeedFormat) : Indicates whether to use
RAW
orCOMPOSITE
seeds (see theseeds
tutorial for more detail). For now, just note that to providestdin
input, aRAW
seed can be used, for anything else (argv
, file content etc.), aCOMPOSITE
seed is needed.pipe_stdout (bool): Pipe the program stdout to Python’s stdout
pipe_stderr (bool): Pipe the program stderr to Python’s stderr
skip_sleep_routine (bool): Whether to emulate sleeps routine or to skip it
smt_timeout (int): Timeout for a single SMT query in milliseconds
execution_timeout (int): Timeout of a single execution (in secs)
exploration_timeout (int): Overall timeout of the exploration (in secs)*
exploration_limit (int): Number of execution iterations. (0 means unlimited)
thread_scheduling (int): Number of instructions to execute before switching to the next thread
smt_queries_limit (int): Limit of SMT queries to perform for a single execution
coverage_strategy (CoverageStrategy): Coverage strategy to apply for the whole exploration
branch_solving_strategy (BranchCheckStrategy): Branch solving strategy to apply for a single execution
workspace (str): Workspace directory to use
program_argv (List[str]): Concrete program argument as given on the command line
time_inc_coefficient (float): Execution time of each instruction (for rdtsc)
[140]:
from tritondse import SeedFormat
c = Config()
c.seed_format=SeedFormat.COMPOSITE
VI. Exploration
Now that we performed a single run, lets try to explore the program by symbolizing argv
to see how many different paths can be taken.
[141]:
from tritondse import SymbolicExplorator, SymbolicExecutor, ProcessState, Seed, Config, CoverageStrategy, Program, SeedFormat, CompositeData
import tritondse.logging
import logging
logging.basicConfig(level=logging.DEBUG)
tritondse.logging.enable() # enable tritondse to print log info
# Load the program
p = Program("crackme_xor")
dse = SymbolicExplorator(Config(pipe_stdout=True, seed_format=SeedFormat.COMPOSITE), p)
# create a dummy seed representing argv and add it to inputs
seed = Seed(CompositeData(argv=[b"./crackme", b"AAAAAAAAAAAAAAA"]))
dse.add_input_seed(seed)
dse.explore()
WARNING:root:symbol __gmon_start__ imported but unsupported
fail
fail
[141]:
<ExplorationStatus.IDLE: 2>
[142]:
dse.execution_count
[142]:
2
We now have completed a very simple exploration, where we covered two distincts paths.
Question: Find the appropriate configuration parameter that enables solving automatically the challenge.
VII. Workspace & Corpus
All inputs, crashes and various metadata are stored in a workspace. Unless explicitely specified the workspace is created in /tmp/triton_workspace/[timestamp]. If a workspace directory is given via the Config
this one is loaded (which enables restarting an interrupted run).
The whole corpus and crashes generated shall now be available in this directory.