{ "cells": [ { "cell_type": "markdown", "id": "nearby-junction", "metadata": {}, "source": [ "# Getting Started\n" ] }, { "cell_type": "raw", "id": "cfff1c78-bf0e-4cea-b126-87a06a897212", "metadata": {}, "source": [ "
\n", "

Warning

\n", "

This library is experimental. Most code has been writing toward satisfying PASTIS objectives. Thus it lacks many\n", "features and the API is subject to changes.

\n", "
" ] }, { "cell_type": "markdown", "id": "valuable-audience", "metadata": {}, "source": [ "Tritondse works in pure emulation so it theoretically have to model all program side-effects syscalls etc.\n", "This is not possible so it works in a very opportunistic manner. As of now limitations are the following:\n", "\n", "* Limited to Linux ELF binaries\n", "* Only support a subset of libc function calls *(no C++ runtime functions)*\n", "* No modeling of syscalls, *(they have to be modeled manually)*\n", "* TritonDSE have the same weaknesses than Triton in model floating point, or some specific instruction sets." ] }, { "cell_type": "markdown", "id": "incorrect-liquid", "metadata": {}, "source": [ "## I. Program" ] }, { "cell_type": "markdown", "id": "literary-survey", "metadata": {}, "source": [ "For this tutorial the following crackme is going to be used:" ] }, { "cell_type": "raw", "id": "9189ef0f-077b-4cff-9129-7c0ec63eb406", "metadata": {}, "source": [ "

binary

" ] }, { "cell_type": "markdown", "id": "b1d78f4a-fe45-4b22-a421-ebc5c76f1996", "metadata": {}, "source": [ "We first need to create a ``Program`` object which is a thin wrapper on [LIEF](https://lief.quarkslab.com)." ] }, { "cell_type": "code", "execution_count": 119, "id": "reflected-narrative", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Architecture.X86_64 ENDIANNESS.LITTLE\n", "0x400460\n" ] } ], "source": [ "from tritondse import Program\n", "\n", "p = Program(\"crackme_xor\")\n", "\n", "print(p.architecture, p.endianness)\n", "print(hex(p.entry_point))" ] }, { "cell_type": "markdown", "id": "super-crest", "metadata": {}, "source": [ "By default ``Program`` only exposes the few fields\n", "required to peform the loading of the program. That includes segments\n", "or imported functions. The main utility is being\n", "able to retrieve a function object *(as LIEF object)*." ] }, { "cell_type": "raw", "id": "8476142b-b39f-4df6-878a-f7062340c593", "metadata": {}, "source": [ "
\n", "

Note

\n", "

To load a program with its shared libraries dependencies, the cle loader (see: CleLoader) shall be used.

\n", "
" ] }, { "cell_type": "code", "execution_count": 120, "id": "supreme-hunger", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4195763" ] }, "execution_count": 120, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p.find_function_addr(\"main\")" ] }, { "cell_type": "code", "execution_count": 121, "id": "outstanding-accommodation", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0x400000 size:2036\n", "0x600e10 size:576\n" ] } ], "source": [ "for seg in p.memory_segments():\n", " print(f\"0x{seg.address:x} size:{len(seg.content)}\")" ] }, { "cell_type": "markdown", "id": "precious-greene", "metadata": {}, "source": [ "## II. Single Execution" ] }, { "cell_type": "markdown", "id": "further-adrian", "metadata": {}, "source": [ "Now we need to load the program in the TritonDSE context representing containing both the concrete and the symbolic state.\n", "A context is a ``ProcessState`` object. It provides all primitives to read or write both registers and memory in a concrete\n", "or symbolic manner.\n", "\n", "Thus, we need to load the program in such context and then starting the symbolic emulation at its entrypoint\n", "by providing it a concrete input. Hopefully the loading and execution steps are done transparently by the ``SymbolicExecutor`` object." ] }, { "cell_type": "code", "execution_count": 123, "id": "wrapped-bangladesh", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:root:symbol __gmon_start__ imported but unsupported\n" ] } ], "source": [ "from tritondse import SymbolicExecutor, Config, Seed, CompositeData\n", "\n", "config = Config()\n", "seed = Seed(CompositeData(argv=[b\"Hello\", b\"world\"]))\n", "\n", "executor = SymbolicExecutor(config, seed)\n", "executor.load(p)" ] }, { "cell_type": "markdown", "id": "killing-solomon", "metadata": {}, "source": [ "This object provides features to perform a single execution using\n", "the given configuration on the process state using the given program.\n", "\n", "The ``run`` methods will take care of loading the program, performing\n", "the dynamic relocations (PLT stuff etc) and then to start emulating from\n", "the entrypoint." ] }, { "cell_type": "code", "execution_count": 124, "id": "unnecessary-louisville", "metadata": {}, "outputs": [], "source": [ "executor.run()" ] }, { "cell_type": "markdown", "id": "blind-annotation", "metadata": {}, "source": [ "We now have successfully performed a single run of our program.\n", "After execution, the ``ProcessState`` has been updated and represent\n", "the program after execution.\n", "\n", "and a ``CoverageSingleRun`` as been produced\n", "which represent to coverage generated by the execution." ] }, { "cell_type": "code", "execution_count": null, "id": "46658e69-8e81-4b0f-827d-8e91124b1161", "metadata": { "tags": [] }, "outputs": [], "source": [ "pstate = executor.pstate # ProcessState object" ] }, { "cell_type": "markdown", "id": "6e66bb52-46fc-4d4c-a96b-2570b2f3f407", "metadata": {}, "source": [ "The execution also produces a ``CoverageSingleRun`` which represent to coverage generated by the execution." ] }, { "cell_type": "code", "execution_count": 125, "id": "royal-arabic", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "23" ] }, "execution_count": 125, "metadata": {}, "output_type": "execute_result" } ], "source": [ "executor.coverage.total_instruction_executed" ] }, { "cell_type": "code", "execution_count": 126, "id": "advance-calendar", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "255" ] }, "execution_count": 126, "metadata": {}, "output_type": "execute_result" } ], "source": [ "executor.exitcode" ] }, { "cell_type": "markdown", "id": "perfect-sheriff", "metadata": {}, "source": [ "## III. Concrete State Manipulation\n", "\n", "Both the concrete state and symbolic state can be modified at any time during the execution." ] }, { "cell_type": "code", "execution_count": 127, "id": "chicken-citizen", "metadata": {}, "outputs": [], "source": [ "from tritondse.types import Architecture" ] }, { "cell_type": "code", "execution_count": 128, "id": "expensive-harvard", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "arch: X86_64 ptrsize:64\n" ] } ], "source": [ "pstate = executor.pstate\n", "print(f\"arch: {pstate.architecture.name} ptrsize:{pstate.ptr_bit_size}\")" ] }, { "cell_type": "markdown", "id": "sharp-scoop", "metadata": {}, "source": [ "#### a. Reading, writing registers (function API)\n", "\n", "Most of the API enables addressing register either by an enum identifier *(triton one)* or directly with theirs string." ] }, { "cell_type": "code", "execution_count": 129, "id": "injured-hazard", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "rax:64 bv[63..0]" ] }, "execution_count": 129, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstate.registers.rax" ] }, { "cell_type": "markdown", "id": "early-integrity", "metadata": {}, "source": [ "A ``ProcessState`` also provides some alias to access program counter, stack register, base pointer or return register in a portable way." ] }, { "cell_type": "code", "execution_count": 130, "id": "competitive-brave", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(rip:64 bv[63..0], rbp:64 bv[63..0], rsp:64 bv[63..0], rax:64 bv[63..0])" ] }, "execution_count": 130, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstate.program_counter_register, \\\n", "pstate.base_pointer_register, \\\n", "pstate.stack_pointer_register, \\\n", "pstate.return_register" ] }, { "cell_type": "markdown", "id": "upset-ethics", "metadata": {}, "source": [ "Then both concrete an symbolic values can be modified using a function-style API." ] }, { "cell_type": "code", "execution_count": 131, "id": "surgical-announcement", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'0xdeadbeef'" ] }, "execution_count": 131, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstate.write_register(pstate.registers.rax, 0xdeadbeef)\n", "\n", "hex(pstate.read_register(pstate.registers.rax))" ] }, { "cell_type": "markdown", "id": "young-fossil", "metadata": {}, "source": [ "#### b. Reading, writing registers (Pythonic API)" ] }, { "cell_type": "markdown", "id": "lesser-temperature", "metadata": {}, "source": [ "To ease manipulation of the registers concrete values, the ``ProcessState`` introduces a ``cpu`` attributes that transparently updates the underlying triton context." ] }, { "cell_type": "code", "execution_count": 132, "id": "comfortable-league", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3735928559" ] }, "execution_count": 132, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstate.cpu.rax" ] }, { "cell_type": "code", "execution_count": 133, "id": "mediterranean-gossip", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3735928563" ] }, "execution_count": 133, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstate.cpu.rax += 4\n", "\n", "pstate.cpu.rax" ] }, { "cell_type": "code", "execution_count": 134, "id": "drawn-duncan", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RIP: 0x400489\n" ] } ], "source": [ "print(f\"RIP: 0x{pstate.cpu.program_counter:x}\")" ] }, { "cell_type": "markdown", "id": "natural-pakistan", "metadata": {}, "source": [ "#### c. Reading, writing memory\n", "\n", "The `memory` field of a `ProcessState` allows reading and writing memory.\n", "`pstate.memory.read` returns bytes and all the other return an int.\n", "`pstate.memory.read_int` and the others return an integer and take endianness into account." ] }, { "cell_type": "code", "execution_count": 135, "id": "attempted-logging", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "b'1\\xedI\\x89\\xd1^H\\x89\\xe2H'\n", "0x89485ed18949ed31\n", "-1991643855\n", "2303323441\n", "49\n", "49\n", "60721\n", "2303323441\n", "9892260835563793713\n" ] } ], "source": [ "print(pstate.memory.read(p.entry_point, 10)) # Reads 10 bytes\n", "print(hex(pstate.memory.read_ptr(p.entry_point))) # Reads a pointer size\n", "print(pstate.memory.read_int(p.entry_point)) # Reads a 4 bytes signed int\n", "print(pstate.memory.read_uint(p.entry_point)) # Reads a 4 bytes unsigned int\n", "print(pstate.memory.read_char(p.entry_point)) # Reads a 1 bytes signed char\n", "print(pstate.memory.read_uchar(p.entry_point)) # Reads a 1 bytes unsigned char\n", "print(pstate.memory.read_word(p.entry_point)) # Reads a 2 bytes unsigned integer\n", "print(pstate.memory.read_dword(p.entry_point)) # Reads a 4 bytes unsigned integer\n", "print(pstate.memory.read_qword(p.entry_point)) # Reads a 8 bytes unsigned integer" ] }, { "cell_type": "markdown", "id": "productive-network", "metadata": {}, "source": [ "Analogous functions exists for writing: `write`, `write_int`, `write_ptr`, `write_long` etc." ] }, { "cell_type": "markdown", "id": "b432ce6d-a21f-461d-8aad-bdb82ac55ff3", "metadata": {}, "source": [ "One can also use the slice interface of the memory to both read and write memory" ] }, { "cell_type": "code", "execution_count": null, "id": "a7a824ad-5d03-492d-84e0-98e2002f5e6c", "metadata": { "tags": [] }, "outputs": [], "source": [ "data = pstate.memory[p.entry_point:10] # Read 10 bytes in memory" ] }, { "cell_type": "markdown", "id": "differential-realtor", "metadata": {}, "source": [ "A `ProcessState` object also enables checking whether an address is mapped in memory:" ] }, { "cell_type": "code", "execution_count": 136, "id": "circular-rabbit", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(True, False)" ] }, "execution_count": 136, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pstate.memory.is_mapped(p.entry_point), pstate.memory.is_mapped(0)" ] }, { "cell_type": "markdown", "id": "healthy-passport", "metadata": {}, "source": [ "## IV. Manipulating symbolic state\n", "\n", "Both symbolic registers and symbolic memory can be manipulated in a similar fashion than the concrete state.\n", "\n", "One should be cautious when manipulating the symbolic state to keep it consistent with the concrete state." ] }, { "cell_type": "markdown", "id": "terminal-reservoir", "metadata": {}, "source": [ "Symbolic values can be read written with a similar API than concrete state." ] }, { "cell_type": "code", "execution_count": 137, "id": "matched-playing", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(define-fun ref!116 () (_ BitVec 64) (_ bv32 64)) ; assign rax: " ] }, "execution_count": 137, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_sym = pstate.actx.bv(32, 64) # new symbolic expression representing a constant\n", "\n", "pstate.write_symbolic_register(pstate.registers.rax, new_sym) # the expression can either be a AstNode or SymbolicExpression triton object\n", "\n", "pstate.read_symbolic_register(pstate.registers.rax)" ] }, { "cell_type": "markdown", "id": "valued-contents", "metadata": {}, "source": [ "The same can be done on memory with `read_symbolic_memory_byte`, `read_symbolic_memory_bytes`, `read_symbolic_memory_int` and theirs equivalent for writing." ] }, { "cell_type": "raw", "id": "8b186e0b-bf3c-42b9-80df-4f8aeb284a51", "metadata": {}, "source": [ "
\n", "Disclaimer: Writing an arbitrary symbolic value in a register or memory might break soundness, and the dependency with previous definition of the variable. In standard usage a user, is usually not supposed to modify symbolic values but rather to concretize values or adding new constraints in the path predicate.\n", "
" ] }, { "cell_type": "markdown", "id": "ready-threshold", "metadata": {}, "source": [ "We also can push our own constraints in the symbolic state." ] }, { "cell_type": "code", "execution_count": 139, "id": "durable-palace", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(= (_ bv32 64) (_ bv4 64))\n" ] } ], "source": [ "sym_rax = pstate.read_symbolic_register(pstate.registers.rax)\n", "\n", "constraint = sym_rax.getAst() == 4\n", "print(constraint)\n", "\n", "pstate.push_constraint(constraint)" ] }, { "cell_type": "markdown", "id": "sought-background", "metadata": {}, "source": [ "## V. Configuration\n", "\n", "As seen before, a `SymbolicExecutor` takes a a `Config` object as input.\n", "It tunes multiple parameters that will be used during execution and exploration.\n", "These parameters are the following:\n", "\n", "\n", "* seed_format (SeedFormat) : Indicates whether to use `RAW` or `COMPOSITE` seeds (see the `seeds` tutorial for more detail). For now, just note that to provide `stdin` input, a `RAW` seed can be used, for anything else (`argv`, file content etc.), a `COMPOSITE` seed is needed.\n", "* pipe_stdout (bool): Pipe the program stdout to Python's stdout\n", "* pipe_stderr (bool): Pipe the program stderr to Python's stderr\n", "* skip_sleep_routine (bool): Whether to emulate sleeps routine or to skip it\n", "* smt_timeout (int): Timeout for a single SMT query in milliseconds\n", "* execution_timeout (int): Timeout of a single execution *(in secs)*\n", "* exploration_timeout (int): Overall timeout of the exploration (in secs)*\n", "* exploration_limit (int): Number of execution iterations. *(0 means unlimited)*\n", "* thread_scheduling (int): Number of instructions to execute before switching to the next thread\n", "* smt_queries_limit (int): Limit of SMT queries to perform for a single execution\n", "* coverage_strategy (CoverageStrategy): Coverage strategy to apply for the whole exploration\n", "* branch_solving_strategy (BranchCheckStrategy): Branch solving strategy to apply for a single execution\n", "* workspace (str): Workspace directory to use\n", "* program_argv (List[str]): Concrete program argument as given on the command line\n", "* time_inc_coefficient (float): Execution time of each instruction *(for rdtsc)*" ] }, { "cell_type": "code", "execution_count": 140, "id": "weighted-sequence", "metadata": {}, "outputs": [], "source": [ "from tritondse import SeedFormat\n", "c = Config()\n", "c.seed_format=SeedFormat.COMPOSITE" ] }, { "cell_type": "markdown", "id": "comparative-situation", "metadata": {}, "source": [ "## VI. Exploration\n", "\n", "Now that we performed a single run, lets try to explore the program by symbolizing\n", "`argv` to see how many different paths can be taken." ] }, { "cell_type": "code", "execution_count": 141, "id": "wireless-peninsula", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:root:symbol __gmon_start__ imported but unsupported\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "fail\n", "fail\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from tritondse import SymbolicExplorator, SymbolicExecutor, ProcessState, Seed, Config, CoverageStrategy, Program, SeedFormat, CompositeData\n", "import tritondse.logging\n", "\n", "import logging\n", "logging.basicConfig(level=logging.DEBUG)\n", "tritondse.logging.enable() # enable tritondse to print log info\n", "\n", "# Load the program\n", "p = Program(\"crackme_xor\")\n", "\n", "dse = SymbolicExplorator(Config(pipe_stdout=True, seed_format=SeedFormat.COMPOSITE), p)\n", "\n", "# create a dummy seed representing argv and add it to inputs\n", "seed = Seed(CompositeData(argv=[b\"./crackme\", b\"AAAAAAAAAAAAAAA\"]))\n", "dse.add_input_seed(seed)\n", "\n", "dse.explore()" ] }, { "cell_type": "code", "execution_count": 142, "id": "physical-source", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2" ] }, "execution_count": 142, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dse.execution_count" ] }, { "cell_type": "markdown", "id": "worst-malawi", "metadata": {}, "source": [ "We now have completed a very simple exploration, where we covered two distincts paths." ] }, { "cell_type": "markdown", "id": "b856680e-0043-4e5d-9ca1-57222826bf7a", "metadata": {}, "source": [ "**Question**: Find the appropriate configuration parameter that enables solving automatically the challenge." ] }, { "cell_type": "markdown", "id": "attached-moscow", "metadata": {}, "source": [ "## VII. Workspace & Corpus\n", "\n", "All inputs, crashes and various metadata are stored in a workspace. Unless explicitely specified\n", "the workspace is created in */tmp/triton_workspace/[timestamp]*. If a workspace directory is given\n", "via the `Config` this one is loaded *(which enables restarting an interrupted run)*.\n", "\n", "The whole corpus and crashes generated shall now be available in this directory." ] } ], "metadata": { "kernelspec": { "display_name": "neopastis", "language": "python", "name": "neopastis" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.2" } }, "nbformat": 4, "nbformat_minor": 5 }