Managing Seeds

This tutorial explains how to deal with input files manipulated and generated during the execution. Let’s reuse the following base snippet. The only difference here is that we are going to change the coverage strategy to PATH_COVERAGE.

[80]:
from tritondse import SymbolicExecutor, Config, Seed, Program, ProcessState, SymbolicExplorator, CoverageStrategy, SeedFormat, CompositeData

config = Config(pipe_stdout=False, coverage_strategy=CoverageStrategy.PATH, seed_format=SeedFormat.COMPOSITE)

def post_exec_hook(se: SymbolicExecutor, state: ProcessState):
    print(f"seed:{se.seed.hash} ({repr(se.seed.content.argv)})   [exitcode:{se.exitcode}]")

dse = SymbolicExplorator(config, Program("crackme_xor"))

dse.add_input_seed(Seed(CompositeData(argv=[b"./crackme", b"AAAAAAAAAAAAAAA"])))

dse.callback_manager.register_post_execution_callback(post_exec_hook)

dse.explore()
WARNING:root:symbol __gmon_start__ imported but unsupported
seed:e2f673d0fd7980a2bdad7910f0f6da7a ([b'./crackme', b'AAAAAAAAAAAAAAA'])   [exitcode:0]
seed:b204f9c8720b4ee299a215ef4c9f168f ([b'./crackme', b'eAAAAAAAAAAAAAA'])   [exitcode:0]
seed:cab6e4b729327d1e088c9d459e0340eb ([b'./crackme', b'elAAAAAAAAAAAAA'])   [exitcode:0]
seed:c8f3df9e460142aed1158aa354d7179d ([b'./crackme', b'eliAAAAAAAAAAAA'])   [exitcode:0]
seed:2cb80846ef5684501c73e1e19f595230 ([b'./crackme', b'elitAAAAAAAAAAA'])   [exitcode:0]
seed:dc1d802d1c2796a1a21d96827ce1cae7 ([b'./crackme', b'eliteAAAAAAAAAA'])   [exitcode:0]
[80]:
<ExplorationStatus.IDLE: 2>

Just by exploring all possible paths we managed to solve the challenge. Let’s play with the corpus.

Initial Corpus

There are two ways to provide an initial corpus:

  • providing in existing workspace directory and putting manually files in the worklist directory

  • via the API by adding the seed with add_input_seed, it will automatically be added in seeds to process.

Managing generated corpus

SymbolicExecutor: This class is solely meant to execute a single seed, not to produce new ones (that is the purpose of the explorator). However if a one wants to generate a new input to process in a callback, it is possible with the method enqueue_seed. That method will just fill a list that will later be picked-up by the SymbolicExplorator instance.

Here is a dummy hook function that manually negates a branching condition to generate a new input file:

[81]:
def hook_cmp(se: SymbolicExecutor, pstate: ProcessState, addr: int):
    zf = pstate.cpu.zf  # concrete value
    sym_zf = pstate.read_symbolic_register(pstate.registers.zf)

    # Revert the current value of zf to "negate" condition
    status, model = pstate.solve(sym_zf.getAst() != zf)

    if status == SolverStatus.SAT:
        new_seed = se.mk_new_seed_from_model(model)
        se.enqueue_seed(new_seed)

By default the status of the seed generated is FRESH. However, one can directly assign it the CRASH status if it is a faulty one.

SymbolicExplorator: All seeds generated during the exploration are moved in the Workspace instance from the worklist folder to the appropriate one corpus, hangs and crashs. One can iterate them using the workspace instance.

[82]:
print("seed to process:", len(list(dse.workspace.iter_worklist())))

print("\nCorpus:")
for seed in dse.workspace.iter_corpus():
      print(seed.filename)
seed to process: 0

Corpus:
2cb80846ef5684501c73e1e19f595230.00000064.tritondse.cov
dc1d802d1c2796a1a21d96827ce1cae7.00000064.tritondse.cov
b204f9c8720b4ee299a215ef4c9f168f.00000064.tritondse.cov
e2f673d0fd7980a2bdad7910f0f6da7a.00000064.tritondse.cov
c8f3df9e460142aed1158aa354d7179d.00000064.tritondse.cov
cab6e4b729327d1e088c9d459e0340eb.00000064.tritondse.cov

Setting seed status

During the execution one, can assign a status to the seed currently being executed in any callback. (Make sure not to override a status previously set by another callback). At the end of an execution if no status has been assigned, the seed is automatically assigned OK_DONE.

## Seed Formats

In TritonDSE, a seed contains the user input that will be processed by the target program.

TritonDSE supports two seed formats: RAW and COMPOSITE.

  • RAW seed is a contiguous buffer of bytes that will be injected in memory. By default it is done when the program reads from stdin.

  • COMPOSITE seeds allow providing more complex input to the program such as argv or the content of files. Note that stdin is treated as a file. Their content is provided as a CompositeData instance which is defined in tritondse/seeds.py as follows:

#Excerpt from /tritondse/seeds.py
class CompositeData:
    argv: List[bytes]
    "list of argv values"
    files: Dict[str, bytes]
    "dictionnary of files and the associated content (stdin is one of them)"
    variables: Dict[str, bytes]
    "user defined variables, that the use must take care to inject at right location"

The following example shows a complex input type where user-input are read on argv, stdin a file called filename.txt and a variable var1. Symbolic variables like var1 will not be injected automatically, the user has to do it himself at the appropriate location.

from tritondse import Seed, CompositeData

composite_seed = Seed(CompositeData(argv=[b"this", b"will", b"injected", b"into", b"argv"],\
                                   files={"stdin": b"This will be injected when the program reads from stdin",\
                                          "filename.txt": b"This will be injected when the program reads from filename.txt"\
                                         },\
                                   variables={"var1": b"The user is responsible for injecting this manually (more on this later)"}\
                                   ))

It is not necessary to provide all the fields in CompositeData when creating a new seed. The following seeds are valid.

[83]:
composite_seed_1 = Seed(CompositeData(argv=[b"this", b"will", b"injected", b"into", b"argv"]))
composite_seed_2 = Seed(CompositeData(files={"stdin": b"This will be injected when the program reads from stdin"}))

When creating a SymbolicExecutor or SymbolicExplorator the seed format can be specified in the Config. The default seed format is RAW. Seed formats cannot be mixed, all the seeds of a given SymbolicExecutor or SymbolicExplorator must have the same format.

[84]:
from tritondse import SymbolicExplorator, Config, SeedFormat, Program
dse_using_raw_seeds = SymbolicExplorator(Config(), Program("crackme_xor"))
dse_using_composite_seeds = SymbolicExplorator(Config(seed_format=SeedFormat.COMPOSITE), Program("crackme_xor"))

How to inject arbitrary variables

The variables field of CompositeData is never injected by the framework. It is the user’s responsability to inject it in the appropriate places with inject_symbolic_variable_memory or inject_symbolic_variable_register.

Let’s see an example. Say the target application contains the following function that we want to explore:

void parse_buffer(char* buffer);

The following script will register a callback at the start of parse_buffer and inject input into buffer.

from tritondse import SymbolicExplorator, Config, SeedFormat, Program, CompositeData

def example_hook(se: SymbolicExecutor, pstate: ProcessState, addr: int):
    # In the callback, retrieve the pointer to the buffer we want to inject
    arg0 = pstate.get_argument_value(0)
    # Inject the data from the seed into the buffer
    # var_prefix should be the key of the variable in dictionary se.seed.content.variables
    se.inject_symbolic_variable_memory(arg0, "buffer", se.seed.content.variables["buffer"])

p = Program("parser_program")

# Create a new symbolic explorator that uses COMPOSITE seeds
dse = SymbolicExplorator(Config(seed_format=SeedFormat.COMPOSITE), p)
# Add an initial seed
composite_data = CompositeData(variables={"buffer" : b"A"*128})
dse.add_input_seed(Seed(composite_data))
# Register a callback on the function whose parameter we want to inject
dse.callback_manager.register_pre_addr_callback(p.find_function_addr("parse_buffer"), example_hook)

dse.explore()