Program & Metadata
The Program object is the root of every Quokka analysis. This page covers how to load a program, navigate its metadata, and explore the binary structure.
The Program Object
Program is a Python dict subclass:
- Keys: function start addresses (int)
- Values: Function objects
import quokka
prog = quokka.Program("bash.quokka", "bash")
# Dict-like access
func = prog[0x401000] # by address
func = prog.fun_names["main"] # by name
# Iteration
for addr, func in prog.items():
    print(f"0x{addr:x}: {func.name}")
len(prog) # number of functions
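To make the dict-like behaviour concrete, here is a minimal stand-in that runs without Quokka. It is a sketch of the idea, not Quokka's implementation: `MiniProgram` and `MiniFunction` are hypothetical names, and the addresses are made up. It shows how a dict subclass keyed by start address can also carry a name index in the spirit of `fun_names`.

```python
from dataclasses import dataclass


@dataclass
class MiniFunction:
    """Hypothetical stand-in for a Function object."""
    start: int
    name: str


class MiniProgram(dict):
    """Hypothetical dict subclass: start address -> MiniFunction."""

    def __init__(self, functions):
        functions = list(functions)
        super().__init__((f.start, f) for f in functions)
        # Secondary index mirroring the idea behind prog.fun_names
        self.fun_names = {f.name: f for f in functions}


mini = MiniProgram([
    MiniFunction(0x401000, "main"),
    MiniFunction(0x401080, "parse_args"),
])

by_addr = mini[0x401000]           # lookup by address
by_name = mini.fun_names["main"]   # lookup by name
listing = [f"0x{addr:x}: {func.name}" for addr, func in mini.items()]
```

Both lookups return the same object, which is why either access path can be used interchangeably once a function is known by address or by name.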
Loading Options
from quokka.types import ExporterMode
# Option 1: direct load (you already have the .quokka file)
prog = quokka.Program("binary.quokka", "binary")
# Option 2: from_binary (invokes a disassembler automatically)
prog = quokka.Program.from_binary(
    exec_path="binary",
    mode=ExporterMode.LIGHT,  # default
    decompiled=False,
)
# Option 3: generate only (no Program returned)
path = quokka.Program.generate("binary", output_file="out.quokka")
Note
from_binary requires a working disassembler installation: either IDA
with the Quokka plugin, or Ghidra with the QuokkaExporter extension.
Set the disassembler parameter to choose, or let it auto-detect.
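Because exporting needs a disassembler while loading does not, it can be convenient to reuse an existing export and only generate one when it is missing. The sketch below assumes the export is kept next to the binary as `<binary>.quokka`. That naming convention and the helper names `export_path_for` and `load_or_export` are assumptions for illustration, not part of Quokka. It uses only the `Program(...)` and `Program.generate(...)` calls shown above.

```python
from pathlib import Path


def export_path_for(binary: str) -> Path:
    """Assumed convention: the export sits next to the binary as <name>.quokka."""
    path = Path(binary)
    return path.with_name(path.name + ".quokka")


def load_or_export(binary: str):
    """Load an existing export if present, otherwise generate it first."""
    import quokka  # imported lazily so the path helper works without Quokka

    export = export_path_for(binary)
    if not export.exists():
        # Requires IDA or Ghidra with the Quokka exporter installed
        quokka.Program.generate(binary, output_file=str(export))
    return quokka.Program(str(export), binary)
```

With this helper, `load_or_export("bash")` would touch the disassembler only on the first call, provided the generated file ends up at the assumed path.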
Program Metadata
# Binary identity
prog.name # "bash"
prog.hash # "a4f3..." (hex string, SHA-256 or MD5)
# Architecture
prog.isa # ArchEnum.X86
prog.address_size # 64 (bits)
prog.arch # <class 'X86_64'>
prog.endianness # Endianness.LITTLE_ENDIAN
# Disassembler that produced the export
prog.disassembler # Disassembler.IDA
prog.disassembler_version # "9.0"
# Export mode
prog.mode # ExporterMode.LIGHT
# Base address (lowest segment start)
prog.base_address # 0x400000
Supported Architectures
Quokka leverages Capstone for runtime disassembly, so it supports all architectures Capstone handles:
| Architecture | ISA enum |
|---|---|
| x86 / x86-64 | X86 |
| ARM / AArch64 | ARM, ARM64 |
| MIPS | MIPS |
| PowerPC | PPC |
| SPARC | SPARC |
| M68K | M68K |
| SystemZ | SYSZ |
| EVM | EVM |
from quokka.analysis import ArchEnum
prog.isa == ArchEnum.X86 # True for x86/x86-64
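Since `prog.isa` is the same for x86 and x86-64, telling the two apart means combining it with `prog.address_size`. The following sketch shows one way to do that. The local `ArchEnum` is a stand-in defined here so the example runs without Quokka, and `describe_target` is a hypothetical helper.

```python
from enum import Enum, auto
from types import SimpleNamespace


class ArchEnum(Enum):
    """Local stand-in for Quokka's ArchEnum (only a few members)."""
    X86 = auto()
    ARM = auto()
    ARM64 = auto()


def describe_target(prog) -> str:
    """Combine prog.isa and prog.address_size into a readable target name."""
    if prog.isa is ArchEnum.X86:
        return "x86-64" if prog.address_size == 64 else "x86"
    return prog.isa.name.lower()


# SimpleNamespace objects stand in for a loaded Program
target64 = describe_target(SimpleNamespace(isa=ArchEnum.X86, address_size=64))
target32 = describe_target(SimpleNamespace(isa=ArchEnum.X86, address_size=32))
target_arm = describe_target(SimpleNamespace(isa=ArchEnum.ARM64, address_size=64))
```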
Segments
Segments model the binary's memory layout (.text, .data, .bss, etc.).
# Dict: segment_id → Segment
for seg_id, seg in prog.segments.items():
    print(f"[{seg_id}] {seg.name:15s} "
          f"0x{seg.start:x}–0x{seg.end:x} "
          f"type={seg.type.name} "
          f"perm={seg.permissions!r}")
[0] .text 0x401000–0x4b2000 type=CODE perm=<Perm.R|X: 5>
[1] .rodata 0x4b2000–0x4c0000 type=DATA perm=<Perm.R: 4>
[2] .data 0x4c1000–0x4c5000 type=DATA perm=<Perm.R|W: 6>
[3] .bss 0x4c5000–0x4c8000 type=BSS perm=<Perm.R|W: 6>
# Find the segment containing an address
seg = prog.get_segment(0x401234)
seg.in_segment(0x401234) # True
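The lookup above can be pictured as a range test over the segment list. The sketch below is a stand-in model, not Quokka's code: `Seg`, `Perm`, and `find_segment` are defined locally, the end address is assumed to be exclusive, and the flag values are chosen to match the sample output (R=4, W=2, X=1).

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import Optional


class Perm(IntFlag):
    """Stand-in flag set; values chosen to match the output shown above."""
    X = 1
    W = 2
    R = 4


@dataclass
class Seg:
    """Hypothetical stand-in for a Segment."""
    name: str
    start: int
    end: int  # assumed exclusive
    permissions: Perm

    def in_segment(self, addr: int) -> bool:
        return self.start <= addr < self.end


def find_segment(segments, addr: int) -> Optional[Seg]:
    """Return the first segment whose range contains addr, or None."""
    return next((s for s in segments if s.in_segment(addr)), None)


segments = [
    Seg(".text", 0x401000, 0x4B2000, Perm.R | Perm.X),
    Seg(".rodata", 0x4B2000, 0x4C0000, Perm.R),
    Seg(".data", 0x4C1000, 0x4C5000, Perm.R | Perm.W),
]

hit = find_segment(segments, 0x401234)
gap = find_segment(segments, 0x4C0800)  # falls between .rodata and .data
writable = [s.name for s in segments if Perm.W in s.permissions]
```

Note the gap between `.rodata` and `.data` in the sample layout: an address there belongs to no segment, so callers should be prepared for a lookup that finds nothing.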
Finding Functions
# By address (dict key)
func = prog[0x401000]
# By exact name (default)
func = prog.get_function("main")
# By partial name (first match containing the substring)
func = prog.get_function("parse", approximative=True)
# Restrict to NORMAL functions (skip imports/thunks)
func = prog.get_function("malloc", normal=True)
# The fun_names dict (name → Function)
for name, func in prog.fun_names.items():
    print(name)
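Because the approximative lookup returns only the first match, iterating over `fun_names` is the way to collect every candidate. A minimal sketch, where `functions_matching` is a hypothetical helper and a plain dict with made-up entries stands in for `prog.fun_names`:

```python
def functions_matching(fun_names, needle: str):
    """Return every (name, function) pair whose name contains `needle`, sorted by name."""
    return sorted(
        ((name, func) for name, func in fun_names.items() if needle in name),
        key=lambda pair: pair[0],
    )


# Plain dict standing in for prog.fun_names (values would be Function objects)
fake_fun_names = {
    "parse_args": 0x401080,
    "main": 0x401000,
    "parse_config": 0x4010F0,
}
matches = [name for name, _ in functions_matching(fake_fun_names, "parse")]
```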
Reading Raw Bytes
# Address → file offset
offset = prog.address_to_offset(0x401234)
# Read raw bytes from virtual address
raw = prog.read_bytes(0x401234, 16)
print(raw.hex()) # 'deadbeef...'
Useful for manual disassembly, hashing function bodies, or verifying data patterns.
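As an illustration of the hashing use case, the sketch below hashes a byte range through any reader with the `(address, size)` call shape of `prog.read_bytes`. The helper `hash_region` and the in-memory `fake_read` are assumptions so the example runs without Quokka. How to obtain a function's size is left out, since this page does not cover it.

```python
import hashlib
from typing import Callable


def hash_region(read_bytes: Callable[[int, int], bytes], start: int, size: int) -> str:
    """SHA-256 over `size` bytes read at virtual address `start`."""
    return hashlib.sha256(read_bytes(start, size)).hexdigest()


# Stand-in reader over an in-memory image; with Quokka you would pass prog.read_bytes
IMAGE_BASE = 0x401000
IMAGE = bytes.fromhex("554889e5c3")  # push rbp; mov rbp, rsp; ret


def fake_read(addr: int, size: int) -> bytes:
    offset = addr - IMAGE_BASE
    return IMAGE[offset:offset + size]


digest = hash_region(fake_read, 0x401000, len(IMAGE))
```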
The Executable Class
prog.executable wraps the raw binary file (read entirely into memory at load time).
All methods work with file offsets, not virtual addresses. Use prog.address_to_offset(addr) to convert a virtual address first.
| Method | Returns | Description |
|---|---|---|
| read_bytes(offset, size) | bytes | Raw bytes at offset |
| read_string(offset, size=None) | str | UTF-8 string (null-terminated if no size) |
| read_int(offset, size, signed=False) | int | Integer, endianness-aware |
| read_type_value(offset, type) | TypeValue | Typed read (int, float, struct, enum, pointer…) |
exe = prog.executable
offset = prog.address_to_offset(0x4b2010)
exe.read_bytes(offset, 4) # b'\x48\x65\x6c\x6c'
exe.read_string(offset) # "Hello" (null-terminated)
exe.read_int(offset, 4) # 0x6c6c6548 (little-endian)
exe.read_type_value(offset, some_type) # dispatches by type kind
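The relationship between the three results above (the same four bytes read as raw bytes, as a string, and as an integer) can be reproduced with plain Python. This is a sketch of the decoding rules only, not of the Executable class: `read_int` and `read_cstring` here are local helper functions operating on a bytes buffer.

```python
def read_int(data: bytes, offset: int, size: int, signed: bool = False,
             byteorder: str = "little") -> int:
    """Decode `size` bytes at `offset` as an integer in the given byte order."""
    return int.from_bytes(data[offset:offset + size], byteorder, signed=signed)


def read_cstring(data: bytes, offset: int) -> str:
    """Decode a null-terminated UTF-8 string starting at `offset`."""
    end = data.index(b"\x00", offset)
    return data[offset:end].decode("utf-8")


blob = b"Hello\x00"
as_int = read_int(blob, 0, 4)                    # little-endian
as_big = read_int(blob, 0, 4, byteorder="big")   # same bytes, big-endian
as_str = read_cstring(blob, 0)
```

The little-endian value is `0x6c6c6548`, matching the comment in the listing above. The big-endian reading of the same bytes is `0x48656c6c`, which is why an endianness-aware read matters.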
Tip
prog.read_bytes(v_addr, size) is a convenience wrapper that computes v_addr - base_address and passes the result as a file offset to executable.read_bytes. For binaries with non-contiguous segments, use prog.address_to_offset(v_addr) followed by prog.executable.read_bytes(offset, size) for correct segment-aware translation.
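To see why the two translations can disagree, consider a layout where a segment is mapped further away in memory than it sits in the file. The sketch below is a simplified model: `MappedSegment` and its `file_offset` field are hypothetical and not taken from Quokka's Segment class, and the addresses are made up.

```python
from dataclasses import dataclass


@dataclass
class MappedSegment:
    """Hypothetical segment record; `file_offset` is an assumed field."""
    start: int        # virtual start address
    end: int          # virtual end address (exclusive)
    file_offset: int  # where the segment's bytes begin in the file


def naive_offset(addr: int, base_address: int) -> int:
    """Translation by plain subtraction of the base address."""
    return addr - base_address


def segment_aware_offset(addr: int, segments) -> int:
    """Translation relative to the segment that contains the address."""
    for seg in segments:
        if seg.start <= addr < seg.end:
            return seg.file_offset + (addr - seg.start)
    raise ValueError(f"0x{addr:x} is not inside any segment")


layout = [
    MappedSegment(0x400000, 0x401000, 0x0),
    # Adjacent in the file, but mapped 0x200000 higher in memory
    MappedSegment(0x601000, 0x602000, 0x1000),
]

naive = naive_offset(0x601010, base_address=0x400000)
aware = segment_aware_offset(0x601010, layout)
```

In this model the naive result is `0x201010`, far past the data in the file, while the segment-aware result is `0x1010`.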
Call Graph (Program Level)
import networkx as nx
# Lazily computed on first access
cg = prog.call_graph # networkx.DiGraph
# Nodes are function start addresses
# Edges represent call relationships
# Functions with the most distinct callers (in-degree)
top_called = sorted(cg.in_degree(), key=lambda x: x[1], reverse=True)
for addr, degree in top_called[:10]:
    print(f"{prog[addr].name:30s} called by {degree} functions")
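Another common question on a call graph is which functions are reachable from an entry point. The sketch below walks call edges breadth-first. It takes a `successors` callable so it runs on a plain dict here; with the graph above you could pass `cg.successors` instead. The helper name `reachable_from` and the sample addresses are made up for illustration.

```python
from collections import deque


def reachable_from(successors, entry):
    """Breadth-first walk: every node reachable from `entry` via call edges."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for callee in successors(node):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


# Stand-in adjacency map: caller address -> list of callee addresses
calls = {
    0x401000: [0x401080, 0x4010F0],  # main -> parse_args, run
    0x401080: [0x4010F0],            # parse_args -> run
    0x4010F0: [],                    # run
    0x401200: [0x401000],            # helper that calls main but is never called
}
live = reachable_from(lambda addr: calls.get(addr, []), 0x401000)
unreached = sorted(set(calls) - live)
```

networkx also provides `nx.descendants(cg, entry)`, which returns the same set without the entry node itself.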
Quick Reference
| Concept | Code |
|---|---|
| Load | quokka.Program("f.quokka", "f") |
| Architecture | prog.arch, prog.isa, prog.endianness |
| Segments | prog.segments |
| Find function by address | prog[addr] |
| Find function by name | prog.get_function("name") |
| Call graph | prog.call_graph (networkx DiGraph) |
| Raw bytes | prog.read_bytes(addr, size) |
| Binary file access | prog.executable.read_bytes/string/int/type_value(offset, ...) |
| Add a new type | prog.add_type("struct foo { int x; };") |
| Save .quokka only | prog.write() |
| Apply edits to IDA | prog.commit(database_file="f.i64", overwrite=True) (IDA only) |
| Commit + re-export | prog.regenerate(database_file="f.i64", overwrite=True) (IDA only) |