Skip to content

P-code Lifting

Quokka supports lifting native instructions to Ghidra P-code, an architecture-independent intermediate representation, via the pypcode library.

Optional dependency

P-code support requires pypcode: pip install pypcode

What is P-code?

P-code lifts native instructions into a small set of atomic operations on typed memory locations called varnodes.

Why use P-code?

  • Architecture-independent — write analysis once, run on x86, ARM, MIPS, PowerPC, and more
  • Explicit data-flow — reads/writes to named varnodes are fully explicit
  • Uniform representation — complex instructions (flags, implicit effects) are broken down into simple atomic steps

Quokka uses the pypcode Python bindings for Ghidra's SLEIGH engine.

Entry Points

P-code is available at both instruction and block granularity:

# Per-instruction (IMARK excluded automatically)
ops: list[pypcode.PcodeOp] = inst.pcode_insts

# Per-block (more efficient — one SLEIGH pass for all instructions)
# NOTE: IMARK operations are included; filter them if needed
ops: list[pypcode.PcodeOp] = block.pcode_insts

The block-level call is faster because it translates the whole block in a single SLEIGH pass. Use instruction-level when you need to correlate P-code ops back to a specific native address.

func = prog.get_function("main", approximative=False)
block = func.get_block(func.start)

for addr, inst in block.items():
    print(f"0x{addr:x}  {inst.mnemonic}")
    for op in inst.pcode_insts:
        print(f"      {op}")

PcodeOp — One Atomic Operation

Each P-code operation has the form output = opcode(input0, input1, ...):

Property Type Description
op.opcode pypcode.OpCode The operation kind (e.g. INT_ADD, LOAD)
op.output pypcode.Varnode \| None Destination varnode (None for STORE, branches)
op.inputs list[pypcode.Varnode] Source operands
str(op) str Human-readable form
for op in inst.pcode_insts:
    print(op.opcode.name)          # "INT_ADD"
    if op.output:
        print(f"  → {op.output}")  # "(register, 0x28, 8)"
    for vn in op.inputs:
        print(f"  ← {vn}")

Concrete Example: push rbp → 3 P-code Ops

Native instruction: push rbp (0x55)

COPY   unique[27d80:8] = RBP          # save RBP to a temp
INT_SUB  RSP = RSP - 0x8             # decrement stack pointer
STORE  *[ram]RSP = unique[27d80:8]   # write saved RBP to stack

Each line is a PcodeOp:

  • COPY: output=unique[27d80:8], inputs=[register RBP]
  • INT_SUB: output=register RSP, inputs=[register RSP, const 0x8]
  • STORE: output=None, inputs=[const spaceid, register RSP, unique[27d80:8]]

Tip

One native instruction expands to multiple P-code ops that make implicit effects (like stack pointer updates) explicit.

Varnode — A Typed Memory Location

A varnode is a triple: (space, offset, size_in_bytes)

Property Type Description
vn.space pypcode.AddrSpace Which address space
vn.offset int Offset within that space
vn.size int Width in bytes
vn.getRegisterName() str Register name if in register space, else ""
str(vn) str Human-readable, e.g. "RBP" or "unique[27d80:8]"
for vn in op.inputs:
    name = vn.getRegisterName()
    if name:
        print(f"  reg {name} ({vn.size} bytes)")
    else:
        print(f"  {vn.space.name}[{hex(vn.offset)}:{vn.size}]")

Address Spaces

The vn.space.name string identifies the varnode's kind:

Space name Meaning Typical usage
"register" CPU register file RAX, RBP, ZF, CF
"ram" Main memory Load/store targets
"const" Immediate constant Literal values (offset = value)
"unique" Temporary / scratch SLEIGH-internal temporaries
def describe_varnode(vn) -> str:
    name = vn.space.name
    if name == "register":
        return vn.getRegisterName() or f"reg@{hex(vn.offset)}"
    if name == "const":
        return hex(vn.offset)
    if name == "unique":
        return f"tmp[{hex(vn.offset)}:{vn.size}]"
    return f"*{name}[{hex(vn.offset)}:{vn.size}]"

OpCode Categories

Category Opcodes (examples)
Data transfer COPY, LOAD, STORE
Integer arithmetic INT_ADD, INT_SUB, INT_MULT, INT_DIV, INT_2COMP
Integer comparison INT_EQUAL, INT_NOTEQUAL, INT_LESS, INT_SLESS
Bitwise / shifts INT_AND, INT_OR, INT_XOR, INT_LEFT, INT_RIGHT, INT_SRIGHT
Sign/zero extend INT_SEXT, INT_ZEXT, SUBPIECE, PIECE
Boolean BOOL_AND, BOOL_OR, BOOL_XOR, BOOL_NEGATE
Float FLOAT_ADD, FLOAT_MULT, FLOAT_LESS, FLOAT_INT2FLOAT
Control flow BRANCH, CBRANCH, BRANCHIND, CALL, CALLIND, RETURN
Marker IMARK (instruction boundary — filtered out in inst.pcode_insts)
from pypcode import OpCode

if op.opcode == OpCode.LOAD:
    addr_vn = op.inputs[1]   # [0] = spaceid, [1] = address
    dest_vn = op.output

Examples

Find All Memory Reads

import quokka
from pypcode import OpCode

prog = quokka.Program("binary.quokka", "binary")

reads = []
for func in prog.values():
    for block in func.values():
        for addr, inst in block.items():
            for op in inst.pcode_insts:
                if op.opcode == OpCode.LOAD:
                    # inputs: [spaceid_const, address_varnode]
                    addr_vn = op.inputs[1]
                    reads.append({
                        "inst_addr": hex(addr),
                        "from": str(addr_vn),
                        "dest": str(op.output),
                    })

print(f"{len(reads)} memory reads found")
for r in reads[:5]:
    print(f"  {r['inst_addr']}: {r['dest']} = *[{r['from']}]")

Detect Constant Comparisons

Find all instructions that compare a variable against a constant (e.g. cmp rax, 0):

from pypcode import OpCode

CMP_OPS = {OpCode.INT_EQUAL, OpCode.INT_NOTEQUAL,
           OpCode.INT_LESS, OpCode.INT_SLESS,
           OpCode.INT_LESSEQUAL, OpCode.INT_SLESSEQUAL}

for func in prog.values():
    for block in func.values():
        for addr, inst in block.items():
            for op in inst.pcode_insts:
                if op.opcode not in CMP_OPS:
                    continue
                # Check if one input is a constant
                for vn in op.inputs:
                    if vn.space.name == "const":
                        print(f"0x{addr:x}  {inst.mnemonic}: "
                              f"compare vs {hex(vn.offset)}")

Error Handling

import quokka

try:
    for op in inst.pcode_insts:
        ...
except quokka.PypcodeError:
    # Raised for BadDataError (invalid bytes) or UnimplError (unsupported opcode)
    pass

Quick Reference

Object Key members Notes
inst.pcode_insts list[PcodeOp] Per-instruction; no IMARK
block.pcode_insts list[PcodeOp] Whole block; IMARK included
PcodeOp.opcode pypcode.OpCode Operation kind
PcodeOp.output Varnode \| None Destination (None for STORE/branch)
PcodeOp.inputs list[Varnode] Source operands
str(PcodeOp) str Human-readable dump
Varnode.space AddrSpace Address space
Varnode.space.name str "register", "ram", "const", "unique"
Varnode.offset int Offset in space (= value for "const")
Varnode.size int Width in bytes
Varnode.getRegisterName() str Register name, "" if not a register
str(Varnode) str Human-readable, e.g. "RAX", "unique[100:8]"