pypcode

PyPCode integration

get_arch_from_string(target_id)

Find the pypcode language whose ID matches the target identifier

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| target_id | str | Identifier of the architecture | required |

Raises:

| Type | Description |
| --- | --- |
| PypcodeError | if the architecture is not found |

Returns:

| Type | Description |
| --- | --- |
| ArchLanguage | The matching ArchLanguage |

Source code in quokka/backends/pypcode.py
def get_arch_from_string(target_id: str) -> pypcode.ArchLanguage:
    """Find the pypcode language whose ID matches the target identifier

    Arguments:
        target_id: Identifier of the architecture

    Raises:
        PypcodeError: if the architecture is not found

    Returns:
        The matching ArchLanguage
    """
    pcode_arch: pypcode.Arch
    for pcode_arch in pypcode.Arch.enumerate():
        for lang in pcode_arch.languages:
            if lang.id == target_id:
                return lang

    raise quokka.PypcodeError("Unable to find the appropriate arch: missing lang")
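
The identifiers compared in the loop above, such as `x86:LE:64:default`, appear to follow a four-field layout: processor, endianness, size, and variant. The sketch below splits such an ID into its fields. The `LanguageId` tuple and the `parse_language_id` helper are hypothetical names introduced for illustration; they are not part of quokka or pypcode.

```python
from typing import NamedTuple


class LanguageId(NamedTuple):
    """Hypothetical container for the four fields of a language ID."""

    processor: str
    endianness: str
    size: int
    variant: str


def parse_language_id(target_id: str) -> LanguageId:
    """Split an ID such as "x86:LE:64:default" into its four fields."""
    parts = target_id.split(":")
    if len(parts) != 4:
        raise ValueError(f"Unexpected language ID: {target_id!r}")
    processor, endianness, size, variant = parts
    return LanguageId(processor, endianness, int(size), variant)


parsed = parse_language_id("AARCH64:LE:64:v8A")
print(parsed.processor, parsed.endianness, parsed.size, parsed.variant)
```

Reading the fields this way can help when a lookup fails, since it shows which part of the ID (for example the variant) differs from what pypcode provides.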

get_pypcode_context(arch, endian=Endianness.LITTLE_ENDIAN)

Convert an arch from Quokka to Pypcode

For now, only the architectures defined in quokka.analysis are supported. This function is somewhat slow because pypcode generates its enums on the fly, but it is meant to be called only once.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| arch | Type[QuokkaArch] | Quokka program architecture | required |
| endian | Type[Endianness] | Architecture endianness | LITTLE_ENDIAN |

Raises:

| Type | Description |
| --- | --- |
| PypcodeError | if the conversion for arch is not found |

Returns:

| Type | Description |
| --- | --- |
| Context | A pypcode.Context instance |

Source code in quokka/backends/pypcode.py
def get_pypcode_context(
        arch: Type[quokka.analysis.QuokkaArch],
        endian: Type[Endianness] = Endianness.LITTLE_ENDIAN
) -> pypcode.Context:
    """Convert an arch from Quokka to Pypcode

    For now, only the architectures defined in quokka.analysis are supported.
    This function is somewhat slow because pypcode generates its enums on the fly,
    but it is meant to be called only once.

    Arguments:
        arch: Quokka program architecture
        endian: Architecture endianness

    Raises:
        PypcodeError: if the conversion for arch is not found

    Returns:
        A pypcode.Context instance
    """
    names: Dict[Type[quokka.analysis.arch.QuokkaArch], str] = {
        quokka.analysis.ArchX64: "x86:LE:64:default",
        quokka.analysis.ArchX86: "x86:LE:32:default",
        quokka.analysis.ArchARM: "ARM:LE:32:v8",
        quokka.analysis.ArchARM64: "AARCH64:LE:64:v8A",
        quokka.analysis.ArchARMThumb: "ARM:LE:32:v8T",
        quokka.analysis.ArchMIPS: "MIPS:LE:32:default",
        quokka.analysis.ArchMIPS64: "MIPS:LE:64:default",
        quokka.analysis.ArchPPC: "PowerPC:LE:32:default",
        quokka.analysis.ArchPPC64: "PowerPC:LE:64:default",
    }

    try:
        target_id = names[arch]
    except KeyError as exc:
        raise quokka.PypcodeError(
            "Unable to find the appropriate arch: missing id"
        ) from exc

    if endian == Endianness.BIG_ENDIAN:
        target_id = target_id.replace(":LE:", ":BE:")

    pcode_arch = get_arch_from_string(target_id)
    return pypcode.Context(pcode_arch)
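
The endianness handling above amounts to rewriting one field of the language ID before the lookup. Here is a minimal sketch of that step in isolation. The `Endianness` enum is a stand-in for quokka's own, and `resolve_target_id` is a hypothetical helper name, not a quokka function.

```python
from enum import Enum, auto


class Endianness(Enum):
    """Stand-in for quokka's Endianness enum (assumption for this sketch)."""

    LITTLE_ENDIAN = auto()
    BIG_ENDIAN = auto()


def resolve_target_id(
    base_id: str, endian: Endianness = Endianness.LITTLE_ENDIAN
) -> str:
    """Mirror the ID rewriting step: swap the LE field for BE when needed."""
    if endian == Endianness.BIG_ENDIAN:
        return base_id.replace(":LE:", ":BE:")
    return base_id


print(resolve_target_id("MIPS:LE:32:default", Endianness.BIG_ENDIAN))  # MIPS:BE:32:default
print(resolve_target_id("x86:LE:64:default"))  # x86:LE:64:default
```

The rewritten ID is only a candidate. If pypcode has no language with that ID, the lookup in `get_arch_from_string` raises `PypcodeError`.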

pypcode_decode_block(block)

Decode a block at once.

This function decodes all the instructions of a block in a single call to the pypcode context. This is faster than decoding each instruction separately.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| block | Block | Block to decode | required |

Returns:

| Type | Description |
| --- | --- |
| List[PcodeOp] | A list of pcode operations |

Source code in quokka/backends/pypcode.py
def pypcode_decode_block(block: quokka.Block) -> List[pypcode.PcodeOp]:
    """Decode a block at once.

    This function decodes all the instructions of a block in a single call to the
    pypcode context. This is faster than decoding each instruction separately.

    Arguments:
        block: Block to decode

    Returns:
        A list of pcode operations
    """

    # Fast guard, empty blocks do not have any Pcode operations
    first_instruction: Optional[quokka.Instruction] = next(block.instructions, None)
    if first_instruction is None:
        return []

    # Retrieve the context from the instruction
    context: pypcode.Context = update_pypcode_context(
        block.program, first_instruction.thumb
    )

    try:
        # Translate
        translation = context.translate(
            block.bytes,  # buf
            block.start,  # base_address
            0,  # max_bytes
            0,  # max_instructions
        )
        return translation.ops

    except pypcode.BadDataError as e:
        logger.error(e)
        raise quokka.PypcodeError(
            f"Decoding error for block at 0x{block.start:x} (BadDataError)"
        ) from e
    except pypcode.UnimplError as e:
        logger.error(e)
        raise quokka.PypcodeError(
            f"Decoding error for block at 0x{block.start:x} (UnimplError)"
        ) from e
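
A block-level translation returns one flat list of operations. If per-instruction groups are needed afterwards, one option is to split that list at the IMARK operations. The sketch below assumes each machine instruction's operations are preceded by an IMARK op. `OpCode` and `PcodeOp` are reduced stand-ins for the pypcode classes, and `split_by_instruction` is a hypothetical helper.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class OpCode(Enum):
    """Stand-in for pypcode.OpCode, reduced to three members."""

    IMARK = auto()
    COPY = auto()
    INT_ADD = auto()


@dataclass
class PcodeOp:
    """Stand-in for pypcode.PcodeOp; only the opcode field is modeled."""

    opcode: OpCode


def split_by_instruction(ops: List[PcodeOp]) -> List[List[PcodeOp]]:
    """Group a flat op list into one list per instruction, dropping IMARK ops."""
    groups: List[List[PcodeOp]] = []
    for op in ops:
        # Start a new group at each IMARK (or at the first op if no IMARK leads).
        if op.opcode == OpCode.IMARK or not groups:
            groups.append([])
        if op.opcode != OpCode.IMARK:
            groups[-1].append(op)
    return groups


flat = [
    PcodeOp(OpCode.IMARK), PcodeOp(OpCode.COPY), PcodeOp(OpCode.INT_ADD),
    PcodeOp(OpCode.IMARK), PcodeOp(OpCode.COPY),
]
print([[op.opcode.name for op in group] for group in split_by_instruction(flat)])
# [['COPY', 'INT_ADD'], ['COPY']]
```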

pypcode_decode_instruction(inst)

Decode an instruction using Pypcode

This returns the list of P-code operations that implement the instruction. Note that a single (binary) instruction is expected to translate to several P-code operations. When a single instruction is decoded, IMARK operations are excluded.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inst | Instruction | Instruction to translate | required |

Raises:

| Type | Description |
| --- | --- |
| PypcodeError | if the decoding fails |

Returns:

| Type | Description |
| --- | --- |
| Sequence[PcodeOp] | A sequence of PcodeOp |

Source code in quokka/backends/pypcode.py
def pypcode_decode_instruction(
    inst: quokka.Instruction,
) -> Sequence[pypcode.PcodeOp]:
    """Decode an instruction using Pypcode

    This returns the list of P-code operations that implement the instruction.
    Note that a single (binary) instruction is expected to translate to several
    P-code operations. When a single instruction is decoded, IMARK operations
    are excluded.

    Arguments:
        inst: Instruction to translate

    Raises:
        PypcodeError: if the decoding fails

    Returns:
        A sequence of PcodeOp
    """

    context: pypcode.Context = update_pypcode_context(inst.program, inst.thumb)
    try:
        translation = context.translate(
            inst.bytes,  # buf
            inst.address,  # base_address
            0,  # max_bytes
            1,  # max_instructions
        )

        return [x for x in translation.ops if x.opcode != pypcode.OpCode.IMARK]

    except pypcode.BadDataError as e:
        logger.error(e)
        raise quokka.PypcodeError("Unable to decode instruction (BadDataError)") from e
    except pypcode.UnimplError as e:
        logger.error(e)
        raise quokka.PypcodeError("Unable to decode instruction (UnimplError)") from e
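
Both decoding functions translate low-level decoder errors into a single `PypcodeError`. A generic sketch of that pattern with explicit exception chaining follows. The two exception classes are stand-ins for `pypcode.BadDataError` and `quokka.PypcodeError`, and `fake_translate` is a hypothetical decoder used only to trigger the error.

```python
from typing import List


class BadDataError(Exception):
    """Stand-in for pypcode.BadDataError."""


class PypcodeError(Exception):
    """Stand-in for quokka.PypcodeError."""


def fake_translate(data: bytes) -> List[str]:
    """Hypothetical decoder that rejects empty input."""
    if not data:
        raise BadDataError("no bytes to decode")
    return [f"op for {len(data)} byte(s)"]


def decode(data: bytes) -> List[str]:
    """Wrap the low-level error while keeping the original as the cause."""
    try:
        return fake_translate(data)
    except BadDataError as exc:
        raise PypcodeError("Unable to decode instruction (BadDataError)") from exc


try:
    decode(b"")
except PypcodeError as err:
    print(type(err.__cause__).__name__)  # BadDataError
```

Callers then only need to catch one exception type, while the original error stays reachable through `__cause__`.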

update_pypcode_context(program, is_thumb)

Return an appropriate pypcode context for the decoding

For ARM architectures, if the block starts with a Thumb instruction, a different pypcode Context must be used.

The boolean is_thumb is passed directly so that the call can be cached, because generating a context is costly.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| program | Program | Program to consider | required |
| is_thumb | bool | Is the instruction a thumb one? | required |

Returns:

| Type | Description |
| --- | --- |
| Context | The correct pypcode context |

Source code in quokka/backends/pypcode.py
def update_pypcode_context(program: quokka.Program, is_thumb: bool) -> pypcode.Context:
    """Return an appropriate pypcode context for the decoding

    For ARM architectures, if the block starts with a Thumb instruction, a
    different pypcode Context must be used.

    The boolean `is_thumb` is passed directly so that the call can be cached,
    because generating a context is costly.

    Arguments:
        program: Program to consider
        is_thumb: Is the instruction a thumb one?

    Returns:
        The correct pypcode context
    """

    if (
        program.arch
        in (
            quokka.analysis.ArchARM,
            quokka.analysis.ArchARM64,
            quokka.analysis.ArchARMThumb,
        )
        and is_thumb
    ):
        return get_pypcode_context(quokka.analysis.ArchARMThumb)

    return program.pypcode
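
The docstring above motivates passing a plain boolean so that the call can be cached. The listing does not show how quokka applies the cache, so the following is only a generic memoization sketch using `functools.lru_cache`. `build_context` and its string return value are hypothetical stand-ins for building a `pypcode.Context`.

```python
from functools import lru_cache

build_count = 0  # counts how many times the expensive body actually runs


@lru_cache(maxsize=None)
def build_context(arch_name: str, is_thumb: bool) -> str:
    """Pretend to build an expensive decoding context (stand-in only)."""
    global build_count
    build_count += 1
    mode = "thumb" if is_thumb else "default"
    return f"context<{arch_name}:{mode}>"


# Five calls, but only two distinct argument combinations.
for thumb in (False, True, True, False, True):
    build_context("ARM", thumb)

print(build_count)  # 2
```

Because the arguments are small hashable values, the cache key stays cheap to compute and each distinct context is built at most once.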