fs: ELF/PE imports/exports and the associated symlinks

Usage

Mapping with Pyrrha

First, create your database with Pyrrha. ROOT_DIRECTORY should contain the whole filesystem you want to map, already extracted or mounted. Pyrrha treats ROOT_DIRECTORY as the filesystem root when resolving symlinks.

Usage: pyrrha fs [OPTIONS] ROOT_DIRECTORY

  Map a filesystem into a sourcetrail-compatible db. It maps ELF and PE files, their imports and their
  exports plus the symlinks that points on these executable files.

Options:
  -d, --debug     Set log level to DEBUG
  --db PATH       Sourcetrail DB file path (.srctrldb).  [default: pyrrha.srctrldb]
  -e, --json      Create a JSON export of the resulting mapping.
  -j, --jobs INT  Number of parallel jobs created (threads).  [default: 1; 1<=x<=16]
  -h, --help      Show this message and exit.
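
To illustrate what treating ROOT_DIRECTORY as the filesystem root means for symlink resolution, here is a minimal sketch. It is not Pyrrha's implementation: `resolve_in_root` and `demo` are hypothetical helpers written for this page, and the sketch only follows the last path component (symlinked intermediate directories are ignored).

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def resolve_in_root(root: Path, link: Path, max_hops: int = 40) -> Path | None:
    """Follow `link` as if `root` were `/`; return the final path or None.

    Hypothetical helper for illustration only, not Pyrrha's code.
    """
    current = link
    for _ in range(max_hops):
        if not current.is_symlink():
            return current if current.exists() else None
        target = os.readlink(current)
        if os.path.isabs(target):
            # Absolute targets are re-rooted under `root`, not under the host's `/`.
            current = root / target.lstrip("/")
        else:
            # Relative targets are resolved from the directory holding the link.
            current = Path(os.path.normpath(current.parent / target))
    return None  # too many hops: probably a symlink loop


def demo() -> dict[str, str | None]:
    """Build a tiny fake root filesystem and resolve three symlinks in it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "bin").mkdir()
        (root / "bin" / "busybox").write_text("fake binary")
        os.symlink("/bin/busybox", root / "bin" / "sh")        # absolute target
        os.symlink("busybox", root / "bin" / "ash")            # relative target
        os.symlink("/bin/missing", root / "bin" / "dangling")  # unresolvable
        results = {}
        for name in ("sh", "ash", "dangling"):
            resolved = resolve_in_root(root, root / "bin" / name)
            results[name] = resolved.relative_to(root).as_posix() if resolved else None
        return results


if __name__ == "__main__":
    print(demo())
```

An absolute target such as `/bin/busybox` is looked up inside the extracted tree rather than on the host, which is why the whole filesystem should be present under ROOT_DIRECTORY.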

You can also export your Pyrrha results as a JSON file (option -e/--json) to postprocess them. For example, you can diff the results between two versions of the same system to list the binaries added or removed and the symbols added or removed (see the example script examples/diffing_pyrrha_export.py).
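
As a small postprocessing sketch, the snippet below builds a reverse dependency map (which binaries import a given library) from an export. It assumes the JSON layout used by the diffing script later on this page (`binaries` and `symbols` keyed by id, with `imports.lib.ids` on each binary); the `SAMPLE` data and the `importers_by_library` helper are made up for illustration, and real exports may contain more fields.

```python
from __future__ import annotations

import json
from collections import defaultdict


def importers_by_library(export: dict) -> dict[str, list[str]]:
    """Map each imported library name to the sorted paths of the binaries importing it."""
    binaries = export["binaries"]
    result = defaultdict(list)
    for binary in binaries.values():
        for lib_id in binary["imports"]["lib"]["ids"]:
            # Ids are integers in the import lists but strings as JSON object keys.
            lib_name = binaries[str(lib_id)]["name"]
            result[lib_name].append(binary["path"])
    return {lib: sorted(paths) for lib, paths in result.items()}


# Toy data standing in for a real export (assumed layout, hypothetical content).
SAMPLE = json.loads("""
{
  "binaries": {
    "1": {"name": "libc.so", "path": "/lib/libc.so",
          "imports": {"lib": {"ids": []}, "symbols": {"ids": []}}},
    "2": {"name": "busybox", "path": "/bin/busybox",
          "imports": {"lib": {"ids": [1]}, "symbols": {"ids": [10]}}},
    "3": {"name": "uci", "path": "/sbin/uci",
          "imports": {"lib": {"ids": [1]}, "symbols": {"ids": [10]}}}
  },
  "symbols": {"10": {"name": "printf"}}
}
""")

if __name__ == "__main__":
    for lib, users in importers_by_library(SAMPLE).items():
        print(f"{lib}: {', '.join(users)}")
```

With a real export, replace `SAMPLE` with the result of `json.load` on your own export file.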

Visualization with Sourcetrail

Open the resulting project with Sourcetrail. You can now explore the resulting map. The user interface is described in depth in the Sourcetrail documentation.

To match the Sourcetrail language, the binaries, the exported functions and symbols, and the symlinks are represented as follows in Sourcetrail.

Binaries Exported functions Exported symbols Symlinks

An example of the symbols and libraries imported by libgcc_s.so.1 and of the symbols which reference this library.

An example of the symlinks which point to busybox.

Do not hesitate to take a look at the Sourcetrail documentation to explore all the possibilities Sourcetrail offers. Custom Trails can be really useful in many cases.

Demo

A live demo of how Sourcetrail can be used to visualize this mapper's results is available here.

Quick Start—Usage Example

Let's take the example of an OpenWRT firmware, a common Linux distribution for embedded targets like routers.

First, download the firmware and extract its root filesystem into a directory. Here we download OpenWRT 22.03.5 for generic x86_64 systems.

$ wget https://downloads.openwrt.org/releases/22.03.5/targets/x86/64/openwrt-22.03.5-x86-64-rootfs.tar.gz -O openwrt_rootfs.tar.gz
$ mkdir openwrt_root_fs && cd openwrt_root_fs
$ tar -xf ../openwrt_rootfs.tar.gz
$ cd .. && rm openwrt_rootfs.tar.gz

Then we can run Pyrrha on it. It will produce logs indicating which symlinks or imports the tool cannot resolve directly. (Do not forget to activate your virtualenv if you created one for the Pyrrha installation.)

$ pyrrha fs --db openwrt_db openwrt_root_fs
$ ls 
openwrt_root_fs openwrt_db.srctrldb  openwrt_db.srctrlprj

You can now explore the resulting map with Sourcetrail.

$ sourcetrail openwrt_db.srctrlprj

Pyrrha result opened with Sourcetrail.

Postprocessing fs result: the diffing example

When you have to compare two sets of executable files, for example two versions of the same firmware, it can quickly become difficult to decide where to start and to get results in a short time.

Diffing could be a solution. However, as binary diffing can be quite time-consuming, a first approach is to diff the symbols contained in the binary files to determine which ones were added or removed. For example, this technique can help you quickly distinguish the files whose internal structure has changed from the files that only received small updates to their dependencies. To do that, you can use the JSON export of the fs parser results.

The following script prints to standard output the list of files that have been added or removed, then the symbol changes file by file.

examples/diffing_pyrrha_export.py
#!/usr/bin/env python3
"""This script diffs two Pyrrha result JSON exports.
Kernel version numbers in module paths are masked before listing added/removed binaries."""

import argparse
import json
from pathlib import Path
import re


def existing_file(raw_path: str) -> Path:
    """
    Check that a given path corresponds to an existing file and, if so,
    create the corresponding pathlib.Path object.
    :param raw_path: the given path (as a string)
    :return: the corresponding pathlib.Path object
    :raises argparse.ArgumentTypeError: if the path does not exist or is not a file
    """
    if not Path(raw_path).exists():
        raise argparse.ArgumentTypeError('"{}" does not exist'.format(raw_path))
    elif not Path(raw_path).is_file():
        raise argparse.ArgumentTypeError('"{}" is not a file'.format(raw_path))
    return Path(raw_path)


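class PyrrhaDump:
    """In-memory view of a Pyrrha JSON export, indexed by symbol name and binary path."""

    def __init__(self, file: Path):
        with file.open() as f:  # close the file handle once the JSON is loaded
            self.data = json.load(f)
        self.sym_by_name = {x['name']: x for x in self.data['symbols'].values()}
        self.bin_by_path = {x['path']: x for x in self.data['binaries'].values()}

    def to_symbol_str(self, symbol_list: list[int]) -> set[str]:
        """Convert a list of symbol ids into the set of their names."""
        return {self.data['symbols'][str(x)]['name'] for x in symbol_list}

    def to_binary_str(self, binary_list: list[int]) -> set[str]:
        """Convert a list of binary ids into the set of their names."""
        return {self.data['binaries'][str(x)]['name'] for x in binary_list}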


def main():
    # parse command line to retrieve the paths of the two JSON exports
    parser = argparse.ArgumentParser()
    parser.add_argument('json1', type=existing_file, help='Path to first JSON.')
    parser.add_argument('json2', type=existing_file, help='Path to second JSON.')
    args = parser.parse_args()

    pyrrha1 = PyrrhaDump(args.json1)
    pyrrha2 = PyrrhaDump(args.json2)

    # Original paths, used to look up the binaries present in both exports
    set1o = set(pyrrha1.bin_by_path)
    set2o = set(pyrrha2.bin_by_path)
    # Paths with the kernel version masked, so that a kernel upgrade alone
    # does not make every kernel module appear as added/removed
    set1 = {re.sub(r"/lib/modules/\d+\.\d+\.\d+", "/[KERNEL_VERSION]", s) for s in pyrrha1.bin_by_path}
    set2 = {re.sub(r"/lib/modules/\d+\.\d+\.\d+", "/[KERNEL_VERSION]", s) for s in pyrrha2.bin_by_path}

    print(f"Binaries no longer in {args.json2}:")
    for b in set1 - set2:
        print(f"  - {b}")

    print(f"\nBinaries added in {args.json2}:")
    for b in set2 - set1:
        print(f"  - {b}")

    print("\nCommon binaries that have changed:")
    count = 0
    for b1, b2 in ((pyrrha1.bin_by_path[x], pyrrha2.bin_by_path[x]) for x in set1o.intersection(set2o)):
        libs1 = pyrrha1.to_binary_str(b1['imports']['lib']['ids'])
        libs2 = pyrrha2.to_binary_str(b2['imports']['lib']['ids'])
        is_different = False
        if libs1 != libs2:
            count += 1
            print(f"{b1['name']} has changed:")
            is_different = True
            if r := libs1 - libs2:
                print(f"  - lib removed: {r}")
            if r := libs2 - libs1:
                print(f"  - lib added: {r}")

        syms1 = pyrrha1.to_symbol_str(b1['imports']['symbols']['ids'])
        syms2 = pyrrha2.to_symbol_str(b2['imports']['symbols']['ids'])
        if syms1 != syms2:
            if not is_different:
                count += 1
                print(f"{b1['name']} has changed:")
            if r := syms1 - syms2:
                print(f"  - symbols removed: {r}")
            if r := syms2 - syms1:
                print(f"  - symbols added: {r}")
    print(f"Total number of changed binaries: {count}")


if __name__ == "__main__":
    main()
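
The kernel-version masking used by the script can be checked in isolation. The sketch below reuses the same substitution on made-up paths (the file names and version numbers are hypothetical) and shows why it matters: without masking, a kernel upgrade makes every module look removed and re-added.

```python
import re

# Same pattern and replacement as in the script above.
KERNEL_DIR = re.compile(r"/lib/modules/\d+\.\d+\.\d+")


def mask_kernel_version(path: str) -> str:
    """Replace the versioned kernel module directory with a fixed placeholder."""
    return KERNEL_DIR.sub("/[KERNEL_VERSION]", path)


# Hypothetical binary paths from two firmware versions.
old_paths = {"/lib/modules/5.10.176/nf_nat.ko", "/bin/busybox"}
new_paths = {"/lib/modules/5.15.1/nf_nat.ko", "/bin/busybox", "/usr/bin/new_tool"}

# Naive diff: the module is reported as removed because its directory changed.
naive_removed = old_paths - new_paths

# Masked diff: only genuinely new or missing files remain.
masked_old = {mask_kernel_version(p) for p in old_paths}
masked_new = {mask_kernel_version(p) for p in new_paths}
removed = masked_old - masked_new
added = masked_new - masked_old

if __name__ == "__main__":
    print("naive removed:", sorted(naive_removed))
    print("removed:", sorted(removed))
    print("added:", sorted(added))
```

Note that the pattern only matches three dot-separated numbers, so any suffix after the version (if a kernel uses one) would remain in the masked path.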