
fs: ELF/PE imports/exports and the associated symlinks

Demo

A live demo of this mapper, showing how NumbatUI can be used to visualize its results, is available here.

Usage

Mapping with Pyrrha

First, create your db with pyrrha. ROOT_DIRECTORY must contain the whole filesystem you want to map, already extracted or mounted. Pyrrha treats ROOT_DIRECTORY as the filesystem root when resolving symlinks.

Usage: pyrrha fs [OPTIONS] ROOT_DIRECTORY

  Map a filesystem into a numbatui-compatible db. It maps ELF and PE files, their imports and their exports plus
  the symlinks that points on these executable files.

Options:
  -d, --debug     Set log level to DEBUG
  --db PATH       NumbatUI DB file path (.srctrldb).  [default: pyrrha.srctrldb]
  -e, --json      Create a JSON export of the resulting mapping.
  -j, --jobs INT  Number of parallel jobs created (threads).  [default: 1; 1<=x<=16]
  --ignore        When resolving duplicate imports, ignore them
  --arbitrary     When resolving duplicate imports, select the first one available
  --interactive   When resolving duplicate imports, user manually select which one to use
  -h, --help      Show this message and exit.
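
Because ROOT_DIRECTORY is treated as the filesystem root for symlink resolution, an absolute symlink target such as /bin/busybox has to be looked up under ROOT_DIRECTORY rather than on the host. The sketch below illustrates that idea only; it is not Pyrrha's implementation, and the function name `resolve_in_root` is hypothetical. It follows the final path component only and does not handle symlinked intermediate directories.

```python
import os
from pathlib import Path


def resolve_in_root(root: Path, link: Path, max_hops: int = 40) -> Path:
    """Follow `link` hop by hop as if `root` were "/" (hypothetical helper)."""
    root = Path(os.path.abspath(root))
    current = Path(os.path.abspath(link))
    for _ in range(max_hops):
        if not current.is_symlink():
            return current
        target = os.readlink(current)
        if os.path.isabs(target):
            # Absolute target: re-anchor it under the extracted root.
            candidate = root / target.lstrip("/")
        else:
            # Relative target: interpret it from the link's own directory.
            candidate = current.parent / target
        # Collapse "." and ".." without touching the host filesystem.
        current = Path(os.path.normpath(candidate))
    raise RuntimeError(f"too many symlink hops starting at {link}")
```

With an extracted firmware in openwrt_root_fs, calling `resolve_in_root(Path("openwrt_root_fs"), Path("openwrt_root_fs/bin/sh"))` would return a path inside openwrt_root_fs even when the link target is written as an absolute path.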

You can also export your Pyrrha results as a JSON file (option -e/--json) to postprocess them. For example, you can diff the results between two versions of the same system to list the binaries that were added or removed and the symbols that have been added or removed (cf. the example script in example).

Visualization with NumbatUI

Open the resulting project with numbatui. You can now navigate the resulting cartography. The user interface is described in depth in the NumbatUI documentation.

  • Symbols and libraries imported by libgcc_s.so.1.

  • Symlinks pointing to busybox.

Do not hesitate to take a look at the NumbatUI documentation to explore all the possibilities offered by Sourcetrail. Custom Trails can be really useful in many cases.

Sourcetrail Representation

If you are visualizing results with Sourcetrail, the binaries, the exported functions and symbols, and the symlinks are represented as follows:

Binaries Exported functions Exported symbols Symlinks

Quick Start—Usage Example

Let's take the example of an OpenWRT firmware, OpenWRT being a common Linux distribution for embedded targets such as routers.

First, download the firmware and extract its root filesystem into a directory. Here we download OpenWRT 22.03.5 for generic x86_64 systems.

wget https://downloads.openwrt.org/releases/22.03.5/targets/x86/64/openwrt-22.03.5-x86-64-rootfs.tar.gz -O openwrt_rootfs.tar.gz
mkdir openwrt_root_fs && cd openwrt_root_fs
tar -xf ../openwrt_rootfs.tar.gz
cd .. && rm openwrt_rootfs.tar.gz

Then we can run Pyrrha on it. It will produce some logs indicating which symlinks or imports cannot be resolved directly by the tool. (Do not forget to activate your virtualenv if you created one for the Pyrrha installation.)

pyrrha fs --db openwrt_db openwrt_root_fs -j $(nproc)
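
The --jobs help text lists the range 1<=x<=16, so on a machine with more than 16 cores `$(nproc)` falls outside it. Whether Pyrrha rejects or clamps such a value is not stated here, so a cautious wrapper can clamp the job count before building the command. The helper names below (`clamp_jobs`, `build_command`) are hypothetical; only the `pyrrha fs` options shown in the help text are used.

```python
import os
from typing import List, Optional

MIN_JOBS = 1   # lower bound shown in the --jobs help text
MAX_JOBS = 16  # upper bound shown in the --jobs help text


def clamp_jobs(requested: Optional[int] = None) -> int:
    """Return a job count inside the documented range (default: CPU count)."""
    if requested is None:
        requested = os.cpu_count() or MIN_JOBS
    return max(MIN_JOBS, min(MAX_JOBS, requested))


def build_command(root_dir: str, db: str, jobs: Optional[int] = None) -> List[str]:
    """Build the `pyrrha fs` argument list with a clamped -j value."""
    return ["pyrrha", "fs", "--db", db, "-j", str(clamp_jobs(jobs)), root_dir]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains openwrt_root_fs.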

You can now navigate the resulting cartography with NumbatUI.

numbatUI openwrt_db.srctrlprj

Pyrrha result opened with NumbatUI.

Postprocessing fs result: the diffing example

When you have to compare two sets of executable files, for example two versions of the same firmware, it can quickly become difficult to determine where to start and to obtain results in a short time.

Diffing can be a solution. However, as binary diffing can be quite time-consuming, a first approach is to diff the symbols contained in the binary files to determine which ones were added or removed. For example, this technique can help you quickly distinguish the files whose internal structure has changed from the files that only received a minor update of their dependencies. To do that, you can use the JSON export of the fs parser results.

The following script prints to standard output the list of files that have been added or removed, and then the symbol changes file by file.

Diffing Pyrrha Exports
#!/usr/bin/env python3
"""Diff two Pyrrha result JSON exports. It removes the kernel mangling."""

import argparse
from pathlib import Path

from pyrrha_mapper import FileSystem


def existing_file(raw_path: str) -> Path:
    """Check that a path corresponds to an existing file and convert it into a pathlib.Path object.

    :param raw_path: the given path (as a string)
    :return: the corresponding pathlib.Path object
    """
    if not Path(raw_path).exists():
        raise argparse.ArgumentTypeError('"{}" does not exist'.format(raw_path))
    elif not Path(raw_path).is_file():
        raise argparse.ArgumentTypeError('"{}" is not a file'.format(raw_path))
    return Path(raw_path)


def main():
    """Diff two exports of `fs` result."""
    parser = argparse.ArgumentParser()
    parser.add_argument("json1", type=existing_file, help="Path to old filesystem JSON.")
    parser.add_argument("json2", type=existing_file, help="Path to new filesystem JSON.")
    args = parser.parse_args()

    old_fs = FileSystem.from_json_export(args.json1)
    new_fs = FileSystem.from_json_export(args.json2)

    # Compute and display changes of binaries
    old_bins = {b.path for b in old_fs.iter_binaries()}
    new_bins = {b.path for b in new_fs.iter_binaries()}
    added_bin = new_bins - old_bins
    removed_bin = old_bins - new_bins
    # `kind` avoids shadowing the built-in `type`
    for kind, bin_set in [("no longer", removed_bin), ("added", added_bin)]:
        print(f"\nBinaries {kind} in {args.json2}:")
        for b in bin_set:
            print(f"\t- {b}")

    print("\nCommon binaries that have changed:")
    count = 0
    for b1, b2 in (
        (old_fs.get_binary_by_path(path), new_fs.get_binary_by_path(path))
        for path in old_bins.intersection(new_bins)
    ):
        is_different = False
        old_libs, new_libs = set(b1.imported_library_names), set(b2.imported_library_names)
        if old_libs != new_libs:
            count += 1
            print(f"{b1.name} has changed:")
            is_different = True
            for kind, lib_set in [("removed", old_libs - new_libs), ("added", new_libs - old_libs)]:
                for lib in lib_set:
                    print(f"\t- lib {kind}: {lib}")

        old_symbs, new_symbs = set(b1.imported_symbol_names), set(b2.imported_symbol_names)
        if old_symbs != new_symbs:
            if not is_different:
                count += 1
                print(f"{b1.name} has changed:")
            is_different = True
            for kind, symb_set in [
                ("removed", old_symbs - new_symbs),
                ("added", new_symbs - old_symbs),
            ]:
                for symb in symb_set:
                    print(f"\t- imported symbol {kind}: {symb}")
    print(f"Total having changed: {count}")


if __name__ == "__main__":
    main()
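
The comparison logic of the script above boils down to set differences, which can be tried out without Pyrrha installed. The sketch below works on plain dictionaries mapping a binary path to its set of imported symbol names; that shape is an assumption made for illustration and is not the schema of the Pyrrha JSON export, and `diff_imports` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

# Hypothetical shape: binary path -> names of the symbols it imports.
Imports = Dict[str, Set[str]]


def diff_imports(
    old: Imports, new: Imports
) -> Tuple[Set[str], Set[str], Dict[str, Tuple[Set[str], Set[str]]]]:
    """Return (added binaries, removed binaries, per-binary symbol changes)."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    changed = {}
    for path in set(old) & set(new):
        gone = old[path] - new[path]   # symbols no longer imported
        fresh = new[path] - old[path]  # symbols newly imported
        if gone or fresh:
            changed[path] = (gone, fresh)
    return added, removed, changed
```

Binaries present in both versions with identical import sets do not appear in the third result, which keeps the report focused on the files worth a closer look.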