fs: ELF/PE imports/exports and the associated symlinks
Usage
Mapping with Pyrrha
First, create your db with pyrrha. The ROOT_DIRECTORY should contain the whole filesystem you want to map; it must already be extracted or mounted. Pyrrha treats ROOT_DIRECTORY as the filesystem root for all symlink resolutions.
Usage: pyrrha fs [OPTIONS] ROOT_DIRECTORY

  Map a filesystem into a sourcetrail-compatible db. It maps ELF and PE files, their imports and their
  exports plus the symlinks that points on these executable files.

Options:
  -d, --debug     Set log level to DEBUG
  --db PATH       Sourcetrail DB file path (.srctrldb). [default: pyrrha.srctrldb]
  -e, --json      Create a JSON export of the resulting mapping.
  -j, --jobs INT  Number of parallel jobs created (threads). [default: 1; 1<=x<=16]
  -h, --help      Show this message and exit.
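To illustrate what treating ROOT_DIRECTORY as the filesystem root means for symlinks, here is a minimal sketch. It is not Pyrrha's actual implementation: the function name `resolve_in_root` is hypothetical, and only the final path component is followed (symlinked intermediate directories are not handled). The point is that an absolute link target such as /usr/bin/busybox is looked up under ROOT_DIRECTORY instead of on the host.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_in_root(root: Path, link: Path, max_hops: int = 40) -> Optional[Path]:
    """Follow `link` (a path located under `root`) as if `root` were '/'.

    Hypothetical helper for illustration only. Returns the final path under
    `root`, or None if the chain is broken, loops, or leaves `root`.
    """
    root = Path(os.path.abspath(root))
    current = Path(os.path.abspath(link))
    for _ in range(max_hops):
        if current != root and root not in current.parents:
            return None  # the chain left the mapped filesystem
        if not current.is_symlink():
            return current if current.exists() else None
        target = Path(os.readlink(current))
        if target.is_absolute():
            # Absolute target: re-anchor it on the mapped root, not the host '/'.
            current = root / target.relative_to(target.anchor)
        else:
            # Relative target: interpreted from the directory holding the link.
            current = current.parent / target
        current = Path(os.path.normpath(current))
    return None  # too many hops: probably a symlink loop
```

With a root filesystem extracted in `rootfs`, a link `rootfs/bin/sh` pointing to `/usr/bin/busybox` would resolve to `rootfs/usr/bin/busybox`, regardless of what exists at `/usr/bin/busybox` on the analysis machine.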
You can also export your Pyrrha results as a JSON file (option -e/--json) so that you can postprocess them. For example, you can diff the results between two versions of the same system and list the binaries added/removed and which symbols have been added/removed (cf. the example script in example).
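As a starting point for this kind of postprocessing, the sketch below summarizes an export. It assumes the layout used by the example diffing script in this document: top-level `binaries` and `symbols` maps keyed by identifier, each binary carrying `name`, `path` and `imports` with `lib` and `symbols` id lists. The function names and the sample data are made up for illustration, so check them against an export produced by your Pyrrha version.

```python
import json
from pathlib import Path


def load_export(path: Path) -> dict:
    """Read a JSON export from disk."""
    with path.open() as f:
        return json.load(f)


def summarize_export(data: dict) -> dict[str, dict[str, list[str]]]:
    """Map each binary path to the names of its imported libraries and symbols."""
    binaries, symbols = data["binaries"], data["symbols"]
    summary = {}
    for binary in binaries.values():
        imports = binary["imports"]
        summary[binary["path"]] = {
            # ids are integers in the lists but strings as JSON object keys
            "libs": sorted(binaries[str(i)]["name"] for i in imports["lib"]["ids"]),
            "symbols": sorted(symbols[str(i)]["name"] for i in imports["symbols"]["ids"]),
        }
    return summary


# Tiny hand-written sample (NOT real Pyrrha output), only to exercise the function.
sample = {
    "symbols": {"10": {"name": "printf"}},
    "binaries": {
        "1": {"name": "libc.so", "path": "/lib/libc.so",
              "imports": {"lib": {"ids": []}, "symbols": {"ids": []}}},
        "2": {"name": "busybox", "path": "/bin/busybox",
              "imports": {"lib": {"ids": [1]}, "symbols": {"ids": [10]}}},
    },
}

for path, imports in sorted(summarize_export(sample).items()):
    print(path, imports["libs"], imports["symbols"])
```

On a real export, replace `sample` with `load_export(Path(...))` pointing at the JSON file Pyrrha wrote.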
Visualization with Sourcetrail
Open the resulting project with sourcetrail. You can now navigate the resulting cartography. The user interface is described in depth in the Sourcetrail documentation.
To match the Sourcetrail language, the binaries, the exported functions and symbols, and the symlinks are represented as follows in Sourcetrail.
Binaries | Exported functions | Exported symbols | Symlinks |
---|---|---|---|
Do not hesitate to take a look at Sourcetrail documentation to explore all the possibilities offered by Sourcetrail. Custom Trails could be really useful in a lot of cases.
Demo
A live demo of how Sourcetrail can be used to visualize this mapper's results is available here.
Quick Start—Usage Example
Let's take the example of an OpenWRT firmware, a common Linux distribution for embedded targets such as routers.
First, download the firmware and extract its root filesystem into a directory. Here we download the OpenWRT 22.03.5 release for generic x86_64 systems.
$ wget https://downloads.openwrt.org/releases/22.03.5/targets/x86/64/openwrt-22.03.5-x86-64-rootfs.tar.gz -O openwrt_rootfs.tar.gz
$ mkdir openwrt_root_fs && cd openwrt_root_fs
$ tar -xf ../openwrt_rootfs.tar.gz
$ cd .. && rm openwrt_rootfs.tar.gz
Then we can run Pyrrha on it. It will produce some logs indicating which symlinks or imports the tool cannot resolve directly. (Do not forget to activate your virtualenv if you created one for the Pyrrha installation.)
$ pyrrha fs --db openwrt_db openwrt_root_fs
$ ls
openwrt_root_fs openwrt_db.srctrldb openwrt_db.srctrlprj
You can now navigate the resulting cartography with Sourcetrail.
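If you want to repeat this step for several extracted root filesystems (for instance two firmware versions you plan to compare), the invocation can be scripted. The sketch below only builds and runs the command line from the options listed in the help output; the helper names are hypothetical, and it assumes that `pyrrha` is available on your PATH.

```python
import subprocess
from pathlib import Path


def pyrrha_fs_command(root: Path, db: str, jobs: int = 1, json_export: bool = False) -> list[str]:
    """Build a `pyrrha fs` command line from the documented options."""
    if not 1 <= jobs <= 16:  # range given in the --jobs help text
        raise ValueError("jobs must be between 1 and 16")
    cmd = ["pyrrha", "fs", "--db", db, "--jobs", str(jobs)]
    if json_export:
        cmd.append("--json")
    cmd.append(str(root))
    return cmd


def map_all(roots: dict[str, Path]) -> None:
    """Run Pyrrha once per (db name, root directory) pair; stops on the first failure."""
    for db, root in roots.items():
        subprocess.run(pyrrha_fs_command(root, db, jobs=4, json_export=True), check=True)


print(pyrrha_fs_command(Path("openwrt_root_fs"), "openwrt_db", jobs=4, json_export=True))
```

Calling `map_all` with one entry per firmware version produces one database per version, each with its JSON export.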
Postprocessing fs result: the diffing example
When you have to compare two sets of executable files, for example two versions of the same firmware, it can quickly become difficult to decide where to start and to get results in a short time.
Diffing can be a solution. However, as binary diffing can be quite time-consuming, a first approach is to diff the symbols contained in the binary files to determine which ones were added or removed. For example, this technique can help you quickly distinguish the files whose internal structure has changed from the files that only received a small update of their dependencies. To do that, you can use the JSON export of the fs parser results.
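The core of this approach is a plain set difference on symbol names. A minimal sketch, using hand-written symbol names rather than a real export (the function name is hypothetical):

```python
def diff_symbols(old: set[str], new: set[str]) -> dict[str, set[str]]:
    """Return the symbols found only in `new` (added) or only in `old` (removed)."""
    return {"added": new - old, "removed": old - new}


# Imported symbols of the same binary in two firmware versions (invented values).
old_imports = {"strcpy", "printf", "malloc"}
new_imports = {"strncpy", "printf", "malloc"}

changes = diff_symbols(old_imports, new_imports)
print("added:", sorted(changes["added"]))
print("removed:", sorted(changes["removed"]))
```

A binary for which both sets are empty has kept the same imports between the two versions, so it is a lower priority for a full binary diff.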
The following script prints to standard output the list of files that have been added/removed, then the symbol changes file by file.
examples/diffing_pyrrha_export.py
#!/usr/bin/env python3
"""This script diffs two Pyrrha result JSON exports.

It masks the kernel version found in kernel module paths."""
import argparse
import json
import re
from pathlib import Path


def existing_file(raw_path: str) -> Path:
    """
    Check that a given path corresponds to an existing file and, if so,
    create the corresponding pathlib.Path object.

    :param raw_path: the given path (as a string)
    :return: the corresponding pathlib.Path object
    """
    if not Path(raw_path).exists():
        raise argparse.ArgumentTypeError('"{}" does not exist'.format(raw_path))
    elif not Path(raw_path).is_file():
        raise argparse.ArgumentTypeError('"{}" is not a file'.format(raw_path))
    return Path(raw_path)


class PyrrhaDump():
    def __init__(self, file: Path):
        with file.open() as f:
            self.data = json.load(f)
        # indexes to look up symbols by name and binaries by path
        self.sym_by_name = {x['name']: x for x in self.data['symbols'].values()}
        self.bin_by_path = {x['path']: x for x in self.data['binaries'].values()}

    def to_symbol_str(self, symbol_list: list[int]) -> set[str]:
        """Convert a list of symbol ids into the set of their names."""
        return {self.data['symbols'][str(x)]['name'] for x in symbol_list}

    def to_binary_str(self, binary_list: list[int]) -> set[str]:
        """Convert a list of binary ids into the set of their names."""
        return {self.data['binaries'][str(x)]['name'] for x in binary_list}


def main():
    # parse command line to retrieve the paths of the two JSON exports
    parser = argparse.ArgumentParser()
    parser.add_argument('json1', type=existing_file, help='Path to first JSON.')
    parser.add_argument('json2', type=existing_file, help='Path to second JSON.')
    args = parser.parse_args()

    pyrrha1 = PyrrhaDump(args.json1)
    pyrrha2 = PyrrhaDump(args.json2)

    # original paths, as stored in the exports
    set1o = set(pyrrha1.bin_by_path)
    set2o = set(pyrrha2.bin_by_path)
    # same paths with the kernel version masked (raw strings avoid invalid escapes)
    set1 = set(re.sub(r"/lib/modules/\d+\.\d+\.\d+", "/[KERNEL_VERSION]", s) for s in pyrrha1.bin_by_path)
    set2 = set(re.sub(r"/lib/modules/\d+\.\d+\.\d+", "/[KERNEL_VERSION]", s) for s in pyrrha2.bin_by_path)

    print(f"Binaries no longer in {args.json2}:")
    for b in set1 - set2:
        print(f" - {b}")

    print(f"\nBinaries added in {args.json2}:")
    for b in set2 - set1:
        print(f" - {b}")

    print("\nCommon binaries that have changed:")
    count = 0
    for b1, b2 in ((pyrrha1.bin_by_path[x], pyrrha2.bin_by_path[x]) for x in set1o.intersection(set2o)):
        # compare imported libraries
        libs1 = pyrrha1.to_binary_str(b1['imports']['lib']['ids'])
        libs2 = pyrrha2.to_binary_str(b2['imports']['lib']['ids'])
        is_different = False
        if libs1 != libs2:
            count += 1
            print(f"{b1['name']} has changed:")
            is_different = True
            if r := libs1 - libs2:
                print(f" - lib removed: {r}")
            if r := libs2 - libs1:
                print(f" - lib added: {r}")

        # compare imported symbols
        syms1 = pyrrha1.to_symbol_str(b1['imports']['symbols']['ids'])
        syms2 = pyrrha2.to_symbol_str(b2['imports']['symbols']['ids'])
        if syms1 != syms2:
            if not is_different:  # count and announce each binary only once
                count += 1
                print(f"{b1['name']} has changed:")
            if r := syms1 - syms2:
                print(f" - symbols removed: {r}")
            if r := syms2 - syms1:
                print(f" - symbols added: {r}")
    print(f"Total having changed: {count}")


if __name__ == "__main__":
    main()
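One point worth noting about the script above: added and removed binaries are computed on paths with the kernel version masked, whereas the common binaries are looked up by their original paths. A module living under /lib/modules/<version>/ is therefore not compared when the kernel version differs between the two exports. The sketch below is a possible extension, not part of the original script: it shows the effect of the substitution and one way to pair such modules. The helper names and module paths are invented for illustration.

```python
import re

KERNEL_DIR = re.compile(r"/lib/modules/\d+\.\d+\.\d+")


def mask_kernel_version(path: str) -> str:
    """Same substitution as the script: hide the versioned module directory."""
    return KERNEL_DIR.sub("/[KERNEL_VERSION]", path)


def pair_common_paths(paths1, paths2):
    """Pair original paths from two exports that are equal once masked."""
    by_masked1 = {mask_kernel_version(p): p for p in paths1}
    by_masked2 = {mask_kernel_version(p): p for p in paths2}
    common = by_masked1.keys() & by_masked2.keys()
    return sorted((by_masked1[m], by_masked2[m]) for m in common)


old_paths = ["/bin/busybox", "/lib/modules/5.10.176/nf_nat.ko"]
new_paths = ["/bin/busybox", "/lib/modules/5.10.201/nf_nat.ko"]

print(mask_kernel_version(old_paths[1]))
for p1, p2 in pair_common_paths(old_paths, new_paths):
    print(p1, "<->", p2)
```

Each resulting pair can then be looked up with `pyrrha1.bin_by_path[p1]` and `pyrrha2.bin_by_path[p2]` in place of the intersection of the original path sets.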