debuginfod server

Related issue: Provide debuginfod server

Debusine builds and hosts Debian repositories but currently has no way to serve debug symbols to developers. When a crash occurs, a developer must manually find and install the correct -dbgsym package for the exact binary they are debugging. This blueprint proposes bringing debuginfod server functionality to Debusine, allowing gdb(1) to automatically fetch debug symbols from a Debusine archive without any manual steps.

Goals

  • Extract .debug ELF files from -dbgsym packages during the Sbuild task and store them as discrete artifacts in Debusine.

  • Serve those artifacts through debuginfod-compatible HTTP endpoints (/buildid/<id>/debuginfo), scoped per archive.

  • Document how to configure gdb(1) to use a Debusine archive as a debuginfod source, so developers can get automatic symbol resolution with a single DEBUGINFOD_URLS environment variable.

Requirements

  • The extraction pipeline must run inside the isolated sbuild worker so that a malformed or malicious ELF file cannot affect the web server or other builds.

  • Each extracted .debug file must be stored as a Debusine artifact indexed by its build-ID for fast HTTP lookups.

  • Debug symbol artifacts must be published into a debian:suite alongside their parent debian:binary-package artifacts, via the existing RELATES_TO relation and CopyCollectionItems mechanism.

  • debuginfod URLs must be resolvable from the root of each archive, so that a single DEBUGINFOD_URLS entry covers all suites in the archive.

  • The HTTP response must include the headers required by the debuginfod protocol specification: X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE.

  • Responses must support HTTP 206 Partial Content so that gdb(1) can fetch individual ELF sections without downloading the full file.

  • Two debian:debug-symbols artifacts in the same archive that share a build-ID must have identical file contents, analogous to the existing pool-file uniqueness constraints.

  • The debian:archive collection must expose a per-archive configuration option (similar to Launchpad’s build_debug_symbols) controlling whether debug symbols are extracted. When disabled, DEB_BUILD_OPTIONS=noautodbgsym is passed to sbuild to suppress -dbgsym package generation entirely.
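As a rough illustration of the last requirement, the following sketch shows one way the per-archive option could be translated into the build environment. The function name `sbuild_environment` and the idea of passing the environment as a dictionary are assumptions made for this example; the blueprint does not specify how the option reaches sbuild.

```python
def sbuild_environment(
    build_debug_symbols: bool, base_env: dict[str, str]
) -> dict[str, str]:
    """Return a copy of base_env, adding noautodbgsym when symbols are disabled.

    Hypothetical helper: the name and calling convention are illustrative only.
    """
    env = dict(base_env)
    if build_debug_symbols:
        # Nothing to change: -dbgsym packages are generated by default.
        return env
    # DEB_BUILD_OPTIONS is a space-separated list, so keep existing options.
    options = env.get("DEB_BUILD_OPTIONS", "").split()
    if "noautodbgsym" not in options:
        options.append("noautodbgsym")
    env["DEB_BUILD_OPTIONS"] = " ".join(options)
    return env
```

Appending to any existing DEB_BUILD_OPTIONS value, rather than overwriting it, keeps options such as nocheck intact.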

Out of scope

The following parts of the official debuginfod specification are excluded:

  • Source file serving (/buildid/<id>/source): the debuginfod(8) man page explicitly notes that, due to Debian and Ubuntu packaging policies, debuginfod cannot resolve source files for .deb and .ddeb packages. debuginfod.debian.net has the same limitation.

  • Executable serving (/buildid/<id>/executable): debian:binary-package stores the whole .deb as a single file rather than broken-out binaries. Extracting individual executables on-the-fly inside a request handler is not feasible; pre-extraction at build time would require significant additional storage and pipeline changes that are out of scope for the initial implementation.

  • Metrics endpoint (/metrics): this Prometheus statistics endpoint is designed for a standalone C++ daemon; Debusine’s existing application-level logging and system-wide monitoring tools are more appropriate.

  • Metadata search (/metadata): implementing a searchable JSON index for build-ID metadata requires a separate database design that is out of scope for the initial implementation.

  • IMA signatures (X-DEBUGINFOD-IMA-SIGNATURE): this response header carries per-file Integrity Measurement Architecture signatures used primarily for RPM packages and has no standard applicability to Debian .deb packages.

  • Upstream federation: automatically forwarding unresolved build-ID requests to upstream servers such as debuginfod.debian.net is excluded. Debusine’s primary use case is serving self-hosted, private, or localised archives where the operator controls the source.

  • DWZ supplement ingestion: -dbgsym.deb packages built with DWZ compression ship a shared supplement ELF file at ./usr/lib/debug/.dwz/<name>.debug that several regular .debug files reference via their .gnu_debugaltlink section. Ingesting these supplements is excluded from the initial implementation to keep the first iteration small. Until DWZ support is added, gdb(1) requests for the supplement’s build-ID return 404; the practical effect is that DWZ-using debug info renders with the alternate strings table missing, not that debugging fails entirely.

Background: dbgsym packages and ELF build-IDs

When Debian builds a binary package it strips debug information to keep the shipped binary small. For example, util-linux_2.40.2-1_amd64.deb contains the stripped binary while util-linux-dbgsym_2.40.2-1_amd64.deb contains the DWARF debug symbols for the exact same binary.

Inside the data.tar of every -dbgsym.deb the debug files follow a fixed path convention:

./usr/lib/debug/.build-id/XX/YYYY.debug

where XX is the first two hex characters of the 40-character build-ID and YYYY is the remaining 38 hex characters. The build-ID is assigned at link time and uniquely identifies a specific binary.
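The path convention above can be expressed as a small function. This is a minimal sketch, assuming a 40-character lowercase hex build-ID as described in this section; the name `debug_path_for_build_id` is made up for the example.

```python
import re

# A build-ID as described above: 40 lowercase hex characters.
BUILD_ID_RE = re.compile(r"[0-9a-f]{40}")


def debug_path_for_build_id(build_id: str) -> str:
    """Return the in-tar path of the .debug file for a build-ID."""
    if not BUILD_ID_RE.fullmatch(build_id):
        raise ValueError(f"not a 40-character lowercase hex build-ID: {build_id!r}")
    # XX is the first two characters, YYYY the remaining 38.
    return f"./usr/lib/debug/.build-id/{build_id[:2]}/{build_id[2:]}.debug"
```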

ELF classification uses three sections:

.note.gnu.build-id

Contains the raw build-ID bytes. Reading them sequentially and converting to lowercase hex yields the familiar 40-character string.

.debug_info / .gnu_debugdata

Presence of either section confirms that DWARF debug information is embedded, marking the file as a .debug artifact to be extracted.

.gnu_debugaltlink

Present when a package was built with DWZ compression. Contains the build-ID of a shared DWZ supplement file living at ./usr/lib/debug/.dwz/<name>.debug inside the same -dbgsym.deb. DWZ supplement ingestion is excluded from the initial implementation (see “Out of scope” above); only files under ./usr/lib/debug/.build-id/ are ingested.
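To make the .note.gnu.build-id description concrete, the sketch below decodes the build-ID from the raw bytes of a note section using only the standard library. It assumes the section contents have already been located (the implementation plan uses pyelftools for that) and that the ELF file is little-endian unless stated otherwise; the function name is hypothetical.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # note type used for GNU build-IDs


def build_id_from_note(section: bytes, little_endian: bool = True) -> Optional[str]:
    """Return the build-ID as lowercase hex, or None if no build-ID note is found."""
    header = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(section):
        # Each note starts with three 32-bit words: name size, desc size, type.
        namesz, descsz, note_type = struct.unpack_from(header, section, offset)
        offset += 12
        name = section[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = section[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # desc is padded to a 4-byte boundary
        if len(desc) != descsz:
            break  # truncated note; stop rather than return a partial ID
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None
```

A 20-byte descriptor yields the 40-character string used throughout this document.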

Implementation plan

debian:debug-symbols

New artifact category: debian:debug-symbols

A new artifact category debian:debug-symbols will be added as an enum value in debusine/artifacts/models.py. Each instance represents all .debug ELF files extracted from a single -dbgsym package. Files are stored as multiple entries within one artifact; each entry’s FileInArtifact.path is set to usr/lib/debug/.build-id/<XX>/<YYYY>.debug, where <XX> is the first two hex characters of the build-ID and <YYYY> is the remaining 38 characters — the same path the file occupies inside the -dbgsym.deb with the leading ./ stripped. Storing all debug files for one -dbgsym package in a single artifact avoids creating one artifact per debug file, which would place excessive load on the database during publishing and expiry.

The artifact data carries the following field:

| Field | Type | Purpose |
| --- | --- | --- |
| build_ids | list of 40-char hex strings | Index of all build-IDs contained in this artifact |

X-DEBUGINFOD-SIZE is derived from the stored file size at serve time and does not need to be persisted.
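The artifact data could be modelled roughly as follows. This is only a sketch using a standard-library dataclass; the class name `DebugSymbolsData` is an assumption, and Debusine's actual artifact data models may use a different base class and validation mechanism.

```python
import re
from dataclasses import dataclass, field

_BUILD_ID_RE = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class DebugSymbolsData:
    """Hypothetical data model for a debian:debug-symbols artifact."""

    build_ids: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything that is not a 40-character lowercase hex string.
        for build_id in self.build_ids:
            if not _BUILD_ID_RE.fullmatch(build_id):
                raise ValueError(f"invalid build-ID: {build_id!r}")
```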

Extraction pipeline in the Sbuild task

Debusine already opens .deb files in upload_artifact() to read control data and create debian:binary-package artifacts. The debug-symbol extraction follows the same pattern, added as two new helpers called from _upload_binary_packages():

_upload_debug_symbols(dbgsym_deb)

For each -dbgsym.deb in the build output:

  1. Open the data.tar archive.

  2. Iterate over every file whose path matches ./usr/lib/debug/.build-id/**/*.debug. Each such file becomes one entry in the resulting debian:debug-symbols artifact, with its build-ID joining the artifact’s build_ids list and its FileInArtifact.path set to usr/lib/debug/.build-id/<XX>/<YYYY>.debug (the same in-tar path with the leading ./ stripped). Files under ./usr/lib/debug/.dwz/ (DWZ supplements) are not ingested in the initial implementation; see “Out of scope” above.

  3. For each file, parse the ELF structure with pyelftools to locate the .note.gnu.build-id section and extract the build-ID. Also check for .debug_info / .gnu_debugdata to confirm it is a debug file.

  4. Verify that the in-tar path matches the convention ./usr/lib/debug/.build-id/<XX>/<YYYY>.debug, where <XX> and <YYYY> are derived from the build-ID extracted in step 3. If the path does not match, the task fails with an explanatory error. dh_strip’s make_debug function constructs this path deterministically from the build-ID and we are not aware of any tooling in Debian that constructs -dbgsym.deb packages by hand, so a mismatch indicates a malformed package and must not be silently ingested.

  5. Accumulate all such files into a single debian:debug-symbols artifact and upload it.
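Steps 2 to 4 above can be sketched with the standard tarfile module. The sketch assumes data.tar has already been extracted from the .deb and opened as a `tarfile.TarFile`, and it takes the build-ID reader as a callable so that the ELF parsing (done with pyelftools in the plan) stays out of the example. The names `iter_debug_files` and `read_build_id` are hypothetical.

```python
import io
import tarfile
from typing import Callable, Iterator

PREFIX = "./usr/lib/debug/.build-id/"


def iter_debug_files(
    data_tar: tarfile.TarFile, read_build_id: Callable[[bytes], str]
) -> Iterator[tuple[str, str, bytes]]:
    """Yield (build_id, artifact_path, content) for each .debug file in data_tar."""
    for member in data_tar:
        if not member.isfile():
            continue
        # Only files under .build-id/ are ingested; .dwz/ supplements are skipped.
        if not (member.name.startswith(PREFIX) and member.name.endswith(".debug")):
            continue
        content = data_tar.extractfile(member).read()
        build_id = read_build_id(content)
        # Step 4: the in-tar path must be derivable from the build-ID.
        expected = f"{PREFIX}{build_id[:2]}/{build_id[2:]}.debug"
        if member.name != expected:
            raise ValueError(
                f"{member.name} does not match build-ID {build_id}; "
                "refusing to ingest a malformed -dbgsym package"
            )
        # The artifact path is the in-tar path with the leading ./ stripped.
        yield build_id, member.name[2:], content
```

Raising on a mismatch, instead of skipping the file, mirrors the requirement that a malformed package fails the task rather than being silently ingested.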

_create_debug_symbol_relations(debug_artifact, binary_artifact)

Records a RELATES_TO relation from each debian:debug-symbols artifact to its parent debian:binary-package artifact.

Running extraction inside the sbuild worker confines the blast radius of any malformed or malicious ELF input to the isolated worker process and avoids re-fetching files from artifact storage, since all build output is already present on disk.

Collection specification changes

The specs in docs/reference/collections/specs/ must be updated to reflect the new artifact type:

  • debian:suite and debian:archive: a uniqueness constraint is added in both collection specifications, in the same shape as the existing pool-file constraints. Within either a single debian:suite or a single debian:archive, two debian:debug-symbols collection items that share a build-ID must refer to files with identical contents. The suite-level constraint allows the archive-level constraint to be relaxed when an obsolete suite is removed; the archive-level constraint prevents two suites in the same archive from disagreeing about the file for a given build-ID. The constraint text in the two specifications is essentially identical.

  • debian:suite: debian:debug-symbols becomes a valid item type. See the per-item data table under “Publishing into a suite” below.

  • debian:archive: a new boolean data field build_debug_symbols (default true) controls whether debug symbols are extracted for this archive.

Database index

A new Django migration adds a partial B-tree index on CollectionItem.data->>'build_id' conditioned on:

  • category = 'debian:debug-symbols'

  • child_type = 'a' (artifact item in Debusine’s CollectionItem model)

  • parent_category = 'debian:suite'

This pattern is taken directly from migration 0005, which adds a similar partial index for repository index path lookups. The partial condition keeps the index small by covering only debug-symbol rows in suite collections, excluding the much larger set of binary and source package rows that carry no build_id field.

Because URLs are anchored at the archive level, each build-ID must resolve to a single file content across the entire archive (not just within a single suite). This is enforced by the uniqueness constraints described in the collection specification changes above (one each in debian:suite and debian:archive), rather than by the index itself.

Publishing into a suite

When a binary package is published into a suite, package_publish must pull in the matching debug symbols alongside it. A new helper _add_debug_symbols() follows the RELATES_TO relation recorded during the Sbuild task to obtain the debian:debug-symbols artifact, then queues it for copying into the suite via CopyCollectionItems.

Inside DebianSuiteManager.do_add_artifact(), a new elif branch handles the DEBUG_SYMBOLS category and creates one CollectionItem row per build-ID contained in the artifact:

Per-item data for debian:debug-symbols items in a suite

| Field | Type | Source |
| --- | --- | --- |
| build_id | 40-char hex string | The build-ID this collection item represents |
| srcpkg_name | string | Mirrored from the parent debian:binary-package item |
| srcpkg_version | string | Mirrored from the parent debian:binary-package item |
| package | string | Mirrored from the parent debian:binary-package item |
| version | string | Mirrored from the parent debian:binary-package item |
| architecture | string | Mirrored from the parent debian:binary-package item |

The collection item is named debugsym:<build-id>. The “parent” binary-package item is the one reached via the RELATES_TO relation recorded by _create_debug_symbol_relations() during the Sbuild task.
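The fan-out from one artifact to one collection item per build-ID might look like the sketch below. Plain dictionaries stand in for CollectionItem rows, and the function name `debug_symbol_items` is hypothetical; only the item naming scheme and the field list come from the description above.

```python
# Fields copied from the parent debian:binary-package item.
MIRRORED_FIELDS = ("srcpkg_name", "srcpkg_version", "package", "version", "architecture")


def debug_symbol_items(build_ids: list[str], parent_data: dict[str, str]) -> list[dict]:
    """Return one {name, data} mapping per build-ID in the artifact."""
    items = []
    for build_id in build_ids:
        data = {"build_id": build_id}
        data.update({key: parent_data[key] for key in MIRRORED_FIELDS})
        items.append({"name": f"debugsym:{build_id}", "data": data})
    return items
```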

If the suite already contains a CollectionItem with the same (name, parent_collection), which occurs when a source package is built reproducibly more than once, the existing item’s file hash is compared against the incoming file’s hash. If the hashes match, the collision is logged and ignored; if they differ, an error is raised, since the same build-ID with different contents indicates a non-reproducible build, which is a toolchain problem.
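The collision rule can be summarised in a few lines. This sketch assumes hashes are available as hex strings and that a missing existing item is represented by None; the function name `should_add_item` and the use of ValueError are illustrative choices, not part of the design.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def should_add_item(
    name: str, existing_hash: Optional[str], incoming_hash: str
) -> bool:
    """Decide whether a debug-symbols item should be added to the suite."""
    if existing_hash is None:
        return True  # no collision: add the item
    if existing_hash == incoming_hash:
        # Reproducible rebuild: same build-ID, same contents.
        logger.info("%s already present with identical contents; skipping", name)
        return False
    # Same build-ID with different contents points to a toolchain problem.
    raise ValueError(f"{name}: build-ID collision with differing file contents")
```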

HTTP serving

DebugInfoView

The view is scoped to an archive (inherits from ArchiveFileView) rather than a suite, so that a single DEBUGINFOD_URLS entry resolves build-IDs across all suites in the archive.

URL pattern (the /buildid/<build-id>/debuginfo suffix is appended automatically by gdb(1)):

https://<archive-host>/<scope>/<workspace>/buildid/<build-id>/debuginfo
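For testing the endpoint by hand, the full URL can be assembled the same way a client would. This is a small sketch; the function name `debuginfo_url` and the example host in the usage note are made up.

```python
def debuginfo_url(archive_base_url: str, build_id: str) -> str:
    """Append the debuginfod suffix to an archive base URL."""
    # The base URL is what would be listed in DEBUGINFOD_URLS.
    return f"{archive_base_url.rstrip('/')}/buildid/{build_id}/debuginfo"
```

For example, with a hypothetical base of `https://debusine.example.org/myscope/myworkspace`, the result ends in `/myscope/myworkspace/buildid/<build-id>/debuginfo`.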

The view queries CollectionItem filtered by build_id and archive, hitting the partial B-tree index for a fast lookup. It then calls the existing stream_file() helper and appends the mandatory response headers:

| Header | Source | Description |
| --- | --- | --- |
| X-DEBUGINFOD-FILE | computed from build_id as /usr/lib/debug/.build-id/<XX>/<YYYY>.debug | Path of the .debug file within the binary package. The leading ./ of the in-tar path is an implementation detail of the .deb format and is stripped before the value is emitted. |
| X-DEBUGINFOD-SIZE | file_in_artifact.file.size | File size in bytes |
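The two headers can be computed from the build-ID and the stored file size alone. The sketch below returns them as a plain dictionary; the function name `debuginfod_headers` is hypothetical, and attaching the values to the actual response object is left to the view.

```python
def debuginfod_headers(build_id: str, size: int) -> dict[str, str]:
    """Return the debuginfod response headers for one .debug file."""
    return {
        # In-tar path with the leading "." removed, so it starts with "/".
        "X-DEBUGINFOD-FILE": (
            f"/usr/lib/debug/.build-id/{build_id[:2]}/{build_id[2:]}.debug"
        ),
        # Header values are strings, so the byte count is rendered as decimal.
        "X-DEBUGINFOD-SIZE": str(size),
    }
```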

gdb(1) sends a HEAD request before GET to check availability. Django’s django.views.generic.base.View.setup already aliases head to get when get is defined and head is not, so DebugInfoView needs no explicit handling. HEAD is covered by unit tests alongside GET and Range: requests.