debuginfod server
Related issue: Provide debuginfod server
Debusine builds and hosts Debian repositories but currently has no way to serve
debug symbols to developers. When a crash occurs, a developer must manually
find and install the correct -dbgsym package for the exact binary they are
debugging. This blueprint proposes bringing
debuginfod server
functionality to Debusine, allowing gdb(1) to automatically fetch
debug symbols from a Debusine archive without any manual steps.
Goals
- Extract .debug ELF files from -dbgsym packages during the Sbuild task and store them as discrete artifacts in Debusine.
- Serve those artifacts through debuginfod-compatible HTTP endpoints (/buildid/<id>/debuginfo), scoped per archive.
- Document how to configure gdb(1) to use a Debusine archive as a debuginfod source, so developers can get automatic symbol resolution with a single DEBUGINFOD_URLS environment variable.
Requirements
- The extraction pipeline must run inside the isolated sbuild worker so that a malformed or malicious ELF file cannot affect the web server or other builds.
- Each extracted .debug file must be stored as a Debusine artifact indexed by its build-ID for fast HTTP lookups.
- Debug symbol artifacts must be published into a debian:suite alongside their parent debian:binary-package artifacts, via the existing RELATES_TO relation and CopyCollectionItems mechanism.
- debuginfod URLs must be resolvable from the root of each archive, so that a single DEBUGINFOD_URLS entry covers all suites in the archive.
- The HTTP response must include the headers required by the debuginfod protocol specification: X-DEBUGINFOD-FILE and X-DEBUGINFOD-SIZE.
- Responses must support HTTP 206 Partial Content so that gdb(1) can fetch individual ELF sections without downloading the full file.
- Two debian:debug-symbols artifacts in the same archive that share a build-ID must have identical file contents, analogous to the existing pool-file uniqueness constraints.
- The debian:archive collection must expose a per-archive configuration option (similar to Launchpad’s build_debug_symbols) controlling whether debug symbols are extracted. When disabled, DEB_BUILD_OPTIONS=noautodbgsym is passed to sbuild to suppress -dbgsym package generation entirely.
Out of scope
The following parts of the official debuginfod specification are excluded:
- Source file serving (/buildid/<id>/source): the debuginfod(8) man page explicitly notes that, due to Debian and Ubuntu packaging policies, debuginfod cannot resolve source files for .deb and .ddeb packages. debuginfod.debian.net has the same limitation.
- Executable serving (/buildid/<id>/executable): debian:binary-package stores the whole .deb as a single file rather than broken-out binaries. Extracting individual executables on-the-fly inside a request handler is not feasible; pre-extraction at build time would require significant additional storage and pipeline changes that are out of scope for the initial implementation.
- Metrics endpoint (/metrics): this Prometheus statistics endpoint is designed for a standalone C++ daemon; Debusine’s existing application-level logging and system-wide monitoring tools are more appropriate.
- Metadata search (/metadata): implementing a searchable JSON index for build-ID metadata requires a separate database design that is out of scope for the initial implementation.
- IMA signatures (X-DEBUGINFOD-IMA-SIGNATURE): this response header carries per-file Integrity Measurement Architecture signatures used primarily for RPM packages and has no standard applicability to Debian .deb packages.
- Upstream federation: automatically forwarding unresolved build-ID requests to upstream servers such as debuginfod.debian.net is excluded. Debusine’s primary use case is serving self-hosted, private, or localised archives where the operator controls the source.
- DWZ supplement ingestion: -dbgsym.deb packages built with DWZ compression ship a shared supplement ELF file at ./usr/lib/debug/.dwz/<name>.debug that several regular .debug files reference via their .gnu_debugaltlink section. Ingesting these supplements is excluded from the initial implementation to keep the first iteration small. Until DWZ support is added, gdb(1) requests for the supplement’s build-ID return 404; the practical effect is that DWZ-using debug info renders with the alternate strings table missing, not that debugging fails entirely.
Background: dbgsym packages and ELF build-IDs
When Debian builds a binary package it strips debug information to keep the
shipped binary small. For example, util-linux_2.40.2-1_amd64.deb
contains the stripped binary while
util-linux-dbgsym_2.40.2-1_amd64.deb contains the DWARF debug symbols
for the exact same binary.
Inside the data.tar of every -dbgsym.deb the debug files follow a
fixed path convention:
./usr/lib/debug/.build-id/XX/YYYY.debug
where XX is the first two hex characters of the 40-character build-ID and
YYYY is the remaining 38 hex characters. The build-ID is assigned at link
time and uniquely identifies a specific binary.
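The path convention above can be expressed as a small function. This is a minimal sketch; the function name is hypothetical and is not part of Debusine.

```python
import re

# A build-ID in this document is 40 lowercase hex characters.
_BUILD_ID_RE = re.compile(r"[0-9a-f]{40}")


def debug_path_for_build_id(build_id: str) -> str:
    """Return the in-tar path of the .debug file for a 40-character build-ID."""
    build_id = build_id.lower()
    if not _BUILD_ID_RE.fullmatch(build_id):
        raise ValueError(f"not a 40-character hex build-ID: {build_id!r}")
    # XX is the first two hex characters, YYYY the remaining 38.
    return f"./usr/lib/debug/.build-id/{build_id[:2]}/{build_id[2:]}.debug"
```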
ELF classification uses three sections:
- .note.gnu.build-id: Contains the raw build-ID bytes. Reading them sequentially and converting to lowercase hex yields the familiar 40-character string.
- .debug_info / .gnu_debugdata: Presence of either section confirms that DWARF debug information is embedded, marking the file as a .debug artifact to be extracted.
- .gnu_debugaltlink: Present when a package was built with DWZ compression. Contains the build-ID of a shared DWZ supplement file living at ./usr/lib/debug/.dwz/<name>.debug inside the same -dbgsym.deb. DWZ supplement ingestion is excluded from the initial implementation (see “Out of scope” above); only files under ./usr/lib/debug/.build-id/ are ingested.
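To illustrate how the raw bytes of .note.gnu.build-id turn into the hex string, the sketch below decodes a note section using only the standard library. It is an illustration of the note layout, not the proposed implementation (the plan below names pyelftools for ELF parsing). The function name is hypothetical, and little-endian byte order is assumed by default.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for GNU build-ID notes


def build_id_from_note(section: bytes, little_endian: bool = True) -> str:
    """Extract the build-ID as lowercase hex from raw note-section bytes."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(section):
        # Each note starts with three 32-bit words: namesz, descsz, type.
        namesz, descsz, note_type = struct.unpack_from(fmt, section, offset)
        offset += 12
        name = section[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = section[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # descriptor is padded the same way
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            if len(desc) != descsz:
                raise ValueError("truncated build-ID note")
            return desc.hex()  # bytes.hex() yields lowercase hex
    raise ValueError("no GNU build-ID note found")
```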
Implementation plan
New artifact category: debian:debug-symbols
A new artifact category debian:debug-symbols will be added as an enum
value in debusine/artifacts/models.py. Each instance represents all
.debug ELF files extracted from a single -dbgsym package. Files are
stored as multiple entries within one artifact; each entry’s
FileInArtifact.path is set to
usr/lib/debug/.build-id/<XX>/<YYYY>.debug, where <XX> is the first
two hex characters of the build-ID and <YYYY> is the remaining 38
characters — the same path the file occupies inside the -dbgsym.deb
with the leading ./ stripped. Storing all debug files for one
-dbgsym package in a single artifact avoids creating one artifact per
debug file, which would place excessive load on the database during
publishing and expiry.
The artifact data carries the following field:
| Field | Type | Purpose |
|---|---|---|
| build_ids | list of 40-char hex strings | Index of all build-IDs contained in this artifact |
X-DEBUGINFOD-SIZE is derived from the stored file size at serve time
and does not need to be persisted.
Extraction pipeline in the Sbuild task
Debusine already opens .deb files in upload_artifact() to read
control data and create debian:binary-package artifacts. The
debug-symbol extraction follows the same pattern, added as two new helpers
called from _upload_binary_packages():
- _upload_debug_symbols(dbgsym_deb): For each -dbgsym.deb in the build output:
  1. Open the data.tar archive.
  2. Iterate over every file whose path matches ./usr/lib/debug/.build-id/**/*.debug. Each such file becomes one entry in the resulting debian:debug-symbols artifact, with its build-ID joining the artifact’s build_ids list and its FileInArtifact.path set to usr/lib/debug/.build-id/<XX>/<YYYY>.debug (the same in-tar path with the leading ./ stripped). Files under ./usr/lib/debug/.dwz/ (DWZ supplements) are not ingested in the initial implementation; see “Out of scope” above.
  3. For each file, parse the ELF structure with pyelftools to locate the .note.gnu.build-id section and extract the build-ID. Also check for .debug_info / .gnu_debugdata to confirm it is a debug file.
  4. Verify that the in-tar path matches the convention ./usr/lib/debug/.build-id/<XX>/<YYYY>.debug, where <XX> and <YYYY> are derived from the build-ID extracted in step 3. If the path does not match, the task fails with an explanatory error. dh_strip’s make_debug function constructs this path deterministically from the build-ID and we are not aware of any tooling in Debian that constructs -dbgsym.deb packages by hand, so a mismatch indicates a malformed package and must not be silently ingested.
  5. Accumulate all such files into a single debian:debug-symbols artifact and upload it.
- _create_debug_symbol_relations(debug_artifact, binary_artifact): Records a RELATES_TO relation from each debian:debug-symbols artifact to its parent debian:binary-package artifact.
Running extraction inside the sbuild worker confines the blast radius of any malformed or malicious ELF input to the isolated worker process and avoids re-fetching files from artifact storage, since all build output is already present on disk.
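The iteration and path-verification steps above could look roughly like the following sketch. It uses only the standard library; open_data_tar, collect_debug_files and the read_build_id callback are hypothetical names. In the plan, the callback's role is filled by pyelftools-based parsing of .note.gnu.build-id.

```python
import io
import tarfile
from typing import Callable, Dict

BUILD_ID_DIR = "./usr/lib/debug/.build-id/"


def open_data_tar(raw: bytes) -> tarfile.TarFile:
    """Open data.tar bytes; mode "r" detects supported compression itself."""
    return tarfile.open(fileobj=io.BytesIO(raw), mode="r")


def collect_debug_files(
    data_tar: tarfile.TarFile,
    read_build_id: Callable[[bytes], str],
) -> Dict[str, bytes]:
    """Map build-ID to file contents for every .build-id/**/*.debug member."""
    found: Dict[str, bytes] = {}
    for member in data_tar:
        name = member.name
        if not member.isfile():
            continue
        # Files elsewhere, including DWZ supplements under .dwz/, are skipped.
        if not (name.startswith(BUILD_ID_DIR) and name.endswith(".debug")):
            continue
        handle = data_tar.extractfile(member)
        if handle is None:
            continue
        contents = handle.read()
        build_id = read_build_id(contents)
        # The in-tar path must agree with the build-ID read from the file.
        expected = f"{BUILD_ID_DIR}{build_id[:2]}/{build_id[2:]}.debug"
        if name != expected:
            raise ValueError(f"{name} does not match build-ID {build_id}")
        found[build_id] = contents
    return found
```

Raising on a mismatch, rather than skipping the file, mirrors the requirement that a malformed package must not be silently ingested.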
Collection specification changes
The specs in docs/reference/collections/specs/ must be updated to
reflect the new artifact type:
- debian:suite and debian:archive: a uniqueness constraint is added in both collection specifications, in the same shape as the existing pool-file constraints. Within either a single debian:suite or a single debian:archive, two debian:debug-symbols collection items that share a build-ID must refer to files with identical contents. The suite-level constraint allows the archive-level constraint to be relaxed when an obsolete suite is removed; the archive-level constraint prevents two suites in the same archive from disagreeing about the file for a given build-ID. The constraint text in the two specifications is essentially identical.
- debian:suite: debian:debug-symbols becomes a valid item type. See the per-item data table under “Publishing into a suite” below.
- debian:archive: a new boolean data field build_debug_symbols (default true) controls whether debug symbols are extracted for this archive.
Database index
A new Django migration adds a partial B-tree index on
CollectionItem.data->>'build_id' conditioned on:
- category = 'debian:debug-symbols'
- child_type = 'a' (artifact item in Debusine’s CollectionItem model)
- parent_category = 'debian:suite'
This pattern is taken directly from migration 0005, which adds a similar
partial index for repository index path lookups. The partial condition keeps
the index small by covering only debug-symbol rows in suite collections,
excluding the much larger set of binary and source package rows that carry no
build_id field.
Because URLs are anchored at the archive level, build-IDs must be unique across the entire archive (not just within a single suite). This is enforced by the uniqueness constraints described in the collection specification changes above (one each in debian:suite and debian:archive), rather than by the index itself.
Publishing into a suite
When a binary package is published into a suite, package_publish
must pull in the matching debug symbols alongside it. A new helper
_add_debug_symbols() follows the RELATES_TO relation recorded during
the Sbuild task to obtain the debian:debug-symbols
artifact, then queues it for copying into the suite via
CopyCollectionItems.
Inside DebianSuiteManager.do_add_artifact(), a new elif branch handles
the DEBUG_SYMBOLS category and creates one CollectionItem row per
build-ID contained in the artifact:
| Field | Type | Source |
|---|---|---|
| build_id | 40-char hex string | The build-ID this collection item represents |
| | string | Mirrored from the parent debian:binary-package item |
| | string | Mirrored from the parent debian:binary-package item |
| | string | Mirrored from the parent debian:binary-package item |
| | string | Mirrored from the parent debian:binary-package item |
| | string | Mirrored from the parent debian:binary-package item |
The collection item is named debugsym:<build-id>. The “parent”
binary-package item is the one reached via the RELATES_TO relation
recorded by _create_debug_symbol_relations() during the
Sbuild task.
If the suite already contains a CollectionItem with the same
(name, parent_collection) — which occurs when a source package is built
reproducibly more than once — the existing item’s file hash is compared
against the incoming file. If the hashes match the collision is logged and
ignored; if they differ an error is raised, as this would indicate a
non-reproducible build with the same build-ID, which is a toolchain problem.
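The collision rule above reduces to a three-way decision. The sketch below shows that decision in isolation; the function and exception names are hypothetical, and the hashes are assumed to be hex digest strings obtained elsewhere.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class BuildIdConflict(Exception):
    """Two different files claim the same build-ID."""


def should_add_item(
    build_id: str, existing_hash: Optional[str], incoming_hash: str
) -> bool:
    """Return True if a new item must be created, False if it is a duplicate."""
    if existing_hash is None:
        return True  # no item named debugsym:<build-id> in the suite yet
    if existing_hash == incoming_hash:
        # Same contents: a repeated reproducible build, safe to ignore.
        logger.info("debugsym:%s already present, ignoring", build_id)
        return False
    raise BuildIdConflict(
        f"build-ID {build_id} maps to two different files "
        f"({existing_hash} != {incoming_hash})"
    )
```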
HTTP serving
DebugInfoView
The view is scoped to an archive (inherits from ArchiveFileView) rather
than a suite, so that a single DEBUGINFOD_URLS entry resolves build-IDs
across all suites in the archive.
URL pattern (appended automatically by gdb(1)):
https://<archive-host>/<scope>/<workspace>/buildid/<build-id>/debuginfo
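As a sketch of how the pieces of this URL fit together: the value a developer would put in DEBUGINFOD_URLS is assumed here to be the archive root, with the buildid/<build-id>/debuginfo suffix appended by the client. The function names and the example host, scope and workspace are hypothetical.

```python
from urllib.parse import quote


def archive_base_url(archive_host: str, scope: str, workspace: str) -> str:
    """Archive root, assumed to be the value placed in DEBUGINFOD_URLS."""
    return f"https://{archive_host}/{quote(scope)}/{quote(workspace)}/"


def debuginfo_url(base_url: str, build_id: str) -> str:
    """Full URL requested for the debug info of one build-ID."""
    return f"{base_url.rstrip('/')}/buildid/{build_id}/debuginfo"


base = archive_base_url("debusine.example.org", "debusine", "base")
url = debuginfo_url(base, "ab" + "c" * 38)
```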
The view queries CollectionItem filtered by build_id and archive,
hitting the partial B-tree index for a fast lookup. It then calls the
existing stream_file() helper and appends the mandatory response headers:
| Header | Source | Description |
|---|---|---|
| X-DEBUGINFOD-FILE | computed from | Path of the |
| X-DEBUGINFOD-SIZE | | File size in bytes |
gdb(1) sends a HEAD request before GET to check
availability. Django’s
django.views.generic.base.View.setup already aliases head to
get when get is defined and head is not, so DebugInfoView
needs no explicit handling. HEAD is covered by unit tests alongside
GET and Range: requests.
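For the Range: cases, the following sketch shows the byte-range arithmetic that such unit tests would exercise for a single-range header. It is an illustration only: the function name is hypothetical, and whether stream_file() already performs this computation is not stated here.

```python
import re
from typing import Optional, Tuple

_RANGE_RE = re.compile(r"bytes=(\d*)-(\d*)")


def parse_single_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) offsets, or None if not satisfiable."""
    match = _RANGE_RE.fullmatch(header.strip())
    if match is None:
        return None
    first, last = match.groups()
    if not first and not last:
        return None
    if not first:
        # Suffix form "bytes=-N": the last N bytes of the file.
        length = int(last)
        if length == 0 or size == 0:
            return None
        return max(size - length, 0), size - 1
    start = int(first)
    if start >= size:
        return None
    # An open-ended or overlong range is clamped to the end of the file.
    end = min(int(last), size - 1) if last else size - 1
    if end < start:
        return None
    return start, end
```

A satisfiable result corresponds to a 206 response whose Content-Range is "bytes start-end/size".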