Collecting debug symbols

jamesh · August 23, 2018, 11:17am

As we are now shipping a number of snaps in the default install on Ubuntu, the question has come up about how to collect and process crash reports for these applications.

The actual collection of core dumps can probably be done with the existing apport/whoopsie infrastructure (once they learn how to identify the core dump as coming from a snap). But without debug symbols, there is a limit to what we can do to analyse the problem.

I know that there were some discussions about how such a system could work at the October 2016 sprint in The Hague (way back in the Ubuntu Phone days), but am I right in thinking that nothing went beyond those vague plans?

I’ve been thinking a bit about how Snapcraft could be modified to handle this, so here are my thoughts running roughly top down. If no one else is working on this, I’ll have a go at implementing some of it, so I would appreciate feedback on the overall design.

Don’t ship debug symbols in the .snap

As with Debian packages, we want to keep debug symbols separate from the .snap package: they increase the size of the package while being of little interest to the vast majority of users, and they could reveal confidential information in the case of proprietary software.

So I propose that the snapcraft snap command generate a $package_$version_$arch.dbgsym.tar.gz file alongside $package_$version_$arch.snap. This tarball would contain a hierarchy matching what is generally found under /usr/lib/debug, in particular favouring the .build-id/NN/...debug layout.
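To make the proposed layout concrete, here is a minimal Python sketch of how such a tarball could be assembled. The function names are hypothetical, and it assumes the detached debug files have already been collected and keyed by build ID. The path split (first two hex digits of the build ID as a directory, the remaining digits plus `.debug` as the file name) is the convention GDB uses when looking up separate debug info by build ID.

```python
import os
import posixpath
import tarfile
from typing import Dict


def dbgsym_tarball_name(package: str, version: str, arch: str) -> str:
    """File name of the debug symbol tarball produced next to the .snap."""
    return f"{package}_{version}_{arch}.dbgsym.tar.gz"


def build_id_debug_path(build_id: str) -> str:
    """Relative path of a detached debug file in a /usr/lib/debug-style tree.

    GDB looks up separate debug info by build ID as
    .build-id/<first two hex digits>/<remaining hex digits>.debug
    """
    build_id = build_id.lower()
    if len(build_id) < 3 or any(c not in "0123456789abcdef" for c in build_id):
        raise ValueError(f"not a hex build ID: {build_id!r}")
    # Tar member names always use "/", so use posixpath rather than os.path.
    return posixpath.join(".build-id", build_id[:2], build_id[2:] + ".debug")


def write_dbgsym_tarball(out_dir: str, package: str, version: str, arch: str,
                         debug_files: Dict[str, str]) -> str:
    """Pack detached debug files (build ID -> path on disk) into the tarball."""
    tar_path = os.path.join(out_dir, dbgsym_tarball_name(package, version, arch))
    with tarfile.open(tar_path, "w:gz") as tar:
        for build_id, src in sorted(debug_files.items()):
            tar.add(src, arcname=build_id_debug_path(build_id))
    return tar_path
```

For example, `build_id_debug_path("ABCDEF12")` returns `.build-id/ab/cdef12.debug`, so unpacking the tarball into a directory gives a tree that can be handed to GDB's debug-file-directory setting.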

Gathering debug symbols for the archive

We already have code in place to read information out of the ELF files destined for a snap using pyelftools, so it should be fairly trivial to extend it to collect the build ID found in the .note.gnu.build-id section. It would then iterate through each part searching for debug symbols matching any of the primed build IDs.
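For illustration, a build ID can be decoded from the raw bytes of the .note.gnu.build-id section. This sketch parses the ELF note format by hand (three 32-bit words for name size, descriptor size and type, then a name and descriptor each padded to 4 bytes) and returns the first NT_GNU_BUILD_ID note owned by "GNU". With pyelftools, the bytes would come from `elf.get_section_by_name('.note.gnu.build-id').data()` and the byte order from `elf.little_endian`; pyelftools can also decode notes itself via `iter_notes()`. The helper name is hypothetical and is not the existing ElfFile API.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3  # ELF note type for GNU build IDs


def _align4(n: int) -> int:
    """Round n up to the 4-byte alignment used by ELF note fields."""
    return (n + 3) & ~3


def build_id_from_note_section(data: bytes, little_endian: bool = True) -> Optional[str]:
    """Return the hex build ID from raw .note.gnu.build-id bytes, or None."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(data):
        namesz, descsz, note_type = struct.unpack_from(fmt, data, offset)
        offset += 12
        name = data[offset:offset + namesz]
        offset += _align4(namesz)
        desc = data[offset:offset + descsz]
        offset += _align4(descsz)
        # The build ID note is owned by "GNU" (NUL-terminated, namesz == 4).
        if note_type == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None
```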

We’d probably need some other strategy to handle debug symbols for system libraries that are brought in without belonging to any part.

Parts generating debug symbols

Part plugins will need to be modified to do two things:

Configure the build so that debug symbols are generated.
After the build/install completes, process files in parts/$part/install to detach debug symbols and place them in parts/$part/debug (again resembling the normal /usr/lib/debug hierarchy).

The first step is likely to be specific to a particular plugin, but most of the logic for the second can likely be shared. I’d probably do this as a utility method that can be called by the plugin, so it can be replaced if there are any exotic plugins.
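A minimal sketch of such a shared utility, assuming GNU binutils: `objcopy --only-keep-debug` copies the debug sections into a separate file, `strip --strip-debug` removes them from the installed file, and `objcopy --add-gnu-debuglink` records the debug file's name and CRC in the stripped file. The function names and the `tool_prefix` parameter (for picking cross-compile tools) are hypothetical.

```python
import os
import subprocess
from typing import List


def split_debug_commands(binary: str, debug_file: str,
                         tool_prefix: str = "") -> List[List[str]]:
    """Commands that detach debug info from `binary` into `debug_file`."""
    objcopy = tool_prefix + "objcopy"
    strip = tool_prefix + "strip"
    return [
        # 1. Copy only the debug sections into the separate debug file.
        [objcopy, "--only-keep-debug", binary, debug_file],
        # 2. Remove the debug sections from the installed binary.
        [strip, "--strip-debug", binary],
        # 3. Add a .gnu_debuglink section pointing at the debug file.
        [objcopy, "--add-gnu-debuglink=" + debug_file, binary],
    ]


def split_debug(binary: str, debug_file: str, tool_prefix: str = "") -> None:
    """Run the commands above, failing loudly if any step fails."""
    debug_dir = os.path.dirname(debug_file)
    if debug_dir:
        os.makedirs(debug_dir, exist_ok=True)
    for cmd in split_debug_commands(binary, debug_file, tool_prefix):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy for an exotic plugin to reuse or replace individual steps.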

One other area that will need special attention is stage-packages: the binaries from these staged packages are already stripped of debug symbols, so they need to be handled specially. I think something like this should work:

Add the corresponding ddebs sources to the Apt cache used for retrieving stage packages.
Iterate through all the staged packages:
- If a $name-dbgsym package exists, download it.
- Unpack the dbgsym package and copy the contents of its /usr/lib/debug tree to the part’s debug tree.
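The stage-packages steps could look roughly like the following. It is a sketch with hypothetical names: it assumes the set of available package names comes from the Apt cache once the ddebs sources have been added, and that each downloaded dbgsym package has already been unpacked into a directory (for example with `dpkg-deb -x`).

```python
import os
import shutil
from typing import Iterable, List, Set


def dbgsym_packages(staged: Iterable[str], available: Set[str]) -> List[str]:
    """Names of $name-dbgsym packages that exist for the staged packages."""
    wanted = []
    for name in staged:
        candidate = name + "-dbgsym"
        if candidate in available:
            wanted.append(candidate)
    return wanted


def merge_debug_tree(unpacked_root: str, part_debug_dir: str) -> None:
    """Copy usr/lib/debug from an unpacked dbgsym package into the part's debug tree."""
    src = os.path.join(unpacked_root, "usr", "lib", "debug")
    if os.path.isdir(src):
        # dirs_exist_ok (Python 3.8+) merges into an existing debug tree.
        shutil.copytree(src, part_debug_dir, dirs_exist_ok=True)
```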

I think it should be possible to implement something equivalent for RPM packages when Snapcraft grows support for building against a Fedora base.

For edge cases, it would probably also be useful to let a scriptlet produce debug symbols too. Maybe the existing scriptlets are enough if we just let them write to the parts/$part/debug/ directory directly. The main use case I can see for this is collecting debug symbols for the core snap (and other base snaps built in a similar fashion).

This is clearly only one part of a retracing solution. Off the top of my head, we’d also need the following:

Some way to store the debug symbols associated with particular revisions of a snap uploaded to the store. Maybe the store should eventually be responsible for this, but it doesn’t need to be.
Apport needs to learn about snaps. In particular:
- When the core file is for an executable found under /snap, recognise it as coming from a snap.
- Record the name and revision number of the snap the executable comes from.
- For strictly confined snaps, record the name and revision of the base snap in use.
- Check for connected content interface plugs, and record the name and revision of the corresponding slot snaps. This is important for cases like the GNOME platform snap.
On the retracing end, somehow retrieve the dbgsym tarballs for each of the snaps referenced in the report. Unpack those tarballs and configure GDB to use these additional sets of debug symbols when retracing the core dump.
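Two small pieces of the Apport and retracing side can be sketched in Python, assuming snaps are mounted at /snap/<name>/<revision>/: mapping an executable path back to a snap name and revision, and building a GDB command that adds the unpacked dbgsym trees to the debug search path (GDB accepts a colon-separated list of directories). The function names are hypothetical, and base snaps and content interface slots are left out.

```python
import posixpath
from typing import List, NamedTuple, Optional


class SnapInfo(NamedTuple):
    name: str
    revision: str


def snap_from_executable(path: str) -> Optional[SnapInfo]:
    """Return the snap name and revision for a path under /snap, else None."""
    parts = posixpath.normpath(path).split("/")
    # Expected shape: ["", "snap", <name>, <revision>, ...]
    if len(parts) < 5 or parts[0] != "" or parts[1] != "snap":
        return None
    if parts[2] == "bin":
        # /snap/bin holds command wrappers, not a mounted snap.
        return None
    # If the path goes through the "current" symlink, the real revision
    # would still need to be resolved separately.
    return SnapInfo(parts[2], parts[3])


def gdb_debug_dir_command(extracted_dirs: List[str]) -> str:
    """GDB command searching the unpacked dbgsym trees before /usr/lib/debug."""
    dirs = list(extracted_dirs) + ["/usr/lib/debug"]
    return "set debug-file-directory " + ":".join(dirs)
```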

Any thoughts on this overall design?

evan · August 23, 2018, 12:02pm

Have you looked at Sentry and its Minidump support?

sergiusens · August 23, 2018, 12:23pm

I was going to suggest Breakpad, but minidump already alludes to it.

jamesh · August 23, 2018, 1:39pm

I hadn’t looked at that project. Based on its documentation it still relies on the developer providing them with the debug symbols, so having snapcraft produce that data as an artefact of package build still seems like a necessary first step.

That’s the part of the pipeline I’ve looked at in most detail, and would appreciate feedback on.

jamesh · August 29, 2018, 7:37am

I’ve put together a small pull request with some minor changes to get this started, namely extending the ElfFile class to identify the build ID of files and whether they contain debug info:

The next step would be to get a part to generate separated debug info and strip the executables as part of its build process.

evan · August 30, 2018, 11:43am

We’ve added a session to discuss this in Brussels. We’ll get notes pasted back here for the benefit of all.

jamesh · September 4, 2018, 11:31am

I’ve been putting together a trial implementation of collecting debug symbols while building a part:

What works:

using the previous ElfFile PR, the pluginhandler code detects ELF files with debug info and separates that info into a file in parts/$partname/debug named after the file’s build ID. It then strips the original file and adds a link to the debug info.
Handle the cross-compile case by using the appropriate architecture version of objcopy and strip.
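One way to pick the right tools when cross-compiling is to prefix the tool name with the target's GNU triplet, as Ubuntu's cross binutils packages do (e.g. aarch64-linux-gnu-objcopy). The mapping below covers only a few architectures and is an assumption for illustration, not Snapcraft's actual table; the function name is hypothetical.

```python
# Assumed mapping from Debian architecture names to GNU triplets
# used as cross binutils prefixes (only a subset is listed).
DEB_ARCH_TO_TRIPLET = {
    "amd64": "x86_64-linux-gnu",
    "arm64": "aarch64-linux-gnu",
    "armhf": "arm-linux-gnueabihf",
    "ppc64el": "powerpc64le-linux-gnu",
    "s390x": "s390x-linux-gnu",
}


def binutils_tool(tool: str, target_arch: str, host_arch: str) -> str:
    """Name of the objcopy/strip binary to run for the target architecture."""
    if target_arch == host_arch:
        # Native build: the unprefixed tool is correct.
        return tool
    try:
        triplet = DEB_ARCH_TO_TRIPLET[target_arch]
    except KeyError:
        raise ValueError(f"no cross tool prefix known for {target_arch}") from None
    return f"{triplet}-{tool}"
```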

What doesn’t work:

handle ELF files without a build ID. Some of the builds in the test suite didn’t seem to produce build IDs, which might just be because those tests use an old Go toolchain. I haven’t investigated more deeply.
locate debug info for files from stage-packages.
Snapcraft doesn’t set any default compile/link flags, so I needed to modify my project to build with -g.

It was enough to get GDB to debug an executable built by snapcraft build:

(gdb) set debug-file-directory ./parts/partname/debug:/usr/lib/debug
(gdb) file ./parts/partname/install/executable

This basic approach seems sound, so I am now working on extending it to scan the parts/partname/debug directories for debug info linked to primed executables and libraries. This should be a good starting point for what we discuss in Brussels.
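The scanning step might look like the following sketch, which assumes each part's debug tree uses the .build-id/NN/ layout and that the build IDs of primed files are already known. The function name is hypothetical; when several parts provide the same build ID, the first part in sorted order wins.

```python
import os
from typing import Dict, Iterable


def find_debug_files(parts_dir: str, primed_build_ids: Iterable[str]) -> Dict[str, str]:
    """Map each primed build ID to a matching debug file under parts/*/debug."""
    wanted = {b.lower() for b in primed_build_ids}
    found: Dict[str, str] = {}
    for part in sorted(os.listdir(parts_dir)):
        debug_root = os.path.join(parts_dir, part, "debug")
        for build_id in sorted(wanted):
            if build_id in found:
                continue  # an earlier part already provided this one
            candidate = os.path.join(debug_root, ".build-id",
                                     build_id[:2], build_id[2:] + ".debug")
            if os.path.isfile(candidate):
                found[build_id] = candidate
    return found
```

Build IDs with no match are simply absent from the result, which makes it easy to report primed files whose debug info could not be found.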

alexmurray · September 2, 2019, 11:54pm

@jamesh is this something you are still working on?

jamesh · September 3, 2019, 1:38am

Not actively. I posted a few pull requests, but it’s been radio silence from the Snapcraft team for almost a year now.

sergiusens · September 3, 2019, 6:58pm

It was decided by the high council that the full implementation for this requires a full story including snapd (@pedronis and @mvo) and the Snap Store (@noise).

We can strip and extract on the snapcraft side but we need the story to determine where to put them and how they will be dealt with at runtime. We have an event shy of two weeks from now, so we could discuss this in person if representatives from those two groups have the bandwidth.

jamesh · September 4, 2019, 4:15am

Sure, but any system we end up using is going to need to start by extracting the debug symbols from a snap build. Insisting on a master plan covering the entire stack seems like an excuse not to work on the parts we know we’ll need.