Add more information to automatically generated crash reports

Snapd uses its own crash reporting implementation, similar to apport, but the reports it generates contain too little information to be useful.

For example, in this frequent problem, or in this one where mount was intentionally made unavailable, the reports lack the release, the running environment, the package version and the system logs, all of which would help with the investigation.
In addition, errors.u.c shows no package information (“unknown package”), so the reports cannot be bucketed correctly and no bug report can be created from the problem.

To align the reports generated by snapd with the data collected by apport, I propose adding the following information to the crash report:

  • CurrentDesktop: Value of $XDG_CURRENT_DESKTOP, if present
  • _LogindSession: logind cgroup path, if present (used to filter out crashes that happened in a session that is no longer running)
  • Dependencies (list of packages and their versions)
  • ExecutablePath: /proc/pid/exe contents
  • Package (name of the Debian package, its version, e.g. snapd 2.31.1+18.04, and its origin)
  • InstallationDate
  • InstallationMedia
  • JournalError (journalctl -b --priority=warning…err --lines=1000)
  • LiveMediaBuild: contents of /cdrom/.disk/info, if it exists
  • ProcCmdline: /proc/pid/cmdline contents
  • ProcCpuinfoMinimal
  • ProcCwd
  • ProcEnviron: A subset of the process’ environment (only some standard variables that do not disclose potentially sensitive information):
    • SHELL
    • TERM
    • LANGUAGE
    • LANG
    • LC_CTYPE
    • LC_COLLATE
    • LC_TIME
    • LC_NUMERIC
    • LC_MONETARY
    • LC_MESSAGES
    • LC_PAPER
    • LC_NAME
    • LC_ADDRESS
    • LC_TELEPHONE
    • LC_MEASUREMENT
    • LC_IDENTIFICATION
    • LOCPATH
  • SourcePackage (snapd)
  • UpgradeStatus
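To make the distro-agnostic part of this list concrete, here is a minimal sketch of collecting ExecutablePath, ProcCmdline, ProcCwd and a whitelisted ProcEnviron from /proc. It is written in Python only for illustration (snapd itself is Go); the function names `filter_environ` and `read_proc_fields` are hypothetical, and joining the NUL-separated command line with plain spaces is a simplification, not necessarily the format apport uses.

```python
import os
from typing import Dict

# The ProcEnviron whitelist proposed above; anything else (tokens, home
# paths, etc.) is dropped before it can reach a report.
SAFE_ENV_VARS = frozenset([
    "SHELL", "TERM", "LANGUAGE", "LANG", "LC_CTYPE", "LC_COLLATE",
    "LC_TIME", "LC_NUMERIC", "LC_MONETARY", "LC_MESSAGES", "LC_PAPER",
    "LC_NAME", "LC_ADDRESS", "LC_TELEPHONE", "LC_MEASUREMENT",
    "LC_IDENTIFICATION", "LOCPATH",
])


def filter_environ(raw: bytes) -> str:
    """Keep only whitelisted NAME=value pairs from NUL-separated environ data."""
    kept = []
    for entry in raw.split(b"\0"):
        name, sep, _ = entry.partition(b"=")
        if sep and name.decode("utf-8", "replace") in SAFE_ENV_VARS:
            kept.append(entry.decode("utf-8", "replace"))
    return "\n".join(sorted(kept))


def read_proc_fields(pid: int) -> Dict[str, str]:
    """Collect a few distro-agnostic report fields for a running process (hypothetical helper)."""
    base = f"/proc/{pid}"
    with open(f"{base}/cmdline", "rb") as f:
        # Arguments are NUL-separated; joining with spaces is a simplification.
        cmdline = f.read().rstrip(b"\0").replace(b"\0", b" ")
    with open(f"{base}/environ", "rb") as f:
        environ = f.read()
    return {
        "ExecutablePath": os.readlink(f"{base}/exe"),
        "ProcCmdline": cmdline.decode("utf-8", "replace"),
        "ProcCwd": os.readlink(f"{base}/cwd"),
        "ProcEnviron": filter_environ(environ),
    }
```

For example, `read_proc_fields(os.getpid())` returns the four fields for the current process. A whitelist rather than a blacklist seems the safer default here, since any new, possibly sensitive variable is excluded automatically.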

Another option would be to switch to apport and fall back to the snapd crash reporter when apport is not available.

I’m fine with adding some more info to the error reports. This list seems excessive, though: some of the entries seem reasonable, some seem very Ubuntu-specific (do you have a cross-platform way of figuring out Dependencies or UpgradeStatus, to name a couple?), and some don’t exist for snapd (snapd will never see XDG_CURRENT_DESKTOP or _LogindSession, for example).

Strong -1 to using apport itself.

I assume the -1 on apport is because it’s Ubuntu-centric?

@jbl, can you separate the essentials in the list above from the, let’s say, less essential entries?

You are right there: some of that information is specific to the platform/OS/distribution rather than to snapd. It is still useful in reports, though, since it tells you in what environment your software is being used.

While snapd can’t and shouldn’t be the piece that knows all those details, wouldn’t it make sense to delegate to, or integrate with, the platform’s reporting system when possible?

There are a few ways you could achieve that, for example:

  • have backends in snapd that detect whether e.g. whoopsie (Ubuntu) or abrt (Fedora) is available, and delegate to them.
  • create an API through which the system provides that information to snapd. Whoopsie could implement the API; snapd would query it, get the extra information and include it in the report (if you want snapd to keep doing the reporting).
  • go the other way around and create an API that asks the system “can you report a bug including this information?”. Whoopsie could implement the API, receive the request and build a report including the details it has about the system plus the ones from snapd.
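The first option boils down to "try each platform reporter in order, otherwise keep using snapd’s own". A minimal sketch under clear assumptions: `ReportBackend` and `choose_backend` are hypothetical names, and detecting a reporter by looking for an executable on PATH (`whoopsie`, `abrtd`) is only a placeholder for whatever detection each platform would actually need.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ReportBackend:
    """A platform error reporter that snapd could hand a report to (hypothetical)."""
    name: str
    is_available: Callable[[], bool]


def choose_backend(backends: List[ReportBackend]) -> Optional[ReportBackend]:
    """Return the first available backend, or None to fall back to snapd's own reporter."""
    for backend in backends:
        if backend.is_available():
            return backend
    return None


# Placeholder detection: look for an executable on PATH. Real detection
# would have to match how each platform actually ships its reporter.
BACKENDS = [
    ReportBackend("whoopsie", lambda: shutil.which("whoopsie") is not None),
    ReportBackend("abrt", lambda: shutil.which("abrtd") is not None),
]
```

Keeping detection as a callable per backend means the ordering and the fallback stay in one place, while each platform’s quirks live in its own entry.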

I don’t think we should discard leveraging the platform tools without discussion…

This list is already close to a minimal set, and it lets you determine the context in which the error happened.

Could you please elaborate a little and explain why, in your opinion, apport should not be used?

Yes, sorry.

  1. it only works on Ubuntu classic, and only if it’s installed (although I don’t know whether it’s removable on classic). It’s not there on Core, I don’t know if it’s there on Debian, it’s not there on Fedora, etc., and I’ve seen no suggestions of how we’d leverage the native bug reporters on other platforms to feed into errors.u.c, nor of how we’d be able to compare those things.

  2. the current implementation is quick, and takes very little time and very few resources to do its work. Can the same be said of the alternatives being proposed? I see no mention of this. Has it even been considered, measured, thought of at all? Having an application misbehave is bad for the user; having it misbehave and then having the system bog down because it’s thinking about an error report is intolerable.

  3. it’s not clear to me (at least from this discussion) whether we’re talking about an automatic thing that collects data and stuffs it somewhere, or about those noxious, slow, and unusable ‘something went wrong’ dialog boxes that end up in a Launchpad bug. If using apport means users can, by action or inaction, get those dialogs, I’m going to reject outright any PR that incorporates it. They are terrible, and I’m ashamed that we still have them at all.

HTH.

Looking at that list of things to add to the error report, a number of them won’t be available, because snapd runs neither as a user nor in a user’s session, and the errors it reports are not crashes of the apps but failures of their installation or of their hooks.

So XDG_CURRENT_DESKTOP doesn’t make sense. I suspect neither does the logind cgroup path. The environment variables don’t make much sense either; snapd doesn’t run with TERM set, and the locale you want is probably the user’s (how would you determine that?).

Dependencies: do you mean the versions of the packages snapd depends on?
ExecutablePath: this will always be /usr/lib/snapd/snapd; not sure how interesting that is.
Package: isn’t this already included, via the version entry?
InstallationDate, InstallationMedia: I’m not sure where to get those from, but no issues beyond cross-platform support.
JournalError: ok (though whether you want warnings or only errors there is still an open debate, I think; I haven’t been following it but saw some discussion).
LiveMediaBuild: what is this, from a cross-platform perspective?
Proc*: sure (but ProcCwd is always going to be the same).
Environ: as I said above, you probably don’t want snapd’s environment.
SourcePackage: ok, if that helps.
UpgradeStatus: ok, beyond cross-platform support.
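Since the warning-vs-error question for JournalError is still open, one option is to keep the priority a single parameter. A minimal sketch, assuming hypothetical helpers `journal_error_cmd` and `collect_journal_errors`: journalctl’s `--priority` accepts either one level (that level and everything more severe) or a `FROM..TO` range, and the subprocess timeout is there because of the concern above about report collection bogging the system down.

```python
import subprocess
from typing import List


def journal_error_cmd(priority: str = "warning", lines: int = 1000) -> List[str]:
    """Build the journalctl invocation for a JournalError-style field.

    -b limits output to the current boot; --priority takes a single level
    or a FROM..TO range, so the warning-vs-error choice is one argument.
    """
    return ["journalctl", "-b", f"--priority={priority}", f"--lines={lines}"]


def collect_journal_errors(priority: str = "warning", lines: int = 1000,
                           timeout: float = 10.0) -> str:
    """Run journalctl with a timeout; return an empty string if it fails or hangs."""
    try:
        result = subprocess.run(journal_error_cmd(priority, lines),
                                capture_output=True, text=True,
                                timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return ""
    return result.stdout
```

Returning an empty field rather than raising keeps a missing or slow journal from blocking the rest of the report.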

To be clear, are you asking that we create entries with exactly these names? Is this because you have tools that use these names? Should we move things around to help?

Could you elaborate on why XDG_CURRENT_DESKTOP doesn’t make sense?

Dependencies: the list of packages snapd depends on. It helps to tell whether the user is running a default installation or has installed foreign packages, for example packages from proposed.
ExecutablePath: tells you whether snapd is running from the default location or from a local installation, for example.
InstallationDate and InstallationMedia: on Ubuntu, they come from files in /var/log/installer.
JournalError: errors alone are not enough to understand what happened.
LiveMediaBuild: only provide it on platforms that support a live session.

Yes, we want these names because those are the names already used on errors.ubuntu.com.

From a cross-platform perspective, we are talking about an Ubuntu-specific crash report collection service (errors.ubuntu.com), and snapd already contains Ubuntu-specific code to honour Ubuntu-specific tools and settings (specifically whoopsie), so these variables could be collected on Ubuntu only.
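Collecting some fields only on Ubuntu could be as simple as checking the `ID` key of os-release. A minimal sketch: `parse_os_release` handles only simple quoting, and the `UBUNTU_ONLY_FIELDS` split is a hypothetical choice based on the fields flagged as not cross-platform in this thread, not a decided list.

```python
from typing import Dict, Iterable, List

# Hypothetical split: fields discussed above as Ubuntu-specific.
UBUNTU_ONLY_FIELDS = ("InstallationDate", "InstallationMedia",
                      "LiveMediaBuild", "UpgradeStatus")


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from os-release content (simple quoting only)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip("\"'")
    return fields


def fields_to_collect(os_release_text: str, agnostic: Iterable[str]) -> List[str]:
    """Always collect the distro-agnostic fields; add the Ubuntu-only ones on Ubuntu."""
    wanted = list(agnostic)
    if parse_os_release(os_release_text).get("ID") == "ubuntu":
        wanted.extend(UBUNTU_ONLY_FIELDS)
    return wanted
```

In practice the text would come from /etc/os-release. Matching only `ID` (rather than also `ID_LIKE`) keeps derivatives out, which seems appropriate when the target is errors.ubuntu.com.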

Because snapd doesn’t run in a user’s session.

Working on this, I just noticed that apport will almost always report the PATH as ‘custom’, because

	     elif p != '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games':

is not the default path on a default install.
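The problem with the quoted condition is that it is an exact string comparison, so any PATH that differs at all (a missing /usr/games, an extra directory appended) is reported as custom. A minimal sketch contrasting that with a hypothetical looser check that only flags directories outside a known set; `is_custom_by_dirs` is an illustration, not apport’s behaviour.

```python
# The default PATH from the apport condition quoted above.
APPORT_DEFAULT_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games"


def is_custom_exact(path: str) -> bool:
    """Exact comparison, as in the quoted apport line: any difference counts as custom."""
    return path != APPORT_DEFAULT_PATH


def is_custom_by_dirs(path: str, known: str = APPORT_DEFAULT_PATH) -> bool:
    """Hypothetical looser check: custom only if PATH has a directory outside `known`.

    Ordering and missing entries are ignored, so a shorter system PATH is not flagged.
    """
    known_dirs = set(known.split(":"))
    return any(d not in known_dirs for d in path.split(":") if d)
```

With the looser check, extending `known` (for instance with a directory a distribution adds by default) is enough to stop false "custom" reports, while a PATH pointing into a home directory is still flagged.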

First pass at this, with only the distro-agnostic ones done:

https://github.com/snapcore/snapd/pull/4971