Taming error handling in snapcraft

niemeyer · February 6, 2018, 2:48am

Hey there,

Somewhat frequently I’m observing simple errors in snapcraft bubbling up as a large traceback that makes it taste like an ugly crash, when it’s really just a minor annoyance.

For example, after removing LXD and reinstalling it from scratch, I got this:

$ snapcraft cleanbuild
Creating snapcraft-gladly-vital-filly
error: Failed container creation:
 - https://cloud-images.ubuntu.com/releases: No storage pool found. Please create a new storage pool.
Traceback (most recent call last):
  File "/usr/bin/snapcraft", line 9, in <module>
    load_entry_point('snapcraft==2.35', 'console_scripts', 'snapcraft')()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
    return ep.load()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2229, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python3/dist-packages/snapcraft/cli/__main__.py", line 19, in <module>
    run(prog_name='snapcraft')
  File "/usr/lib/python3/dist-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/snapcraft/cli/lifecycle.py", line 219, in cleanbuild
    lifecycle.cleanbuild(project_options, remote)
  File "/usr/lib/python3/dist-packages/snapcraft/internal/lifecycle/_containers.py", line 66, in cleanbuild
    metadata=config.get_metadata(), remote=remote).execute()
  File "/usr/lib/python3/dist-packages/snapcraft/internal/lxd/_containerbuild.py", line 178, in execute
    with self._container_running():
  File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/usr/lib/python3/dist-packages/snapcraft/internal/lxd/_containerbuild.py", line 95, in _container_running
    with self._ensure_started():
  File "/usr/lib/python3.5/contextlib.py", line 59, in __enter__
    return next(self.gen)
  File "/usr/lib/python3/dist-packages/snapcraft/internal/lxd/_containerbuild.py", line 112, in _ensure_started
    self._ensure_container()
  File "/usr/lib/python3/dist-packages/snapcraft/internal/lxd/_cleanbuilder.py", line 39, in _ensure_container
    'lxc', 'launch', '-e', self._image, self._container_name])
  File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['lxc', 'launch', '-e', 'ubuntu:xenial/amd64', 'local:snapcraft-gladly-vital-filly']' returned non-zero exit status 1

That’s a pretty straightforward error. All I needed were the three clean lines from lxc at the top to tell me what to do. The rest is distracting noise. And that’s just one example; I see similar crashes relatively frequently.

Might be worthwhile to dive into the error handling logic, and try to make it more precise about when to display a fat panic, versus a nice little error message.
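
To make the distinction concrete, here is a minimal sketch of how a failed external command could be turned into a one-line error instead of a traceback. It is not snapcraft's actual code: `ContainerError`, `run_tool` and `main` are hypothetical names made up for illustration, and only the standard `subprocess` module is assumed.

```python
import subprocess
import sys


class ContainerError(Exception):
    """Hypothetical user-facing error: a short message, no traceback."""


def run_tool(cmd):
    """Run an external command; turn an expected failure into ContainerError."""
    try:
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as exc:
        # The tool has already printed its own explanation on stderr,
        # so a single summary line is enough here.
        raise ContainerError(
            "{} exited with status {}".format(cmd[0], exc.returncode)
        ) from None
    except FileNotFoundError:
        raise ContainerError("{} is not installed".format(cmd[0])) from None


def main(cmd):
    """Entry point: expected errors become one line on stderr and exit code 1."""
    try:
        run_tool(cmd)
    except ContainerError as exc:
        print("error: {}".format(exc), file=sys.stderr)
        return 1
    return 0
```

With something like this, calling `main(['lxc', 'launch', '-e', 'ubuntu:xenial/amd64', 'local:some-name'])` in the situation above would leave the lxc message on screen followed by a single `error:` line. Anything that is not a `ContainerError` would still surface as a traceback, which is the "fat panic" case.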

niemeyer · February 6, 2018, 5:20pm

One more from @popey today in bug #1747715.

I suspect these fat tracebacks are pretty common indeed.

popey · February 6, 2018, 5:33pm

Yours is bug 1740265 “Stack trace makes it hard to discern real errors”

lucyllewy · February 8, 2018, 3:46pm

I’ve been banging this drum for what seems like months. Is anyone actually working to reduce these in any way?!

niemeyer · February 8, 2018, 3:53pm

I think there’s just a bit too much contemplation regarding the UX polishing in Snapcraft. When it works, it’s great, but there are still some rough edges that we need to polish further so it feels solid. We’ll definitely work on this.

sergiusens · February 13, 2018, 3:23pm

We are working on this. These errors used to be really opaque and confusing, so we decided to let exceptions we don’t handle well pass through, because in most cases our handling was hiding the actual root cause behind an unclear message. I am sorry for the inconvenience, but people were often more frustrated by opaque error messages than by an exception. The case you show, though, is quite obviously not one where this applies.

@kalikiana can you please fix this one and cover all the bugs you know of related to containers?

kalikiana · February 13, 2018, 3:58pm

This was fixed in 2.38 - and I confirmed it shows a clean error, with a message pointing at online docs for LXD, in this same case.

niemeyer · February 13, 2018, 4:27pm

@sergiusens Both opaque error messages and gigantic tracebacks that come with an opaque error message feel pretty bad in terms of user experience. It feels to the user like a lack of care. Just consider it for a moment: the error above was reasonable. What you are saying is “Because we don’t know for sure, here is a massive traceback just in case!”.

If this is happening all the time and coming from third-party code, you can grab the traceback, find the origin of the error, and present a nice message saying which subsystem presented the error. The traceback may be logged so you can always find more about it when desired.

Then it’s your message, and it can read nicely, polished. That’d be better than waiting until people report every single possible traceback that Python and all third-party packages can possibly ever report.
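
A rough sketch of that idea, assuming nothing about snapcraft's internals: walk the traceback to find the module where the exception was raised, write the full traceback to a log file, and print only a short message naming that module. `origin_of`, `run_cli` and the log path argument are hypothetical names used for illustration.

```python
import sys
import traceback


def origin_of(exc):
    """Return the top-level module name of the innermost Python frame."""
    module = "unknown"
    tb = exc.__traceback__
    while tb is not None:
        # Each traceback entry points at a frame; the last one is where
        # the exception was actually raised.
        module = tb.tb_frame.f_globals.get("__name__", module)
        tb = tb.tb_next
    return module.split(".")[0]


def run_cli(command, log_path):
    """Run command(); on failure print a short message and log the traceback."""
    try:
        command()
    except Exception as exc:
        # Keep the full details for later inspection, out of the user's way.
        with open(log_path, "a") as log:
            traceback.print_exception(type(exc), exc, exc.__traceback__, file=log)
        print("error: {} reported: {}".format(origin_of(exc), exc), file=sys.stderr)
        print("Full traceback saved to {}".format(log_path), file=sys.stderr)
        return 1
    return 0
```

For the `CalledProcessError` shown earlier in this thread, the innermost frame is in the standard `subprocess` module, so the message would name `subprocess` as the origin. A real implementation would probably map module names to friendlier subsystem names.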

sergiusens · February 13, 2018, 5:28pm

I think we are on the same page here. We ran through a design session during the snapcraft summit a week ago and have a polished plan to fix it. It just needs doing.

niemeyer · February 21, 2018, 10:44am

Any timelines for that? This issue affects the user experience and perception of snapcraft across the board, so it would be nice to not delay it much further.

sergiusens · February 27, 2018, 12:01pm

2.39.2 has most issues solved, 2.40 (next release) will avoid the stack traces completely (unless requested by the user) and 2.41 (tentatively) will include a new error experience we are still designing (whiteboard mode still, we can brainstorm next week and summarize here afterwards).
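
For illustration only, "no stack traces unless requested by the user" could look roughly like the sketch below. The `--debug` flag name, the `tool` program name and the use of `argparse` are assumptions made for this example, not a description of how the 2.40 release implements it.

```python
import argparse
import sys
import traceback


def main(argv, command):
    """Run command(); show a traceback only when the user asked for one."""
    parser = argparse.ArgumentParser(prog="tool")
    # Hypothetical opt-in flag for full tracebacks.
    parser.add_argument("--debug", action="store_true",
                        help="show full tracebacks on errors")
    args = parser.parse_args(argv)
    try:
        command()
    except Exception as exc:
        if args.debug:
            # Requested explicitly: print the complete traceback.
            traceback.print_exc()
        else:
            # Default: a single readable line.
            print("error: {}".format(exc), file=sys.stderr)
        return 1
    return 0
```

Run without the flag, a failing command produces one `error:` line; run with `--debug`, the same failure prints the usual Python traceback.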