snapd hangs with 100% CPU usage on an ARM board

Related bug: #1674778 (https://bugs.launchpad.net/snappy/+bug/1674778)

Hi folks,

Several people have hit this same problem in different scenarios, so I’d like to start a discussion about how to address it. In my case, the symptom is easy to reproduce:

  1. Use ubuntu-image to create an image, with extra-snaps set to include the network-manager snap.
  2. Boot the image on an ARM board. (https://github.com/xapp-le/SnappyUbuntuCore)
  3. After boot, snapd hangs at 100% CPU and no snap command can be run.
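
For anyone who wants to confirm the 100% CPU symptom from step 3 with numbers, here is a minimal Python sketch that computes a process's CPU usage from two /proc/<pid>/stat samples. The function names and the sample lines are made up for illustration, not taken from the affected board, and the 100 ticks per second in the demo is an assumption (on a real system, use os.sysconf("SC_CLK_TCK") as sample_pid does).

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces,
    # so split only what follows the last ')'.
    rest = stat_line[stat_line.rfind(")") + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage over the interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample_pid(pid, interval_s=1.0):
    """Sample a live process twice (Linux only); the caller supplies the pid."""
    path = "/proc/%d/stat" % pid
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return cpu_percent(before, after, interval_s, ticks_per_s)


# Hypothetical samples taken one second apart, assuming 100 ticks per second.
before = "1234 (snapd) R 1 1234 1234 0 -1 4194560 100 0 0 0 500 250 0 0 20 0 10 0 12345"
after = "1234 (snapd) R 1 1234 1234 0 -1 4194560 100 0 0 0 580 270 0 0 20 0 10 0 12345"
print(cpu_percent(cpu_ticks(before), cpu_ticks(after), 1.0, 100))  # prints 100.0
```

On the board itself, sample_pid would be called with the pid of the running snapd process. A value that stays near 100 across several samples matches the hang described above.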

Then I ran some experiments to trace the issue:

  1. Use ubuntu-image to create an image without extra-snaps.
  2. Boot the image on the ARM board.
  3. Install the network-manager snap with snap install network-manager.

At step 3, snap failed to install the network-manager snap, and the following errors were logged:

May 5 06:29:11 localhost systemd[1]: Started Service for snap application network-manager.networkmanager.
May 5 06:29:11 localhost snap[8318]: cannot perform readlinkat() on the mount namespace file descriptor of the init process: Permission denied
May 5 06:29:11 localhost kernel: [74671.652437] type=1400 audit(1493965751.960:58): apparmor="DENIED" operation="ptrace" profile="/usr/lib/snapd/snap-confine" pid=8318 comm="snap-confine" requested_mask="read" denied_mask="read" peer="unconfined"
May 5 06:29:11 localhost systemd[1]: Couldn't stat device /dev/pts/ptmx
May 5 06:29:11 localhost systemd[1]: snap.network-manager.networkmanager.service: Main process exited, code=exited, status=1/FAILURE
May 5 06:29:11 localhost systemd[1]: snap.network-manager.networkmanager.service: Unit entered failed state.
May 5 06:29:11 localhost systemd[1]: snap.network-manager.networkmanager.service: Failed with result 'exit-code'.
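
To make the AppArmor denial in the log above easier to read or to collect from a longer log, here is a rough Python sketch that pulls the key=value fields out of such a line. The function name is hypothetical, and the regular expression only assumes the key=value / key="value" layout visible in the kernel line above. It is not a complete audit-log parser.

```python
import re

# Matches key="quoted value" or key=bare_value.
_PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_apparmor_denial(line):
    """Return the key/value fields of an AppArmor DENIED line, else None."""
    # Forum software may turn straight quotes into curly ones; undo that first.
    normalized = line.replace("\u201c", '"').replace("\u201d", '"')
    if 'apparmor="DENIED"' not in normalized:
        return None
    fields = {}
    for key, quoted, bare in _PAIR.findall(normalized):
        # Exactly one of the two capture groups is non-empty for a real value.
        fields[key] = bare if bare else quoted
    return fields


# The kernel line from the log above.
line = (
    'May 5 06:29:11 localhost kernel: [74671.652437] type=1400 '
    'audit(1493965751.960:58): apparmor="DENIED" operation="ptrace" '
    'profile="/usr/lib/snapd/snap-confine" pid=8318 comm="snap-confine" '
    'requested_mask="read" denied_mask="read" peer="unconfined"'
)
denial = parse_apparmor_denial(line)
print(denial["operation"], denial["profile"], denial["denied_mask"], denial["peer"])
```

For this line the fields show that snap-confine, running under the /usr/lib/snapd/snap-confine profile, was denied a ptrace read against an unconfined peer. That is consistent with the readlinkat() "Permission denied" message logged just before it.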

snapd’s task runner then undoes all of the network-manager tasks. Based on the stack trace (https://paste.ubuntu.com/24222873/) from the bug linked above, I suspect the undo process may run out of memory (OOM) and leave the snapd process at 100% CPU usage.

However, in another case the same symptom followed a kernel crash. It seems there are several situations to analyze, and I’m not sure whether architecture-specific factors also need to be considered. I’d appreciate help from the forum in finding more clues. Thanks.


Woodrow


Note that this is a different bug, related to additional snaps, while the bug you point to is a gadget issue caused by using a broken interface in the gadget definition.

I don't think this is related in any way. Please file a new bug.