LXC: snaps don't update

Yes, thanks for pinging. I will try to catch up with @stgraber tomorrow.

I'm happy to chat about it in person, but I suspect I'll sound like a broken record, basically repeating what I said in https://discuss.linuxcontainers.org/t/snapd-cant-remove-old-revisions-when-running-inside-lxd/452

Sounded like @zyga-snapd had a branch which was getting snap-confine to attempt to fix this, though I'm not sure how exactly that would work given that systemd would still be mounting those snaps automatically on boot, quite possibly well before snap-confine itself is called by the first snap starting. Unless there's some clever systemd dependency ordering going on there somehow?

I remember that my original suggestion for this was to have a snap.mount unit which would have systemd itself do the bind-mount and MS_SLAVE remount of /snap. Doing that would have systemd properly order its mount units, guaranteeing that snap.mount is processed before any other directory underneath it.

I don't remember if you can have the systemd unit declare both the bind-mount and the MS_SLAVE remount in one go, but if not, this should be achievable by using a post-start action on the unit to have it perform the remount.
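For reference, here is a minimal sketch of the two mount operations being discussed, written as plain `mount` invocations rather than as a systemd unit. The function name `make_snap_slave` and the `DRY_RUN` switch are assumptions made for illustration only; this is not what snapd or snap-confine actually ships.

```shell
#!/bin/sh
# Sketch only: bind-mount a directory over itself, then mark it MS_SLAVE.
# DRY_RUN defaults to 1, which prints the commands instead of running them,
# because the real calls need root and change mount propagation.
make_snap_slave() {
    dir="${1:-/snap}"
    run="echo"
    [ "${DRY_RUN:-1}" = "0" ] && run=""
    # A bind mount turns the directory into its own mount point.
    $run mount --bind "$dir" "$dir"
    # --make-slave sets MS_SLAVE on that mount point only;
    # --make-rslave would apply it recursively to mounts underneath.
    $run mount --make-slave "$dir"
}

make_snap_slave /snap
```

Setting `DRY_RUN=0` would execute the commands for real. The ordering concern raised in this thread still applies: these steps only help if they run before any snap is mounted under /snap.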

I made an attempt, but I ran into issues either with systemd or with a security review when trying to work around deficiencies in systemd.

The crux of the limitation was indeed that a /snap mount unit is not enough, as there's no way to apply MS_SLAVE this way. I will try your suggestion to have a post-start action that changes sharing.

One annoying limitation is that FUSE mounts are not reliably represented in /proc/self/mountinfo, so we cannot unmount and remount them to fix something. We must ask systemd to do that, but this is too much power to wield from snap-confine (this is what my earlier branch attempted).
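To see what sharing the kernel reports for a given mount point, something like the following sketch can help. The function name `mount_propagation` is hypothetical. It only reads the optional fields of /proc/self/mountinfo, so it is subject to the same caveat as above: a mount that is missing from mountinfo simply produces no output.

```shell
#!/bin/sh
# Print the propagation tags (e.g. "shared:1", "master:2") that the kernel
# reports for one mount point. In each mountinfo line the optional fields
# sit between field 6 (mount options) and the lone "-" separator.
mount_propagation() {
    target="$1"
    info="${2:-/proc/self/mountinfo}"
    awk -v target="$target" '
        $5 == target {
            tags = ""
            for (i = 7; i <= NF && $i != "-"; i++)
                tags = tags (tags == "" ? "" : " ") $i
            # No optional fields means the mount is private.
            print (tags == "" ? "private" : tags)
        }
    ' "$info"
}
```

Calling `mount_propagation /snap` inside the container prints one line per mount found at that path. A `master:N` tag indicates a slave mount, while `shared:N` indicates a shared one.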

Did this approach work?

@kyrofa no, not really; we discussed this with @mvo today and there's another attempt in https://github.com/snapcore/snapd/pull/4517

I'm afraid this may not be fixed, or perhaps there's another problem. I'm using candidate in LXD:

$ snap version
snap    2.31
snapd   2.31
series  16
ubuntu  16.04
kernel  4.4.0-112-generic

Trying to remove a snap I get this:

$ sudo snap remove nextcloud
2018-02-17T18:08:35Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: [stop
snap-nextcloud-5132.mount] failed with exit status 1: Job for snap-nextcloud-5132.mount failed. See
"systemctl status snap-nextcloud-5132.mount" and "journalctl -xe" for details.

Remove snap "nextcloud" (5132) from the system                                                        .^C
ubuntu@nextcloud-proxy-test:~$ snap changes
ID   Status  Spawn                 Ready                 Summary
1    Done    2018-02-17T17:40:32Z  2018-02-17T17:40:32Z  Initialize system state
2    Done    2018-02-17T17:42:37Z  2018-02-17T17:43:01Z  Install "core" snap from "candidate" channel
3    Done    2018-02-17T17:42:37Z  2018-02-17T17:42:40Z  Initialize device
4    Done    2018-02-17T17:43:15Z  2018-02-17T17:44:00Z  Install "nextcloud" snap
5    Done    2018-02-17T17:47:04Z  2018-02-17T17:47:06Z  Change configuration of "nextcloud" snap
6    Doing   2018-02-17T18:07:57Z  -                     Remove "nextcloud" snap

ubuntu@nextcloud-proxy-test:~$ snap change 6
Status  Spawn                 Ready                 Summary
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:33Z  Stop snap "nextcloud" services
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:33Z  Run remove hook of "nextcloud" snap if present
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:33Z  Remove aliases for snap "nextcloud"
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:34Z  Make snap "nextcloud" unavailable to the system
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:34Z  Remove security profile for snap "nextcloud" (5132)
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:34Z  Remove data for snap "nextcloud" (5132)
Doing   2018-02-17T18:07:57Z  -                     Remove snap "nextcloud" (5132) from the system
Do      2018-02-17T18:07:57Z  -                     Discard interface connections for snap "nextcloud" (5132)

......................................................................
Remove snap "nextcloud" (5132) from the system

2018-02-17T18:08:35Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: [stop snap-nextcloud-5132.mount] failed with exit status 1: Job for snap-nextcloud-5132.mount failed. See "systemctl status snap-nextcloud-5132.mount" and "journalctl -xe" for details.


ubuntu@nextcloud-proxy-test:~$ systemctl status snap-nextcloud-5132.mount
● snap-nextcloud-5132.mount - Mount unit for nextcloud
   Loaded: loaded (/proc/self/mountinfo; enabled; vendor preset: enabled)
   Active: active (mounted) (Result: exit-code) since Sat 2018-02-17 18:08:35 UTC; 42s ago
    Where: /snap/nextcloud/5132
     What: squashfuse
  Process: 12833 ExecUnmount=/bin/umount /snap/nextcloud/5132 (code=exited, status=32)
    Tasks: 1
   Memory: 1.0M
      CPU: 11.157s
   CGroup: /system.slice/snap-nextcloud-5132.mount
           └─8363 squashfuse /var/lib/snapd/snaps/nextcloud_5132.snap /snap/nextcloud/5132 -o ro,nodev,

Feb 17 17:43:50 nextcloud-proxy-test systemd[1]: Mounting Mount unit for nextcloud...
Feb 17 17:43:50 nextcloud-proxy-test systemd[1]: Mounted Mount unit for nextcloud.
Feb 17 18:08:35 nextcloud-proxy-test systemd[1]: Unmounting Mount unit for nextcloud...
Feb 17 18:08:35 nextcloud-proxy-test umount[12833]: umount: /snap/nextcloud/5132: not mounted
Feb 17 18:08:35 nextcloud-proxy-test systemd[1]: snap-nextcloud-5132.mount: Mount process exited, code=
Feb 17 18:08:35 nextcloud-proxy-test systemd[1]: Failed unmounting Mount unit for nextcloud.

You will need the 2.31 deb-based package to get the fix. I heard that one is coming out soon, though.

The snapd 2.31.1 deb releases are in *-proposed; install them from there for testing.
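For anyone who has not used the -proposed pocket before, a rough sketch of enabling it follows. The helper name `proposed_source_line`, the mirror URL, the component list, and the `proposed.list` file name are assumptions; adjust them to match your own sources.

```shell
#!/bin/sh
# Sketch: build the apt source line for the -proposed pocket of a release.
# The mirror URL and component list are assumptions, not taken from this thread.
proposed_source_line() {
    codename="$1"
    echo "deb http://archive.ubuntu.com/ubuntu ${codename}-proposed main restricted universe multiverse"
}

# For an Ubuntu 16.04 container the codename is "xenial".
proposed_source_line xenial

# To apply it (as root), something like:
#   proposed_source_line xenial > /etc/apt/sources.list.d/proposed.list
#   apt-get update && apt-get install snapd/xenial-proposed
```

Requesting `snapd/xenial-proposed` explicitly pulls only that package from the pocket, rather than upgrading everything to proposed versions.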


This issue should be fixed everywhere now. Please post comments in case you are affected again.


Thank you for the fix! I'm really happy to be able to use snaps on my servers again (everything runs inside LXD). Thank you @kyrofa for keeping this issue alive. I'm surprised that there are so few snapd+lxd users out there. We need to fix this :slight_smile:

While looking through the PRs I noticed a comment from @stgraber at https://github.com/snapcore/snapd/pull/4560#discussion_r169230231 and I concur: this test will not catch the bug described in this thread. I suggest updating the test to add a reboot to prevent future regressions.
