LXC: snaps don't update

As far as I understand, when snaps are mounted at /snap/name/revision and /snap isn’t MS_SHARED, we are affected because none of those mounts are shared, so updates don’t propagate correctly.
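To check whether this is what is happening on a given system, the propagation state of a mount can be read from the optional fields in /proc/self/mountinfo. Here is a minimal Python sketch of that check. It is not anything from snapd: `propagation` and `SAMPLE` are made-up names, and the sample lines are illustrative rather than captured from an affected container.

```python
def propagation(mountinfo_text, mount_point):
    """Return the propagation tags for mount_point, or None if it is not listed.

    Assumes the mount point contains no spaces (mountinfo escapes those as \\040).
    If the path appears more than once, the last entry wins.
    """
    result = None
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[4] != mount_point:
            continue
        try:
            # Optional fields sit between the mount options (index 5)
            # and a lone "-" separator.
            sep = fields.index("-", 6)
        except ValueError:
            continue
        tags = set()
        for opt in fields[6:sep]:
            if opt.startswith("shared:"):
                tags.add("shared")
            elif opt.startswith("master:"):
                tags.add("slave")
            elif opt == "unbindable":
                tags.add("unbindable")
        # No propagation tag at all means the mount is private.
        result = tags or {"private"}
    return result


# Illustrative data only, not taken from a real container.
SAMPLE = (
    "25 0 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw\n"
    "36 25 8:1 /snap /snap rw,relatime shared:1 - ext4 /dev/sda1 rw\n"
    "40 36 7:0 / /snap/core/1 ro,nodev,relatime - squashfs /dev/loop0 ro\n"
    "41 25 8:2 / /mnt rw,relatime master:2 - ext4 /dev/sdb1 rw\n"
)
```

On a real system you would pass in the contents of /proc/self/mountinfo and ask about "/snap". A result of {"private"} there would match the situation described above.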

@zyga-snapd Thanks! Please go chase layouts!

@stgraber Any chance we might have your help fixing this issue in snapd? I know you have a lot of great insight in that area. Could it be simply a matter of remounting the filesystem in those specific cases instead of just overmounting it?

I was thinking about a Before=snapd.mount in our autogenerated mount units but I was under the impression that systemd must be already doing this (the before I mean). Perhaps something is wrong with the snapd.mount unit and the fallback code in snap-confine is doing the remount and we suffer the bug then. I didn’t check that.

Just blew another hour babysitting all my containers through updating all their snaps. It actually takes less time to blow away an entire testing container, create a new one, and set it all up for my daily tests, than it takes to update.

/me blows more time babysitting containers through another update…

Any progress here?

I’d be happy to help with this if I understood snap-confine at all. I’m afraid I don’t, so all I can really do is offer to help test. I lose hair in frustration every time an update comes out!

@niemeyer any progress on this?

@jdstrand you’re the only other person I know for sure has snap-confine experience. Do you have any insights, here?

I have a fix for this now. I’ll send a PR shortly; I’m trying to see if any tests break.

EDIT: I was fighting this all yesterday and it’s still not fixed. I’m trying another strategy. This is one nasty problem.

A quick update: @zyga-snapd put up a proposal that needs to be tweaked:

<zyga-solus> kyrofa: I discussed it with jdstrand, we need to tweak it slightly in order not to make snap-confine too powerful,
<zyga-solus> kyrofa: I’ll get back to it
<zyga-solus> kyrofa: I’m looking at why master breaks so often as this clamps our velocity a lot
<kyrofa> Ah okay, very good. Still a path forward, though?
<zyga-solus> yes, totally

Is anyone still around working on this problem? Or is everyone gone for holidays? I don’t know how to raise the importance of this. I’m seriously considering running stuff from source instead of using my own snaps, and that’s heartbreaking.

@niemeyer any chance you guys can chat about this during the sprint this week?

Yes, thanks for pinging. I will try to catch up with @stgraber tomorrow.

I’m happy to chat about it in person, but I suspect I’ll sound like a broken record, basically repeating what I said in https://discuss.linuxcontainers.org/t/snapd-cant-remove-old-revisions-when-running-inside-lxd/452

Sounded like @zyga-snapd had a branch which was getting snap-confine to attempt to fix this, though I’m not sure how exactly that would work given that systemd would still be mounting those snaps automatically on boot, quite possibly much before snap-confine itself is called by the first snap starting. Unless there’s some clever systemd dependency ordering going on there somehow?

I remember that my original suggestion for this was to have a snap.mount unit which would have systemd itself do the bind-mount and MS_SLAVE remount of /snap. Doing that would have systemd properly order its mount units, guaranteeing that snap.mount is processed before any other directory underneath it.

I don’t remember if you can have the systemd unit declare both the bind-mount + MS_SLAVE remount in one go, but if not, this should be achievable by using a post-start action on the unit, to have it perform the remount.
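For context, the ordering relied on here comes from how systemd names mount units after their paths and implicitly orders a mount unit after the mount units of its parent directories (see systemd.mount(5)). Below is a rough Python sketch of that mapping, approximating `systemd-escape --path` for simple paths. The helper names are hypothetical, and this is not how snapd generates its units.

```python
def escape_path(path):
    """Approximate systemd's path escaping for simple paths."""
    parts = [p for p in path.split("/") if p]  # drop empty and duplicate slashes
    if not parts:
        return "-"  # the root directory
    out = []
    for i, ch in enumerate("/".join(parts)):
        if ch == "/":
            out.append("-")
        elif (ch.isascii() and (ch.isalnum() or ch in ":_")) or (ch == "." and i > 0):
            out.append(ch)
        else:
            # Everything else, including a literal "-" and a leading ".",
            # becomes a \xNN escape per byte.
            out.extend("\\x%02x" % b for b in ch.encode("utf-8"))
    return "".join(out)


def mount_unit(path):
    """Name of the mount unit that manages the given mount point."""
    return escape_path(path) + ".mount"


def parent_mount_units(path):
    """Candidate units for parent directories, outermost first.

    systemd only orders against those that actually exist as mount units.
    """
    parts = [p for p in path.split("/") if p]
    return [mount_unit("/" + "/".join(parts[:i])) for i in range(1, len(parts))]
```

With this sketch, `mount_unit("/snap/nextcloud/5132")` gives `snap-nextcloud-5132.mount`, and `parent_mount_units` on the same path lists `snap.mount` first. That is why a unit for /snap would be processed before any per-revision mount underneath it.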

I made an attempt, but I ran into issues either with systemd itself or with security review when trying to work around deficiencies in systemd.

The crux of the limitation was indeed that a /snap mount unit is not enough, as there’s no way to apply MS_SLAVE this way. I will try your suggestion to have a post-start action that changes the sharing.

One annoying limitation is that FUSE mounts are not reliably represented in /proc/self/mountinfo, so we cannot unmount and remount them to fix something. We must ask systemd to do that, but this is too much power to wield from snap-confine (this is what my earlier branch attempted).

Did this approach work?

@kyrofa no, not really; we discussed this with @mvo today and there’s another attempt in https://github.com/snapcore/snapd/pull/4517

I’m afraid this may not be fixed, or perhaps there’s another problem. I’m using candidate in LXD:

$ snap version
snap    2.31
snapd   2.31
series  16
ubuntu  16.04
kernel  4.4.0-112-generic

Trying to remove a snap I get this:

$ sudo snap remove nextcloud
2018-02-17T18:08:35Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: [stop
snap-nextcloud-5132.mount] failed with exit status 1: Job for snap-nextcloud-5132.mount failed. See
"systemctl status snap-nextcloud-5132.mount" and "journalctl -xe" for details.

Remove snap "nextcloud" (5132) from the system                                                        .^C
ubuntu@nextcloud-proxy-test:~$ snap changes
ID   Status  Spawn                 Ready                 Summary
1    Done    2018-02-17T17:40:32Z  2018-02-17T17:40:32Z  Initialize system state
2    Done    2018-02-17T17:42:37Z  2018-02-17T17:43:01Z  Install "core" snap from "candidate" channel
3    Done    2018-02-17T17:42:37Z  2018-02-17T17:42:40Z  Initialize device
4    Done    2018-02-17T17:43:15Z  2018-02-17T17:44:00Z  Install "nextcloud" snap
5    Done    2018-02-17T17:47:04Z  2018-02-17T17:47:06Z  Change configuration of "nextcloud" snap
6    Doing   2018-02-17T18:07:57Z  -                     Remove "nextcloud" snap

ubuntu@nextcloud-proxy-test:~$ snap change 6
Status  Spawn                 Ready                 Summary
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:33Z  Stop snap "nextcloud" services
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:33Z  Run remove hook of "nextcloud" snap if present
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:33Z  Remove aliases for snap "nextcloud"
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:34Z  Make snap "nextcloud" unavailable to the system
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:34Z  Remove security profile for snap "nextcloud" (5132)
Done    2018-02-17T18:07:57Z  2018-02-17T18:08:34Z  Remove data for snap "nextcloud" (5132)
Doing   2018-02-17T18:07:57Z  -                     Remove snap "nextcloud" (5132) from the system
Do      2018-02-17T18:07:57Z  -                     Discard interface connections for snap "nextcloud" (5132)

......................................................................
Remove snap "nextcloud" (5132) from the system

2018-02-17T18:08:35Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: [stop snap-nextcloud-5132.mount] failed with exit status 1: Job for snap-nextcloud-5132.mount failed. See "systemctl status snap-nextcloud-5132.mount" and "journalctl -xe" for details.


ubuntu@nextcloud-proxy-test:~$ systemctl status snap-nextcloud-5132.mount
● snap-nextcloud-5132.mount - Mount unit for nextcloud
   Loaded: loaded (/proc/self/mountinfo; enabled; vendor preset: enabled)
   Active: active (mounted) (Result: exit-code) since Sat 2018-02-17 18:08:35 UTC; 42s ago
    Where: /snap/nextcloud/5132
     What: squashfuse
  Process: 12833 ExecUnmount=/bin/umount /snap/nextcloud/5132 (code=exited, status=32)
    Tasks: 1
   Memory: 1.0M
      CPU: 11.157s
   CGroup: /system.slice/snap-nextcloud-5132.mount
           └─8363 squashfuse /var/lib/snapd/snaps/nextcloud_5132.snap /snap/nextcloud/5132 -o ro,nodev,

Feb 17 17:43:50 nextcloud-proxy-test systemd[1]: Mounting Mount unit for nextcloud...
Feb 17 17:43:50 nextcloud-proxy-test systemd[1]: Mounted Mount unit for nextcloud.
Feb 17 18:08:35 nextcloud-proxy-test systemd[1]: Unmounting Mount unit for nextcloud...
Feb 17 18:08:35 nextcloud-proxy-test umount[12833]: umount: /snap/nextcloud/5132: not mounted
Feb 17 18:08:35 nextcloud-proxy-test systemd[1]: snap-nextcloud-5132.mount: Mount process exited, code=
Feb 17 18:08:35 nextcloud-proxy-test systemd[1]: Failed unmounting Mount unit for nextcloud.

You will need the 2.31 deb-based package to get the fix. I heard that one is coming out soon though.

The snapd 2.31.1 debs are in *-proposed; install them from there for testing.
