$ ls -l /snap/nextcloud/
total 0
drwxr-xr-x 24 root root 0 Apr 25 17:44 1212
drwxr-xr-x 24 root root 0 Apr 30 08:10 1284
drwxr-xr-x 24 root root 0 May 16 20:27 1337
lrwxrwxrwx 1 root root 4 May 16 21:07 current -> 1337
Note that the oldest is 1212. Once I update, I expect that revision to be gone. This is what happens:
$ sudo snap refresh --candidate nextcloud
2017-05-25T22:30:49Z INFO cannot auto connect nextcloud:network-bind to core:network-bind: (plug auto-connection), existing connection state "nextcloud:network-bind core:network-bind" in the way
2017-05-25T22:30:53Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: [stop snap-nextcloud-1212.mount] failed with exit status 1: Job for snap-nextcloud-1212.mount failed. See "systemctl status snap-nextcloud-1212.mount" and "journalctl -xe" for details.
2017-05-25T22:33:53Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: umount: /snap/nextcloud/1212: not mounted
2017-05-25T22:36:54Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: umount: /snap/nextcloud/1212: not mounted
2017-05-25T22:39:54Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: umount: /snap/nextcloud/1212: not mounted
2017-05-25T22:42:54Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: umount: /snap/nextcloud/1212: not mounted
So it looks like, indeed, it’s gone. But snapd doesn’t expect it to be gone. This goes on forever. Thankfully ctrl+c stops it without causing the entire operation to be canceled: the snap still seems to be updated. But what if this was an automated refresh? Would it just be stuck forever?
$ snap version
snap 2.25
snapd 2.25
series 16
ubuntu 16.04
kernel 4.4.0-78-generic
$ systemctl status snap-nextcloud-1212.mount
â—Ź snap-nextcloud-1212.mount - Mount unit for nextcloud
Loaded: loaded (/proc/self/mountinfo; enabled; vendor preset: enabled)
Active: active (mounted) (Result: exit-code) since Thu 2017-05-25 22:30:53 UTC; 17h ago
Where: /snap/nextcloud/1212
What: squashfuse
Process: 2146 ExecUnmount=/bin/umount /snap/nextcloud/1212 (code=exited, status=32)
Tasks: 1
Memory: 440.0K
CPU: 6ms
CGroup: /system.slice/snap-nextcloud-1212.mount
└─109 squashfuse /var/lib/snapd/snaps/nextcloud_1212.snap /snap/nextcloud/1212 -o ro,nodev,allow_other,suid
May 25 22:25:35 staging systemd[1]: Mounting Mount unit for nextcloud...
May 25 22:25:35 staging systemd[1]: Mounted Mount unit for nextcloud.
May 25 22:30:53 staging systemd[1]: Unmounting Mount unit for nextcloud...
May 25 22:30:53 staging umount[2146]: umount: /snap/nextcloud/1212: not mounted
May 25 22:30:53 staging systemd[1]: snap-nextcloud-1212.mount: Mount process exited, code=exited status=32
May 25 22:30:53 staging systemd[1]: Failed unmounting Mount unit for nextcloud.
The journal shows me:
May 26 16:11:18 staging /usr/lib/snapd/snapd[367]: taskrunner.go:367: DEBUG: Running task 177 on Doing: Remove snap "nextcloud" (1212) from the system
May 26 16:11:18 staging /usr/lib/snapd/snapd[367]: task.go:303: DEBUG: 2017-05-26T16:11:18Z ERROR cannot remove snap file "nextcloud", will retry in 3 mins: umount: /snap/nextcloud/1212: not mounted
Huh, it’s still going this morning even though I ctrl+c’d it yesterday:
$ snap change 34
Status Spawn Ready Summary
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:00Z Download snap "nextcloud" (1446) from channel "candidate"
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:02Z Fetch and check assertions for snap "nextcloud" (1446)
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:03Z Mount snap "nextcloud" (1446)
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:39Z Stop snap "nextcloud" services
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:40Z Remove aliases for snap "nextcloud"
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:41Z Make current revision for snap "nextcloud" unavailable
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:48Z Copy snap "nextcloud" data
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:51Z Setup snap "nextcloud" (1446) security profiles
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:51Z Make snap "nextcloud" (1446) available to the system
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:52Z Set automatic aliases for snap "nextcloud"
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:52Z Setup snap "nextcloud" aliases
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:53Z Start snap "nextcloud" (1446) services
Done 2017-05-25T22:26:31Z 2017-05-25T22:30:53Z Remove data for snap "nextcloud" (1212)
Doing 2017-05-25T22:26:31Z - Remove snap "nextcloud" (1212) from the system
Do 2017-05-25T22:26:31Z - Clean up "nextcloud" (1446) install
Do 2017-05-25T22:26:31Z - Run configure hook of "nextcloud" snap if present
Try just remounting it to something arbitrary… it will be gone soon.
The fact this is using squashfsfuse is likely related… it probably got unmounted elsewhere.
In either case, we need to make that part of the removal more resilient. There’s no reason to fail unmounting if it’s already unmounted, obviously. Thanks for reporting.
Huh, interesting. I was about to try mounting something in place when I noticed this:
$ mount | grep 1212
squashfuse on /snap/nextcloud/1212 type fuse.squashfuse (ro,nodev,relatime,user_id=0,group_id=0,allow_other)
$ sudo umount /snap/nextcloud/1212
umount: /snap/nextcloud/1212: not mounted
I agree that this smells like squashfuse. It seems like it’s somehow partially mounted, but unmountable.
We unmount the snap filesystem with -lazy (aka MNT_DETACH), so it’ll remain in use by the kernel while there are live processes using it.
What is the parent of these processes? If they were started by the snap itself, they should have been terminated when the snap services stopped. This should also not be a reason for the removal to fail (that’s why we use -lazy in the first place).
$ ps l -p2016
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
4 0 2016 1 20 0 4508 628 - Ss ? 0:00 /bin/sh /snap/nextcloud/1446/bin/start-php-fpm
The parent seems to be systemd. But notice that this is actually the one from rev 1446. What is happening here? A reboot doesn’t help, the task just resumes and hangs again. There seems to be no way to get snapd to just move on. If I abort it, will it roll back?
As I said, when I reboot, this just resumes where it left off, in the exact same state:
$ mount | grep 1212
squashfuse on /snap/nextcloud/1212 type fuse.squashfuse (ro,nodev,relatime,user_id=0,group_id=0,allow_other)
However, if I go and move /var/lib/snapd/snaps/nextcloud_1212.snap out of the way, and reboot again, there is nothing in mount. Of course, the change still errors out because it’s still trying to unmount something that isn’t there, but it’s interesting that the .mount file is somehow messing this up in the first place.
This is making every update on my server non-automatic since I have to SSH in and baby it by helping it unmount stuff in order for an update to actually continue. Is no one else using snaps on LXD?
@zyga-snapd I see you’ve put the bug reported in progress on the 25th. Can you please update this topic with details of what happened and what the plan of action is when you have a moment?
Since it’s still marked as In Progress, that’s 2.28 material I assume?