It refreshed successfully on ~575 machines, but I noticed 33 not picking up the new version.
They're all kubernetes hosts, more precisely kubernetes-masters, kubernetes-workers and etcds, deployed with CDK.
Due to the time correlation, I guessed that this was due to a combination of telegraf performing a sudo -u [...], and CDK apparmor profiles, but I don't know that for sure.
I could perform manual refreshes for my machines in a few minutes, but would prefer to identify the root cause and make sure it doesn't hit that snap, or others, again in the future.
Help investigating this would be appreciated.
This is definitely not the way to run any binary in a snap (and note that classic confinement does not make a difference here): you are breaking confinement and are also not using the correct environment this way … To run a snap application, either use snap run <name of the app> or execute it via /snap/bin/<name of the app> (which should have been automatically added to your PATH when snapd was installed). Anything else is broken and wrong.
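As a rough illustration of the two supported entry points, here is a minimal Python sketch of a launcher that never calls a binary under /snap/<name>/current/ directly. The helper name `snap_app_command` and its `bin_dir` parameter are made up for this example; only `snap run` and the /snap/bin wrappers come from the post above.

```python
import os
from typing import List, Sequence


def snap_app_command(app: str, args: Sequence[str] = (),
                     bin_dir: str = "/snap/bin") -> List[str]:
    """Build an argv that starts a snap app through a supported entry point.

    Hypothetical helper: prefers the wrapper in bin_dir and falls back to
    `snap run <app>` when no wrapper is present.
    """
    wrapper = os.path.join(bin_dir, app)
    # lexists() also accepts symlinks, which is how wrappers are usually installed.
    if os.path.lexists(wrapper):
        return [wrapper, *args]
    return ["snap", "run", app, *args]
```

The returned list can be handed to `subprocess.run(...)` as-is; either form lets snapd set up the environment for the application.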
Also, please note that there is no (permitted) way for a snap to create users or to run a daemon as a specific user beyond the "snap_daemon" user or root …
If the telegraf snap randomly modifies data in the system's password db, this is wrong behaviour.
If the snap is running telegraf via a snap service, then the output from ps is correct for that. The ps output will always show /snap/whatever/current/.../bin/telegraf instead of /usr/bin/snap run, since the latter will exec() as the former, so /proc will say that the process is /snap/whatever/current/...
Oops, I'm sorry, I missed that the excerpt above was from ps output…
Regarding the user, I beg to disagree here … the snap calls system utilities like "install", "useradd" and "groupadd" from its hook without shipping them.
Even classic snaps should be (and need to be) self-contained.
What happens if you install the snap on a system that is lacking one of these tools? What happens if you install it on a system that has one of them but not the other (i.e. you end up with a half-updated passwd db when the hook fails), or if the version on the host is simply a different one that uses other/differently named switches? … What/who removes the user if the snap gets uninstalled, etc.?
While a classic snap can access bits of the host system, it is still good practice to stick to the common snap ways of dealing with the runtime env here, i.e. use the snap_daemon user to drop privileges instead of potentially trashing the host's password db.
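To make the "half-updated passwd db" failure mode concrete, here is a small Python sketch of a preflight check that reports which host tools are absent before anything is modified. The `missing_tools` helper is hypothetical and not part of the telegraf snap, and it does not address the differing-switches or uninstall concerns; shipping the tools in the snap remains the cleaner answer.

```python
import shutil
from typing import Iterable, List, Optional


def missing_tools(tools: Iterable[str], path: Optional[str] = None) -> List[str]:
    """Return the tools that cannot be found on PATH.

    `path` is an optional search path override (an assumption made so the
    check can be exercised in isolation); None means the current PATH.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

A hook written along these lines could call `missing_tools(["install", "useradd", "groupadd"])` first and abort if the result is non-empty, so it fails before touching the password db rather than halfway through.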
Something you do during the upgrade seems to call out to ptrace(), which I personally would not expect to be blocked in classic confinement … Perhaps @ijohnson has an idea here? Otherwise we need to pull in someone from the security team …
Ah well if the snap is indeed using host utilities then it should not be doing that and should instead ship all such utilities itself.
Regarding debugging the failed refresh, I'm not sure what the cause is. Since the denial is for snap-confine with a classic snap, perhaps your classic snap is trying to call another classic snap, and the file descriptors your classic snap opened are being shared with the other classic snap. They have to go through snap-confine first, which doesn't have permission to access them, hence the denial. If that is indeed the cause, this is bug https://bugs.launchpad.net/ubuntu/+source/snapd/+bug/1849753
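If inherited file descriptors are indeed the cause, one way to check the hypothesis is to look at which descriptors a process would pass on across exec(). The sketch below is a Linux-only diagnostic in Python, used purely for illustration; `inheritable_fds` is a made-up helper, and it is not a confirmed fix for the bug above.

```python
import os
from typing import List


def inheritable_fds() -> List[int]:
    """List this process's open descriptors that a child would inherit across exec().

    Linux-only sketch: relies on /proc/self/fd being available.
    """
    result = []
    for name in os.listdir("/proc/self/fd"):
        fd = int(name)
        try:
            if os.get_inheritable(fd):
                result.append(fd)
        except OSError:
            # The descriptor used for the directory listing itself is
            # already closed by the time we query it.
            continue
    return sorted(result)
```

Anything beyond stdin, stdout and stderr in that list would be handed to whatever the process launches next, which in the scenario described above means it would pass through snap-confine.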