Duplicate plug/slot names inside the core snap

The latter. Asking every call site to fix the data is extremely error prone. One call site we miss today or tomorrow and we have a lurking bug waiting to crash someone’s system. Putting that into the reading pipeline means we know for a fact that corrupted data won’t go through, and in that same place we should take the chance and reject attempts to load snaps that have such corrupted state which is not the one case we need to dynamically fix.

1 Like

We’re collecting the required PRs in the top post (it is now a wiki).

We have one more data point that seems to explains the last mystery. The auto-connect logic is symmetric and we have a very simplistic approach at that. If a given slot may auto-connect to more than one plug (this applies to network-bind we no longer connect as we don’t know it should be connected in all the places. We should re-consider auto-connection logic for slots for 2.25.

The test that mvo added in PR 3164 fails because there are other consumers that could use the network-bind slot and we already knew there are many providers (because two cores exist briefly):

I added extra logging when len(candidates) is not zero and not one:

INFO cannot auto connect {core network-bind-plug} (plug auto-connection), candidates found: “core:network-bind,ubuntu-core:network-bind”
INFO cannot auto connect {core network-bind} (slot auto-connection), candidates found: “core:network-bind-plug,test-snapd-python-webserver:network-bind”

The root cause that triggered this investigation was:

The ubuntu-core -> core transition ended up with an unconnected “network-bind” plug on core. The reason is described in Auto-connect logic starting from slots of an installed/refreshed snap is naive and it is only happening on network-bind because there may be snaps installed (like a webserver) that need network-bind. However core-support is not affected because there is only a single snap (core) that wants the core-support plug so this transition works.

During the investigation we found that we allow plug/slot with the same name. The new core snap fixes this by renaming the plug to core-support-plug.

One “cheap” fix would be to make the core snap use only “core-support-plug” and fold the network-bind feature into the core-support interfaces (two easy and striaghtforward to review PRs). This way the transition is unblocked. We also have no duplicates plug/slot anymore because we renamed core. We still have the issue that we have not reject snaps with duplicated plug/slot names and also not fixed the described in Auto-connect logic starting from slots of an installed/refreshed snap is naive. But the upside is that we could get 2.24 out of the door. The downside is that we have know issues in it (no regressions though). Fixing all the issues uncovered here will probably take us into the 2.25 time-frame.

this and the difference vs what we see for core-support (which nothing but core and not ubuntu-core or any other snap has a plug for) is caused by the problem/logic explained here:

We can also avoid #3145 if we just have core-support. I.e. we don’t need to fixup the network-bind connection because core-support will be connected.

that means including network-bind in core-support as well? seems a bit messy hmm

Yes, messy. It’s a different trade-off, its fine if we prefer the other one, just wanted to throw it out for the discussion.

Not sure if really messy, I wonder if that’s the direction we ought to take (or reverse) if we have more interfaces that core hooks need. I’d say that given that hooks don’t work if interfaces are disconnected means we should not even expose that. On another hand it feels good to say that hooks have very precise security confinement.

another approach is to fix auto-connection of slots just for network-bind, we know only core has that slot, so it’s probably easier, is just connects all the things basically, the we can replace that with the proper general fix for autoconnection from slots later

i am still having a hard time to understand why this specific hook needs to be confined at all ? we control all the code in it, nothing is executed directly (only via snapd or snap set/get) and no changes to the hook land without any approval …

@mvo Sounds fine. Feel free to proceed with the suggested approach. I really don’t like the idea of this important invariant we have being broken, but that’s already done and taking one or two more weeks to fix it won’t be a big deal.

That gives us more time to test the several PRs we’ve been discussing back and forth instead of rushing them in.

@mvo @zyga-snapd notice that once we fix autoconnect for slots the problem is back because we will see there are two slots during the transition for core-support (on core and on ubuntu-core respectively), so not autoconnecting is correct (we really need somehow to override that for this corner case)

The idea where core-support implies network-bind: https://github.com/snapcore/snapd/pull/3166

I think this is over now. This work is complete.

1 Like