Duplicate plug/slot names inside the core snap

@pedronis I’m currently considering something along those lines:

diff --git a/overlord/ifacestate/helpers.go b/overlord/ifacestate/helpers.go
index 511e17e..a9b4222 100644
--- a/overlord/ifacestate/helpers.go
+++ b/overlord/ifacestate/helpers.go
@@ -93,16 +93,6 @@ func (m *InterfaceManager) addSnaps() error {
 		return err
 	}
 	for _, snapInfo := range snaps {
-		if snapInfo.Name() == "core" {
-			// Some released core snaps had explicitly defined plugs
-			// "network-bind" and "core-support" that clashed with implicit
-			// slots with the same names but this was not validated before.  To
-			// avoid a flag day and any potential issues, transparently rename
-			// the two clashing plugs by appending the "-plug" suffix.
-			for _, plugName := range []string{"network-bind", "core-support"} {
-				snapInfo.RenamePlug(plugName, plugName+"-plug")
-			}
-		}
 		snap.AddImplicitSlots(snapInfo)
 		if err := m.repo.AddSnap(snapInfo); err != nil {
 			logger.Noticef("%s", err)
diff --git a/overlord/snapstate/snapmgr.go b/overlord/snapstate/snapmgr.go
index ed54e01..ba70026 100644
--- a/overlord/snapstate/snapmgr.go
+++ b/overlord/snapstate/snapmgr.go
@@ -248,6 +248,7 @@ func readInfoAnyway(name string, si *snap.SideInfo) (*snap.Info, error) {
 		}
 		return info, nil
 	}
+	info.RenameClashingCorePlugs()
 	return info, err
 }
 
diff --git a/snap/broken.go b/snap/broken.go
index 52ebede..8f2b10a 100644
--- a/snap/broken.go
+++ b/snap/broken.go
@@ -62,3 +62,17 @@ func GuessAppsForBroken(info *Info) map[string]*AppInfo {
 
 	return out
 }
+
+// RenameClashingCorePlugs renames plugs that clash with slot names on core snap.
+//
+// Some released core snaps had explicitly defined plugs "network-bind" and
+// "core-support" that clashed with implicit slots with the same names but this
+// was not validated before.  To avoid a flag day and any potential issues,
+// transparently rename the two clashing plugs by appending the "-plug" suffix.
+func (info *Info) RenameClashingCorePlugs() {
+	if info.Name() == "core" {
+		for _, plugName := range []string{"network-bind", "core-support"} {
+			info.RenamePlug(plugName, plugName+"-plug")
+		}
+	}
+}

This is on top of my rename-clashing-core-plugs branch which lives in this PR: https://github.com/snapcore/snapd/pull/3154

Why do we need to call that explicitly rather than allowing the info itself to rename them when necessary, when loading the data in the first place? This will also need to be done at the place we restore repository connections from the saved state.

It sounds like a good idea as it paints a clear universe in the new snapd no matter which core is currently installed, without breaking the old one on reverts.

I think you are right but I wanted to clear up my understanding of which methods exactly ought to call this. I think that the distinction is that if the helper validates it should also try to correct data like this. Another factor (look at broken.go) is that this not the first method like that so if we want to agree to make all of them private and called implicitly I’d like to make this consistent.

Sounds better to focus on the problem at hand before touching unrelated code. We can then reorganize better in a follow up using lessons learned.

Ok, so this (explicit call) approach is in line with other broken bits. Is that what you expected or do you want to flip this and be implicit with this particular data that needs to be corrected.

The latter. Asking every call site to fix the data is extremely error prone. One call site we miss today or tomorrow and we have a lurking bug waiting to crash someone’s system. Putting that into the reading pipeline means we know for a fact that corrupted data won’t go through, and in that same place we should take the chance and reject attempts to load snaps that have such corrupted state which is not the one case we need to dynamically fix.

1 Like

We’re collecting the required PRs in the top post (it is now a wiki).

We have one more data point that seems to explains the last mystery. The auto-connect logic is symmetric and we have a very simplistic approach at that. If a given slot may auto-connect to more than one plug (this applies to network-bind we no longer connect as we don’t know it should be connected in all the places. We should re-consider auto-connection logic for slots for 2.25.

The test that mvo added in PR 3164 fails because there are other consumers that could use the network-bind slot and we already knew there are many providers (because two cores exist briefly):

I added extra logging when len(candidates) is not zero and not one:

INFO cannot auto connect {core network-bind-plug} (plug auto-connection), candidates found: “core:network-bind,ubuntu-core:network-bind”
INFO cannot auto connect {core network-bind} (slot auto-connection), candidates found: “core:network-bind-plug,test-snapd-python-webserver:network-bind”

The root cause that triggered this investigation was:

The ubuntu-core -> core transition ended up with an unconnected “network-bind” plug on core. The reason is described in Auto-connect logic starting from slots of an installed/refreshed snap is naive and it is only happening on network-bind because there may be snaps installed (like a webserver) that need network-bind. However core-support is not affected because there is only a single snap (core) that wants the core-support plug so this transition works.

During the investigation we found that we allow plug/slot with the same name. The new core snap fixes this by renaming the plug to core-support-plug.

One “cheap” fix would be to make the core snap use only “core-support-plug” and fold the network-bind feature into the core-support interfaces (two easy and striaghtforward to review PRs). This way the transition is unblocked. We also have no duplicates plug/slot anymore because we renamed core. We still have the issue that we have not reject snaps with duplicated plug/slot names and also not fixed the described in Auto-connect logic starting from slots of an installed/refreshed snap is naive. But the upside is that we could get 2.24 out of the door. The downside is that we have know issues in it (no regressions though). Fixing all the issues uncovered here will probably take us into the 2.25 time-frame.

this and the difference vs what we see for core-support (which nothing but core and not ubuntu-core or any other snap has a plug for) is caused by the problem/logic explained here:

We can also avoid #3145 if we just have core-support. I.e. we don’t need to fixup the network-bind connection because core-support will be connected.

that means including network-bind in core-support as well? seems a bit messy hmm

Yes, messy. It’s a different trade-off, its fine if we prefer the other one, just wanted to throw it out for the discussion.

Not sure if really messy, I wonder if that’s the direction we ought to take (or reverse) if we have more interfaces that core hooks need. I’d say that given that hooks don’t work if interfaces are disconnected means we should not even expose that. On another hand it feels good to say that hooks have very precise security confinement.

another approach is to fix auto-connection of slots just for network-bind, we know only core has that slot, so it’s probably easier, is just connects all the things basically, the we can replace that with the proper general fix for autoconnection from slots later

i am still having a hard time to understand why this specific hook needs to be confined at all ? we control all the code in it, nothing is executed directly (only via snapd or snap set/get) and no changes to the hook land without any approval …

@mvo Sounds fine. Feel free to proceed with the suggested approach. I really don’t like the idea of this important invariant we have being broken, but that’s already done and taking one or two more weeks to fix it won’t be a big deal.

That gives us more time to test the several PRs we’ve been discussing back and forth instead of rushing them in.

@mvo @zyga-snapd notice that once we fix autoconnect for slots the problem is back because we will see there are two slots during the transition for core-support (on core and on ubuntu-core respectively), so not autoconnecting is correct (we really need somehow to override that for this corner case)

The idea where core-support implies network-bind: https://github.com/snapcore/snapd/pull/3166

I think this is over now. This work is complete.

1 Like