Duplicate plug/slot names inside the core snap

So we just became aware that several bad things aligned:

  • the validation that ensured that a given snap has valid plug/slot names is not enabled (fixed and unrelated)
  • the core gets implicit definitions of the network-bind and core-support slots
  • the core includes both of those interfaces as plugs as well
  • the ubuntu-core -> core transition ends up with an unconnected network-bind-plug because of Auto-connect logic starting from slots of an installed/refreshed snap is naive

Once https://github.com/snapcore/snapd/pull/2932 lands we will be in some hot water and I suspect (given the recent discovery of bug when ubuntu-core transitions to core) we may have another set of hidden issues related to this duplication. (merged and unrelated)

I think we must rename the plugs on the core snap. I would like to propose to rename network-bind to internal-network-bind network-bind-plug and core-support to internal-core-support core-support-plug. We should make the change in https://github.com/snapcore/core/ directly having made sure that on refresh they are correctly auto-connecting and nothing breaks.

EDIT:

This is the list of PRs that I think are needed to fix this (in this order):

  • internally rename plugs on core snap so that they don’t clash PR 3154
  • validation for ongoing mutation to a snap in the interfaces repository (e.g. AddPlug) PR 3153
  • ensure that core plugs are always connected PR 3145

We can also land this:

  • Rename stored connections on core snap (to match renamed plus) PR 3160
  • In order that we don’t have to keep fixing this in memory we should correct the core snap as well core PR 32

i totally dont care about naming :slight_smile: go ahead as you like … though would there be a way to simply just hide any internally loop connected interfaces, that would seem like the better option.

Hiding them is orthogonal but I agree, we perhaps should not expose them much. We can pursue that as a separate topic. My point is that we broke an invariant and things can really misbehave because of that.

so just change the names then and hide them in the next round … if that fixes it i’m all for it

The first point and the respective PR #2932 looks unrelated to this issue. That’s about validating the names, and all names being discussed are valid from a syntax point of view. The real problem is that they are duplicated, and that’s completely wrong and shouldn’t be allowed by snapd at all during installation. If that wasn’t yet fixed, can we please make sure it is fixed in time for 2.24 (cc @mvo)?

As for the plug names, it sounds more friendly and obvious to have them as network-bind-plug, core-support-plug, and so on. That gives a hint that the reason we’re renaming is because there’s a slot in the same snap, rather than it being internal or obscure in any other way.

1 Like

Indeed, I assumed that the validation would detect the duplication. We may need a deeper look.

This is now addressed by https://github.com/snapcore/snapd/pull/3153 and @mvo is working on new core snap without the clashing plugs.

I also made a branch that automatically renames those two plugs and associated connection when starting snapd: https://github.com/snapcore/snapd/pull/3154

During discussion on IRC @pedronis pointed out that we have a chicken/egg problem. We need to load an invalid snap.info (for the core snap) in order to fix it. New validation added in 3153 will make us bail out and abort early so it’s clearly not a good idea. We also want to be safe in case new snapd is asked to roll-back to old core snap.

My naive idea is to move the “cure” logic from interface manager to the snap manager and to allow the snap manager to load the snap.info on core in a way that lets us bypass validation (or do the fix before we validate). This way we can safely do the rest (where rest is the set of fixes done in the interface manager)

@zyga-snapd thinking though, the new core will have the correct unique names, so maybe the question is more what happens to the connections? It’s a problem with a revert though

On revert two things may happen (not sure):

  • one of the proposed branches migrates connections in place, we can make that not save the state and simply apply the hack when loading connections in the interface manager (the state stays on disk stays as-is). Today I made it save the data so reverted snapd would not see those connections.
  • unless… not sure here… when we revert we also auto-connect, then it should be OK

@zyga-snapd the revert will try from the new snapd to AddSnap the old snap no? (first setup-profiles)

Indeed. In that case the old core snap will be “fixed” (plugs get renamed) and we re-start to that snapd. The restarted snapd (the old one) will just work as before. This seems to imply that we want to avoid touching the state persistently now.

@zyga-snapd with the current code in the PRs that AddSnap will fail though, no?

Yes, if we add the validation we need to do something more. I think that we need to move the transparent rename of the plugs (not connections just plugs) to the snap manager, to the code that loads snaps from disk. This way when the new snapd will be attempting to revert it will not block this operation.

@pedronis I’m currently considering something along those lines:

diff --git a/overlord/ifacestate/helpers.go b/overlord/ifacestate/helpers.go
index 511e17e..a9b4222 100644
--- a/overlord/ifacestate/helpers.go
+++ b/overlord/ifacestate/helpers.go
@@ -93,16 +93,6 @@ func (m *InterfaceManager) addSnaps() error {
 		return err
 	}
 	for _, snapInfo := range snaps {
-		if snapInfo.Name() == "core" {
-			// Some released core snaps had explicitly defined plugs
-			// "network-bind" and "core-support" that clashed with implicit
-			// slots with the same names but this was not validated before.  To
-			// avoid a flag day and any potential issues, transparently rename
-			// the two clashing plugs by appending the "-plug" suffix.
-			for _, plugName := range []string{"network-bind", "core-support"} {
-				snapInfo.RenamePlug(plugName, plugName+"-plug")
-			}
-		}
 		snap.AddImplicitSlots(snapInfo)
 		if err := m.repo.AddSnap(snapInfo); err != nil {
 			logger.Noticef("%s", err)
diff --git a/overlord/snapstate/snapmgr.go b/overlord/snapstate/snapmgr.go
index ed54e01..ba70026 100644
--- a/overlord/snapstate/snapmgr.go
+++ b/overlord/snapstate/snapmgr.go
@@ -248,6 +248,7 @@ func readInfoAnyway(name string, si *snap.SideInfo) (*snap.Info, error) {
 		}
 		return info, nil
 	}
+	info.RenameClashingCorePlugs()
 	return info, err
 }
 
diff --git a/snap/broken.go b/snap/broken.go
index 52ebede..8f2b10a 100644
--- a/snap/broken.go
+++ b/snap/broken.go
@@ -62,3 +62,17 @@ func GuessAppsForBroken(info *Info) map[string]*AppInfo {
 
 	return out
 }
+
+// RenameClashingCorePlugs renames plugs that clash with slot names on core snap.
+//
+// Some released core snaps had explicitly defined plugs "network-bind" and
+// "core-support" that clashed with implicit slots with the same names but this
+// was not validated before.  To avoid a flag day and any potential issues,
+// transparently rename the two clashing plugs by appending the "-plug" suffix.
+func (info *Info) RenameClashingCorePlugs() {
+	if info.Name() == "core" {
+		for _, plugName := range []string{"network-bind", "core-support"} {
+			info.RenamePlug(plugName, plugName+"-plug")
+		}
+	}
+}

This is on top of my rename-clashing-core-plugs branch which lives in this PR: https://github.com/snapcore/snapd/pull/3154

Why do we need to call that explicitly rather than allowing the info itself to rename them when necessary, when loading the data in the first place? This will also need to be done at the place we restore repository connections from the saved state.

It sounds like a good idea as it paints a clear universe in the new snapd no matter which core is currently installed, without breaking the old one on reverts.

I think you are right but I wanted to clear up my understanding of which methods exactly ought to call this. I think that the distinction is that if the helper validates it should also try to correct data like this. Another factor (look at broken.go) is that this not the first method like that so if we want to agree to make all of them private and called implicitly I’d like to make this consistent.

Sounds better to focus on the problem at hand before touching unrelated code. We can then reorganize better in a follow up using lessons learned.

Ok, so this (explicit call) approach is in line with other broken bits. Is that what you expected or do you want to flip this and be implicit with this particular data that needs to be corrected.