Duplicate plug/slot names inside the core snap

ogra · April 6, 2017, 3:34pm

so just change the names then and hide them in the next round … if that fixes it i’m all for it

niemeyer · April 6, 2017, 4:12pm

The first point and the respective PR #2932 looks unrelated to this issue. That’s about validating the names, and all names being discussed are valid from a syntax point of view. The real problem is that they are duplicated, and that’s completely wrong and shouldn’t be allowed by snapd at all during installation. If that wasn’t yet fixed, can we please make sure it is fixed in time for 2.24 (cc @mvo)?

As for the plug names, it sounds more friendly and obvious to have them as network-bind-plug, core-support-plug, and so on. That gives a hint that the reason we’re renaming is because there’s a slot in the same snap, rather than it being internal or obscure in any other way.

zyga-snapd · April 6, 2017, 4:13pm

Indeed, I assumed that the validation would detect the duplication. We may need a deeper look.

zyga-snapd · April 7, 2017, 7:40am

This is now addressed by https://github.com/snapcore/snapd/pull/3153 and @mvo is working on new core snap without the clashing plugs.

zyga-snapd · April 7, 2017, 9:50am

I also made a branch that automatically renames those two plugs and associated connection when starting snapd: https://github.com/snapcore/snapd/pull/3154

zyga-snapd · April 7, 2017, 10:38am

During discussion on IRC @pedronis pointed out that we have a chicken/egg problem. We need to load an invalid snap.info (for the core snap) in order to fix it. New validation added in 3153 will make us bail out and abort early so it’s clearly not a good idea. We also want to be safe in case new snapd is asked to roll-back to old core snap.

My naive idea is to move the “cure” logic from interface manager to the snap manager and to allow the snap manager to load the snap.info on core in a way that lets us bypass validation (or do the fix before we validate). This way we can safely do the rest (where rest is the set of fixes done in the interface manager)

pedronis · April 7, 2017, 11:23am

@zyga-snapd thinking though, the new core will have the correct unique names, so maybe the question is more what happens to the connections? It’s a problem with a revert though

zyga-snapd · April 7, 2017, 11:46am

On revert two things may happen (not sure):

one of the proposed branches migrates connections in place, we can make that not save the state and simply apply the hack when loading connections in the interface manager (the state stays on disk stays as-is). Today I made it save the data so reverted snapd would not see those connections.
unless… not sure here… when we revert we also auto-connect, then it should be OK

pedronis · April 7, 2017, 12:01pm

@zyga-snapd the revert will try from the new snapd to AddSnap the old snap no? (first setup-profiles)

zyga-snapd · April 7, 2017, 12:04pm

Indeed. In that case the old core snap will be “fixed” (plugs get renamed) and we re-start to that snapd. The restarted snapd (the old one) will just work as before. This seems to imply that we want to avoid touching the state persistently now.

pedronis · April 7, 2017, 12:05pm

@zyga-snapd with the current code in the PRs that AddSnap will fail though, no?

zyga-snapd · April 7, 2017, 12:07pm

Yes, if we add the validation we need to do something more. I think that we need to move the transparent rename of the plugs (not connections just plugs) to the snap manager, to the code that loads snaps from disk. This way when the new snapd will be attempting to revert it will not block this operation.

zyga-snapd · April 7, 2017, 12:33pm

@pedronis I’m currently considering something along those lines:

diff --git a/overlord/ifacestate/helpers.go b/overlord/ifacestate/helpers.go
index 511e17e..a9b4222 100644
--- a/overlord/ifacestate/helpers.go
+++ b/overlord/ifacestate/helpers.go
@@ -93,16 +93,6 @@ func (m *InterfaceManager) addSnaps() error {
 		return err
 	}
 	for _, snapInfo := range snaps {
-		if snapInfo.Name() == "core" {
-			// Some released core snaps had explicitly defined plugs
-			// "network-bind" and "core-support" that clashed with implicit
-			// slots with the same names but this was not validated before.  To
-			// avoid a flag day and any potential issues, transparently rename
-			// the two clashing plugs by appending the "-plug" suffix.
-			for _, plugName := range []string{"network-bind", "core-support"} {
-				snapInfo.RenamePlug(plugName, plugName+"-plug")
-			}
-		}
 		snap.AddImplicitSlots(snapInfo)
 		if err := m.repo.AddSnap(snapInfo); err != nil {
 			logger.Noticef("%s", err)
diff --git a/overlord/snapstate/snapmgr.go b/overlord/snapstate/snapmgr.go
index ed54e01..ba70026 100644
--- a/overlord/snapstate/snapmgr.go
+++ b/overlord/snapstate/snapmgr.go
@@ -248,6 +248,7 @@ func readInfoAnyway(name string, si *snap.SideInfo) (*snap.Info, error) {
 		}
 		return info, nil
 	}
+	info.RenameClashingCorePlugs()
 	return info, err
 }
 
diff --git a/snap/broken.go b/snap/broken.go
index 52ebede..8f2b10a 100644
--- a/snap/broken.go
+++ b/snap/broken.go
@@ -62,3 +62,17 @@ func GuessAppsForBroken(info *Info) map[string]*AppInfo {
 
 	return out
 }
+
+// RenameClashingCorePlugs renames plugs that clash with slot names on core snap.
+//
+// Some released core snaps had explicitly defined plugs "network-bind" and
+// "core-support" that clashed with implicit slots with the same names but this
+// was not validated before.  To avoid a flag day and any potential issues,
+// transparently rename the two clashing plugs by appending the "-plug" suffix.
+func (info *Info) RenameClashingCorePlugs() {
+	if info.Name() == "core" {
+		for _, plugName := range []string{"network-bind", "core-support"} {
+			info.RenamePlug(plugName, plugName+"-plug")
+		}
+	}
+}

This is on top of my rename-clashing-core-plugs branch which lives in this PR: https://github.com/snapcore/snapd/pull/3154

niemeyer · April 7, 2017, 4:41pm

Why do we need to call that explicitly rather than allowing the info itself to rename them when necessary, when loading the data in the first place? This will also need to be done at the place we restore repository connections from the saved state.

It sounds like a good idea as it paints a clear universe in the new snapd no matter which core is currently installed, without breaking the old one on reverts.

zyga-snapd · April 7, 2017, 4:44pm

I think you are right but I wanted to clear up my understanding of which methods exactly ought to call this. I think that the distinction is that if the helper validates it should also try to correct data like this. Another factor (look at broken.go) is that this not the first method like that so if we want to agree to make all of them private and called implicitly I’d like to make this consistent.

niemeyer · April 7, 2017, 4:45pm

Sounds better to focus on the problem at hand before touching unrelated code. We can then reorganize better in a follow up using lessons learned.

zyga-snapd · April 7, 2017, 4:47pm

Ok, so this (explicit call) approach is in line with other broken bits. Is that what you expected or do you want to flip this and be implicit with this particular data that needs to be corrected.

niemeyer · April 7, 2017, 4:51pm

The latter. Asking every call site to fix the data is extremely error prone. One call site we miss today or tomorrow and we have a lurking bug waiting to crash someone’s system. Putting that into the reading pipeline means we know for a fact that corrupted data won’t go through, and in that same place we should take the chance and reject attempts to load snaps that have such corrupted state which is not the one case we need to dynamically fix.

zyga-snapd · April 10, 2017, 11:37am

We’re collecting the required PRs in the top post (it is now a wiki).

zyga-snapd · April 10, 2017, 7:58pm

We have one more data point that seems to explains the last mystery. The auto-connect logic is symmetric and we have a very simplistic approach at that. If a given slot may auto-connect to more than one plug (this applies to network-bind we no longer connect as we don’t know it should be connected in all the places. We should re-consider auto-connection logic for slots for 2.25.

The test that mvo added in PR 3164 fails because there are other consumers that could use the network-bind slot and we already knew there are many providers (because two cores exist briefly):

I added extra logging when len(candidates) is not zero and not one:

INFO cannot auto connect {core network-bind-plug} (plug auto-connection), candidates found: “core:network-bind,ubuntu-core:network-bind”
INFO cannot auto connect {core network-bind} (slot auto-connection), candidates found: “core:network-bind-plug,test-snapd-python-webserver:network-bind”