upstream-relation: The Canonical public cloud team is upstream for this snap
interfaces:
system-files:
request-type: installation & auto-connection
reasoning: For ec2-instance-connect to run, there must be an active user session. The suggested workaround is to enable lingering (enable-linger) for the user. To support this, the gadget needs permission to write to /var/lib/systemd/linger.
If there’s an alternative solution (or a way I can reasonably test this without store permissions), I’d appreciate the help. While I’m confident enable-linger resolves the issue, I’m not sure whether the gadget writes the file early enough on first boot to take effect, and I can’t find a way to test that without the above auto-connect permissions.
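For anyone wanting to check on a test instance whether lingering actually took effect, here is a minimal Python sketch. It assumes systemd-logind records lingering as a marker file named after the user under /var/lib/systemd/linger (the directory named in the request). The function name and the `linger_dir` parameter are illustrative and not part of the snap or the gadget.

```python
from pathlib import Path

# Directory named in the request; one marker file per lingering user is expected here.
DEFAULT_LINGER_DIR = Path("/var/lib/systemd/linger")


def linger_enabled(user: str, linger_dir: Path = DEFAULT_LINGER_DIR) -> bool:
    """Return True if a linger marker file exists for `user`."""
    return (linger_dir / user).is_file()


if __name__ == "__main__":
    print("ubuntu lingering:", linger_enabled("ubuntu"))
```

Passing a different `linger_dir` makes it possible to exercise the check against a scratch directory without touching the real system path.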
Linger means that the user session (of which user?) is perpetually on. Can you clarify how this would work? Do you mean to run this for the root user or for some other user?
I personally think this is somewhat odd and that the agent should create a non-user service that just runs.
This would be for the ubuntu user. This is to enable the ec2-instance-connect feature on AWS. To accomplish this, ec2-instance-connect sets the AuthorizedKeysCommand to a script (eic-run-authorized-keys) in the snap and sets the ubuntu user as the AuthorizedKeysCommandUser. We could consider creating a dedicated eic user, but that user would still need an active login session, so this request would still be required.
Sure, the high-level purpose of ec2-instance-connect (eic) is to allow AWS to dynamically inject keys into a running instance. This allows for things like role-based access without passing around specific keys.
On install, a conf file is created that modifies sshd with:
ExecStart=/usr/sbin/sshd -D -o "AuthorizedKeysCommand /snap/bin/ec2-instance-connect.eic-run-authorized-keys %%u %%f" -o "AuthorizedKeysCommandUser ubuntu" $SSHD_OPTS
When a user tries to log in using an eic client, assuming their role/user/etc. is authorized to connect to the instance, a key is generated and stored in the instance metadata store. When the session begins, sshd runs the eic-run-authorized-keys script to find authorized keys. That script runs a separate script, which curls the dynamically generated key from the metadata store and pushes it to ~/<user>/.ssh/authorized_keys. Less relevant to this discussion, but for completeness: 60s later the key is removed from authorized_keys to prevent repeated access without rechecking the role.
To your specific questions:
Why the agent needs to run as a user service?
There’s no long-running service as such, but the eic-run-authorized-keys script from the snap is run any time an ssh connection is made, and a snap cannot run without an active login session.
What happens when someone spawns an instance and provides initialization data to create a user account that is not called ubuntu?
Great question, and one I hadn’t considered. We may need to consider having the snap-ec2-instance-connect package create its own user just for this purpose. I think it ultimately doesn’t change the permissions I need here, though. I’d still need to be able to enable-linger for that user.
In the end, is authorization and connection handled by ssh or by a custom protocol/port?
SSH. Ultimately the only thing that’s different is the command sshd is using to check authorized_keys.
Classically, without linger enabled, we’d have a situation where someone connects, sshd runs a program (here that program just happens to be from the snap), and then connection continues.
What I’m missing now is:
This script will clearly run with the same owner and permissions as sshd (unless sshd arranges otherwise). How does enabling linger change its ability to work?
The script (well, the snap program) will run inside the snap sandbox that limits the ability to access files from user home directories.
At some time, after authentication presumably, PAM will set up a logind session, start systemd as the user and establish all user services and other things that constitute “the session”.
So let me ask one specific thing, what happens today, when there’s no linger enabled? Where does it break?
This script will clearly run with the same owner and permissions as sshd (unless sshd arranges otherwise). How does enabling linger change its ability to work?
It’s possible I misunderstand, but -o AuthorizedKeysCommandUser <user> specifies which user the AuthorizedKeysCommand runs as. That user does not have to be root, or the same user that is attempting to log in.
The script (well, the snap program) will run inside the snap sandbox that limits the ability to access files from user home directories.
I had to reread the source, but you’re right: it doesn’t actually write the key to authorized_keys. It returns the keys from the metadata service directly to sshd, so file permissions or the relation to the user logging in shouldn’t matter.
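To make that flow concrete, here is a rough Python sketch of the AuthorizedKeysCommand contract as I understand it: sshd invokes the command with the configured arguments (%u, the login user, and %f, the key fingerprint) and reads authorized_keys-format lines from its standard output, so nothing is written to disk. `fetch_keys` is a hypothetical stand-in for the metadata lookup, and none of these names come from the real eic scripts.

```python
import sys
from typing import Callable, Iterable


def fetch_keys(user: str) -> Iterable[str]:
    """Hypothetical placeholder for the metadata-service lookup; returns no keys."""
    return []


def authorized_keys_output(user: str, fingerprint: str,
                           fetch: Callable[[str], Iterable[str]] = fetch_keys) -> str:
    """Build the text sshd would read from stdout: one public key per line."""
    # The fingerprint mirrors the %f argument; this sketch does not filter on it.
    keys = [k.strip() for k in fetch(user) if k.strip()]
    return "".join(k + "\n" for k in keys)


if __name__ == "__main__":
    # Expected invocation (assumed): <script> <user> <fingerprint>
    if len(sys.argv) == 3:
        sys.stdout.write(authorized_keys_output(sys.argv[1], sys.argv[2]))
```

Because the keys go straight to stdout, the sketch never touches the login user's home directory, which matches the point about file permissions not mattering.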
At some time, after authentication presumably, PAM will set up a logind session, start systemd as the user and establish all user services and other things that constitute “the session”.
Yes, after authentication there is an active login session for the user.
So let me ask one specific thing, what happens today, when there’s no linger enabled? Where does it break?
If linger is not enabled (and there is no active login session for the AuthorizedKeysCommandUser user), then you’ll see something like the following in the logs:
AuthorizedKeysCommand /snap/bin/ec2-instance-connect.eic-run-authorized-keys <attempted_login_user> SHA256:<shasum>
Connection closed by authenticating user <attempted_login_user> 18.206.107.28 port 45866 [preauth]
For a more local example, running a core24 snap on noble as a user without a login session produces the following error:
/user.slice/user-1000.slice/session-3.scope is not a snap cgroup for tag snap.awspub.awspub
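The error reads as though the process's cgroup is being checked against the snap's security tag. As a deliberately simplified illustration (this is not snapd's actual logic, and the matching rule is my assumption), the idea can be sketched in Python as testing whether the last component of the cgroup path starts with the tag:

```python
def looks_like_snap_cgroup(cgroup_path: str, security_tag: str) -> bool:
    """Simplified guess: the leaf cgroup name must begin with the security tag."""
    leaf = cgroup_path.rstrip("/").rsplit("/", 1)[-1]
    # Accept "<tag>." or "<tag>-" as the prefix; both separators are assumptions.
    return leaf.startswith(security_tag + ".") or leaf.startswith(security_tag + "-")


if __name__ == "__main__":
    path = "/user.slice/user-1000.slice/session-3.scope"
    print(looks_like_snap_cgroup(path, "snap.awspub.awspub"))
```

Under this rule, the session-3.scope path from the error does not match the snap.awspub.awspub tag, which is consistent with the message above.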