Error creating key

I was investigating an issue with the snap create-key command: it gets stuck and the test times out. For example, this test failed: https://travis-ci.org/snapcore/snapd/builds/241031225

I could reproduce it sporadically and found in the logs that the kernel reports a segfault. I am still investigating why this segfault happens.

Here is all the data I collected from the machine.

The command to reproduce it: https://paste.ubuntu.com/24819863/ (I used just one worker)
syslog: https://paste.ubuntu.com/24820119/
dmesg: https://paste.ubuntu.com/24819869/
kernel.log: https://paste.ubuntu.com/24819881/
systemd status: https://paste.ubuntu.com/24819949/
test output: https://paste.ubuntu.com/24819856/
I also saved the snapd state: http://people.canonical.com/~sjcazzol/snappy/snapd-state-create-key.tar.gz

Any help is welcome.

Thanks

I manually tried to create a key and it got stuck. These are the commands running on the machine:
root 18725 18554 0 03:48 pts/1 00:00:00 snap create-key testcachio
root 18736 18725 0 03:48 pts/1 00:00:00 /usr/bin/gpg --homedir /root/.snap/gnupg -q --no-auto-check-trustdb --batch --gen-key

Then I set up rngd and called gpg, and I could create a key. After that, the snap create-key command started working again: https://paste.ubuntu.com/24838922/ . This seems to be a problem with how random data generation is set up in the tests.

Hi Sergio, thanks for looking into this! There are some fixes in place to mitigate the lack of entropy you mentioned (see Snap create-key timeouts), and we also use pollinate https://github.com/dustinkirkland/pollinate to seed the generator. With this in place the timeout is much less frequent, but we are still being hit by it.

We have also included some debug info to output the available entropy when the test fails. In the log you posted above, https://travis-ci.org/snapcore/snapd/builds/241031225#L3384 shows:

kernel.random.entropy_avail = 32

which is too low, despite all our efforts. Another interesting data point: after pointing the generator to /dev/urandom and seeding it, the timeout seems to happen only on ubuntu-16.04-32 (your log confirms this, but it could be worth trying to reproduce the problem on other systems to be extra sure). @pedronis suggested excluding this system from testing until we understand the root cause of the problem.
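For anyone who wants to check the entropy estimate by hand while reproducing this, here is a minimal sketch. The function name entropy_ok and the 256-bit default threshold are my own assumptions for illustration, not something taken from the snapd test suite; the file it reads is the same value that sysctl reports as kernel.random.entropy_avail.

```shell
#!/bin/sh
# Hypothetical helper (not part of snapd): succeeds when the kernel's
# entropy estimate is at least the given threshold.
#   $1: threshold in bits (256 is an assumed default, pick your own)
#   $2: file to read (defaults to the kernel's entropy estimate)
entropy_ok() {
    threshold="${1:-256}"
    source_file="${2:-/proc/sys/kernel/random/entropy_avail}"
    # Bail out with a distinct status if the file cannot be read.
    avail="$(cat "$source_file")" || return 2
    echo "entropy_avail=$avail (threshold $threshold)"
    [ "$avail" -ge "$threshold" ]
}
```

Calling `entropy_ok 256 || echo "low entropy"` right before snap create-key would show whether the pool is already starved at that point.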

do you have access to manipulate the kernel cmdline? if so, putting “rng_core.default_quality=700” on there should help a lot (it will force the in-kernel rng to properly push the entropy up)
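To verify whether a given parameter actually made it onto the kernel command line of the test machine, something like the following could be used. This is only a sketch: has_kernel_param is a hypothetical helper name, and it simply scans /proc/cmdline for the parameter, with or without a value.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel command line contains the
# given parameter, either bare ("ro") or with a value ("name=value").
#   $1: parameter name, e.g. rng_core.default_quality
#   $2: cmdline file (defaults to /proc/cmdline)
has_kernel_param() {
    # The command line is a whitespace-separated list of arguments.
    for arg in $(cat "${2:-/proc/cmdline}"); do
        case "$arg" in
            "$1"|"$1"=*) return 0 ;;
        esac
    done
    return 1
}
```

For example, `has_kernel_param rng_core.default_quality || echo "parameter not set"` after a reboot confirms whether the change took effect.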


Hi, we are already doing this before running the tests:

apt-get install -y -q rng-tools
echo "HRNGDEVICE=/dev/urandom" > /etc/default/rng-tools
/etc/init.d/rng-tools restart

@ogra, do you know if the segfault that appears in the logs could be preventing entropy from being generated correctly, and that this is why we are getting what @fgimenez pointed out?

 kernel.random.entropy_avail = 32 

I’ll see if I can add "rng_core.default_quality=700"; I am not sure if it is possible.

This is weird because the problem appears in just 5% of the executions.

yes, that would be a possible explanation (and also one of the reasons why we do not ship userspace rng tools in ubuntu core but force the above in-kernel number generator to actually have proper entropy instead)

Even more so considering that the segfault comes from rngd:

Jun 10 01:39:19 ubuntu kernel: [  911.509447] rngd[3001]: segfault at 805f000 ip 0804ac3e sp bfaaa4dc error 6 in rngd[8048000+5000]
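A quick way to check a saved kernel log for this kind of line is a grep over the file. The sketch below assumes the log has been saved to a file first (for example with `dmesg > dmesg.txt`); count_rngd_segfaults is a hypothetical name, and the pattern is based only on the log line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print how many lines in a saved kernel log
# report a segfault in rngd, matching lines such as
#   "... rngd[3001]: segfault at 805f000 ip 0804ac3e ..."
#   $1: log file to scan (e.g. a saved dmesg or kern.log)
count_rngd_segfaults() {
    # grep -c prints the number of matching lines (0 if none).
    grep -c 'rngd\[[0-9]*]: segfault' "$1"
}
```

A non-zero count in a failing run would support the idea that rngd crashed before the test got stuck.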

I have added a PR to address this issue.
https://github.com/snapcore/snapd/pull/3473

@cachio As mentioned there, I’m merging it as it’s a step forward, but can’t we do this by default in the project prepare for all cases? There’s no reason for us to want real entropy for anything generated in those tests, as no artifacts are used. We can only ever get blocked by the usual semantics.

This is a new and simple implementation which restarts rng-tools if it crashes.

https://github.com/snapcore/snapd/pull/3477
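To illustrate the general idea of restarting rngd after a crash, here is a rough sketch. It is not the code from the PR: ensure_rngd_running and the RNG_RESTART_CMD variable are hypothetical names, and the default restart command is just the init script already used in the setup earlier in this thread.

```shell
#!/bin/sh
# Hypothetical check (not the PR's implementation): restart rng-tools
# when no rngd process is running, e.g. after a segfault.
# RNG_RESTART_CMD is an assumed override hook; it defaults to the init
# script used in the test setup.
ensure_rngd_running() {
    # pgrep -x matches the exact process name.
    if pgrep -x rngd >/dev/null 2>&1; then
        return 0
    fi
    echo "rngd is not running, restarting rng-tools"
    ${RNG_RESTART_CMD:-/etc/init.d/rng-tools restart}
}
```

Calling `ensure_rngd_running` before each test that needs entropy would cover the case where rngd died earlier in the run.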