Snapd using 100% CPU in x86 VM for 2 minutes after boot?

Hi all,

Testing the current subiquity-based server image in artful in an x86 VM, I see high load for upwards of 2 minutes after boot. Is this expected on first boot? I know that there have been concerns about snapd first-boot performance on armhf, but I don’t recall seeing this behavior with subiquity images in the zesty timeframe.

I’m not aware of any concerns about performance after boot on snapd.

@slangasek Do you have any more details?

Since this is an installer image, I don’t currently have much additional detail. What is the best way to debug what snapd is doing with this CPU time?

The slowness on Ubuntu Core ARM is caused by two things that I don’t think apply on classic installs:

  • generating a device GPG key and serial (details regarding classic)
  • copying kernel and bootloader in place onto the vfat partition

(For the device key generation bit, it looks like “generic-serials: true” is required; is that set in the gadget being used? It seems to fall back to the default key generation otherwise.)

@slangasek If you don’t want to disrupt it, strace might give a hint of which resources it’s looking at while it waits. It may be a bit noisy given Go’s runtime, but it’s easy to post-process. Then, if you send SIGQUIT, it should print the current stack of every goroutine, which should tell you exactly where it sits, but the process will stop.
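A sketch of both approaches (the flags are standard strace options; `pidof snapd` and the `snapd.service` unit name are assumptions about the target system):

```shell
# strace -c summarizes syscall counts and time spent; -f follows threads,
# which matters for a multi-threaded Go process like snapd. Shown here on a
# throwaway command; to inspect the live daemon, attach instead with
#   sudo strace -c -f -p "$(pidof snapd)"
# and interrupt it after a minute or so to get the summary table.
strace -c -f -o trace-summary.txt sleep 1
cat trace-summary.txt

# For the goroutine dump: SIGQUIT makes a Go process print the stack of
# every goroutine and then exit, so on the affected machine something like
#   sudo kill -QUIT "$(pidof snapd)"
# should leave the stacks in "journalctl -u snapd.service".
```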

@ogra Yes, key generation is a good thing to investigate, but it shouldn’t burn any CPU at all, even when it’s blocked waiting for entropy.

Remember that we switched to using ssh-keygen for device key generation a while ago; Go’s primality-test code was expensive, but now we should be as good (or bad) as ssh-keygen there.
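To illustrate the delegation (a sketch: the key type, size, and file name here are illustrative, not necessarily the exact parameters snapd uses):

```shell
# Non-interactive RSA key generation via ssh-keygen: -N "" sets an empty
# passphrase, -q suppresses the fingerprint output. The expensive primality
# testing happens inside ssh-keygen rather than in Go code.
ssh-keygen -t rsa -b 4096 -N "" -q -f ./device-key

# Two files result: the private key and its public half.
ls -l device-key device-key.pub
```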

If that’s the case, this shouldn’t show up as snapd using CPU time at all, correct, because ssh-keygen would be run as a separate process?

Well, that is why I said I don’t think these issues would apply on classic :slight_smile: unless key generation works any differently with the new code for lazy device registration…

One other thing I notice on ARM images is that the console is cleared and then “flashing black” for ~20 seconds or so before subiquity comes up, almost as if it starts in a loop or battles with getty over owning the console… That flashing is not noticeable in an (x86) KVM window, most likely because there is no actual physical screen to power down and up again.

Yes, we invoke it as a subprocess. Also, this code is the same everywhere, AFAIR/AFAICT.