New etrace features

Hello folks,

In response to some social media threads going around, I took the time to use a new feature (full blog post incoming soon!) I have implemented in etrace to measure some popular snaps and see what their startup times are and what they would look like if those snaps switched to using LZO compression instead of XZ via a single line change in their snapcraft.yaml: How to switch your snap to use LZO compression.

The new feature basically just combines a bunch of manual steps I would do before to analyze snaps into a single command analyze-snap to see what the performance change looks like with both hot and cold cases measured, as well as the predicted change in size of the snap (since switching to LZO makes your snap larger). See:

$ etrace analyze-snap spotify
original snap size: 179.57 MiB
original compression format is xz
content snap slot dependencies: []
worst case performance:
    average time to display: 15.999082347s
    standard deviation for time to display: 690.576194ms
best case performance:
    average time to display: 1.007677322s
    standard deviation for time to display: 145.771155ms
worst case performance with lzo compression:
    average time to display: 6.040844667s
    standard deviation for time to display: 145.185634ms
    average time to display percent change: -62.24%
best case performance with lzo compression:
    average time to display: 908.255344ms
    standard deviation for time to display: 205.314223ms
    average time to display percent change: -9.87%
lzo snap size: 254.51 MiB (change of +41.74%)

And for 1password:

$ etrace analyze-snap 1password
original snap size: 82.45 MiB
original compression format is xz
content snap slot dependencies: [gnome-3-28-1804 gtk-common-themes]
worst case performance:
    average time to display: 13.165714693s
    standard deviation for time to display: 620.851787ms
best case performance:
    average time to display: 1.032303485s
    standard deviation for time to display: 1.150002ms
worst case performance with lzo compression:
    average time to display: 4.067467633s
    standard deviation for time to display: 4.475865ms
    average time to display percent change: -69.11%
best case performance with lzo compression:
    average time to display: 1.032786947s
    standard deviation for time to display: 9.021305ms
    average time to display percent change: +0.05%
lzo snap size: 105.62 MiB (change of +28.10%)

The other thing that I implemented is some quick options for doing a single run of the worst (cold cache hit) or best (hot cache hit) performance for an application with the --hot and --cold options.

Get the best case performance time for a snap (with 2 runs, the first is to ensure that everything is ready, since the first run might actually be a cold run ordinarily):

$ etrace exec --no-trace --hot --silent 1password > /dev/null; etrace exec --no-trace --hot --silent 1password
Total startup time: 1.029102842

Get the worst case performance time for a snap:

$ etrace exec --no-trace --cold --silent 1password
Total startup time: 13.180403375

Hope this helps.
Thanks,
Ian

5 Likes

Hi! This performance improvement looks great! :slight_smile: I’m also a bit surprised to see that xv decompression is so slow: I would expect decompression to be quite fast in general, regardless of the algorithm, but I’m obviously not an expert here.

Anyway, what I wanted to ask is how does etrace simulate a cold start: the kernel filesystem layer should be caching the disk reads, so is etrace able to invalidate them too somehow?

Thanks for working on this. I tried it out here, with mixed results. I’m using a locally-built snap of etrace from the tip of your git repo.

The 1password run looks okay?

alan@robot:~/tmp$ etrace analyze-snap 1password
[sudo] password for alan: 
original snap size: 82.45 MiB
original compression format is xz
content snap slot dependencies: [gnome-3-28-1804 gtk-common-themes]
worst case performance:
        average time to display: 11.71461643s
        standard deviation for time to display: 521.572248ms
best case performance:
        average time to display: 1.100141468s
        standard deviation for time to display: 33.845835ms
worst case performance with lzo compression:
        average time to display: 5.362162159s
        standard deviation for time to display: 345.313641ms
        average time to display percent change: -54.23%
best case performance with lzo compression:
        average time to display: 1.080339577s
        standard deviation for time to display: 27.562815ms
        average time to display percent change: -1.80%
lzo snap size: 105.62 MiB (change of +28.10%)

However the spotify run never seems to complete, and I end up with a pop up “Pending update of ‘spotify’ snap”. But I removed the snap before starting the test.

alan@robot:~/tmp$ etrace analyze-snap spotify
original snap size: 179.57 MiB
original compression format is xz
content snap slot dependencies: []
worst case performance:
        average time to display: 14.040214447s
        standard deviation for time to display: 426.452495ms
best case performance:
        average time to display: 535.509384ms
        standard deviation for time to display: 7.502187ms
exit status 1

Just before the exit, a window appears and disappears repeatedly (maybe 20 times) end then the error occurs.

alan@robot:~/tmp$ snap version
snap    2.51.4
snapd   2.51.4
series  16
ubuntu  21.04
kernel  5.11.0-25-generic

alan@robot:~/tmp$ snap refresh --time
timer: sun,12:00
last: today at 12:04 BST
next: in 5 days, at 12:00 BST

VLC was unhappy too.

alan@robot:~/tmp$ etrace analyze-snap vlc
original snap size: 295.71 MiB
original compression format is xz
content snap slot dependencies: []
json: cannot unmarshal object into Go struct field Execution.Runs.Errors of type error

Hi, thanks for testing! This sounds like refresh app awareness is getting in the way of etrace trying to remove and reinstall the snap or something like that, can you show the output of snap get -d system experimental?

Right so 20 times means that it is able to analyze the XZ version of the snap and then something breaks with the attempt for the LZO version of snap, since overall there should be 40 tests/windows appearing (10 each for hot/cold and for XZ/LZO).

Can you try just doing single runs with something like etrace exec --no-trace --cold vlc ? That works for me on Hirsute, etc. Also are you using stock gnome with Ubuntu or a different window manager? Not that it would matter as long as it is X11 I think it should work the same with xdotool nonetheless (though I was just this morning informed that project is not maintained so maybe there is indeed some bug there).

If you’re interested in more data, I did a large analysis over here: Squashfs performance effect on snap startup time

Specifically, what it does is this:

  1. clear the VM caches in the kernel via this code https://github.com/canonical/etrace/blob/master/internal/profiling/profiling.go#L33-L50
  2. uninstall the snap (which unmounts the squashfs and should in effect free the loopback device)
  3. reinstall the snap (which should allocate a new loopback device)
  4. delete all snap user data like profiles (and restore it at the end of the run too so that user’s data isn’t permanently lost)
  5. discard the snap mount namespace (though again this should be done already by removing the snap)
  6. start the snap (either with tracing or without, without tracing is the closest to a real run a user would do)
  7. wait for the snap to display a window and capture the time it took
  8. immediately kill the snap process and display the output
1 Like

Yes, looks like you’re right

alan@robot:~/tmp$ snap get -d system experimental
{
        "experimental": {
                "refresh-app-awareness": true
        }
}
alan@robot:~/tmp$ etrace exec --no-trace --cold vlc 
[sudo] password for alan: 
cannot use --discard-snap-ns without --use-snap-run or a command that resolves to /snap/bin/<cmd>

I am using Kubuntu 21.04 on X11 with two displays.

What is the output of which -a vlc on your machine ?

alan@robot:~/tmp$  which -a vlc
/usr/bin/vlc
/bin/vlc
/snap/bin/vlc

Oh, sorry about that. Didn’t realise I had the deb installed.

No worries, I should adjust things so it does the right thing automatically though. Do you also have the deb of spotify installed too?

No, I did at one point and thought that might be an issue, maybe I didn’t restart the terminal after removing it, so it was still in the path. I’ll start again with no debs.

alan@robot:~$ which -a vlc
/snap/bin/vlc
alan@robot:~$ etrace exec --no-trace --cold vlc
/usr/share/libdrm/amdgpu.ids: No such file or directory
VLC media player 3.0.16 Vetinari (revision 3.0.16-0-g5e70837d8d)
[0000556933c4ea00] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.
/usr/share/libdrm/amdgpu.ids: No such file or directory
[0000556933c443c0] main playlist: playlist is empty
The X11 connection broke (error 1). Did the X11 server die?
Total startup time: 10.961770418

Okay, can you try running analyze-snap again?

1 Like

Sadly:

alan@robot:~$ etrace analyze-snap vlc
original snap size: 295.71 MiB
original compression format is xz
content snap slot dependencies: []
json: cannot unmarshal object into Go struct field Execution.Runs.Errors of type error

I just merged a commit which adds a bit more error information, can you try building from tip and installing that version locally? It will probably be an hour or two before shows up in the edge channel automatically

1 Like
alan@robot:~$ etrace analyze-snap vlc
[sudo] password for alan: 
original snap size: 295.71 MiB
original compression format is xz
content snap slot dependencies: []
error getting results from sub-etrace process: json: cannot unmarshal object into Go struct field Execution.Runs.Errors of type error (full output is {"Runs":[{"TimeToDisplay":9937637911,"TimeToRun":9937637911},{"TimeToDisplay":10539745095,"TimeToRun":10539745095,"Errors":[{}]},{"TimeToDisplay":10421553924,"TimeToRun":10421553924,"Errors":[{}]},{"TimeToDisplay":10961942630,"TimeToRun":10961942630,"Errors":[{}]},{"TimeToDisplay":10462786177,"TimeToRun":10462786177,"Errors":[{}]},{"TimeToDisplay":10416046834,"TimeToRun":10416046834,"Errors":[{}]},{"TimeToDisplay":10361480617,"TimeToRun":10361480617,"Errors":[{}]},{"TimeToDisplay":9984684871,"TimeToRun":9984684871},{"TimeToDisplay":9897799626,"TimeToRun":9897799626,"Errors":[{}]},{"TimeToDisplay":10981668741,"TimeToRun":10981668741,"Errors":[{}]}]}
)

etrace 0.9-4-gd2fab49

Thanks for that, I have what I think is a fix up at https://github.com/canonical/etrace/pull/61.

If you could test that PR and let me know if it works for you that would be great, the .snap file is built as part of the GitHub Actions workflow so you can just download it and install it locally to test it.

1 Like

Here’s the result of using that PR.

alan@robot:~$ etrace analyze-snap vlc
[sudo] password for alan: 
original snap size: 295.71 MiB
original compression format is xz
content snap slot dependencies: []
worst case performance:
        average time to display: 10.107405696s
        standard deviation for time to display: 409.305276ms
best case performance:
        average time to display: 532.770106ms
        standard deviation for time to display: 10.149051ms
worst case performance with lzo compression:
        average time to display: 6.339203836s
        standard deviation for time to display: 372.713941ms
        average time to display percent change: -37.28%
best case performance with lzo compression:
        average time to display: 542.555362ms
        standard deviation for time to display: 25.792852ms
        average time to display percent change: +1.84%
lzo snap size: 423.90 MiB (change of +43.35%)

1 Like

I am trying to use etrace, however on my computer it always exists with status 1

om26er@HomePC:~$ etrace analyze-snap slack
original snap size: 126.55 MiB
original compression format is xz
content snap slot dependencies: []
exit status 1

Hi, thanks for trying it out, can you clarify if running etrace --no-trace exec --cold slack works for you? If it doesn’t can you open up an issue on github.com/canonical/etrace so we can continue discussion over there? Thanks

or actually even if it does work, please open a new issue over there since analyze-snap is still not working for you regardless of if exec works