Snap interfaces vs. technologies: alsa, pulseaudio, playback, recording

In my ongoing work to design GUIs for snap interfaces, I’ve become a bit puzzled about how they’re delimited.

Currently, there’s an alsa interface: “Can directly access ALSA devices in /dev/snd.” Presumably this allows both recording and playback.

Separately, there’s a pulseaudio interface: “Can access the PulseAudio sound server which allows for sound playback in games and media application. Recording not supported but will be in a future release.”

This seems to be rotated 90 degrees from what I’d expect as a user. I don’t know of any reason that I would permit snaps to use Alsa but not Pulseaudio, or vice versa. Many users won’t even know those names. (Currently I have the same label for both: “play or record sound”.) What is the motivation for making them separate?

I do care, though, whether a snap is recording sound or just playing it. For example, a parent might let their young child use software that plays audio, but block any snap from recording. So it would be useful if those had separate permissions (as they do in iOS, Android, and Web browsers), but they don’t. What’s the reasoning here?

There’s a similar issue with the network interfaces. The network-manager interface “gives privileged access to configure and observe networking”. Well, which? I might be quite happy with a snap reading network config, but not with changing it. If that isn’t a useful distinction to make, then why are network-setup-observe and network-setup-control separate interfaces? And given that both of those exist, why does network-manager need to be a separate interface at all? Instead of six networking config interfaces — network-control, network-manager, network-observe, network-setup-control, network-setup-observe, and network-status — would anything be lost if there were just two, one for configuring and one for observing?


but not as a developer :wink: on a core install you might have an app that completely owns the audio device, so you won't use a multiplexer (pulseaudio) … in such cases the alsa interface makes sense …

on a typical desktop you likely always want pulse (and in turn have pulse to use alsa as its backend)

we should probably hide the alsa interface on classic installations …

Ah, so the issue is that a device might have software providing the alsa slot but not the pulseaudio slot? So if you try to install software with a pulseaudio plug, on that device, you should be prevented from doing so.

That would explain why so many of the interfaces are technology-specific. It wouldn’t address the other issue of separating reading vs. writing.

no, the alsa interface allows you direct access to the sound hardware … you would use it if you definitely have only one app ever using audio on the system … (e.g. an ubuntu core media player kiosk that plays a film in a loop)

pulse is required if you want to use more than one audio stream. it typically lives on top of alsa and provides a more abstract interface (cheap (alsa) soundcards often do not have the ability to multiplex sound output, so the second app would be blocked; pulse helps with this by multiplexing the audio streams into one towards the backend (imagine a funnel))
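To make the "funnel" concrete, here is a toy model in Python. It is not ALSA or PulseAudio code: `HardwareDevice` and `SoundServer` are hypothetical names, and real mixing involves resampling, formats, and buffering that this ignores. It only shows the two behaviours described above: a card that serves one opener at a time blocks the second app, while a sound server owns the card once and sums every client stream into one output.

```python
from typing import Dict, List, Optional


class HardwareDevice:
    """Hypothetical sound card that, like many cheap chipsets, serves one opener at a time."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def open(self, app: str) -> bool:
        # Refuse a second app while another app holds the device.
        if self.owner is not None and self.owner != app:
            return False
        self.owner = app
        return True


class SoundServer:
    """Hypothetical stand-in for a mixer such as PulseAudio: many clients, one device owner."""

    def __init__(self, device: HardwareDevice) -> None:
        # The server itself is the only direct user of the hardware.
        if not device.open("sound-server"):
            raise RuntimeError("device is already owned by another app")
        self.streams: Dict[str, List[int]] = {}

    def play(self, app: str, samples: List[int]) -> None:
        # Any number of clients can submit a stream; none of them touches the device.
        self.streams[app] = samples

    def mix(self) -> List[int]:
        # Sum the streams sample by sample and clip to the signed 16-bit range,
        # funnelling them into the single stream the device receives.
        length = max((len(s) for s in self.streams.values()), default=0)
        mixed = []
        for i in range(length):
            total = sum(s[i] for s in self.streams.values() if i < len(s))
            mixed.append(max(-32768, min(32767, total)))
        return mixed


if __name__ == "__main__":
    card = HardwareDevice()
    print(card.open("kiosk-player"))  # True: the first app owns the card
    print(card.open("music-app"))     # False: the second app is blocked

    server = SoundServer(HardwareDevice())
    server.play("kiosk-player", [1000, 30000])
    server.play("music-app", [2000, 10000, 5])
    print(server.mix())  # [3000, 32767, 5]
```

The kiosk case is the first half of the demo: with only one app ever playing, owning the card directly costs nothing. Once a second app appears, only the server design keeps both audible.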

typically you would not actually use the alsa interface at all on a system where pulse is installed and the interface is available, but simply use the pulse interface so as not to (potentially) block the audio device.

This is actually wrong. Today there is unfortunately nothing that prevents a snap that plugs pulseaudio from recording via pulseaudio. Ubuntu Touch had a patched pulseaudio that used trust-store which would perform contextual prompting to enforce the distinction, but this work was not upstreamed and is therefore not available with snaps on classic distro. The pulseaudio snap for Ubuntu Core could be patched for this, but I don’t think it is since there is no mechanism to prompt in the cli environment (@morphis could correct me here). There is a portal in the works for pulseaudio, but it isn’t upstream: https://github.com/flatpak/xdg-desktop-portal/issues/27.

As for ALSA, please see @ogra’s comment. pulseaudio is a sound server that snaps should use, since accessing ALSA devices directly will cause multiplexing issues for a lot of chipsets. Some specialized snaps might want direct access to ALSA devices (e.g., jackd), so it exists for them. In terms of recording and playback, there is no difference between the two: both allow both.

To be clear, I did not say “I don’t know of any reason that a developer would use Alsa rather than Pulseaudio”. I did know roughly as much as @ogra described, through years of being around Ubuntu engineers. :slightly_smiling_face:

What I said was “as a user […] I don’t know of any reason that I would permit snaps to use Alsa but not Pulseaudio”. If my device has one installed but not the other, that would be a good reason. But if that’s not the reason for them to be separate interfaces, what is?

you might want to use your system for a standalone app that simply doesn't require pulse … or use a different multiplexer like jackd, as jamie mentioned already

All of this comes down to the fact that there are lots of ways to configure and monitor networking.

network-manager is an interface for allowing access to the NetworkManager DBus service. This service is not implemented with application isolation in mind, so there is no way to tease out ‘observe’ from ‘control’; therefore, plugging this interface provides access to everything the NetworkManager DBus API supports. While it would be possible to add the NetworkManager DBus access to network-control (since both are conceptually analogous), it is not possible to add any NetworkManager APIs to network-observe.

However, the real reason they are separated is that network-control and network-observe are implicit interfaces provided by the core snap for low-level access to networking, whereas network-manager is a slot-provided interface for high-level DBus access to NetworkManager itself. This is complicated by the fact that on snaps-only systems NetworkManager doesn’t exist in core, so the network-manager snap must be installed, but on classic distros NetworkManager typically does exist.

Put another way, the network-manager interface must be separate on Ubuntu Core because snaps that ‘plugs: [network-manager]’ need to have the network-manager snap (which ‘slots: [network-manager]’); this is the way that the slot becomes available for a snap to plug into. If they simply ‘plugs: [network-control]’ to obtain access to NetworkManager, there would be no way for snapd to let the snap or the user know that the network-manager snap needed to be installed for the snap to work.
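The plug/slot argument above can be sketched as a small Python model. This is not snapd's implementation: `System`, `IMPLICIT_CORE_SLOTS`, and `unsatisfied_plugs` are hypothetical names, and the implicit slot set contains only the two interfaces the explanation mentions. It assumes an Ubuntu Core device where the only way to get a network-manager slot is to install a snap that provides it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Illustrative only: the two implicit slots this thread says the core snap provides.
IMPLICIT_CORE_SLOTS: Set[str] = {"network-control", "network-observe"}


@dataclass
class System:
    # Maps each installed snap that provides slots to the slot names it offers.
    slot_providers: Dict[str, Set[str]] = field(default_factory=dict)

    def available_slots(self) -> Set[str]:
        slots = set(IMPLICIT_CORE_SLOTS)
        for provided in self.slot_providers.values():
            slots |= provided
        return slots


def unsatisfied_plugs(plugs: List[str], system: System) -> List[str]:
    """Return the plugs that neither core nor any installed snap can serve."""
    available = system.available_slots()
    return [plug for plug in plugs if plug not in available]


if __name__ == "__main__":
    core_device = System()  # an Ubuntu Core device without the network-manager snap
    wanted = ["network-manager", "network-observe"]
    print(unsatisfied_plugs(wanted, core_device))  # ['network-manager']

    # Installing the network-manager snap is what makes its slot available.
    core_device.slot_providers["network-manager"] = {"network-manager"}
    print(unsatisfied_plugs(wanted, core_device))  # []
```

The model also shows what would be lost by folding NetworkManager access into network-control: `unsatisfied_plugs(["network-control"], System())` returns an empty list even on a device with no NetworkManager at all, so nothing would tell the user that a missing snap is the problem.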

network-setup-observe and network-setup-control are different in that they are about reading or writing netplan configuration. One could argue that these could be moved to network-observe and network-control (indeed, this came up in the PR), but it was decided that writing a netplan configuration was so much less access than what network-control gives that it was useful to separate it out for a snap that only manages netplan (which wouldn’t otherwise need/want all the low level accesses that network-control gives). Since network-setup-control was made separate, it seemed more consistent to have the corresponding network-setup-observe rather than having the read access in network-observe (to avoid “why do I need network-setup-control to write netplan configuration but network-observe to read it?”).

Thanks, but I’m not sure of the difference between that and what you said “no” to above. So let me rephrase the use case:

  • user wants their system devoted to a non-PulseAudio app, therefore →
  • user wants to make sure PulseAudio never interferes with that app →
  • user doesn’t even have PulseAudio installed on the device (e.g. uses Lubuntu) →
  • snap-confined audio playback can’t depend on both Alsa and PulseAudio →
  • there can’t be a single audio-playback interface that depends on both Alsa and PulseAudio.

Is that right?

Understood, thank you. So the answer to “Why are there six networking config interfaces?” is, “If only we could change the NetworkManager API, there’d be seven!”

The remaining mystery is network-status: “Can access snaps providing the NetworkingStatus interface.” What is that? A search suggests that it might be specific to Ubuntu Touch.

This interface is for the connectivity API service, is auto-connected, and is meant to answer simple questions like “am I online”. network-manager can of course answer this, but plugging it grants excessive access as described above. Historically, yes, the connectivity API was used on Ubuntu Touch and was meant to be used with Ubuntu Personal (all-snaps with Unity8). Portals also provide a service to answer this simple question, so when portals are available, applications plugging ‘desktop’ will get this for free (current thinking is we’ll let the portals trusted helper service handle mediation to its services instead of teasing out specific accesses there). I suspect we’ll want to add the connectivity portal to the network-status interface as well for applications not using the ‘desktop’ interface. I’m not sure if there are real-world examples of that, but if there are, we’ll accommodate them.

The big reason it is a good idea to separate ALSA and Pulseaudio into independent interfaces is that the ALSA interface is a “privileged” interface to your hardware, whereas pulseaudio is an unprivileged interface to a daemon. For these reasons the pulseaudio interface is automatically connected when a snap is installed, while the alsa interface is not, in order to maintain a secure system.
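The auto-connection difference described above can be sketched as a policy lookup. The table below is only what this thread states, not snapd's real policy data, and `plan_connections` and the snap name `rosegarden` are hypothetical. The `snap connect <snap>:<plug>` form it prints is the real snap CLI syntax for connecting a plug manually.

```python
from typing import Dict, List, Tuple

# Illustrative policy taken from this thread, not snapd's real policy table.
AUTO_CONNECT: Dict[str, bool] = {
    "pulseaudio": True,  # unprivileged access to a sound-server daemon
    "alsa": False,       # privileged, direct access to the /dev/snd device files
}


def plan_connections(snap: str, plugs: List[str]) -> Tuple[List[str], List[str]]:
    """Split plugs into auto-connected ones and manual `snap connect` commands."""
    auto: List[str] = []
    manual: List[str] = []
    for plug in plugs:
        # Interfaces missing from the table are treated as manual (fail closed).
        if AUTO_CONNECT.get(plug, False):
            auto.append(plug)
        else:
            manual.append(f"snap connect {snap}:{plug}")
    return auto, manual


if __name__ == "__main__":
    auto, manual = plan_connections("rosegarden", ["pulseaudio", "alsa"])
    print(auto)    # ['pulseaudio']
    print(manual)  # ['snap connect rosegarden:alsa']
```

Failing closed for unknown interfaces mirrors the security argument: anything not known to be unprivileged waits for an explicit decision from the user.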

Caveat: my previous comment still applies. It is better not to have direct access to device files, but do keep in mind that recording is not mediated in pulseaudio anywhere.

Does that mean the use case I described above (Alsa-only software should still be able to run confined on a system without PulseAudio, and a single interface would prohibit this) is incorrect? Or are they independent reasons?

I’m sure you’re deeply familiar with what makes this a big distinction. Unfortunately, I have no idea why one would be more secure than the other. What bad things, specifically, could an app do with the Alsa interface that it can’t do with the PulseAudio interface? I know that an app using Alsa could annoyingly block other apps from playing/recording. But mentioning “secure” suggests you have something much worse in mind.

Ok, that would make it more important to have different UI text for the two interfaces. If I went with my original text of “The app ‘Rosegarden’ wants to play or record sound”, a user might find it suspicious that the system was asking for Rosegarden when it didn’t ask for Rhythmbox.

FWIW I think, today, the main difference is “do you want this app to take over your sound subsystem” vs “do you want this app to use your sound subsystem”.
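For the GUI work that started this thread, the "take over" vs. "use" distinction could be turned into prompt text along these lines. The wording and the `prompt_for` helper are hypothetical suggestions, not snapd or Ubuntu strings. Since the thread establishes that neither interface separates playback from recording, both labels mention recording.

```python
# Hypothetical prompt wording for a permissions GUI. Neither interface
# separates playback from recording, so both labels mention recording.
PROMPTS = {
    "alsa": "take over your sound hardware to play or record sound",
    "pulseaudio": "use the sound system to play or record sound",
}


def prompt_for(app: str, interface: str) -> str:
    """Build the permission prompt shown when an app requests an audio interface."""
    if interface not in PROMPTS:
        raise ValueError(f"no audio prompt defined for interface {interface!r}")
    return f"The app '{app}' wants to {PROMPTS[interface]}."


if __name__ == "__main__":
    print(prompt_for("Rosegarden", "alsa"))
    print(prompt_for("Rhythmbox", "pulseaudio"))
```

With separate wording, a prompt for Rosegarden but not Rhythmbox reads as a request for more control over the hardware, rather than as an unexplained inconsistency.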