Text-To-Speech support in snapd

galgalesh · May 1, 2020, 5:48pm

I also think that’s possible, but it’s difficult to do well because speech-dispatcher is so customisable. For example, it supports the following synthesisers: Festival, Espeak, Flite, Pico, and a “generic” module, which you can use to connect speech-dispatcher to any CLI-based synthesiser. Each synthesiser supports many different languages and many different voices.

The first issue is space. If we look only at the Espeak Mbrola languages and Festival (apt reports Installed-Size in KiB):

$ apt-cache show mbrola-* | grep Installed-Size | cut -d ' ' -f2 | paste -sd+ - | bc
661010
$ apt-cache show festival-* | grep Installed-Size | cut -d ' ' -f2 | paste -sd+ - | bc
40193
$ apt-cache show festvox-* | grep Installed-Size | cut -d ' ' -f2 | paste -sd+ - | bc
394100
$ apt-cache show festlex-* | grep Installed-Size | cut -d ' ' -f2 | paste -sd+ - | bc
50701
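
As a quick sanity check, the four totals above can be added up in plain shell arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes the numbers are in KiB, which is the unit Debian uses for the Installed-Size field.

```shell
# Totals reported by the apt-cache commands above, in KiB.
# Variable names are illustrative only.
mbrola=661010
festival=40193
festvox=394100
festlex=50701

# Sum of all four package groups.
total_kib=$((mbrola + festival + festvox + festlex))

# Integer division keeps this within POSIX shell arithmetic (rounds down).
total_mib=$((total_kib / 1024))

echo "total: ${total_kib} KiB (~${total_mib} MiB)"
```

This counts only the Mbrola and Festival packages, so the footprint with every synthesiser included would be larger.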

That’s more than 1 GB of uncompressed data. Putting all of this in one snap isn’t a great user experience. I’m assuming that this will be installed on almost every system, given that we want TTS to work out of the box for accessibility reasons. We could, however, combine a greedy plug declaration with a snap for every language. Users can then install the languages and synthesisers they want to use (similar to the theme snap[s] proposal).

The second issue is config/setup duplication, and I don’t know how to solve this one. Ideally, the user should only have to set up their TTS preferences once, but I’m not sure how to do that with a Speech Dispatcher snap. It seems that, unless we use the Speech Dispatcher from the host system, the user will have to configure it both on their host system and in the Speech Dispatcher snap. Any languages and synthesisers used will also have to be installed twice: once through the host system’s package manager and once through snap.