Snaps mangle Unicode stdin

A simple snap, containing the file unisnap.py:

#!/usr/bin/env python3

import sys

if __name__ == "__main__":
    inp = sys.stdin.read()
    print("This is the input")
    print(repr(inp))
    print("As bytes it is")
    print([hex(ord(x)) for x in inp])

and snapcraft.yaml of:

name: unisnap
base: core18
version: '1.00'
summary: I think snaps sod around with unicode
description: |
  let us test this theory

grade: stable
confinement: strict

parts:
  tabular:
    plugin: dump
    source: .

apps:
  unisnap:
    command: /usr/bin/python3 $SNAP/unisnap.py

Build with snapcraft and install with snap install --dangerous unisnap_1.00_amd64.snap. Now:

15:42 ~/Scra+/unisnap $ cat testfile.txt 
├─sda1    8:1    0   487M  0 part /boot/efi
15:42 ~/Scra+/unisnap $ hexdump -C testfile.txt 
00000000  e2 94 9c e2 94 80 73 64  61 31 20 20 20 20 38 3a  |......sda1    8:|
00000010  31 20 20 20 20 30 20 20  20 34 38 37 4d 20 20 30  |1    0   487M  0|
00000020  20 70 61 72 74 20 2f 62  6f 6f 74 2f 65 66 69 0a  | part /boot/efi.|
00000030
15:42 ~/Scra+/unisnap $ python3 /snap/unisnap/current/unisnap.py < testfile.txt 
This is the input
'├─sda1    8:1    0   487M  0 part /boot/efi\n'
As bytes it is
['0x251c', '0x2500', '0x73', '0x64', '0x61', '0x31', '0x20', '0x20', '0x20', '0x20', '0x38', '0x3a', '0x31', '0x20', '0x20', '0x20', '0x20', '0x30', '0x20', '0x20', '0x20', '0x34', '0x38', '0x37', '0x4d', '0x20', '0x20', '0x30', '0x20', '0x70', '0x61', '0x72', '0x74', '0x20', '0x2f', '0x62', '0x6f', '0x6f', '0x74', '0x2f', '0x65', '0x66', '0x69', '0xa']
15:43 ~/Scra+/unisnap $ unisnap < testfile.txt 
ERROR: ld.so: object 'libgtk3-nocsd.so.0' from LD_PRELOAD cannot be preloaded (failed to map segment from shared object): ignored.
This is the input
'\udce2\udc94\udc9c\udce2\udc94\udc80sda1    8:1    0   487M  0 part /boot/efi\n'
As bytes it is
['0xdce2', '0xdc94', '0xdc9c', '0xdce2', '0xdc94', '0xdc80', '0x73', '0x64', '0x61', '0x31', '0x20', '0x20', '0x20', '0x20', '0x38', '0x3a', '0x31', '0x20', '0x20', '0x20', '0x20', '0x30', '0x20', '0x20', '0x20', '0x34', '0x38', '0x37', '0x4d', '0x20', '0x20', '0x30', '0x20', '0x70', '0x61', '0x72', '0x74', '0x20', '0x2f', '0x62', '0x6f', '0x6f', '0x74', '0x2f', '0x65', '0x66', '0x69', '0xa']

Observe that running it as a snap, through the command symlink, mangles stdin by changing its encoding. It would be good if snapd didn’t do this.

(It’s possible that something else is going on that breaks Python’s unicodeness? But regardless, I don’t think that running the command via snapd and running the command directly ought to do different things to stdin.)
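For what it's worth, the mangled output above looks exactly like what Python produces when it decodes UTF-8 bytes as ASCII with the surrogateescape error handler, which is what Python 3 falls back to for stdin under the C/POSIX locale. A minimal sketch of that, assuming this is what is happening inside the snap (the function names here are made up for illustration):

```python
# The six bytes that encode the two box-drawing characters in UTF-8,
# copied from the hexdump above, followed by some plain ASCII.
raw = bytes.fromhex("e2949ce29480") + b"sda1"


def decode_like_c_locale(data: bytes) -> str:
    """Decode as ASCII; each undecodable byte 0xNN becomes the lone surrogate U+DCNN."""
    return data.decode("ascii", errors="surrogateescape")


def decode_like_utf8_locale(data: bytes) -> str:
    """Decode as UTF-8, which is what happens outside the snap."""
    return data.decode("utf-8")


mangled = decode_like_c_locale(raw)
clean = decode_like_utf8_locale(raw)

# Print code points rather than the strings themselves, because lone
# surrogates cannot be written to a UTF-8 terminal.
print([hex(ord(ch)) for ch in mangled])
print([hex(ord(ch)) for ch in clean])

# surrogateescape is lossless: the original bytes can be recovered.
recovered = mangled.encode("ascii", errors="surrogateescape").decode("utf-8")
print(recovered == clean)
```

The first list starts with 0xdce2, 0xdc94, 0xdc9c, matching the snap output, and the second starts with 0x251c, 0x2500, matching the direct run. So the bytes on stdin are probably untouched; it is the encoding Python picks for stdin that differs.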

I repeated this with a C program and it worked fine.

This is because C rocks and Python doesn’t. It appears you need to add PYTHONIOENCODING: "utf-8" to an environment: stanza in the app section. Having to do this seems like a bug.
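If you would rather not depend on the environment at all, another option (not something suggested in this thread, just a sketch) is to ignore sys.stdin's text layer and decode the underlying byte stream yourself. The helper name read_utf8 is made up; io.TextIOWrapper is used instead of sys.stdin.reconfigure() because the latter needs Python 3.7 and core18 ships 3.6.

```python
import io
import sys


def read_utf8(binary_stream) -> str:
    """Read everything from a binary stream, always decoding it as UTF-8."""
    # errors="strict" raises on bad input instead of silently inserting surrogates.
    wrapper = io.TextIOWrapper(binary_stream, encoding="utf-8", errors="strict")
    return wrapper.read()


# In unisnap.py this would replace the sys.stdin.read() call:
#     inp = read_utf8(sys.stdin.buffer)
# Here an in-memory stream stands in for stdin.
sample = io.BytesIO(bytes.fromhex("e2949ce29480") + b"sda1\n")
text = read_utf8(sample)
print([hex(ord(ch)) for ch in text])
```

Note that this only covers input. Under the same locale stdout is presumably ASCII as well, so printing the decoded text would need the same treatment (or ascii() instead of repr()), which is one reason PYTHONIOENCODING is the simpler fix.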

Out of curiosity, what happens if you add locales-all to stage-packages of your snap, and add an environment section like the following:

environment:
  LOCPATH: $SNAP/usr/lib/locale

Alternatively: if you set your locale environment variables to C.UTF-8, does the current version of your snap start behaving?

As far as I can tell, adding locales-all and setting LOCPATH does fix the problem.

As far as I can tell, setting the locale to C.UTF-8 also fixes the problem.

I feel like I shouldn’t have to do either, of course, but at least this may suggest where the problem lies…

I was mainly asking to test whether missing locale data was the underlying cause. I agree that you shouldn’t have to know about all this.
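A quick way to check this from inside and outside the snap is to print what Python thinks the locale and stdin encoding are. This is only a diagnostic sketch; the function name describe_io_encoding is made up, and which environment variables matter is an assumption based on the ones mentioned in this thread.

```python
import locale
import os
import sys


def describe_io_encoding() -> dict:
    """Collect the settings that decide how Python decodes stdin."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "LOCPATH": os.environ.get("LOCPATH"),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        # Encoding Python derives from the locale, without changing the locale.
        "preferred_encoding": locale.getpreferredencoding(False),
        # getattr guards against stdin being closed or replaced.
        "stdin_encoding": getattr(sys.stdin, "encoding", None),
        "stdin_errors": getattr(sys.stdin, "errors", None),
    }


info = describe_io_encoding()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the locale named in LANG is missing inside the snap, I would expect preferred_encoding to come back as an ASCII variant and stdin_errors as surrogateescape there, while the direct run reports UTF-8 and strict.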

The core and core18 base snaps only include the C.UTF-8 locale, most likely because other locale data is not needed on many IoT devices.

There are a few ways this could be improved:

  1. include compiled locale data for all locales in the core and core18 base snaps. You probably noticed that your snap grew about 10 MB when adding the locale data, so this would likely get some pushback.

  2. bind mount in /usr/lib/locale from the host system. The main concern with this is whether the host’s binary locale data is compatible with the sandbox’s glibc. It also wouldn’t handle the Ubuntu Core use case. It would also leak the set of locales available on a user’s system, but that probably doesn’t reveal anything more than the locale environment variables snaps can already see.

  3. create a “locales” snap that exports compiled locales via the content interface. Combined with a Snapcraft extension, this could be quite easy for snap authors to use. As with (1), there’s a question about cross compatibility of the compiled locale data. So it might need to be a locales snap per base.

  4. for desktop snaps, use Snapcraft’s gnome-3-28 or gnome-3-34 extensions. There’s a copy of the locale database in each of those platform snaps.

What’s the question here? FWIW I think option 3 is the best solution in general; I don’t think we want to carry all possible locales in the base snaps.

well, option 3 is fine for headless and cli stuff … i think most desktop snaps should slowly migrate to simply use the gnome or qt frameworks mentioned in option 4

If I create a compiled locale archive using core18's glibc 2.27, will it work with core's glibc 2.23 or core20's glibc 2.31?

ah yes sorry, I thought you meant between the host and the snap, but yes you will need per-base versions of this, just like the gnome- and other platform snaps.