Classic confinement request - telegraf

sajoupa · February 12, 2019, 10:33am

Hi,

I have created a snap for telegraf (https://github.com/influxdata/telegraf/).
Its purpose is to run a wide variety of input plugins (https://github.com/influxdata/telegraf/blob/master/plugins/inputs/all/all.go), which are chosen by the user in their telegraf.conf file. It is therefore not possible to strictly confine the snap.

Here is the snapcraft file: https://github.com/sajoupa/telegraf/blob/master/snap/snapcraft.yaml

Thanks,
Laurent

jdstrand · March 2, 2019, 12:12pm

This is not specific enough to understand why it needs classic. It seems that the plugins are supplied by the snap itself, so the plugins are not arbitrary commands. It also seems that this is about gathering metrics, for which we have several interfaces that allow info gathering.

Have you tried strict mode and used snappy-debug to see whether the snap can run that way? Can you provide more specific details on why strict mode is not sufficient?

Also, the snap is called telegraf-sajoupa; what relationship does this snap have with upstream telegraf?

jdstrand · March 15, 2019, 9:14pm

@sajoupa - this request cannot be processed until the requested information is provided. Thanks

sajoupa · March 22, 2019, 7:58pm

Hi,

My main concern is that there are (as of now) 152 different input plugins, 31 outputs.
Users manage which plugins are in use in their config file. And for each plugin, usage will be different depending on that config.
The task of testing these plugins seems enormous, and moreover, might not be enough because activating a particular option in a plugin might require a new interface. And with each telegraf release, new plugins are added.

The name ‘telegraf’ is apparently already reserved. I’m not affiliated with influxdata/telegraf, and would like to have a working published snap before considering publishing it under the official name, so I went with telegraf-sajoupa.

Thanks,
Laurent

jdstrand · March 25, 2019, 9:49pm

Lack of testing resources is not typically a reason to grant classic confinement. IME you could enable strict for your most common use cases and then report bugs for others that might need refinements to the interfaces. We can add accesses to existing interfaces relatively quickly, and devmode is always available while the accesses are being worked out.

stub · March 26, 2019, 4:43am

Telegraf output plugins such as file and socket_writer, which write metrics to files in various formats, are expected to have write access to the whole filesystem. Similarly, input plugins such as exec need to execute arbitrary shell commands anywhere on the filesystem, and csv needs to be able to read from any mounted filesystem. A large proportion of the other input plugins need to call tools installed at arbitrary locations, relying on $PATH to find executables.
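To make the exec case concrete, here is a minimal Python sketch of the general pattern: run a user-configured command and treat the number it prints as a metric. This is an illustration only, not Telegraf's implementation; `run_metric_command` is a hypothetical helper, and the command shown is a stand-in that uses the current Python interpreter so the sketch runs anywhere.

```python
import subprocess
import sys
from typing import Sequence


def run_metric_command(argv: Sequence[str], timeout: float = 5.0) -> float:
    """Run a command and interpret its standard output as a single number.

    Hypothetical helper for illustration; not part of Telegraf.
    """
    result = subprocess.run(
        argv,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode output as str
        timeout=timeout,      # avoid hanging on a stuck command
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return float(result.stdout.strip())


# Stand-in for a command chosen by the administrator in their configuration.
value = run_metric_command([sys.executable, "-c", "print(3)"])
print(value)
```

In a real deployment the command would be whatever the administrator configured, typically a tool that lives on the host, which is why where the executable is looked up and run matters for confinement.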

jdstrand · March 26, 2019, 3:49pm

Thanks for responding. Why specifically is write access needed to the whole filesystem? It seems like metrics data could just go to the snap writable areas or home/removable-media?

The ‘exec’ input plugin sounds like a generic plugin. In what cases is it used? Why specifically aren’t the commands found in the runtime sufficient? Why specifically can’t common commands be shipped with the snap? What specific accesses do these commands need that aren’t covered by existing interfaces? Sorry, I’m unfamiliar with telegraf and I see that it is “a plugin-driven server agent for collecting and reporting metrics”, which sounds like things like cat/head/whatever could be used to collect. Part of the process for classic is understanding specifically why strict can’t be used now so we can improve snapd so that the snap can eventually go strict. “We need to read/write/execute everything” is not particularly tractable in that regard. The specifics don’t need to be exhaustive but will hopefully help illuminate why strict is not sufficient (and just as importantly, how we can improve snapd to make it sufficient).

More seriously, can the snap be made to mostly function in strict and then be extended via the content interface or similar to bring in other functionality? Is telegraf a framework for doing things or does it do specific things for people?

stub · March 27, 2019, 1:55am

Read and write access is needed for the whole filesystem because most use cases and user expectations require it. Telegraf is a tool for monitoring arbitrary systems and deployments, and confining it makes it unfit for that purpose. It is a general purpose systems administration tool in that respect, designed for classic environments and requiring classic confinement.

The exec plugin needs to run an arbitrary executable as specified by user configuration and expose the numbers it reports as metrics (eg. ‘ls -1 /var/lib/postgresql/main/pg_wal/*.pending | wc -l’ or ‘/usr/local/bin/rollup_usage.py --daily’).
Or you install telegraf and configure it to watch /srv/frobnoz/logs/unit-12.log and expose a metric on how many lines match the regexp ‘^(ERROR|CRITICAL)’ for example.
Or you deploy the Telegraf Juju subordinate charm and attach it to an Apache deployment, and need to configure it to monitor the pid files and status files and log files in the locations that the Apache Charm stores them.
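As a rough illustration of the log-watching case, the following Python sketch counts lines that match the regexp mentioned above. It is not Telegraf code; `count_matching_lines` and the sample lines are hypothetical, and the sketch works on an in-memory list rather than a real log file.

```python
import re
from typing import Iterable


def count_matching_lines(
    lines: Iterable[str], pattern: str = r"^(ERROR|CRITICAL)"
) -> int:
    """Count lines in which the pattern matches (hypothetical helper)."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))


# Made-up sample data standing in for the contents of a log file.
sample = [
    "ERROR disk full",
    "INFO started",
    "CRITICAL replication lag",
    "WARNING: ERROR seen earlier",  # does not start with ERROR, so not counted
]
matches = count_matching_lines(sample)
print(matches)
```

The counting itself is trivial; the confinement question is about the input. Pointing such a check at a path like /srv/frobnoz/logs/unit-12.log means reading a file whose location is known only from user configuration.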

jdstrand · April 5, 2019, 11:15am

Thank you. The requirements are now understood. This application is a framework for gathering arbitrary metrics. To function, it requires explicit configuration from the user/administrator, which includes deliberately choosing which files to access.

It sounds like the snap could potentially be made to work (perhaps with modification) if we exposed read-only access to /var/lib/snapd/hostfs.

@Wimpress, @popey, @igor, @evan: can one of you perform publisher vetting?

jdstrand · April 5, 2019, 11:17am

Actually, I was able to do this. Granting classic. This is now live.

jdstrand · April 5, 2019, 11:19am

Your next upload should pass automated review. Alternatively, you can request a manual review for any existing revisions.

sajoupa · October 15, 2019, 9:08am

Hi,

As discussed with upstream:
https://github.com/influxdata/telegraf/issues/6405
I’ve renamed the snap to ‘telegraf’ (no longer telegraf-sajoupa).
Could someone allow the ‘telegraf’ snap for classic confinement too?

Thanks!

jdstrand · October 16, 2019, 8:05pm

Granted. telegraf-sajoupa and telegraf are published by the same person and are the same application.