Incrementing loop device names breaking datadog monitor

samchenraising · February 1, 2019, 8:57pm

Datadog metric queries cannot filter out device names that have not been seen yet, and every /dev/loop# device is mounted read-only and reports 100% usage, so each new loop device trips the monitor.

ijohnson · February 4, 2019, 4:30pm

What exactly is the issue here?
Snaps are implemented using squashfs files which are then mounted using loopback devices. I don’t know of a way to control or somehow influence which snap gets which loopback device in /dev/loop* as it depends on what loopback devices have already been mounted.

samchenraising · February 4, 2019, 5:26pm

Datadog metrics are based on known tags so until a /dev/loop* device has been seen it can’t be excluded from the monitor. That means whenever a new loopback device gets mounted with a new number at the end, it triggers the monitor that alerts on 100% volume usage. More on datadog metrics here: https://docs.datadoghq.com/graphing/faq/when-i-query-can-i-use-wildcards-in-metric-names-and-events/

chipaca · February 4, 2019, 5:31pm

Getting a warning for volume usage on something that is obviously readonly sounds like a bug, to me.

ijohnson · February 4, 2019, 5:59pm

Yeah I agree with @chipaca that this is an issue with the datadog metrics application and there’s not a problem with snapd here.
I think snap mounts carry a special mount option, x-gdu.hide. Perhaps Datadog could learn to ignore mounts that have that option as well.

chipaca · February 4, 2019, 6:21pm

Just querying the loop device (like what losetup does, as a regular user) would be enough to figure out you shouldn’t be alerting about it.
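As a rough illustration of that idea, here is a minimal Python sketch that reads the same kind of information from sysfs instead of calling losetup. It assumes the kernel exposes `/sys/block/loopN/ro` and `/sys/block/loopN/loop/backing_file` for attached loop devices. The function names and the `sysfs_root` parameter are hypothetical and are not part of snapd or the Datadog agent.

```python
from pathlib import Path
from typing import Optional


def loop_device_info(name: str, sysfs_root: str = "/sys/block") -> Optional[dict]:
    """Return the read-only flag and backing file of a loop device.

    `name` is the kernel name without the /dev/ prefix, e.g. "loop3".
    Returns None if the device is not an attached loop device.
    """
    dev = Path(sysfs_root) / name
    try:
        read_only = (dev / "ro").read_text().strip() == "1"
        backing_file = (dev / "loop" / "backing_file").read_text().strip()
    except OSError:
        # Missing files: not a loop device, or nothing is attached to it.
        return None
    return {"read_only": read_only, "backing_file": backing_file}


def should_skip_usage_alert(name: str, sysfs_root: str = "/sys/block") -> bool:
    """Hypothetical policy: never alert on a read-only loop device."""
    info = loop_device_info(name, sysfs_root)
    return info is not None and info["read_only"]
```

For example, `should_skip_usage_alert("loop3")` would be true for a loop device backing a read-only squashfs snap, and false for a regular disk such as `sda`, which has no `loop/` directory in sysfs.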

samchenraising · February 4, 2019, 6:42pm

Thanks for the replies. I didn’t see any obvious way to adjust this on the datadog side. Why do the loop devices need to have a changing number at the end?

chipaca · February 4, 2019, 9:18pm

Because they are allocated dynamically, as they are needed?

Laura · March 14, 2019, 8:42pm

It is possible to exclude /dev/loop* devices from metric collection in the Datadog disk check. (https://docs.datadoghq.com/integrations/disk/#setup)

To exclude the /dev/loop* devices, add a regex that matches the names you would like to exclude to the device_blacklist list in the check’s conf.yaml file: https://github.com/DataDog/integrations-core/blob/master/disk/datadog_checks/disk/data/conf.yaml.default#L64

Restart the agent for changes to take effect.

If you have any questions, please feel free to reach out to us at support@datadoghq.com!

Laura · November 16, 2021, 9:33pm

We made some changes to the disk check, and the parameter in the disk.d/conf.yaml file to exclude devices from metric collection has changed. Now it is called device_exclude: https://github.com/DataDog/integrations-core/blob/2824dc94d595ee2a8d363855dcae7c44b298d6ff/disk/datadog_checks/disk/data/conf.yaml.default#L122-L133
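To sanity-check a pattern before putting it in the config, a small Python sketch like the one below may help. It assumes the disk check treats each device_exclude entry as a case-insensitive regular expression matched from the start of the device name, so verify that against the linked conf.yaml.default. The pattern `/dev/loop\d+`, the YAML layout in the comment, and the helper names are illustrative assumptions, not copied from the Datadog source.

```python
import re

# Assumed shape of the relevant part of disk.d/conf.yaml (illustrative only):
#
#   instances:
#     - device_exclude:
#         - '/dev/loop\d+'
#
# The pattern matches any loop device number, including ones not seen yet.
LOOP_PATTERN = re.compile(r"/dev/loop\d+", re.IGNORECASE)


def is_excluded(device: str) -> bool:
    """Return True if the device name matches the exclusion pattern."""
    return LOOP_PATTERN.match(device) is not None


def filter_devices(devices: list) -> list:
    """Keep only the devices that would still be monitored."""
    return [d for d in devices if not is_excluded(d)]


if __name__ == "__main__":
    print(filter_devices(["/dev/sda1", "/dev/loop0", "/dev/loop27"]))
```

Because the pattern matches on the numeric suffix rather than a fixed list of names, a loop device that appears later with a new number should be excluded without further changes to the monitor.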

If you have any questions, please feel free to reach out to us at support@datadoghq.com!