I am seeing strange OS freezes across my IoT fleet.
My devices are built around a Raspberry Pi Compute Module 4 (CM4) running Ubuntu Core on version
Some devices freeze repeatedly, and I can't pinpoint the cause.
Here is what I know so far that may help describe the problem or narrow it down:
- Over 90% of the fleet runs stably and does not show this issue.
- Devices that hit the problem once will eventually hit it again.
- The freeze frequency differs from device to device, but each device seems to have its own rhythm. For example, one device freezes within 24-48h of starting up, while another may need >124h to freeze again.
- One device froze at several different locations (different power, Wi-Fi, position, etc.). In the same spot, another device ran flawlessly for weeks.
- So far I have not been able to trigger the freeze manually. Affected devices do freeze again on their own, so the issue recurs, but I do not know what sets it off.
- Devices running into the freeze do not show unusual CPU or memory usage.
- Frozen devices are still warm and appear to be running.
What I have tried so far to gather information about the issue:
- Enabled persistent logging and recorded the logs. They simply stop without any usable message, so it seems journaling freezes as well.
- Checked the startup logs, which show no hardware problem on the affected devices. The device's software runs normally until the freeze. Identically built devices with the same software run flawlessly in the same spot.
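
Since the journal stops without a final message, a heartbeat file that is forced to disk on every write could at least bound the time of each freeze. The following is only a minimal sketch of that idea, not something already deployed on the fleet. It assumes Python is available on the device (on Ubuntu Core it would have to be shipped inside a snap), and the function names, file path, and interval are hypothetical.

```python
import os
import time
from datetime import datetime, timezone


def write_heartbeat(path: str) -> str:
    """Append one UTC timestamp line and force it to disk with fsync."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(stamp + "\n")
        f.flush()
        # fsync asks the kernel to push the data to storage, so the last
        # line should survive a hard freeze followed by a power cycle.
        os.fsync(f.fileno())
    return stamp


def last_heartbeat(path: str):
    """Return the last complete timestamp line, or None if there is none."""
    try:
        with open(path, encoding="utf-8") as f:
            # Ignore a trailing partial line (no newline) from an interrupted write.
            lines = [ln.strip() for ln in f if ln.endswith("\n")]
    except FileNotFoundError:
        return None
    lines = [ln for ln in lines if ln]
    return lines[-1] if lines else None


def run(path: str, interval_s: float = 10.0, beats=None) -> None:
    """Write a heartbeat every interval_s seconds; beats=None means forever."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path)
        count += 1
        if beats is None or count < beats:
            time.sleep(interval_s)
```

Calling `run()` with a writable path at startup, and reading `last_heartbeat()` on the same path after the next reboot, would give the last time the userspace process was still scheduled. Comparing that timestamp with the last journal entry might show whether userspace and journaling stop at the same moment.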
I am looking for concrete suggestions on what to troubleshoot next, so that I can investigate further and hopefully pinpoint the cause. There may be ideas and troubleshooting methods I don't know about yet, or known issues of this kind with Linux in general. If someone has encountered the same thing, I would be glad to hear about your experience.
Thanks and kind regards