Timeout toward assertions.ubuntu.com

NCuralli · June 28, 2017, 9:21am

Hi all,
We are having trouble connecting to assertions.ubuntu.com.
We hit the following error when we try to refresh snaps from our brand store:

{"type":"error","status-code":400,"status":"Bad Request","result":{"message":"cannot refresh \"fingbox-agent\": cannot refresh snap-declaration for \"domotz-platform\": Get https://assertions.ubuntu.com/v1/assertions/snap-declaration/16/QpRW3lgmMagM1JnfV4Q1kyxMfOgESQL4?max-format=2: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"}}

The boards experiencing this issue are located at this IP address: 123.50.62.201.
We use the following core: 1690.

Can someone help me understand the problem?

Thanks
Nicolino

cprov · June 28, 2017, 10:52am

Nicolino,

The following operation worked just fine:

curl -s https://assertions.ubuntu.com/v1/assertions/snap-declaration/16/QpRW3lgmMagM1JnfV4Q1kyxMfOgESQL4?max-format=2 | jq .

Can you try that on the same system where it failed originally, so we can rule out network issues?

We are not observing any errors or outages in the related services.

NCuralli · June 29, 2017, 9:12am

Hi @cprov,
In my previous post I only told you about the problem on the store side.
The affected boards are also experiencing timeouts with another service that we provide, which uses the AMQP protocol.
We see a big difference between boards running in Europe and boards running in Taiwan:

HTTPS towards the store:
- from Europe: 77 ms
- from Taiwan: 570 ms
Ping towards the server serving AMQPS, located in Ireland:
- from Europe: 50 ms
- from Taiwan: 290 ms

We fixed the problem by increasing the timeout of the AMQPS client used in our snap.
It is worth noting that our application's HTTP client, whose timeout is 4 s, did not hit any timeouts on requests to the store during the same period.

The explanation seems to be that the combination of the snapd HTTP client timeout and the Store CDN does not provide good service for the Taiwan area. Today the boards located in Taiwan are not experiencing the problem, so the timeout threshold seems to be just low enough to cut off the Taiwan area when latency fluctuates.

It may be worth increasing the timeout for HTTP requests on the snapd side: our product is used in an area ranging from America to Australia, and this problem prevents reliable updates of our snap.
I think it is important to open a discussion about this topic.

Cheers,
Nicolino

cprov · June 29, 2017, 12:09pm

Nicolino,

Assertions are not served through a CDN. So far they are not much bigger than the CDN redirect request itself plus a second TLS negotiation, so a CDN would not buy us anything and would probably make things worse.

You are clearly facing network issues, which should be addressed by tailoring timeout limits and retries. We should probably think about ways to safely support per-instance values for these parameters.
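
As a rough illustration of the retry side of this, here is a minimal shell sketch. It is not how snapd handles retries; the `retry` helper, its arguments, and the timeout values in the usage comment are assumptions made up for the example. Only the curl options `--connect-timeout` and `--max-time` are real.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times.
# Waits DELAY seconds after the first failure and doubles the wait each time.
# Plain POSIX sh, so the retry_* variables are global rather than local.
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_try=1
    while :; do
        # Success on any attempt ends the loop immediately.
        "$@" && return 0
        # Give up once the attempt budget is spent.
        [ "$retry_try" -ge "$retry_max" ] && return 1
        sleep "$retry_delay"
        retry_delay=$((retry_delay * 2))
        retry_try=$((retry_try + 1))
    done
}

# Hypothetical usage (values are illustrative, not recommendations):
# retry 4 2 curl -sS --fail --connect-timeout 10 --max-time 30 -o /dev/null \
#     "https://assertions.ubuntu.com/v1/assertions/snap-declaration/16/QpRW3lgmMagM1JnfV4Q1kyxMfOgESQL4?max-format=2"
```

With a generous per-attempt timeout and a few retries, a high-latency link such as the one described from Taiwan gets several chances to complete the request instead of failing on the first slow handshake.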

noise · June 29, 2017, 3:06pm

I’ve been trying to reproduce this by simulating network latency with:

sudo tc qdisc add dev eth0 root netem delay 700ms 100ms 25%

I then ran snap install/remove in a loop, but so far there have been no failures.
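
For anyone who wants to repeat the experiment, the install/remove loop could be sketched roughly as follows. This is an assumption about how such a loop might look, not the commands that were actually run: the `run_loop` and `cycle` helpers, the snap name `hello-world`, and the iteration count are all placeholders.

```shell
# run_loop COUNT CMD...: run CMD COUNT times and print how many runs failed.
# Output of CMD is discarded; only the exit status is counted.
run_loop() {
    loop_total=$1
    shift
    loop_failed=0
    loop_i=0
    while [ "$loop_i" -lt "$loop_total" ]; do
        "$@" >/dev/null 2>&1 || loop_failed=$((loop_failed + 1))
        loop_i=$((loop_i + 1))
    done
    echo "$loop_failed"
}

# One install/remove cycle; "hello-world" is a placeholder snap name.
cycle() {
    sudo snap install hello-world && sudo snap remove hello-world
}

# Hypothetical usage, with the netem delay from above already applied:
# run_loop 50 cycle
# Remove the simulated latency afterwards:
# sudo tc qdisc del dev eth0 root
```

A non-zero count printed by `run_loop` would indicate that at least one cycle failed under the simulated latency.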