Snap install fails in an LXD container on an OpenStack VM

JamesBenson · November 16, 2017, 10:49pm

I’m installing Kubernetes from an Ubuntu server hosted in an OpenStack VM with 8 vCPUs, 16GB of RAM, and 160GB of storage. All ports are open in OpenStack and no iptables rules are set.

I’ve issued the following commands from a clean install:
sudo apt-add-repository ppa:ubuntu-lxc/stable
sudo apt update && sudo apt install lxd lxd-client
lxd init --auto
lxc network create lxdbr0 ipv4.address=auto ipv4.nat=true ipv6.address=none ipv6.nat=false
sudo snap install conjure-up --classic
conjure-up kubernetes
selected Canonical kubernetes
selected Helm
Deployed.

My deployment hung.

I ran juju ssh etcd/0 and issued:
sudo snap install etcd
This was the result detailing the TLS issue.
http://paste.ubuntu.com/25976912/
(Warning: this has a lot of juju noise and is large.)

Issuing: sudo snap install core
Results in a similar error:
ubuntu@juju-8ea5d5-2:~$ sudo snap install core|pastebinit
error: cannot perform the following tasks:

Download snap “core” (3440) from channel “stable” (Get https://068ed04f23.site.internapcdn.net/download-snap/99T7MUlRhtI3U0QFgl5mXXESAiSwt776_3440.snap?t=2017-11-17T00:00:00Z&h=3bada4b9cae92cb4de4c1236596c082ce43259cb: net/http: TLS handshake timeout)
http://paste.ubuntu.com/25977039/

adam.stokes · November 17, 2017, 2:32pm

@sergiusens @noise, these are on full VMs with no proxy in between. Is this possibly a result of something on our end?

sergiusens · November 17, 2017, 2:50pm

I would defer to someone in snapd; let’s start with nominating @Chipaca, as he likes network errors.

chipaca · November 17, 2017, 3:41pm

If this is ongoing, could you do a network capture?

The last time I saw something like this there were a bunch of RST packets that for some reason tripped up Go’s network stack, but not things like curl. I’d love to have a reproducer for that simpler than “share network over usb to a rpi and be in a poorly-connected part of France”.

JamesBenson · November 17, 2017, 7:27pm

chipaca, if you are available over IRC, I’d like to walk through this in real time with you. Please ping me if you are: jamesbenson

chipaca · November 18, 2017, 12:58am

@JamesBenson I’m off until Tuesday I’m afraid.

JamesBenson · November 18, 2017, 1:18am

Ok, we can try then. What’s your IRC handle?

chipaca · November 18, 2017, 1:21am

I’m Chipaca on IRC, and I’m on UTC+0.

chipaca · November 21, 2017, 4:36pm

@JamesBenson I seem to have lost you on IRC, and as I’m about to go offline for a bit I thought I’d write this down here.

First, take the URL from the error, and see whether you can download it with wget or curl. Note that you’ll usually have to quote it so the shell doesn’t get confused by the ampersand in the query string. In your example (but AFAIK that one won’t work now because it’s too old),

wget 'https://068ed04f23.site.internapcdn.net/download-snap/99T7MUlRhtI3U0QFgl5mXXESAiSwt776_3440.snap?t=2017-11-17T00:00:00Z&h=3bada4b9cae92cb4de4c1236596c082ce43259cb'

If that doesn’t work, then there’s an issue in your networking. If it does work however, download http://people.canonical.com/~john/gowget and try using that to download the URL. All it does is download the URL you give it, but it’s written in Go so it’s the same network stack.

If that fails with an error similar to the one snapd was giving you, repeat with http://people.canonical.com/~john/gowget19 (this is the same program, built against a newer Go version).

The next step after this would be to run gowget while capturing the network traffic with Wireshark.
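One way to take such a capture without a GUI is tcpdump, which writes a pcap file that Wireshark can open afterwards. The sketch below only builds a capture filter from the failing URL; the helper names url_host and capture_filter are hypothetical, and limiting the capture to TCP port 443 assumes the download goes over plain HTTPS as in the error above.

```shell
# Extract the host part of an http(s) URL using POSIX parameter expansion.
url_host() {
    rest=${1#*://}      # drop the scheme
    rest=${rest%%/*}    # drop the path and query string
    echo "${rest%%:*}"  # drop an explicit port, if any
}

# Build a tcpdump filter limited to HTTPS traffic to or from that host.
capture_filter() {
    echo "host $(url_host "$1") and tcp port 443"
}
```

With the failing URL in a variable named url, something like sudo tcpdump -i eth0 -w snap-tls.pcap "$(capture_filter "$url")" in one terminal, followed by the gowget run in another, should leave a snap-tls.pcap to inspect. The interface name eth0 and the output file name are assumptions; adjust both to your setup.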

wgrant · November 21, 2017, 10:00pm

As discussed on IRC, this was an MTU conflict. The OpenStack cloud in question is configured such that instances have an MTU of just 1450 rather than the standard 1500, but LXD always creates lxdbr0 with 1500. So containers were trying to use 1500-byte frames, which were being dropped by the host.
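A quick way to see this kind of conflict from inside a container is to ping with the "don't fragment" flag set and a payload sized to fill a given MTU exactly. The sketch below assumes iputils ping (for -M do) and plain IPv4 without header options; icmp_payload_for_mtu and probe_mtu are hypothetical helper names.

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send three pings to host $1 with "don't fragment" set, sized to fill MTU $2.
probe_mtu() {
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

If probe_mtu with 1450 gets replies from a reachable host but the same probe with 1500 reports an error or gets no replies, the path most likely cannot carry full 1500-byte packets.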

This can be diagnosed from ip link output on the LXD host. If the physical (well, virtual in this case) Ethernet interface has an MTU below 1500, but lxdbr0’s is still 1500, traffic from inside the containers will be flaky.
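The same comparison can be scripted on the LXD host. This is a minimal sketch that reads the MTU values from sysfs instead of parsing ip link output; mtu_of and check_mtu are hypothetical helper names, and the uplink interface name has to be supplied by you.

```shell
# Read an interface's MTU from sysfs (Linux only).
mtu_of() {
    cat "/sys/class/net/$1/mtu"
}

# Compare an uplink MTU ($1) with a bridge MTU ($2) and report any conflict.
check_mtu() {
    uplink_mtu=$1
    bridge_mtu=$2
    if [ "$bridge_mtu" -gt "$uplink_mtu" ]; then
        echo "mismatch: bridge MTU $bridge_mtu exceeds uplink MTU $uplink_mtu"
        return 1
    fi
    echo "ok: bridge MTU $bridge_mtu fits within uplink MTU $uplink_mtu"
}
```

For example, check_mtu "$(mtu_of ens3)" "$(mtu_of lxdbr0)", where ens3 is only an assumed name for the VM's Ethernet interface.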

Running lxc network set lxdbr0 bridge.mtu 1450 before creating containers solves the problem. If you’ve already started containers, you’ll need to restart them or also change the MTU on the eth0 in each container.
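For containers that are already running, the fix can be written out as one command per container. The sketch below prints the commands rather than running them, so they can be reviewed first; mtu_fix_commands is a hypothetical helper, and it assumes each container's interface is named eth0 as described above.

```shell
# Print (not run) the commands that align lxdbr0 and each named container's
# eth0 with the given MTU. $1 is the MTU; remaining arguments are container names.
mtu_fix_commands() {
    mtu=$1
    shift
    echo "lxc network set lxdbr0 bridge.mtu $mtu"
    for c in "$@"; do
        echo "lxc exec $c -- ip link set dev eth0 mtu $mtu"
    done
}
```

Running mtu_fix_commands 1450 juju-8ea5d5-2 prints the two commands for that container (the name is taken from the shell prompt earlier in the thread and assumed to match the container name); piping the output to sh would apply them.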