101 connection resets

William has now reproduced this cleanly and analysed the tcpdumps:

The Internap CDN doesn’t seem to respect receive windows. The provided Python client easily bottlenecks, causing the receive window to eventually drop to zero, but Internap continues to transmit at full speed.
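For illustration only, here is a minimal sketch of the kind of slow reader that produces this situation. It is not the provided client: `slow_read` and its parameters are hypothetical names. On a TCP socket, reading in small chunks with a pause between reads lets the kernel receive buffer fill, so the advertised receive window shrinks towards zero.

```python
import socket
import time


def slow_read(sock, chunk_size=1024, delay=0.05, max_bytes=None):
    """Drain `sock` slowly and return the number of bytes read.

    Hypothetical helper: small reads plus a sleep make the application
    the bottleneck, so unread data piles up in the receive buffer.
    """
    total = 0
    while max_bytes is None or total < max_bytes:
        want = chunk_size
        if max_bytes is not None:
            want = min(chunk_size, max_bytes - total)
        data = sock.recv(want)
        if not data:
            # Peer closed the connection.
            break
        total += len(data)
        # Pause so the sender can outpace the reader.
        time.sleep(delay)
    return total
```

A well-behaved sender stops transmitting once the window reaches zero and only probes until it reopens; the tcpdumps suggest Internap does not.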

Linux conntrack, as used by the iptables MASQUERADE target, considers packets that lie entirely outside the receive window to be invalid, and the kernel rejects (not just drops) them – this behaviour can be avoided by setting net.netfilter.nf_conntrack_tcp_be_liberal=1. This doesn’t show up on EC2 because EC2’s firewall seems to drop the packets before they get to the instance, while GCE lets them through to be rejected by Linux.
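As a rough sketch, the setting can be applied from a CI setup step by writing to the corresponding file under /proc/sys. The helper names `sysctl_path` and `set_sysctl` are hypothetical; the code assumes it runs as root on a host where the nf_conntrack module is loaded, and the change does not persist across reboots.

```python
from pathlib import Path

SYSCTL_NAME = "net.netfilter.nf_conntrack_tcp_be_liberal"


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under `root`."""
    return Path(root).joinpath(*name.split("."))


def set_sysctl(name, value, root="/proc/sys"):
    """Write `value` to the sysctl file and return what it now reads back as."""
    path = sysctl_path(name, root)
    path.write_text(f"{value}\n")
    return path.read_text().strip()


# Example call (needs root, so it is left commented out here):
# set_sysctl(SYSCTL_NAME, 1)
```

The `root` parameter exists only so the helper can be exercised against a temporary directory instead of the live /proc/sys.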

In summary, Internap is doing a slightly bad thing to improve performance in the common case, but under the right set of circumstances on the client end this causes the resets. We recommend that you use the “nf_conntrack_tcp_be_liberal” workaround for CI jobs, and we will discuss the case with Internap.
