[Ecn-sane] Meanwhile, over on NANOG...

Wed Nov 13 05:43:21 EST 2019

"Rodney W. Grimes" <4bone at gndrsh.dnsmgr.net> writes:

>> Toke Høiland-Jørgensen <toke at toke.dk> writes:
>> 
>> > Luca Muscariello <muscariello at ieee.org> writes:
>> >
>> >> On Tue, Nov 12, 2019 at 2:02 PM Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>> >>
>> >>> Mikael Abrahamsson <swmike at swm.pp.se> writes:
>> >>>
>> >>> > On Tue, 12 Nov 2019, Toke Høiland-Jørgensen wrote:
>> >>> >
>> >>> >> I'm not on the nanog list, but feel free to cross-post; would be good to
>> >>> >> actually get to the bottom of this issue! Marek and I already had an
>> >>> >> off-list back-and-forth after that original thread, and we couldn't find
>> >>> >> anything wrong on the Cloudflare side. And the RSTs have a higher TTL
>> >>> >> than the actual traffic, indicating an in-path problem...
>> >>> >
>> >>> > tcptraceroute supports setting/clearing ECN bits (-E), would be very
>> >>> > interesting to see difference between those tcptraceroutes?
>> >>>
>> >>> No difference. But the RST is not being sent as a response to the SYN;
>> >>> it is sent in response to the first data packet...
>> >>>
>> >>> ... and now that I'm re-testing, things were working for a little while,
>> >>> but now the bug is back. I got an intermittent successful connection
>> >>> with the same TTL that I was previously getting the RST from. And now
>> >>> I'm back to getting RSTed.
>> >>>
>> >>> So I guess there's some kind of multipath issue here; ECMP path,
>> >>> multiple routing upstreams, or a broken load balancer? Any other ideas?
>> >>>
>> >>
>> >>
>> >> It makes me think of some usage of anycast TCP on the cloudflare side.
>> >> What service is this Toke?
>> >
>> > Yeah, I did also think about anycast when I said "multiple routing
>> > upstreams". For testing I've just been doing 'curl 1.1.1.1'. But
>> > Cloudflare-hosted sites in general seem to have this problem; for
>> > instance, 'curl -4 bufferbloat.net' also fails (but IPv6 is fine).
>> 
>> Right, so I've played around with tcptraceroute a bit more, and looked
>> at some more packet dumps, and I think I'm starting to form a theory:
>> 
>> I get two different traceroutes; this was from running two traceroutes
>> right after one another:
>> 
>> $ sudo tcptraceroute 1.1.1.1
>> Selected device eth0, address 10.42.3.130, port 42177 for outgoing packets
>> Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
>>  1  10.42.3.1  0.318 ms  0.325 ms  0.321 ms
>>  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.337 ms  5.390 ms  3.194 ms
>>  3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.319 ms  1.120 ms  1.256 ms
>>  4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.533 ms  1.612 ms  1.392 ms
>>  5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.787 ms  6.822 ms  6.721 ms
>>  6  149.6.142.130  7.000 ms  6.939 ms  6.948 ms
>>  7  one.one.one.one (1.1.1.1) [open]  6.957 ms  6.967 ms  6.893 ms
>>  
>> $ sudo tcptraceroute 1.1.1.1
>> Selected device eth0, address 10.42.3.130, port 38681 for outgoing packets
>> Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
>>  1  10.42.3.1  0.290 ms  0.287 ms  0.292 ms
>>  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.857 ms  5.382 ms  18.654 ms
>>  3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.249 ms  1.121 ms  1.521 ms
>>  4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.375 ms  2.495 ms  1.440 ms
>>  5  dix.as13335.net (192.38.7.70)  2.093 ms  1.895 ms  1.790 ms
>>  6  one.one.one.one (1.1.1.1) [open]  1.783 ms  1.861 ms  1.817 ms
>> 
>> 
>> Notice how one is one hop longer than the other.
>
> Worse than just longer, it appears as if the exit hop from gigabit.dk
> goes to 2 different providers (hop 4 above).  If these are packets towards
> an anycast address that is going to cause exactly what you see.  ECMP
> across multiple ASes towards anycast is.. umm.. very fragile, and you're
> seeing one of the problems with anycast.
>
> It is very unlikely that he.net and cogentco.com end up at the same
> 1.1.1.1 box.

Yeah, did notice it was two different upstreams :)

>> So definitely something
>> to do with anycast; maybe ECMP over both paths since it's changing
>> pretty often?
>
> And the multipath is set to round robin perhaps?

Not round-robin. The path seeming to change at random turned out to be
my mistake: by default, tcptraceroute picks a new source port each
time. If I fix the source port, I get the same path every time, so it
looks like the router is hashing on packet headers.
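A minimal sketch of the header-hashing behaviour described above, assuming the router picks a next hop by hashing the TCP 5-tuple. The function name `ecmp_path` and the use of SHA-256 are hypothetical illustrations; real routers use their own vendor-specific hash functions. It shows why pinning the source port pins the path, while a fresh source port per run can land on either upstream.

```python
import hashlib
import ipaddress
import struct

TCP = 6


def ecmp_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              proto: int, n_paths: int) -> int:
    """Return the index of the next hop chosen for this flow.

    Hypothetical ECMP hash: SHA-256 over the packed 5-tuple, reduced
    modulo the number of equal-cost paths.
    """
    key = (ipaddress.IPv4Address(src_ip).packed
           + ipaddress.IPv4Address(dst_ip).packed
           + struct.pack("!HHB", src_port, dst_port, proto))
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


# Pinned source port: every probe of the flow hashes identically.
pinned = {ecmp_path("10.42.3.130", "1.1.1.1", 42177, 80, TCP, 2)
          for _ in range(5)}
print(len(pinned))  # 1

# A new source port per run (tcptraceroute's default) spreads the
# runs across both upstreams.
spread = {ecmp_path("10.42.3.130", "1.1.1.1", port, 80, TCP, 2)
          for port in range(40000, 41000)}
print(sorted(spread))
```

Hashing on the 5-tuple is what keeps a single TCP connection on one path; in this sketch nothing that varies within a connection feeds into the hash.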

Going back to regular UDP-based traceroute, I finally found what looks
like the smoking gun:

$ traceroute 1.1.1.1 -q 1 --sport=10000 -t 1
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  _gateway (10.42.3.1)  0.304 ms
 2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  3.935 ms
 3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.005 ms
 4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.361 ms
 5  netnod-ix-cph-blue-9000.cloudflare.com (212.237.192.246)  1.250 ms
 6  one.one.one.one (1.1.1.1)  1.380 ms

$ traceroute 1.1.1.1 -q 1 --sport=10000 -t 2
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  _gateway (10.42.3.1)  0.236 ms
 2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  53.833 ms
 3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.195 ms
 4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.979 ms
 5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.851 ms
 6  149.6.142.130 (149.6.142.130)  13.081 ms
 7  one.one.one.one (1.1.1.1)  1.842 ms

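-t sets the TOS byte, whose two low-order bits are the ECN field, so the
values 1 and 2 correspond to ECT(1) and ECT(0) respectively. As you can
see, the two take different paths. That would be consistent with the
SYN (which RFC 3168 forbids from carrying an ECT codepoint) going one
way and the ECT(0)-marked data packets going another.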
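A sketch of the theory, assuming a hypothetical router whose ECMP hash covers the whole TOS byte, ECN bits included. `ecn_codepoint` and `ecmp_path_with_tos` are illustrative names, and SHA-256 stands in for whatever hash the real box uses. The ECN codepoint table follows RFC 3168, which also says the SYN is sent without an ECT codepoint, while data on an ECN-capable connection is marked ECT(0). Under that assumption, a connection's SYN and its data can hash to different upstreams.

```python
import hashlib
import ipaddress
import struct

# ECN field = the two low-order bits of the IPv4 TOS byte (RFC 3168).
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def ecn_codepoint(tos: int) -> str:
    """Name the ECN codepoint carried in a TOS byte."""
    return ECN_NAMES[tos & 0b11]


def ecmp_path_with_tos(src_ip, dst_ip, src_port, dst_port, proto, tos, n_paths):
    """Hypothetical ECMP hash that (as theorised) includes the full TOS byte."""
    key = (ipaddress.IPv4Address(src_ip).packed
           + ipaddress.IPv4Address(dst_ip).packed
           + struct.pack("!HHBB", src_port, dst_port, proto, tos))
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_paths


print(ecn_codepoint(1), ecn_codepoint(2))  # ECT(1) ECT(0)

# One TCP connection: the SYN is Not-ECT (TOS 0x00), data is ECT(0) (TOS 0x02).
flow = ("10.42.3.130", "1.1.1.1", 42177, 80, 6)
syn_path = ecmp_path_with_tos(*flow, tos=0x00, n_paths=2)
data_path = ecmp_path_with_tos(*flow, tos=0x02, n_paths=2)
print(syn_path, data_path)

# Count source ports for which SYN and data would be split across upstreams.
split = sum(
    ecmp_path_with_tos("10.42.3.130", "1.1.1.1", port, 80, 6, 0x00, 2)
    != ecmp_path_with_tos("10.42.3.130", "1.1.1.1", port, 80, 6, 0x02, 2)
    for port in range(40000, 41000)
)
print(f"{split} of 1000 source ports split SYN and data")
```

In this sketch, around half of the source ports send the SYN and the data down different upstreams, which would fit connections that sometimes work and sometimes get RSTed.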

-Toke