[Ecn-sane] Meanwhile, over on NANOG...

Discussion of explicit congestion notification's impact on the Internet

* [Ecn-sane] Meanwhile, over on NANOG...
@ 2019-11-12 12:07 Rich Brown
  2019-11-12 12:20 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Brown @ 2019-11-12 12:07 UTC (permalink / raw)
  To: ecn-sane

I saw this message crop up in the NANOG digest today... https://mailman.nanog.org/pipermail/nanog/2019-November/104045.html

Rich

> Message: 9
> Date: Mon, 11 Nov 2019 16:55:14 -0800
> From: Owen DeLong <owen@delong.com>
> To: Baldur Norddahl <baldur.norddahl@gmail.com>
> Cc: nanog@nanog.org
> Subject: Re: ECN
> Message-ID: <59956815-A2BB-4DA3-AD57-C746C49FD617@delong.com>
> Content-Type: text/plain;	charset=utf-8
> 
> 
> 
>> On Nov 11, 2019, at 05:01 , Baldur Norddahl <baldur.norddahl@gmail.com> wrote:
>> 
>> Hello
>> 
>> I have a customer that believes my network has an ECN problem. We do not, we just move packets. But how do I prove it?
> 
> Are you saying that none of your routers support ECN or that you think ECN only applies to endpoints?
> 
>> Is there a tool that checks for ECN trouble? Ideally something I could run on the NLNOG Ring network.
>> 
>> I believe it likely that it is the destination that has the problem.
> 
> I’d start by asking the reporter to provide a PCAP of the problem, then review the packet trace for clues about tap points
> in your network where you can investigate whether ECN is occurring where it should be, or whether the opposite is occurring.
> 
> Owen
> 


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-12 12:07 [Ecn-sane] Meanwhile, over on NANOG Rich Brown
@ 2019-11-12 12:20 ` Toke Høiland-Jørgensen
  2019-11-12 12:25   ` Mikael Abrahamsson
  0 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-11-12 12:20 UTC (permalink / raw)
  To: Rich Brown, ecn-sane

Rich Brown <richb.hanover@gmail.com> writes:

> I saw this message crop up in the NANOG digest today...
> https://mailman.nanog.org/pipermail/nanog/2019-November/104045.html

Heh. The guy asking is from my ISP; I'm pretty sure I'm that customer... :D

Ref:
https://lists.bufferbloat.net/pipermail/ecn-sane/2019-September/000514.html

I'm not on the nanog list, but feel free to cross-post; would be good to
actually get to the bottom of this issue! Marek and I already had an
off-list back-and-forth after that original thread, and we couldn't find
anything wrong on the Cloudflare side. And the RSTs have a higher TTL
than the actual traffic, indicating an in-path problem...
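The TTL argument can be made concrete with a rough hop estimate. This is only a sketch: it assumes the sender started from one of the common initial TTLs (64, 128 or 255), which is a guess about the remote stack and not something the traces prove. The values 59 and 61 are the ones from the packet dumps further down the thread; `estimate_hops` is a made-up helper name.

```python
# Common initial TTL values. Which one a given sender uses is an assumption.
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> int:
    """Guess the hop count as (smallest common initial TTL >= observed) - observed."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial - observed_ttl
    raise ValueError(f"TTL out of range: {observed_ttl}")


if __name__ == "__main__":
    for label, ttl in (("regular traffic", 59), ("RST", 61)):
        print(f"{label}: ttl={ttl}, roughly {estimate_hops(ttl)} hops away")
```

Under that assumption the RST would originate about two hops closer than the regular replies, which is what points at something in the path and not at the far end.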

-Toke


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-12 12:20 ` Toke Høiland-Jørgensen
@ 2019-11-12 12:25   ` Mikael Abrahamsson
  2019-11-12 13:02     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 17+ messages in thread
From: Mikael Abrahamsson @ 2019-11-12 12:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Rich Brown, ecn-sane


On Tue, 12 Nov 2019, Toke Høiland-Jørgensen wrote:

> I'm not on the nanog list, but feel free to cross-post; would be good to 
> actually get to the bottom of this issue! Marek and I already had an 
> off-list back-and-forth after that original thread, and we couldn't find 
> anything wrong on the Cloudflare side. And the RSTs have a higher TTL 
> than the actual traffic, indicating an in-path problem...

tcptraceroute supports setting/clearing ECN bits (-E), would be very 
interesting to see difference between those tcptraceroutes?

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-12 12:25   ` Mikael Abrahamsson
@ 2019-11-12 13:02     ` Toke Høiland-Jørgensen
  2019-11-12 13:54       ` Luca Muscariello
  0 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-11-12 13:02 UTC (permalink / raw)
  To: Mikael Abrahamsson; +Cc: Rich Brown, ecn-sane

Mikael Abrahamsson <swmike@swm.pp.se> writes:

> On Tue, 12 Nov 2019, Toke Høiland-Jørgensen wrote:
>
>> I'm not on the nanog list, but feel free to cross-post; would be good to 
>> actually get to the bottom of this issue! Marek and I already had an 
>> off-list back-and-forth after that original thread, and we couldn't find 
>> anything wrong on the Cloudflare side. And the RSTs have a higher TTL 
>> than the actual traffic, indicating an in-path problem...
>
> tcptraceroute supports setting/clearing ECN bits (-E), would be very 
> interesting to see difference between those tcptraceroutes?

No difference. But the RST is not being sent as a response to the SYN;
it is sent in response to the first data packet...

... and now that I'm re-testing, things were working for a little while,
but now the bug is back. I got an intermittent successful connection
with the same TTL that I was previously getting the RST from. And now
I'm back to getting RSTed.

So I guess there's some kind of multipath issue here; ECMP path,
multiple routing upstreams, or a broken load balancer? Any other ideas?

-Toke


tcpdump output:

With ECN, and failing. Notice TTL 59 for the SYNACK, but TTL 61 for the
RST:

 00:00:00.000000 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 63, id 21817, offset 0, flags [DF], proto TCP (6), length 60)
    85.204.121.218.33376 > 1.1.1.1.80: Flags [SEW], cksum 0x5284 (correct), seq 1677914250, win 64240, options [mss 1460,sackOK,TS val 438384324 ecr 0,nop,wscale 7], length 0
 00:00:00.006962 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 59, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    1.1.1.1.80 > 85.204.121.218.33376: Flags [S.E], cksum 0x4e79 (correct), seq 1887212753, ack 1677914251, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 10], length 0
 00:00:00.000614 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 21818, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.33376 > 1.1.1.1.80: Flags [.], cksum 0xffa8 (correct), seq 1, ack 1, win 502, length 0
 00:00:00.000255 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 125: (tos 0x2,ECT(0), ttl 63, id 21819, offset 0, flags [DF], proto TCP (6), length 111)
    85.204.121.218.33376 > 1.1.1.1.80: Flags [P.], cksum 0x05e5 (correct), seq 1:72, ack 1, win 502, length 71: HTTP, length: 71
        GET / HTTP/1.1
        Host: 1.1.1.1
        User-Agent: curl/7.66.0
        Accept: */*

 00:00:00.001714 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 60: (tos 0x2,ECT(0), ttl 61, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    1.1.1.1.80 > 85.204.121.218.33376: Flags [R], cksum 0x5639 (correct), seq 1887212754, win 0, length 0
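
For anyone who wants to check their own captures for the same SYNACK/RST mismatch, here is a minimal sketch that pulls the TTL out of `tcpdump -v` style text and groups it by sender and by whether the segment is a RST. The function names are hypothetical, the regexes are only matched against the output format shown in this message, and the sample is two packets copied from the dump above. A mismatch is a hint, not proof.

```python
import re
from collections import defaultdict

# IP header line, e.g. "(tos 0x2,ECT(0), ttl 61, id 0, ..."
IP_RE = re.compile(r"\(tos 0x[0-9a-f]+.*?ttl (\d+)")
# TCP line, e.g. "1.1.1.1.80 > 85.204.121.218.33376: Flags [R], ..."
TCP_RE = re.compile(r"(\S+) > \S+: Flags \[([^\]]+)\]")

SAMPLE = """\
 00:00:00.006962 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 59, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    1.1.1.1.80 > 85.204.121.218.33376: Flags [S.E], cksum 0x4e79 (correct), seq 1887212753, ack 1677914251, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 10], length 0
 00:00:00.001714 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 60: (tos 0x2,ECT(0), ttl 61, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    1.1.1.1.80 > 85.204.121.218.33376: Flags [R], cksum 0x5639 (correct), seq 1887212754, win 0, length 0
"""


def ttls_by_sender_and_kind(dump: str) -> dict:
    """Map (source addr.port, 'RST' or 'other') to the set of TTLs seen."""
    seen = defaultdict(set)
    ttl = None  # TTL from the most recent IP header line
    for line in dump.splitlines():
        tcp = TCP_RE.search(line)
        if tcp and ttl is not None:
            kind = "RST" if "R" in tcp.group(2) else "other"
            seen[(tcp.group(1), kind)].add(ttl)
            ttl = None
            continue
        ip = IP_RE.search(line)
        if ip:
            ttl = int(ip.group(1))
    return dict(seen)


def suspicious_rst_senders(seen: dict) -> list:
    """Senders whose RSTs carry a TTL never seen on their other segments.

    A sender that only ever sent RSTs is flagged as well.
    """
    flagged = []
    for (src, kind), ttls in seen.items():
        if kind == "RST" and not ttls <= seen.get((src, "other"), set()):
            flagged.append(src)
    return sorted(flagged)


if __name__ == "__main__":
    result = ttls_by_sender_and_kind(SAMPLE)
    print(result)
    print("RST TTL mismatch from:", suspicious_rst_senders(result))
```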


Without ECN; succeeding, with TTL 59:

 00:00:00.000000 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 63, id 54830, offset 0, flags [DF], proto TCP (6), length 60)
    85.204.121.218.33362 > 1.1.1.1.80: Flags [S], cksum 0x5430 (correct), seq 922398600, win 64240, options [mss 1460,sackOK,TS val 438346737 ecr 0,nop,wscale 7], length 0
 00:00:00.006895 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 59, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    1.1.1.1.80 > 85.204.121.218.33362: Flags [S.], cksum 0xbdf8 (correct), seq 1251654028, ack 922398601, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 10], length 0
 00:00:00.000570 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 54831, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.33362 > 1.1.1.1.80: Flags [.], cksum 0x6ee8 (correct), seq 1, ack 1, win 502, length 0
 00:00:00.000261 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 125: (tos 0x0, ttl 63, id 54832, offset 0, flags [DF], proto TCP (6), length 111)
    85.204.121.218.33362 > 1.1.1.1.80: Flags [P.], cksum 0x7524 (correct), seq 1:72, ack 1, win 502, length 71: HTTP, length: 71
        GET / HTTP/1.1
        Host: 1.1.1.1
        User-Agent: curl/7.66.0
        Accept: */*

 00:00:00.006955 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 59, id 46658, offset 0, flags [DF], proto TCP (6), length 40)
    1.1.1.1.80 > 85.204.121.218.33362: Flags [.], cksum 0x707a (correct), seq 1, ack 72, win 29, length 0
 00:00:00.004938 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 609: (tos 0x0, ttl 59, id 46659, offset 0, flags [DF], proto TCP (6), length 595)
    1.1.1.1.80 > 85.204.121.218.33362: Flags [P.], cksum 0x13dc (correct), seq 1:556, ack 72, win 29, length 555: HTTP, length: 555
        HTTP/1.1 301 Moved Permanently
        Date: Fri, 20 Sep 2019 09:33:56 GMT
        Content-Type: text/html
        Transfer-Encoding: chunked
        Connection: keep-alive
        Location: https://1.1.1.1/
        Served-In-Seconds: 0.000
        CF-Cache-Status: HIT
        Age: 3920
        Expires: Fri, 20 Sep 2019 13:33:56 GMT
        Cache-Control: public, max-age=14400
        Server: cloudflare
        CF-RAY: 5192ccfbeeefd47b-HAM

        ba
        <html>
        <head><title>301 Moved Permanently</title></head>
        <body bgcolor="white">
        <center><h1>301 Moved Permanently</h1></center>
        <hr><center>cloudflare-lb</center>
        </body>
        </html>

 00:00:00.000002 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 59, id 46660, offset 0, flags [DF], proto TCP (6), length 45)
    1.1.1.1.80 > 85.204.121.218.33362: Flags [P.], cksum 0x2a28 (correct), seq 556:561, ack 72, win 29, length 5: HTTP
 00:00:00.000549 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 54833, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.33362 > 1.1.1.1.80: Flags [.], cksum 0x6c77 (correct), seq 72, ack 556, win 501, length 0
 00:00:00.000266 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 54834, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.33362 > 1.1.1.1.80: Flags [.], cksum 0x6c72 (correct), seq 72, ack 561, win 501, length 0
 00:00:00.000217 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 54835, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.33362 > 1.1.1.1.80: Flags [F.], cksum 0x6c71 (correct), seq 72, ack 561, win 501, length 0
 00:00:00.007287 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 59, id 46661, offset 0, flags [DF], proto TCP (6), length 40)
    1.1.1.1.80 > 85.204.121.218.33362: Flags [F.], cksum 0x6e48 (correct), seq 561, ack 73, win 29, length 0
 00:00:00.000504 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 54836, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.33362 > 1.1.1.1.80: Flags [.], cksum 0x6c70 (correct), seq 73, ack 562, win 501, length 0
 00:00:05.170886 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 11852, offset 0, flags [DF], proto TCP (6), length 60)


And that one time it worked, with TTL 61:

13:47:54.908967 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 63, id 53207, offset 0, flags [DF], proto TCP (6), length 60)
    85.204.121.218.48924 > 1.1.1.1.80: Flags [SEW], cksum 0xa5de (correct), seq 3526272449, win 64240, options [mss 1460,sackOK,TS val 513441489 ecr 0,nop,wscale 7], length 0
13:47:54.910220 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 61, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    1.1.1.1.80 > 85.204.121.218.48924: Flags [S.E], cksum 0x17dd (correct), seq 633452041, ack 3526272450, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 10], length 0
13:47:54.910747 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 53208, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.48924 > 1.1.1.1.80: Flags [.], cksum 0xc90c (correct), seq 1, ack 1, win 502, length 0
13:47:54.910990 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 125: (tos 0x2,ECT(0), ttl 63, id 53209, offset 0, flags [DF], proto TCP (6), length 111)
    85.204.121.218.48924 > 1.1.1.1.80: Flags [P.], cksum 0xcf48 (correct), seq 1:72, ack 1, win 502, length 71: HTTP, length: 71
	GET / HTTP/1.1
	Host: 1.1.1.1
	User-Agent: curl/7.66.0
	Accept: */*
	
13:47:55.119451 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 125: (tos 0x0, ttl 63, id 53210, offset 0, flags [DF], proto TCP (6), length 111)
    85.204.121.218.48924 > 1.1.1.1.80: Flags [P.], cksum 0xcf48 (correct), seq 1:72, ack 1, win 502, length 71: HTTP, length: 71
	GET / HTTP/1.1
	Host: 1.1.1.1
	User-Agent: curl/7.66.0
	Accept: */*
	
13:47:55.120638 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 61, id 41447, offset 0, flags [DF], proto TCP (6), length 40)
    1.1.1.1.80 > 85.204.121.218.48924: Flags [.], cksum 0xca9e (correct), seq 1, ack 72, win 29, length 0
13:47:55.130264 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 609: (tos 0x2,ECT(0), ttl 61, id 41448, offset 0, flags [DF], proto TCP (6), length 595)
    1.1.1.1.80 > 85.204.121.218.48924: Flags [P.], cksum 0xde5e (correct), seq 1:556, ack 72, win 29, length 555: HTTP, length: 555
	HTTP/1.1 301 Moved Permanently
	Date: Tue, 12 Nov 2019 12:47:55 GMT
	Content-Type: text/html
	Transfer-Encoding: chunked
	Connection: keep-alive
	Location: https://1.1.1.1/
	Served-In-Seconds: 0.000
	CF-Cache-Status: HIT
	Age: 2976
	Expires: Tue, 12 Nov 2019 16:47:55 GMT
	Cache-Control: public, max-age=14400
	Server: cloudflare
	CF-RAY: 53489e018ad8d885-CPH
	
	ba
	<html>
	<head><title>301 Moved Permanently</title></head>
	<body bgcolor="white">
	<center><h1>301 Moved Permanently</h1></center>
	<hr><center>cloudflare-lb</center>
	</body>
	</html>
	
13:47:55.130265 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 60: (tos 0x2,ECT(0), ttl 61, id 41449, offset 0, flags [DF], proto TCP (6), length 45)
    1.1.1.1.80 > 85.204.121.218.48924: Flags [P.], cksum 0x844c (correct), seq 556:561, ack 72, win 29, length 5: HTTP
13:47:55.130777 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 53211, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.48924 > 1.1.1.1.80: Flags [.], cksum 0xc69b (correct), seq 72, ack 556, win 501, length 0
13:47:55.131097 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 53212, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.48924 > 1.1.1.1.80: Flags [.], cksum 0xc696 (correct), seq 72, ack 561, win 501, length 0
13:47:55.131491 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 53213, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.48924 > 1.1.1.1.80: Flags [F.], cksum 0xc695 (correct), seq 72, ack 561, win 501, length 0
13:47:55.132804 cc:1a:fa:e2:bb:20 > d8:58:d7:00:1d:2c, ethertype IPv4 (0x0800), length 60: (tos 0x0, ttl 61, id 41450, offset 0, flags [DF], proto TCP (6), length 40)
    1.1.1.1.80 > 85.204.121.218.48924: Flags [F.], cksum 0xc86c (correct), seq 561, ack 73, win 29, length 0
13:47:55.133281 d8:58:d7:00:1d:2c > cc:1a:fa:e2:bb:20, ethertype IPv4 (0x0800), length 54: (tos 0x0, ttl 63, id 53214, offset 0, flags [DF], proto TCP (6), length 40)
    85.204.121.218.48924 > 1.1.1.1.80: Flags [.], cksum 0xc694 (correct), seq 73, ack 562, win 501, length 0
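
As a reading aid for the dumps above: the `tos` byte carries the DSCP in its upper six bits and the ECN codepoint in its lower two, and tcpdump prints the TCP flags as single letters. The sketch below decodes both; the codepoint values are the ones defined in RFC 3168, and the helper names are made up for illustration.

```python
# ECN codepoints from RFC 3168 (low two bits of the former ToS byte).
ECN_CODEPOINTS = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}

# Single-letter TCP flags as printed by tcpdump.
TCP_FLAG_NAMES = {
    "S": "SYN", "F": "FIN", "P": "PSH", "R": "RST",
    ".": "ACK", "U": "URG", "E": "ECE", "W": "CWR",
}


def decode_tos(tos: int) -> tuple:
    """Split a ToS byte into (DSCP value, ECN codepoint name)."""
    return tos >> 2, ECN_CODEPOINTS[tos & 0b11]


def decode_flags(flags: str) -> list:
    """Expand a tcpdump flag string such as 'SEW' into flag names."""
    return [TCP_FLAG_NAMES[c] for c in flags]


if __name__ == "__main__":
    print(decode_tos(0x2))      # data packets in the ECN-enabled dumps
    print(decode_flags("SEW"))  # ECN-setup SYN
    print(decode_flags("S.E"))  # ECN-setup SYN-ACK
```

So `[SEW]` followed by `[S.E]` is a normal ECN negotiation, and the `tos 0x2` data packets are the first ones actually marked ECT(0), which fits the RST arriving in response to the first data packet and not to the SYN.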



* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-12 13:02     ` Toke Høiland-Jørgensen
@ 2019-11-12 13:54       ` Luca Muscariello
  2019-11-12 14:35         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 17+ messages in thread
From: Luca Muscariello @ 2019-11-12 13:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Mikael Abrahamsson, Rich Brown, ECN-Sane


On Tue, Nov 12, 2019 at 2:02 PM Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Mikael Abrahamsson <swmike@swm.pp.se> writes:
>
> > On Tue, 12 Nov 2019, Toke Høiland-Jørgensen wrote:
> >
> >> I'm not on the nanog list, but feel free to cross-post; would be good
> to
> >> actually get to the bottom of this issue! Marek and I already had an
> >> off-list back-and-forth after that original thread, and we couldn't
> find
> >> anything wrong on the Cloudflare side. And the RSTs have a higher TTL
> >> than the actual traffic, indicating an in-path problem...
> >
> > tcptraceroute supports setting/clearing ECN bits (-E), would be very
> > interesting to see difference between those tcptraceroutes?
>
> No difference. But the RST is not being sent as a response to the SYN;
> it is sent in response to the first data packet...
>
> ... and now that I'm re-testing, things were working for a little while,
> but now the bug is back. I got an intermittent successful connection
> with the same TTL that I was previously getting the RST from. And now
> I'm back to getting RSTed.
>
> So I guess there's some kind of multipath issue here; ECMP path,
> multiple routing upstreams, or a broken load balancer? Any other ideas?
>


It makes me think of some usage of anycast TCP on the cloudflare side.
What service is this Toke?




>
> -Toke



* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-12 13:54       ` Luca Muscariello
@ 2019-11-12 14:35         ` Toke Høiland-Jørgensen
  2019-11-12 22:01           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-11-12 14:35 UTC (permalink / raw)
  To: Luca Muscariello; +Cc: Mikael Abrahamsson, Rich Brown, ECN-Sane

Luca Muscariello <muscariello@ieee.org> writes:

> On Tue, Nov 12, 2019 at 2:02 PM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
>> Mikael Abrahamsson <swmike@swm.pp.se> writes:
>>
>> > On Tue, 12 Nov 2019, Toke Høiland-Jørgensen wrote:
>> >
>> >> I'm not on the nanog list, but feel free to cross-post; would be good
>> to
>> >> actually get to the bottom of this issue! Marek and I already had an
>> >> off-list back-and-forth after that original thread, and we couldn't
>> find
>> >> anything wrong on the Cloudflare side. And the RSTs have a higher TTL
>> >> than the actual traffic, indicating an in-path problem...
>> >
>> > tcptraceroute supports setting/clearing ECN bits (-E), would be very
>> > interesting to see difference between those tcptraceroutes?
>>
>> No difference. But the RST is not being sent as a response to the SYN;
>> it is sent in response to the first data packet...
>>
>> ... and now that I'm re-testing, things were working for a little while,
>> but now the bug is back. I got an intermittent successful connection
>> with the same TTL that I was previously getting the RST from. And now
>> I'm back to getting RSTed.
>>
>> So I guess there's some kind of multipath issue here; ECMP path,
>> multiple routing upstreams, or a broken load balancer? Any other ideas?
>>
>
>
> It makes me think of some usage of anycast TCP on the cloudflare side.
> What service is this Toke?

Yeah, I did also think about anycast when I said "multiple routing
upstreams". For testing I've just been doing 'curl 1.1.1.1'. But
Cloudflare-hosted sites in general seem to have this problem; for
instance, 'curl -4 bufferbloat.net' also fails (but IPv6 is fine).

-Toke


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-12 14:35         ` Toke Høiland-Jørgensen
@ 2019-11-12 22:01           ` Toke Høiland-Jørgensen
  2019-11-13  0:04             ` Rodney W. Grimes
  0 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-11-12 22:01 UTC (permalink / raw)
  To: Luca Muscariello; +Cc: Mikael Abrahamsson, Rich Brown, ECN-Sane

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> Luca Muscariello <muscariello@ieee.org> writes:
>
>> On Tue, Nov 12, 2019 at 2:02 PM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>
>>> Mikael Abrahamsson <swmike@swm.pp.se> writes:
>>>
>>> > On Tue, 12 Nov 2019, Toke Høiland-Jørgensen wrote:
>>> >
>>> >> I'm not on the nanog list, but feel free to cross-post; would be good to
>>> >> actually get to the bottom of this issue! Marek and I already had an
>>> >> off-list back-and-forth after that original thread, and we couldn't find
>>> >> anything wrong on the Cloudflare side. And the RSTs have a higher TTL
>>> >> than the actual traffic, indicating an in-path problem...
>>> >
>>> > tcptraceroute supports setting/clearing ECN bits (-E), would be very
>>> > interesting to see difference between those tcptraceroutes?
>>>
>>> No difference. But the RST is not being sent as a response to the SYN;
>>> it is sent in response to the first data packet...
>>>
>>> ... and now that I'm re-testing, things were working for a little while,
>>> but now the bug is back. I got an intermittent successful connection
>>> with the same TTL that I was previously getting the RST from. And now
>>> I'm back to getting RSTed.
>>>
>>> So I guess there's some kind of multipath issue here; ECMP path,
>>> multiple routing upstreams, or a broken load balancer? Any other ideas?
>>>
>>
>>
>> It makes me think of some usage of anycast TCP on the cloudflare side.
>> What service is this Toke?
>
> Yeah, I did also think about anycast when I said "multiple routing
> upstreams". For testing I've just been doing 'curl 1.1.1.1'. But
> Cloudflare-hosted sites in general seem to have this problem; for
> instance, 'curl -4 bufferbloat.net' also fails (but IPv6 is fine).

Right, so I've played around with tcptraceroute a bit more, and looked
at some more packet dumps, and I think I'm starting to form a theory:

I get two different traceroutes; this was from running two traceroutes
right after one another:

$ sudo tcptraceroute 1.1.1.1
Selected device eth0, address 10.42.3.130, port 42177 for outgoing packets
Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
 1  10.42.3.1  0.318 ms  0.325 ms  0.321 ms
 2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.337 ms  5.390 ms  3.194 ms
 3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.319 ms  1.120 ms  1.256 ms
 4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.533 ms  1.612 ms  1.392 ms
 5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.787 ms  6.822 ms  6.721 ms
 6  149.6.142.130  7.000 ms  6.939 ms  6.948 ms
 7  one.one.one.one (1.1.1.1) [open]  6.957 ms  6.967 ms  6.893 ms
 
$ sudo tcptraceroute 1.1.1.1
Selected device eth0, address 10.42.3.130, port 38681 for outgoing packets
Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
 1  10.42.3.1  0.290 ms  0.287 ms  0.292 ms
 2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.857 ms  5.382 ms  18.654 ms
 3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.249 ms  1.121 ms  1.521 ms
 4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.375 ms  2.495 ms  1.440 ms
 5  dix.as13335.net (192.38.7.70)  2.093 ms  1.895 ms  1.790 ms
 6  one.one.one.one (1.1.1.1) [open]  1.783 ms  1.861 ms  1.817 ms


Notice how one is one hop longer than the other. So definitely something
to do with anycast; maybe ECMP over both paths since it's changing
pretty often?

Now, what I was seeing with the ECN errors was that the SYN-ACK would
have a different TTL than the first data packet. So what I'm thinking is
that maybe there's an ECMP hash that hashes on the wrong parts of the
TCP header, and so considers the SYN packet with the ECN bit set to be
part of a different flow than the subsequent packets. The result being
that the flow is split between two anycasted endpoints, causing the RST.

Does this sound completely out in the weeds? Has anyone else run into an
ECMP device that did something similar?
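
A minimal Python sketch of the failure mode theorised above. It assumes a
hypothetical ECMP device that picks a next hop by hashing the flow's
addresses and ports together with the IPv4 TOS byte; the path labels, the
use of SHA-256 and all helper names are illustrative and not taken from
any real router. Per RFC 3168 the SYN is sent Not-ECT while later data
packets carry ECT(0), so the two can land on different paths:

```python
import hashlib
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    tos: int  # IPv4 TOS byte: DSCP in the upper 6 bits, ECN in the lower 2


PATHS = ["upstream-A", "upstream-B"]  # hypothetical next hops


def pick_path(pkt: Packet, hash_tos: bool) -> str:
    """Choose a next hop by hashing header fields (illustrative only)."""
    fields = [pkt.src, pkt.dst, pkt.sport, pkt.dport]
    if hash_tos:
        fields.append(pkt.tos)  # the suspected bug: TOS/ECN is part of the key
    digest = hashlib.sha256(repr(fields).encode()).digest()
    return PATHS[int.from_bytes(digest[:4], "big") % len(PATHS)]


def flow_is_split(sport: int, hash_tos: bool) -> bool:
    """True if the SYN and the first data packet take different paths."""
    syn = Packet("10.42.3.130", "1.1.1.1", sport, 80, tos=0x00)   # Not-ECT
    data = Packet("10.42.3.130", "1.1.1.1", sport, 80, tos=0x02)  # ECT(0)
    return pick_path(syn, hash_tos) != pick_path(data, hash_tos)


ports = range(40000, 40200)
split_with_tos = sum(flow_is_split(p, hash_tos=True) for p in ports)
split_without_tos = sum(flow_is_split(p, hash_tos=False) for p in ports)
print(f"flows split when TOS is hashed:     {split_with_tos} of {len(ports)}")
print(f"flows split when TOS is not hashed: {split_without_tos} of {len(ports)}")
```

With two anycast sites behind the two paths, every split flow would have
its handshake completed at one site and its data delivered to another,
which has no state for the connection and answers with a RST.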

-Toke


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-12 22:01           ` Toke Høiland-Jørgensen
@ 2019-11-13  0:04             ` Rodney W. Grimes
  2019-11-13  8:05               ` Luca Muscariello
  2019-11-13 10:43               ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 17+ messages in thread
From: Rodney W. Grimes @ 2019-11-13  0:04 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Luca Muscariello, Rich Brown, ECN-Sane

> Toke Høiland-Jørgensen <toke@toke.dk> writes:
> 
> > Luca Muscariello <muscariello@ieee.org> writes:
> >
> >> On Tue, Nov 12, 2019 at 2:02 PM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> >>
> >>> Mikael Abrahamsson <swmike@swm.pp.se> writes:
> >>>
> >>> > On Tue, 12 Nov 2019, Toke Høiland-Jørgensen wrote:
> >>> >
> >>> >> I'm not on the nanog list, but feel free to cross-post; would be good to
> >>> >> actually get to the bottom of this issue! Marek and I already had an
> >>> >> off-list back-and-forth after that original thread, and we couldn't find
> >>> >> anything wrong on the Cloudflare side. And the RSTs have a higher TTL
> >>> >> than the actual traffic, indicating an in-path problem...
> >>> >
> >>> > tcptraceroute supports setting/clearing ECN bits (-E), would be very
> >>> > interesting to see difference between those tcptraceroutes?
> >>>
> >>> No difference. But the RST is not being sent as a response to the SYN;
> >>> it is sent in response to the first data packet...
> >>>
> >>> ... and now that I'm re-testing, things were working for a little while,
> >>> but now the bug is back. I got an intermittent successful connection
> >>> with the same TTL that I was previously getting the RST from. And now
> >>> I'm back to getting RSTed.
> >>>
> >>> So I guess there's some kind of multipath issue here; ECMP path,
> >>> multiple routing upstreams, or a broken load balancer? Any other ideas?
> >>>
> >>
> >>
> >> It makes me think of some usage of anycast TCP on the cloudflare side.
> >> What service is this Toke?
> >
> > Yeah, I did also think about anycast when I said "multiple routing
> > upstreams". For testing I've just been doing 'curl 1.1.1.1'. But
> > Cloudflare-hosted sites in general seem to have this problem; for
> > instance, 'curl -4 bufferbloat.net' also fails (but IPv6 is fine).
> 
> Right, so I've played around with tcptraceroute a bit more, and looked
> at some more packet dumps, and I think I'm starting to form a theory:
> 
> I get two different traceroutes; this was from running two traceroutes
> right after one another:
> 
> $ sudo tcptraceroute 1.1.1.1
> Selected device eth0, address 10.42.3.130, port 42177 for outgoing packets
> Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
>  1  10.42.3.1  0.318 ms  0.325 ms  0.321 ms
>  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.337 ms  5.390 ms  3.194 ms
>  3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.319 ms  1.120 ms  1.256 ms
>  4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.533 ms  1.612 ms  1.392 ms
>  5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.787 ms  6.822 ms  6.721 ms
>  6  149.6.142.130  7.000 ms  6.939 ms  6.948 ms
>  7  one.one.one.one (1.1.1.1) [open]  6.957 ms  6.967 ms  6.893 ms
>  
> $ sudo tcptraceroute 1.1.1.1
> Selected device eth0, address 10.42.3.130, port 38681 for outgoing packets
> Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
>  1  10.42.3.1  0.290 ms  0.287 ms  0.292 ms
>  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.857 ms  5.382 ms  18.654 ms
>  3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.249 ms  1.121 ms  1.521 ms
>  4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.375 ms  2.495 ms  1.440 ms
>  5  dix.as13335.net (192.38.7.70)  2.093 ms  1.895 ms  1.790 ms
>  6  one.one.one.one (1.1.1.1) [open]  1.783 ms  1.861 ms  1.817 ms
> 
> 
> Notice how one is one hop longer than the other.

Worse than just longer, it appears as if the exit hop from gigabit.dk
goes to 2 different providers (hop 4 above).  If these are packets towards
an anycast address, that is going to cause exactly what you see.  ECMP
across multiple ASes towards anycast is.. umm.. very fragile, and you're
seeing one of the problems with anycast.

It is very unlikely that he.net and cogentco.com end up at the same
1.1.1.1 box.

> So definitely something
> to do with anycast; maybe ECMP over both paths since it's changing
> pretty often?

And the multipath is set to round robin perhaps?

> Now, what I was seeing with the ECN errors was that the SYN-ACK would
> have a different TTL than the first data packet. So what I'm thinking is
> that maybe there's an ECMP hash that hashes on the wrong parts of the
> TCP header, and so considers the SYN packet with the ECN bit set to be
> part of a different flow than the subsequent packets. The result being
> that the flow is split between two anycasted endpoints, causing the RST.
> 
> Does this sound completely out in the weeds?
Nope, you're spot on, other than this is an ECMP issue, not an ECN issue.
> Has anyone else run into an
> ECMP device that did something similar?

Yes.  When round robin path selection is in use.

> -Toke

-- 
Rod Grimes                                                 rgrimes@freebsd.org


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-13  0:04             ` Rodney W. Grimes
@ 2019-11-13  8:05               ` Luca Muscariello
  2019-11-13 10:45                 ` Toke Høiland-Jørgensen
  2019-11-13 10:43               ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 17+ messages in thread
From: Luca Muscariello @ 2019-11-13  8:05 UTC (permalink / raw)
  To: Rodney W. Grimes; +Cc: ECN-Sane, Rich Brown, Toke Høiland-Jørgensen


TCP anycast fails in this case and I would not blame the load balancer for
that.
Some people will have a different opinion on that.

The current Internet just does not support these use cases well.

At the same time this DNS service is supposed to be used in a different
way. So we may even blame the user? Toke, in this case?

DNS anycast works as long as it uses UDP.
The IP address returned by the resolver should be unicast and TCP should
run over unicast addresses.

Toke,  Looks like you are doing an HTTP GET directly toward an anycast
address. This is where things are supposed to break and they break.

If you traceroute over unicast addresses you should see the load balancer
providing stickiness.



On Wed 13 Nov 2019 at 01:04, Rodney W. Grimes <4bone@gndrsh.dnsmgr.net>
wrote:

> > Toke Høiland-Jørgensen <toke@toke.dk> writes:
> >
> > > Luca Muscariello <muscariello@ieee.org> writes:
> > >
> > >> On Tue, Nov 12, 2019 at 2:02 PM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> > >>
> > >>> Mikael Abrahamsson <swmike@swm.pp.se> writes:
> > >>>
> > >>> > On Tue, 12 Nov 2019, Toke Høiland-Jørgensen wrote:
> > >>> >
> > >>> >> I'm not on the nanog list, but feel free to cross-post; would be good to
> > >>> >> actually get to the bottom of this issue! Marek and I already had an
> > >>> >> off-list back-and-forth after that original thread, and we couldn't find
> > >>> >> anything wrong on the Cloudflare side. And the RSTs have a higher TTL
> > >>> >> than the actual traffic, indicating an in-path problem...
> > >>> >
> > >>> > tcptraceroute supports setting/clearing ECN bits (-E), would be very
> > >>> > interesting to see difference between those tcptraceroutes?
> > >>>
> > >>> No difference. But the RST is not being sent as a response to the SYN;
> > >>> it is sent in response to the first data packet...
> > >>>
> > >>> ... and now that I'm re-testing, things were working for a little while,
> > >>> but now the bug is back. I got an intermittent successful connection
> > >>> with the same TTL that I was previously getting the RST from. And now
> > >>> I'm back to getting RSTed.
> > >>>
> > >>> So I guess there's some kind of multipath issue here; ECMP path,
> > >>> multiple routing upstreams, or a broken load balancer? Any other ideas?
> > >>>
> > >>
> > >>
> > >> It makes me think of some usage of anycast TCP on the cloudflare side.
> > >> What service is this Toke?
> > >
> > > Yeah, I did also think about anycast when I said "multiple routing
> > > upstreams". For testing I've just been doing 'curl 1.1.1.1'. But
> > > Cloudflare-hosted sites in general seem to have this problem; for
> > > instance, 'curl -4 bufferbloat.net' also fails (but IPv6 is fine).
> >
> > Right, so I've played around with tcptraceroute a bit more, and looked
> > at some more packet dumps, and I think I'm starting to form a theory:
> >
> > I get two different traceroutes; this was from running two traceroutes
> > right after one another:
> >
> > $ sudo tcptraceroute 1.1.1.1
> > Selected device eth0, address 10.42.3.130, port 42177 for outgoing packets
> > Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
> >  1  10.42.3.1  0.318 ms  0.325 ms  0.321 ms
> >  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.337 ms  5.390 ms  3.194 ms
> >  3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.319 ms  1.120 ms  1.256 ms
> >  4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.533 ms  1.612 ms  1.392 ms
> >  5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.787 ms  6.822 ms  6.721 ms
> >  6  149.6.142.130  7.000 ms  6.939 ms  6.948 ms
> >  7  one.one.one.one (1.1.1.1) [open]  6.957 ms  6.967 ms  6.893 ms
> >
> > $ sudo tcptraceroute 1.1.1.1
> > Selected device eth0, address 10.42.3.130, port 38681 for outgoing packets
> > Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
> >  1  10.42.3.1  0.290 ms  0.287 ms  0.292 ms
> >  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.857 ms  5.382 ms  18.654 ms
> >  3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.249 ms  1.121 ms  1.521 ms
> >  4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.375 ms  2.495 ms  1.440 ms
> >  5  dix.as13335.net (192.38.7.70)  2.093 ms  1.895 ms  1.790 ms
> >  6  one.one.one.one (1.1.1.1) [open]  1.783 ms  1.861 ms  1.817 ms
> >
> >
> > Notice how one is one hop longer than the other.
>
> Worse than just longer, it appears as if the exit hop from gigabit.dk
> goes to 2 different providers (hop 4 above).  If these are packets towards
> an anycast address, that is going to cause exactly what you see.  ECMP
> across multiple ASes towards anycast is.. umm.. very fragile, and you're
> seeing one of the problems with anycast.
>
> It is very unlikely that he.net and cogentco.com end up at the same
> 1.1.1.1 box.
>
> > So definitely something
> > to do with anycast; maybe ECMP over both paths since it's changing
> > pretty often?
>
> And the multipath is set to round robin perhaps?
>
> > Now, what I was seeing with the ECN errors was that the SYN-ACK would
> > have a different TTL than the first data packet. So what I'm thinking is
> > that maybe there's an ECMP hash that hashes on the wrong parts of the
> > TCP header, and so considers the SYN packet with the ECN bit set to be
> > part of a different flow than the subsequent packets. The result being
> > that the flow is split between two anycasted endpoints, causing the RST.
> >
> > Does this sound completely out in the weeds?
> Nope, you're spot on, other than this is an ECMP issue, not an ECN issue.
> > Has anyone else run into an
> > ECMP device that did something similar?
>
> Yes.  When round robin path selection is in use.
>
> > -Toke
>
> --
> Rod Grimes
> rgrimes@freebsd.org
>


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-13  0:04             ` Rodney W. Grimes
  2019-11-13  8:05               ` Luca Muscariello
@ 2019-11-13 10:43               ` Toke Høiland-Jørgensen
  2019-11-13 15:25                 ` Rodney W. Grimes
  1 sibling, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-11-13 10:43 UTC (permalink / raw)
  To: Rodney W. Grimes; +Cc: Luca Muscariello, Rich Brown, ECN-Sane

"Rodney W. Grimes" <4bone@gndrsh.dnsmgr.net> writes:

>> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>> 
>> > Luca Muscariello <muscariello@ieee.org> writes:
>> >
>> >> On Tue, Nov 12, 2019 at 2:02 PM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> >>
>> >>> Mikael Abrahamsson <swmike@swm.pp.se> writes:
>> >>>
>> >>> > On Tue, 12 Nov 2019, Toke Høiland-Jørgensen wrote:
>> >>> >
>> >>> >> I'm not on the nanog list, but feel free to cross-post; would be good to
>> >>> >> actually get to the bottom of this issue! Marek and I already had an
>> >>> >> off-list back-and-forth after that original thread, and we couldn't find
>> >>> >> anything wrong on the Cloudflare side. And the RSTs have a higher TTL
>> >>> >> than the actual traffic, indicating an in-path problem...
>> >>> >
>> >>> > tcptraceroute supports setting/clearing ECN bits (-E), would be very
>> >>> > interesting to see difference between those tcptraceroutes?
>> >>>
>> >>> No difference. But the RST is not being sent as a response to the SYN;
>> >>> it is sent in response to the first data packet...
>> >>>
>> >>> ... and now that I'm re-testing, things were working for a little while,
>> >>> but now the bug is back. I got an intermittent successful connection
>> >>> with the same TTL that I was previously getting the RST from. And now
>> >>> I'm back to getting RSTed.
>> >>>
>> >>> So I guess there's some kind of multipath issue here; ECMP path,
>> >>> multiple routing upstreams, or a broken load balancer? Any other ideas?
>> >>>
>> >>
>> >>
>> >> It makes me think of some usage of anycast TCP on the cloudflare side.
>> >> What service is this Toke?
>> >
>> > Yeah, I did also think about anycast when I said "multiple routing
>> > upstreams". For testing I've just been doing 'curl 1.1.1.1'. But
>> > Cloudflare-hosted sites in general seem to have this problem; for
>> > instance, 'curl -4 bufferbloat.net' also fails (but IPv6 is fine).
>> 
>> Right, so I've played around with tcptraceroute a bit more, and looked
>> at some more packet dumps, and I think I'm starting to form a theory:
>> 
>> I get two different traceroutes; this was from running two traceroutes
>> right after one another:
>> 
>> $ sudo tcptraceroute 1.1.1.1
>> Selected device eth0, address 10.42.3.130, port 42177 for outgoing packets
>> Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
>>  1  10.42.3.1  0.318 ms  0.325 ms  0.321 ms
>>  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.337 ms  5.390 ms  3.194 ms
>>  3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.319 ms  1.120 ms  1.256 ms
>>  4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.533 ms  1.612 ms  1.392 ms
>>  5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.787 ms  6.822 ms  6.721 ms
>>  6  149.6.142.130  7.000 ms  6.939 ms  6.948 ms
>>  7  one.one.one.one (1.1.1.1) [open]  6.957 ms  6.967 ms  6.893 ms
>>  
>> $ sudo tcptraceroute 1.1.1.1
>> Selected device eth0, address 10.42.3.130, port 38681 for outgoing packets
>> Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
>>  1  10.42.3.1  0.290 ms  0.287 ms  0.292 ms
>>  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.857 ms  5.382 ms  18.654 ms
>>  3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.249 ms  1.121 ms  1.521 ms
>>  4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.375 ms  2.495 ms  1.440 ms
>>  5  dix.as13335.net (192.38.7.70)  2.093 ms  1.895 ms  1.790 ms
>>  6  one.one.one.one (1.1.1.1) [open]  1.783 ms  1.861 ms  1.817 ms
>> 
>> 
>> Notice how one is one hop longer than the other.
>
> Worse than just longer, it appears as if the exit hop from gigabit.dk
> goes to 2 different providers (hop 4 above).  If these are packets towards
> an anycast address, that is going to cause exactly what you see.  ECMP
> across multiple ASes towards anycast is.. umm.. very fragile, and you're
> seeing one of the problems with anycast.
>
> It is very unlikely that he.net and cogentco.com end up at the same
> 1.1.1.1 box.

Yeah, did notice it was two different upstreams :)

>> So definitely something
>> to do with anycast; maybe ECMP over both paths since it's changing
>> pretty often?
>
> And the multipath is set to round robin perhaps?

Not round-robin. That it was changing simply at random turns out to be
my mistake; by default tcptraceroute will pick a new source port each
time. If I fix the source port I get the same path each time, so it
looks like it's hashing on headers.
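
The difference between the two behaviours can be sketched in Python. This
is only an illustration under assumed names: the two path labels and the
SHA-256 based hash are hypothetical stand-ins for whatever the real device
does. A per-flow hash gives the same answer every time for a fixed source
port and only varies when the port changes, whereas round-robin alternates
regardless of the headers:

```python
import hashlib
from itertools import cycle

PATHS = ["path-A", "path-B"]  # hypothetical equal-cost next hops


def hashed_path(src: str, dst: str, sport: int, dport: int) -> str:
    """Per-flow selection: the same header fields always map to the same path."""
    key = f"{src}|{dst}|{sport}|{dport}".encode()
    index = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(PATHS)
    return PATHS[index]


# Fixed source port: ten probes, always the same path.
fixed = {hashed_path("10.42.3.130", "1.1.1.1", 10000, 80) for _ in range(10)}

# A new source port per run (as tcptraceroute does by default): both paths show up.
varied = {hashed_path("10.42.3.130", "1.1.1.1", sport, 80)
          for sport in range(30000, 30200)}

# Round-robin ignores the headers and simply alternates per packet.
rr = cycle(PATHS)
round_robin = [next(rr) for _ in range(4)]

print("fixed source port   :", sorted(fixed))
print("varying source port :", sorted(varied))
print("round-robin         :", round_robin)
```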

Going back to regular UDP-based traceroute, I finally found what looks
to be the smoking gun:

$ traceroute 1.1.1.1 -q 1 --sport=10000 -t 1
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  _gateway (10.42.3.1)  0.304 ms
 2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  3.935 ms
 3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.005 ms
 4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.361 ms
 5  netnod-ix-cph-blue-9000.cloudflare.com (212.237.192.246)  1.250 ms
 6  one.one.one.one (1.1.1.1)  1.380 ms
 
$ traceroute 1.1.1.1 -q 1 --sport=10000 -t 2
traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
 1  _gateway (10.42.3.1)  0.236 ms
 2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  53.833 ms
 3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.195 ms
 4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.979 ms
 5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.851 ms
 6  149.6.142.130 (149.6.142.130)  13.081 ms
 7  one.one.one.one (1.1.1.1)  1.842 ms

-t is the TOS value; so those two happen to correspond to ECT(1) and
ECT(0); and as you can see they go two different paths. Which would be
consistent with the SYN going one way and the data packets going
another.
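
For reference, the mapping from the -t value to the ECN codepoint can be
checked with a few lines of Python. The helper name decode_tos is made up
for this sketch; the bit layout (DSCP in the upper six bits of the TOS
byte, ECN in the lower two) and the codepoint names follow RFC 3168:

```python
# ECN codepoints as defined in RFC 3168 (lower two bits of the TOS byte).
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def decode_tos(tos: int) -> tuple[int, str]:
    """Split a TOS byte into (DSCP value, ECN codepoint name)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must fit in one byte")
    return tos >> 2, ECN_NAMES[tos & 0b11]


for tos in (1, 2):
    dscp, ecn = decode_tos(tos)
    print(f"-t {tos}: DSCP={dscp} ECN={ecn}")
# prints:
# -t 1: DSCP=0 ECN=ECT(1)
# -t 2: DSCP=0 ECN=ECT(0)
```

Both probes therefore carry DSCP 0 and differ only in the ECN field.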

-Toke


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-13  8:05               ` Luca Muscariello
@ 2019-11-13 10:45                 ` Toke Høiland-Jørgensen
  2019-11-13 15:36                   ` Rodney W. Grimes
  0 siblings, 1 reply; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-11-13 10:45 UTC (permalink / raw)
  To: Luca Muscariello, Rodney W. Grimes; +Cc: ECN-Sane, Rich Brown

Luca Muscariello <muscariello@ieee.org> writes:

> TCP anycast fails in this case and I would not blame the load balancer for
> that.
> Some people will have a different opinion on that.
>
> The current Internet just does not support these use cases well.
>
> At the same time this DNS service is supposed to be used in a different
> way. So we may even blame the user? Toke, in this case?
>
> DNS anycast works as long as it uses UDP.
> The IP address returned by the resolver should be unicast and TCP should
> run over unicast addresses.
>
> Toke,  Looks like you are doing an HTTP GET directly toward an anycast
> address. This is where things are supposed to break and they break.

I was just using 1.1.1.1 as a convenient example because it's easy to
type. I get the same behaviour to an actual web site hosted on
Cloudflare (which is how I discovered it in the first place). Cloudflare
makes heavy use of anycast, including to its HTTP endpoints.

> If you traceroute over unicast addresses you should see the load
> balancer providing stickiness.

As I replied to Rod, the non-stickiness was indeed user error on my
part. The problem is that the load balancer is hashing on headers
including the ECN bits.
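
One possible remedy, sketched in Python under the assumption that the
device really does feed the whole TOS byte into its flow hash: clear the
two ECN bits before hashing, so that Not-ECT, ECT(0), ECT(1) and CE
packets of one flow all hash alike. The function name, the key layout and
the use of SHA-256 are hypothetical; whether DSCP should influence path
selection at all is a separate choice, and this sketch simply leaves it in:

```python
import hashlib

ECN_MASK = 0b11  # lower two bits of the TOS byte carry the ECN codepoint


def flow_hash(src: str, dst: str, sport: int, dport: int,
              tos: int, mask_ecn: bool) -> int:
    """Hash a flow key; optionally ignore the ECN bits of the TOS byte."""
    if mask_ecn:
        tos &= ~ECN_MASK & 0xFF  # keep DSCP, drop ECN
    key = f"{src}|{dst}|{sport}|{dport}|{tos}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


flow = ("10.42.3.130", "1.1.1.1", 10000, 80)
masked = {flow_hash(*flow, tos=t, mask_ecn=True) for t in (0, 1, 2, 3)}
unmasked = {flow_hash(*flow, tos=t, mask_ecn=False) for t in (0, 1, 2, 3)}

print("distinct hashes with ECN masked  :", len(masked))
print("distinct hashes with ECN included:", len(unmasked))
```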

I guess I'll go reply to the NANOG thread... :)

-Toke


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-13 10:43               ` Toke Høiland-Jørgensen
@ 2019-11-13 15:25                 ` Rodney W. Grimes
  2019-11-13 15:35                   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 17+ messages in thread
From: Rodney W. Grimes @ 2019-11-13 15:25 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Rodney W. Grimes, Luca Muscariello, Rich Brown, ECN-Sane

> "Rodney W. Grimes" <4bone@gndrsh.dnsmgr.net> writes:
> 
> >> Toke Høiland-Jørgensen <toke@toke.dk> writes:
> >> 
> >> > Luca Muscariello <muscariello@ieee.org> writes:
> >> >
> >> >> On Tue, Nov 12, 2019 at 2:02 PM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> >> >>
> >> >>> Mikael Abrahamsson <swmike@swm.pp.se> writes:
> >> >>>
> >> >>> > On Tue, 12 Nov 2019, Toke Høiland-Jørgensen wrote:
> >> >>> >
> >> >>> >> I'm not on the nanog list, but feel free to cross-post; would be good to
> >> >>> >> actually get to the bottom of this issue! Marek and I already had an
> >> >>> >> off-list back-and-forth after that original thread, and we couldn't find
> >> >>> >> anything wrong on the Cloudflare side. And the RSTs have a higher TTL
> >> >>> >> than the actual traffic, indicating an in-path problem...
> >> >>> >
> >> >>> > tcptraceroute supports setting/clearing ECN bits (-E), would be very
> >> >>> > interesting to see difference between those tcptraceroutes?
> >> >>>
> >> >>> No difference. But the RST is not being sent as a response to the SYN;
> >> >>> it is sent in response to the first data packet...
> >> >>>
> >> >>> ... and now that I'm re-testing, things were working for a little while,
> >> >>> but now the bug is back. I got an intermittent successful connection
> >> >>> with the same TTL that I was previously getting the RST from. And now
> >> >>> I'm back to getting RSTed.
> >> >>>
> >> >>> So I guess there's some kind of multipath issue here; ECMP path,
> >> >>> multiple routing upstreams, or a broken load balancer? Any other ideas?
> >> >>>
> >> >>
> >> >>
> >> >> It makes me think of some usage of anycast TCP on the cloudflare side.
> >> >> What service is this Toke?
> >> >
> >> > Yeah, I did also think about anycast when I said "multiple routing
> >> > upstreams". For testing I've just been doing 'curl 1.1.1.1'. But
> >> > Cloudflare-hosted sites in general seem to have this problem; for
> >> > instance, 'curl -4 bufferbloat.net' also fails (but IPv6 is fine).
> >> 
> >> Right, so I've played around with tcptraceroute a bit more, and looked
> >> at some more packet dumps, and I think I'm starting to form a theory:
> >> 
> >> I get two different traceroutes; this was from running two traceroutes
> >> right after one another:
> >> 
> >> $ sudo tcptraceroute 1.1.1.1
> >> Selected device eth0, address 10.42.3.130, port 42177 for outgoing packets
> >> Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
> >>  1  10.42.3.1  0.318 ms  0.325 ms  0.321 ms
> >>  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.337 ms  5.390 ms  3.194 ms
> >>  3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.319 ms  1.120 ms  1.256 ms
> >>  4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.533 ms  1.612 ms  1.392 ms
> >>  5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.787 ms  6.822 ms  6.721 ms
> >>  6  149.6.142.130  7.000 ms  6.939 ms  6.948 ms
> >>  7  one.one.one.one (1.1.1.1) [open]  6.957 ms  6.967 ms  6.893 ms
> >>  
> >> $ sudo tcptraceroute 1.1.1.1
> >> Selected device eth0, address 10.42.3.130, port 38681 for outgoing packets
> >> Tracing the path to 1.1.1.1 on TCP port 80 (http), 30 hops max
> >>  1  10.42.3.1  0.290 ms  0.287 ms  0.292 ms
> >>  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  1.857 ms  5.382 ms  18.654 ms
> >>  3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.249 ms  1.121 ms  1.521 ms
> >>  4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.375 ms  2.495 ms  1.440 ms
> >>  5  dix.as13335.net (192.38.7.70)  2.093 ms  1.895 ms  1.790 ms
> >>  6  one.one.one.one (1.1.1.1) [open]  1.783 ms  1.861 ms  1.817 ms
> >> 
> >> 
> >> Notice how one is one hop longer than the other.
> >
> > Worse than just longer, it appears as if the exit hop from gigabit.dk
> > goes to 2 different providers (hop 4 above).  If these are packets towards
> > an anycast address, that is going to cause exactly what you see.  ECMP
> > across multiple ASes towards anycast is.. umm.. very fragile, and you're
> > seeing one of the problems with anycast.
> >
> > It is very unlikely that he.net and cogentco.com end up at the same
> > 1.1.1.1 box.
> 
> Yeah, did notice it was two different upstreams :)
> 
> >> So definitely something
> >> to do with anycast; maybe ECMP over both paths since it's changing
> >> pretty often?
> >
> > And the multipath is set to round robin perhaps?
> 
> Not round-robin. That it was changing simply at random turns out to be
> my mistake; by default tcptraceroute will pick a new source port each
> time. If I fix the source port I get the same path each time, so it
> looks like it's hashing on headers.
> 
> Going back to regular UDP-based traceroute, I finally found what looks
> to be the smoking gun:
> 
> $ traceroute 1.1.1.1 -q 1 --sport=10000 -t 1
> traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
>  1  _gateway (10.42.3.1)  0.304 ms
>  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  3.935 ms
>  3  customer-185-24-168-46.ip4.gigabit.dk (185.24.168.46)  1.005 ms
>  4  te0-1-1-5.rcr21.cph01.atlas.cogentco.com (149.6.137.49)  1.361 ms
>  5  netnod-ix-cph-blue-9000.cloudflare.com (212.237.192.246)  1.250 ms
>  6  one.one.one.one (1.1.1.1)  1.380 ms
>  
> $ traceroute 1.1.1.1 -q 1 --sport=10000 -t 2
> traceroute to 1.1.1.1 (1.1.1.1), 30 hops max, 60 byte packets
>  1  _gateway (10.42.3.1)  0.236 ms
>  2  albertslund-edge1-lo.net.gigabit.dk (185.24.171.254)  53.833 ms
>  3  customer-185-24-168-38.ip4.gigabit.dk (185.24.168.38)  1.195 ms
>  4  10ge1-2.core1.cph1.he.net (216.66.83.101)  1.979 ms
>  5  be2306.ccr42.ham01.atlas.cogentco.com (130.117.3.237)  6.851 ms
>  6  149.6.142.130 (149.6.142.130)  13.081 ms
>  7  one.one.one.one (1.1.1.1)  1.842 ms
> 
> -t is the TOS value; so those two happen to correspond to ECT(1) and
> ECT(0); and as you can see they go two different paths. Which would be
> consistent with the SYN going one way and the data packets going
> another.

Perhaps the device is old enough that it is treating that field as the TOS byte?

Looks like you have nailed it though, someone has a broken hash.

> -Toke
-- 
Rod Grimes                                                 rgrimes@freebsd.org


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-13 15:25                 ` Rodney W. Grimes
@ 2019-11-13 15:35                   ` Toke Høiland-Jørgensen
  2019-11-13 15:36                     ` Luca Muscariello
  2019-11-13 15:42                     ` Rodney W. Grimes
  0 siblings, 2 replies; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-11-13 15:35 UTC (permalink / raw)
  To: Rodney W. Grimes; +Cc: Rodney W. Grimes, Luca Muscariello, Rich Brown, ECN-Sane

"Rodney W. Grimes" <4bone@gndrsh.dnsmgr.net> writes:

>> -t is the TOS value; so those two happen to correspond to ECT(1) and
>> ECT(0); and as you can see they go two different paths. Which would be
>> consistent with the SYN going one way and the data packets going
>> another.
>
> Perhaps the device is old enough that it is treating that field as the TOS byte?
>
> Looks like you have nailed it though, someone has a broken hash.

Yup, seems like it. Posted a writeup to the NANOG list in response to
the guy asking; it hasn't shown up in the archive, though, so I guess
it's still in the moderation queue.

I think I'll write the whole thing up as a blog post as well, once it's
resolved. I'll see if I can get them to tell me which router make and
model is doing this.

Thanks everyone who helped with ideas etc! :)

-Toke


* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-13 10:45                 ` Toke Høiland-Jørgensen
@ 2019-11-13 15:36                   ` Rodney W. Grimes
  0 siblings, 0 replies; 17+ messages in thread
From: Rodney W. Grimes @ 2019-11-13 15:36 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Luca Muscariello, Rodney W. Grimes, ECN-Sane, Rich Brown

> Luca Muscariello <muscariello@ieee.org> writes:
> 
> > TCP anycast fails in this case and I would not blame the load balancer for
> > that.
> > Some people will have a different opinion on that.
> >
> > The current Internet just does not support these use cases well.
> >
> > At the same time this DNS service is supposed to be used in a different
> > way. So we may even blame the user? Toke, in this case?
> >
> > DNS anycast works as long as it uses UDP.
> > The IP address returned by the resolver should be unicast and TCP should
> > run over unicast addresses.
> >
> > Toke, it looks like you are doing an HTTP GET directly toward an anycast
> > address. This is where things are supposed to break and they break.
> 
> I was just using 1.1.1.1 as a convenient example because it's easy to
> type. I get the same behaviour to an actual web site hosted on
> Cloudflare (which is how I discovered it in the first place). Cloudflare
> makes heavy use of anycast, including to its HTTP endpoints.
> 
> > If you traceroute over unicast addresses you should see the load
> > balancer providing stickiness.
> 
> As I replied to Rod, the non-stickiness was indeed user error on my
> part. The problem is that the load balancer is hashing on headers
> including the ECN bits.
> 
> I guess I'll go reply to the NANOG thread... :)

While you're over there dealing with the operators, could you get a few
of them to show up on tsvwg and say how bad an idea it is to use ECT(1)
as a traffic classifier for admission to an L4S service?

It is that group of people that has the greatest experience with
how you cannot trust end nodes on how to treat traffic, especially
when that treatment MAY have some form of advantage, no matter how
trivial that advantage.

We need this group to be vocal, or L4S is going to end up doing
just that, and it is the NOGs that are going to get hurt.

> -Toke
-- 
Rod Grimes                                                 rgrimes@freebsd.org

* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-13 15:35                   ` Toke Høiland-Jørgensen
@ 2019-11-13 15:36                     ` Luca Muscariello
  2019-11-13 15:42                     ` Rodney W. Grimes
  1 sibling, 0 replies; 17+ messages in thread
From: Luca Muscariello @ 2019-11-13 15:36 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Rodney W. Grimes, Rich Brown, ECN-Sane

On Wed, Nov 13, 2019 at 4:35 PM Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> "Rodney W. Grimes" <4bone@gndrsh.dnsmgr.net> writes:
>
> >> -t is the TOS value; so those two happen to correspond to ECT(1) and
> >> ECT(0); and as you can see they go two different paths. Which would be
> >> consistent with the SYN going one way and the data packets going
> >> another.
> >
> > Perhaps it's old enough that they are treating that as the TOS byte?
> >
> > Looks like you have nailed it though, someone has a broken hash.
>
> Yup, seems like it. Posted a writeup to the NANOG list in response to
> the guy asking; it hasn't shown up in the archive, though, so I guess
> it's still in the moderation queue.
>
> I think I'll write the whole thing up as a blog post as well, once it's
> resolved. I'll see if I can get them to tell me which router make and
> model is doing this.
>
> Thanks everyone who helped with ideas etc! :)
>

Great! You'll get a free subscription for a full year!

> -Toke

* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-13 15:35                   ` Toke Høiland-Jørgensen
  2019-11-13 15:36                     ` Luca Muscariello
@ 2019-11-13 15:42                     ` Rodney W. Grimes
  2019-11-13 15:52                       ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 17+ messages in thread
From: Rodney W. Grimes @ 2019-11-13 15:42 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Rodney W. Grimes, Luca Muscariello, Rich Brown, ECN-Sane

> "Rodney W. Grimes" <4bone@gndrsh.dnsmgr.net> writes:
> 
> >> -t is the TOS value; so those two happen to correspond to ECT(1) and
> >> ECT(0); and as you can see they go two different paths. Which would be
> >> consistent with the SYN going one way and the data packets going
> >> another.
> >
> > Perhaps it's old enough that they are treating that as the TOS byte?
> >
> > Looks like you have nailed it though, someone has a broken hash.
> 
> Yup, seems like it. Posted a writeup to the NANOG list in response to
> the guy asking; it hasn't shown up in the archive, though, so I guess
> it's still in the moderation queue.
> 
> I think I'll write the whole thing up as a blog post as well, once it's
> resolved. I'll see if I can get them to tell me which router make and
> model is doing this.

Yes, please do write it up someplace.  It would probably be sane to
also start a list of "things that have been found (and fixed, if true)"
documenting brokenness in the ECN/RFC 3168 conformance of systems.

Even without the make and model, one can describe it as improper hashing
in ECMP routing equipment at foo.
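
To make "improper hashing" concrete, here is a rough sketch of the failure mode, under stated assumptions. The function pick_path, the SHA-256 based hash, the four-path fan-out and the documentation addresses are all invented for illustration; real routers use their own (usually undocumented) hash functions and field selections. The only protocol fact relied on is that, per RFC 3168, the TCP SYN is sent Not-ECT while later data packets of an ECN-enabled flow carry ECT(0). If the whole TOS byte goes into the hash key, the SYN and the data packets of one flow can land on different paths; masking the ECN bits (or leaving the TOS byte out of the key) keeps the flow on one path.

```python
import hashlib

NOT_ECT, ECT0 = 0b00, 0b10  # ECN codepoints (RFC 3168)


def pick_path(src, dst, sport, dport, proto, tos, n_paths, mask_ecn):
    """Choose an ECMP next-hop index for one packet (illustrative only)."""
    if mask_ecn:
        tos &= 0xFC  # keep the six DSCP bits, drop the two ECN bits
    key = f"{src}|{dst}|{sport}|{dport}|{proto}|{tos}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def flow_splits(sport, mask_ecn, n_paths=4):
    """True if the SYN and the data packets of one TCP flow take different paths."""
    flow = ("192.0.2.1", "198.51.100.1", sport, 80, 6)
    syn_path = pick_path(*flow, NOT_ECT, n_paths, mask_ecn)
    data_path = pick_path(*flow, ECT0, n_paths, mask_ecn)
    return syn_path != data_path


# Try 200 flows that differ only in source port.
broken = sum(flow_splits(p, mask_ecn=False) for p in range(40000, 40200))
fixed = sum(flow_splits(p, mask_ecn=True) for p in range(40000, 40200))

if __name__ == "__main__":
    print(f"flows split across paths: broken={broken}/200, fixed={fixed}/200")
```

With an anycast destination, a split like this can deliver the SYN to one site and the data packets to another, which matches the symptom discussed earlier in the thread.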

> Thanks everyone who helped with ideas etc! :)
> 
> -Toke
-- 
Rod Grimes                                                 rgrimes@freebsd.org

* Re: [Ecn-sane] Meanwhile, over on NANOG...
  2019-11-13 15:42                     ` Rodney W. Grimes
@ 2019-11-13 15:52                       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 17+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-11-13 15:52 UTC (permalink / raw)
  To: Rodney W. Grimes; +Cc: Rodney W. Grimes, Luca Muscariello, Rich Brown, ECN-Sane

"Rodney W. Grimes" <4bone@gndrsh.dnsmgr.net> writes:

>> "Rodney W. Grimes" <4bone@gndrsh.dnsmgr.net> writes:
>> 
>> >> -t is the TOS value; so those two happen to correspond to ECT(1) and
>> >> ECT(0); and as you can see they go two different paths. Which would be
>> >> consistent with the SYN going one way and the data packets going
>> >> another.
>> >
>> > Perhaps it's old enough that they are treating that as the TOS byte?
>> >
>> > Looks like you have nailed it though, someone has a broken hash.
>> 
>> Yup, seems like it. Posted a writeup to the NANOG list in response to
>> the guy asking; it hasn't shown up in the archive, though, so I guess
>> it's still in the moderation queue.
>> 
>> I think I'll write the whole thing up as a blog post as well, once it's
>> resolved. I'll see if I can get them to tell me which router make and
>> model is doing this.
>
> Yes, please do write it up someplace.  It would probably be sane to
> also start a list of "things that have been found (and fixed, if true)"
> documenting brokenness in the ECN/RFC 3168 conformance of systems.
>
> Even without the make and model, one can describe it as improper hashing
> in ECMP routing equipment at foo.

Yup, that was exactly my thought (documenting brokenness) :)

-Toke

end of thread, other threads:[~2019-11-13 15:52 UTC | newest]

Thread overview: 17+ messages
2019-11-12 12:07 [Ecn-sane] Meanwhile, over on NANOG Rich Brown
2019-11-12 12:20 ` Toke Høiland-Jørgensen
2019-11-12 12:25   ` Mikael Abrahamsson
2019-11-12 13:02     ` Toke Høiland-Jørgensen
2019-11-12 13:54       ` Luca Muscariello
2019-11-12 14:35         ` Toke Høiland-Jørgensen
2019-11-12 22:01           ` Toke Høiland-Jørgensen
2019-11-13  0:04             ` Rodney W. Grimes
2019-11-13  8:05               ` Luca Muscariello
2019-11-13 10:45                 ` Toke Høiland-Jørgensen
2019-11-13 15:36                   ` Rodney W. Grimes
2019-11-13 10:43               ` Toke Høiland-Jørgensen
2019-11-13 15:25                 ` Rodney W. Grimes
2019-11-13 15:35                   ` Toke Høiland-Jørgensen
2019-11-13 15:36                     ` Luca Muscariello
2019-11-13 15:42                     ` Rodney W. Grimes
2019-11-13 15:52                       ` Toke Høiland-Jørgensen
