From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: <4bone@gndrsh.dnsmgr.net> Received: from gndrsh.dnsmgr.net (br1.CN84in.dnsmgr.net [69.59.192.140]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by lists.bufferbloat.net (Postfix) with ESMTPS id 973CE3B29D for ; Tue, 29 Dec 2020 08:34:01 -0500 (EST) Received: from gndrsh.dnsmgr.net (localhost [127.0.0.1]) by gndrsh.dnsmgr.net (8.13.3/8.13.3) with ESMTP id 0BTDXw0C077552; Tue, 29 Dec 2020 05:33:58 -0800 (PST) (envelope-from 4bone@gndrsh.dnsmgr.net) Received: (from 4bone@localhost) by gndrsh.dnsmgr.net (8.13.3/8.13.3/Submit) id 0BTDXwOf077551; Tue, 29 Dec 2020 05:33:58 -0800 (PST) (envelope-from 4bone) From: "Rodney W. Grimes" <4bone@gndrsh.dnsmgr.net> Message-Id: <202012291333.0BTDXwOf077551@gndrsh.dnsmgr.net> In-Reply-To: To: Dave Taht Date: Tue, 29 Dec 2020 05:33:58 -0800 (PST) CC: ECN-Sane X-Mailer: ELM [version 2.4ME+ PL121h (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII Subject: Re: [Ecn-sane] Fwd: [PATCH net] ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst() X-BeenThere: ecn-sane@lists.bufferbloat.net X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion of explicit congestion notification's impact on the Internet List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 29 Dec 2020 13:34:01 -0000 Dave, Thanks for the heads up on this... Comments in line about what, IMHO, is very bad practice. Regards, Rod > ---------- Forwarded message --------- > From: Guillaume Nault > Date: Thu, Dec 24, 2020 at 11:05 AM > Subject: [PATCH net] ipv4: Ignore ECN bits for fib lookups in > fib_compute_spec_dst() > To: David Miller , Jakub Kicinski > Cc: > > > RT_TOS() only clears one of the ECN bits. Therefore, when > fib_compute_spec_dst() resorts to a fib lookup, it can return > different results depending on the value of the second ECN bit. > > For example, ECT(0) and ECT(1) packets could be treated differently. > > $ ip netns add ns0 > $ ip netns add ns1 > $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1 > $ ip -netns ns0 link set dev lo up > $ ip -netns ns1 link set dev lo up > $ ip -netns ns0 link set dev veth01 up > $ ip -netns ns1 link set dev veth10 up > > $ ip -netns ns0 address add 192.0.2.10/24 dev veth01 > $ ip -netns ns1 address add 192.0.2.11/24 dev veth10 > > $ ip -netns ns1 address add 192.0.2.21/32 dev lo > $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21 > $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0 > > With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21 > (ping uses -Q to set all TOS and ECN bits): > > $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255 > [...] > 64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms > > But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11 > because the "tos 4" route isn't matched: > > $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255 > [...] > 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms > > After this patch the ECN bits don't affect the result anymore: > > $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255 > [...] > 64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms > > Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper.") > Signed-off-by: Guillaume Nault > --- > net/ipv4/fib_frontend.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c > index cdf6ec5aa45d..84bb707bd88d 100644 > --- a/net/ipv4/fib_frontend.c > +++ b/net/ipv4/fib_frontend.c > @@ -292,7 +292,7 @@ __be32 fib_compute_spec_dst(struct sk_buff *skb) > .flowi4_iif = LOOPBACK_IFINDEX, > .flowi4_oif = l3mdev_master_ifindex_rcu(dev), > .daddr = ip_hdr(skb)->saddr, > - .flowi4_tos = RT_TOS(ip_hdr(skb)->tos), > + .flowi4_tos = ip_hdr(skb)->tos & IPTOS_RT_MASK, This is, IMHO, a bad way to fix this. It addresses the issue by NOT using what should be the standard method of accessing tos bits. And further if RT_TOS is every properly fixed leaves a dangling place that would need to be cleaned up later. I understand that RT_TOS is legacy and returns the whole byte, that should probably be corrected as I suspect there are other lurking cases of use of RT_TOS that is also still consuming the ECN bits. The above one line patch is the "fast and dirty" way to do it, and leads to longer term maintainance issues. I assert this Opinion from my experience in trying to clean up IN_ADDR macros in the BSD's, 90% of what needing patching was caused mostly by special cases that didnt use the macros, and instead did there own bit banging with | and &. A better fix might be to create some new macros, make the old macro emmit a compile time warning to weed out the bad use cases: a) Modify RT_TOS to assert a compile time warning to flag old usage. b) create RT_TOSBYTE, this returns the whole byte, used to replace RT_TOS when the whole byte is needed and dealt with properly. c) create RT_TOSTOSBITS, this returns ONLY the TOS bits, and probably should replace most of the current calls to RT_TOS with this, after inspecting the code to make sure it does the right thing. d) create RT_TOSECNBITS, this returns ONLY the ECN bits > .flowi4_scope = scope, > .flowi4_mark = vmark ? skb->mark : 0, > }; > -- > 2.21.3 > > > > -- > "For a successful technology, reality must take precedence over public > relations, for Mother Nature cannot be fooled" - Richard Feynman > > dave@taht.net CTO, TekLibre, LLC Tel: 1-831-435-0729 > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane > > -- Rod Grimes rgrimes@freebsd.org