From: d@taht.net (Dave Täht)
To: bismark-devel
Organization: Teklibre - http://www.teklibre.com
Subject: [Bismark-devel] ECN bug found, fixed
Date: Tue, 15 Mar 2011 11:33:02 -0600
Message-ID: <877hc0i90x.fsf_-_@cruithne.co.teklibre.org>
In-Reply-To: <1300164166.2649.70.camel@edumazet-laptop> (Eric Dumazet's message of "Tue, 15 Mar 2011 05:42:46 +0100")
References: <87wrk1a4gx.fsf@cruithne.co.teklibre.org> <5BC42741-852B-4699-BA5D-D70B8D610D96@gmail.com> <1300134277.2649.19.camel@edumazet-laptop> <1300164166.2649.70.camel@edumazet-laptop>

We think we've found, and are in the process of fixing, a bug in ECN
handling in Linux. I'm curious whether you had ECN turned on at all in
your initial testing, and which qdisc you were using.

Eric Dumazet writes:

> On Monday 14 March 2011 at 21:24 +0100, Eric Dumazet wrote:
>
> remove CC to bloat lists for now, adding David Miller to thread.
>
>> On Monday 14 March 2011 at 21:55 +0200, Jonathan Morton wrote:
>> > On 14 Mar, 2011, at 9:26 pm, Dave Täht wrote:
>> >
>> > > Over the weekend, Dan Siemons uncovered a possible bad interaction
>> > > between ECN and the default pfifo_fast qdisc in Linux.
>> > >
>> > > http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/
>> >
>> > This seems to be more complicated than it appears. It looks as though
>> > Linux has re-used the LSB of the old TOS field for some "link local"
>> > flag which is used by routing.
>> >
>> > It's not immediately obvious whether pfifo_fast is using this new
>> > interpretation though. If it isn't, the fix should be to remove the
>> > RTO_ONLINK bit from the mask it's using on the tos field. The other
>> > half of the mask correctly excludes the ECN bits from the field.
>> >
>>
>> CC netdev, where linux network dev can take a look.
>>
>> I would say that this is a wrong analysis:
>>
>> 1) ECN uses two low order bits of TOS byte
>>
>> 2) pfifo_fast uses skb->priority
>>
>>
>> skb->priority = rt_tos2priority(iph->tos);
>>
>> #define IPTOS_TOS_MASK 0x1E
>> #define IPTOS_TOS(tos) ((tos)&IPTOS_TOS_MASK)
>>
>> static inline char rt_tos2priority(u8 tos)
>> {
>>         return ip_tos2prio[IPTOS_TOS(tos)>>1];
>> }
>>
>> No interference between two mechanisms, unless sysadmin messed up things
>> (skb_edit)
>>
>>
>
> David, it seems ip_tos2prio is wrong on its 2nd entry:
>
> #define TC_PRIO_BESTEFFORT              0
> #define TC_PRIO_FILLER                  1
> #define TC_PRIO_BULK                    2
> #define TC_PRIO_INTERACTIVE_BULK        4
> #define TC_PRIO_INTERACTIVE             6
> #define TC_PRIO_CONTROL                 7
>
> #define TC_PRIO_MAX                     15
>
> net/ipv4/route.c:170:#define ECN_OR_COST(class) TC_PRIO_##class
>
> const __u8 ip_tos2prio[16] = {
>         TC_PRIO_BESTEFFORT,     /* 0 : for flow without ECN */
>         ECN_OR_COST(FILLER),    /* 1 : flow with ECN */
>         ...
> };
>
> This means ECN enabled flows got TC_PRIO_FILLER (what the hell is
> that ?)
>
> pfifo_fast has:
>
> static const u8 prio2band[TC_PRIO_MAX+1] =
>         { 1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1 };
>
> So a non ECN enabled flow goes to band 1, while an ECN enabled one is in
> band 2 (!). Thus, ECN enabled flows have a chance of being dropped more
> often than non ECN flows. That's not fair...
>
> What do you think?
>
> Thanks
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 6ed6603..fabfe81 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -171,7 +171,7 @@ static struct dst_ops ipv4_dst_ops = {
>  
>  const __u8 ip_tos2prio[16] = {
>         TC_PRIO_BESTEFFORT,
> -       ECN_OR_COST(FILLER),
> +       ECN_OR_COST(BESTEFFORT),
>         TC_PRIO_BESTEFFORT,
>         ECN_OR_COST(BESTEFFORT),
>         TC_PRIO_BULK,
>

-- 
Dave Taht
http://nex-6.taht.net
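
For anyone following the quoted analysis, the lookup chain (TOS byte,
then skb->priority, then pfifo_fast band) can be sketched in user-space
C. This is only an illustration under stated assumptions, not the kernel
code: the names sketch_band_for_tos, tos2prio_before, tos2prio_after and
SKETCH_ENTRIES are hypothetical, and only the first five ip_tos2prio
entries visible in the patch context are modelled, because the rest of
the table is elided in the thread. The macros, the TC_PRIO_* values and
the prio2band array are copied from the quoted mail.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the quoted mail. */
#define IPTOS_TOS_MASK 0x1E
#define IPTOS_TOS(tos) ((tos) & IPTOS_TOS_MASK)

#define TC_PRIO_BESTEFFORT 0
#define TC_PRIO_FILLER     1
#define TC_PRIO_BULK       2
#define TC_PRIO_MAX        15

/* Assumption: model only the five entries shown in the patch context. */
#define SKETCH_ENTRIES 5

/* Hypothetical name: table contents before the patch.
 * ECN_OR_COST(class) expands to TC_PRIO_##class, so it is written out. */
static const uint8_t tos2prio_before[SKETCH_ENTRIES] = {
    TC_PRIO_BESTEFFORT,
    TC_PRIO_FILLER,      /* ECN_OR_COST(FILLER) */
    TC_PRIO_BESTEFFORT,
    TC_PRIO_BESTEFFORT,  /* ECN_OR_COST(BESTEFFORT) */
    TC_PRIO_BULK,
};

/* Hypothetical name: table contents after the patch. */
static const uint8_t tos2prio_after[SKETCH_ENTRIES] = {
    TC_PRIO_BESTEFFORT,
    TC_PRIO_BESTEFFORT,  /* ECN_OR_COST(BESTEFFORT) */
    TC_PRIO_BESTEFFORT,
    TC_PRIO_BESTEFFORT,
    TC_PRIO_BULK,
};

/* pfifo_fast priority-to-band map, as quoted in the mail. */
static const uint8_t prio2band[TC_PRIO_MAX + 1] = {
    1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Hypothetical helper: return the pfifo_fast band a packet with this
 * TOS byte lands in, using the pre-patch or post-patch table. */
int sketch_band_for_tos(uint8_t tos, int patched)
{
    unsigned idx = IPTOS_TOS(tos) >> 1;  /* same index as rt_tos2priority() */
    const uint8_t *table = patched ? tos2prio_after : tos2prio_before;

    /* TOS values beyond the quoted entries are outside this sketch. */
    assert(idx < SKETCH_ENTRIES);
    return prio2band[table[idx]];
}
```

With these tables, sketch_band_for_tos(0x00, 0) gives band 1 and
sketch_band_for_tos(0x02, 0) gives band 2, which is the ECN-capable case
Eric describes (bit 0x02 of the TOS byte falls inside the 0x1E mask).
After the one-line patch, sketch_band_for_tos(0x02, 1) gives band 1, the
same as a flow without ECN.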