From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ed1-f66.google.com (mail-ed1-f66.google.com [209.85.208.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.bufferbloat.net (Postfix) with ESMTPS id AC5033CB35 for ; Mon, 4 Mar 2019 06:17:53 -0500 (EST) Received: by mail-ed1-f66.google.com with SMTP id m35so3903983ede.10 for ; Mon, 04 Mar 2019 03:17:53 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version:content-transfer-encoding; bh=Foguq34Z8MWNtNO1D6h1ODTILJCPyqr4S0K7B/HRW00=; b=BAMTsrO+U6DlGgoWuRT4gNcoosjJrH0I5JtM7kFmyaSPCxlxWKghquXA8e0ypOcJzp mwMhaC9ANxxqBmgKASAd4F/Oi9IpnJJM/++00kjqmJH63TKLxr3OxlrzOnkHbg51V+HR jntXIXBIzTC6Ud5zCvo1t22Zug9d8cChYun8L1jh/j0xcjg0ecPeBHBW74gKcpI3uTa4 nI/QmZAioGOdMDUxhh136fzmkMGHLD4s2uiw+jf7fslg0svogC3yu+7H7M6y0KrABhS1 MLAK+zKSfVBu9tJtdJrvID4T+iY3hzVyoBpHoga0VvXGHItt93dMnvlwhIv7OuXzgOK0 MQ5Q== X-Gm-Message-State: APjAAAU4ki3djl3IrOQHwVuANoZX0JwRHUXzG0LmL7ZG0TkJuv1hvGVF sj8+AUUc0pA9iNPAVHejsegP02KFS+s= X-Google-Smtp-Source: APXvYqwpkACHtTJr7ipvs4TBhuwP3w/pGtyN6GjQ7QCurMvHvoXIVbK3OUxKKHHNm+g6GMEtY/o9qg== X-Received: by 2002:a50:a5bc:: with SMTP id a57mr15253272edc.10.1551698272668; Mon, 04 Mar 2019 03:17:52 -0800 (PST) Received: from alrua-x1.borgediget.toke.dk (borgediget.toke.dk. [85.204.121.218]) by smtp.gmail.com with ESMTPSA id y24sm1965350eds.35.2019.03.04.03.17.51 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 04 Mar 2019 03:17:51 -0800 (PST) Received: by alrua-x1.borgediget.toke.dk (Postfix, from userid 1000) id 7218E183BB9; Mon, 4 Mar 2019 12:17:51 +0100 (CET) From: Toke =?utf-8?Q?H=C3=B8iland-J=C3=B8rgensen?= To: Kevin Darbyshire-Bryant , Pete Heist Cc: "cake\@lists.bufferbloat.net" In-Reply-To: References: <1443315187.406067.1551301968084@webmail.strato.de> <000901d4cf15$27748ec0$765dac40$@gmail.com> <67E78212-F68E-4BD1-946D-F1830EEA2968@darbyshire-bryant.me.uk> <4D01FA8A-E732-49C8-A4E6-8EA453CFA80C@heistp.net> X-Clacks-Overhead: GNU Terry Pratchett Date: Mon, 04 Mar 2019 12:17:51 +0100 Message-ID: <87d0n6lwgw.fsf@toke.dk> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Cake] Using firewall connmarks as tin selectors X-BeenThere: cake@lists.bufferbloat.net X-Mailman-Version: 2.1.20 Precedence: list List-Id: Cake - FQ_codel the next generation List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Mar 2019 11:17:54 -0000 Kevin Darbyshire-Bryant writes: >> On 4 Mar 2019, at 08:39, Pete Heist wrote: >>=20 >>=20 >>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant wrote: >>>=20 >>> The very bad idea: >>>=20 >>> And it=E2=80=99s bad =E2=80=98cos it=E2=80=99s sort of incompatible wit= h the existing fwmark >>> implementation as described above. So an awful lot of our >>> shenanigans above is due to DSCP not traversing the internet >>> particularly well. The solution above abstracts DSCP into =E2=80=99tins= =E2=80=99 >>> which we put into fwmarks. Another approach would be to put the DSCP >>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained >>> DSCP into the diffserv field onto the actual packets. Voila DSCP >>> traversal across =E2=80=99tinternet with tin/bandwidth allocation in our >>> local domain preserved. >>=20 >> If I understand it right, another use case for this =E2=80=9Cvery bad id= ea=E2=80=9D >> is preserving DSCP locally while traversing upstream WiFi links as >> besteffort, which avoids airtime efficiency problems that can occur >> with 802.11e (WMM). In cases where the router config can=E2=80=99t be ch= anged >> (802.11e is mandatory after all) I=E2=80=99ve used IPIP tunnels for this= , as >> it hides DSCP from the WiFi stack while preserving the values through >> the tunnel, but this would be easier. Neat=E2=80=A6 :) > > Everyone has understood the intent & maybe the implementation > correctly. 2 patches attached, one for cake, one for tc. > > They are naively coded and some of it undoes Toke=E2=80=99s recent tidyin= g up > (sorry!) Heh. First comment: Don't do that ;) A few more below. > 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A > diff --git a/pkt_sched.h b/pkt_sched.h > index a2f570c..d1f288d 100644 > --- a/pkt_sched.h > +++ b/pkt_sched.h > @@ -879,6 +879,7 @@ enum { > TCA_CAKE_ACK_FILTER, > TCA_CAKE_SPLIT_GSO, > TCA_CAKE_FWMARK, > + TCA_CAKE_ICING, > __TCA_CAKE_MAX > }; > #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1) > diff --git a/sch_cake.c b/sch_cake.c > index 733b897..5aca0f3 100644 > --- a/sch_cake.c > +++ b/sch_cake.c > @@ -270,7 +270,8 @@ enum { > CAKE_FLAG_INGRESS =3D BIT(2), > CAKE_FLAG_WASH =3D BIT(3), > CAKE_FLAG_SPLIT_GSO =3D BIT(4), > - CAKE_FLAG_FWMARK =3D BIT(5) > + CAKE_FLAG_FWMARK =3D BIT(5), > + CAKE_FLAG_ICING =3D BIT(6) This implies that icing and fwmark can be enabled completely independently of each other. Are you sure about the semantics for that? > }; >=20=20 > /* COBALT operates the Codel and BLUE algorithms in parallel, in order to > @@ -333,7 +334,7 @@ static const u8 diffserv8[] =3D { > }; >=20=20 > static const u8 diffserv4[] =3D { > - 0, 2, 0, 0, 2, 0, 0, 0, > + 0, 1, 0, 0, 2, 0, 0, 0, > 1, 0, 0, 0, 0, 0, 0, 0, > 2, 0, 2, 0, 2, 0, 2, 0, > 2, 0, 2, 0, 2, 0, 2, 0, > @@ -344,7 +345,7 @@ static const u8 diffserv4[] =3D { > }; >=20=20 > static const u8 diffserv3[] =3D { > - 0, 0, 0, 0, 2, 0, 0, 0, > + 0, 1, 0, 0, 2, 0, 0, 0, Why are you messing with the diffserv mappings in this patch? > 1, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0, > 0, 0, 0, 0, 0, 0, 0, 0, > @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, s= truct sk_buff **to_free) > return idx + (tin << 16); > } >=20=20 > -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash) > +void cake_update_diffserv(struct sk_buff *skb, u8 dscp) > +{ > + switch (skb->protocol) { > + case htons(ETH_P_IP): > + if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) !=3D dscp) > + ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp); > + break; > + case htons(ETH_P_IPV6): > + if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) !=3D dscp) > + ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp); > + break; > + default: > + break; > + } > + > +} So washing is just a special case of this (wash is cake_update_diffserv(skb,0)). So you shouldn't need to add another function, just augment the existing handling code. > +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash) > { > u8 dscp; >=20=20 > @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *sk= b, u16 wash) > } > } >=20=20 > +#if IS_REACHABLE(CONFIG_NF_CONNTRACK) Save an ifdef below by moving the ifdef inside the function definition. > +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp) > +{ > + enum ip_conntrack_info ctinfo; > + struct nf_conn *ct; > + > + ct =3D nf_ct_get(skb, &ctinfo); > + if (!ct) > + return; > + > + ct->mark &=3D 0x80ffffff; > + ct->mark |=3D (0x40 | dscp) << 24; Right, so we *might* have an argument that putting the *tin* into the fwmark is CAKE's business, but copying over the dscp mark is not something a qdisc should be doing... > + nf_conntrack_event_cache(IPCT_MARK, ct); > +} > +#endif Also, are you sure this will work in all permutations of conntrack being a module vs not etc? (we had to jump through quite some hoops to get the conntrack hooks to work last time; this is probably my biggest worry here). > static struct cake_tin_data *cake_select_tin(struct Qdisc *sch, > struct sk_buff *skb) > { > struct cake_sched_data *q =3D qdisc_priv(sch); > - u32 tin; > + bool wash; > u8 dscp; > + u8 tin; >=20=20 > - /* Tin selection: Default to diffserv-based selection, allow overriding > - * using firewall marks or skb->priority. > - */ > - dscp =3D cake_handle_diffserv(skb, > - q->rate_flags & CAKE_FLAG_WASH); > + wash =3D !!(q->rate_flags & CAKE_FLAG_WASH); > + > + if (q->tin_mode =3D=3D CAKE_DIFFSERV_BESTEFFORT) { >=20=20 > - if (q->tin_mode =3D=3D CAKE_DIFFSERV_BESTEFFORT) > tin =3D 0; > + if (wash) > + cake_update_diffserv(skb, 0); >=20=20 > - else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ > - skb->mark && > - skb->mark <=3D q->tin_cnt) > - tin =3D q->tin_order[skb->mark - 1]; > + } else if (TC_H_MAJ(skb->priority) =3D=3D sch->handle && /* use priorit= y */ > + TC_H_MIN(skb->priority) > 0 && > + TC_H_MIN(skb->priority) <=3D q->tin_cnt) { >=20=20 > - else if (TC_H_MAJ(skb->priority) =3D=3D sch->handle && > - TC_H_MIN(skb->priority) > 0 && > - TC_H_MIN(skb->priority) <=3D q->tin_cnt) > tin =3D q->tin_order[TC_H_MIN(skb->priority) - 1]; > + if (wash) > + cake_update_diffserv(skb, 0); > + > + } else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */ > + skb->mark & 0x40000000) { > + > + dscp =3D skb->mark >> 24 & 0x3f; > + tin =3D q->tin_index[dscp]; >=20=20 > - else { > + if (wash) > + cake_update_diffserv(skb, 0); > + else if (q->rate_flags & CAKE_FLAG_ICING) > + cake_update_diffserv(skb, dscp << 2); > + > + } else { /* fallback to DSCP */ > + /* extract the Diffserv Precedence field, if it exists */ > + /* and clear DSCP bits if washing */ > + dscp =3D cake_handle_diffserv(skb, wash); > tin =3D q->tin_index[dscp]; As I said above, no reason to revert the cleanup commit... > if (unlikely(tin >=3D q->tin_cnt)) > tin =3D 0; > + > +#if IS_REACHABLE(CONFIG_NF_CONNTRACK) > + if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_IN= GRESS)) > + cake_update_ct_mark(skb, dscp); > +#endif See above about moving the ifdef and losing this one. > } >=20=20 > return &q->tins[tin]; > @@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct n= lattr *opt, > q->rate_flags &=3D ~CAKE_FLAG_FWMARK; > } >=20=20 > + if (tb[TCA_CAKE_ICING]) { > + if (!!nla_get_u32(tb[TCA_CAKE_ICING])) > + q->rate_flags |=3D CAKE_FLAG_ICING; > + else > + q->rate_flags &=3D ~CAKE_FLAG_ICING; > + } > + > if (q->tins) { > sch_tree_lock(sch); > cake_reconfigure(sch); > @@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_= buff *skb) > !!(q->rate_flags & CAKE_FLAG_FWMARK))) > goto nla_put_failure; >=20=20 > + if (nla_put_u32(skb, TCA_CAKE_ICING, > + !!(q->rate_flags & CAKE_FLAG_ICING))) > + goto nla_put_failure; > + > return nla_nest_end(skb, opts); >=20=20 > nla_put_failure: > From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001 > From: Kevin Darbyshire-Bryant > Date: Wed, 27 Feb 2019 14:46:05 +0000 > Subject: [PATCH] cake: add fwmark & icing options > > Signed-off-by: Kevin Darbyshire-Bryant > --- > include/uapi/linux/pkt_sched.h | 2 ++ > man/man8/tc-cake.8 | 19 ++++++++++++++++ > tc/q_cake.c | 40 ++++++++++++++++++++++++++++++++++ > 3 files changed, 61 insertions(+) > > diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sche= d.h > index 01f96352..f2b1b270 100644 > --- a/include/uapi/linux/pkt_sched.h > +++ b/include/uapi/linux/pkt_sched.h > @@ -954,6 +954,8 @@ enum { > TCA_CAKE_INGRESS, > TCA_CAKE_ACK_FILTER, > TCA_CAKE_SPLIT_GSO, > + TCA_CAKE_FWMARK, > + TCA_CAKE_ICING, > __TCA_CAKE_MAX > }; > #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1) > diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8 > index eda436e1..626d4525 100644 > --- a/man/man8/tc-cake.8 > +++ b/man/man8/tc-cake.8 > @@ -73,6 +73,12 @@ TIME | > ] > .br > [ > +.BR fwmark > +| > +.BR nofwmark* > +] > +.br > +[ > .BR split-gso* > | > .BR no-split-gso > @@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it wil= l be used as both source and > destination host. >=20=20 >=20=20 > +.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS > + > +In addition to TC FILTER tin classification, firewall marks may also opt= ionally > +be used. The priority order (highest to lowest) for tin selection is TC= filter, > +firewall mark and then DSCP. > +.PP > +.B fwmark > + > +.br > + Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to t= he > +maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4. > +Values outside the valid range are ignored and CAKE will fall back to us= ing > +DSCP for tin selection. This should document the masking and shifting. >=20=20 > .SH EXAMPLES > # tc qdisc delete root dev eth0 > diff --git a/tc/q_cake.c b/tc/q_cake.c > index e827e3f1..fdafd3b7 100644 > --- a/tc/q_cake.c > +++ b/tc/q_cake.c > @@ -79,6 +79,8 @@ static void explain(void) > " dual-srchost | dual-dsthost | triple-isolate* ]\n" > " [ nat | nonat* ]\n" > " [ wash | nowash* ]\n" > +" [ icing | noicing* ]\n" > +" [ fwmark | nofwmark* ]\n" Much as I appreciate the wordplay, I'm not sure this is actually going to be super helpful for something just trying to make sense of the command output. Not sure I have a better suggestion, though... -Toke