Cake - FQ_codel the next generation
 help / color / mirror / Atom feed
From: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Pete Heist <pete@heistp.net>,
	"cake@lists.bufferbloat.net" <cake@lists.bufferbloat.net>
Subject: Re: [Cake] Using firewall connmarks as tin selectors
Date: Mon, 4 Mar 2019 11:55:01 +0000	[thread overview]
Message-ID: <52F1A35A-9F57-40EC-AF9B-EC6D8BD376BE@darbyshire-bryant.me.uk> (raw)
In-Reply-To: <87d0n6lwgw.fsf@toke.dk>



> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
> 
>>> On 4 Mar 2019, at 08:39, Pete Heist <pete@heistp.net> wrote:
>>> 
>>> 
>>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>>>> 
>>>> The very bad idea:
>>>> 
>>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>>> implementation as described above. So an awful lot of our
>>>> shenanigans above is due to DSCP not traversing the internet
>>>> particularly well. The solution above abstracts DSCP into ’tins’
>>>> which we put into fwmarks. Another approach would be to put the DSCP
>>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>>> local domain preserved.
>>> 
>>> If I understand it right, another use case for this “very bad idea”
>>> is preserving DSCP locally while traversing upstream WiFi links as
>>> besteffort, which avoids airtime efficiency problems that can occur
>>> with 802.11e (WMM). In cases where the router config can’t be changed
>>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>>> it hides DSCP from the WiFi stack while preserving the values through
>>> the tunnel, but this would be easier. Neat… :)
>> 
>> Everyone has understood the intent & maybe the implementation
>> correctly. 2 patches attached, one for cake, one for tc.
>> 
>> They are naively coded and some of it undoes Toke’s recent tidying up
>> (sorry!)
> 
> Heh. First comment: Don't do that ;)

I did say naively coded.

> 
> A few more below.
> 
>> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
>> diff --git a/pkt_sched.h b/pkt_sched.h
>> index a2f570c..d1f288d 100644
>> --- a/pkt_sched.h
>> +++ b/pkt_sched.h
>> @@ -879,6 +879,7 @@ enum {
>> 	TCA_CAKE_ACK_FILTER,
>> 	TCA_CAKE_SPLIT_GSO,
>> 	TCA_CAKE_FWMARK,
>> +	TCA_CAKE_ICING,
>> 	__TCA_CAKE_MAX
>> };
>> #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
>> diff --git a/sch_cake.c b/sch_cake.c
>> index 733b897..5aca0f3 100644
>> --- a/sch_cake.c
>> +++ b/sch_cake.c
>> @@ -270,7 +270,8 @@ enum {
>> 	CAKE_FLAG_INGRESS	   = BIT(2),
>> 	CAKE_FLAG_WASH		   = BIT(3),
>> 	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
>> -	CAKE_FLAG_FWMARK	   = BIT(5)
>> +	CAKE_FLAG_FWMARK	   = BIT(5),
>> +	CAKE_FLAG_ICING		   = BIT(6)
> 
> This implies that icing and fwmark can be enabled completely
> independently of each other. Are you sure about the semantics for that?

No, I’m not.  I sent the patches so others could see my lunacy in action and hopefully improve it.

The actual operation links FWMARK, INGRESS & ICING in a variety of combinations.

>> };
>> 
>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>> };
>> 
>> static const u8 diffserv4[] = {
>> -	0, 2, 0, 0, 2, 0, 0, 0,
>> +	0, 1, 0, 0, 2, 0, 0, 0,
>> 	1, 0, 0, 0, 0, 0, 0, 0,
>> 	2, 0, 2, 0, 2, 0, 2, 0,
>> 	2, 0, 2, 0, 2, 0, 2, 0,
>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>> };
>> 
>> static const u8 diffserv3[] = {
>> -	0, 0, 0, 0, 2, 0, 0, 0,
>> +	0, 1, 0, 0, 2, 0, 0, 0,
> 
> Why are you messing with the diffserv mappings in this patch?

This is a combination patch of Dave’s new LE coding and the fwmark/dscp mangling.

> 
>> 	1, 0, 0, 0, 0, 0, 0, 0,
>> 	0, 0, 0, 0, 0, 0, 0, 0,
>> 	0, 0, 0, 0, 0, 0, 0, 0,
>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>> 	return idx + (tin << 16);
>> }
>> 
>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
>> +{
>> +	switch (skb->protocol) {
>> +	case htons(ETH_P_IP):
>> +		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>> +			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
>> +		break;
>> +	case htons(ETH_P_IPV6):
>> +		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>> +			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +}
> 
> So washing is just a special case of this (wash is
> cake_update_diffserv(skb,0)). So you shouldn't need to add another
> function, just augment the existing handling code.

Erm, that’s exactly what I’ve done.

> 
>> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>> {
>> 	u8 dscp;
>> 
>> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>> 	}
>> }
>> 
>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> 
> Save an ifdef below by moving the ifdef inside the function definition.
> 
>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>> +{
>> +	enum ip_conntrack_info ctinfo;
>> +	struct nf_conn *ct;
>> +
>> +	ct = nf_ct_get(skb, &ctinfo);
>> +	if (!ct)
>> +		return;
>> +
>> +	ct->mark &= 0x80ffffff;
>> +	ct->mark |= (0x40 | dscp) << 24;
> 
> Right, so we *might* have an argument that putting the *tin* into the
> fwmark is CAKE's business, but copying over the dscp mark is not
> something a qdisc should be doing…

Why ever not?  It’s not the DSCP, it’s a lookup value into the cake priority table, it just happens to look like the DSCP ;-)


>> +	nf_conntrack_event_cache(IPCT_MARK, ct);
>> +}
>> +#endif
> 
> Also, are you sure this will work in all permutations of conntrack being
> a module vs not etc? (we had to jump through quite some hoops to get the
> conntrack hooks to work last time; this is probably my biggest worry here).

No, I have absolutely no clue here at all.

> 
>> static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>> 					     struct sk_buff *skb)
>> {
>> 	struct cake_sched_data *q = qdisc_priv(sch);
>> -	u32 tin;
>> +	bool wash;
>> 	u8 dscp;
>> +	u8 tin;
>> 
>> -	/* Tin selection: Default to diffserv-based selection, allow overriding
>> -	 * using firewall marks or skb->priority.
>> -	 */
>> -	dscp = cake_handle_diffserv(skb,
>> -				    q->rate_flags & CAKE_FLAG_WASH);
>> +	wash = !!(q->rate_flags & CAKE_FLAG_WASH);
>> +
>> +	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) {
>> 
>> -	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
>> 		tin = 0;
>> +		if (wash)
>> +			cake_update_diffserv(skb, 0);
>> 
>> -	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>> -		 skb->mark &&
>> -		 skb->mark <= q->tin_cnt)
>> -		tin = q->tin_order[skb->mark - 1];
>> +	} else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
>> +		   TC_H_MIN(skb->priority) > 0 &&
>> +		   TC_H_MIN(skb->priority) <= q->tin_cnt) {
>> 
>> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
>> -		 TC_H_MIN(skb->priority) > 0 &&
>> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
>> 		tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
>> +		if (wash)
>> +			cake_update_diffserv(skb, 0);
>> +
>> +	} else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>> +		   skb->mark & 0x40000000) {
>> +
>> +		dscp = skb->mark >> 24 & 0x3f;
>> +		tin = q->tin_index[dscp];
>> 
>> -	else {
>> +		if (wash)
>> +			cake_update_diffserv(skb, 0);
>> +		else if (q->rate_flags & CAKE_FLAG_ICING)
>> +			cake_update_diffserv(skb, dscp << 2);
>> +
>> +	} else { /* fallback to DSCP */
>> +		/* extract the Diffserv Precedence field, if it exists */
>> +		/* and clear DSCP bits if washing */
>> +		dscp = cake_handle_diffserv(skb, wash);
>> 		tin = q->tin_index[dscp];
> 
> As I said above, no reason to revert the cleanup commit…
> 
>> 		if (unlikely(tin >= q->tin_cnt))
>> 			tin = 0;
>> +
>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>> +		if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
>> +			cake_update_ct_mark(skb, dscp);
>> +#endif
> 
> See above about moving the ifdef and losing this one.

Done

> 
>> 	}
>> 
>> 	return &q->tins[tin];
>> @@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
>> 			q->rate_flags &= ~CAKE_FLAG_FWMARK;
>> 	}
>> 
>> +	if (tb[TCA_CAKE_ICING]) {
>> +		if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
>> +			q->rate_flags |= CAKE_FLAG_ICING;
>> +		else
>> +			q->rate_flags &= ~CAKE_FLAG_ICING;
>> +	}
>> +
>> 	if (q->tins) {
>> 		sch_tree_lock(sch);
>> 		cake_reconfigure(sch);
>> @@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
>> 			!!(q->rate_flags & CAKE_FLAG_FWMARK)))
>> 		goto nla_put_failure;
>> 
>> +	if (nla_put_u32(skb, TCA_CAKE_ICING,
>> +			!!(q->rate_flags & CAKE_FLAG_ICING)))
>> +		goto nla_put_failure;
>> +
>> 	return nla_nest_end(skb, opts);
>> 
>> nla_put_failure:
>> From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001
>> From: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
>> Date: Wed, 27 Feb 2019 14:46:05 +0000
>> Subject: [PATCH] cake: add fwmark & icing options
>> 
>> Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
>> ---
>> include/uapi/linux/pkt_sched.h |  2 ++
>> man/man8/tc-cake.8             | 19 ++++++++++++++++
>> tc/q_cake.c                    | 40 ++++++++++++++++++++++++++++++++++
>> 3 files changed, 61 insertions(+)
>> 
>> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
>> index 01f96352..f2b1b270 100644
>> --- a/include/uapi/linux/pkt_sched.h
>> +++ b/include/uapi/linux/pkt_sched.h
>> @@ -954,6 +954,8 @@ enum {
>> 	TCA_CAKE_INGRESS,
>> 	TCA_CAKE_ACK_FILTER,
>> 	TCA_CAKE_SPLIT_GSO,
>> +	TCA_CAKE_FWMARK,
>> +	TCA_CAKE_ICING,
>> 	__TCA_CAKE_MAX
>> };
>> #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
>> diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
>> index eda436e1..626d4525 100644
>> --- a/man/man8/tc-cake.8
>> +++ b/man/man8/tc-cake.8
>> @@ -73,6 +73,12 @@ TIME |
>> ]
>> .br
>> [
>> +.BR fwmark
>> +|
>> +.BR nofwmark*
>> +]
>> +.br
>> +[
>> .BR split-gso*
>> |
>> .BR no-split-gso
>> @@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it will be used as both source and
>> destination host.
>> 
>> 
>> +.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS
>> +
>> +In addition to TC FILTER tin classification, firewall marks may also optionally
>> +be used.  The priority order (highest to lowest) for tin selection is TC filter,
>> +firewall mark and then DSCP.
>> +.PP
>> +.B fwmark
>> +
>> +.br
>> +	Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to the
>> +maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4.
>> +Values outside the valid range are ignored and CAKE will fall back to using
>> +DSCP for tin selection.
> 
> This should document the masking and shifting.

Yes, and the ‘icing’ option however that turns out.  Again it’s a case of sharing what I have for others to test, play, improve.

> 
> 
>> 
>> .SH EXAMPLES
>> # tc qdisc delete root dev eth0
>> diff --git a/tc/q_cake.c b/tc/q_cake.c
>> index e827e3f1..fdafd3b7 100644
>> --- a/tc/q_cake.c
>> +++ b/tc/q_cake.c
>> @@ -79,6 +79,8 @@ static void explain(void)
>> "                  dual-srchost | dual-dsthost | triple-isolate* ]\n"
>> "                [ nat | nonat* ]\n"
>> "                [ wash | nowash* ]\n"
>> +"                [ icing | noicing* ]\n"
>> +"                [ fwmark | nofwmark* ]\n"
> 
> Much as I appreciate the wordplay, I'm not sure this is actually going
> to be super helpful for something just trying to make sense of the
> command output. Not sure I have a better suggestion, though...
> 
> -Toke


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


  reply	other threads:[~2019-03-04 11:55 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-27 21:12 Felix Resch
2019-02-28  3:24 ` gamanakis
2019-03-03 11:52   ` Kevin Darbyshire-Bryant
2019-03-03 12:22     ` John Sager
2019-03-03 16:25       ` Kevin Darbyshire-Bryant
2019-03-04 11:04         ` Toke Høiland-Jørgensen
2019-03-04 11:39           ` John Sager
2019-03-04  5:37     ` Ryan Mounce
2019-03-04  6:31       ` Jonathan Morton
2019-03-04  6:37         ` Ryan Mounce
2019-03-04  7:15           ` Dave Taht
2019-03-04  8:39     ` Pete Heist
2019-03-04 11:01       ` Kevin Darbyshire-Bryant
2019-03-04 11:17         ` Toke Høiland-Jørgensen
2019-03-04 11:55           ` Kevin Darbyshire-Bryant [this message]
2019-03-04 12:44             ` Toke Høiland-Jørgensen
2019-03-04 15:50               ` Kevin Darbyshire-Bryant
2019-03-04 16:39                 ` Toke Høiland-Jørgensen
2019-03-04 17:19                   ` Kevin Darbyshire-Bryant
2019-03-04 17:36                     ` Toke Høiland-Jørgensen
2019-03-04 20:58                       ` Kevin Darbyshire-Bryant
2019-03-04 21:33                         ` Toke Høiland-Jørgensen
2019-03-04 21:42                           ` Toke Høiland-Jørgensen
2019-03-05 14:06                           ` Kevin Darbyshire-Bryant
  -- strict thread matches above, loose matches on Subject: below --
2019-02-27 14:52 Kevin Darbyshire-Bryant
2019-02-27 15:14 ` Toke Høiland-Jørgensen
2019-02-28  8:32   ` Kevin Darbyshire-Bryant
2019-02-28  9:54     ` Toke Høiland-Jørgensen
2019-02-28 11:00       ` Kevin Darbyshire-Bryant
2019-02-28 11:13         ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.bufferbloat.net/postorius/lists/cake.lists.bufferbloat.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52F1A35A-9F57-40EC-AF9B-EC6D8BD376BE@darbyshire-bryant.me.uk \
    --to=kevin@darbyshire-bryant.me.uk \
    --cc=cake@lists.bufferbloat.net \
    --cc=pete@heistp.net \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox