[Cake] Using firewall connmarks as tin selectors

Kevin Darbyshire-Bryant kevin at darbyshire-bryant.me.uk
Mon Mar 4 06:55:01 EST 2019



> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen <toke at redhat.com> wrote:
> 
> Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> writes:
> 
>>> On 4 Mar 2019, at 08:39, Pete Heist <pete at heistp.net> wrote:
>>> 
>>> 
>>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> wrote:
>>>> 
>>>> The very bad idea:
>>>> 
>>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>>> implementation as described above. So an awful lot of our
>>>> shenanigans above is due to DSCP not traversing the internet
>>>> particularly well. The solution above abstracts DSCP into ’tins’
>>>> which we put into fwmarks. Another approach would be to put the DSCP
>>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>>> local domain preserved.
>>> 
>>> If I understand it right, another use case for this “very bad idea”
>>> is preserving DSCP locally while traversing upstream WiFi links as
>>> besteffort, which avoids airtime efficiency problems that can occur
>>> with 802.11e (WMM). In cases where the router config can’t be changed
>>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>>> it hides DSCP from the WiFi stack while preserving the values through
>>> the tunnel, but this would be easier. Neat… :)
>> 
>> Everyone has understood the intent & maybe the implementation
>> correctly. 2 patches attached, one for cake, one for tc.
>> 
>> They are naively coded and some of it undoes Toke’s recent tidying up
>> (sorry!)
> 
> Heh. First comment: Don't do that ;)

I did say naively coded.

> 
> A few more below.
> 
>> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
>> diff --git a/pkt_sched.h b/pkt_sched.h
>> index a2f570c..d1f288d 100644
>> --- a/pkt_sched.h
>> +++ b/pkt_sched.h
>> @@ -879,6 +879,7 @@ enum {
>> 	TCA_CAKE_ACK_FILTER,
>> 	TCA_CAKE_SPLIT_GSO,
>> 	TCA_CAKE_FWMARK,
>> +	TCA_CAKE_ICING,
>> 	__TCA_CAKE_MAX
>> };
>> #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
>> diff --git a/sch_cake.c b/sch_cake.c
>> index 733b897..5aca0f3 100644
>> --- a/sch_cake.c
>> +++ b/sch_cake.c
>> @@ -270,7 +270,8 @@ enum {
>> 	CAKE_FLAG_INGRESS	   = BIT(2),
>> 	CAKE_FLAG_WASH		   = BIT(3),
>> 	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
>> -	CAKE_FLAG_FWMARK	   = BIT(5)
>> +	CAKE_FLAG_FWMARK	   = BIT(5),
>> +	CAKE_FLAG_ICING		   = BIT(6)
> 
> This implies that icing and fwmark can be enabled completely
> independently of each other. Are you sure about the semantics for that?

No, I’m not.  I sent the patches so others could see my lunacy in action and hopefully improve it.

The actual operation links FWMARK, INGRESS & ICING in a variety of combinations.

>> };
>> 
>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>> };
>> 
>> static const u8 diffserv4[] = {
>> -	0, 2, 0, 0, 2, 0, 0, 0,
>> +	0, 1, 0, 0, 2, 0, 0, 0,
>> 	1, 0, 0, 0, 0, 0, 0, 0,
>> 	2, 0, 2, 0, 2, 0, 2, 0,
>> 	2, 0, 2, 0, 2, 0, 2, 0,
>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>> };
>> 
>> static const u8 diffserv3[] = {
>> -	0, 0, 0, 0, 2, 0, 0, 0,
>> +	0, 1, 0, 0, 2, 0, 0, 0,
> 
> Why are you messing with the diffserv mappings in this patch?

This is a combination patch of Dave’s new LE coding and the fwmark/dscp mangling.

> 
>> 	1, 0, 0, 0, 0, 0, 0, 0,
>> 	0, 0, 0, 0, 0, 0, 0, 0,
>> 	0, 0, 0, 0, 0, 0, 0, 0,
>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>> 	return idx + (tin << 16);
>> }
>> 
>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
>> +{
>> +	switch (skb->protocol) {
>> +	case htons(ETH_P_IP):
>> +		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>> +			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
>> +		break;
>> +	case htons(ETH_P_IPV6):
>> +		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>> +			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +}
> 
> So washing is just a special case of this (wash is
> cake_update_diffserv(skb,0)). So you shouldn't need to add another
> function, just augment the existing handling code.

Erm, that’s exactly what I’ve done.

> 
>> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>> {
>> 	u8 dscp;
>> 
>> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>> 	}
>> }
>> 
>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> 
> Save an ifdef below by moving the ifdef inside the function definition.
> 
>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>> +{
>> +	enum ip_conntrack_info ctinfo;
>> +	struct nf_conn *ct;
>> +
>> +	ct = nf_ct_get(skb, &ctinfo);
>> +	if (!ct)
>> +		return;
>> +
>> +	ct->mark &= 0x80ffffff;
>> +	ct->mark |= (0x40 | dscp) << 24;
> 
> Right, so we *might* have an argument that putting the *tin* into the
> fwmark is CAKE's business, but copying over the dscp mark is not
> something a qdisc should be doing…

Why ever not?  It’s not the DSCP, it’s a lookup value into the cake priority table, it just happens to look like the DSCP ;-)


>> +	nf_conntrack_event_cache(IPCT_MARK, ct);
>> +}
>> +#endif
> 
> Also, are you sure this will work in all permutations of conntrack being
> a module vs not etc? (we had to jump through quite some hoops to get the
> conntrack hooks to work last time; this is probably my biggest worry here).

No, I have absolutely no clue here at all.

> 
>> static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>> 					     struct sk_buff *skb)
>> {
>> 	struct cake_sched_data *q = qdisc_priv(sch);
>> -	u32 tin;
>> +	bool wash;
>> 	u8 dscp;
>> +	u8 tin;
>> 
>> -	/* Tin selection: Default to diffserv-based selection, allow overriding
>> -	 * using firewall marks or skb->priority.
>> -	 */
>> -	dscp = cake_handle_diffserv(skb,
>> -				    q->rate_flags & CAKE_FLAG_WASH);
>> +	wash = !!(q->rate_flags & CAKE_FLAG_WASH);
>> +
>> +	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) {
>> 
>> -	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
>> 		tin = 0;
>> +		if (wash)
>> +			cake_update_diffserv(skb, 0);
>> 
>> -	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>> -		 skb->mark &&
>> -		 skb->mark <= q->tin_cnt)
>> -		tin = q->tin_order[skb->mark - 1];
>> +	} else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
>> +		   TC_H_MIN(skb->priority) > 0 &&
>> +		   TC_H_MIN(skb->priority) <= q->tin_cnt) {
>> 
>> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
>> -		 TC_H_MIN(skb->priority) > 0 &&
>> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
>> 		tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
>> +		if (wash)
>> +			cake_update_diffserv(skb, 0);
>> +
>> +	} else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>> +		   skb->mark & 0x40000000) {
>> +
>> +		dscp = skb->mark >> 24 & 0x3f;
>> +		tin = q->tin_index[dscp];
>> 
>> -	else {
>> +		if (wash)
>> +			cake_update_diffserv(skb, 0);
>> +		else if (q->rate_flags & CAKE_FLAG_ICING)
>> +			cake_update_diffserv(skb, dscp << 2);
>> +
>> +	} else { /* fallback to DSCP */
>> +		/* extract the Diffserv Precedence field, if it exists */
>> +		/* and clear DSCP bits if washing */
>> +		dscp = cake_handle_diffserv(skb, wash);
>> 		tin = q->tin_index[dscp];
> 
> As I said above, no reason to revert the cleanup commit…
> 
>> 		if (unlikely(tin >= q->tin_cnt))
>> 			tin = 0;
>> +
>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>> +		if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
>> +			cake_update_ct_mark(skb, dscp);
>> +#endif
> 
> See above about moving the ifdef and losing this one.

Done

> 
>> 	}
>> 
>> 	return &q->tins[tin];
>> @@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
>> 			q->rate_flags &= ~CAKE_FLAG_FWMARK;
>> 	}
>> 
>> +	if (tb[TCA_CAKE_ICING]) {
>> +		if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
>> +			q->rate_flags |= CAKE_FLAG_ICING;
>> +		else
>> +			q->rate_flags &= ~CAKE_FLAG_ICING;
>> +	}
>> +
>> 	if (q->tins) {
>> 		sch_tree_lock(sch);
>> 		cake_reconfigure(sch);
>> @@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
>> 			!!(q->rate_flags & CAKE_FLAG_FWMARK)))
>> 		goto nla_put_failure;
>> 
>> +	if (nla_put_u32(skb, TCA_CAKE_ICING,
>> +			!!(q->rate_flags & CAKE_FLAG_ICING)))
>> +		goto nla_put_failure;
>> +
>> 	return nla_nest_end(skb, opts);
>> 
>> nla_put_failure:
>> From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001
>> From: Kevin Darbyshire-Bryant <ldir at darbyshire-bryant.me.uk>
>> Date: Wed, 27 Feb 2019 14:46:05 +0000
>> Subject: [PATCH] cake: add fwmark & icing options
>> 
>> Signed-off-by: Kevin Darbyshire-Bryant <ldir at darbyshire-bryant.me.uk>
>> ---
>> include/uapi/linux/pkt_sched.h |  2 ++
>> man/man8/tc-cake.8             | 19 ++++++++++++++++
>> tc/q_cake.c                    | 40 ++++++++++++++++++++++++++++++++++
>> 3 files changed, 61 insertions(+)
>> 
>> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
>> index 01f96352..f2b1b270 100644
>> --- a/include/uapi/linux/pkt_sched.h
>> +++ b/include/uapi/linux/pkt_sched.h
>> @@ -954,6 +954,8 @@ enum {
>> 	TCA_CAKE_INGRESS,
>> 	TCA_CAKE_ACK_FILTER,
>> 	TCA_CAKE_SPLIT_GSO,
>> +	TCA_CAKE_FWMARK,
>> +	TCA_CAKE_ICING,
>> 	__TCA_CAKE_MAX
>> };
>> #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
>> diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
>> index eda436e1..626d4525 100644
>> --- a/man/man8/tc-cake.8
>> +++ b/man/man8/tc-cake.8
>> @@ -73,6 +73,12 @@ TIME |
>> ]
>> .br
>> [
>> +.BR fwmark
>> +|
>> +.BR nofwmark*
>> +]
>> +.br
>> +[
>> .BR split-gso*
>> |
>> .BR no-split-gso
>> @@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it will be used as both source and
>> destination host.
>> 
>> 
>> +.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS
>> +
>> +In addition to TC FILTER tin classification, firewall marks may also optionally
>> +be used.  The priority order (highest to lowest) for tin selection is TC filter,
>> +firewall mark and then DSCP.
>> +.PP
>> +.B fwmark
>> +
>> +.br
>> +	Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to the
>> +maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4.
>> +Values outside the valid range are ignored and CAKE will fall back to using
>> +DSCP for tin selection.
> 
> This should document the masking and shifting.

Yes, and the ‘icing’ option however that turns out.  Again it’s a case of sharing what I have for others to test, play, improve.

> 
> 
>> 
>> .SH EXAMPLES
>> # tc qdisc delete root dev eth0
>> diff --git a/tc/q_cake.c b/tc/q_cake.c
>> index e827e3f1..fdafd3b7 100644
>> --- a/tc/q_cake.c
>> +++ b/tc/q_cake.c
>> @@ -79,6 +79,8 @@ static void explain(void)
>> "                  dual-srchost | dual-dsthost | triple-isolate* ]\n"
>> "                [ nat | nonat* ]\n"
>> "                [ wash | nowash* ]\n"
>> +"                [ icing | noicing* ]\n"
>> +"                [ fwmark | nofwmark* ]\n"
> 
> Much as I appreciate the wordplay, I'm not sure this is actually going
> to be super helpful for something just trying to make sense of the
> command output. Not sure I have a better suggestion, though...
> 
> -Toke


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A



More information about the Cake mailing list