[Cake] Using firewall connmarks as tin selectors

Toke Høiland-Jørgensen toke at redhat.com
Mon Mar 4 06:17:51 EST 2019


Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> writes:

>> On 4 Mar 2019, at 08:39, Pete Heist <pete at heistp.net> wrote:
>> 
>> 
>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> wrote:
>>> 
>>> The very bad idea:
>>> 
>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>> implementation as described above. So an awful lot of our
>>> shenanigans above is due to DSCP not traversing the internet
>>> particularly well. The solution above abstracts DSCP into ’tins’
>>> which we put into fwmarks. Another approach would be to put the DSCP
>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>> local domain preserved.
>> 
>> If I understand it right, another use case for this “very bad idea”
>> is preserving DSCP locally while traversing upstream WiFi links as
>> besteffort, which avoids airtime efficiency problems that can occur
>> with 802.11e (WMM). In cases where the router config can’t be changed
>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>> it hides DSCP from the WiFi stack while preserving the values through
>> the tunnel, but this would be easier. Neat… :)
>
> Everyone has understood the intent & maybe the implementation
> correctly. 2 patches attached, one for cake, one for tc.
>
> They are naively coded and some of it undoes Toke’s recent tidying up
> (sorry!)

Heh. First comment: Don't do that ;)

A few more below.

> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
> diff --git a/pkt_sched.h b/pkt_sched.h
> index a2f570c..d1f288d 100644
> --- a/pkt_sched.h
> +++ b/pkt_sched.h
> @@ -879,6 +879,7 @@ enum {
>  	TCA_CAKE_ACK_FILTER,
>  	TCA_CAKE_SPLIT_GSO,
>  	TCA_CAKE_FWMARK,
> +	TCA_CAKE_ICING,
>  	__TCA_CAKE_MAX
>  };
>  #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
> diff --git a/sch_cake.c b/sch_cake.c
> index 733b897..5aca0f3 100644
> --- a/sch_cake.c
> +++ b/sch_cake.c
> @@ -270,7 +270,8 @@ enum {
>  	CAKE_FLAG_INGRESS	   = BIT(2),
>  	CAKE_FLAG_WASH		   = BIT(3),
>  	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
> -	CAKE_FLAG_FWMARK	   = BIT(5)
> +	CAKE_FLAG_FWMARK	   = BIT(5),
> +	CAKE_FLAG_ICING		   = BIT(6)

This implies that icing and fwmark can be enabled completely
independently of each other. Are you sure about the semantics for that?
>  };
>  
>  /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>  };
>  
>  static const u8 diffserv4[] = {
> -	0, 2, 0, 0, 2, 0, 0, 0,
> +	0, 1, 0, 0, 2, 0, 0, 0,
>  	1, 0, 0, 0, 0, 0, 0, 0,
>  	2, 0, 2, 0, 2, 0, 2, 0,
>  	2, 0, 2, 0, 2, 0, 2, 0,
> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>  };
>  
>  static const u8 diffserv3[] = {
> -	0, 0, 0, 0, 2, 0, 0, 0,
> +	0, 1, 0, 0, 2, 0, 0, 0,

Why are you messing with the diffserv mappings in this patch?

>  	1, 0, 0, 0, 0, 0, 0, 0,
>  	0, 0, 0, 0, 0, 0, 0, 0,
>  	0, 0, 0, 0, 0, 0, 0, 0,
> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>  	return idx + (tin << 16);
>  }
>  
> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
> +{
> +	switch (skb->protocol) {
> +	case htons(ETH_P_IP):
> +		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
> +			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
> +		break;
> +	case htons(ETH_P_IPV6):
> +		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
> +			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +}

So washing is just a special case of this (wash is
cake_update_diffserv(skb,0)). So you shouldn't need to add another
function, just augment the existing handling code.

> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>  {
>  	u8 dscp;
>  
> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>  	}
>  }
>  
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)

Save an ifdef below by moving the ifdef inside the function definition.

> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
> +{
> +	enum ip_conntrack_info ctinfo;
> +	struct nf_conn *ct;
> +
> +	ct = nf_ct_get(skb, &ctinfo);
> +	if (!ct)
> +		return;
> +
> +	ct->mark &= 0x80ffffff;
> +	ct->mark |= (0x40 | dscp) << 24;

Right, so we *might* have an argument that putting the *tin* into the
fwmark is CAKE's business, but copying over the dscp mark is not
something a qdisc should be doing...

> +	nf_conntrack_event_cache(IPCT_MARK, ct);
> +}
> +#endif

Also, are you sure this will work in all permutations of conntrack being
a module vs not etc? (we had to jump through quite some hoops to get the
conntrack hooks to work last time; this is probably my biggest worry here).

>  static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>  					     struct sk_buff *skb)
>  {
>  	struct cake_sched_data *q = qdisc_priv(sch);
> -	u32 tin;
> +	bool wash;
>  	u8 dscp;
> +	u8 tin;
>  
> -	/* Tin selection: Default to diffserv-based selection, allow overriding
> -	 * using firewall marks or skb->priority.
> -	 */
> -	dscp = cake_handle_diffserv(skb,
> -				    q->rate_flags & CAKE_FLAG_WASH);
> +	wash = !!(q->rate_flags & CAKE_FLAG_WASH);
> +
> +	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) {
>  
> -	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
>  		tin = 0;
> +		if (wash)
> +			cake_update_diffserv(skb, 0);
>  
> -	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
> -		 skb->mark &&
> -		 skb->mark <= q->tin_cnt)
> -		tin = q->tin_order[skb->mark - 1];
> +	} else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
> +		   TC_H_MIN(skb->priority) > 0 &&
> +		   TC_H_MIN(skb->priority) <= q->tin_cnt) {
>  
> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
> -		 TC_H_MIN(skb->priority) > 0 &&
> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
>  		tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
> +		if (wash)
> +			cake_update_diffserv(skb, 0);
> +
> +	} else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
> +		   skb->mark & 0x40000000) {
> +
> +		dscp = skb->mark >> 24 & 0x3f;
> +		tin = q->tin_index[dscp];
>  
> -	else {
> +		if (wash)
> +			cake_update_diffserv(skb, 0);
> +		else if (q->rate_flags & CAKE_FLAG_ICING)
> +			cake_update_diffserv(skb, dscp << 2);
> +
> +	} else { /* fallback to DSCP */
> +		/* extract the Diffserv Precedence field, if it exists */
> +		/* and clear DSCP bits if washing */
> +		dscp = cake_handle_diffserv(skb, wash);
>  		tin = q->tin_index[dscp];

As I said above, no reason to revert the cleanup commit...

>  		if (unlikely(tin >= q->tin_cnt))
>  			tin = 0;
> +
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> +		if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
> +			cake_update_ct_mark(skb, dscp);
> +#endif

See above about moving the ifdef and losing this one.

>  	}
>  
>  	return &q->tins[tin];
> @@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
>  			q->rate_flags &= ~CAKE_FLAG_FWMARK;
>  	}
>  
> +	if (tb[TCA_CAKE_ICING]) {
> +		if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
> +			q->rate_flags |= CAKE_FLAG_ICING;
> +		else
> +			q->rate_flags &= ~CAKE_FLAG_ICING;
> +	}
> +
>  	if (q->tins) {
>  		sch_tree_lock(sch);
>  		cake_reconfigure(sch);
> @@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
>  			!!(q->rate_flags & CAKE_FLAG_FWMARK)))
>  		goto nla_put_failure;
>  
> +	if (nla_put_u32(skb, TCA_CAKE_ICING,
> +			!!(q->rate_flags & CAKE_FLAG_ICING)))
> +		goto nla_put_failure;
> +
>  	return nla_nest_end(skb, opts);
>  
>  nla_put_failure:
> From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001
> From: Kevin Darbyshire-Bryant <ldir at darbyshire-bryant.me.uk>
> Date: Wed, 27 Feb 2019 14:46:05 +0000
> Subject: [PATCH] cake: add fwmark & icing options
>
> Signed-off-by: Kevin Darbyshire-Bryant <ldir at darbyshire-bryant.me.uk>
> ---
>  include/uapi/linux/pkt_sched.h |  2 ++
>  man/man8/tc-cake.8             | 19 ++++++++++++++++
>  tc/q_cake.c                    | 40 ++++++++++++++++++++++++++++++++++
>  3 files changed, 61 insertions(+)
>
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 01f96352..f2b1b270 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -954,6 +954,8 @@ enum {
>  	TCA_CAKE_INGRESS,
>  	TCA_CAKE_ACK_FILTER,
>  	TCA_CAKE_SPLIT_GSO,
> +	TCA_CAKE_FWMARK,
> +	TCA_CAKE_ICING,
>  	__TCA_CAKE_MAX
>  };
>  #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
> diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
> index eda436e1..626d4525 100644
> --- a/man/man8/tc-cake.8
> +++ b/man/man8/tc-cake.8
> @@ -73,6 +73,12 @@ TIME |
>  ]
>  .br
>  [
> +.BR fwmark
> +|
> +.BR nofwmark*
> +]
> +.br
> +[
>  .BR split-gso*
>  |
>  .BR no-split-gso
> @@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it will be used as both source and
>  destination host.
>  
>  
> +.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS
> +
> +In addition to TC FILTER tin classification, firewall marks may also optionally
> +be used.  The priority order (highest to lowest) for tin selection is TC filter,
> +firewall mark and then DSCP.
> +.PP
> +.B fwmark
> +
> +.br
> +	Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to the
> +maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4.
> +Values outside the valid range are ignored and CAKE will fall back to using
> +DSCP for tin selection.

This should document the masking and shifting.


>  
>  .SH EXAMPLES
>  # tc qdisc delete root dev eth0
> diff --git a/tc/q_cake.c b/tc/q_cake.c
> index e827e3f1..fdafd3b7 100644
> --- a/tc/q_cake.c
> +++ b/tc/q_cake.c
> @@ -79,6 +79,8 @@ static void explain(void)
>  "                  dual-srchost | dual-dsthost | triple-isolate* ]\n"
>  "                [ nat | nonat* ]\n"
>  "                [ wash | nowash* ]\n"
> +"                [ icing | noicing* ]\n"
> +"                [ fwmark | nofwmark* ]\n"

Much as I appreciate the wordplay, I'm not sure this is actually going
to be super helpful for something just trying to make sense of the
command output. Not sure I have a better suggestion, though...

-Toke


More information about the Cake mailing list