[Cake] Using firewall connmarks as tin selectors
Toke Høiland-Jørgensen
toke at redhat.com
Mon Mar 4 06:17:51 EST 2019
Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> writes:
>> On 4 Mar 2019, at 08:39, Pete Heist <pete at heistp.net> wrote:
>>
>>
>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> wrote:
>>>
>>> The very bad idea:
>>>
>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>> implementation as described above. So an awful lot of our
>>> shenanigans above is due to DSCP not traversing the internet
>>> particularly well. The solution above abstracts DSCP into ’tins’
>>> which we put into fwmarks. Another approach would be to put the DSCP
>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>> local domain preserved.
>>
>> If I understand it right, another use case for this “very bad idea”
>> is preserving DSCP locally while traversing upstream WiFi links as
>> besteffort, which avoids airtime efficiency problems that can occur
>> with 802.11e (WMM). In cases where the router config can’t be changed
>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>> it hides DSCP from the WiFi stack while preserving the values through
>> the tunnel, but this would be easier. Neat… :)
>
> Everyone has understood the intent & maybe the implementation
> correctly. 2 patches attached, one for cake, one for tc.
>
> They are naively coded and some of it undoes Toke’s recent tidying up
> (sorry!)
Heh. First comment: Don't do that ;)
A few more below.
> 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A
> diff --git a/pkt_sched.h b/pkt_sched.h
> index a2f570c..d1f288d 100644
> --- a/pkt_sched.h
> +++ b/pkt_sched.h
> @@ -879,6 +879,7 @@ enum {
> TCA_CAKE_ACK_FILTER,
> TCA_CAKE_SPLIT_GSO,
> TCA_CAKE_FWMARK,
> + TCA_CAKE_ICING,
> __TCA_CAKE_MAX
> };
> #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1)
> diff --git a/sch_cake.c b/sch_cake.c
> index 733b897..5aca0f3 100644
> --- a/sch_cake.c
> +++ b/sch_cake.c
> @@ -270,7 +270,8 @@ enum {
> CAKE_FLAG_INGRESS = BIT(2),
> CAKE_FLAG_WASH = BIT(3),
> CAKE_FLAG_SPLIT_GSO = BIT(4),
> - CAKE_FLAG_FWMARK = BIT(5)
> + CAKE_FLAG_FWMARK = BIT(5),
> + CAKE_FLAG_ICING = BIT(6)
This implies that icing and fwmark can be enabled completely
independently of each other. Are you sure about the semantics for that?
> };
>
> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
> };
>
> static const u8 diffserv4[] = {
> - 0, 2, 0, 0, 2, 0, 0, 0,
> + 0, 1, 0, 0, 2, 0, 0, 0,
> 1, 0, 0, 0, 0, 0, 0, 0,
> 2, 0, 2, 0, 2, 0, 2, 0,
> 2, 0, 2, 0, 2, 0, 2, 0,
> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
> };
>
> static const u8 diffserv3[] = {
> - 0, 0, 0, 0, 2, 0, 0, 0,
> + 0, 1, 0, 0, 2, 0, 0, 0,
Why are you messing with the diffserv mappings in this patch?
> 1, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0,
> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
> return idx + (tin << 16);
> }
>
> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
> +{
> + switch (skb->protocol) {
> + case htons(ETH_P_IP):
> + if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
> + ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
> + break;
> + case htons(ETH_P_IPV6):
> + if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
> + ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
> + break;
> + default:
> + break;
> + }
> +
> +}
So washing is just a special case of this (wash is
cake_update_diffserv(skb,0)). So you shouldn't need to add another
function, just augment the existing handling code.
> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
> {
> u8 dscp;
>
> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
> }
> }
>
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
Save an ifdef below by moving the ifdef inside the function definition.
> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
> +{
> + enum ip_conntrack_info ctinfo;
> + struct nf_conn *ct;
> +
> + ct = nf_ct_get(skb, &ctinfo);
> + if (!ct)
> + return;
> +
> + ct->mark &= 0x80ffffff;
> + ct->mark |= (0x40 | dscp) << 24;
Right, so we *might* have an argument that putting the *tin* into the
fwmark is CAKE's business, but copying over the dscp mark is not
something a qdisc should be doing...
> + nf_conntrack_event_cache(IPCT_MARK, ct);
> +}
> +#endif
Also, are you sure this will work in all permutations of conntrack being
a module vs not etc? (we had to jump through quite some hoops to get the
conntrack hooks to work last time; this is probably my biggest worry here).
> static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
> struct sk_buff *skb)
> {
> struct cake_sched_data *q = qdisc_priv(sch);
> - u32 tin;
> + bool wash;
> u8 dscp;
> + u8 tin;
>
> - /* Tin selection: Default to diffserv-based selection, allow overriding
> - * using firewall marks or skb->priority.
> - */
> - dscp = cake_handle_diffserv(skb,
> - q->rate_flags & CAKE_FLAG_WASH);
> + wash = !!(q->rate_flags & CAKE_FLAG_WASH);
> +
> + if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) {
>
> - if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
> tin = 0;
> + if (wash)
> + cake_update_diffserv(skb, 0);
>
> - else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
> - skb->mark &&
> - skb->mark <= q->tin_cnt)
> - tin = q->tin_order[skb->mark - 1];
> + } else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
> + TC_H_MIN(skb->priority) > 0 &&
> + TC_H_MIN(skb->priority) <= q->tin_cnt) {
>
> - else if (TC_H_MAJ(skb->priority) == sch->handle &&
> - TC_H_MIN(skb->priority) > 0 &&
> - TC_H_MIN(skb->priority) <= q->tin_cnt)
> tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
> + if (wash)
> + cake_update_diffserv(skb, 0);
> +
> + } else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
> + skb->mark & 0x40000000) {
> +
> + dscp = skb->mark >> 24 & 0x3f;
> + tin = q->tin_index[dscp];
>
> - else {
> + if (wash)
> + cake_update_diffserv(skb, 0);
> + else if (q->rate_flags & CAKE_FLAG_ICING)
> + cake_update_diffserv(skb, dscp << 2);
> +
> + } else { /* fallback to DSCP */
> + /* extract the Diffserv Precedence field, if it exists */
> + /* and clear DSCP bits if washing */
> + dscp = cake_handle_diffserv(skb, wash);
> tin = q->tin_index[dscp];
As I said above, no reason to revert the cleanup commit...
> if (unlikely(tin >= q->tin_cnt))
> tin = 0;
> +
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> + if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
> + cake_update_ct_mark(skb, dscp);
> +#endif
See above about moving the ifdef and losing this one.
> }
>
> return &q->tins[tin];
> @@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
> q->rate_flags &= ~CAKE_FLAG_FWMARK;
> }
>
> + if (tb[TCA_CAKE_ICING]) {
> + if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
> + q->rate_flags |= CAKE_FLAG_ICING;
> + else
> + q->rate_flags &= ~CAKE_FLAG_ICING;
> + }
> +
> if (q->tins) {
> sch_tree_lock(sch);
> cake_reconfigure(sch);
> @@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
> !!(q->rate_flags & CAKE_FLAG_FWMARK)))
> goto nla_put_failure;
>
> + if (nla_put_u32(skb, TCA_CAKE_ICING,
> + !!(q->rate_flags & CAKE_FLAG_ICING)))
> + goto nla_put_failure;
> +
> return nla_nest_end(skb, opts);
>
> nla_put_failure:
> From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001
> From: Kevin Darbyshire-Bryant <ldir at darbyshire-bryant.me.uk>
> Date: Wed, 27 Feb 2019 14:46:05 +0000
> Subject: [PATCH] cake: add fwmark & icing options
>
> Signed-off-by: Kevin Darbyshire-Bryant <ldir at darbyshire-bryant.me.uk>
> ---
> include/uapi/linux/pkt_sched.h | 2 ++
> man/man8/tc-cake.8 | 19 ++++++++++++++++
> tc/q_cake.c | 40 ++++++++++++++++++++++++++++++++++
> 3 files changed, 61 insertions(+)
>
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 01f96352..f2b1b270 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -954,6 +954,8 @@ enum {
> TCA_CAKE_INGRESS,
> TCA_CAKE_ACK_FILTER,
> TCA_CAKE_SPLIT_GSO,
> + TCA_CAKE_FWMARK,
> + TCA_CAKE_ICING,
> __TCA_CAKE_MAX
> };
> #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1)
> diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
> index eda436e1..626d4525 100644
> --- a/man/man8/tc-cake.8
> +++ b/man/man8/tc-cake.8
> @@ -73,6 +73,12 @@ TIME |
> ]
> .br
> [
> +.BR fwmark
> +|
> +.BR nofwmark*
> +]
> +.br
> +[
> .BR split-gso*
> |
> .BR no-split-gso
> @@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it will be used as both source and
> destination host.
>
>
> +.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS
> +
> +In addition to TC FILTER tin classification, firewall marks may also optionally
> +be used. The priority order (highest to lowest) for tin selection is TC filter,
> +firewall mark and then DSCP.
> +.PP
> +.B fwmark
> +
> +.br
> + Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to the
> +maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4.
> +Values outside the valid range are ignored and CAKE will fall back to using
> +DSCP for tin selection.
This should document the masking and shifting.
>
> .SH EXAMPLES
> # tc qdisc delete root dev eth0
> diff --git a/tc/q_cake.c b/tc/q_cake.c
> index e827e3f1..fdafd3b7 100644
> --- a/tc/q_cake.c
> +++ b/tc/q_cake.c
> @@ -79,6 +79,8 @@ static void explain(void)
> " dual-srchost | dual-dsthost | triple-isolate* ]\n"
> " [ nat | nonat* ]\n"
> " [ wash | nowash* ]\n"
> +" [ icing | noicing* ]\n"
> +" [ fwmark | nofwmark* ]\n"
Much as I appreciate the wordplay, I'm not sure this is actually going
to be super helpful for something just trying to make sense of the
command output. Not sure I have a better suggestion, though...
-Toke
More information about the Cake
mailing list