[Cake] Using firewall connmarks as tin selectors
Kevin Darbyshire-Bryant
kevin at darbyshire-bryant.me.uk
Mon Mar 4 06:55:01 EST 2019
> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen <toke at redhat.com> wrote:
>
> Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> writes:
>
>>> On 4 Mar 2019, at 08:39, Pete Heist <pete at heistp.net> wrote:
>>>
>>>
>>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin at darbyshire-bryant.me.uk> wrote:
>>>>
>>>> The very bad idea:
>>>>
>>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>>> implementation as described above. So an awful lot of our
>>>> shenanigans above is due to DSCP not traversing the internet
>>>> particularly well. The solution above abstracts DSCP into ’tins’
>>>> which we put into fwmarks. Another approach would be to put the DSCP
>>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>>> local domain preserved.
>>>
>>> If I understand it right, another use case for this “very bad idea”
>>> is preserving DSCP locally while traversing upstream WiFi links as
>>> besteffort, which avoids airtime efficiency problems that can occur
>>> with 802.11e (WMM). In cases where the router config can’t be changed
>>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>>> it hides DSCP from the WiFi stack while preserving the values through
>>> the tunnel, but this would be easier. Neat… :)
>>
>> Everyone has understood the intent & maybe the implementation
>> correctly. 2 patches attached, one for cake, one for tc.
>>
>> They are naively coded and some of it undoes Toke’s recent tidying up
>> (sorry!)
>
> Heh. First comment: Don't do that ;)
I did say naively coded.
>
> A few more below.
>
>> 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A
>> diff --git a/pkt_sched.h b/pkt_sched.h
>> index a2f570c..d1f288d 100644
>> --- a/pkt_sched.h
>> +++ b/pkt_sched.h
>> @@ -879,6 +879,7 @@ enum {
>> TCA_CAKE_ACK_FILTER,
>> TCA_CAKE_SPLIT_GSO,
>> TCA_CAKE_FWMARK,
>> + TCA_CAKE_ICING,
>> __TCA_CAKE_MAX
>> };
>> #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1)
>> diff --git a/sch_cake.c b/sch_cake.c
>> index 733b897..5aca0f3 100644
>> --- a/sch_cake.c
>> +++ b/sch_cake.c
>> @@ -270,7 +270,8 @@ enum {
>> CAKE_FLAG_INGRESS = BIT(2),
>> CAKE_FLAG_WASH = BIT(3),
>> CAKE_FLAG_SPLIT_GSO = BIT(4),
>> - CAKE_FLAG_FWMARK = BIT(5)
>> + CAKE_FLAG_FWMARK = BIT(5),
>> + CAKE_FLAG_ICING = BIT(6)
>
> This implies that icing and fwmark can be enabled completely
> independently of each other. Are you sure about the semantics for that?
No, I’m not. I sent the patches so others could see my lunacy in action and hopefully improve it.
The actual operation links FWMARK, INGRESS & ICING in a variety of combinations.
>> };
>>
>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>> };
>>
>> static const u8 diffserv4[] = {
>> - 0, 2, 0, 0, 2, 0, 0, 0,
>> + 0, 1, 0, 0, 2, 0, 0, 0,
>> 1, 0, 0, 0, 0, 0, 0, 0,
>> 2, 0, 2, 0, 2, 0, 2, 0,
>> 2, 0, 2, 0, 2, 0, 2, 0,
>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>> };
>>
>> static const u8 diffserv3[] = {
>> - 0, 0, 0, 0, 2, 0, 0, 0,
>> + 0, 1, 0, 0, 2, 0, 0, 0,
>
> Why are you messing with the diffserv mappings in this patch?
This is a combination patch of Dave’s new LE coding and the fwmark/dscp mangling.
>
>> 1, 0, 0, 0, 0, 0, 0, 0,
>> 0, 0, 0, 0, 0, 0, 0, 0,
>> 0, 0, 0, 0, 0, 0, 0, 0,
>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>> return idx + (tin << 16);
>> }
>>
>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
>> +{
>> + switch (skb->protocol) {
>> + case htons(ETH_P_IP):
>> + if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>> + ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
>> + break;
>> + case htons(ETH_P_IPV6):
>> + if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>> + ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
>> + break;
>> + default:
>> + break;
>> + }
>> +
>> +}
>
> So washing is just a special case of this (wash is
> cake_update_diffserv(skb,0)). So you shouldn't need to add another
> function, just augment the existing handling code.
Erm, that’s exactly what I’ve done.
>
>> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>> {
>> u8 dscp;
>>
>> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>> }
>> }
>>
>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>
> Save an ifdef below by moving the ifdef inside the function definition.
>
>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>> +{
>> + enum ip_conntrack_info ctinfo;
>> + struct nf_conn *ct;
>> +
>> + ct = nf_ct_get(skb, &ctinfo);
>> + if (!ct)
>> + return;
>> +
>> + ct->mark &= 0x80ffffff;
>> + ct->mark |= (0x40 | dscp) << 24;
>
> Right, so we *might* have an argument that putting the *tin* into the
> fwmark is CAKE's business, but copying over the dscp mark is not
> something a qdisc should be doing…
Why ever not? It’s not the DSCP, it’s a lookup value into the cake priority table, it just happens to look like the DSCP ;-)
>> + nf_conntrack_event_cache(IPCT_MARK, ct);
>> +}
>> +#endif
>
> Also, are you sure this will work in all permutations of conntrack being
> a module vs not etc? (we had to jump through quite some hoops to get the
> conntrack hooks to work last time; this is probably my biggest worry here).
No, I have absolutely no clue here at all.
>
>> static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>> struct sk_buff *skb)
>> {
>> struct cake_sched_data *q = qdisc_priv(sch);
>> - u32 tin;
>> + bool wash;
>> u8 dscp;
>> + u8 tin;
>>
>> - /* Tin selection: Default to diffserv-based selection, allow overriding
>> - * using firewall marks or skb->priority.
>> - */
>> - dscp = cake_handle_diffserv(skb,
>> - q->rate_flags & CAKE_FLAG_WASH);
>> + wash = !!(q->rate_flags & CAKE_FLAG_WASH);
>> +
>> + if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) {
>>
>> - if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
>> tin = 0;
>> + if (wash)
>> + cake_update_diffserv(skb, 0);
>>
>> - else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>> - skb->mark &&
>> - skb->mark <= q->tin_cnt)
>> - tin = q->tin_order[skb->mark - 1];
>> + } else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
>> + TC_H_MIN(skb->priority) > 0 &&
>> + TC_H_MIN(skb->priority) <= q->tin_cnt) {
>>
>> - else if (TC_H_MAJ(skb->priority) == sch->handle &&
>> - TC_H_MIN(skb->priority) > 0 &&
>> - TC_H_MIN(skb->priority) <= q->tin_cnt)
>> tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
>> + if (wash)
>> + cake_update_diffserv(skb, 0);
>> +
>> + } else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>> + skb->mark & 0x40000000) {
>> +
>> + dscp = skb->mark >> 24 & 0x3f;
>> + tin = q->tin_index[dscp];
>>
>> - else {
>> + if (wash)
>> + cake_update_diffserv(skb, 0);
>> + else if (q->rate_flags & CAKE_FLAG_ICING)
>> + cake_update_diffserv(skb, dscp << 2);
>> +
>> + } else { /* fallback to DSCP */
>> + /* extract the Diffserv Precedence field, if it exists */
>> + /* and clear DSCP bits if washing */
>> + dscp = cake_handle_diffserv(skb, wash);
>> tin = q->tin_index[dscp];
>
> As I said above, no reason to revert the cleanup commit…
>
>> if (unlikely(tin >= q->tin_cnt))
>> tin = 0;
>> +
>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>> + if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
>> + cake_update_ct_mark(skb, dscp);
>> +#endif
>
> See above about moving the ifdef and losing this one.
Done
>
>> }
>>
>> return &q->tins[tin];
>> @@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
>> q->rate_flags &= ~CAKE_FLAG_FWMARK;
>> }
>>
>> + if (tb[TCA_CAKE_ICING]) {
>> + if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
>> + q->rate_flags |= CAKE_FLAG_ICING;
>> + else
>> + q->rate_flags &= ~CAKE_FLAG_ICING;
>> + }
>> +
>> if (q->tins) {
>> sch_tree_lock(sch);
>> cake_reconfigure(sch);
>> @@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
>> !!(q->rate_flags & CAKE_FLAG_FWMARK)))
>> goto nla_put_failure;
>>
>> + if (nla_put_u32(skb, TCA_CAKE_ICING,
>> + !!(q->rate_flags & CAKE_FLAG_ICING)))
>> + goto nla_put_failure;
>> +
>> return nla_nest_end(skb, opts);
>>
>> nla_put_failure:
>> From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001
>> From: Kevin Darbyshire-Bryant <ldir at darbyshire-bryant.me.uk>
>> Date: Wed, 27 Feb 2019 14:46:05 +0000
>> Subject: [PATCH] cake: add fwmark & icing options
>>
>> Signed-off-by: Kevin Darbyshire-Bryant <ldir at darbyshire-bryant.me.uk>
>> ---
>> include/uapi/linux/pkt_sched.h | 2 ++
>> man/man8/tc-cake.8 | 19 ++++++++++++++++
>> tc/q_cake.c | 40 ++++++++++++++++++++++++++++++++++
>> 3 files changed, 61 insertions(+)
>>
>> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
>> index 01f96352..f2b1b270 100644
>> --- a/include/uapi/linux/pkt_sched.h
>> +++ b/include/uapi/linux/pkt_sched.h
>> @@ -954,6 +954,8 @@ enum {
>> TCA_CAKE_INGRESS,
>> TCA_CAKE_ACK_FILTER,
>> TCA_CAKE_SPLIT_GSO,
>> + TCA_CAKE_FWMARK,
>> + TCA_CAKE_ICING,
>> __TCA_CAKE_MAX
>> };
>> #define TCA_CAKE_MAX (__TCA_CAKE_MAX - 1)
>> diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
>> index eda436e1..626d4525 100644
>> --- a/man/man8/tc-cake.8
>> +++ b/man/man8/tc-cake.8
>> @@ -73,6 +73,12 @@ TIME |
>> ]
>> .br
>> [
>> +.BR fwmark
>> +|
>> +.BR nofwmark*
>> +]
>> +.br
>> +[
>> .BR split-gso*
>> |
>> .BR no-split-gso
>> @@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it will be used as both source and
>> destination host.
>>
>>
>> +.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS
>> +
>> +In addition to TC FILTER tin classification, firewall marks may also optionally
>> +be used. The priority order (highest to lowest) for tin selection is TC filter,
>> +firewall mark and then DSCP.
>> +.PP
>> +.B fwmark
>> +
>> +.br
>> + Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to the
>> +maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4.
>> +Values outside the valid range are ignored and CAKE will fall back to using
>> +DSCP for tin selection.
>
> This should document the masking and shifting.
Yes, and the ‘icing’ option however that turns out. Again it’s a case of sharing what I have for others to test, play, improve.
>
>
>>
>> .SH EXAMPLES
>> # tc qdisc delete root dev eth0
>> diff --git a/tc/q_cake.c b/tc/q_cake.c
>> index e827e3f1..fdafd3b7 100644
>> --- a/tc/q_cake.c
>> +++ b/tc/q_cake.c
>> @@ -79,6 +79,8 @@ static void explain(void)
>> " dual-srchost | dual-dsthost | triple-isolate* ]\n"
>> " [ nat | nonat* ]\n"
>> " [ wash | nowash* ]\n"
>> +" [ icing | noicing* ]\n"
>> +" [ fwmark | nofwmark* ]\n"
>
> Much as I appreciate the wordplay, I'm not sure this is actually going
> to be super helpful for something just trying to make sense of the
> command output. Not sure I have a better suggestion, though...
>
> -Toke
Cheers,
Kevin D-B
012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A
More information about the Cake
mailing list