Cake - FQ_codel the next generation
 help / color / mirror / Atom feed
From: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>
Cc: Pete Heist <pete@heistp.net>,
	"cake@lists.bufferbloat.net" <cake@lists.bufferbloat.net>
Subject: Re: [Cake] Using firewall connmarks as tin selectors
Date: Mon, 4 Mar 2019 15:50:42 +0000	[thread overview]
Message-ID: <CA77F44F-97F1-48D7-8F36-45C4588303BE@darbyshire-bryant.me.uk> (raw)
In-Reply-To: <871s3mlsge.fsf@toke.dk>

[-- Attachment #1: Type: text/plain, Size: 7297 bytes --]



> On 4 Mar 2019, at 12:44, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
> 
>>> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> 
>>> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
>>> 
>>>>> On 4 Mar 2019, at 08:39, Pete Heist <pete@heistp.net> wrote:
>>>>> 
>>>>> 
>>>>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>>>>>> 
>>>>>> The very bad idea:
>>>>>> 
>>>>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>>>>> implementation as described above. So an awful lot of our
>>>>>> shenanigans above is due to DSCP not traversing the internet
>>>>>> particularly well. The solution above abstracts DSCP into ’tins’
>>>>>> which we put into fwmarks. Another approach would be to put the DSCP
>>>>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>>>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>>>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>>>>> local domain preserved.
>>>>> 
>>>>> If I understand it right, another use case for this “very bad idea”
>>>>> is preserving DSCP locally while traversing upstream WiFi links as
>>>>> besteffort, which avoids airtime efficiency problems that can occur
>>>>> with 802.11e (WMM). In cases where the router config can’t be changed
>>>>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>>>>> it hides DSCP from the WiFi stack while preserving the values through
>>>>> the tunnel, but this would be easier. Neat… :)
>>>> 
>>>> Everyone has understood the intent & maybe the implementation
>>>> correctly. 2 patches attached, one for cake, one for tc.
>>>> 
>>>> They are naively coded and some of it undoes Toke’s recent tidying up
>>>> (sorry!)
>>> 
>>> Heh. First comment: Don't do that ;)
>> 
>> I did say naively coded.
>> 
>>> 
>>> A few more below.
>>> 
>>>> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
>>>> diff --git a/pkt_sched.h b/pkt_sched.h
>>>> index a2f570c..d1f288d 100644
>>>> --- a/pkt_sched.h
>>>> +++ b/pkt_sched.h
>>>> @@ -879,6 +879,7 @@ enum {
>>>> 	TCA_CAKE_ACK_FILTER,
>>>> 	TCA_CAKE_SPLIT_GSO,
>>>> 	TCA_CAKE_FWMARK,
>>>> +	TCA_CAKE_ICING,
>>>> 	__TCA_CAKE_MAX
>>>> };
>>>> #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
>>>> diff --git a/sch_cake.c b/sch_cake.c
>>>> index 733b897..5aca0f3 100644
>>>> --- a/sch_cake.c
>>>> +++ b/sch_cake.c
>>>> @@ -270,7 +270,8 @@ enum {
>>>> 	CAKE_FLAG_INGRESS	   = BIT(2),
>>>> 	CAKE_FLAG_WASH		   = BIT(3),
>>>> 	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
>>>> -	CAKE_FLAG_FWMARK	   = BIT(5)
>>>> +	CAKE_FLAG_FWMARK	   = BIT(5),
>>>> +	CAKE_FLAG_ICING		   = BIT(6)
>>> 
>>> This implies that icing and fwmark can be enabled completely
>>> independently of each other. Are you sure about the semantics for that?
>> 
>> No, I’m not.  I sent the patches so others could see my lunacy in action and hopefully improve it.
>> 
>> The actual operation links FWMARK, INGRESS & ICING in a variety of
>> combinations.
> 
> Right, so obviously this needs to be thought through…

Yes.  Volunteers with thoughts & ideally code sought.

> 
>>>> };
>>>> 
>>>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
>>>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>>>> };
>>>> 
>>>> static const u8 diffserv4[] = {
>>>> -	0, 2, 0, 0, 2, 0, 0, 0,
>>>> +	0, 1, 0, 0, 2, 0, 0, 0,
>>>> 	1, 0, 0, 0, 0, 0, 0, 0,
>>>> 	2, 0, 2, 0, 2, 0, 2, 0,
>>>> 	2, 0, 2, 0, 2, 0, 2, 0,
>>>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>>>> };
>>>> 
>>>> static const u8 diffserv3[] = {
>>>> -	0, 0, 0, 0, 2, 0, 0, 0,
>>>> +	0, 1, 0, 0, 2, 0, 0, 0,
>>> 
>>> Why are you messing with the diffserv mappings in this patch?
>> 
>> This is a combination patch of Dave’s new LE coding and the
>> fwmark/dscp mangling.
> 
> Ah. Well let's keep that separate from this patch/discussion…

Done
> 
>>> 
>>>> 	1, 0, 0, 0, 0, 0, 0, 0,
>>>> 	0, 0, 0, 0, 0, 0, 0, 0,
>>>> 	0, 0, 0, 0, 0, 0, 0, 0,
>>>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>>>> 	return idx + (tin << 16);
>>>> }
>>>> 
>>>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>>>> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
>>>> +{
>>>> +	switch (skb->protocol) {
>>>> +	case htons(ETH_P_IP):
>>>> +		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>>>> +			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
>>>> +		break;
>>>> +	case htons(ETH_P_IPV6):
>>>> +		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>>>> +			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
>>>> +		break;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>> +
>>>> +}
>>> 
>>> So washing is just a special case of this (wash is
>>> cake_update_diffserv(skb,0)). So you shouldn't need to add another
>>> function, just augment the existing handling code.
>> 
>> Erm, that’s exactly what I’ve done.
> 
> Ah, right; I guess it's just the reverting of the cleanup patch that is
> the issue, then :)

With the reverting of the cleaning reverted - if that makes any sense.
And I have major twitchiness at the result.

> 
>>>> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>>>> {
>>>> 	u8 dscp;
>>>> 
>>>> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>>>> 	}
>>>> }
>>>> 
>>>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>>> 
>>> Save an ifdef below by moving the ifdef inside the function definition.
>>> 
>>>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>>>> +{
>>>> +	enum ip_conntrack_info ctinfo;
>>>> +	struct nf_conn *ct;
>>>> +
>>>> +	ct = nf_ct_get(skb, &ctinfo);
>>>> +	if (!ct)
>>>> +		return;
>>>> +
>>>> +	ct->mark &= 0x80ffffff;
>>>> +	ct->mark |= (0x40 | dscp) << 24;
>>> 
>>> Right, so we *might* have an argument that putting the *tin* into the
>>> fwmark is CAKE's business, but copying over the dscp mark is not
>>> something a qdisc should be doing…
>> 
>> Why ever not?  It’s not the DSCP, it’s a lookup value into the cake
>> priority table, it just happens to look like the DSCP ;-)
> 
> If it quacks like a duck…

I honestly don’t know where to go from here.  I’m clearly trying to do something that the kernel doesn’t want to do.

> 
>> 
>>>> +	nf_conntrack_event_cache(IPCT_MARK, ct);
>>>> +}
>>>> +#endif
>>> 
>>> Also, are you sure this will work in all permutations of conntrack being
>>> a module vs not etc? (we had to jump through quite some hoops to get the
>>> conntrack hooks to work last time; this is probably my biggest worry here).
>> 
>> No, I have absolutely no clue here at all.
> 
> Well, another issue that needs fixing, then…

Again bright-eyed & bushy tailed volunteers sought!

> 
> -Toke

v2 addressing some of the comments attached.  Is it best to keep the in progress patches here or should they be github PRs ?


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A

[-- Attachment #2: v2-0001-Copy-dscp-to-fwmark.patch --]
[-- Type: application/octet-stream, Size: 4877 bytes --]

From 762f91e6da353815970d6c2f316f00d8f1cb6585 Mon Sep 17 00:00:00 2001
From: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Date: Fri, 1 Mar 2019 11:35:49 +0000
Subject: [PATCH v2] Copy dscp to fwmark

and then I had a really bad idea!

Which is - if we're in fwmark & egress mode & we don't have a mark so
we've fallen through to DSCP, set a mark based on the DSCP.
That way, reply packets will be automagically fw marked for us on their
return.

With the 'icing' keyword, copy the dscp coded into the fwmark into the
real packets (the opposite of wash) in ingress mode.

The fwmark encodes the dscp thus (basically we steal 7 bits from the top
byte of fwmark and preserve everything else):

ct->mark &= 0x80ffffff;
ct->mark |= (0x40 | dscp) << 24;

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 pkt_sched.h |  1 +
 sch_cake.c  | 69 +++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/pkt_sched.h b/pkt_sched.h
index a2f570c..d1f288d 100644
--- a/pkt_sched.h
+++ b/pkt_sched.h
@@ -879,6 +879,7 @@ enum {
 	TCA_CAKE_ACK_FILTER,
 	TCA_CAKE_SPLIT_GSO,
 	TCA_CAKE_FWMARK,
+	TCA_CAKE_ICING,
 	__TCA_CAKE_MAX
 };
 #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
diff --git a/sch_cake.c b/sch_cake.c
index 733b897..7da03b5 100644
--- a/sch_cake.c
+++ b/sch_cake.c
@@ -270,7 +270,8 @@ enum {
 	CAKE_FLAG_INGRESS	   = BIT(2),
 	CAKE_FLAG_WASH		   = BIT(3),
 	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
-	CAKE_FLAG_FWMARK	   = BIT(5)
+	CAKE_FLAG_FWMARK	   = BIT(5),
+	CAKE_FLAG_ICING		   = BIT(6)
 };
 
 /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
@@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
 	return idx + (tin << 16);
 }
 
-static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
+static void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
+{
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
+			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
+		break;
+	case htons(ETH_P_IPV6):
+		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
+			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
+		break;
+	default:
+		break;
+	}
+
+}
+
+static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
 {
 	u8 dscp;
 
@@ -1644,12 +1662,28 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
 	}
 }
 
+static void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
+{
+#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct)
+		return;
+
+	ct->mark &= 0x80ffffff;
+	ct->mark |= (0x40 | dscp) << 24;
+	nf_conntrack_event_cache(IPCT_MARK, ct);
+#endif
+}
+
 static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 					     struct sk_buff *skb)
 {
 	struct cake_sched_data *q = qdisc_priv(sch);
-	u32 tin;
 	u8 dscp;
+	u8 tin;
 
 	/* Tin selection: Default to diffserv-based selection, allow overriding
 	 * using firewall marks or skb->priority.
@@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 		tin = 0;
 
 	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
-		 skb->mark &&
-		 skb->mark <= q->tin_cnt)
-		tin = q->tin_order[skb->mark - 1];
-
-	else if (TC_H_MAJ(skb->priority) == sch->handle &&
-		 TC_H_MIN(skb->priority) > 0 &&
-		 TC_H_MIN(skb->priority) <= q->tin_cnt)
+		   skb->mark & 0x40000000) {
+		dscp = skb->mark >> 24 & 0x3f;
+		tin = q->tin_index[dscp];
+		if (q->rate_flags & CAKE_FLAG_ICING)
+			cake_update_diffserv(skb, dscp << 2);
+	} else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
+		   TC_H_MIN(skb->priority) > 0 &&
+		   TC_H_MIN(skb->priority) <= q->tin_cnt)
 		tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
 
 	else {
@@ -1675,6 +1710,9 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 
 		if (unlikely(tin >= q->tin_cnt))
 			tin = 0;
+
+		if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
+			cake_update_ct_mark(skb, dscp);
 	}
 
 	return &q->tins[tin];
@@ -2763,6 +2801,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
 			q->rate_flags &= ~CAKE_FLAG_FWMARK;
 	}
 
+	if (tb[TCA_CAKE_ICING]) {
+		if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
+			q->rate_flags |= CAKE_FLAG_ICING;
+		else
+			q->rate_flags &= ~CAKE_FLAG_ICING;
+	}
+
 	if (q->tins) {
 		sch_tree_lock(sch);
 		cake_reconfigure(sch);
@@ -2947,6 +2992,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
 			!!(q->rate_flags & CAKE_FLAG_FWMARK)))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, TCA_CAKE_ICING,
+			!!(q->rate_flags & CAKE_FLAG_ICING)))
+		goto nla_put_failure;
+
 	return nla_nest_end(skb, opts);
 
 nla_put_failure:
-- 
2.17.2 (Apple Git-113)


  reply	other threads:[~2019-03-04 15:50 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-27 21:12 Felix Resch
2019-02-28  3:24 ` gamanakis
2019-03-03 11:52   ` Kevin Darbyshire-Bryant
2019-03-03 12:22     ` John Sager
2019-03-03 16:25       ` Kevin Darbyshire-Bryant
2019-03-04 11:04         ` Toke Høiland-Jørgensen
2019-03-04 11:39           ` John Sager
2019-03-04  5:37     ` Ryan Mounce
2019-03-04  6:31       ` Jonathan Morton
2019-03-04  6:37         ` Ryan Mounce
2019-03-04  7:15           ` Dave Taht
2019-03-04  8:39     ` Pete Heist
2019-03-04 11:01       ` Kevin Darbyshire-Bryant
2019-03-04 11:17         ` Toke Høiland-Jørgensen
2019-03-04 11:55           ` Kevin Darbyshire-Bryant
2019-03-04 12:44             ` Toke Høiland-Jørgensen
2019-03-04 15:50               ` Kevin Darbyshire-Bryant [this message]
2019-03-04 16:39                 ` Toke Høiland-Jørgensen
2019-03-04 17:19                   ` Kevin Darbyshire-Bryant
2019-03-04 17:36                     ` Toke Høiland-Jørgensen
2019-03-04 20:58                       ` Kevin Darbyshire-Bryant
2019-03-04 21:33                         ` Toke Høiland-Jørgensen
2019-03-04 21:42                           ` Toke Høiland-Jørgensen
2019-03-05 14:06                           ` Kevin Darbyshire-Bryant
  -- strict thread matches above, loose matches on Subject: below --
2019-02-27 14:52 Kevin Darbyshire-Bryant
2019-02-27 15:14 ` Toke Høiland-Jørgensen
2019-02-28  8:32   ` Kevin Darbyshire-Bryant
2019-02-28  9:54     ` Toke Høiland-Jørgensen
2019-02-28 11:00       ` Kevin Darbyshire-Bryant
2019-02-28 11:13         ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://lists.bufferbloat.net/postorius/lists/cake.lists.bufferbloat.net/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA77F44F-97F1-48D7-8F36-45C4588303BE@darbyshire-bryant.me.uk \
    --to=kevin@darbyshire-bryant.me.uk \
    --cc=cake@lists.bufferbloat.net \
    --cc=pete@heistp.net \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox