Cake - FQ_codel the next generation
 help / color / mirror / Atom feed
* [Cake] Using firewall connmarks as tin selectors
@ 2019-02-27 14:52 Kevin Darbyshire-Bryant
  2019-02-27 15:14 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-02-27 14:52 UTC (permalink / raw)
  To: cake

How unpopular would the idea of having cake look at skb->mark directly be?

https://github.com/ldir-EDB0/sch_cake/commit/64d0e6ac9368a271221db888ab91a367fcd37ae1

https://github.com/ldir-EDB0/tc-adv/commit/4f16ae5d588d44f8a5c83fe2f2b7dcad97843cbc


I did the equivalent in eBPF here https://github.com/ldir-EDB0/cls_bpf_connmark_to_caketin but I can’t work out how to make the major number a tc command line argument into the BPF code.



Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-02-27 14:52 [Cake] Using firewall connmarks as tin selectors Kevin Darbyshire-Bryant
@ 2019-02-27 15:14 ` Toke Høiland-Jørgensen
  2019-02-28  8:32   ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 30+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-02-27 15:14 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant, cake

Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:

> How unpopular would the idea of having cake look at skb->mark directly be?
>
> https://github.com/ldir-EDB0/sch_cake/commit/64d0e6ac9368a271221db888ab91a367fcd37ae1
>
> https://github.com/ldir-EDB0/tc-adv/commit/4f16ae5d588d44f8a5c83fe2f2b7dcad97843cbc

Hmm, not impossible, but seeing as we already have a way to achieve that
with BPF, is it really needed?

> I did the equivalent in eBPF here
> https://github.com/ldir-EDB0/cls_bpf_connmark_to_caketin but I can’t
> work out how to make the major number a tc command line argument into
> the BPF code.

You can't, but you can set the major number explicitly when you create
the qdisc:

$ sudo tc qdisc replace dev eth0 root handle 1: cake
$ tc qdisc show dev eth0
qdisc cake 1: root refcnt 2 bandwidth unlimited diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0


-Toke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-02-27 15:14 ` Toke Høiland-Jørgensen
@ 2019-02-28  8:32   ` Kevin Darbyshire-Bryant
  2019-02-28  9:54     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-02-28  8:32 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: cake



> On 27 Feb 2019, at 15:14, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
> 
>> How unpopular would the idea of having cake look at skb->mark directly be?
>> 
>> https://github.com/ldir-EDB0/sch_cake/commit/64d0e6ac9368a271221db888ab91a367fcd37ae1
>> 
>> https://github.com/ldir-EDB0/tc-adv/commit/4f16ae5d588d44f8a5c83fe2f2b7dcad97843cbc
> 

Hiya Toke,

OK, so it’s not an instant no :-)

> Hmm, not impossible, but seeing as we already have a way to achieve that
> with BPF, is it really needed?

Well, from a command line usability I’d say a ’no/fwmark’ option is a lot easier to grasp/configure than invoking anything BPF related.

Similarly, a suitable BPF program requires building and based on your below comments, with hard coded constants e.g. major number.  The ‘entry barrier’ is high, I have to write the BPF program, compile it (eBPF ‘toolchain'), maintain it, install it and there are certain (lack of) features that make it clunky to use even after you’ve done all that.

> 
>> I did the equivalent in eBPF here
>> https://github.com/ldir-EDB0/cls_bpf_connmark_to_caketin but I can’t
>> work out how to make the major number a tc command line argument into
>> the BPF code.
> 
> You can't, but you can set the major number explicitly when you create
> the qdisc:

Indeed the script I’m using does exactly that, but it means I need a ‘hard coded’ BPF program (or section at least) per cake instance.  Tail wagging the dog springs to mind.

I re-worked the code (again) anyway, (last commit https://github.com/ldir-EDB0/sch_cake/commits/mine )

The driver for doing any of this is primarily related to wan ingress classification.  DSCP can’t be trusted and can’t be manipulated (save for eBPF)  Neither iptables rules nor conntrack NAT lookup will have occurred so an eBPF program using internal addresses is challenging.  i *can* persuade tc to restore the connmark on ingress, so I can write old style, complicated, slow iptables rules to apply a connmark once on egress and have that connmark used for both egress & ingress aspects of that connection.

I also equally aware that this is ‘creeping featuritis’ and doing nothing to speed cake up…actually I may have improved BESTEFFORT a little - we no longer look for matching TC Major numbers if there’s no actual choice of tin to be made :-)


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-02-28  8:32   ` Kevin Darbyshire-Bryant
@ 2019-02-28  9:54     ` Toke Høiland-Jørgensen
  2019-02-28 11:00       ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 30+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-02-28  9:54 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake

> I also equally aware that this is ‘creeping featuritis’ and doing
> nothing to speed cake up…

Yeah, this is the crux of the issue, really: it's a tradeoff between
ease of use and featuritis. Now in this case the actual impact is a
single check it might actually be acceptable

> actually I may have improved BESTEFFORT a little - we no longer look
> for matching TC Major numbers if there’s no actual choice of tin to be
> made :-)

Well, you made the besteffort case slightly faster, but every other mode
slightly slower... :)

If you are going to send a patch (or pull request), please leave out the
refactoring, and only include the feature. This makes it easier to see
the impact of the feature addition on its own.

Also, I assume you have a companion patch for iproute2 somewhere?

-Toke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-02-28  9:54     ` Toke Høiland-Jørgensen
@ 2019-02-28 11:00       ` Kevin Darbyshire-Bryant
  2019-02-28 11:13         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-02-28 11:00 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: cake



> On 28 Feb 2019, at 09:54, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
>> I also equally aware that this is ‘creeping featuritis’ and doing
>> nothing to speed cake up…
> 
> Yeah, this is the crux of the issue, really: it's a tradeoff between
> ease of use and featuritis. Now in this case the actual impact is a
> single check it might actually be acceptable
> 
>> actually I may have improved BESTEFFORT a little - we no longer look
>> for matching TC Major numbers if there’s no actual choice of tin to be
>> made :-)
> 
> Well, you made the besteffort case slightly faster, but every other mode
> slightly slower... :)

Ah, ok, tc filter really is king ;-)

> 
> If you are going to send a patch (or pull request), please leave out the
> refactoring, and only include the feature. This makes it easier to see
> the impact of the feature addition on its own.

Ha ha, sending a patch to the kernel lists???  Never again!

The non-refactored version is https://github.com/ldir-EDB0/sch_cake/commit/f30bb18adc97c827c81c9a3297ab14bfaf2adcb0 and I’ve created a PR

> 
> Also, I assume you have a companion patch for iproute2 somewhere?

In the first email: https://github.com/ldir-EDB0/tc-adv/commits/fwmark - based on 4.20.0 ‘cos that’s where openwrt is at the moment.  When I’ve found a suitable editor I was even going to update the man page!

> 
> -Toke


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-02-28 11:00       ` Kevin Darbyshire-Bryant
@ 2019-02-28 11:13         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 30+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-02-28 11:13 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake

Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:

>> On 28 Feb 2019, at 09:54, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> 
>>> I also equally aware that this is ‘creeping featuritis’ and doing
>>> nothing to speed cake up…
>> 
>> Yeah, this is the crux of the issue, really: it's a tradeoff between
>> ease of use and featuritis. Now in this case the actual impact is a
>> single check it might actually be acceptable
>> 
>>> actually I may have improved BESTEFFORT a little - we no longer look
>>> for matching TC Major numbers if there’s no actual choice of tin to be
>>> made :-)
>> 
>> Well, you made the besteffort case slightly faster, but every other mode
>> slightly slower... :)
>
> Ah, ok, tc filter really is king ;-)

Well, I don't mind doing the simplification, but it should be done as a
separate commit; I'll do that after mering your pull request.

>> If you are going to send a patch (or pull request), please leave out the
>> refactoring, and only include the feature. This makes it easier to see
>> the impact of the feature addition on its own.
>
> Ha ha, sending a patch to the kernel lists???  Never again!

Haha, no, I meant to this list. I'll submit upstream, along with the
other stuff.

> The non-refactored version is
> https://github.com/ldir-EDB0/sch_cake/commit/f30bb18adc97c827c81c9a3297ab14bfaf2adcb0
> and I’ve created a PR

Great, thanks!

>> Also, I assume you have a companion patch for iproute2 somewhere?
>
> In the first email: https://github.com/ldir-EDB0/tc-adv/commits/fwmark
> - based on 4.20.0 ‘cos that’s where openwrt is at the moment.

A pull request against tc-adv would be useful :)

> When I’ve found a suitable editor I was even going to update the man
> page!

I usually just use a regular editor and copy-paste the syntax bits from
another entry...

-Toke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 21:33                         ` Toke Høiland-Jørgensen
  2019-03-04 21:42                           ` Toke Høiland-Jørgensen
@ 2019-03-05 14:06                           ` Kevin Darbyshire-Bryant
  1 sibling, 0 replies; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-03-05 14:06 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: cake



> On 4 Mar 2019, at 21:33, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
<snip>
>> I think so too though I think the mechanism of copying the DSCP bits
>> and adding a ‘I did this’ flag bit should be retained so that other
>> user space tools (iptables etc) can detect when a connmark based DSCP
>> has been set/applied.
> 
> I guess this could be an option as well?

If we don’t do that then potentially we have to look up the DSCP and update the conntrack mark for every packet.

>> I think cake ‘fwmark’ should have the smarts to look for the act_dscp
>> DSCP value if nothing else so we don’t have to have the overhead of
>> act_dscp set restoring DSCP to all the packets if we don’t want to.
> 
> Not sure what you mean here?

What I meant was that we can make the diffserv restore part optional.  Our qdisc (or whatever) could pick up the fw stored DSCP for tin/bandwidth selection and not require the real DSCP to be set and quite possibly washed/bleached again anyway.


>> I’m right at the limit of my coding ability with what I’ve sent in so
>> far - the kernel space bits of act_connmark leave me mostly confused -
>> really not sure where to start with act_dscp!
> 
> I think I would start with `cp act_connmark.c act_dscp.c`, adding the
> new file to the Makefile and Kconfig, and working from there. Then rip
> out everything not needed, and copy over what you already added to cake.
> 
> Happy to help you work out the details; but I think we'll make more
> progress on this if you are driving it :)

OK - we’ll see how long it takes before someone screams or laughs themselves to death :-)

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 21:33                         ` Toke Høiland-Jørgensen
@ 2019-03-04 21:42                           ` Toke Høiland-Jørgensen
  2019-03-05 14:06                           ` Kevin Darbyshire-Bryant
  1 sibling, 0 replies; 30+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-03-04 21:42 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake

Toke Høiland-Jørgensen <toke@redhat.com> writes:

> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
>
>>> On 4 Mar 2019, at 17:36, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> < snips >
>>> 
>>> Or act_dscp with 'get' and 'set' options :)
>>> 
>>>> As I said earlier I couldn't work out how m_conntrack did… anything at
>>>> all to be honest!
>>> 
>>> act_connmark is pretty straight forward:
>>> https://elixir.bootlin.com/linux/latest/source/net/sched/act_connmark.c#L34
>>
>> Oh bloody hell I’m an idiot - I was looking in user space tc code for
>> something that obviously lives in kernel land.  Yes *now* I can see
>> what it does.
>
> No comment ;)
>
>>>>>> @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>>>>>> 		tin = 0;
>>>>>> 
>>>>>> 	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>>>>>> -		 skb->mark &&
>>>>>> -		 skb->mark <= q->tin_cnt)
>>>>>> -		tin = q->tin_order[skb->mark - 1];
>>>>>> -
>>>>>> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
>>>>>> -		 TC_H_MIN(skb->priority) > 0 &&
>>>>>> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
>>>>>> +		   skb->mark & 0x40000000) {
>>>>> 
>>>>> I think there's something odd with this mask?  There's only one bit set
>>>>> in it…
>>>> 
>>>> I use the single bit as a flag to indicate cake has stored the  DSCP
>>>> in the lower 6 bits of the byte.  Otherwise with a DSCP of 0 (the
>>>> default) it’s rather difficult to know if a connection has been
>>>> through the cake ’save dscp to fwmark’ process or not.  That also
>>>> indicates to user space whether it should consider mangling packets or
>>>> not e.g.
>>> 
>>> Ah, right. But that breaks the previous use case where someone just
>>> wants to set fwmarks that get turned into CAKE tins?
>>
>> Yes, which is why I was hoping for upstream to bounce it, not least
>> because  of the unmasked use of fwmark field.  Personally I’d like to
>> see that change reverted before we get trapped into something we’ll
>> regret.
>
> Well, we have plenty of time to try out things; the whole point of the
> rc cycle is testing. But sure, if we don't settle on something, we can
> just revert it :)

And, well, the simple thing to do is just keep the current behaviour, or
add a mask of 0xff, and use the tc action for everything that's more
advanced than this...

>>> I think this definitely is leaning towards decoupling the
>>> fw-mark-to-DSCP settings to an action. And probably making the shift and
>>> mask configurable rather than hard-coding something…
>>
>> I think so too though I think the mechanism of copying the DSCP bits
>> and adding a ‘I did this’ flag bit should be retained so that other
>> user space tools (iptables etc) can detect when a connmark based DSCP
>> has been set/applied.
>
> I guess this could be an option as well?
>
>> I think cake ‘fwmark’ should have the smarts to look for the act_dscp
>> DSCP value if nothing else so we don’t have to have the overhead of
>> act_dscp set restoring DSCP to all the packets if we don’t want to.
>
> Not sure what you mean here?
>
>> I’m right at the limit of my coding ability with what I’ve sent in so
>> far - the kernel space bits of act_connmark leave me mostly confused -
>> really not sure where to start with act_dscp!
>
> I think I would start with `cp act_connmark.c act_dscp.c`, adding the
> new file to the Makefile and Kconfig, and working from there. Then rip
> out everything not needed, and copy over what you already added to
> cake.

Or even simpler, just add new options to act_connmark...

-Toke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 20:58                       ` Kevin Darbyshire-Bryant
@ 2019-03-04 21:33                         ` Toke Høiland-Jørgensen
  2019-03-04 21:42                           ` Toke Høiland-Jørgensen
  2019-03-05 14:06                           ` Kevin Darbyshire-Bryant
  0 siblings, 2 replies; 30+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-03-04 21:33 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake

Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:

>> On 4 Mar 2019, at 17:36, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> < snips >
>> 
>> Or act_dscp with 'get' and 'set' options :)
>> 
>>> As I said earlier I couldn't work out how m_conntrack did… anything at
>>> all to be honest!
>> 
>> act_connmark is pretty straight forward:
>> https://elixir.bootlin.com/linux/latest/source/net/sched/act_connmark.c#L34
>
> Oh bloody hell I’m an idiot - I was looking in user space tc code for
> something that obviously lives in kernel land.  Yes *now* I can see
> what it does.

No comment ;)

>>>>> @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>>>>> 		tin = 0;
>>>>> 
>>>>> 	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>>>>> -		 skb->mark &&
>>>>> -		 skb->mark <= q->tin_cnt)
>>>>> -		tin = q->tin_order[skb->mark - 1];
>>>>> -
>>>>> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
>>>>> -		 TC_H_MIN(skb->priority) > 0 &&
>>>>> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
>>>>> +		   skb->mark & 0x40000000) {
>>>> 
>>>> I think there's something odd with this mask?  There's only one bit set
>>>> in it…
>>> 
>>> I use the single bit as a flag to indicate cake has stored the  DSCP
>>> in the lower 6 bits of the byte.  Otherwise with a DSCP of 0 (the
>>> default) it’s rather difficult to know if a connection has been
>>> through the cake ’save dscp to fwmark’ process or not.  That also
>>> indicates to user space whether it should consider mangling packets or
>>> not e.g.
>> 
>> Ah, right. But that breaks the previous use case where someone just
>> wants to set fwmarks that get turned into CAKE tins?
>
> Yes, which is why I was hoping for upstream to bounce it, not least
> because  of the unmasked use of fwmark field.  Personally I’d like to
> see that change reverted before we get trapped into something we’ll
> regret.

Well, we have plenty of time to try out things; the whole point of the
rc cycle is testing. But sure, if we don't settle on something, we can
just revert it :)

>> I think this definitely is leaning towards decoupling the
>> fw-mark-to-DSCP settings to an action. And probably making the shift and
>> mask configurable rather than hard-coding something…
>
> I think so too though I think the mechanism of copying the DSCP bits
> and adding a ‘I did this’ flag bit should be retained so that other
> user space tools (iptables etc) can detect when a connmark based DSCP
> has been set/applied.

I guess this could be an option as well?

> I think cake ‘fwmark’ should have the smarts to look for the act_dscp
> DSCP value if nothing else so we don’t have to have the overhead of
> act_dscp set restoring DSCP to all the packets if we don’t want to.

Not sure what you mean here?

> I’m right at the limit of my coding ability with what I’ve sent in so
> far - the kernel space bits of act_connmark leave me mostly confused -
> really not sure where to start with act_dscp!

I think I would start with `cp act_connmark.c act_dscp.c`, adding the
new file to the Makefile and Kconfig, and working from there. Then rip
out everything not needed, and copy over what you already added to cake.

Happy to help you work out the details; but I think we'll make more
progress on this if you are driving it :)

-Toke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 17:36                     ` Toke Høiland-Jørgensen
@ 2019-03-04 20:58                       ` Kevin Darbyshire-Bryant
  2019-03-04 21:33                         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-03-04 20:58 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: cake



> On 4 Mar 2019, at 17:36, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> < snips >
> 
> Or act_dscp with 'get' and 'set' options :)
> 
>> As I said earlier I couldn't work out how m_conntrack did… anything at
>> all to be honest!
> 
> act_connmark is pretty straight forward:
> https://elixir.bootlin.com/linux/latest/source/net/sched/act_connmark.c#L34

Oh bloody hell I’m an idiot - I was looking in user space tc code for something that obviously lives in kernel land.  Yes *now* I can see what it does.

> 
>>>> @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>>>> 		tin = 0;
>>>> 
>>>> 	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>>>> -		 skb->mark &&
>>>> -		 skb->mark <= q->tin_cnt)
>>>> -		tin = q->tin_order[skb->mark - 1];
>>>> -
>>>> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
>>>> -		 TC_H_MIN(skb->priority) > 0 &&
>>>> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
>>>> +		   skb->mark & 0x40000000) {
>>> 
>>> I think there's something odd with this mask?  There's only one bit set
>>> in it…
>> 
>> I use the single bit as a flag to indicate cake has stored the  DSCP
>> in the lower 6 bits of the byte.  Otherwise with a DSCP of 0 (the
>> default) it’s rather difficult to know if a connection has been
>> through the cake ’save dscp to fwmark’ process or not.  That also
>> indicates to user space whether it should consider mangling packets or
>> not e.g.
> 
> Ah, right. But that breaks the previous use case where someone just
> wants to set fwmarks that get turned into CAKE tins?

Yes, which is why I was hoping for upstream to bounce it, not least because  of the unmasked use of fwmark field.  Personally I’d like to see that change reverted before we get trapped into something we’ll regret.

> I think this definitely is leaning towards decoupling the
> fw-mark-to-DSCP settings to an action. And probably making the shift and
> mask configurable rather than hard-coding something…

I think so too though I think the mechanism of copying the DSCP bits and adding a ‘I did this’ flag bit should be retained so that other user space tools (iptables etc) can detect when a connmark based DSCP has been set/applied.  I think cake ‘fwmark’ should have the smarts to look for the act_dscp DSCP value if nothing else so we don’t have to have the overhead of act_dscp set restoring DSCP to all the packets if we don’t want to.

I’m right at the limit of my coding ability with what I’ve sent in so far - the kernel space bits of act_connmark leave me mostly confused - really not sure where to start with act_dscp!



Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 17:19                   ` Kevin Darbyshire-Bryant
@ 2019-03-04 17:36                     ` Toke Høiland-Jørgensen
  2019-03-04 20:58                       ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 30+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-03-04 17:36 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: Pete Heist, cake

Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:

>> On 4 Mar 2019, at 16:39, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> 
>> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
>> 
>> [ ... snipping a bit of context here ... ]
>> 
>>>>>>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>>>>>>> +{
>>>>>>> +	enum ip_conntrack_info ctinfo;
>>>>>>> +	struct nf_conn *ct;
>>>>>>> +
>>>>>>> +	ct = nf_ct_get(skb, &ctinfo);
>>>>>>> +	if (!ct)
>>>>>>> +		return;
>>>>>>> +
>>>>>>> +	ct->mark &= 0x80ffffff;
>>>>>>> +	ct->mark |= (0x40 | dscp) << 24;
>>>>>> 
>>>>>> Right, so we *might* have an argument that putting the *tin* into the
>>>>>> fwmark is CAKE's business, but copying over the dscp mark is not
>>>>>> something a qdisc should be doing…
>>>>> 
>>>>> Why ever not?  It’s not the DSCP, it’s a lookup value into the cake
>>>>> priority table, it just happens to look like the DSCP ;-)
>>>> 
>>>> If it quacks like a duck…
>>> 
>>> I honestly don’t know where to go from here. I’m clearly trying to do
>>> something that the kernel doesn’t want to do.
>> 
>> I'm not disputing that what you're trying to do (moving DSCP field into
>> connmark) is useful. I'm just questioning whether CAKE is the right
>> place to do this. I think it would fit better in a TC action; either
>> modify act_connmark, or create a new action to sync fwmarks and DSCP
>> marks…
>
> Interesting.  Thinks out loud - Two actions - ‘act_storedscp’,
> ‘act_restoredscp'

Or act_dscp with 'get' and 'set' options :)

> As I said earlier I couldn't work out how m_conntrack did… anything at
> all to be honest!

act_connmark is pretty straight forward:
https://elixir.bootlin.com/linux/latest/source/net/sched/act_connmark.c#L34

>>> @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>>> 		tin = 0;
>>> 
>>> 	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>>> -		 skb->mark &&
>>> -		 skb->mark <= q->tin_cnt)
>>> -		tin = q->tin_order[skb->mark - 1];
>>> -
>>> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
>>> -		 TC_H_MIN(skb->priority) > 0 &&
>>> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
>>> +		   skb->mark & 0x40000000) {
>> 
>> I think there's something odd with this mask?  There's only one bit set
>> in it…
>
> I use the single bit as a flag to indicate cake has stored the  DSCP
> in the lower 6 bits of the byte.  Otherwise with a DSCP of 0 (the
> default) it’s rather difficult to know if a connection has been
> through the cake ’save dscp to fwmark’ process or not.  That also
> indicates to user space whether it should consider mangling packets or
> not e.g.

Ah, right. But that breaks the previous use case where someone just
wants to set fwmarks that get turned into CAKE tins?

I think this definitely is leaning towards decoupling the
fw-mark-to-DSCP settings to an action. And probably making the shift and
mask configurable rather than hard-coding something...

-Toke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 16:39                 ` Toke Høiland-Jørgensen
@ 2019-03-04 17:19                   ` Kevin Darbyshire-Bryant
  2019-03-04 17:36                     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-03-04 17:19 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Pete Heist, cake



> On 4 Mar 2019, at 16:39, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
> 
> [ ... snipping a bit of context here ... ]
> 
>>>>>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>>>>>> +{
>>>>>> +	enum ip_conntrack_info ctinfo;
>>>>>> +	struct nf_conn *ct;
>>>>>> +
>>>>>> +	ct = nf_ct_get(skb, &ctinfo);
>>>>>> +	if (!ct)
>>>>>> +		return;
>>>>>> +
>>>>>> +	ct->mark &= 0x80ffffff;
>>>>>> +	ct->mark |= (0x40 | dscp) << 24;
>>>>> 
>>>>> Right, so we *might* have an argument that putting the *tin* into the
>>>>> fwmark is CAKE's business, but copying over the dscp mark is not
>>>>> something a qdisc should be doing…
>>>> 
>>>> Why ever not?  It’s not the DSCP, it’s a lookup value into the cake
>>>> priority table, it just happens to look like the DSCP ;-)
>>> 
>>> If it quacks like a duck…
>> 
>> I honestly don’t know where to go from here. I’m clearly trying to do
>> something that the kernel doesn’t want to do.
> 
> I'm not disputing that what you're trying to do (moving DSCP field into
> connmark) is useful. I'm just questioning whether CAKE is the right
> place to do this. I think it would fit better in a TC action; either
> modify act_connmark, or create a new action to sync fwmarks and DSCP
> marks…

Interesting.  Thinks out loud - Two actions - ‘act_storedscp’, ‘act_restoredscp'

As I said earlier I couldn't work out how m_conntrack did… anything at all to be honest!

> 
> This would both sidestep the whole conntrack dependency issue, and make
> the same functionality available outside of CAKE (for an HTB-based
> setup, for instance).
> 
>> v2 addressing some of the comments attached.  Is it best to keep the
>> in progress patches here or should they be github PRs ?
> 
> Patches on the mailing list is fine by me, and it seems there are people
> reading the list, but not github, so let's keep it here for now at
> least. However, I'll hold off on more detailed comments on the patch
> until we've resolved the point above. With one exception:

There’s certainly been some response here that’s for sure :-)

> 
>> @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>> 		tin = 0;
>> 
>> 	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>> -		 skb->mark &&
>> -		 skb->mark <= q->tin_cnt)
>> -		tin = q->tin_order[skb->mark - 1];
>> -
>> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
>> -		 TC_H_MIN(skb->priority) > 0 &&
>> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
>> +		   skb->mark & 0x40000000) {
> 
> I think there's something odd with this mask?  There's only one bit set
> in it…

I use the single bit as a flag to indicate cake has stored the  DSCP in the lower 6 bits of the byte.  Otherwise with a DSCP of 0 (the default) it’s rather difficult to know if a connection has been through the cake ’save dscp to fwmark’ process or not.  That also indicates to user space whether it should consider mangling packets or not e.g.

#if bit 6 0 then not been marked by cake - go & mangle the DSCP ready for cake to find & set
#the mark.
    ipt -t mangle -A PREROUTING  -i $IFACE -m mark --mark 0x00/0x40000000 -g QOS_MARK_${IFACE}
    ipt -t mangle -A POSTROUTING -o $IFACE -m mark --mark 0x00/0x40000000 -g QOS_MARK_${IFACE}



Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 15:50               ` Kevin Darbyshire-Bryant
@ 2019-03-04 16:39                 ` Toke Høiland-Jørgensen
  2019-03-04 17:19                   ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 30+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-03-04 16:39 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: Pete Heist, cake

Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:

[ ... snipping a bit of context here ... ]

>>>>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>>>>> +{
>>>>> +	enum ip_conntrack_info ctinfo;
>>>>> +	struct nf_conn *ct;
>>>>> +
>>>>> +	ct = nf_ct_get(skb, &ctinfo);
>>>>> +	if (!ct)
>>>>> +		return;
>>>>> +
>>>>> +	ct->mark &= 0x80ffffff;
>>>>> +	ct->mark |= (0x40 | dscp) << 24;
>>>> 
>>>> Right, so we *might* have an argument that putting the *tin* into the
>>>> fwmark is CAKE's business, but copying over the dscp mark is not
>>>> something a qdisc should be doing…
>>> 
>>> Why ever not?  It’s not the DSCP, it’s a lookup value into the cake
>>> priority table, it just happens to look like the DSCP ;-)
>> 
>> If it quacks like a duck…
>
> I honestly don’t know where to go from here. I’m clearly trying to do
> something that the kernel doesn’t want to do.

I'm not disputing that what you're trying to do (moving DSCP field into
connmark) is useful. I'm just questioning whether CAKE is the right
place to do this. I think it would fit better in a TC action; either
modify act_connmark, or create a new action to sync fwmarks and DSCP
marks...

This would both sidestep the whole conntrack dependency issue, and make
the same functionality available outside of CAKE (for an HTB-based
setup, for instance).

> v2 addressing some of the comments attached.  Is it best to keep the
> in progress patches here or should they be github PRs ?

Patches on the mailing list is fine by me, and it seems there are people
reading the list, but not github, so let's keep it here for now at
least. However, I'll hold off on more detailed comments on the patch
until we've resolved the point above. With one exception:

> @@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>  		tin = 0;
>  
>  	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
> -		 skb->mark &&
> -		 skb->mark <= q->tin_cnt)
> -		tin = q->tin_order[skb->mark - 1];
> -
> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
> -		 TC_H_MIN(skb->priority) > 0 &&
> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
> +		   skb->mark & 0x40000000) {

I think there's something odd with this mask?  There's only one bit set
in it...

-Toke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 12:44             ` Toke Høiland-Jørgensen
@ 2019-03-04 15:50               ` Kevin Darbyshire-Bryant
  2019-03-04 16:39                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-03-04 15:50 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Pete Heist, cake

[-- Attachment #1: Type: text/plain, Size: 7297 bytes --]



> On 4 Mar 2019, at 12:44, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
> 
>>> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>> 
>>> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
>>> 
>>>>> On 4 Mar 2019, at 08:39, Pete Heist <pete@heistp.net> wrote:
>>>>> 
>>>>> 
>>>>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>>>>>> 
>>>>>> The very bad idea:
>>>>>> 
>>>>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>>>>> implementation as described above. So an awful lot of our
>>>>>> shenanigans above is due to DSCP not traversing the internet
>>>>>> particularly well. The solution above abstracts DSCP into ’tins’
>>>>>> which we put into fwmarks. Another approach would be to put the DSCP
>>>>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>>>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>>>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>>>>> local domain preserved.
>>>>> 
>>>>> If I understand it right, another use case for this “very bad idea”
>>>>> is preserving DSCP locally while traversing upstream WiFi links as
>>>>> besteffort, which avoids airtime efficiency problems that can occur
>>>>> with 802.11e (WMM). In cases where the router config can’t be changed
>>>>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>>>>> it hides DSCP from the WiFi stack while preserving the values through
>>>>> the tunnel, but this would be easier. Neat… :)
>>>> 
>>>> Everyone has understood the intent & maybe the implementation
>>>> correctly. 2 patches attached, one for cake, one for tc.
>>>> 
>>>> They are naively coded and some of it undoes Toke’s recent tidying up
>>>> (sorry!)
>>> 
>>> Heh. First comment: Don't do that ;)
>> 
>> I did say naively coded.
>> 
>>> 
>>> A few more below.
>>> 
>>>> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
>>>> diff --git a/pkt_sched.h b/pkt_sched.h
>>>> index a2f570c..d1f288d 100644
>>>> --- a/pkt_sched.h
>>>> +++ b/pkt_sched.h
>>>> @@ -879,6 +879,7 @@ enum {
>>>> 	TCA_CAKE_ACK_FILTER,
>>>> 	TCA_CAKE_SPLIT_GSO,
>>>> 	TCA_CAKE_FWMARK,
>>>> +	TCA_CAKE_ICING,
>>>> 	__TCA_CAKE_MAX
>>>> };
>>>> #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
>>>> diff --git a/sch_cake.c b/sch_cake.c
>>>> index 733b897..5aca0f3 100644
>>>> --- a/sch_cake.c
>>>> +++ b/sch_cake.c
>>>> @@ -270,7 +270,8 @@ enum {
>>>> 	CAKE_FLAG_INGRESS	   = BIT(2),
>>>> 	CAKE_FLAG_WASH		   = BIT(3),
>>>> 	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
>>>> -	CAKE_FLAG_FWMARK	   = BIT(5)
>>>> +	CAKE_FLAG_FWMARK	   = BIT(5),
>>>> +	CAKE_FLAG_ICING		   = BIT(6)
>>> 
>>> This implies that icing and fwmark can be enabled completely
>>> independently of each other. Are you sure about the semantics for that?
>> 
>> No, I’m not.  I sent the patches so others could see my lunacy in action and hopefully improve it.
>> 
>> The actual operation links FWMARK, INGRESS & ICING in a variety of
>> combinations.
> 
> Right, so obviously this needs to be thought through…

Yes.  Volunteers with thoughts & ideally code sought.

> 
>>>> };
>>>> 
>>>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
>>>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>>>> };
>>>> 
>>>> static const u8 diffserv4[] = {
>>>> -	0, 2, 0, 0, 2, 0, 0, 0,
>>>> +	0, 1, 0, 0, 2, 0, 0, 0,
>>>> 	1, 0, 0, 0, 0, 0, 0, 0,
>>>> 	2, 0, 2, 0, 2, 0, 2, 0,
>>>> 	2, 0, 2, 0, 2, 0, 2, 0,
>>>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>>>> };
>>>> 
>>>> static const u8 diffserv3[] = {
>>>> -	0, 0, 0, 0, 2, 0, 0, 0,
>>>> +	0, 1, 0, 0, 2, 0, 0, 0,
>>> 
>>> Why are you messing with the diffserv mappings in this patch?
>> 
>> This is a combination patch of Dave’s new LE coding and the
>> fwmark/dscp mangling.
> 
> Ah. Well let's keep that separate from this patch/discussion…

Done
> 
>>> 
>>>> 	1, 0, 0, 0, 0, 0, 0, 0,
>>>> 	0, 0, 0, 0, 0, 0, 0, 0,
>>>> 	0, 0, 0, 0, 0, 0, 0, 0,
>>>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>>>> 	return idx + (tin << 16);
>>>> }
>>>> 
>>>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>>>> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
>>>> +{
>>>> +	switch (skb->protocol) {
>>>> +	case htons(ETH_P_IP):
>>>> +		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>>>> +			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
>>>> +		break;
>>>> +	case htons(ETH_P_IPV6):
>>>> +		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>>>> +			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
>>>> +		break;
>>>> +	default:
>>>> +		break;
>>>> +	}
>>>> +
>>>> +}
>>> 
>>> So washing is just a special case of this (wash is
>>> cake_update_diffserv(skb,0)). So you shouldn't need to add another
>>> function, just augment the existing handling code.
>> 
>> Erm, that’s exactly what I’ve done.
> 
> Ah, right; I guess it's just the reverting of the cleanup patch that is
> the issue, then :)

With the reverting of the cleaning reverted - if that makes any sense.
And I have major twitchiness at the result.

> 
>>>> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>>>> {
>>>> 	u8 dscp;
>>>> 
>>>> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>>>> 	}
>>>> }
>>>> 
>>>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>>> 
>>> Save an ifdef below by moving the ifdef inside the function definition.
>>> 
>>>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>>>> +{
>>>> +	enum ip_conntrack_info ctinfo;
>>>> +	struct nf_conn *ct;
>>>> +
>>>> +	ct = nf_ct_get(skb, &ctinfo);
>>>> +	if (!ct)
>>>> +		return;
>>>> +
>>>> +	ct->mark &= 0x80ffffff;
>>>> +	ct->mark |= (0x40 | dscp) << 24;
>>> 
>>> Right, so we *might* have an argument that putting the *tin* into the
>>> fwmark is CAKE's business, but copying over the dscp mark is not
>>> something a qdisc should be doing…
>> 
>> Why ever not?  It’s not the DSCP, it’s a lookup value into the cake
>> priority table, it just happens to look like the DSCP ;-)
> 
> If it quacks like a duck…

I honestly don’t know where to go from here.  I’m clearly trying to do something that the kernel doesn’t want to do.

> 
>> 
>>>> +	nf_conntrack_event_cache(IPCT_MARK, ct);
>>>> +}
>>>> +#endif
>>> 
>>> Also, are you sure this will work in all permutations of conntrack being
>>> a module vs not etc? (we had to jump through quite some hoops to get the
>>> conntrack hooks to work last time; this is probably my biggest worry here).
>> 
>> No, I have absolutely no clue here at all.
> 
> Well, another issue that needs fixing, then…

Again bright-eyed & bushy tailed volunteers sought!

> 
> -Toke

v2 addressing some of the comments attached.  Is it best to keep the in progress patches here or should they be github PRs ?


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A

[-- Attachment #2: v2-0001-Copy-dscp-to-fwmark.patch --]
[-- Type: application/octet-stream, Size: 4877 bytes --]

From 762f91e6da353815970d6c2f316f00d8f1cb6585 Mon Sep 17 00:00:00 2001
From: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Date: Fri, 1 Mar 2019 11:35:49 +0000
Subject: [PATCH v2] Copy dscp to fwmark

and then I had a really bad idea!

Which is - if we're in fwmark & egress mode & we don't have a mark so
we've fallen through to DSCP, set a mark based on the DSCP.
That way, reply packets will be automagically fw marked for us on their
return.

With the 'icing' keyword, copy the dscp coded into the fwmark into the
real packets (the opposite of wash) in ingress mode.

The fwmark encodes the dscp thus (basically we steal 7 bits from the top
byte of fwmark and preserve everything else):

ct->mark &= 0x80ffffff;
ct->mark |= (0x40 | dscp) << 24;

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 pkt_sched.h |  1 +
 sch_cake.c  | 69 +++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/pkt_sched.h b/pkt_sched.h
index a2f570c..d1f288d 100644
--- a/pkt_sched.h
+++ b/pkt_sched.h
@@ -879,6 +879,7 @@ enum {
 	TCA_CAKE_ACK_FILTER,
 	TCA_CAKE_SPLIT_GSO,
 	TCA_CAKE_FWMARK,
+	TCA_CAKE_ICING,
 	__TCA_CAKE_MAX
 };
 #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
diff --git a/sch_cake.c b/sch_cake.c
index 733b897..7da03b5 100644
--- a/sch_cake.c
+++ b/sch_cake.c
@@ -270,7 +270,8 @@ enum {
 	CAKE_FLAG_INGRESS	   = BIT(2),
 	CAKE_FLAG_WASH		   = BIT(3),
 	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
-	CAKE_FLAG_FWMARK	   = BIT(5)
+	CAKE_FLAG_FWMARK	   = BIT(5),
+	CAKE_FLAG_ICING		   = BIT(6)
 };
 
 /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
@@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
 	return idx + (tin << 16);
 }
 
-static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
+static void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
+{
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
+			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
+		break;
+	case htons(ETH_P_IPV6):
+		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
+			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
+		break;
+	default:
+		break;
+	}
+
+}
+
+static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
 {
 	u8 dscp;
 
@@ -1644,12 +1662,28 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
 	}
 }
 
+static void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
+{
+#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct)
+		return;
+
+	ct->mark &= 0x80ffffff;
+	ct->mark |= (0x40 | dscp) << 24;
+	nf_conntrack_event_cache(IPCT_MARK, ct);
+#endif
+}
+
 static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 					     struct sk_buff *skb)
 {
 	struct cake_sched_data *q = qdisc_priv(sch);
-	u32 tin;
 	u8 dscp;
+	u8 tin;
 
 	/* Tin selection: Default to diffserv-based selection, allow overriding
 	 * using firewall marks or skb->priority.
@@ -1661,13 +1695,14 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 		tin = 0;
 
 	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
-		 skb->mark &&
-		 skb->mark <= q->tin_cnt)
-		tin = q->tin_order[skb->mark - 1];
-
-	else if (TC_H_MAJ(skb->priority) == sch->handle &&
-		 TC_H_MIN(skb->priority) > 0 &&
-		 TC_H_MIN(skb->priority) <= q->tin_cnt)
+		   skb->mark & 0x40000000) {
+		dscp = skb->mark >> 24 & 0x3f;
+		tin = q->tin_index[dscp];
+		if (q->rate_flags & CAKE_FLAG_ICING)
+			cake_update_diffserv(skb, dscp << 2);
+	} else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
+		   TC_H_MIN(skb->priority) > 0 &&
+		   TC_H_MIN(skb->priority) <= q->tin_cnt)
 		tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
 
 	else {
@@ -1675,6 +1710,9 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 
 		if (unlikely(tin >= q->tin_cnt))
 			tin = 0;
+
+		if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
+			cake_update_ct_mark(skb, dscp);
 	}
 
 	return &q->tins[tin];
@@ -2763,6 +2801,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
 			q->rate_flags &= ~CAKE_FLAG_FWMARK;
 	}
 
+	if (tb[TCA_CAKE_ICING]) {
+		if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
+			q->rate_flags |= CAKE_FLAG_ICING;
+		else
+			q->rate_flags &= ~CAKE_FLAG_ICING;
+	}
+
 	if (q->tins) {
 		sch_tree_lock(sch);
 		cake_reconfigure(sch);
@@ -2947,6 +2992,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
 			!!(q->rate_flags & CAKE_FLAG_FWMARK)))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, TCA_CAKE_ICING,
+			!!(q->rate_flags & CAKE_FLAG_ICING)))
+		goto nla_put_failure;
+
 	return nla_nest_end(skb, opts);
 
 nla_put_failure:
-- 
2.17.2 (Apple Git-113)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 11:55           ` Kevin Darbyshire-Bryant
@ 2019-03-04 12:44             ` Toke Høiland-Jørgensen
  2019-03-04 15:50               ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 30+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-03-04 12:44 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: Pete Heist, cake

Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:

>> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> 
>> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
>> 
>>>> On 4 Mar 2019, at 08:39, Pete Heist <pete@heistp.net> wrote:
>>>> 
>>>> 
>>>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>>>>> 
>>>>> The very bad idea:
>>>>> 
>>>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>>>> implementation as described above. So an awful lot of our
>>>>> shenanigans above is due to DSCP not traversing the internet
>>>>> particularly well. The solution above abstracts DSCP into ’tins’
>>>>> which we put into fwmarks. Another approach would be to put the DSCP
>>>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>>>> local domain preserved.
>>>> 
>>>> If I understand it right, another use case for this “very bad idea”
>>>> is preserving DSCP locally while traversing upstream WiFi links as
>>>> besteffort, which avoids airtime efficiency problems that can occur
>>>> with 802.11e (WMM). In cases where the router config can’t be changed
>>>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>>>> it hides DSCP from the WiFi stack while preserving the values through
>>>> the tunnel, but this would be easier. Neat… :)
>>> 
>>> Everyone has understood the intent & maybe the implementation
>>> correctly. 2 patches attached, one for cake, one for tc.
>>> 
>>> They are naively coded and some of it undoes Toke’s recent tidying up
>>> (sorry!)
>> 
>> Heh. First comment: Don't do that ;)
>
> I did say naively coded.
>
>> 
>> A few more below.
>> 
>>> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
>>> diff --git a/pkt_sched.h b/pkt_sched.h
>>> index a2f570c..d1f288d 100644
>>> --- a/pkt_sched.h
>>> +++ b/pkt_sched.h
>>> @@ -879,6 +879,7 @@ enum {
>>> 	TCA_CAKE_ACK_FILTER,
>>> 	TCA_CAKE_SPLIT_GSO,
>>> 	TCA_CAKE_FWMARK,
>>> +	TCA_CAKE_ICING,
>>> 	__TCA_CAKE_MAX
>>> };
>>> #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
>>> diff --git a/sch_cake.c b/sch_cake.c
>>> index 733b897..5aca0f3 100644
>>> --- a/sch_cake.c
>>> +++ b/sch_cake.c
>>> @@ -270,7 +270,8 @@ enum {
>>> 	CAKE_FLAG_INGRESS	   = BIT(2),
>>> 	CAKE_FLAG_WASH		   = BIT(3),
>>> 	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
>>> -	CAKE_FLAG_FWMARK	   = BIT(5)
>>> +	CAKE_FLAG_FWMARK	   = BIT(5),
>>> +	CAKE_FLAG_ICING		   = BIT(6)
>> 
>> This implies that icing and fwmark can be enabled completely
>> independently of each other. Are you sure about the semantics for that?
>
> No, I’m not.  I sent the patches so others could see my lunacy in action and hopefully improve it.
>
> The actual operation links FWMARK, INGRESS & ICING in a variety of
> combinations.

Right, so obviously this needs to be thought through...

>>> };
>>> 
>>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
>>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>>> };
>>> 
>>> static const u8 diffserv4[] = {
>>> -	0, 2, 0, 0, 2, 0, 0, 0,
>>> +	0, 1, 0, 0, 2, 0, 0, 0,
>>> 	1, 0, 0, 0, 0, 0, 0, 0,
>>> 	2, 0, 2, 0, 2, 0, 2, 0,
>>> 	2, 0, 2, 0, 2, 0, 2, 0,
>>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>>> };
>>> 
>>> static const u8 diffserv3[] = {
>>> -	0, 0, 0, 0, 2, 0, 0, 0,
>>> +	0, 1, 0, 0, 2, 0, 0, 0,
>> 
>> Why are you messing with the diffserv mappings in this patch?
>
> This is a combination patch of Dave’s new LE coding and the
> fwmark/dscp mangling.

Ah. Well let's keep that separate from this patch/discussion...

>> 
>>> 	1, 0, 0, 0, 0, 0, 0, 0,
>>> 	0, 0, 0, 0, 0, 0, 0, 0,
>>> 	0, 0, 0, 0, 0, 0, 0, 0,
>>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>>> 	return idx + (tin << 16);
>>> }
>>> 
>>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>>> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
>>> +{
>>> +	switch (skb->protocol) {
>>> +	case htons(ETH_P_IP):
>>> +		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>>> +			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
>>> +		break;
>>> +	case htons(ETH_P_IPV6):
>>> +		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>>> +			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
>>> +		break;
>>> +	default:
>>> +		break;
>>> +	}
>>> +
>>> +}
>> 
>> So washing is just a special case of this (wash is
>> cake_update_diffserv(skb,0)). So you shouldn't need to add another
>> function, just augment the existing handling code.
>
> Erm, that’s exactly what I’ve done.

Ah, right; I guess it's just the reverting of the cleanup patch that is
the issue, then :)

>>> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>>> {
>>> 	u8 dscp;
>>> 
>>> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>>> 	}
>>> }
>>> 
>>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>> 
>> Save an ifdef below by moving the ifdef inside the function definition.
>> 
>>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>>> +{
>>> +	enum ip_conntrack_info ctinfo;
>>> +	struct nf_conn *ct;
>>> +
>>> +	ct = nf_ct_get(skb, &ctinfo);
>>> +	if (!ct)
>>> +		return;
>>> +
>>> +	ct->mark &= 0x80ffffff;
>>> +	ct->mark |= (0x40 | dscp) << 24;
>> 
>> Right, so we *might* have an argument that putting the *tin* into the
>> fwmark is CAKE's business, but copying over the dscp mark is not
>> something a qdisc should be doing…
>
> Why ever not?  It’s not the DSCP, it’s a lookup value into the cake
> priority table, it just happens to look like the DSCP ;-)

If it quacks like a duck...

>
>>> +	nf_conntrack_event_cache(IPCT_MARK, ct);
>>> +}
>>> +#endif
>> 
>> Also, are you sure this will work in all permutations of conntrack being
>> a module vs not etc? (we had to jump through quite some hoops to get the
>> conntrack hooks to work last time; this is probably my biggest worry here).
>
> No, I have absolutely no clue here at all.

Well, another issue that needs fixing, then...

-Toke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 11:17         ` Toke Høiland-Jørgensen
@ 2019-03-04 11:55           ` Kevin Darbyshire-Bryant
  2019-03-04 12:44             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-03-04 11:55 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Pete Heist, cake



> On 4 Mar 2019, at 11:17, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> 
> Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
> 
>>> On 4 Mar 2019, at 08:39, Pete Heist <pete@heistp.net> wrote:
>>> 
>>> 
>>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>>>> 
>>>> The very bad idea:
>>>> 
>>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>>> implementation as described above. So an awful lot of our
>>>> shenanigans above is due to DSCP not traversing the internet
>>>> particularly well. The solution above abstracts DSCP into ’tins’
>>>> which we put into fwmarks. Another approach would be to put the DSCP
>>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>>> local domain preserved.
>>> 
>>> If I understand it right, another use case for this “very bad idea”
>>> is preserving DSCP locally while traversing upstream WiFi links as
>>> besteffort, which avoids airtime efficiency problems that can occur
>>> with 802.11e (WMM). In cases where the router config can’t be changed
>>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>>> it hides DSCP from the WiFi stack while preserving the values through
>>> the tunnel, but this would be easier. Neat… :)
>> 
>> Everyone has understood the intent & maybe the implementation
>> correctly. 2 patches attached, one for cake, one for tc.
>> 
>> They are naively coded and some of it undoes Toke’s recent tidying up
>> (sorry!)
> 
> Heh. First comment: Don't do that ;)

I did say naively coded.

> 
> A few more below.
> 
>> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
>> diff --git a/pkt_sched.h b/pkt_sched.h
>> index a2f570c..d1f288d 100644
>> --- a/pkt_sched.h
>> +++ b/pkt_sched.h
>> @@ -879,6 +879,7 @@ enum {
>> 	TCA_CAKE_ACK_FILTER,
>> 	TCA_CAKE_SPLIT_GSO,
>> 	TCA_CAKE_FWMARK,
>> +	TCA_CAKE_ICING,
>> 	__TCA_CAKE_MAX
>> };
>> #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
>> diff --git a/sch_cake.c b/sch_cake.c
>> index 733b897..5aca0f3 100644
>> --- a/sch_cake.c
>> +++ b/sch_cake.c
>> @@ -270,7 +270,8 @@ enum {
>> 	CAKE_FLAG_INGRESS	   = BIT(2),
>> 	CAKE_FLAG_WASH		   = BIT(3),
>> 	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
>> -	CAKE_FLAG_FWMARK	   = BIT(5)
>> +	CAKE_FLAG_FWMARK	   = BIT(5),
>> +	CAKE_FLAG_ICING		   = BIT(6)
> 
> This implies that icing and fwmark can be enabled completely
> independently of each other. Are you sure about the semantics for that?

No, I’m not.  I sent the patches so others could see my lunacy in action and hopefully improve it.

The actual operation links FWMARK, INGRESS & ICING in a variety of combinations.

>> };
>> 
>> /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
>> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>> };
>> 
>> static const u8 diffserv4[] = {
>> -	0, 2, 0, 0, 2, 0, 0, 0,
>> +	0, 1, 0, 0, 2, 0, 0, 0,
>> 	1, 0, 0, 0, 0, 0, 0, 0,
>> 	2, 0, 2, 0, 2, 0, 2, 0,
>> 	2, 0, 2, 0, 2, 0, 2, 0,
>> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>> };
>> 
>> static const u8 diffserv3[] = {
>> -	0, 0, 0, 0, 2, 0, 0, 0,
>> +	0, 1, 0, 0, 2, 0, 0, 0,
> 
> Why are you messing with the diffserv mappings in this patch?

This is a combination patch of Dave’s new LE coding and the fwmark/dscp mangling.

> 
>> 	1, 0, 0, 0, 0, 0, 0, 0,
>> 	0, 0, 0, 0, 0, 0, 0, 0,
>> 	0, 0, 0, 0, 0, 0, 0, 0,
>> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>> 	return idx + (tin << 16);
>> }
>> 
>> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
>> +{
>> +	switch (skb->protocol) {
>> +	case htons(ETH_P_IP):
>> +		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>> +			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
>> +		break;
>> +	case htons(ETH_P_IPV6):
>> +		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
>> +			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +
>> +}
> 
> So washing is just a special case of this (wash is
> cake_update_diffserv(skb,0)). So you shouldn't need to add another
> function, just augment the existing handling code.

Erm, that’s exactly what I’ve done.

> 
>> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>> {
>> 	u8 dscp;
>> 
>> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>> 	}
>> }
>> 
>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> 
> Save an ifdef below by moving the ifdef inside the function definition.
> 
>> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
>> +{
>> +	enum ip_conntrack_info ctinfo;
>> +	struct nf_conn *ct;
>> +
>> +	ct = nf_ct_get(skb, &ctinfo);
>> +	if (!ct)
>> +		return;
>> +
>> +	ct->mark &= 0x80ffffff;
>> +	ct->mark |= (0x40 | dscp) << 24;
> 
> Right, so we *might* have an argument that putting the *tin* into the
> fwmark is CAKE's business, but copying over the dscp mark is not
> something a qdisc should be doing…

Why ever not?  It’s not the DSCP, it’s a lookup value into the cake priority table, it just happens to look like the DSCP ;-)


>> +	nf_conntrack_event_cache(IPCT_MARK, ct);
>> +}
>> +#endif
> 
> Also, are you sure this will work in all permutations of conntrack being
> a module vs not etc? (we had to jump through quite some hoops to get the
> conntrack hooks to work last time; this is probably my biggest worry here).

No, I have absolutely no clue here at all.

> 
>> static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>> 					     struct sk_buff *skb)
>> {
>> 	struct cake_sched_data *q = qdisc_priv(sch);
>> -	u32 tin;
>> +	bool wash;
>> 	u8 dscp;
>> +	u8 tin;
>> 
>> -	/* Tin selection: Default to diffserv-based selection, allow overriding
>> -	 * using firewall marks or skb->priority.
>> -	 */
>> -	dscp = cake_handle_diffserv(skb,
>> -				    q->rate_flags & CAKE_FLAG_WASH);
>> +	wash = !!(q->rate_flags & CAKE_FLAG_WASH);
>> +
>> +	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) {
>> 
>> -	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
>> 		tin = 0;
>> +		if (wash)
>> +			cake_update_diffserv(skb, 0);
>> 
>> -	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>> -		 skb->mark &&
>> -		 skb->mark <= q->tin_cnt)
>> -		tin = q->tin_order[skb->mark - 1];
>> +	} else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
>> +		   TC_H_MIN(skb->priority) > 0 &&
>> +		   TC_H_MIN(skb->priority) <= q->tin_cnt) {
>> 
>> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
>> -		 TC_H_MIN(skb->priority) > 0 &&
>> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
>> 		tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
>> +		if (wash)
>> +			cake_update_diffserv(skb, 0);
>> +
>> +	} else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
>> +		   skb->mark & 0x40000000) {
>> +
>> +		dscp = skb->mark >> 24 & 0x3f;
>> +		tin = q->tin_index[dscp];
>> 
>> -	else {
>> +		if (wash)
>> +			cake_update_diffserv(skb, 0);
>> +		else if (q->rate_flags & CAKE_FLAG_ICING)
>> +			cake_update_diffserv(skb, dscp << 2);
>> +
>> +	} else { /* fallback to DSCP */
>> +		/* extract the Diffserv Precedence field, if it exists */
>> +		/* and clear DSCP bits if washing */
>> +		dscp = cake_handle_diffserv(skb, wash);
>> 		tin = q->tin_index[dscp];
> 
> As I said above, no reason to revert the cleanup commit…
> 
>> 		if (unlikely(tin >= q->tin_cnt))
>> 			tin = 0;
>> +
>> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
>> +		if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
>> +			cake_update_ct_mark(skb, dscp);
>> +#endif
> 
> See above about moving the ifdef and losing this one.

Done

> 
>> 	}
>> 
>> 	return &q->tins[tin];
>> @@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
>> 			q->rate_flags &= ~CAKE_FLAG_FWMARK;
>> 	}
>> 
>> +	if (tb[TCA_CAKE_ICING]) {
>> +		if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
>> +			q->rate_flags |= CAKE_FLAG_ICING;
>> +		else
>> +			q->rate_flags &= ~CAKE_FLAG_ICING;
>> +	}
>> +
>> 	if (q->tins) {
>> 		sch_tree_lock(sch);
>> 		cake_reconfigure(sch);
>> @@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
>> 			!!(q->rate_flags & CAKE_FLAG_FWMARK)))
>> 		goto nla_put_failure;
>> 
>> +	if (nla_put_u32(skb, TCA_CAKE_ICING,
>> +			!!(q->rate_flags & CAKE_FLAG_ICING)))
>> +		goto nla_put_failure;
>> +
>> 	return nla_nest_end(skb, opts);
>> 
>> nla_put_failure:
>> From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001
>> From: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
>> Date: Wed, 27 Feb 2019 14:46:05 +0000
>> Subject: [PATCH] cake: add fwmark & icing options
>> 
>> Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
>> ---
>> include/uapi/linux/pkt_sched.h |  2 ++
>> man/man8/tc-cake.8             | 19 ++++++++++++++++
>> tc/q_cake.c                    | 40 ++++++++++++++++++++++++++++++++++
>> 3 files changed, 61 insertions(+)
>> 
>> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
>> index 01f96352..f2b1b270 100644
>> --- a/include/uapi/linux/pkt_sched.h
>> +++ b/include/uapi/linux/pkt_sched.h
>> @@ -954,6 +954,8 @@ enum {
>> 	TCA_CAKE_INGRESS,
>> 	TCA_CAKE_ACK_FILTER,
>> 	TCA_CAKE_SPLIT_GSO,
>> +	TCA_CAKE_FWMARK,
>> +	TCA_CAKE_ICING,
>> 	__TCA_CAKE_MAX
>> };
>> #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
>> diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
>> index eda436e1..626d4525 100644
>> --- a/man/man8/tc-cake.8
>> +++ b/man/man8/tc-cake.8
>> @@ -73,6 +73,12 @@ TIME |
>> ]
>> .br
>> [
>> +.BR fwmark
>> +|
>> +.BR nofwmark*
>> +]
>> +.br
>> +[
>> .BR split-gso*
>> |
>> .BR no-split-gso
>> @@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it will be used as both source and
>> destination host.
>> 
>> 
>> +.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS
>> +
>> +In addition to TC FILTER tin classification, firewall marks may also optionally
>> +be used.  The priority order (highest to lowest) for tin selection is TC filter,
>> +firewall mark and then DSCP.
>> +.PP
>> +.B fwmark
>> +
>> +.br
>> +	Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to the
>> +maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4.
>> +Values outside the valid range are ignored and CAKE will fall back to using
>> +DSCP for tin selection.
> 
> This should document the masking and shifting.

Yes, and the ‘icing’ option however that turns out.  Again it’s a case of sharing what I have for others to test, play, improve.

> 
> 
>> 
>> .SH EXAMPLES
>> # tc qdisc delete root dev eth0
>> diff --git a/tc/q_cake.c b/tc/q_cake.c
>> index e827e3f1..fdafd3b7 100644
>> --- a/tc/q_cake.c
>> +++ b/tc/q_cake.c
>> @@ -79,6 +79,8 @@ static void explain(void)
>> "                  dual-srchost | dual-dsthost | triple-isolate* ]\n"
>> "                [ nat | nonat* ]\n"
>> "                [ wash | nowash* ]\n"
>> +"                [ icing | noicing* ]\n"
>> +"                [ fwmark | nofwmark* ]\n"
> 
> Much as I appreciate the wordplay, I'm not sure this is actually going
> to be super helpful for something just trying to make sense of the
> command output. Not sure I have a better suggestion, though...
> 
> -Toke


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 11:04         ` Toke Høiland-Jørgensen
@ 2019-03-04 11:39           ` John Sager
  0 siblings, 0 replies; 30+ messages in thread
From: John Sager @ 2019-03-04 11:39 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Kevin Darbyshire-Bryant; +Cc: cake

[-- Attachment #1: Type: text/plain, Size: 2054 bytes --]

Let action connmark continue to do that. You still need mirred anyway.

John


On 4 March 2019 11:04:39 GMT, "Toke Høiland-Jørgensen" <toke@redhat.com> wrote:
>Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:
>
>>> On 3 Mar 2019, at 12:22, John Sager <john@sager.me.uk> wrote:
>>> 
>>> If you are going to do that, I would suggest using a few of the
>upper bits
>>> of the 32-bit fwmark/connmark space available, rather than the
>lowest bits.
>>> Then that would allow to use fwmarks other purposes, and to use the
>lowest
>>> bits, as well in the future. As iptables allows a mask before
>comparison,
>>> then choose a specific mask for the bits you use both for setting
>and testing.
>>
>> That’s a good point and I’m sort of hoping upstream reject the
>current
>> submission on that basis.  I think the ‘use of fwmarks’ needs more
>> thought as to how it’s done for the future - too keen to get
>something
>> out.  Maybe it’s sufficient as is, I don’t know.
>
>https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=0b5c7efdfc6e389ec6840579fe90bdb6f42b08dc
>
>This means it'll be in 5.1; so we have until that is released (~8 weeks
>or so) to set the behaviour in stone.
>
>I do think we at least need to add masking of the mark before using it
>for tin selection; the question is just which bits to use from it.
>
>As for setting the fwmark back in conntrack, I'm not sure I agree that
>this is something CAKE should be doing. Mostly because it means even
>tighter coupling between CAKE and the conntrack subsystem. However, I
>may be convinced by a sufficiently neat implementation, and anyway this
>is definitely something that will need to wait for 5.2 for upstream.
>
>So I think the priority is to agree on semantics for masking the fwmark
>when reading, and getting that implemented in a way that is compatible
>with both other uses of marks, and with anything we else we might want
>to do down the road.
>
>-Toke

-- 
Sent from the Aether.

[-- Attachment #2: Type: text/html, Size: 2718 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04 11:01       ` Kevin Darbyshire-Bryant
@ 2019-03-04 11:17         ` Toke Høiland-Jørgensen
  2019-03-04 11:55           ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 30+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-03-04 11:17 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant, Pete Heist; +Cc: cake

Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:

>> On 4 Mar 2019, at 08:39, Pete Heist <pete@heistp.net> wrote:
>> 
>> 
>>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>>> 
>>> The very bad idea:
>>> 
>>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark
>>> implementation as described above. So an awful lot of our
>>> shenanigans above is due to DSCP not traversing the internet
>>> particularly well. The solution above abstracts DSCP into ’tins’
>>> which we put into fwmarks. Another approach would be to put the DSCP
>>> *into* the fwmark. CAKE could (optionally) copy the FWMARK contained
>>> DSCP into the diffserv field onto the actual packets. Voila DSCP
>>> traversal across ’tinternet with tin/bandwidth allocation in our
>>> local domain preserved.
>> 
>> If I understand it right, another use case for this “very bad idea”
>> is preserving DSCP locally while traversing upstream WiFi links as
>> besteffort, which avoids airtime efficiency problems that can occur
>> with 802.11e (WMM). In cases where the router config can’t be changed
>> (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as
>> it hides DSCP from the WiFi stack while preserving the values through
>> the tunnel, but this would be easier. Neat… :)
>
> Everyone has understood the intent & maybe the implementation
> correctly. 2 patches attached, one for cake, one for tc.
>
> They are naively coded and some of it undoes Toke’s recent tidying up
> (sorry!)

Heh. First comment: Don't do that ;)

A few more below.

> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
> diff --git a/pkt_sched.h b/pkt_sched.h
> index a2f570c..d1f288d 100644
> --- a/pkt_sched.h
> +++ b/pkt_sched.h
> @@ -879,6 +879,7 @@ enum {
>  	TCA_CAKE_ACK_FILTER,
>  	TCA_CAKE_SPLIT_GSO,
>  	TCA_CAKE_FWMARK,
> +	TCA_CAKE_ICING,
>  	__TCA_CAKE_MAX
>  };
>  #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
> diff --git a/sch_cake.c b/sch_cake.c
> index 733b897..5aca0f3 100644
> --- a/sch_cake.c
> +++ b/sch_cake.c
> @@ -270,7 +270,8 @@ enum {
>  	CAKE_FLAG_INGRESS	   = BIT(2),
>  	CAKE_FLAG_WASH		   = BIT(3),
>  	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
> -	CAKE_FLAG_FWMARK	   = BIT(5)
> +	CAKE_FLAG_FWMARK	   = BIT(5),
> +	CAKE_FLAG_ICING		   = BIT(6)

This implies that icing and fwmark can be enabled completely
independently of each other. Are you sure about the semantics for that?
>  };
>  
>  /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
> @@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
>  };
>  
>  static const u8 diffserv4[] = {
> -	0, 2, 0, 0, 2, 0, 0, 0,
> +	0, 1, 0, 0, 2, 0, 0, 0,
>  	1, 0, 0, 0, 0, 0, 0, 0,
>  	2, 0, 2, 0, 2, 0, 2, 0,
>  	2, 0, 2, 0, 2, 0, 2, 0,
> @@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
>  };
>  
>  static const u8 diffserv3[] = {
> -	0, 0, 0, 0, 2, 0, 0, 0,
> +	0, 1, 0, 0, 2, 0, 0, 0,

Why are you messing with the diffserv mappings in this patch?

>  	1, 0, 0, 0, 0, 0, 0, 0,
>  	0, 0, 0, 0, 0, 0, 0, 0,
>  	0, 0, 0, 0, 0, 0, 0, 0,
> @@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
>  	return idx + (tin << 16);
>  }
>  
> -static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
> +void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
> +{
> +	switch (skb->protocol) {
> +	case htons(ETH_P_IP):
> +		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
> +			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
> +		break;
> +	case htons(ETH_P_IPV6):
> +		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
> +			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +}

So washing is just a special case of this (wash is
cake_update_diffserv(skb,0)). So you shouldn't need to add another
function, just augment the existing handling code.

> +static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
>  {
>  	u8 dscp;
>  
> @@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
>  	}
>  }
>  
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)

Save an ifdef below by moving the ifdef inside the function definition.

> +void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
> +{
> +	enum ip_conntrack_info ctinfo;
> +	struct nf_conn *ct;
> +
> +	ct = nf_ct_get(skb, &ctinfo);
> +	if (!ct)
> +		return;
> +
> +	ct->mark &= 0x80ffffff;
> +	ct->mark |= (0x40 | dscp) << 24;

Right, so we *might* have an argument that putting the *tin* into the
fwmark is CAKE's business, but copying over the dscp mark is not
something a qdisc should be doing...

> +	nf_conntrack_event_cache(IPCT_MARK, ct);
> +}
> +#endif

Also, are you sure this will work in all permutations of conntrack being
a module vs not etc? (we had to jump through quite some hoops to get the
conntrack hooks to work last time; this is probably my biggest worry here).

>  static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
>  					     struct sk_buff *skb)
>  {
>  	struct cake_sched_data *q = qdisc_priv(sch);
> -	u32 tin;
> +	bool wash;
>  	u8 dscp;
> +	u8 tin;
>  
> -	/* Tin selection: Default to diffserv-based selection, allow overriding
> -	 * using firewall marks or skb->priority.
> -	 */
> -	dscp = cake_handle_diffserv(skb,
> -				    q->rate_flags & CAKE_FLAG_WASH);
> +	wash = !!(q->rate_flags & CAKE_FLAG_WASH);
> +
> +	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) {
>  
> -	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
>  		tin = 0;
> +		if (wash)
> +			cake_update_diffserv(skb, 0);
>  
> -	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
> -		 skb->mark &&
> -		 skb->mark <= q->tin_cnt)
> -		tin = q->tin_order[skb->mark - 1];
> +	} else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
> +		   TC_H_MIN(skb->priority) > 0 &&
> +		   TC_H_MIN(skb->priority) <= q->tin_cnt) {
>  
> -	else if (TC_H_MAJ(skb->priority) == sch->handle &&
> -		 TC_H_MIN(skb->priority) > 0 &&
> -		 TC_H_MIN(skb->priority) <= q->tin_cnt)
>  		tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
> +		if (wash)
> +			cake_update_diffserv(skb, 0);
> +
> +	} else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
> +		   skb->mark & 0x40000000) {
> +
> +		dscp = skb->mark >> 24 & 0x3f;
> +		tin = q->tin_index[dscp];
>  
> -	else {
> +		if (wash)
> +			cake_update_diffserv(skb, 0);
> +		else if (q->rate_flags & CAKE_FLAG_ICING)
> +			cake_update_diffserv(skb, dscp << 2);
> +
> +	} else { /* fallback to DSCP */
> +		/* extract the Diffserv Precedence field, if it exists */
> +		/* and clear DSCP bits if washing */
> +		dscp = cake_handle_diffserv(skb, wash);
>  		tin = q->tin_index[dscp];

As I said above, no reason to revert the cleanup commit...

>  		if (unlikely(tin >= q->tin_cnt))
>  			tin = 0;
> +
> +#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
> +		if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
> +			cake_update_ct_mark(skb, dscp);
> +#endif

See above about moving the ifdef and losing this one.

>  	}
>  
>  	return &q->tins[tin];
> @@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
>  			q->rate_flags &= ~CAKE_FLAG_FWMARK;
>  	}
>  
> +	if (tb[TCA_CAKE_ICING]) {
> +		if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
> +			q->rate_flags |= CAKE_FLAG_ICING;
> +		else
> +			q->rate_flags &= ~CAKE_FLAG_ICING;
> +	}
> +
>  	if (q->tins) {
>  		sch_tree_lock(sch);
>  		cake_reconfigure(sch);
> @@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
>  			!!(q->rate_flags & CAKE_FLAG_FWMARK)))
>  		goto nla_put_failure;
>  
> +	if (nla_put_u32(skb, TCA_CAKE_ICING,
> +			!!(q->rate_flags & CAKE_FLAG_ICING)))
> +		goto nla_put_failure;
> +
>  	return nla_nest_end(skb, opts);
>  
>  nla_put_failure:
> From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001
> From: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
> Date: Wed, 27 Feb 2019 14:46:05 +0000
> Subject: [PATCH] cake: add fwmark & icing options
>
> Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
> ---
>  include/uapi/linux/pkt_sched.h |  2 ++
>  man/man8/tc-cake.8             | 19 ++++++++++++++++
>  tc/q_cake.c                    | 40 ++++++++++++++++++++++++++++++++++
>  3 files changed, 61 insertions(+)
>
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 01f96352..f2b1b270 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -954,6 +954,8 @@ enum {
>  	TCA_CAKE_INGRESS,
>  	TCA_CAKE_ACK_FILTER,
>  	TCA_CAKE_SPLIT_GSO,
> +	TCA_CAKE_FWMARK,
> +	TCA_CAKE_ICING,
>  	__TCA_CAKE_MAX
>  };
>  #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
> diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
> index eda436e1..626d4525 100644
> --- a/man/man8/tc-cake.8
> +++ b/man/man8/tc-cake.8
> @@ -73,6 +73,12 @@ TIME |
>  ]
>  .br
>  [
> +.BR fwmark
> +|
> +.BR nofwmark*
> +]
> +.br
> +[
>  .BR split-gso*
>  |
>  .BR no-split-gso
> @@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it will be used as both source and
>  destination host.
>  
>  
> +.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS
> +
> +In addition to TC FILTER tin classification, firewall marks may also optionally
> +be used.  The priority order (highest to lowest) for tin selection is TC filter,
> +firewall mark and then DSCP.
> +.PP
> +.B fwmark
> +
> +.br
> +	Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to the
> +maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4.
> +Values outside the valid range are ignored and CAKE will fall back to using
> +DSCP for tin selection.

This should document the masking and shifting.


>  
>  .SH EXAMPLES
>  # tc qdisc delete root dev eth0
> diff --git a/tc/q_cake.c b/tc/q_cake.c
> index e827e3f1..fdafd3b7 100644
> --- a/tc/q_cake.c
> +++ b/tc/q_cake.c
> @@ -79,6 +79,8 @@ static void explain(void)
>  "                  dual-srchost | dual-dsthost | triple-isolate* ]\n"
>  "                [ nat | nonat* ]\n"
>  "                [ wash | nowash* ]\n"
> +"                [ icing | noicing* ]\n"
> +"                [ fwmark | nofwmark* ]\n"

Much as I appreciate the wordplay, I'm not sure this is actually going
to be super helpful for something just trying to make sense of the
command output. Not sure I have a better suggestion, though...

-Toke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-03 16:25       ` Kevin Darbyshire-Bryant
@ 2019-03-04 11:04         ` Toke Høiland-Jørgensen
  2019-03-04 11:39           ` John Sager
  0 siblings, 1 reply; 30+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-03-04 11:04 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant, John Sager; +Cc: cake

Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> writes:

>> On 3 Mar 2019, at 12:22, John Sager <john@sager.me.uk> wrote:
>> 
>> If you are going to do that, I would suggest using a few of the upper bits
>> of the 32-bit fwmark/connmark space available, rather than the lowest bits.
>> Then that would allow to use fwmarks other purposes, and to use the lowest
>> bits, as well in the future. As iptables allows a mask before comparison,
>> then choose a specific mask for the bits you use both for setting and testing.
>
> That’s a good point and I’m sort of hoping upstream reject the current
> submission on that basis.  I think the ‘use of fwmarks’ needs more
> thought as to how it’s done for the future - too keen to get something
> out.  Maybe it’s sufficient as is, I don’t know.

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=0b5c7efdfc6e389ec6840579fe90bdb6f42b08dc

This means it'll be in 5.1; so we have until that is released (~8 weeks
or so) to set the behaviour in stone.

I do think we at least need to add masking of the mark before using it
for tin selection; the question is just which bits to use from it.

As for setting the fwmark back in conntrack, I'm not sure I agree that
this is something CAKE should be doing. Mostly because it means even
tighter coupling between CAKE and the conntrack subsystem. However, I
may be convinced by a sufficiently neat implementation, and anyway this
is definitely something that will need to wait for 5.2 for upstream.

So I think the priority is to agree on semantics for masking the fwmark
when reading, and getting that implemented in a way that is compatible
with both other uses of marks, and with anything we else we might want
to do down the road.

-Toke

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04  8:39     ` Pete Heist
@ 2019-03-04 11:01       ` Kevin Darbyshire-Bryant
  2019-03-04 11:17         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-03-04 11:01 UTC (permalink / raw)
  To: Pete Heist; +Cc: cake

[-- Attachment #1: Type: text/plain, Size: 1816 bytes --]



> On 4 Mar 2019, at 08:39, Pete Heist <pete@heistp.net> wrote:
> 
> 
>> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>> 
>> The very bad idea:
>> 
>> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark implementation as described above.  So an awful lot of our shenanigans above is due to DSCP not traversing the internet particularly well.  The solution above abstracts DSCP into ’tins’ which we put into fwmarks.  Another approach would be to put the DSCP *into* the fwmark.  CAKE could (optionally) copy the FWMARK contained DSCP into the diffserv field onto the actual packets.  Voila DSCP traversal across ’tinternet with tin/bandwidth allocation in our local domain preserved.
> 
> If I understand it right, another use case for this “very bad idea” is preserving DSCP locally while traversing upstream WiFi links as besteffort, which avoids airtime efficiency problems that can occur with 802.11e (WMM). In cases where the router config can’t be changed (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as it hides DSCP from the WiFi stack while preserving the values through the tunnel, but this would be easier. Neat… :)

Everyone has understood the intent & maybe the implementation correctly.  2 patches attached, one for cake, one for tc.

They are naively coded and some of it undoes Toke’s recent tidying up (sorry!)

The ingress path is dependent upon tc actions having restored the fwmark.  I cannot for the life of me work out how that actually happens (ie. the code that does it) to see if that’s all it does, or if cake could do it itself and so on.

Anyway, the code warts and all….



Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A

[-- Attachment #2: cake-add-dscp-copy-and-icing.patch --]
[-- Type: application/octet-stream, Size: 5022 bytes --]

diff --git a/pkt_sched.h b/pkt_sched.h
index a2f570c..d1f288d 100644
--- a/pkt_sched.h
+++ b/pkt_sched.h
@@ -879,6 +879,7 @@ enum {
 	TCA_CAKE_ACK_FILTER,
 	TCA_CAKE_SPLIT_GSO,
 	TCA_CAKE_FWMARK,
+	TCA_CAKE_ICING,
 	__TCA_CAKE_MAX
 };
 #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
diff --git a/sch_cake.c b/sch_cake.c
index 733b897..5aca0f3 100644
--- a/sch_cake.c
+++ b/sch_cake.c
@@ -270,7 +270,8 @@ enum {
 	CAKE_FLAG_INGRESS	   = BIT(2),
 	CAKE_FLAG_WASH		   = BIT(3),
 	CAKE_FLAG_SPLIT_GSO	   = BIT(4),
-	CAKE_FLAG_FWMARK	   = BIT(5)
+	CAKE_FLAG_FWMARK	   = BIT(5),
+	CAKE_FLAG_ICING		   = BIT(6)
 };
 
 /* COBALT operates the Codel and BLUE algorithms in parallel, in order to
@@ -333,7 +334,7 @@ static const u8 diffserv8[] = {
 };
 
 static const u8 diffserv4[] = {
-	0, 2, 0, 0, 2, 0, 0, 0,
+	0, 1, 0, 0, 2, 0, 0, 0,
 	1, 0, 0, 0, 0, 0, 0, 0,
 	2, 0, 2, 0, 2, 0, 2, 0,
 	2, 0, 2, 0, 2, 0, 2, 0,
@@ -344,7 +345,7 @@ static const u8 diffserv4[] = {
 };
 
 static const u8 diffserv3[] = {
-	0, 0, 0, 0, 2, 0, 0, 0,
+	0, 1, 0, 0, 2, 0, 0, 0,
 	1, 0, 0, 0, 0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
 	0, 0, 0, 0, 0, 0, 0, 0,
@@ -1618,7 +1619,24 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
 	return idx + (tin << 16);
 }
 
-static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
+void cake_update_diffserv(struct sk_buff *skb, u8 dscp)
+{
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		if ((ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK) != dscp)
+			ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, dscp);
+		break;
+	case htons(ETH_P_IPV6):
+		if ((ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK) != dscp)
+			ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, dscp);
+		break;
+	default:
+		break;
+	}
+
+}
+
+static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
 {
 	u8 dscp;
 
@@ -1644,37 +1662,70 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
 	}
 }
 
+#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
+void cake_update_ct_mark(struct sk_buff *skb, u8 dscp)
+{
+	enum ip_conntrack_info ctinfo;
+	struct nf_conn *ct;
+
+	ct = nf_ct_get(skb, &ctinfo);
+	if (!ct)
+		return;
+
+	ct->mark &= 0x80ffffff;
+	ct->mark |= (0x40 | dscp) << 24;
+	nf_conntrack_event_cache(IPCT_MARK, ct);
+}
+#endif
+
 static struct cake_tin_data *cake_select_tin(struct Qdisc *sch,
 					     struct sk_buff *skb)
 {
 	struct cake_sched_data *q = qdisc_priv(sch);
-	u32 tin;
+	bool wash;
 	u8 dscp;
+	u8 tin;
 
-	/* Tin selection: Default to diffserv-based selection, allow overriding
-	 * using firewall marks or skb->priority.
-	 */
-	dscp = cake_handle_diffserv(skb,
-				    q->rate_flags & CAKE_FLAG_WASH);
+	wash = !!(q->rate_flags & CAKE_FLAG_WASH);
+
+	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT) {
 
-	if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
 		tin = 0;
+		if (wash)
+			cake_update_diffserv(skb, 0);
 
-	else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
-		 skb->mark &&
-		 skb->mark <= q->tin_cnt)
-		tin = q->tin_order[skb->mark - 1];
+	} else if (TC_H_MAJ(skb->priority) == sch->handle && /* use priority */
+		   TC_H_MIN(skb->priority) > 0 &&
+		   TC_H_MIN(skb->priority) <= q->tin_cnt) {
 
-	else if (TC_H_MAJ(skb->priority) == sch->handle &&
-		 TC_H_MIN(skb->priority) > 0 &&
-		 TC_H_MIN(skb->priority) <= q->tin_cnt)
 		tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
+		if (wash)
+			cake_update_diffserv(skb, 0);
+
+	} else if (q->rate_flags & CAKE_FLAG_FWMARK && /* use fw mark */
+		   skb->mark & 0x40000000) {
+
+		dscp = skb->mark >> 24 & 0x3f;
+		tin = q->tin_index[dscp];
 
-	else {
+		if (wash)
+			cake_update_diffserv(skb, 0);
+		else if (q->rate_flags & CAKE_FLAG_ICING)
+			cake_update_diffserv(skb, dscp << 2);
+
+	} else { /* fallback to DSCP */
+		/* extract the Diffserv Precedence field, if it exists */
+		/* and clear DSCP bits if washing */
+		dscp = cake_handle_diffserv(skb, wash);
 		tin = q->tin_index[dscp];
 
 		if (unlikely(tin >= q->tin_cnt))
 			tin = 0;
+
+#if IS_REACHABLE(CONFIG_NF_CONNTRACK)
+		if (q->rate_flags & CAKE_FLAG_FWMARK && !(q->rate_flags & CAKE_FLAG_INGRESS))
+			cake_update_ct_mark(skb, dscp);
+#endif
 	}
 
 	return &q->tins[tin];
@@ -2763,6 +2814,13 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
 			q->rate_flags &= ~CAKE_FLAG_FWMARK;
 	}
 
+	if (tb[TCA_CAKE_ICING]) {
+		if (!!nla_get_u32(tb[TCA_CAKE_ICING]))
+			q->rate_flags |= CAKE_FLAG_ICING;
+		else
+			q->rate_flags &= ~CAKE_FLAG_ICING;
+	}
+
 	if (q->tins) {
 		sch_tree_lock(sch);
 		cake_reconfigure(sch);
@@ -2947,6 +3005,10 @@ static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
 			!!(q->rate_flags & CAKE_FLAG_FWMARK)))
 		goto nla_put_failure;
 
+	if (nla_put_u32(skb, TCA_CAKE_ICING,
+			!!(q->rate_flags & CAKE_FLAG_ICING)))
+		goto nla_put_failure;
+
 	return nla_nest_end(skb, opts);
 
 nla_put_failure:

[-- Attachment #3: my_layer_cake.qos --]
[-- Type: application/octet-stream, Size: 4722 bytes --]

#!/bin/sh
# Cero3 Shaper
# A cake shaper and AQM solution that allows several diffserv marking schemes
# for ethernet gateways

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
#
#       Copyright (C) 2012-5 Michael D. Taht, Toke Høiland-Jørgensen, Sebastian Moeller


#sm: TODO pass in the cake diffserv keyword

. ${SQM_LIB_DIR}/defaults.sh
QDISC=cake

# Default traffic classication is passed in INGRESS_CAKE_OPTS and EGRESS_CAKE_OPTS, defined in defaults.sh now

egress() {
    SILENT=1 $TC qdisc del dev $IFACE root
    $TC qdisc add dev $IFACE root handle cacf: $( get_stab_string ) cake \
        bandwidth ${UPLINK}kbit $( get_cake_lla_string ) ${EGRESS_CAKE_OPTS} ${EQDISC_OPTS}

}


ingress() {

    SILENT=1 $TC qdisc del dev $IFACE handle ffff: ingress
    $TC qdisc add dev $IFACE handle ffff: ingress

    SILENT=1 $TC qdisc del dev $DEV root

    [ "$IGNORE_DSCP_INGRESS" -eq "1" ] && INGRESS_CAKE_OPTS="$INGRESS_CAKE_OPTS besteffort"
    [ "$ZERO_DSCP_INGRESS" -eq "1" ] && INGRESS_CAKE_OPTS="$INGRESS_CAKE_OPTS wash"

    $TC qdisc add dev $DEV root handle cace: $( get_stab_string ) cake \
        bandwidth ${DOWNLINK}kbit $( get_cake_lla_string ) ${INGRESS_CAKE_OPTS} ${IQDISC_OPTS}

    $IP link set dev $DEV up

    # redirect all IP packets arriving in $IFACE to ifb0
    # and restore connmark this may be used by cake+
    $TC filter add dev $IFACE parent ffff: protocol all prio 10 u32 \
	match u32 0 0 flowid 1:1 action connmark action mirred egress redirect dev $DEV

    # Configure iptables chain to mark packets
    ipt -t mangle -N QOS_MARK_${IFACE}

    # Change DSCP of relevant hosts/packets - this will be picked up by cake+ and placed in the firewall connmark
    # also the DSCP is used as the tin selector.

iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.5/255.255.255.255 -m comment --comment "Skybox DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.5/255.255.255.255 -m comment --comment "Skybox DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.10/255.255.255.255 -m comment --comment "Bluray DSCP CS2 Video" -j DSCP --set-dscp-class CS2
iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.10/255.255.255.255 -m comment --comment "Bluray DSCP CS2 Video" -j DSCP --set-dscp-class CS2
iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12/255.255.255.255 -m tcp --sport 6981 -m comment --comment "Bittorrent DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
iptables -t mangle -A QOS_MARK_${IFACE} -p udp -s 192.168.219.12/255.255.255.255 -m udp --sport 6981 -m comment --comment "Bittorrent DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
iptables -t mangle -A QOS_MARK_${IFACE} -p tcp -s 192.168.219.12/255.255.255.255 -m tcp --dport 443 -m comment --comment "HTTPS uploads DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1

ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --sport 6981 -m comment --comment "Bittorrent DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
ip6tables -t mangle -A QOS_MARK_${IFACE} -p udp -s ::c/::ffff:ffff:ffff:ffff -m udp --sport 6981 -m comment --comment "Bittorrent DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1
ip6tables -t mangle -A QOS_MARK_${IFACE} -p tcp -s ::c/::ffff:ffff:ffff:ffff -m tcp --dport 443 -m comment --comment "HTTPS uploads DSCP CS1 Bulk" -j DSCP --set-dscp-class CS1

    # Send cake+ unmarked connections to the marking chain - Cake+ uses 7 bits of the top byte as the
    # i've been marked & here's the dscp placeholder. Topmost bit is left spare as is likely to be interpreted
    # as a sign bit.
    ipt -t mangle -A PREROUTING  -i $IFACE -m mark --mark 0x00/0x40000000 -g QOS_MARK_${IFACE}
    ipt -t mangle -A POSTROUTING -o $IFACE -m mark --mark 0x00/0x40000000 -g QOS_MARK_${IFACE}

}

sqm_start() {
    [ -n "$IFACE" ] || return 1
    do_modules
    verify_qdisc $QDISC "cake" || return 1
    sqm_debug "Starting ${SCRIPT}"

    [ -z "$DEV" ] && DEV=$( get_ifb_for_if ${IFACE} )

    if [ "${UPLINK}" -ne 0 ];
    then
        egress
        sqm_debug "egress shaping activated"
    else
        sqm_debug "egress shaping deactivated"
        SILENT=1 $TC qdisc del dev ${IFACE} root
    fi
    if [ "${DOWNLINK}" -ne 0 ];
    then
	verify_qdisc ingress "ingress" || return 1
        ingress
        sqm_debug "ingress shaping activated"
    else
        sqm_debug "ingress shaping deactivated"
        SILENT=1 $TC qdisc del dev ${DEV} root
        SILENT=1 $TC qdisc del dev ${IFACE} ingress
    fi

    return 0
}

[-- Attachment #4: tc-cake-add-fwmark-icing-options.patch --]
[-- Type: application/octet-stream, Size: 5123 bytes --]

From 00e93b0dbbde10acfc8bc0a3787ca4d693f0ccc9 Mon Sep 17 00:00:00 2001
From: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Date: Wed, 27 Feb 2019 14:46:05 +0000
Subject: [PATCH] cake: add fwmark & icing options

Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
---
 include/uapi/linux/pkt_sched.h |  2 ++
 man/man8/tc-cake.8             | 19 ++++++++++++++++
 tc/q_cake.c                    | 40 ++++++++++++++++++++++++++++++++++
 3 files changed, 61 insertions(+)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 01f96352..f2b1b270 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -954,6 +954,8 @@ enum {
 	TCA_CAKE_INGRESS,
 	TCA_CAKE_ACK_FILTER,
 	TCA_CAKE_SPLIT_GSO,
+	TCA_CAKE_FWMARK,
+	TCA_CAKE_ICING,
 	__TCA_CAKE_MAX
 };
 #define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
diff --git a/man/man8/tc-cake.8 b/man/man8/tc-cake.8
index eda436e1..626d4525 100644
--- a/man/man8/tc-cake.8
+++ b/man/man8/tc-cake.8
@@ -73,6 +73,12 @@ TIME |
 ]
 .br
 [
+.BR fwmark
+|
+.BR nofwmark*
+]
+.br
+[
 .BR split-gso*
 |
 .BR no-split-gso
@@ -623,6 +629,19 @@ override mechanism; if a host ID is assigned, it will be used as both source and
 destination host.
 
 
+.SH OVERRIDING CLASSIFICATION WITH NETFILTER CONNMARKS
+
+In addition to TC FILTER tin classification, firewall marks may also optionally
+be used.  The priority order (highest to lowest) for tin selection is TC filter,
+firewall mark and then DSCP.
+.PP
+.B fwmark
+
+.br
+	Enables CONNMARK based tin selection. Valid CONNMARKS range from 1 to the
+maximum number of tins i.e. 3 tins for diffserv3, 4 tins for diffserv4.
+Values outside the valid range are ignored and CAKE will fall back to using
+DSCP for tin selection.
 
 .SH EXAMPLES
 # tc qdisc delete root dev eth0
diff --git a/tc/q_cake.c b/tc/q_cake.c
index e827e3f1..fdafd3b7 100644
--- a/tc/q_cake.c
+++ b/tc/q_cake.c
@@ -79,6 +79,8 @@ static void explain(void)
 "                  dual-srchost | dual-dsthost | triple-isolate* ]\n"
 "                [ nat | nonat* ]\n"
 "                [ wash | nowash* ]\n"
+"                [ icing | noicing* ]\n"
+"                [ fwmark | nofwmark* ]\n"
 "                [ split-gso* | no-split-gso ]\n"
 "                [ ack-filter | ack-filter-aggressive | no-ack-filter* ]\n"
 "                [ memlimit LIMIT ]\n"
@@ -106,6 +108,8 @@ static int cake_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 	int autorate = -1;
 	int ingress = -1;
 	int overhead = 0;
+	int fwmark = -1;
+	int icing = -1;
 	int wash = -1;
 	int nat = -1;
 	int atm = -1;
@@ -161,6 +165,14 @@ static int cake_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 			split_gso = 1;
 		} else if (strcmp(*argv, "no-split-gso") == 0) {
 			split_gso = 0;
+		} else if (strcmp(*argv, "fwmark") == 0) {
+			fwmark = 1;
+		} else if (strcmp(*argv, "nofwmark") == 0) {
+			fwmark = 0;
+		} else if (strcmp(*argv, "icing") == 0) {
+			icing = 1;
+		} else if (strcmp(*argv, "noicing") == 0) {
+			icing = 0;
 		} else if (strcmp(*argv, "flowblind") == 0) {
 			flowmode = CAKE_FLOW_NONE;
 		} else if (strcmp(*argv, "srchost") == 0) {
@@ -383,6 +395,12 @@ static int cake_parse_opt(struct qdisc_util *qu, int argc, char **argv,
 	if (split_gso != -1)
 		addattr_l(n, 1024, TCA_CAKE_SPLIT_GSO, &split_gso,
 			  sizeof(split_gso));
+	if (fwmark != -1)
+		addattr_l(n, 1024, TCA_CAKE_FWMARK, &fwmark,
+			  sizeof(fwmark));
+	if (icing != -1)
+		addattr_l(n, 1024, TCA_CAKE_ICING, &icing,
+			  sizeof(icing));
 	if (ingress != -1)
 		addattr_l(n, 1024, TCA_CAKE_INGRESS, &ingress, sizeof(ingress));
 	if (ack_filter != -1)
@@ -415,6 +433,8 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 	int overhead = 0;
 	int autorate = 0;
 	int ingress = 0;
+	int fwmark = 0;
+	int icing = 0;
 	int wash = 0;
 	int raw = 0;
 	int mpu = 0;
@@ -500,6 +520,14 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 	    RTA_PAYLOAD(tb[TCA_CAKE_SPLIT_GSO]) >= sizeof(__u32)) {
 		split_gso = rta_getattr_u32(tb[TCA_CAKE_SPLIT_GSO]);
 	}
+	if (tb[TCA_CAKE_FWMARK] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_FWMARK]) >= sizeof(__u32)) {
+		fwmark = rta_getattr_u32(tb[TCA_CAKE_FWMARK]);
+	}
+	if (tb[TCA_CAKE_ICING] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_ICING]) >= sizeof(__u32)) {
+		icing = rta_getattr_u32(tb[TCA_CAKE_ICING]);
+	}
 	if (tb[TCA_CAKE_RAW]) {
 		raw = 1;
 	}
@@ -532,6 +560,18 @@ static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
 		print_string(PRINT_FP, NULL, "no-split-gso ", NULL);
 	print_bool(PRINT_JSON, "split_gso", NULL, split_gso);
 
+	if (fwmark)
+		print_string(PRINT_FP, NULL, "fwmark ", NULL);
+	else
+		print_string(PRINT_FP, NULL, "nofwmark ", NULL);
+	print_bool(PRINT_JSON, "fwmark", NULL, fwmark);
+
+	if (icing)
+		print_string(PRINT_FP, NULL, "icing ", NULL);
+	else
+		print_string(PRINT_FP, NULL, "noicing ", NULL);
+	print_bool(PRINT_JSON, "icing", NULL, icing);
+
 	if (interval)
 		print_string(PRINT_FP, NULL, "rtt %s ",
 			     sprint_time(interval, b2));
-- 
2.17.2 (Apple Git-113)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-03 11:52   ` Kevin Darbyshire-Bryant
  2019-03-03 12:22     ` John Sager
  2019-03-04  5:37     ` Ryan Mounce
@ 2019-03-04  8:39     ` Pete Heist
  2019-03-04 11:01       ` Kevin Darbyshire-Bryant
  2 siblings, 1 reply; 30+ messages in thread
From: Pete Heist @ 2019-03-04  8:39 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake


> On Mar 3, 2019, at 12:52 PM, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> The very bad idea:
> 
> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark implementation as described above.  So an awful lot of our shenanigans above is due to DSCP not traversing the internet particularly well.  The solution above abstracts DSCP into ’tins’ which we put into fwmarks.  Another approach would be to put the DSCP *into* the fwmark.  CAKE could (optionally) copy the FWMARK contained DSCP into the diffserv field onto the actual packets.  Voila DSCP traversal across ’tinternet with tin/bandwidth allocation in our local domain preserved.

If I understand it right, another use case for this “very bad idea” is preserving DSCP locally while traversing upstream WiFi links as besteffort, which avoids airtime efficiency problems that can occur with 802.11e (WMM). In cases where the router config can’t be changed (802.11e is mandatory after all) I’ve used IPIP tunnels for this, as it hides DSCP from the WiFi stack while preserving the values through the tunnel, but this would be easier. Neat… :)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04  6:37         ` Ryan Mounce
@ 2019-03-04  7:15           ` Dave Taht
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Taht @ 2019-03-04  7:15 UTC (permalink / raw)
  To: Ryan Mounce; +Cc: Jonathan Morton, cake, Felix Resch

On Sun, Mar 3, 2019 at 10:37 PM Ryan Mounce <ryan@mounce.com.au> wrote:
>
> On Mon, 4 Mar 2019 at 17:01, Jonathan Morton <chromatix99@gmail.com> wrote:
> > …icing?
>
> Perfect! And to me, this functionality truly is the icing on (the)
> cake that makes it the complete bufferbloat/QoS system I've been
> dreaming of for ingress.
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake

Goforit. And while you are at it please patch in support for the new
LE bit and see if that works better than CS1?

It's in IESG now, it should have a number in 2-3 months.

https://datatracker.ietf.org/doc/draft-ietf-tsvwg-le-phb/


-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04  6:31       ` Jonathan Morton
@ 2019-03-04  6:37         ` Ryan Mounce
  2019-03-04  7:15           ` Dave Taht
  0 siblings, 1 reply; 30+ messages in thread
From: Ryan Mounce @ 2019-03-04  6:37 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Kevin Darbyshire-Bryant, cake, Felix Resch

On Mon, 4 Mar 2019 at 17:01, Jonathan Morton <chromatix99@gmail.com> wrote:
> …icing?

Perfect! And to me, this functionality truly is the icing on (the)
cake that makes it the complete bufferbloat/QoS system I've been
dreaming of for ingress.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-04  5:37     ` Ryan Mounce
@ 2019-03-04  6:31       ` Jonathan Morton
  2019-03-04  6:37         ` Ryan Mounce
  0 siblings, 1 reply; 30+ messages in thread
From: Jonathan Morton @ 2019-03-04  6:31 UTC (permalink / raw)
  To: Ryan Mounce; +Cc: Kevin Darbyshire-Bryant, cake, Felix Resch

> On 4 Mar, 2019, at 7:37 am, Ryan Mounce <ryan@mounce.com.au> wrote:
> 
> The overwriting of the DSCP bits from fwmark could be called
> 'staining', as opposed to DSCP 'bleaching'. But this doesn't sound
> very appealing when we're baking delicious CAKE - we're in the kitchen
> not the laundry! So how about dyeing, or colouring?

…icing?

https://www.youtube.com/watch?v=ouRcDRpsRsA

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-03 11:52   ` Kevin Darbyshire-Bryant
  2019-03-03 12:22     ` John Sager
@ 2019-03-04  5:37     ` Ryan Mounce
  2019-03-04  6:31       ` Jonathan Morton
  2019-03-04  8:39     ` Pete Heist
  2 siblings, 1 reply; 30+ messages in thread
From: Ryan Mounce @ 2019-03-04  5:37 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: George Amanakis, cake, Felix Resch

On Sun, 3 Mar 2019 at 22:22, Kevin Darbyshire-Bryant
<kevin@darbyshire-bryant.me.uk> wrote:
>
> Be afraid, be very afraid.
>
> I’ve woken up with two ideas in my head, one is bad, the other is very bad.  The bad one is already implemented and lurking in the mine branch of my cake git tree:
>
> The bad idea:
>
> An extension of the ‘fwmark’ tin allocation idea is to get cake to automagically update the conntrack mark based on the DSCP tin allocation chosen on egress.  That way, well behaved applications using DSCP (e.g. dropbear) get their return path packets similarly classified on ingress.  Badly behaved applications can have iptables rules put in place to ‘manually’ add fwmarks as is already done.
>
>
> The very bad idea:
>
> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark implementation as described above.  So an awful lot of our shenanigans above is due to DSCP not traversing the internet particularly well.  The solution above abstracts DSCP into ’tins’ which we put into fwmarks.  Another approach would be to put the DSCP *into* the fwmark.  CAKE could (optionally) copy the FWMARK contained DSCP into the diffserv field onto the actual packets.  Voila DSCP traversal across ’tinternet with tin/bandwidth allocation in our local domain preserved.

Ooh now you've got my attention - equal parts horrifying and
brilliant. Just making sure I have my head around this, I think it
would be perfect for my use case.

My ISP requires traffic to be washed on egress, and they wash it on
ingress. I set the DSCP in some applications and mangle some others
with iptables on the router. Either way, DSCP is present by the time
packets reach cake on egress, which dutifully uses DSCP to sort into
tins before washing. This works great, albeit with each packet for
certain flows being inefficiently mangled.

So the first change is that cake will copy the DSCP bits of a packet
into the fwmark on egress - if the fwmark hasn't already been set by
cake. This allows the impact of my iptables mangle rules to be
minimised as only packets without a cake-populated fwmark need their
DSCP to be mangled. Cake then uses the fwmark to sort packets into
tins, and in my case the DSCP is washed on egress.

Then on ingress, cake uses the same fwmark to sort packets into tins.
And cake can copy the DSCP bits back from the fwmark to packets on
ingress. Magic!

The overwriting of the DSCP bits from fwmark could be called
'staining', as opposed to DSCP 'bleaching'. But this doesn't sound
very appealing when we're baking delicious CAKE - we're in the kitchen
not the laundry! So how about dyeing, or colouring?

-Ryan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-03 12:22     ` John Sager
@ 2019-03-03 16:25       ` Kevin Darbyshire-Bryant
  2019-03-04 11:04         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-03-03 16:25 UTC (permalink / raw)
  To: John Sager; +Cc: cake



> On 3 Mar 2019, at 12:22, John Sager <john@sager.me.uk> wrote:
> 
> If you are going to do that, I would suggest using a few of the upper bits
> of the 32-bit fwmark/connmark space available, rather than the lowest bits.
> Then that would allow to use fwmarks other purposes, and to use the lowest
> bits, as well in the future. As iptables allows a mask before comparison,
> then choose a specific mask for the bits you use both for setting and testing.

That’s a good point and I’m sort of hoping upstream reject the current submission on that basis.  I think the ‘use of fwmarks’ needs more thought as to how it’s done for the future - too keen to get something out.  Maybe it’s sufficient as is, I don’t know.

What I do know is that after implementing the ‘bad idea’ I subsequently implemented the very bad idea!  Using the top byte of mark, bits 5-0 are the DSCP, bit 6 is the ‘Cake set this’ flag, and bit 7 is left alone as it tends to get mis-interpreted as a sign bit.

I implemented the iptables marking chains as ‘check if cake fwmark bit set, go to a DSCP mangle chain if not’, mangle the DSCP, picked up by CAKE which (in egress mode) copies the DSCP to the fwmark’ and set the ‘cake did this’ bit. In theory connections only go through the dscp mangle once.

Cake in ingress mode will copy the fwmark if it’s been set by cake back into the diffserv field on each packet.  I think that last step should be made optional but I’ve had enough ‘fun’ for the day.

https://github.com/ldir-EDB0/sch_cake/commits/mine



Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-03-03 11:52   ` Kevin Darbyshire-Bryant
@ 2019-03-03 12:22     ` John Sager
  2019-03-03 16:25       ` Kevin Darbyshire-Bryant
  2019-03-04  5:37     ` Ryan Mounce
  2019-03-04  8:39     ` Pete Heist
  2 siblings, 1 reply; 30+ messages in thread
From: John Sager @ 2019-03-03 12:22 UTC (permalink / raw)
  To: cake

If you are going to do that, I would suggest using a few of the upper bits
of the 32-bit fwmark/connmark space available, rather than the lowest bits.
Then that would allow to use fwmarks other purposes, and to use the lowest
bits, as well in the future. As iptables allows a mask before comparison,
then choose a specific mask for the bits you use both for setting and testing.

regards,

John

On 03/03/2019 11:52, Kevin Darbyshire-Bryant wrote:
> Be afraid, be very afraid.
> 
> I’ve woken up with two ideas in my head, one is bad, the other is very bad.  The bad one is already implemented and lurking in the mine branch of my cake git tree:
> 
> The bad idea:
> 
> An extension of the ‘fwmark’ tin allocation idea is to get cake to automagically update the conntrack mark based on the DSCP tin allocation chosen on egress.  That way, well behaved applications using DSCP (e.g. dropbear) get their return path packets similarly classified on ingress.  Badly behaved applications can have iptables rules put in place to ‘manually’ add fwmarks as is already done.
> 
> 
> The very bad idea:
> 
> And it’s bad ‘cos it’s sort of incompatible with the existing fwmark implementation as described above.  So an awful lot of our shenanigans above is due to DSCP not traversing the internet particularly well.  The solution above abstracts DSCP into ’tins’ which we put into fwmarks.  Another approach would be to put the DSCP *into* the fwmark.  CAKE could (optionally) copy the FWMARK contained DSCP into the diffserv field onto the actual packets.  Voila DSCP traversal across ’tinternet with tin/bandwidth allocation in our local domain preserved.
> 
> 
>> On 28 Feb 2019, at 03:24, gamanakis@gmail.com wrote:
>>
>> I think it's much simpler to use than tc-filter, BPF or even DSCP bits.
>> Manipulating DSCP bits seems the simplest of the currently available mechanisms to classify traffic. Even in this case, fwmarks are essentially simpler.
>> E.g. if you want to classify outgoing traffic on the LAN interface:
>> with DSCP you need to manipulate DSCP bits on incoming packets on the WAN interface. 
>> with fwmark you can directly mark outgoing packets on the LAN interface and cake will classify them appropriately.
>>
>>
>> _______________________________________________
>> Cake mailing list
>> Cake@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cake
> 
> 
> Cheers,
> 
> Kevin D-B
> 
> 012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A
> 
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-02-28  3:24 ` gamanakis
@ 2019-03-03 11:52   ` Kevin Darbyshire-Bryant
  2019-03-03 12:22     ` John Sager
                       ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Kevin Darbyshire-Bryant @ 2019-03-03 11:52 UTC (permalink / raw)
  To: George Amanakis; +Cc: Felix Resch, cake

Be afraid, be very afraid.

I’ve woken up with two ideas in my head, one is bad, the other is very bad.  The bad one is already implemented and lurking in the mine branch of my cake git tree:

The bad idea:

An extension of the ‘fwmark’ tin allocation idea is to get cake to automagically update the conntrack mark based on the DSCP tin allocation chosen on egress.  That way, well behaved applications using DSCP (e.g. dropbear) get their return path packets similarly classified on ingress.  Badly behaved applications can have iptables rules put in place to ‘manually’ add fwmarks as is already done.


The very bad idea:

And it’s bad ‘cos it’s sort of incompatible with the existing fwmark implementation as described above.  So an awful lot of our shenanigans above is due to DSCP not traversing the internet particularly well.  The solution above abstracts DSCP into ’tins’ which we put into fwmarks.  Another approach would be to put the DSCP *into* the fwmark.  CAKE could (optionally) copy the FWMARK contained DSCP into the diffserv field onto the actual packets.  Voila DSCP traversal across ’tinternet with tin/bandwidth allocation in our local domain preserved.


> On 28 Feb 2019, at 03:24, gamanakis@gmail.com wrote:
> 
> I think it's much simpler to use than tc-filter, BPF or even DSCP bits.
> Manipulating DSCP bits seems the simplest of the currently available mechanisms to classify traffic. Even in this case, fwmarks are essentially simpler.
> E.g. if you want to classify outgoing traffic on the LAN interface:
> with DSCP you need to manipulate DSCP bits on incoming packets on the WAN interface. 
> with fwmark you can directly mark outgoing packets on the LAN interface and cake will classify them appropriately.
> 
> 
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake


Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
  2019-02-27 21:12 Felix Resch
@ 2019-02-28  3:24 ` gamanakis
  2019-03-03 11:52   ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 30+ messages in thread
From: gamanakis @ 2019-02-28  3:24 UTC (permalink / raw)
  To: 'Felix Resch', cake

I think it's much simpler to use than tc-filter, BPF or even DSCP bits.
Manipulating DSCP bits seems the simplest of the currently available mechanisms to classify traffic. Even in this case, fwmarks are essentially simpler.
E.g. if you want to classify outgoing traffic on the LAN interface:
with DSCP you need to manipulate DSCP bits on incoming packets on the WAN interface. 
with fwmark you can directly mark outgoing packets on the LAN interface and cake will classify them appropriately.



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Cake] Using firewall connmarks as tin selectors
@ 2019-02-27 21:12 Felix Resch
  2019-02-28  3:24 ` gamanakis
  0 siblings, 1 reply; 30+ messages in thread
From: Felix Resch @ 2019-02-27 21:12 UTC (permalink / raw)
  To: cake

> How unpopular would the idea of having cake look at skb->mark directly be?

loving it

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-03-05 14:06 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-27 14:52 [Cake] Using firewall connmarks as tin selectors Kevin Darbyshire-Bryant
2019-02-27 15:14 ` Toke Høiland-Jørgensen
2019-02-28  8:32   ` Kevin Darbyshire-Bryant
2019-02-28  9:54     ` Toke Høiland-Jørgensen
2019-02-28 11:00       ` Kevin Darbyshire-Bryant
2019-02-28 11:13         ` Toke Høiland-Jørgensen
2019-02-27 21:12 Felix Resch
2019-02-28  3:24 ` gamanakis
2019-03-03 11:52   ` Kevin Darbyshire-Bryant
2019-03-03 12:22     ` John Sager
2019-03-03 16:25       ` Kevin Darbyshire-Bryant
2019-03-04 11:04         ` Toke Høiland-Jørgensen
2019-03-04 11:39           ` John Sager
2019-03-04  5:37     ` Ryan Mounce
2019-03-04  6:31       ` Jonathan Morton
2019-03-04  6:37         ` Ryan Mounce
2019-03-04  7:15           ` Dave Taht
2019-03-04  8:39     ` Pete Heist
2019-03-04 11:01       ` Kevin Darbyshire-Bryant
2019-03-04 11:17         ` Toke Høiland-Jørgensen
2019-03-04 11:55           ` Kevin Darbyshire-Bryant
2019-03-04 12:44             ` Toke Høiland-Jørgensen
2019-03-04 15:50               ` Kevin Darbyshire-Bryant
2019-03-04 16:39                 ` Toke Høiland-Jørgensen
2019-03-04 17:19                   ` Kevin Darbyshire-Bryant
2019-03-04 17:36                     ` Toke Høiland-Jørgensen
2019-03-04 20:58                       ` Kevin Darbyshire-Bryant
2019-03-04 21:33                         ` Toke Høiland-Jørgensen
2019-03-04 21:42                           ` Toke Høiland-Jørgensen
2019-03-05 14:06                           ` Kevin Darbyshire-Bryant

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox