Cake - FQ_codel the next generation
* [Cake] de-natting & host fairness
@ 2016-09-26  3:20 Kevin Darbyshire-Bryant
  2016-09-26  3:54 ` Dave Taht
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-26  3:20 UTC (permalink / raw)
  To: cake

Greetings!

A while back I started on a quest to make cake 'nat' aware, as the lack 
of host fairness in a typical home router environment was, in my opinion, 
the only thing that prevented cake from being the ultimate qdisc.  This 
involves dealing with conntrack, which on egress is easy (the kernel 
fills in a data structure for us); on ingress it is less clear.  I hacked 
something together but wasn't really happy with it.

Another github user 'tegularius' presented some beautifully crafted code 
that did the lookups in a much neater way.  Originally it too had an 
'ingress' lookup problem.  This was worked on and I hacked some 
conditional 'denat' options into cake & tc.

For your 'delight' a denat cake 
https://github.com/kdarbyshirebryant/sch_cake/tree/natoptions along with 
a matching tc https://github.com/kdarbyshirebryant/tc-adv/tree/denat

Typically I use 'dual-srchost srcnat' options on the egress interface, 
with 'dual-dsthost dstnat' in the ingress ifb interface.  In *brief* 
testing, bandwidth is shared fairly between hosts, and fairly by flow 
within each host.  And it's not crashed yet.
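
To make "fairly between hosts, and fairly by flow within each host" concrete, here is a minimal arithmetic sketch in Python. It only computes ideal shares and is not cake's scheduler; the host names, flow counts and the 100 Mbit/s rate are assumptions chosen for illustration.

```python
from collections import Counter

def per_flow_shares(flows, rate):
    """Plain flow fairness: every flow gets an equal slice of the link."""
    return {flow: rate / len(flows) for flow in flows}

def host_then_flow_shares(flows, rate):
    """Host fairness first, then flow fairness within each host."""
    flows_per_host = Counter(host for host, _ in flows)
    host_rate = rate / len(flows_per_host)
    return {(host, fid): host_rate / flows_per_host[host] for host, fid in flows}

def per_host_totals(shares):
    """Sum the per-flow shares back up to one figure per host."""
    totals = Counter()
    for (host, _), share in shares.items():
        totals[host] += share
    return dict(totals)

# Hypothetical load: a laptop with one flow, a torrent box with four.
flows = [("laptop", 1)] + [("torrentbox", i) for i in range(1, 5)]
flow_only = per_host_totals(per_flow_shares(flows, 100.0))
host_fair = per_host_totals(host_then_flow_shares(flows, 100.0))
print(flow_only)
print(host_fair)
```

With flow fairness alone the host that opens more flows takes most of the link; with host fairness applied first, each host ends up with the same total and only splits its own share among its flows.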

Kevin

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-26  3:20 [Cake] de-natting & host fairness Kevin Darbyshire-Bryant
@ 2016-09-26  3:54 ` Dave Taht
  2016-09-26  5:11   ` Dave Taht
  2016-09-26  8:54 ` moeller0
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: Dave Taht @ 2016-09-26  3:54 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake

Groovy! (good morning).

Is there a way to autodetect if nat is on or not?

If you are bored and want to see what happens with tcp bbr, I just set
up a server at:

173 dot 230 dot 156 dot 252

and I'll probably put one up in the uk as well.





-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


* Re: [Cake] de-natting & host fairness
  2016-09-26  3:54 ` Dave Taht
@ 2016-09-26  5:11   ` Dave Taht
  0 siblings, 0 replies; 29+ messages in thread
From: Dave Taht @ 2016-09-26  5:11 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake

I have to admit my end-state was:

tc qdisc add dev eth0 ingress cake bandwidth 100mbit.


* Re: [Cake] de-natting & host fairness
  2016-09-26  3:20 [Cake] de-natting & host fairness Kevin Darbyshire-Bryant
  2016-09-26  3:54 ` Dave Taht
@ 2016-09-26  8:54 ` moeller0
  2016-09-26 13:02   ` Kevin Darbyshire-Bryant
  2016-09-27  1:52 ` Noah Causin
  2016-09-27 23:08 ` Jonathan Morton
  3 siblings, 1 reply; 29+ messages in thread
From: moeller0 @ 2016-09-26  8:54 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake

Hi Kevin,

this is like the missing puzzle piece; if you have solved this, most home users might end up deep in your debt (without realizing it, of course).
Question: if I enable this on my link, how will it deal with the typical differences between IPv4 and IPv6? I believe the situation I have at home, NAT for IPv4 but no NAT for IPv6 (or, if NAT, at least NAT that preserves the identifying last 64 bits of the IPv6 addresses, with no port-remapping games), is quite common nowadays. I assume it will do the right thing for IPv4, but will it still do the right thing for IPv6 flows as well? And what if, for $DEITY’s sake, someone insisted on using a port-remapping NAT on IPv6?
If, as I assume, it does the right thing by default, I would vote for enabling this by default and introducing keywords to disable it if required (in line with what I take to be one of cake’s main ideas: use reasonable defaults that in general do the right thing, but also allow crazy stuff if need be).
Do you have any idea how expensive this is computationally? I realize that this is a tad hard to measure, as cake will not simply reduce the available bandwidth when running out of CPU cycles but will first allow the latency to increase.

Best Regards
	Sebastian


* Re: [Cake] de-natting & host fairness
  2016-09-26  8:54 ` moeller0
@ 2016-09-26 13:02   ` Kevin Darbyshire-Bryant
  2016-09-26 13:28     ` moeller0
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-26 13:02 UTC (permalink / raw)
  To: moeller0; +Cc: cake

Hi Sebastian et al,

I'm feeling a bit unwell at the moment with an eye infection and I'm 
working nights on some tennis coverage for TV so the brain cell is 
somewhat addled.

It is indeed the missing puzzle piece and represents something of a holy 
grail for my use case.  A *lot* of credit has to go to 'tegularius' who 
took the idea and ran with it after I'd given up.  My only consolation 
is that the methods are broadly similar; the current implementation is 
so much neater and is obviously written in a more kernel/conntrack 
knowledgeable way (based on net/sched/cls_flow.c).

This really needs to be tested.  As I mentioned, the 'ingress' side of 
things is harder work because the kernel hasn't filled in the conntrack 
pointer for us.  There are some remaining concerns over how reliable our 
own lookup actually is.  The conntrack entry 'direction' is apparently 
determined by where the connection is seen first; 2 tuples are then 
created, in the 'original' and 'reverse' directions.  This made me think 
that a connection initiated by the router vs a connection initiated from 
outside into it (even if natted) would have the src & destination fields 
swapped...however in my limited testing 'who started the connection' 
appeared to make no difference.  But conntrack makes my brain cell hurt.
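
One way to see why 'who started the connection' need not matter is a toy model of a conntrack entry, sketched here in Python. This is not the sch_cake or kernel code: the `Tuple5`, `Conn`, `invert` and `denat` names and all addresses are made up for illustration, and the model assumes a qdisc sitting on the WAN side. (The kernel itself calls the two directions 'original' and 'reply'.)

```python
from typing import NamedTuple

class Tuple5(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int

class Conn(NamedTuple):
    original: Tuple5  # first packet of the connection, as first seen
    reply: Tuple5     # what conntrack expects a reply packet to look like

def invert(t):
    """Swap source and destination of a tuple."""
    return Tuple5(t.dst, t.dport, t.src, t.sport)

def denat(pkt, table):
    """Return the packet with LAN-side addresses; unchanged if no entry matches."""
    for conn in table:
        for this, other in ((conn.original, conn.reply), (conn.reply, conn.original)):
            if pkt == this:          # not yet translated (arriving on the WAN side)
                return invert(other)
            if pkt == invert(this):  # already translated (leaving on the WAN side)
                return other
    return pkt

WAN = "198.51.100.1"  # hypothetical router WAN address
table = [
    # LAN host opened a connection out through masquerading.
    Conn(Tuple5("192.168.1.10", 5000, "203.0.113.5", 80),
         Tuple5("203.0.113.5", 80, WAN, 6000)),
    # Outside host connected in through a port forward.
    Conn(Tuple5("203.0.113.9", 1234, WAN, 80),
         Tuple5("192.168.1.20", 8080, "203.0.113.9", 1234)),
]

print(denat(Tuple5("203.0.113.5", 80, WAN, 6000), table).dst)  # ingress, LAN-initiated
print(denat(Tuple5("203.0.113.9", 1234, WAN, 80), table).dst)  # ingress, WAN-initiated
print(denat(Tuple5(WAN, 6000, "203.0.113.5", 80), table).src)  # egress, LAN-initiated
```

In this model the lookup checks a packet against both tuples of every entry, so the internal address is recovered whichever side opened the connection. A packet with no matching entry keeps the addresses it arrived with.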

I'm sure there are people on this list who are a) much cleverererer than 
me and b) know conntrack upside down & backwards.  Help is as ever 
gratefully received.

Regarding IPv6 vs IPv4:  As it currently stands the code does conntrack 
lookups for both, so if someone is translating IPv6 addresses then we 
know about it.  I'm now thinking about making IPv6 lookups a runtime 
option (default off).  From a flow/host fairness point of view I really 
don't care if a one to one address translation has occurred...and if 
someone really does implement a 'masquerading many hosts behind one IPv6 
address' environment...and they still want per IP & per flow fairness 
then unmentionable things should be done to them.

I'm not a fan of de-natting by default.  Per IP fairness is not the 
default and requires at least one of the 'dual-???host' or 
'triple-isolate' options to be relevant.  I've also concerns on CPU usage.

CPU usage is difficult to quantify.  As a rough guesstimate my Archer C7 
used about 10% cpu per megabit.  I'd say that has gone up by 2% 
with this change, so it is heavy!

The code is out there, if you've an itch...scratch it :-)  Fork it, 
improve it etc but please don't think I'm any sort of kernel guru :-)

Incidentally, an obvious gaming of this: A host that has both IPv4 & v6 
addresses can get at least double the bandwidth of a host with only 
one of them; it's per IP fairness really, not per host.
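
The dual-stack effect can be worked through with a few lines of Python. This is an idealised sketch, assuming every address is kept busy and the link is split equally per address; the host names, addresses and the 90 Mbit/s rate are hypothetical.

```python
def per_address_shares(hosts, rate):
    """hosts maps a host name to its active addresses; each address gets an equal share."""
    total_addresses = sum(len(addrs) for addrs in hosts.values())
    return {host: rate * len(addrs) / total_addresses for host, addrs in hosts.items()}

hosts = {
    "dualstack": ["192.168.1.10", "2001:db8::10"],  # one IPv4 plus one IPv6 address
    "v4only": ["192.168.1.11"],
}
shares = per_address_shares(hosts, 90.0)
print(shares)
```

Here the dual-stack host ends up with two of the three per-address shares, i.e. twice what the IPv4-only host gets. A host using several IPv6 addresses at once would skew the split further.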

Kevin



* Re: [Cake] de-natting & host fairness
  2016-09-26 13:02   ` Kevin Darbyshire-Bryant
@ 2016-09-26 13:28     ` moeller0
  2016-09-26 14:06       ` Kevin Darbyshire-Bryant
  2016-09-26 14:30       ` Jonathan Morton
  0 siblings, 2 replies; 29+ messages in thread
From: moeller0 @ 2016-09-26 13:28 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake

Hi Kevin,

> On Sep 26, 2016, at 15:02 , Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> Hi Sebastian et al,
> 
> I'm feeling a bit unwell at the moment with an eye infection and I'm working nights on some tennis coverage for TV so the brain cell is somewhat addled.
> 
> It is indeed the missing puzzle piece and represents something of a holy grail for my use case.  A *lot* of credit has to go to 'tegularius' who took the idea and ran with it after I'd given up.  My only consolation is that the methods are broadly similar, the current implementation is so much neater and obviously written in a more kernel/conntrack knowledgeable way (based on net/sched/cls_flow.c)

	Ah.

> 
> This really needs to be tested.  As I mentioned the 'ingress' side of things is harder work because the kernel hasn't filled in the conntrack pointer for us.  There are some remaining concerns over how reliable our own lookup actually is.  The conntrack entry 'direction' is apparently determined by where it is seen first, there are then 2 tuples created in the 'original' and 'reverse' directions.  This made me think that a connection initiated by the router vs a connection initiated from outside into it (even if natted) would have the src & destination fields swapped...however in my limited testing 'who started the connection' appeared to make no difference.  But conntrack makes my brain cell hurt.

	Does that mean the initial packet(s) of a flow will be “misclassified” (not really, since there is no record yet to snatch the translated IP from)? Do all those initially unclassified packets end up in the same bin? (I guess even if they do, that should not matter much except in extreme situations, and those merit extreme reactions anyway.)

> 
> I'm sure there are people on this list who are a) much cleverererer than me and b) know conntrack upside down & backwards.  Help is as ever gratefully received.
> 
> Regarding IPv6 vs IPv4:  As it currently stands the code does conntrack lookups for both so if someone is translating IPv6 addresses then we know about it.  I’m now thinking about making IPv6 lookups a runtime option (default off)  

	That would make it easy to measure the cost of those lookups.

> From a flow/host fairness point of view I really don't care if a one to one address translation has occurred...and if someone really does implement a 'masquerading many hosts behind one IPv6 address' environment...and they still want per IP & per flow fairness then unmentionable things should be done to them.
> 
> I'm not a fan of de-natting by default.  Per IP fairness is not the default and requires at least one of the 'dual-???host' or 'triple-isolate' options to be relevant.  I’ve also concerns on CPU usage.

	Mmh, I would have thought that even for srchost and dsthost (note: no dual) it would make sense to allow de-NATting? If we default to deNAT we might also default to triple-isolate, assuming that it actually works… Cake lets discerning users refine the hashing, but for everybody else we should pick defaults that work well. Cake not being upstream yet is a virtue here, as we will not need to argue against the “no unexpected behavior change” policy that seems to be used in the kernel (no argument from my side; for the kernel that seems a good policy, but we can still try to upstream the most useful defaults for “my mom”).


> 
> CPU usage is difficult to quantify.  As a rough guesstimate my Archer C7 used about 10% cpu per megabit.  I’d say that has gone up by 2% with this change, so it is heavy!

	That is a tad high; maybe too high for making it a default but still it would be nice having a qdisc that by default does what naive users expect a(ll) qdisc(s) to do ;)

> 
> The code is out there, if you’ve an itch...scratch it :-)  Fork it, improve it etc but please don't think I'm any sort of kernel guru :-)

	Yepp, I really need to get my own LEDE builds going so I can start playing around with that again. (I am slow with that as my typical use cases at home work pretty well with what we have right now; and I somehow don’t want to start with heavy bit-torrenting (how many debian DVD images could I actually ever need?) or windows10 updates).

> 
> Incidentally, an obvious gaming of this: A host that has both IPv4 & v6 addresses can get at least double the bandwidth than a host with only one of them, it’s per IP fairness really, not per host.

	That is pretty much our new IPv6 world. Per-IP fairness might not be the kind of guarantee we actually want, but I assume it is the only one we can expect to get (IPv6 privacy addressing alone will bestow on each active host a flock of IP addresses that changes over time).


Best Regards
	Sebastian


* Re: [Cake] de-natting & host fairness
  2016-09-26 13:28     ` moeller0
@ 2016-09-26 14:06       ` Kevin Darbyshire-Bryant
  2016-09-26 14:30       ` Jonathan Morton
  1 sibling, 0 replies; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-26 14:06 UTC (permalink / raw)
  To: moeller0; +Cc: cake



On 26/09/16 14:28, moeller0 wrote:
> Hi Kevin,
>
>> This really needs to be tested.  As I mentioned the 'ingress' side
>> of things is harder work because the kernel hasn't filled in the
>> conntrack pointer for us.  There are some remaining concerns over
>> how reliable our own lookup actually is.  The conntrack entry
>> 'direction' is apparently determined by where it is seen first,
>> there are then 2 tuples created in the 'original' and 'reverse'
>> directions.  This made me think that a connection initiated by the
>> router vs a connection initiated from outside into it (even if
>> natted) would have the src & destination fields swapped...however
>> in my limited testing 'who started the connection' appeared to make
>> no difference.  But conntrack makes my brain cell hurt.
>
> Does that mean an initial packet(s) for a flow will be
> “misclassified” (not really since there should be no record yet to
> snatch the translated IP from) do all those initially non-classified
> packets end up in the same bin? (I guess even if that should not
> matter too much unless in extreme situations and those merit extreme
> reactions anyways)

That was certainly one of my concerns, however the simplistic testing I 
performed showed that somehow a translation was in place.  The 
simplistic testing being setting up a port forward to an internal lan 
host, connecting to it from the wan side with another device and dumping 
the addresses seen by the qdisc with printk.  The test may be flawed, my 
execution of the test flawed etc.  I expected the test to fail, however 
it passed!

>
>>
>> I'm sure there are people on this list who are a) much cleverererer
>> than me and b) know conntrack upside down & backwards.  Help is as
>> ever gratefully received.
>>
>> Regarding IPv6 vs IPv4:  As it currently stands the code does
>> conntrack lookups for both so if someone is translating IPv6
>> addresses then we know about it.  I’m now thinking about making
>> IPv6 lookups a runtime option (default off)
>
> That would allow to easily measure the cost of those lookups.

I've done half the job...updated the qdisc.  I need to update tc to 
allow/report the ipv6 de-nat status.

>
>> From a flow/host fairness point of view I really don't care if a
>> one to one address translation has occurred...and if someone really
>> does implement a 'masquerading many hosts behind one IPv6 address'
>> environment...and they still want per IP & per flow fairness then
>> unmentionable things should be done to them.
>>
>> I'm not a fan of de-natting by default.  Per IP fairness is not the
>> default and requires at least one of the 'dual-???host' or
>> 'triple-isolate' options to be relevant.  I’ve also concerns on CPU
>> usage.
>
> Mmh, I would have thought that even for srchost and dsthost (note no
> dual) it would make sense to allow to deNAT? If we default to deNAT
> we might also default to triple-isolate, assuming that it actually
> works… Cake offers to refine the hashing for discerning users, but
> for everybody else we should pick well working defaults. Cake not
> being upstream yet is a virtue as we will not need to argue against
> the “no unexpected surprise behavior change” policy that seems to be
> used in the kernel (no argument from my side, for the kernel that
> seems a good policy, but we still can try to upstream the most useful
> defaults for “my mom”).

Forgot about the srchost/dsthost only options :-)

>>
>> CPU usage is difficult to quantify.  As a rough guesstimate my
>> Archer C7 used about 10% cpu per megabit.  I’d say that has gone up
>> by 2% with this change, so it is heavy!
>
> That is a tad high; maybe too high for making it a default but still
> it would be nice having a qdisc that by default does what naive users
> expect a(ll) qdisc(s) to do ;)

The de-nat ipv6 tweak may have also introduced a small 
optimisation...depending on how clever the compiler is.

>
>>
>> The code is out there, if you’ve an itch...scratch it :-)  Fork it,
>> improve it etc but please don't think I'm any sort of kernel guru
>> :-)
>
> Yepp, I really need to get my own LEDE builds going so I can start
> playing around with that again. (I am slow with that as my typical
> use cases at home work pretty well with what we have right now; and I
> somehow don’t want to start with heavy bit-torrenting (how many
> debian DVD images could I actually ever need?) or windows10
> updates).

Ha!  Bet you can't guess what I used as a download source for testing 
purposes :-)



* Re: [Cake] de-natting & host fairness
  2016-09-26 13:28     ` moeller0
  2016-09-26 14:06       ` Kevin Darbyshire-Bryant
@ 2016-09-26 14:30       ` Jonathan Morton
  2016-09-26 15:23         ` moeller0
  1 sibling, 1 reply; 29+ messages in thread
From: Jonathan Morton @ 2016-09-26 14:30 UTC (permalink / raw)
  To: moeller0; +Cc: Kevin Darbyshire-Bryant, cake


> On 26 Sep, 2016, at 16:28, moeller0 <moeller0@gmx.de> wrote:
> 
> Does that mean an initial packet(s) for a flow will be “misclassified” (not really since there should be no record yet to snatch the translated IP from) do all those initially non-classified packets end up in the same bin?

The initial packet will normally be outgoing, so it’ll go through conntrack before reaching the qdisc.  If it’s incoming, then it’ll be “related to” an existing connection or else won’t be natted - though I’m not sure whether “related” connections pre-emptively get conntrack entries before traffic has been seen.  If not, that initial packet will be associated with the NAT box by the qdisc, rather than the internal host, while subsequent packets will correctly be associated with the internal host.

That assumes we have qdiscs attached to the egress and ingress sides of a WAN-facing interface, as normally desired.

The code looks sane at first glance, so I’ll give it a try at my end.  With any luck, I’ll be able to improve triple-isolate’s performance enough to make that the default, too.  I should probably use a different data structure than a ring buffer, so that there is less in the way of linear searching for an unblocked flow.

The current default is “flows”, which doesn’t need NAT information to unambiguously distinguish flows from each other.  However, “hosts” mode does need it when running in a NAT environment, otherwise internal hosts will erroneously be lumped together with the NAT box.  Triple-isolate is effectively a combination of “hosts” and “flows” - that is probably the easiest way to understand it.

I think it is reasonable to turn on conntrack lookups by default whenever host information is relevant.  This is potentially true for all modes except “flowblind” and “flows”.
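
As a rough illustration of why host-based modes need the NAT information while “flows” does not, here is a simplified Python model. It is not cake's real hashing: the field sets in `MODES` are an assumption about what each keyword isolates on, and the addresses are hypothetical.

```python
# mode: (host fields used, whether the flow 5-tuple is used as well)
MODES = {
    "flowblind":      ((), False),
    "flows":          ((), True),
    "srchost":        (("src",), False),
    "dsthost":        (("dst",), False),
    "hosts":          (("src", "dst"), False),
    "dual-srchost":   (("src",), True),
    "dual-dsthost":   (("dst",), True),
    "triple-isolate": (("src", "dst"), True),
}

def needs_nat_lookup(mode):
    """Host information only matters when the mode has host fields."""
    return bool(MODES[mode][0])

def host_key(mode, pkt):
    """Key used to tell hosts apart in the given mode."""
    return tuple(pkt[field] for field in MODES[mode][0])

def flow_key(pkt):
    """Key used to tell individual flows apart."""
    return (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])

WAN = "198.51.100.1"  # hypothetical router WAN address
# Two LAN hosts talking to the same server, as seen after NAT on WAN egress.
a = {"src": WAN, "sport": 6000, "dst": "203.0.113.5", "dport": 443, "proto": "tcp"}
b = {"src": WAN, "sport": 6001, "dst": "203.0.113.5", "dport": 443, "proto": "tcp"}
# The same packets with their source addresses de-NATted.
a_lan = dict(a, src="192.168.1.10")
b_lan = dict(b, src="192.168.1.11")

print(flow_key(a) != flow_key(b))                            # flows stay distinct after NAT
print(host_key("hosts", a) == host_key("hosts", b))          # hosts are lumped together
print(host_key("hosts", a_lan) != host_key("hosts", b_lan))  # de-NAT separates them
print(sorted(m for m in MODES if not needs_nat_lookup(m)))
```

After NAT the translated ports still keep the two flows apart, but every host key collapses onto the router's WAN address until the internal address is restored.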

Also long overdue are the more subtle overhead compensation factor for PTM, and the two extra keywords for DOCSIS’ asymmetric overhead.

 - Jonathan Morton



* Re: [Cake] de-natting & host fairness
  2016-09-26 14:30       ` Jonathan Morton
@ 2016-09-26 15:23         ` moeller0
  0 siblings, 0 replies; 29+ messages in thread
From: moeller0 @ 2016-09-26 15:23 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Kevin Darbyshire-Bryant, cake

Hi Jonathan,

> On Sep 26, 2016, at 16:30 , Jonathan Morton <chromatix99@gmail.com> wrote:
> 
> 
>> On 26 Sep, 2016, at 16:28, moeller0 <moeller0@gmx.de> wrote:
>> 
>> Does that mean an initial packet(s) for a flow will be “misclassified” (not really since there should be no record yet to snatch the translated IP from) do all those initially non-classified packets end up in the same bin?
> 
> The initial packet will normally be outgoing, so it’ll go through conntrack before reaching the qdisc.  If it’s incoming, then it’ll be “related to” an existing connection or else won’t be natted - though I’m not sure whether “related” connections pre-emptively get conntrack entries before traffic has been seen.  If not, that initial packet will be associated with the NAT box by the qdisc, rather than the internal host, while subsequent packets will correctly be associated with the internal host.

	Okay, so the worst case is a “regression” to simple flow fairness, and only for the initial packets. What about port redirects: will these already be included in conntrack? And what about UDP (I believe bit-torrent uses UDP, so I believe this to be relevant)?

> 
> That assumes we have qdiscs attached to the egress and ingress sides of a WAN-facing interface, as normally desired.
> 
> The code looks sane at first glance, so I’ll give it a try at my end.  With any luck, I’ll be able to improve triple-isolate’s performance enough to make that the default, too.  I should probably use a different data structure than a ring buffer, so that there is less in the way of linear searching for an unblocked flow.
> 
> The current default is “flows”, which doesn’t need NAT information to unambiguously distinguish flows from each other.  However, “hosts” mode does need it when running in a NAT environment, otherwise internal hosts will erroneously be lumped together with the NAT box.  Triple-isolate is effectively a combination of “hosts” and “flows” - that is probably the easiest way to understand it.

	But the dual-[src|dst]host options are similar to triple-isolate in that regard, except that they also need directionality information, which is hard to divine, so I fully agree that triple is the best candidate for a default. Assuming that it actually works…

> 
> I think it is reasonable to turn on conntrack lookups by default whenever host information is relevant.  This is potentially true for all modes except “flowblind” and “flows”.
> 
> Also long overdue are the more subtle overhead compensation factor for PTM, and the two extra keywords for DOCSIS’ asymmetric overhead.

	Teacher, teacher, ask me: the term you are looking for is “proper documentation”. 
About PTM, what are you thinking of, the 64/65 encapsulation “tax”? 
About DOCSIS: while we have Greg White going on record for CableLabs, would it not be wiser to first survey more cable ISPs to check whether everybody actually follows those recommendations before creating keywords? One thing I have learned about overhead compensation is that it is never as simple and nice in real life as the RFCs make you hope. I would love to be wrong on DOCSIS, but I believe the working hypothesis should be that someone has set it up differently. So, in essence, plan for a number of keywords for DOCSIS and name them in a way that allows future additions that do not sound awkward.
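
For a sense of scale on the PTM “tax”, here is a small Python sketch of the arithmetic, assuming the usual 64/65-octet encapsulation in which every 65 octets on the wire carry 64 octets of payload. The function names and the sample numbers are hypothetical, and the per-packet figure is an approximation, not a statement of how any shaper implements it.

```python
def ptm_payload_rate(sync_rate_kbit):
    """Usable rate once one octet in every 65 is spent on PTM framing."""
    return sync_rate_kbit * 64 / 65

def ptm_wire_size(packet_bytes):
    """Approximate on-wire size: one extra octet per started 64-octet block."""
    return packet_bytes + (packet_bytes + 63) // 64

print(ptm_payload_rate(65000))  # kbit/s left of a 65000 kbit/s sync rate
print(ptm_wire_size(1500))      # bytes on the wire for a 1500-byte packet
```

That works out to a loss of roughly 1.5% of the sync rate, which is why it is more subtle than the fixed per-packet overheads usually discussed.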

Best Regards
	Sebastian


* Re: [Cake] de-natting & host fairness
  2016-09-26  3:20 [Cake] de-natting & host fairness Kevin Darbyshire-Bryant
  2016-09-26  3:54 ` Dave Taht
  2016-09-26  8:54 ` moeller0
@ 2016-09-27  1:52 ` Noah Causin
  2016-09-27  2:32   ` Kevin Darbyshire-Bryant
  2016-09-27 23:08 ` Jonathan Morton
  3 siblings, 1 reply; 29+ messages in thread
From: Noah Causin @ 2016-09-27  1:52 UTC (permalink / raw)
  To: cake

I've been trying to compile this on LEDE, but I get this error:

Package kmod-sched-cake is missing dependencies for the following libraries:
nf_conntrack.ko


On 9/25/2016 11:20 PM, Kevin Darbyshire-Bryant wrote:
> Greetings!
>
> A while back I started on a quest to make cake 'nat' aware as the lack 
> of host fairness in a typical home router environment was the only 
> thing that prevented cake from being the ultimate qdisc in my 
> opinion.  This involves dealing with conntrack which on egress is easy 
> (the kernel fills in a data structure for us), ingress is less clear.  
> I hacked something together but wasn't really happy with it.
>
> Another github user 'tegularius' presented some beautifully crafted 
> code that did the lookups in a much neater way. Originally it too had 
> an 'ingress' lookup problem.  This was worked on and I hacked some 
> conditional 'denat' options into cake & tc.
>
> For your 'delight' a denat cake 
> https://github.com/kdarbyshirebryant/sch_cake/tree/natoptions along 
> with a matching tc https://github.com/kdarbyshirebryant/tc-adv/tree/denat
>
> Typically I use 'dual-srchost srcnat' options on the egress interface, 
> with 'dual-dsthost dstnat' in the ingress ifb interface.  In *brief* 
> testing, bandwidth is shared fairly between hosts, and fairly by flow 
> within each host.  And it's not crashed yet.
>
> Kevin
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-27  1:52 ` Noah Causin
@ 2016-09-27  2:32   ` Kevin Darbyshire-Bryant
  2016-09-27  4:20     ` Noah Causin
  2016-09-27 14:52     ` Noah Causin
  0 siblings, 2 replies; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-27  2:32 UTC (permalink / raw)
  To: cake

[-- Attachment #1: Type: text/plain, Size: 400 bytes --]

Easy fix.  See the added DEPENDS line in the attached patch for the 
package Makefile :-)

I'm guessing you've updated the git checkout hash to point at a suitable 
place.

Cheers,

Kevin



On 27/09/16 02:52, Noah Causin wrote:
> I've been trying to compile this on LEDE, but I get this error:
>
> Package kmod-sched-cake is missing dependencies for the following
> libraries:
> nf_conntrack.ko
>
>

[-- Attachment #2: 0001-kmod-sched-cake-add-conntrack-lib-dependency.patch --]
[-- Type: text/x-patch, Size: 1009 bytes --]

From 2d0d549f072379da20be70535c14b40496d42dfb Mon Sep 17 00:00:00 2001
From: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
Date: Thu, 30 Jun 2016 16:09:32 +0100
Subject: [PATCH 1/3] kmod-sched-cake: add conntrack lib dependency

Prepare for cake to understand NAT.  Maybe.  So experimental you
wouldn't believe.  Dragons.  Big snappy fire breathing dragons.

Signed-off-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk>
---
 package/kernel/kmod-sched-cake/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/package/kernel/kmod-sched-cake/Makefile b/package/kernel/kmod-sched-cake/Makefile
index 6108ed7..417ffd6 100644
--- a/package/kernel/kmod-sched-cake/Makefile
+++ b/package/kernel/kmod-sched-cake/Makefile
@@ -27,6 +27,7 @@ define KernelPackage/sched-cake
   URL:=https://github.com/dtaht/sch_cake
   FILES:=$(PKG_BUILD_DIR)/sch_cake.ko
   AUTOLOAD:=$(call AutoLoad,75,sch_cake)
+  DEPENDS:=+kmod-ipt-conntrack
 endef
 
 include $(INCLUDE_DIR)/kernel-defaults.mk
-- 
2.7.4


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-27  2:32   ` Kevin Darbyshire-Bryant
@ 2016-09-27  4:20     ` Noah Causin
  2016-09-27 14:52     ` Noah Causin
  1 sibling, 0 replies; 29+ messages in thread
From: Noah Causin @ 2016-09-27  4:20 UTC (permalink / raw)
  To: cake

[-- Attachment #1: Type: text/plain, Size: 678 bytes --]

Thank you,

It now compiles correctly.

Noah


On 9/26/2016 10:32 PM, Kevin Darbyshire-Bryant wrote:
> Easy fix.  See the added DEPENDS line in the attached patch for the 
> package Makefile :-)
>
> I'm guessing you've updated the git checkout hash to point at a 
> suitable place.
>
> Cheers,
>
> Kevin
>
>
>
> On 27/09/16 02:52, Noah Causin wrote:
>> I've been trying to compile this on LEDE, but I get this error:
>>
>> Package kmod-sched-cake is missing dependencies for the following
>> libraries:
>> nf_conntrack.ko
>>
>>
>
>


[-- Attachment #2: Type: text/html, Size: 1627 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-27  2:32   ` Kevin Darbyshire-Bryant
  2016-09-27  4:20     ` Noah Causin
@ 2016-09-27 14:52     ` Noah Causin
  2016-09-27 15:28       ` Kevin Darbyshire-Bryant
  1 sibling, 1 reply; 29+ messages in thread
From: Noah Causin @ 2016-09-27 14:52 UTC (permalink / raw)
  To: cake

[-- Attachment #1: Type: text/plain, Size: 2599 bytes --]

Thank you for helping me get cake to compile on LEDE.

The issue I have now is getting tc-adv to compile.

I use this Makefile:
https://github.com/antoinedeschenes/openwrt-sqm/tree/master/net/tc-adv

I use these commands to compile it:

make package/feeds/sqm/tc-adv/clean -j 1 V=s
make package/feeds/sqm/tc-adv/prepare USE_SOURCE_DIR=/home/n0man/Desktop/denat/ -j 1 V=s
make package/feeds/sqm/tc-adv/compile -j 1 V=s

These are the errors I get:

namespace.c: In function 'bind_etc':
namespace.c:18:22: error: 'MAXPATHLEN' undeclared (first use in this function)
  char etc_netns_path[MAXPATHLEN];
                      ^
namespace.c:18:22: note: each undeclared identifier is reported only once for each function it appears in
namespace.c:20:7: warning: unused variable 'etc_name' [-Wunused-variable]
  char etc_name[MAXPATHLEN];
       ^
namespace.c:19:7: warning: unused variable 'netns_name' [-Wunused-variable]
  char netns_name[MAXPATHLEN];
       ^
namespace.c:18:7: warning: unused variable 'etc_netns_path' [-Wunused-variable]
  char etc_netns_path[MAXPATHLEN];
       ^
namespace.c: In function 'netns_switch':
namespace.c:46:16: error: 'MAXPATHLEN' undeclared (first use in this function)
  char net_path[MAXPATHLEN];
                ^
namespace.c:46:7: warning: unused variable 'net_path' [-Wunused-variable]
  char net_path[MAXPATHLEN];
       ^
namespace.c: In function 'netns_get_fd':
namespace.c:90:15: error: 'MAXPATHLEN' undeclared (first use in this function)
  char pathbuf[MAXPATHLEN];
               ^
namespace.c:90:7: warning: unused variable 'pathbuf' [-Wunused-variable]
  char pathbuf[MAXPATHLEN];
       ^
<builtin>: recipe for target 'namespace.o' failed
make[4]: *** [namespace.o] Error 1
make[4]: Leaving directory '/home/n0man/Desktop/denat/lib'

Any help would be appreciated.

Thank you,

Noah Causin

On 9/26/2016 10:32 PM, Kevin Darbyshire-Bryant wrote:
> Easy fix.  See the added DEPENDS line in the attached patch for the 
> package Makefile :-)
>
> I'm guessing you've updated the git checkout hash to point at a 
> suitable place.
>
> Cheers,
>
> Kevin
>
>
>
> On 27/09/16 02:52, Noah Causin wrote:
>> I've been trying to compile this on LEDE, but I get this error:
>>
>> Package kmod-sched-cake is missing dependencies for the following
>> libraries:
>> nf_conntrack.ko
>>
>>
>
>


[-- Attachment #2: Type: text/html, Size: 4553 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread
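
A note on the 'MAXPATHLEN' undeclared errors in the message above: MAXPATHLEN is traditionally provided by <sys/param.h>, and not every C library pulls that header in indirectly (LEDE's default C library is musl rather than glibc). Whether that is the cause here is an assumption; the thread does not confirm it. The following is only a sketch of a defensive include with a fallback to PATH_MAX. The helper name netns_path and the path prefix it uses are hypothetical.

```c
#include <limits.h>     /* PATH_MAX */
#include <stdio.h>      /* snprintf, size_t */
#include <sys/param.h>  /* usual home of MAXPATHLEN */

/* Fall back to PATH_MAX if the C library did not define MAXPATHLEN. */
#ifndef MAXPATHLEN
#define MAXPATHLEN PATH_MAX
#endif

/* Hypothetical helper: build a namespace path into a caller-supplied
 * buffer. Returns 0 on success, -1 if the result would not fit. */
static int netns_path(char *buf, size_t len, const char *name)
{
	int n = snprintf(buf, len, "/var/run/netns/%s", name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would declare `char buf[MAXPATHLEN];` and pass `sizeof(buf)` as the length, which is the pattern the failing declarations in namespace.c appear to follow.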

* Re: [Cake] de-natting & host fairness
  2016-09-27 14:52     ` Noah Causin
@ 2016-09-27 15:28       ` Kevin Darbyshire-Bryant
  2016-09-27 20:40         ` Noah Causin
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-27 15:28 UTC (permalink / raw)
  To: cake

[-- Attachment #1: Type: text/plain, Size: 3642 bytes --]

LEDE already has a patch included in the basefiles 'iproute2' package to 
make tc cake aware.  If you replace 
package/network/utils/iproute2/patches/950-add-cake-to-tc.patch (which 
is unaware of the nat options) with the attached nat option aware 
version then recompile, you should find you've a 'natted cake' capable 
version of the iproute utilities without adding extra packages, feeds etc.

I've not yet got around to adding a 'de-nat ipv6' option, so assuming 
you've built my latest version of cake, it can be configured to de-nat 
ipv4 but not ipv6.

Similarly I've not updated the LEDE stuff as I'd rather like some code 
review/testing done before it gets pushed out....and I get shouted at 
lots :-)

Kevin


On 27/09/16 15:52, Noah Causin wrote:
> Thank you for helping me get cake to compile on LEDE.
>
> The issue I have now is getting tc-adv to compile.
>
> I use this Makefile:
> https://github.com/antoinedeschenes/openwrt-sqm/tree/master/net/tc-adv
>
> I use these commands to compile it:
>
> make package/feeds/sqm/tc-adv/clean -j 1 V=s
> make package/feeds/sqm/tc-adv/prepare USE_SOURCE_DIR=/home/n0man/Desktop/denat/ -j 1 V=s
> make package/feeds/sqm/tc-adv/compile -j 1 V=s
>
> These are the errors I get:
>
> namespace.c: In function 'bind_etc':
> namespace.c:18:22: error: 'MAXPATHLEN' undeclared (first use in this function)
>   char etc_netns_path[MAXPATHLEN];
>                       ^
> namespace.c:18:22: note: each undeclared identifier is reported only once for each function it appears in
> namespace.c:20:7: warning: unused variable 'etc_name' [-Wunused-variable]
>   char etc_name[MAXPATHLEN];
>        ^
> namespace.c:19:7: warning: unused variable 'netns_name' [-Wunused-variable]
>   char netns_name[MAXPATHLEN];
>        ^
> namespace.c:18:7: warning: unused variable 'etc_netns_path' [-Wunused-variable]
>   char etc_netns_path[MAXPATHLEN];
>        ^
> namespace.c: In function 'netns_switch':
> namespace.c:46:16: error: 'MAXPATHLEN' undeclared (first use in this function)
>   char net_path[MAXPATHLEN];
>                 ^
> namespace.c:46:7: warning: unused variable 'net_path' [-Wunused-variable]
>   char net_path[MAXPATHLEN];
>        ^
> namespace.c: In function 'netns_get_fd':
> namespace.c:90:15: error: 'MAXPATHLEN' undeclared (first use in this function)
>   char pathbuf[MAXPATHLEN];
>                ^
> namespace.c:90:7: warning: unused variable 'pathbuf' [-Wunused-variable]
>   char pathbuf[MAXPATHLEN];
>        ^
> <builtin>: recipe for target 'namespace.o' failed
> make[4]: *** [namespace.o] Error 1
> make[4]: Leaving directory '/home/n0man/Desktop/denat/lib'
>
> Any help would be appreciated.
>
> Thank you,
>
> Noah Causin
>
> On 9/26/2016 10:32 PM, Kevin Darbyshire-Bryant wrote:
>> Easy fix.  See the added DEPENDS line in the attached patch for the
>> package Makefile :-)
>>
>> I'm guessing you've updated the git checkout hash to point at a
>> suitable place.
>>
>> Cheers,
>>
>> Kevin
>>
>>
>>
>> On 27/09/16 02:52, Noah Causin wrote:
>>> I've been trying to compile this on LEDE, but I get this error:
>>>
>>> Package kmod-sched-cake is missing dependencies for the following
>>> libraries:
>>> nf_conntrack.ko
>>>
>>>
>>
>>
>
>
>
>

[-- Attachment #2: 950-add-cake-to-tc.patch --]
[-- Type: text/x-patch, Size: 21004 bytes --]

--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -850,4 +850,57 @@ struct tc_pie_xstats {
 	__u32 maxq;             /* maximum queue size */
 	__u32 ecn_mark;         /* packets marked with ecn*/
 };
+
+/* CAKE */
+enum {
+	TCA_CAKE_UNSPEC,
+	TCA_CAKE_BASE_RATE,
+	TCA_CAKE_DIFFSERV_MODE,
+	TCA_CAKE_ATM,
+	TCA_CAKE_FLOW_MODE,
+	TCA_CAKE_OVERHEAD,
+	TCA_CAKE_RTT,
+	TCA_CAKE_TARGET,
+	TCA_CAKE_AUTORATE,
+	TCA_CAKE_MEMORY,
+	TCA_CAKE_NAT_MODE,
+	__TCA_CAKE_MAX
+};
+#define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
+
+struct tc_cake_traffic_stats {
+	__u32 packets;
+	__u32 link_ms;
+	__u64 bytes;
+};
+
+#define TC_CAKE_MAX_TINS (8)
+struct tc_cake_xstats {
+	__u16 version;  /* == 4, increments when struct extended */
+	__u8  max_tins; /* == TC_CAKE_MAX_TINS */
+	__u8  tin_cnt;  /* <= TC_CAKE_MAX_TINS */
+
+	__u32 threshold_rate   [TC_CAKE_MAX_TINS];
+	__u32 target_us        [TC_CAKE_MAX_TINS];
+	struct tc_cake_traffic_stats sent      [TC_CAKE_MAX_TINS];
+	struct tc_cake_traffic_stats dropped   [TC_CAKE_MAX_TINS];
+	struct tc_cake_traffic_stats ecn_marked[TC_CAKE_MAX_TINS];
+	struct tc_cake_traffic_stats backlog   [TC_CAKE_MAX_TINS];
+	__u32 interval_us      [TC_CAKE_MAX_TINS];
+	__u32 way_indirect_hits[TC_CAKE_MAX_TINS];
+	__u32 way_misses       [TC_CAKE_MAX_TINS];
+	__u32 way_collisions   [TC_CAKE_MAX_TINS];
+	__u32 peak_delay_us    [TC_CAKE_MAX_TINS]; /* ~= delay to bulk flows */
+	__u32 avge_delay_us    [TC_CAKE_MAX_TINS];
+	__u32 base_delay_us    [TC_CAKE_MAX_TINS]; /* ~= delay to sparse flows */
+	__u16 sparse_flows     [TC_CAKE_MAX_TINS];
+	__u16 bulk_flows       [TC_CAKE_MAX_TINS];
+	__u16 unresponse_flows [TC_CAKE_MAX_TINS]; /* v4 - was u32 last_len  */
+	__u16 spare            [TC_CAKE_MAX_TINS]; /* v4 - split last_len */
+	__u32 max_skblen       [TC_CAKE_MAX_TINS];
+	__u32 capacity_estimate;  /* version 2 */
+	__u32 memory_limit;       /* version 3 */
+	__u32 memory_used;        /* version 3 */
+};
+
 #endif
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -63,6 +63,7 @@ TCMODULES += q_codel.o
 TCMODULES += q_fq_codel.o
 TCMODULES += q_fq.o
 TCMODULES += q_pie.o
+TCMODULES += q_cake.o
 TCMODULES += q_hhf.o
 TCMODULES += e_bpf.o
 
--- /dev/null
+++ b/tc/q_cake.c
@@ -0,0 +1,637 @@
+/*
+ * Common Applications Kept Enhanced  --  CAKE
+ *
+ *  Copyright (C) 2014-2015 Jonathan Morton <chromatix99@gmail.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ *    derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... cake [ bandwidth RATE | unlimited* | autorate_ingress ]\n"
+	                "                [ rtt TIME | datacentre | lan | metro | regional | internet* | oceanic | satellite | interplanetary ]\n"
+	                "                [ besteffort | precedence | diffserv8 | diffserv4* ]\n"
+	                "                [ flowblind | srchost | dsthost | hosts | flows* | dual-srchost | dual-dsthost | triple-isolate ]\n"
+	                "                [ nonat* | srcnat | dstnat | nat ]\n"
+	                "                [ atm | noatm* ] [ overhead N | conservative | raw* ]\n"
+	                "                [ memlimit LIMIT ]\n"
+	                "    (* marks defaults)\n");
+}
+
+static int cake_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+			      struct nlmsghdr *n)
+{
+	int unlimited = 0;
+	unsigned bandwidth = 0;
+	unsigned interval = 0;
+	unsigned target = 0;
+	unsigned diffserv = 0;
+	unsigned memlimit = 0;
+	int  overhead = 0;
+	bool overhead_set = false;
+	int flowmode = -1;
+	int natmode = -1;
+	int atm = -1;
+	int autorate = -1;
+	struct rtattr *tail;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "bandwidth") == 0) {
+			NEXT_ARG();
+			if (get_rate(&bandwidth, *argv)) {
+				fprintf(stderr, "Illegal \"bandwidth\"\n");
+				return -1;
+			}
+			unlimited = 0;
+			autorate = 0;
+		} else if (strcmp(*argv, "unlimited") == 0) {
+			bandwidth = 0;
+			unlimited = 1;
+			autorate = 0;
+		} else if (strcmp(*argv, "autorate_ingress") == 0) {
+			autorate = 1;
+
+		} else if (strcmp(*argv, "rtt") == 0) {
+			NEXT_ARG();
+			if (get_time(&interval, *argv)) {
+				fprintf(stderr, "Illegal \"rtt\"\n");
+				return -1;
+			}
+			target = interval / 20;
+			if(!target)
+				target = 1;
+		} else if (strcmp(*argv, "datacentre") == 0) {
+			interval = 100;
+			target   =   5;
+		} else if (strcmp(*argv, "lan") == 0) {
+			interval = 1000;
+			target   =   50;
+		} else if (strcmp(*argv, "metro") == 0) {
+			interval = 10000;
+			target   =   500;
+		} else if (strcmp(*argv, "regional") == 0) {
+			interval = 30000;
+			target    = 1500;
+		} else if (strcmp(*argv, "internet") == 0) {
+			interval = 100000;
+			target   =   5000;
+		} else if (strcmp(*argv, "oceanic") == 0) {
+			interval = 300000;
+			target   =  15000;
+		} else if (strcmp(*argv, "satellite") == 0) {
+			interval = 1000000;
+			target   =   50000;
+		} else if (strcmp(*argv, "interplanetary") == 0) {
+			interval = 3600000000U;
+			target   =       5000;
+
+		} else if (strcmp(*argv, "besteffort") == 0) {
+			diffserv = 1;
+		} else if (strcmp(*argv, "precedence") == 0) {
+			diffserv = 2;
+		} else if (strcmp(*argv, "diffserv8") == 0) {
+			diffserv = 3;
+		} else if (strcmp(*argv, "diffserv4") == 0) {
+			diffserv = 4;
+		} else if (strcmp(*argv, "diffserv") == 0) {
+			diffserv = 4;
+		} else if (strcmp(*argv, "diffserv-llt") == 0) {
+			diffserv = 5;
+
+		} else if (strcmp(*argv, "flowblind") == 0) {
+			flowmode = 0;
+		} else if (strcmp(*argv, "srchost") == 0) {
+			flowmode = 1;
+		} else if (strcmp(*argv, "dsthost") == 0) {
+			flowmode = 2;
+		} else if (strcmp(*argv, "hosts") == 0) {
+			flowmode = 3;
+		} else if (strcmp(*argv, "flows") == 0) {
+			flowmode = 4;
+		} else if (strcmp(*argv, "dual-srchost") == 0) {
+			flowmode = 5;
+		} else if (strcmp(*argv, "dual-dsthost") == 0) {
+			flowmode = 6;
+		} else if (strcmp(*argv, "triple-isolate") == 0) {
+			flowmode = 7;
+
+		} else if (strcmp(*argv, "nonat") == 0) {
+			natmode = 0;
+		} else if (strcmp(*argv, "srcnat") == 0) {
+			natmode = 1;
+		} else if (strcmp(*argv, "dstnat") == 0) {
+			natmode = 2;
+		} else if (strcmp(*argv, "nat") == 0) {
+			natmode = 3;
+
+		} else if (strcmp(*argv, "atm") == 0) {
+			atm = 1;
+		} else if (strcmp(*argv, "noatm") == 0) {
+			atm = 0;
+
+		} else if (strcmp(*argv, "raw") == 0) {
+			atm = 0;
+			overhead = 0;
+			overhead_set = true;
+		} else if (strcmp(*argv, "conservative") == 0) {
+			/*
+			 * Deliberately over-estimate overhead:
+			 * one whole ATM cell plus ATM framing.
+			 * A safe choice if the actual overhead is unknown.
+			 */
+			atm = 1;
+			overhead = 48;
+			overhead_set = true;
+
+		/* Various ADSL framing schemes */
+		} else if (strcmp(*argv, "ipoa-vcmux") == 0) {
+			atm = 1;
+			overhead += 8;
+			overhead_set = true;
+		} else if (strcmp(*argv, "ipoa-llcsnap") == 0) {
+			atm = 1;
+			overhead += 16;
+			overhead_set = true;
+		} else if (strcmp(*argv, "bridged-vcmux") == 0) {
+			atm = 1;
+			overhead += 24;
+			overhead_set = true;
+		} else if (strcmp(*argv, "bridged-llcsnap") == 0) {
+			atm = 1;
+			overhead += 32;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoa-vcmux") == 0) {
+			atm = 1;
+			overhead += 10;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoa-llc") == 0) {
+			atm = 1;
+			overhead += 14;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoe-vcmux") == 0) {
+			atm = 1;
+			overhead += 32;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoe-llcsnap") == 0) {
+			atm = 1;
+			overhead += 40;
+			overhead_set = true;
+
+		/* Typical VDSL2 framing schemes */
+		/* NB: PTM includes HDLC's 0x7D/7E expansion, adds extra 1/128 */
+		} else if (strcmp(*argv, "pppoe-ptm") == 0) {
+			atm = 0;
+			overhead += 27;
+		} else if (strcmp(*argv, "bridged-ptm") == 0) {
+			atm = 0;
+			overhead += 19;
+
+		} else if (strcmp(*argv, "via-ethernet") == 0) {
+			/*
+			 * The above overheads are relative to an IP packet,
+			 * but if the physical interface is Ethernet, Linux
+			 * includes Ethernet framing overhead already.
+			 */
+			overhead -= 14;
+			overhead_set = true;
+
+		/* Additional Ethernet-related overheads used by some ISPs */
+		} else if (strcmp(*argv, "ether-phy") == 0) {
+			/* ethernet pre-amble & interframe gap 20 bytes
+			 * Linux will have already accounted for MACs & frame type 14 bytes
+			 * you probably want to add an FCS as well*/
+			overhead += 20;
+			overhead_set = true;
+		} else if (strcmp(*argv, "ether-all") == 0) {
+			/* ethernet pre-amble & interframe gap & FCS
+			 * Linux will have already accounted for MACs & frame type 14 bytes
+			 * you may need to add vlan tag*/
+			overhead += 24;
+			overhead_set = true;
+
+		} else if (strcmp(*argv, "ether-fcs") == 0) {
+			/* Frame Check Sequence */
+			/* we ignore the minimum frame size, because IP packets usually meet it */
+			overhead += 4;
+			overhead_set = true;
+		} else if (strcmp(*argv, "ether-vlan") == 0) {
+			/* 802.1q VLAN tag - may be repeated */
+			overhead += 4;
+			overhead_set = true;
+
+		} else if (strcmp(*argv, "overhead") == 0) {
+			char* p = NULL;
+			NEXT_ARG();
+			overhead = strtol(*argv, &p, 10);
+			if(!p || *p || !*argv || overhead < -64 || overhead > 256) {
+				fprintf(stderr, "Illegal \"overhead\", valid range is -64 to 256\n");
+				return -1;
+			}
+			overhead_set = true;
+
+		} else if (strcmp(*argv, "memlimit") == 0) {
+			NEXT_ARG();
+			if(get_size(&memlimit, *argv)) {
+				fprintf(stderr, "Illegal value for \"memlimit\": \"%s\"\n", *argv);
+				return -1;
+			}
+
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	if (bandwidth || unlimited)
+		addattr_l(n, 1024, TCA_CAKE_BASE_RATE, &bandwidth, sizeof(bandwidth));
+	if (diffserv)
+		addattr_l(n, 1024, TCA_CAKE_DIFFSERV_MODE, &diffserv, sizeof(diffserv));
+	if (atm != -1)
+		addattr_l(n, 1024, TCA_CAKE_ATM, &atm, sizeof(atm));
+	if (flowmode != -1)
+		addattr_l(n, 1024, TCA_CAKE_FLOW_MODE, &flowmode, sizeof(flowmode));
+	if (natmode != -1)
+		addattr_l(n, 1024, TCA_CAKE_NAT_MODE, &natmode, sizeof(natmode));
+	if (overhead_set)
+		addattr_l(n, 1024, TCA_CAKE_OVERHEAD, &overhead, sizeof(overhead));
+	if (interval)
+		addattr_l(n, 1024, TCA_CAKE_RTT, &interval, sizeof(interval));
+	if (target)
+		addattr_l(n, 1024, TCA_CAKE_TARGET, &target, sizeof(target));
+	if (autorate != -1)
+		addattr_l(n, 1024, TCA_CAKE_AUTORATE, &autorate, sizeof(autorate));
+	if (memlimit)
+		addattr_l(n, 1024, TCA_CAKE_MEMORY, &memlimit, sizeof(memlimit));
+
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	return 0;
+}
+
+
+static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	struct rtattr *tb[TCA_CAKE_MAX + 1];
+	unsigned bandwidth = 0;
+	unsigned diffserv = 0;
+	unsigned flowmode = 0;
+	unsigned natmode = 0;
+	unsigned interval = 0;
+	unsigned memlimit = 0;
+	int overhead = 0;
+	int atm = 0;
+	int autorate = 0;
+	SPRINT_BUF(b1);
+	SPRINT_BUF(b2);
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CAKE_MAX, opt);
+
+	if (tb[TCA_CAKE_BASE_RATE] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_BASE_RATE]) >= sizeof(__u32)) {
+		bandwidth = rta_getattr_u32(tb[TCA_CAKE_BASE_RATE]);
+		if(bandwidth)
+			fprintf(f, "bandwidth %s ", sprint_rate(bandwidth, b1));
+		else
+			fprintf(f, "unlimited ");
+	}
+	if (tb[TCA_CAKE_AUTORATE] &&
+		RTA_PAYLOAD(tb[TCA_CAKE_AUTORATE]) >= sizeof(__u32)) {
+		autorate = rta_getattr_u32(tb[TCA_CAKE_AUTORATE]);
+		if(autorate == 1)
+			fprintf(f, "autorate_ingress ");
+		else if(autorate)
+			fprintf(f, "(?autorate?) ");
+	}
+	if (tb[TCA_CAKE_DIFFSERV_MODE] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_DIFFSERV_MODE]) >= sizeof(__u32)) {
+		diffserv = rta_getattr_u32(tb[TCA_CAKE_DIFFSERV_MODE]);
+		switch(diffserv) {
+		case 1:
+			fprintf(f, "besteffort ");
+			break;
+		case 2:
+			fprintf(f, "precedence ");
+			break;
+		case 3:
+			fprintf(f, "diffserv8 ");
+			break;
+		case 4:
+			fprintf(f, "diffserv4 ");
+			break;
+		case 5:
+			fprintf(f, "diffserv-llt ");
+			break;
+		default:
+			fprintf(f, "(?diffserv?) ");
+			break;
+		};
+	}
+	if (tb[TCA_CAKE_FLOW_MODE] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_FLOW_MODE]) >= sizeof(__u32)) {
+		flowmode = rta_getattr_u32(tb[TCA_CAKE_FLOW_MODE]);
+		switch(flowmode) {
+		case 0:
+			fprintf(f, "flowblind ");
+			break;
+		case 1:
+			fprintf(f, "srchost ");
+			break;
+		case 2:
+			fprintf(f, "dsthost ");
+			break;
+		case 3:
+			fprintf(f, "hosts ");
+			break;
+		case 4:
+			fprintf(f, "flows ");
+			break;
+		case 5:
+			fprintf(f, "dual-srchost ");
+			break;
+		case 6:
+			fprintf(f, "dual-dsthost ");
+			break;
+		case 7:
+			fprintf(f, "triple-isolate ");
+			break;
+		default:
+			fprintf(f, "(?flowmode?) ");
+			break;
+		};
+	}
+
+	if (tb[TCA_CAKE_NAT_MODE] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_NAT_MODE]) >= sizeof(__u32)) {
+		natmode = rta_getattr_u32(tb[TCA_CAKE_NAT_MODE]);
+		switch(natmode) {
+		case 0:
+			fprintf(f, "nonat ");
+			break;
+		case 1:
+			fprintf(f, "srcnat ");
+			break;
+		case 2:
+			fprintf(f, "dstnat ");
+			break;
+		case 3:
+			fprintf(f, "nat ");
+			break;
+		default:
+			fprintf(f, "(?natmode?) ");
+			break;
+		};
+	}
+
+	if (tb[TCA_CAKE_ATM] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_ATM]) >= sizeof(__u32)) {
+		atm = rta_getattr_u32(tb[TCA_CAKE_ATM]);
+	}
+	if (tb[TCA_CAKE_OVERHEAD] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_OVERHEAD]) >= sizeof(__u32)) {
+		overhead = rta_getattr_u32(tb[TCA_CAKE_OVERHEAD]);
+	}
+	if (tb[TCA_CAKE_RTT] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_RTT]) >= sizeof(__u32)) {
+		interval = rta_getattr_u32(tb[TCA_CAKE_RTT]);
+	}
+
+	if (interval)
+		fprintf(f, "rtt %s ", sprint_time(interval, b2));
+
+	if (atm)
+		fprintf(f, "atm ");
+	else if (overhead)
+		fprintf(f, "noatm ");
+
+	if (overhead || atm)
+		fprintf(f, "overhead %d ", overhead);
+
+	if (!atm && !overhead)
+		fprintf(f, "raw ");
+
+	if (memlimit)
+		fprintf(f, "memlimit %s", sprint_size(memlimit, b1));
+
+	return 0;
+}
+
+static int cake_print_xstats(struct qdisc_util *qu, FILE *f,
+				 struct rtattr *xstats)
+{
+	/* fq_codel stats format borrowed */
+	struct tc_fq_codel_xstats *st;
+	struct tc_cake_xstats     *stnc;
+	SPRINT_BUF(b1);
+	SPRINT_BUF(b2);
+
+	if (xstats == NULL)
+		return 0;
+
+	if (RTA_PAYLOAD(xstats) < sizeof(st->type))
+		return -1;
+
+	st   = RTA_DATA(xstats);
+	stnc = RTA_DATA(xstats);
+
+	if (st->type == TCA_FQ_CODEL_XSTATS_QDISC && RTA_PAYLOAD(xstats) >= sizeof(*st)) {
+		fprintf(f, "  maxpacket %u drop_overlimit %u new_flow_count %u ecn_mark %u",
+			st->qdisc_stats.maxpacket,
+			st->qdisc_stats.drop_overlimit,
+			st->qdisc_stats.new_flow_count,
+			st->qdisc_stats.ecn_mark);
+		fprintf(f, "\n  new_flows_len %u old_flows_len %u",
+			st->qdisc_stats.new_flows_len,
+			st->qdisc_stats.old_flows_len);
+	} else if (st->type == TCA_FQ_CODEL_XSTATS_CLASS && RTA_PAYLOAD(xstats) >= sizeof(*st)) {
+		fprintf(f, "  deficit %d count %u lastcount %u ldelay %s",
+			st->class_stats.deficit,
+			st->class_stats.count,
+			st->class_stats.lastcount,
+			sprint_time(st->class_stats.ldelay, b1));
+		if (st->class_stats.dropping) {
+			fprintf(f, " dropping");
+			if (st->class_stats.drop_next < 0)
+				fprintf(f, " drop_next -%s",
+					sprint_time(-st->class_stats.drop_next, b1));
+			else
+				fprintf(f, " drop_next %s",
+					sprint_time(st->class_stats.drop_next, b1));
+		}
+	} else if (stnc->version >= 1 && stnc->version < 0xFF
+				&& stnc->max_tins == TC_CAKE_MAX_TINS
+				&& RTA_PAYLOAD(xstats) >= offsetof(struct tc_cake_xstats, capacity_estimate))
+	{
+		int i;
+
+		if(stnc->version >= 3)
+			fprintf(f, " memory used: %s of %s\n", sprint_size(stnc->memory_used, b1), sprint_size(stnc->memory_limit, b2));
+
+		if(stnc->version >= 2)
+			fprintf(f, " capacity estimate: %s\n", sprint_rate(stnc->capacity_estimate, b1));
+
+		switch(stnc->tin_cnt) {
+		case 4:
+			fprintf(f, "                 Bulk   Best Effort      Video       Voice\n");
+			break;
+
+		case 5:
+			fprintf(f, "              Low Loss  Best Effort   Low Delay       Bulk  Net Control\n");
+			break;
+
+		default:
+			fprintf(f, "          ");
+			for(i=0; i < stnc->tin_cnt; i++)
+				fprintf(f, "       Tin %u", i);
+			fprintf(f, "\n");
+		};
+
+		fprintf(f, "  thresh  ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_rate(stnc->threshold_rate[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  target  ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_time(stnc->target_us[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  interval");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_time(stnc->interval_us[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  pk_delay");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_time(stnc->peak_delay_us[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  av_delay");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_time(stnc->avge_delay_us[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  sp_delay");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_time(stnc->base_delay_us[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  pkts    ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->sent[i].packets);
+		fprintf(f, "\n");
+
+		fprintf(f, "  bytes   ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12llu", stnc->sent[i].bytes);
+		fprintf(f, "\n");
+
+		fprintf(f, "  way_inds");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->way_indirect_hits[i]);
+		fprintf(f, "\n");
+
+		fprintf(f, "  way_miss");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->way_misses[i]);
+		fprintf(f, "\n");
+
+		fprintf(f, "  way_cols");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->way_collisions[i]);
+		fprintf(f, "\n");
+
+		fprintf(f, "  drops   ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->dropped[i].packets);
+		fprintf(f, "\n");
+
+		fprintf(f, "  marks   ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->ecn_marked[i].packets);
+		fprintf(f, "\n");
+
+		fprintf(f, "  sp_flows");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->sparse_flows[i]);
+		fprintf(f, "\n");
+
+		fprintf(f, "  bk_flows");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->bulk_flows[i]);
+		fprintf(f, "\n");
+
+		if(stnc->version >= 4) {
+			fprintf(f, "  un_flows");
+			for(i=0; i < stnc->tin_cnt; i++)
+				fprintf(f, "%12u", stnc->unresponse_flows[i]);
+			fprintf(f, "\n");
+		}
+
+		fprintf(f, "  max_len ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->max_skblen[i]);
+		fprintf(f, "\n");
+	} else {
+		return -1;
+	}
+	return 0;
+}
+
+struct qdisc_util cake_qdisc_util = {
+	.id		= "cake",
+	.parse_qopt	= cake_parse_opt,
+	.print_qopt	= cake_print_opt,
+	.print_xstats	= cake_print_xstats,
+};

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-27 15:28       ` Kevin Darbyshire-Bryant
@ 2016-09-27 20:40         ` Noah Causin
  2016-09-27 20:44           ` Jonathan Morton
                             ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Noah Causin @ 2016-09-27 20:40 UTC (permalink / raw)
  To: cake

[-- Attachment #1: Type: text/plain, Size: 4310 bytes --]

Thank you for all your help.

The de-nat with dual-flow isolation works great.  I tested it 
simultaneously with two separate virtual machines, one running a Flent 
50 flows download test and the other running a Flent 8 flows download 
test.  Throughput was even between the machines, and the latency was great.

Noah Causin


On 9/27/2016 11:28 AM, Kevin Darbyshire-Bryant wrote:
> LEDE already has a patch included in the basefiles 'iproute2' package 
> to make tc cake aware.  If you replace 
> package/network/utils/iproute2/patches/950-add-cake-to-tc.patch (which 
> is unaware of the nat options) with the attached nat option aware 
> version then recompile, you should find you've a 'natted cake' capable 
> version of the iproute utilities without adding extra packages, feeds 
> etc.
>
> I've not yet got around to adding a 'de-nat ipv6' option, so assuming 
> you've built my latest version of cake, it can be configured to de-nat 
> ipv4 but not ipv6.
>
> Similarly I've not updated the LEDE stuff as I'd rather like some code 
> review/testing done before it gets pushed out....and I get shouted at 
> lots :-)
>
> Kevin
>
>
> On 27/09/16 15:52, Noah Causin wrote:
>> Thank you for helping me get cake to compile on LEDE.
>>
>> The issue I have now is getting tc-adv to compile.
>>
>> I use this Makefile:
>> https://github.com/antoinedeschenes/openwrt-sqm/tree/master/net/tc-adv
>>
>> I use these commands to compile it:
>>
>> make package/feeds/sqm/tc-adv/clean -j 1 V=s
>> make package/feeds/sqm/tc-adv/prepare USE_SOURCE_DIR=/home/n0man/Desktop/denat/ -j 1 V=s
>> make package/feeds/sqm/tc-adv/compile -j 1 V=s
>>
>> These are the errors I get:
>>
>> namespace.c: In function 'bind_etc':
>> namespace.c:18:22: error: 'MAXPATHLEN' undeclared (first use in this function)
>>   char etc_netns_path[MAXPATHLEN];
>>                       ^
>> namespace.c:18:22: note: each undeclared identifier is reported only once for each function it appears in
>> namespace.c:20:7: warning: unused variable 'etc_name' [-Wunused-variable]
>>   char etc_name[MAXPATHLEN];
>>        ^
>> namespace.c:19:7: warning: unused variable 'netns_name' [-Wunused-variable]
>>   char netns_name[MAXPATHLEN];
>>        ^
>> namespace.c:18:7: warning: unused variable 'etc_netns_path' [-Wunused-variable]
>>   char etc_netns_path[MAXPATHLEN];
>>        ^
>> namespace.c: In function 'netns_switch':
>> namespace.c:46:16: error: 'MAXPATHLEN' undeclared (first use in this function)
>>   char net_path[MAXPATHLEN];
>>                 ^
>> namespace.c:46:7: warning: unused variable 'net_path' [-Wunused-variable]
>>   char net_path[MAXPATHLEN];
>>        ^
>> namespace.c: In function 'netns_get_fd':
>> namespace.c:90:15: error: 'MAXPATHLEN' undeclared (first use in this function)
>>   char pathbuf[MAXPATHLEN];
>>                ^
>> namespace.c:90:7: warning: unused variable 'pathbuf' [-Wunused-variable]
>>   char pathbuf[MAXPATHLEN];
>>        ^
>> <builtin>: recipe for target 'namespace.o' failed
>> make[4]: *** [namespace.o] Error 1
>> make[4]: Leaving directory '/home/n0man/Desktop/denat/lib'
>>
>> Any help would be appreciated.
>>
>> Thank you,
>>
>> Noah Causin
>>
>> On 9/26/2016 10:32 PM, Kevin Darbyshire-Bryant wrote:
>>> Easy fix.  See the added DEPENDS line in the attached patch for the
>>> package Makefile :-)
>>>
>>> I'm guessing you've updated the git checkout hash to point at a
>>> suitable place.
>>>
>>> Cheers,
>>>
>>> Kevin
>>>
>>>
>>>
>>> On 27/09/16 02:52, Noah Causin wrote:
>>>> I've been trying to compile this on LEDE, but I get this error:
>>>>
>>>> Package kmod-sched-cake is missing dependencies for the following
>>>> libraries:
>>>> nf_conntrack.ko
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> Cake mailing list
>>> Cake@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cake
>>
>
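
On the 'MAXPATHLEN' undeclared errors quoted above: MAXPATHLEN is
conventionally provided by <sys/param.h>, so one plausible cause is that
the C library in the LEDE toolchain does not pull that header in
indirectly. That diagnosis is an assumption, not something confirmed in
this thread. A minimal sketch of a portable guard follows; the helper
name netns_path_example is hypothetical and is not part of iproute2.

```c
#include <assert.h>
#include <limits.h>     /* PATH_MAX on POSIX systems */
#include <stddef.h>
#include <stdio.h>
#include <sys/param.h>  /* conventional home of MAXPATHLEN */

/* Fallback in case <sys/param.h> does not define MAXPATHLEN. */
#ifndef MAXPATHLEN
# ifdef PATH_MAX
#  define MAXPATHLEN PATH_MAX
# else
#  define MAXPATHLEN 4096   /* assumed value, used only as a last resort */
# endif
#endif

/* Hypothetical helper: build "/var/run/netns/<name>" into a fixed buffer.
 * Returns 0 on success, -1 if the result would not fit. */
int netns_path_example(char *buf, size_t len, const char *name)
{
	int n;

	assert(buf != NULL && name != NULL);
	n = snprintf(buf, len, "/var/run/netns/%s", name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would declare "char path[MAXPATHLEN];" and pass sizeof(path) as
the length, which is the same shape as the buffers in the error log.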


[-- Attachment #2: Type: text/html, Size: 7525 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-27 20:40         ` Noah Causin
@ 2016-09-27 20:44           ` Jonathan Morton
       [not found]           ` <CAA93jw6rPE8aAGEiqf7jp3hc1J0ThrVer8PFmFLPBqANdtEixg@mail.gmail.com>
  2016-09-28  4:38           ` Kevin Darbyshire-Bryant
  2 siblings, 0 replies; 29+ messages in thread
From: Jonathan Morton @ 2016-09-27 20:44 UTC (permalink / raw)
  To: Noah Causin; +Cc: cake


> On 27 Sep, 2016, at 23:40, Noah Causin <n0manletter@gmail.com> wrote:
> 
> The de-nat with dual-flow isolation works great.  I tested it simultaneously with two separate virtual machines, one running a Flent 50 flows download test and the other running a Flent 8 flows download test.  Throughput was even between the machines, and the latency was great.

Awesome.

Now, of course, I have work to do…

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
       [not found]           ` <CAA93jw6rPE8aAGEiqf7jp3hc1J0ThrVer8PFmFLPBqANdtEixg@mail.gmail.com>
@ 2016-09-27 20:58             ` Noah Causin
  0 siblings, 0 replies; 29+ messages in thread
From: Noah Causin @ 2016-09-27 20:58 UTC (permalink / raw)
  Cc: cake

[-- Attachment #1: Type: text/plain, Size: 4856 bytes --]

Idle Ping:  35ms

Link:  75 Mbps down, 10 Mbps up.

Interface:  WAN

No ECN

I was browsing around to test the responsiveness, and I started a 
dslreports speedtest in the last few seconds.

Noah


On 9/27/2016 4:41 PM, Dave Taht wrote:
> got pictures? Can you send the *flent.gz files?
>
> On Tue, Sep 27, 2016 at 1:40 PM, Noah Causin <n0manletter@gmail.com> wrote:
>> Thank you for all your help.
>>
>> The de-nat with dual-flow isolation works great.  I tested it simultaneously
>> with two separate virtual machines, one running a Flent 50 flows download
>> test and the other running a Flent 8 flows download test.  Throughput was
>> even between the machines, and the latency was great.
>>
>> Noah Causin
>>
>>
>> On 9/27/2016 11:28 AM, Kevin Darbyshire-Bryant wrote:
>>
>> LEDE already has a patch included in the basefiles 'iproute2' package to
>> make tc cake aware.  If you replace
>> package/network/utils/iproute2/patches/950-add-cake-to-tc.patch (which is
>> unaware of the nat options) with the attached nat option aware version then
>> recompile, you should find you've a 'natted cake' capable version of the
>> iproute utilities without adding extra packages, feeds etc.
>>
>> I've not yet got around to adding a 'de-nat ipv6' option, so assuming you've
>> built my latest version of cake, it can be configured to de-nat ipv4 but not
>> ipv6.
>>
>> Similarly I've not updated the LEDE stuff as I'd rather like some code
>> review/testing done before it gets pushed out....and I get shouted at lots
>> :-)
>>
>> Kevin
>>
>>
>> On 27/09/16 15:52, Noah Causin wrote:
>>
>> Thank you for helping me get cake to compile on LEDE.
>>
>> The issue I have now is getting tc-adv to compile.
>>
>> I use this Makefile:
>> https://github.com/antoinedeschenes/openwrt-sqm/tree/master/net/tc-adv
>>
>> I use these commands to compile it:
>>
>> make package/feeds/sqm/tc-adv/clean -j 1 V=s
>> make package/feeds/sqm/tc-adv/prepare USE_SOURCE_DIR=/home/n0man/Desktop/denat/ -j 1 V=s
>> make package/feeds/sqm/tc-adv/compile -j 1 V=s
>>
>> These are the errors I get:
>>
>> namespace.c: In function 'bind_etc':
>> namespace.c:18:22: error: 'MAXPATHLEN' undeclared (first use in this function)
>>   char etc_netns_path[MAXPATHLEN];
>>                       ^
>> namespace.c:18:22: note: each undeclared identifier is reported only once for each function it appears in
>> namespace.c:20:7: warning: unused variable 'etc_name' [-Wunused-variable]
>>   char etc_name[MAXPATHLEN];
>>        ^
>> namespace.c:19:7: warning: unused variable 'netns_name' [-Wunused-variable]
>>   char netns_name[MAXPATHLEN];
>>        ^
>> namespace.c:18:7: warning: unused variable 'etc_netns_path' [-Wunused-variable]
>>   char etc_netns_path[MAXPATHLEN];
>>        ^
>> namespace.c: In function 'netns_switch':
>> namespace.c:46:16: error: 'MAXPATHLEN' undeclared (first use in this function)
>>   char net_path[MAXPATHLEN];
>>                 ^
>> namespace.c:46:7: warning: unused variable 'net_path' [-Wunused-variable]
>>   char net_path[MAXPATHLEN];
>>        ^
>> namespace.c: In function 'netns_get_fd':
>> namespace.c:90:15: error: 'MAXPATHLEN' undeclared (first use in this function)
>>   char pathbuf[MAXPATHLEN];
>>                ^
>> namespace.c:90:7: warning: unused variable 'pathbuf' [-Wunused-variable]
>>   char pathbuf[MAXPATHLEN];
>>        ^
>> <builtin>: recipe for target 'namespace.o' failed
>> make[4]: *** [namespace.o] Error 1
>> make[4]: Leaving directory '/home/n0man/Desktop/denat/lib'
>>
>> Any help would be appreciated.
>>
>> Thank you,
>>
>> Noah Causin
>>
>> On 9/26/2016 10:32 PM, Kevin Darbyshire-Bryant wrote:
>>
>> Easy fix.  See the added DEPENDS line in the attached patch for the
>> package Makefile :-)
>>
>> I'm guessing you've updated the git checkout hash to point at a
>> suitable place.
>>
>> Cheers,
>>
>> Kevin
>>
>>
>>
>> On 27/09/16 02:52, Noah Causin wrote:
>>
>> I've been trying to compile this on LEDE, but I get this error:
>>
>> Package kmod-sched-cake is missing dependencies for the following
>> libraries:
>> nf_conntrack.ko
>>
>>
>
>


[-- Attachment #2: DENATDUALFLOW.zip --]
[-- Type: application/x-zip-compressed, Size: 1191311 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-26  3:20 [Cake] de-natting & host fairness Kevin Darbyshire-Bryant
                   ` (2 preceding siblings ...)
  2016-09-27  1:52 ` Noah Causin
@ 2016-09-27 23:08 ` Jonathan Morton
  2016-09-28  2:56   ` Kevin Darbyshire-Bryant
  2016-09-28  5:56   ` Sebastian Moeller
  3 siblings, 2 replies; 29+ messages in thread
From: Jonathan Morton @ 2016-09-27 23:08 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake


> On 26 Sep, 2016, at 06:20, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> Another github user 'tegularius' presented some beautifully crafted code that did the lookups in a much neater way.  Originally it too had an 'ingress' lookup problem.  This was worked on and I hacked some conditional 'denat' options into cake & tc.
> 
> For your 'delight' a denat cake https://github.com/kdarbyshirebryant/sch_cake/tree/natoptions along with a matching tc https://github.com/kdarbyshirebryant/tc-adv/tree/denat

As I’m now at the stage of trying to merge this, I’m going to make some executive design decisions:

- De-NAT IPv4 packets only.  I think it’s safe to assume that IPv6 NAT will be rare, and in any case will typically preserve host distinctions.  This eliminates switch blocks in favour of simple if blocks.

- Don’t bother with the distinction between src-NAT and dst-NAT lookups.  The full lookup has to be done anyway and then masked off, the use-case for the limited functionality is nebulous, and all we’re doing is adding a lot of nasty conditional branches to the fast path.

This in turn reduces the configuration interface for the feature to a flag, which I’ll call “nat”.

 - Jonathan Morton
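
To make the host-fairness motivation concrete: after masquerading, every
outbound flow carries the router's WAN address, so a host key taken from
the packet alone cannot tell LAN hosts apart. The sketch below shows the
idea of recovering the original host from a connection table. It is a
userspace toy under stated assumptions: struct ex_nat_entry and
ex_denat_src are hypothetical names, and the real code consults the
kernel's conntrack rather than a linear table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one tracked connection: the address and port
 * a packet carries after NAT, and the LAN host it originally came from. */
struct ex_nat_entry {
	uint32_t wan_ip;    /* source address as seen by the qdisc */
	uint16_t wan_port;  /* source port as seen by the qdisc */
	uint32_t lan_ip;    /* original internal host */
};

/* Linear search, for illustration only. Returns the address to use as
 * the host key, falling back to the on-wire address when no entry
 * matches (for example, traffic that was never translated). */
uint32_t ex_denat_src(const struct ex_nat_entry *tbl, size_t n,
		      uint32_t ip, uint16_t port)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].wan_ip == ip && tbl[i].wan_port == port)
			return tbl[i].lan_ip;
	return ip;
}
```

With two flows that share one WAN address but differ in port, the lookup
yields two distinct host keys, which is what lets the per-host isolation
modes share bandwidth between hosts rather than between flows only.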


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-27 23:08 ` Jonathan Morton
@ 2016-09-28  2:56   ` Kevin Darbyshire-Bryant
  2016-09-28  3:06     ` Jonathan Morton
  2016-09-28  5:56   ` Sebastian Moeller
  1 sibling, 1 reply; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-28  2:56 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake



On 28/09/16 00:08, Jonathan Morton wrote:
>
>> On 26 Sep, 2016, at 06:20, Kevin Darbyshire-Bryant
>> <kevin@darbyshire-bryant.me.uk> wrote:
>>
>> Another github user 'tegularius' presented some beautifully crafted
>> code that did the lookups in a much neater way.  Originally it too
>> had an 'ingress' lookup problem.  This was worked on and I hacked
>> some conditional 'denat' options into cake & tc.
>>
>> For your 'delight' a denat cake
>> https://github.com/kdarbyshirebryant/sch_cake/tree/natoptions along
>> with a matching tc
>> https://github.com/kdarbyshirebryant/tc-adv/tree/denat
>
> As I’m now at the stage of trying to merge this, I’m going to make
> some executive design decisions:
>
> - De-NAT IPv4 packets only.  I think it’s safe to assume that IPv6
> NAT will be rare, and in any case will typically preserve host
> distinctions.  This eliminates switch blocks in favour of simple if
> blocks.

Agree completely.  The IPv6 stuff was inherited/for completeness, but 
anyone doing many-to-one host masquerading with IPv6 really needs a slap!

>
> - Don’t bother with the distinction between src-NAT and dst-NAT
> lookups.  The full lookup has to be done anyway and then masked off,
> the use-case for the limited functionality is nebulous, and all we’re
> doing is adding a lot of nasty conditional branches to the fast
> path.

I winced at every condition as it was being put in, believe me!  It is 
horrible, and I now think it is a leftover from when I was trying to 
understand how/why things weren't being translated as expected.  I still 
don't completely trust it, but that's what testing is for :-)

>
> This in turn reduces the configuration interface for the feature to a
> flag, which I’ll call “nat”.

Agreed.

Does this need to be another variable/parameter or could it be the next 
bit along in the flow type?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-28  2:56   ` Kevin Darbyshire-Bryant
@ 2016-09-28  3:06     ` Jonathan Morton
  2016-09-28  3:33       ` Kevin Darbyshire-Bryant
  2016-09-28  6:07       ` Kevin Darbyshire-Bryant
  0 siblings, 2 replies; 29+ messages in thread
From: Jonathan Morton @ 2016-09-28  3:06 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake


> On 28 Sep, 2016, at 05:56, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> Does this need to be another variable/parameter or could it be the next bit along in the flow type?

I’ve already pushed it to the ‘cobalt’ branch, so you can see how I’ve done it and start testing.  I’ve verified that it compiles, no more than that so far.

For configuration, there is a separate flag parameter passed.  Internally, I’ve used another bit of the existing flow_mode field (but not the next one along).  The latter is also how the configuration is read back out again to tc.

Overall, the patch ended up much smaller than the original.  Switch statements in C are actually quite verbose.

 - Jonathan Morton
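
The arrangement described above (a separate configuration flag, stored
internally as a spare bit of flow_mode and read back from there) can be
sketched as follows. The names and the bit value 0x40 are assumptions
chosen for illustration; the cobalt branch holds the real definitions.
The mode values 0..7 are the ones tc assigns in q_cake.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flow modes 0..7 (flowblind .. triple-isolate) fit in the low 3 bits. */
#define EX_FLOW_MODE_MASK 0x07u
/* Assumed NAT flag: a spare bit, deliberately not the next one along. */
#define EX_FLOW_NAT_FLAG  0x40u

/* Fold the separately configured "nat" flag into the flow_mode word. */
uint32_t ex_set_nat(uint32_t flow_mode, bool nat)
{
	/* Reject bits this sketch does not know about. */
	assert((flow_mode & ~(EX_FLOW_MODE_MASK | EX_FLOW_NAT_FLAG)) == 0);
	return nat ? (flow_mode | EX_FLOW_NAT_FLAG)
		   : (flow_mode & ~EX_FLOW_NAT_FLAG);
}

/* Read both pieces back out, as a dump to tc would need to. */
uint32_t ex_get_mode(uint32_t flow_mode)
{
	return flow_mode & EX_FLOW_MODE_MASK;
}

bool ex_get_nat(uint32_t flow_mode)
{
	return (flow_mode & EX_FLOW_NAT_FLAG) != 0;
}
```

Keeping the flag outside the mode bits means existing comparisons against
the mode value keep working once the flag is masked off.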


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-28  3:06     ` Jonathan Morton
@ 2016-09-28  3:33       ` Kevin Darbyshire-Bryant
  2016-09-28  3:49         ` Jonathan Morton
  2016-09-28  6:07       ` Kevin Darbyshire-Bryant
  1 sibling, 1 reply; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-28  3:33 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake



On 28/09/16 04:06, Jonathan Morton wrote:
>
>> On 28 Sep, 2016, at 05:56, Kevin Darbyshire-Bryant
>> <kevin@darbyshire-bryant.me.uk> wrote:
>>
>> Does this need to be another variable/parameter or could it be the
>> next bit along in the flow type?
>
> I’ve already pushed it to the ‘cobalt’ branch, so you can see how
> I’ve done it and start testing.  I’ve verified that it compiles, no
> more than that so far.
>
> For configuration, there is a separate flag parameter passed.
> Internally, I’ve used another bit of the existing flow_mode field
> (but not the next one along).  The latter is also how the
> configuration is read back out again to tc.
>
> Overall, the patch ended up much smaller than the original.  Switch
> statements in C are actually quite verbose.

Looks good and as you say much smaller without the switch stuff and 
IPv6.  As a further bikeshed.....

Is it worth doing:

if (reverse) {
    all the src/dst swaps
    if ports do port src/dst swaps
    nf_ct_put
} else {
    all the dst/src swaps
    if ports do port dst/src swaps
}

The code gets duplicated (or possibly not, depending on the compiler), but 
those ternaries are if/else in disguise...and we do a few of them, so we 
could take a single branch instead, at the cost of some duplicated code.



>
> - Jonathan Morton
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-28  3:33       ` Kevin Darbyshire-Bryant
@ 2016-09-28  3:49         ` Jonathan Morton
  0 siblings, 0 replies; 29+ messages in thread
From: Jonathan Morton @ 2016-09-28  3:49 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake


> On 28 Sep, 2016, at 06:33, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> those ternaries are if/else in disguise...

Many CPUs can handle those as conditional moves without branching - including ARM in particular; near-universal conditional execution was one of its original headline features.  Most x86 CPUs (except very old ones) and some of the embedded-class PowerPCs (which are often found in “big” network appliances) also qualify.  Unswitching those would potentially be a retrograde step on those CPUs.

However the presence of a conditional function call suggests that unswitching would not in fact be harmful, except for some duplication of source code - since the branch has to be made anyway.  I think many compilers would be able to perform the loads before the branch and the stores after it, which would execute very slickly, while some CPUs do not execute large numbers of conditional moves very efficiently.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Cake] de-natting & host fairness
  2016-09-27 20:40         ` Noah Causin
  2016-09-27 20:44           ` Jonathan Morton
       [not found]           ` <CAA93jw6rPE8aAGEiqf7jp3hc1J0ThrVer8PFmFLPBqANdtEixg@mail.gmail.com>
@ 2016-09-28  4:38           ` Kevin Darbyshire-Bryant
  2016-09-28  5:08             ` Noah Causin
  2 siblings, 1 reply; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-28  4:38 UTC (permalink / raw)
  To: cake

[-- Attachment #1: Type: text/plain, Size: 612 bytes --]



On 27/09/16 21:40, Noah Causin wrote:
> Thank you for all your help.
>
> The de-nat with dual-flow isolation works great.  I tested it
> simultaneously with two separate virtual machines, one running a Flent
> 50 flows download test and the other running a Flent 8 flows download
> test.  Throughput was even between the machines, and the latency was great.
>
> Noah Causin

Attached is a new tc patch for when you update to Jonathan's reworking of 
things, found at https://github.com/dtaht/sch_cake/commits/cobalt

I'll try to push this into LEDE properly in a few days unless somebody 
finds a nasty lurking :-)

Kevin

[-- Attachment #2: 950-add-cake-to-tc.patch --]
[-- Type: text/x-patch, Size: 21298 bytes --]

--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -850,4 +850,57 @@ struct tc_pie_xstats {
 	__u32 maxq;             /* maximum queue size */
 	__u32 ecn_mark;         /* packets marked with ecn*/
 };
+
+/* CAKE */
+enum {
+	TCA_CAKE_UNSPEC,
+	TCA_CAKE_BASE_RATE,
+	TCA_CAKE_DIFFSERV_MODE,
+	TCA_CAKE_ATM,
+	TCA_CAKE_FLOW_MODE,
+	TCA_CAKE_OVERHEAD,
+	TCA_CAKE_RTT,
+	TCA_CAKE_TARGET,
+	TCA_CAKE_AUTORATE,
+	TCA_CAKE_MEMORY,
+	TCA_CAKE_NAT,
+	__TCA_CAKE_MAX
+};
+#define TCA_CAKE_MAX	(__TCA_CAKE_MAX - 1)
+
+struct tc_cake_traffic_stats {
+	__u32 packets;
+	__u32 link_ms;
+	__u64 bytes;
+};
+
+#define TC_CAKE_MAX_TINS (8)
+struct tc_cake_xstats {
+	__u16 version;  /* == 4, increments when struct extended */
+	__u8  max_tins; /* == TC_CAKE_MAX_TINS */
+	__u8  tin_cnt;  /* <= TC_CAKE_MAX_TINS */
+
+	__u32 threshold_rate   [TC_CAKE_MAX_TINS];
+	__u32 target_us        [TC_CAKE_MAX_TINS];
+	struct tc_cake_traffic_stats sent      [TC_CAKE_MAX_TINS];
+	struct tc_cake_traffic_stats dropped   [TC_CAKE_MAX_TINS];
+	struct tc_cake_traffic_stats ecn_marked[TC_CAKE_MAX_TINS];
+	struct tc_cake_traffic_stats backlog   [TC_CAKE_MAX_TINS];
+	__u32 interval_us      [TC_CAKE_MAX_TINS];
+	__u32 way_indirect_hits[TC_CAKE_MAX_TINS];
+	__u32 way_misses       [TC_CAKE_MAX_TINS];
+	__u32 way_collisions   [TC_CAKE_MAX_TINS];
+	__u32 peak_delay_us    [TC_CAKE_MAX_TINS]; /* ~= delay to bulk flows */
+	__u32 avge_delay_us    [TC_CAKE_MAX_TINS];
+	__u32 base_delay_us    [TC_CAKE_MAX_TINS]; /* ~= delay to sparse flows */
+	__u16 sparse_flows     [TC_CAKE_MAX_TINS];
+	__u16 bulk_flows       [TC_CAKE_MAX_TINS];
+	__u16 unresponse_flows [TC_CAKE_MAX_TINS]; /* v4 - was u32 last_len  */
+	__u16 spare            [TC_CAKE_MAX_TINS]; /* v4 - split last_len */
+	__u32 max_skblen       [TC_CAKE_MAX_TINS];
+	__u32 capacity_estimate;  /* version 2 */
+	__u32 memory_limit;       /* version 3 */
+	__u32 memory_used;        /* version 3 */
+};
+
 #endif
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -63,6 +63,7 @@ TCMODULES += q_codel.o
 TCMODULES += q_fq_codel.o
 TCMODULES += q_fq.o
 TCMODULES += q_pie.o
+TCMODULES += q_cake.o
 TCMODULES += q_hhf.o
 TCMODULES += e_bpf.o
 
--- /dev/null
+++ b/tc/q_cake.c
@@ -0,0 +1,641 @@
+/*
+ * Common Applications Kept Enhanced  --  CAKE
+ *
+ *  Copyright (C) 2014-2015 Jonathan Morton <chromatix99@gmail.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions, and the following disclaimer,
+ *    without modification.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. The names of the authors may not be used to endorse or promote products
+ *    derived from this software without specific prior written permission.
+ *
+ * Alternatively, provided that this notice is retained in full, this
+ * software may be distributed under the terms of the GNU General
+ * Public License ("GPL") version 2, in which case the provisions of the
+ * GPL apply INSTEAD OF those given above.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
+ * DAMAGE.
+ *
+ */
+
+#include <stddef.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... cake [ bandwidth RATE | unlimited* | autorate_ingress ]\n"
+	                "                [ rtt TIME | datacentre | lan | metro | regional | internet* | oceanic | satellite | interplanetary ]\n"
+	                "                [ besteffort | precedence | diffserv8 | diffserv4* ]\n"
+	                "                [ flowblind | srchost | dsthost | hosts | flows* | dual-srchost | dual-dsthost | triple-isolate ] [ nat | nonat* ]\n"
+	                "                [ ptm | atm | noatm* ] [ overhead N | conservative | raw* ]\n"
+	                "                [ memlimit LIMIT ]\n"
+	                "    (* marks defaults)\n");
+}
+
+static int cake_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+			      struct nlmsghdr *n)
+{
+	int unlimited = 0;
+	unsigned bandwidth = 0;
+	unsigned interval = 0;
+	unsigned target = 0;
+	unsigned diffserv = 0;
+	unsigned memlimit = 0;
+	int  overhead = 0;
+	bool overhead_set = false;
+	int flowmode = -1;
+	int nat = -1;
+	int atm = -1;
+	int autorate = -1;
+	struct rtattr *tail;
+
+	while (argc > 0) {
+		if (strcmp(*argv, "bandwidth") == 0) {
+			NEXT_ARG();
+			if (get_rate(&bandwidth, *argv)) {
+				fprintf(stderr, "Illegal \"bandwidth\"\n");
+				return -1;
+			}
+			unlimited = 0;
+			autorate = 0;
+		} else if (strcmp(*argv, "unlimited") == 0) {
+			bandwidth = 0;
+			unlimited = 1;
+			autorate = 0;
+		} else if (strcmp(*argv, "autorate_ingress") == 0) {
+			autorate = 1;
+
+		} else if (strcmp(*argv, "rtt") == 0) {
+			NEXT_ARG();
+			if (get_time(&interval, *argv)) {
+				fprintf(stderr, "Illegal \"rtt\"\n");
+				return -1;
+			}
+			target = interval / 20;
+			if(!target)
+				target = 1;
+		} else if (strcmp(*argv, "datacentre") == 0) {
+			interval = 100;
+			target   =   5;
+		} else if (strcmp(*argv, "lan") == 0) {
+			interval = 1000;
+			target   =   50;
+		} else if (strcmp(*argv, "metro") == 0) {
+			interval = 10000;
+			target   =   500;
+		} else if (strcmp(*argv, "regional") == 0) {
+			interval = 30000;
+			target    = 1500;
+		} else if (strcmp(*argv, "internet") == 0) {
+			interval = 100000;
+			target   =   5000;
+		} else if (strcmp(*argv, "oceanic") == 0) {
+			interval = 300000;
+			target   =  15000;
+		} else if (strcmp(*argv, "satellite") == 0) {
+			interval = 1000000;
+			target   =   50000;
+		} else if (strcmp(*argv, "interplanetary") == 0) {
+			interval = 3600000000U;
+			target   =       5000;
+
+		} else if (strcmp(*argv, "besteffort") == 0) {
+			diffserv = 1;
+		} else if (strcmp(*argv, "precedence") == 0) {
+			diffserv = 2;
+		} else if (strcmp(*argv, "diffserv8") == 0) {
+			diffserv = 3;
+		} else if (strcmp(*argv, "diffserv4") == 0) {
+			diffserv = 4;
+		} else if (strcmp(*argv, "diffserv") == 0) {
+			diffserv = 4;
+		} else if (strcmp(*argv, "diffserv-llt") == 0) {
+			diffserv = 5;
+
+		} else if (strcmp(*argv, "flowblind") == 0) {
+			flowmode = 0;
+		} else if (strcmp(*argv, "srchost") == 0) {
+			flowmode = 1;
+		} else if (strcmp(*argv, "dsthost") == 0) {
+			flowmode = 2;
+		} else if (strcmp(*argv, "hosts") == 0) {
+			flowmode = 3;
+		} else if (strcmp(*argv, "flows") == 0) {
+			flowmode = 4;
+		} else if (strcmp(*argv, "dual-srchost") == 0) {
+			flowmode = 5;
+		} else if (strcmp(*argv, "dual-dsthost") == 0) {
+			flowmode = 6;
+		} else if (strcmp(*argv, "triple-isolate") == 0) {
+			flowmode = 7;
+
+		} else if (strcmp(*argv, "nat") == 0) {
+			nat = 1;
+		} else if (strcmp(*argv, "nonat") == 0) {
+			nat = 0;
+
+		} else if (strcmp(*argv, "ptm") == 0) {
+			atm = 2;
+		} else if (strcmp(*argv, "atm") == 0) {
+			atm = 1;
+		} else if (strcmp(*argv, "noatm") == 0) {
+			atm = 0;
+
+		} else if (strcmp(*argv, "raw") == 0) {
+			atm = 0;
+			overhead = 0;
+			overhead_set = true;
+		} else if (strcmp(*argv, "conservative") == 0) {
+			/*
+			 * Deliberately over-estimate overhead:
+			 * one whole ATM cell plus ATM framing.
+			 * A safe choice if the actual overhead is unknown.
+			 */
+			atm = 1;
+			overhead = 48;
+			overhead_set = true;
+
+		/*
+		 * DOCSIS overhead figures courtesy of Greg White @ CableLabs.
+		 * The "-ip" versions include the Ethernet frame header, in case
+		 * you are shaping an IP interface instead of an Ethernet one.
+		 */
+		} else if (strcmp(*argv, "docsis-downstream-ip") == 0) {
+			atm = 0;
+			overhead += 35;
+			overhead_set = true;
+		} else if (strcmp(*argv, "docsis-downstream") == 0) {
+			atm = 0;
+			overhead += 35 - 14;
+			overhead_set = true;
+		} else if (strcmp(*argv, "docsis-upstream-ip") == 0) {
+			atm = 0;
+			overhead += 28;
+			overhead_set = true;
+		} else if (strcmp(*argv, "docsis-upstream") == 0) {
+			atm = 0;
+			overhead += 28 - 14;
+			overhead_set = true;
+
+		/* Various ADSL framing schemes, all over ATM cells */
+		} else if (strcmp(*argv, "ipoa-vcmux") == 0) {
+			atm = 1;
+			overhead += 8;
+			overhead_set = true;
+		} else if (strcmp(*argv, "ipoa-llcsnap") == 0) {
+			atm = 1;
+			overhead += 16;
+			overhead_set = true;
+		} else if (strcmp(*argv, "bridged-vcmux") == 0) {
+			atm = 1;
+			overhead += 24;
+			overhead_set = true;
+		} else if (strcmp(*argv, "bridged-llcsnap") == 0) {
+			atm = 1;
+			overhead += 32;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoa-vcmux") == 0) {
+			atm = 1;
+			overhead += 10;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoa-llc") == 0) {
+			atm = 1;
+			overhead += 14;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoe-vcmux") == 0) {
+			atm = 1;
+			overhead += 32;
+			overhead_set = true;
+		} else if (strcmp(*argv, "pppoe-llcsnap") == 0) {
+			atm = 1;
+			overhead += 40;
+			overhead_set = true;
+
+		/* Typical VDSL2 framing schemes, both over PTM */
+		/* PTM has 64b/65b coding which absorbs some bandwidth */
+		} else if (strcmp(*argv, "pppoe-ptm") == 0) {
+			atm = 2;
+			overhead += 27;
+		} else if (strcmp(*argv, "bridged-ptm") == 0) {
+			atm = 2;
+			overhead += 19;
+
+		} else if (strcmp(*argv, "via-ethernet") == 0) {
+			/*
+			 * The above overheads are relative to an IP packet,
+			 * but Linux includes Ethernet framing overhead already
+			 * if we are shaping an Ethernet interface rather than
+			 * an IP interface.
+			 */
+			overhead -= 14;
+			overhead_set = true;
+
+		/* Additional Ethernet-related overheads used by some ISPs */
+		} else if (strcmp(*argv, "ether-phy") == 0) {
+			/* Ethernet preamble & inter-frame gap: 20 bytes.
+			 * Linux has already accounted for MACs & frame type (14 bytes).
+			 * You probably want to add an FCS as well. */
+			overhead += 20;
+			overhead_set = true;
+		} else if (strcmp(*argv, "ether-all") == 0) {
+			/* Ethernet preamble, inter-frame gap & FCS: 24 bytes.
+			 * Linux has already accounted for MACs & frame type (14 bytes).
+			 * You may also need to add a VLAN tag. */
+			overhead += 24;
+			overhead_set = true;
+
+		} else if (strcmp(*argv, "ether-fcs") == 0) {
+			/* Frame Check Sequence */
+			/* we ignore the minimum frame size, because IP packets usually meet it */
+			overhead += 4;
+			overhead_set = true;
+		} else if (strcmp(*argv, "ether-vlan") == 0) {
+			/* 802.1q VLAN tag - may be repeated */
+			overhead += 4;
+			overhead_set = true;
+
+		} else if (strcmp(*argv, "overhead") == 0) {
+			char* p = NULL;
+			NEXT_ARG();
+			overhead = strtol(*argv, &p, 10);
+			if(!p || *p || !*argv || overhead < -64 || overhead > 256) {
+				fprintf(stderr, "Illegal \"overhead\", valid range is -64 to 256\n");
+				return -1;
+			}
+			overhead_set = true;
+
+		} else if (strcmp(*argv, "memlimit") == 0) {
+			NEXT_ARG();
+			if(get_size(&memlimit, *argv)) {
+				fprintf(stderr, "Illegal value for \"memlimit\": \"%s\"\n", *argv);
+				return -1;
+			}
+
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	if (bandwidth || unlimited)
+		addattr_l(n, 1024, TCA_CAKE_BASE_RATE, &bandwidth, sizeof(bandwidth));
+	if (diffserv)
+		addattr_l(n, 1024, TCA_CAKE_DIFFSERV_MODE, &diffserv, sizeof(diffserv));
+	if (atm != -1)
+		addattr_l(n, 1024, TCA_CAKE_ATM, &atm, sizeof(atm));
+	if (flowmode != -1)
+		addattr_l(n, 1024, TCA_CAKE_FLOW_MODE, &flowmode, sizeof(flowmode));
+	if (overhead_set)
+		addattr_l(n, 1024, TCA_CAKE_OVERHEAD, &overhead, sizeof(overhead));
+	if (interval)
+		addattr_l(n, 1024, TCA_CAKE_RTT, &interval, sizeof(interval));
+	if (target)
+		addattr_l(n, 1024, TCA_CAKE_TARGET, &target, sizeof(target));
+	if (autorate != -1)
+		addattr_l(n, 1024, TCA_CAKE_AUTORATE, &autorate, sizeof(autorate));
+	if (memlimit)
+		addattr_l(n, 1024, TCA_CAKE_MEMORY, &memlimit, sizeof(memlimit));
+	if (nat != -1)
+		addattr_l(n, 1024, TCA_CAKE_NAT, &nat, sizeof(nat));
+
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	return 0;
+}
+
+
+static int cake_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	struct rtattr *tb[TCA_CAKE_MAX + 1];
+	unsigned bandwidth = 0;
+	unsigned diffserv = 0;
+	unsigned flowmode = 0;
+	unsigned interval = 0;
+	unsigned memlimit = 0;
+	int overhead = 0;
+	int atm = 0;
+	int nat = 0;
+	int autorate = 0;
+	SPRINT_BUF(b1);
+	SPRINT_BUF(b2);
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CAKE_MAX, opt);
+
+	if (tb[TCA_CAKE_BASE_RATE] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_BASE_RATE]) >= sizeof(__u32)) {
+		bandwidth = rta_getattr_u32(tb[TCA_CAKE_BASE_RATE]);
+		if(bandwidth)
+			fprintf(f, "bandwidth %s ", sprint_rate(bandwidth, b1));
+		else
+			fprintf(f, "unlimited ");
+	}
+	if (tb[TCA_CAKE_AUTORATE] &&
+		RTA_PAYLOAD(tb[TCA_CAKE_AUTORATE]) >= sizeof(__u32)) {
+		autorate = rta_getattr_u32(tb[TCA_CAKE_AUTORATE]);
+		if(autorate == 1)
+			fprintf(f, "autorate_ingress ");
+		else if(autorate)
+			fprintf(f, "(?autorate?) ");
+	}
+	if (tb[TCA_CAKE_DIFFSERV_MODE] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_DIFFSERV_MODE]) >= sizeof(__u32)) {
+		diffserv = rta_getattr_u32(tb[TCA_CAKE_DIFFSERV_MODE]);
+		switch(diffserv) {
+		case 1:
+			fprintf(f, "besteffort ");
+			break;
+		case 2:
+			fprintf(f, "precedence ");
+			break;
+		case 3:
+			fprintf(f, "diffserv8 ");
+			break;
+		case 4:
+			fprintf(f, "diffserv4 ");
+			break;
+		case 5:
+			fprintf(f, "diffserv-llt ");
+			break;
+		default:
+			fprintf(f, "(?diffserv?) ");
+			break;
+		};
+	}
+	if (tb[TCA_CAKE_FLOW_MODE] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_FLOW_MODE]) >= sizeof(__u32)) {
+		flowmode = rta_getattr_u32(tb[TCA_CAKE_FLOW_MODE]);
+		nat = !!(flowmode & 64);
+		flowmode &= ~64;
+		switch(flowmode) {
+		case 0:
+			fprintf(f, "flowblind ");
+			break;
+		case 1:
+			fprintf(f, "srchost ");
+			break;
+		case 2:
+			fprintf(f, "dsthost ");
+			break;
+		case 3:
+			fprintf(f, "hosts ");
+			break;
+		case 4:
+			fprintf(f, "flows ");
+			break;
+		case 5:
+			fprintf(f, "dual-srchost ");
+			break;
+		case 6:
+			fprintf(f, "dual-dsthost ");
+			break;
+		case 7:
+			fprintf(f, "triple-isolate ");
+			break;
+		default:
+			fprintf(f, "(?flowmode?) ");
+			break;
+		};
+
+		if(nat)
+			fprintf(f, "nat ");
+	}
+	if (tb[TCA_CAKE_ATM] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_ATM]) >= sizeof(__u32)) {
+		atm = rta_getattr_u32(tb[TCA_CAKE_ATM]);
+	}
+	if (tb[TCA_CAKE_OVERHEAD] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_OVERHEAD]) >= sizeof(__u32)) {
+		overhead = rta_getattr_u32(tb[TCA_CAKE_OVERHEAD]);
+	}
+	if (tb[TCA_CAKE_RTT] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_RTT]) >= sizeof(__u32)) {
+		interval = rta_getattr_u32(tb[TCA_CAKE_RTT]);
+	}
+
+	if (interval)
+		fprintf(f, "rtt %s ", sprint_time(interval, b2));
+
+	if (atm == 1)
+		fprintf(f, "atm ");
+	else if (atm == 2)
+		fprintf(f, "ptm ");
+	else if (overhead)
+		fprintf(f, "noatm ");
+
+	if (overhead || atm)
+		fprintf(f, "overhead %d ", overhead);
+
+	if (!atm && !overhead)
+		fprintf(f, "raw ");
+
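+	if (tb[TCA_CAKE_MEMORY] &&
+	    RTA_PAYLOAD(tb[TCA_CAKE_MEMORY]) >= sizeof(__u32)) {
+		memlimit = rta_getattr_u32(tb[TCA_CAKE_MEMORY]);
+	}
+
+	if (memlimit)
+		fprintf(f, "memlimit %s ", sprint_size(memlimit, b1));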
+
+	return 0;
+}
+
+static int cake_print_xstats(struct qdisc_util *qu, FILE *f,
+				 struct rtattr *xstats)
+{
+	/* fq_codel stats format borrowed */
+	struct tc_fq_codel_xstats *st;
+	struct tc_cake_xstats     *stnc;
+	SPRINT_BUF(b1);
+	SPRINT_BUF(b2);
+
+	if (xstats == NULL)
+		return 0;
+
+	if (RTA_PAYLOAD(xstats) < sizeof(st->type))
+		return -1;
+
+	st   = RTA_DATA(xstats);
+	stnc = RTA_DATA(xstats);
+
+	if (st->type == TCA_FQ_CODEL_XSTATS_QDISC && RTA_PAYLOAD(xstats) >= sizeof(*st)) {
+		fprintf(f, "  maxpacket %u drop_overlimit %u new_flow_count %u ecn_mark %u",
+			st->qdisc_stats.maxpacket,
+			st->qdisc_stats.drop_overlimit,
+			st->qdisc_stats.new_flow_count,
+			st->qdisc_stats.ecn_mark);
+		fprintf(f, "\n  new_flows_len %u old_flows_len %u",
+			st->qdisc_stats.new_flows_len,
+			st->qdisc_stats.old_flows_len);
+	} else if (st->type == TCA_FQ_CODEL_XSTATS_CLASS && RTA_PAYLOAD(xstats) >= sizeof(*st)) {
+		fprintf(f, "  deficit %d count %u lastcount %u ldelay %s",
+			st->class_stats.deficit,
+			st->class_stats.count,
+			st->class_stats.lastcount,
+			sprint_time(st->class_stats.ldelay, b1));
+		if (st->class_stats.dropping) {
+			fprintf(f, " dropping");
+			if (st->class_stats.drop_next < 0)
+				fprintf(f, " drop_next -%s",
+					sprint_time(-st->class_stats.drop_next, b1));
+			else
+				fprintf(f, " drop_next %s",
+					sprint_time(st->class_stats.drop_next, b1));
+		}
+	} else if (stnc->version >= 1 && stnc->version < 0xFF
+				&& stnc->max_tins == TC_CAKE_MAX_TINS
+				&& RTA_PAYLOAD(xstats) >= offsetof(struct tc_cake_xstats, capacity_estimate))
+	{
+		int i;
+
+		if(stnc->version >= 3)
+			fprintf(f, " memory used: %s of %s\n", sprint_size(stnc->memory_used, b1), sprint_size(stnc->memory_limit, b2));
+
+		if(stnc->version >= 2)
+			fprintf(f, " capacity estimate: %s\n", sprint_rate(stnc->capacity_estimate, b1));
+
+		switch(stnc->tin_cnt) {
+		case 4:
+			fprintf(f, "                 Bulk   Best Effort      Video       Voice\n");
+			break;
+
+		case 5:
+			fprintf(f, "              Low Loss  Best Effort   Low Delay       Bulk  Net Control\n");
+			break;
+
+		default:
+			fprintf(f, "          ");
+			for(i=0; i < stnc->tin_cnt; i++)
+				fprintf(f, "       Tin %u", i);
+			fprintf(f, "\n");
+		};
+
+		fprintf(f, "  thresh  ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_rate(stnc->threshold_rate[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  target  ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_time(stnc->target_us[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  interval");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_time(stnc->interval_us[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  pk_delay");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_time(stnc->peak_delay_us[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  av_delay");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_time(stnc->avge_delay_us[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  sp_delay");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12s", sprint_time(stnc->base_delay_us[i], b1));
+		fprintf(f, "\n");
+
+		fprintf(f, "  pkts    ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->sent[i].packets);
+		fprintf(f, "\n");
+
+		fprintf(f, "  bytes   ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12llu", stnc->sent[i].bytes);
+		fprintf(f, "\n");
+
+		fprintf(f, "  way_inds");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->way_indirect_hits[i]);
+		fprintf(f, "\n");
+
+		fprintf(f, "  way_miss");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->way_misses[i]);
+		fprintf(f, "\n");
+
+		fprintf(f, "  way_cols");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->way_collisions[i]);
+		fprintf(f, "\n");
+
+		fprintf(f, "  drops   ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->dropped[i].packets);
+		fprintf(f, "\n");
+
+		fprintf(f, "  marks   ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->ecn_marked[i].packets);
+		fprintf(f, "\n");
+
+		fprintf(f, "  sp_flows");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->sparse_flows[i]);
+		fprintf(f, "\n");
+
+		fprintf(f, "  bk_flows");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->bulk_flows[i]);
+		fprintf(f, "\n");
+
+		if(stnc->version >= 4) {
+			fprintf(f, "  un_flows");
+			for(i=0; i < stnc->tin_cnt; i++)
+				fprintf(f, "%12u", stnc->unresponse_flows[i]);
+			fprintf(f, "\n");
+		}
+
+		fprintf(f, "  max_len ");
+		for(i=0; i < stnc->tin_cnt; i++)
+			fprintf(f, "%12u", stnc->max_skblen[i]);
+		fprintf(f, "\n");
+	} else {
+		return -1;
+	}
+	return 0;
+}
+
+struct qdisc_util cake_qdisc_util = {
+	.id		= "cake",
+	.parse_qopt	= cake_parse_opt,
+	.print_qopt	= cake_print_opt,
+	.print_xstats	= cake_print_xstats,
+};
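
To make the overhead keywords above a little more concrete, here is a rough sketch of how the accumulated overhead and the atm/ptm setting might turn a packet length into an on-the-wire length. It assumes that ATM rounds up to whole 53-byte cells carrying 48 bytes of payload each, and that PTM's 64b/65b coding costs one extra byte per 64 bytes or part thereof. wire_len() and enum framing are names made up for illustration; the arithmetic actually used in sch_cake may differ.

```c
/* Framing modes, mirroring the 'atm' values used in the patch:
 * 0 = no cell framing, 1 = ATM cells, 2 = PTM 64b/65b coding. */
enum framing { FRAMING_NONE = 0, FRAMING_ATM = 1, FRAMING_PTM = 2 };

/* Hypothetical helper: estimate the on-the-wire size of one packet.
 * pkt_len  - length as seen by the qdisc
 * overhead - sum of the per-packet overhead keywords (may be negative)
 * mode     - link framing, see enum framing */
static long wire_len(long pkt_len, int overhead, enum framing mode)
{
	long len = pkt_len + overhead;

	switch (mode) {
	case FRAMING_ATM:
		/* Round up to whole 48-byte cell payloads;
		 * each cell occupies 53 bytes on the wire. */
		len = (len + 47) / 48 * 53;
		break;
	case FRAMING_PTM:
		/* 64b/65b coding: one extra byte per 64 bytes,
		 * counting a partial block as a whole one. */
		len += (len + 63) / 64;
		break;
	default:
		break;
	}
	return len;
}
```

For example, "pppoe-vcmux via-ethernet" accumulates 32 - 14 = 18 bytes of overhead, so a 1514-byte Ethernet frame would be charged as wire_len(1514, 18, FRAMING_ATM): 1532 bytes rounded up to 32 cells, i.e. 1696 bytes.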


* Re: [Cake] de-natting & host fairness
  2016-09-28  4:38           ` Kevin Darbyshire-Bryant
@ 2016-09-28  5:08             ` Noah Causin
  0 siblings, 0 replies; 29+ messages in thread
From: Noah Causin @ 2016-09-28  5:08 UTC (permalink / raw)
  To: cake


Thank you.


On 9/28/2016 12:38 AM, Kevin Darbyshire-Bryant wrote:
>
>
> On 27/09/16 21:40, Noah Causin wrote:
>> Thank you for all your help.
>>
>> The de-nat with dual-flow isolation works great.  I tested it
>> simultaneously with two separate virtual machines, one running a Flent
>> 50 flows download test and the other running a Flent 8 flows download
>> test.  Throughput was even between the machines, and the latency was 
>> great.
>>
>> Noah Causin
>
> A new tc patch for when you update to Jonathan's re-working of things 
> found at https://github.com/dtaht/sch_cake/commits/cobalt
>
> I'll try to push this into LEDE properly in a few days unless somebody 
> finds a nasty lurking :-)
>
> Kevin
>
>
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake




* Re: [Cake] de-natting & host fairness
  2016-09-27 23:08 ` Jonathan Morton
  2016-09-28  2:56   ` Kevin Darbyshire-Bryant
@ 2016-09-28  5:56   ` Sebastian Moeller
  1 sibling, 0 replies; 29+ messages in thread
From: Sebastian Moeller @ 2016-09-28  5:56 UTC (permalink / raw)
  To: Jonathan Morton, Kevin Darbyshire-Bryant; +Cc: cake

Hi Jonathan,


On September 28, 2016 1:08:04 AM GMT+02:00, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 26 Sep, 2016, at 06:20, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>>
>> Another github user 'tegularius' presented some beautifully crafted
>> code that did the lookups in a much neater way.  Originally it too had
>> an 'ingress' lookup problem.  This was worked on and I hacked some
>> conditional 'denat' options into cake & tc.
>>
>> For your 'delight' a denat cake
>> https://github.com/kdarbyshirebryant/sch_cake/tree/natoptions along
>> with a matching tc
>> https://github.com/kdarbyshirebryant/tc-adv/tree/denat
>
>As I’m now at the stage of trying to merge this, I’m going to make some
>executive design decisions:
>
>- De-NAT IPv4 packets only.  I think it’s safe to assume that IPv6 NAT
>will be rare, and in any case will typically preserve host
>distinctions.  This eliminates switch blocks in favour of simple if
>blocks.

Famous last words... I believe it is a bit premature to predict how IPv6 is going to be rolled out. You might be right, but I believe this is one of the policy decisions (like ECN) that should be left to the users. Feel free to disagree...


>
>- Don’t bother with the distinction between src-NAT and dst-NAT
>lookups.  The full lookup has to be done anyway and then masked off,
>the use-case for the limited functionality is nebulous, and all we’re
>doing is adding a lot of nasty conditional branches to the fast path.
>
>This in turn reduces the configuration interface for the feature to a
>flag, which I’ll call “nat”.

What about turning this around and making the option no-deNAT, so that the default does the right thing for most users? Also, is there a way to detect which features a given cake supports? That would be nice for sqm-scripts...


>
> - Jonathan Morton
>



* Re: [Cake] de-natting & host fairness
  2016-09-28  3:06     ` Jonathan Morton
  2016-09-28  3:33       ` Kevin Darbyshire-Bryant
@ 2016-09-28  6:07       ` Kevin Darbyshire-Bryant
  2016-09-28 11:08         ` Kevin Darbyshire-Bryant
  1 sibling, 1 reply; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-28  6:07 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake

Two buglets found:

in sch_cake - the atm/ptm flag options are not passed back to tc 
userspace correctly - ptm isn't sent back.

in tc/q_cake - the additional pre-set ptm+overhead options don't set 
'overhead_set' so the overhead doesn't get used.
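
For the second buglet, the fix is presumably to set 'overhead_set' in the PTM presets the same way the ATM presets already do. A minimal stand-alone sketch of that idea follows; struct cake_opts and parse_preset() are hypothetical names invented here so the fragment compiles on its own, and this is not the actual change to q_cake.c.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for the parser state used in the patch. */
struct cake_opts {
	int  atm;           /* -1 = unset, 0 = noatm, 1 = atm, 2 = ptm */
	int  overhead;      /* accumulated per-packet overhead in bytes */
	bool overhead_set;  /* true once any overhead keyword was seen */
};

/* Handle the two PTM presets; returns 0 if the keyword was
 * recognised and -1 otherwise. */
static int parse_preset(struct cake_opts *o, const char *kw)
{
	if (strcmp(kw, "pppoe-ptm") == 0) {
		o->atm = 2;
		o->overhead += 27;
		o->overhead_set = true;  /* not set in the posted patch */
	} else if (strcmp(kw, "bridged-ptm") == 0) {
		o->atm = 2;
		o->overhead += 19;
		o->overhead_set = true;  /* not set in the posted patch */
	} else {
		return -1;
	}
	return 0;
}
```

With the flag set, the later 'if (overhead_set)' test would send TCA_CAKE_OVERHEAD to the kernel for these presets too.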

On 28/09/16 04:06, Jonathan Morton wrote:
>
>> On 28 Sep, 2016, at 05:56, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>>
>> Does this need to be another variable/parameter or could it be the next bit along in the flow type?
>
> I’ve already pushed it to the ‘cobalt’ branch, so you can see how I’ve done it and start testing.  I’ve verified that it compiles, no more than that so far.
>
> For configuration, there is a separate flag parameter passed.  Internally, I’ve used another bit of the existing flow_mode field (but not the next one along).  The latter is also how the configuration is read back out again to tc.
>
> Overall, the patch ended up much smaller than the original.  Switch statements in C are actually quite verbose.
>
>  - Jonathan Morton
>
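
As an illustration of the "another bit of the existing flow_mode field" approach described above, here is a small sketch of packing and unpacking such a flag. The value 64 is the mask the tc patch uses when printing options ('flowmode & 64'); the macro and function names are made up for this example and are not taken from sch_cake.

```c
#include <stdbool.h>

/* Bit used for the NAT flag; 64 matches the mask in cake_print_opt().
 * The name itself is illustrative only. */
#define FLOW_NAT_FLAG 64u

/* Combine a flow isolation mode (0..7) with the NAT flag. */
static unsigned flow_mode_pack(unsigned mode, bool nat)
{
	return nat ? (mode | FLOW_NAT_FLAG) : mode;
}

/* Split the combined value again, reporting the flag via *nat and
 * returning the bare flow isolation mode. */
static unsigned flow_mode_unpack(unsigned packed, bool *nat)
{
	*nat = (packed & FLOW_NAT_FLAG) != 0;
	return packed & ~FLOW_NAT_FLAG;
}
```

So 'dual-srchost' (mode 5) with NAT lookups enabled would travel as 5 | 64 = 69, and unpacking 69 gives back mode 5 with the flag set.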


* Re: [Cake] de-natting & host fairness
  2016-09-28  6:07       ` Kevin Darbyshire-Bryant
@ 2016-09-28 11:08         ` Kevin Darbyshire-Bryant
  2016-09-28 11:49           ` Dave Taht
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-28 11:08 UTC (permalink / raw)
  To: cake



On 28/09/16 07:07, Kevin Darbyshire-Bryant wrote:
> Two buglets found:
>
> in sch_cake - the atm/ptm flag options are not passed back to tc
> userspace correctly - ptm isn't sent back.

Just fixed that & pushed.... don't forget to pull :-)




* Re: [Cake] de-natting & host fairness
  2016-09-28 11:08         ` Kevin Darbyshire-Bryant
@ 2016-09-28 11:49           ` Dave Taht
  2016-09-28 14:11             ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Taht @ 2016-09-28 11:49 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake

From a multiple IP perspective, at least on egress through a switch,
you could hash on the mac address instead of the IP...
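
Purely as an illustration of that idea, a host bucket could be derived from the six bytes of a MAC address roughly like this. The sketch uses 32-bit FNV-1a as a stand-in hash; mac_hash() and mac_bucket() are hypothetical names, and nothing here is taken from cake's own hashing code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 6

/* 32-bit FNV-1a over the six bytes of a MAC address. */
static uint32_t mac_hash(const uint8_t mac[MAC_LEN])
{
	uint32_t h = 2166136261u;       /* FNV offset basis */

	for (size_t i = 0; i < MAC_LEN; i++) {
		h ^= mac[i];
		h *= 16777619u;         /* FNV prime */
	}
	return h;
}

/* Map a MAC address onto one of n_buckets host buckets.
 * n_buckets must be greater than zero. */
static uint32_t mac_bucket(const uint8_t mac[MAC_LEN], uint32_t n_buckets)
{
	return mac_hash(mac) % n_buckets;
}
```

A host with several IP addresses behind one MAC would then always land in the same bucket, which is the attraction of hashing at that layer.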

/me hides

On Wed, Sep 28, 2016 at 4:08 AM, Kevin Darbyshire-Bryant
<kevin@darbyshire-bryant.me.uk> wrote:
>
>
> On 28/09/16 07:07, Kevin Darbyshire-Bryant wrote:
>>
>> Two buglets found:
>>
>> in sch_cake - the atm/ptm flag options are not passed back to tc
>> userspace correctly - ptm isn't sent back.
>
>
> Just fixed that & pushed.... don't forget to pull :-)
>
>
>



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org


* Re: [Cake] de-natting & host fairness
  2016-09-28 11:49           ` Dave Taht
@ 2016-09-28 14:11             ` Kevin Darbyshire-Bryant
  0 siblings, 0 replies; 29+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-09-28 14:11 UTC (permalink / raw)
  To: Dave Taht; +Cc: cake



On 28/09/16 12:49, Dave Taht wrote:
> From a multiple IP perspective, at least on egress through a switch,
> you could hash on the mac address instead of the IP...
>
> /me hides

^^^^^^^   Yes!  Back to your spam trap :-)



end of thread, other threads:[~2016-09-28 14:11 UTC | newest]

Thread overview: 29+ messages
2016-09-26  3:20 [Cake] de-natting & host fairness Kevin Darbyshire-Bryant
2016-09-26  3:54 ` Dave Taht
2016-09-26  5:11   ` Dave Taht
2016-09-26  8:54 ` moeller0
2016-09-26 13:02   ` Kevin Darbyshire-Bryant
2016-09-26 13:28     ` moeller0
2016-09-26 14:06       ` Kevin Darbyshire-Bryant
2016-09-26 14:30       ` Jonathan Morton
2016-09-26 15:23         ` moeller0
2016-09-27  1:52 ` Noah Causin
2016-09-27  2:32   ` Kevin Darbyshire-Bryant
2016-09-27  4:20     ` Noah Causin
2016-09-27 14:52     ` Noah Causin
2016-09-27 15:28       ` Kevin Darbyshire-Bryant
2016-09-27 20:40         ` Noah Causin
2016-09-27 20:44           ` Jonathan Morton
     [not found]           ` <CAA93jw6rPE8aAGEiqf7jp3hc1J0ThrVer8PFmFLPBqANdtEixg@mail.gmail.com>
2016-09-27 20:58             ` Noah Causin
2016-09-28  4:38           ` Kevin Darbyshire-Bryant
2016-09-28  5:08             ` Noah Causin
2016-09-27 23:08 ` Jonathan Morton
2016-09-28  2:56   ` Kevin Darbyshire-Bryant
2016-09-28  3:06     ` Jonathan Morton
2016-09-28  3:33       ` Kevin Darbyshire-Bryant
2016-09-28  3:49         ` Jonathan Morton
2016-09-28  6:07       ` Kevin Darbyshire-Bryant
2016-09-28 11:08         ` Kevin Darbyshire-Bryant
2016-09-28 11:49           ` Dave Taht
2016-09-28 14:11             ` Kevin Darbyshire-Bryant
2016-09-28  5:56   ` Sebastian Moeller
