* [Cake] cake exploration
@ 2015-04-11 18:44 Dave Taht
2015-04-11 18:45 ` Dave Taht
2015-04-12 22:44 ` Jonathan Morton
0 siblings, 2 replies; 10+ messages in thread
From: Dave Taht @ 2015-04-11 18:44 UTC (permalink / raw)
To: cake
Stuff on my backlog of researchy stuff.
1) cake_drop_monitor - I wanted a way to throw drop AND mark
notifications up to userspace, including the packet's time of entry
and the time of drop, as well as the IP headers and the next-hop
destination mac address.
There are many use cases for this:
A) Testing the functionality of the algorithm, and being able to
collect and analyze drops as they happen.
NET_DROP_MONITOR did not cut it, but I have not looked at it in a year.
It drives me crazy to be dropping packets all over the system and not
be able to track down where they happened.
This is the primary reason why I had switched back to 64-bit timestamps, btw.
B) Having the drop notifications might be useful in tuning, or in
steering traffic to different routes.
C) It is way easier to graph the drop pattern with this info thrown
to userspace.
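The monitor described above does not exist yet; as a sketch only, here is roughly the kind of record it might hand to userspace, with a helper showing the sojourn-time analysis point C has in mind. All field names are assumptions, not a kernel API.

```c
#include <stdint.h>

/* Hypothetical per-event record a cake_drop_monitor might emit to
 * userspace for each drop or ECN mark. Field names are guesses. */
struct cake_drop_record {
    uint64_t enqueue_ns;     /* packet's time of entry to the qdisc */
    uint64_t drop_ns;        /* time of the drop (or mark) decision */
    uint8_t  nexthop_mac[6]; /* next-hop destination mac address */
    uint8_t  ecn_marked;     /* 1 = CE mark, 0 = actual drop */
    uint8_t  ip_header[60];  /* copy of the IP header for analysis */
};

/* Sojourn time: how long the packet sat queued before being dropped,
 * which is exactly what you'd graph from userspace. */
static uint64_t sojourn_ns(const struct cake_drop_record *r)
{
    return r->drop_ns - r->enqueue_ns;
}
```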
2) I dearly wanted to do the timestamping AND hashing in the native
skb struct on entry to the system itself, not in the qdisc. Measuring
the latency from ingress at the wire to egress would result in much
better cpu overload behavior. I am totally aware of how unlikely
mainline linux would be to take this option, but things have evolved
over there, so leveraging the rxhash and skb->timestamp fields seems
a possibility...
I think this would also let us get along better with netem, but I
would have to go look again.
Call that cake-rxhash. :)
3) In my benchmark of the latest cake3, ECN traffic was not as good as
expected, but that might have been an anomaly of the test. Need to
test ECN thoroughly this time, almost in preference to looking at drop
behavior. Toke probably has ECN off by default right now. On, after
this test run?
4) Testing higher rates and looking at cwnd behavior under codel is
important. The dropoff Toke noted in his paper is real. There is also
possibly some ideal ratio between number of flows and bandwidth that
makes more sense than a fixed number of flows. And I keep harping on
the darn resumption algo... but need to test with lousier TCPs, like
Windows'.
5) Byte-mode-ish handling
Dropping a single 64-byte packet does little good. You will find in
the 50-flow tests that a ton of traffic is acks, not being dropped.
pie does better than fq in this case: it shoots wildly at everything
but usually misses the fat packets, where DRR will merrily store up
an entire MTU's worth of useless acks when only one is needed.
So just trying to drop more little packets might be helpful in some cases.
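One way to read "dropping a single 64-byte packet does little good" is that a single drop decision on a queue of tiny acks should keep dropping until an MTU's worth of bytes is freed. That policy is my guess, not anything cake does; a minimal sketch:

```c
#include <stddef.h>

/* Byte-mode-ish sketch: given the packet lengths at the head of a
 * queue, return how many head packets one AQM drop decision should
 * take so that roughly one MTU of bytes is actually freed. With
 * 64-byte acks this turns one decision into ~24 drops; with
 * full-size packets it degenerates to a single drop. */
static size_t drops_for_mtu(const size_t *pkt_len, size_t n_pkts, size_t mtu)
{
    size_t dropped_bytes = 0, i = 0;
    while (i < n_pkts && dropped_bytes < mtu)
        dropped_bytes += pkt_len[i++];
    return i; /* number of head packets to drop */
}
```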
6) Ack thinning. I gave what is conventionally called "stretch acks" a
new name, as stretch acks have a deserved reputation for sucking.
Well, they don't suck anymore in linux, and what I was mostly thinking
was to drop no more than 2 in a row...
One thing this would help with is packing wifi aggregates - which have
a hard limit on the number of packets in a TXOP (42), and a byte limit
of 64k on 802.11n. Sending 41 acks from one flow, when you could send
just the last 2, seems like a big win on packing a TXOP.
(This is something Eric proposed, and given the drop rates we now see
from wifi and the wild and wooly internet, I am inclined to agree that
it is worth fiddling with.)
(I am not huge on it, though.)
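The "drop no more than 2 in a row" rule above can be sketched as a simple pass over a run of pure acks from one flow: drop acks, but never more than two consecutively, and always keep the last (most recent cumulative) ack. This is a sketch of the idea, not an implemented cake feature:

```c
#include <stddef.h>

/* Ack-thinning sketch: given a run of n_acks pure acks from one flow,
 * return how many survive under "drop at most 2 in a row, always keep
 * the last ack". A 9-ack run thins to 3; a 41-ack run thins to 14. */
static size_t thin_acks(size_t n_acks)
{
    size_t kept = 0, run = 0;
    for (size_t i = 0; i < n_acks; i++) {
        if (i == n_acks - 1 || run == 2) { /* last ack, or run limit hit */
            kept++;
            run = 0;
        } else {
            run++; /* drop this ack */
        }
    }
    return kept;
}
```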
7) Mac address hashing on the nexthop instead of the 5-tuple. When
used on an internal, switched network, it would in some cases be
better to try to maximize port usage rather than hash on the 5-tuple.
I have never gotten around to writing a mac hash I liked; my original
goal was to write one that eventually found a minimal perfect hash
solution, as mac addrs tend to be pretty stable on a network and
rarely change.
Warning: minimal perfect hash attempts are a wet paint thing! I really
want an FPGA solver for them.... don't go play with the code out
there, you will lose days to it... you have been warned.
http://cmph.sourceforge.net/concepts.html
I would like there to be a generic mac hashing thing in tc, actually.
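As a placeholder for the mac hash I never wrote, here is a toy fold-and-multiply sketch - emphatically not the minimal perfect hash scheme wished for above, just the trivial baseline such a scheme would have to beat:

```c
#include <stdint.h>

/* Toy nexthop-mac hash: fold the 48-bit address into a 64-bit value
 * and scramble it with a golden-ratio multiply, taking the high bits.
 * A stand-in, not a minimal perfect hash. */
static uint32_t mac_hash(const uint8_t mac[6], uint32_t n_buckets)
{
    uint64_t v = 0;
    for (int i = 0; i < 6; i++)
        v = (v << 8) | mac[i];
    v *= 0x9E3779B97F4A7C15ULL; /* 64-bit golden-ratio constant */
    return (uint32_t)(v >> 32) % n_buckets;
}
```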
8) Parallel FIB lookup
If you assume that you have tons of queues routing packets from
ingress to egress, on tons of cpus, you can actually do the FIB lookup
in parallel as well. There is some old stuff on virtual queue and
virtual clock fq'ing which makes for tighter...
9) Need a codel *library* that works at the mac80211 layer. I think
codel*.h suffices but am not sure. And for that matter, codel itself
seems like it would need a calculated target and a few other things to
work right on wifi.
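One plausible reading of "a calculated target" for wifi: the target cannot usefully be smaller than the time one full aggregate takes to serialize, so floor it at the usual 5 ms and scale up at low rates. The 64 KB figure is the 802.11n aggregate byte limit mentioned under item 6; the policy itself is a guess, not codel's spec:

```c
#include <stdint.h>

/* Sketch of a rate-calculated codel target for wifi: take the max of
 * the default 5 ms target and the serialization time of one 64 KB
 * aggregate at the current link rate. Assumption, not a standard. */
static uint64_t wifi_codel_target_ns(uint64_t rate_bps)
{
    const uint64_t aggregate_bits = 65536ULL * 8; /* one 64 KB aggregate */
    const uint64_t min_target_ns = 5000000;       /* 5 ms default target */
    uint64_t serialize_ns = aggregate_bits * 1000000000ULL / rate_bps;
    return serialize_ns > min_target_ns ? serialize_ns : min_target_ns;
}
```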
As for the hashing...
Personally I do not think that the 8-way set associative hash is what
wifi needs for cake. I tend to think we need to "pack" aggregates with
as many different flows as possible, and randomize how we pack
them... I think.... maybe....
10) I really don't like BQL with multi-queue hardware. More
backpressure is needed in that case than we get.
11) GRO peeling
Offloads suck
--
Dave Täht
Let's make wifi fast, less jittery and reliable again!
https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
* Re: [Cake] cake exploration
2015-04-11 18:44 [Cake] cake exploration Dave Taht
@ 2015-04-11 18:45 ` Dave Taht
2015-04-11 18:47 ` Dave Taht
2015-04-12 22:44 ` Jonathan Morton
1 sibling, 1 reply; 10+ messages in thread
From: Dave Taht @ 2015-04-11 18:45 UTC (permalink / raw)
To: cake
12) Better starting interval and target for codel's maintenance vars
in relation to existing flows
Right now sch_fq and sch_pie give priority to flows in their first IW
phases. This makes them vulnerable to DDOS attacks with tons of new
flows.
sch_fq_codel mitigates this somewhat by starting to hash flows into
the same buckets.
sch_cake's more perfect hashing gives IW more of a boost.
A thought was to keep a combined EWMA over all active flows and hand
their current codel settings to new flows as they arrive, with less of
a boost.
This MIGHT work better when you generally have short RTTs on local
networks. Other thoughts appreciated.
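The combined-EWMA thought above might look like the following sketch: a running integer EWMA of the active flows' codel drop counts, used to seed a brand-new flow at something other than zero. The 1/8 gain and the halve-on-seed rule are both assumptions:

```c
#include <stdint.h>

/* EWMA of active flows' codel counts, integer form:
 * avg += (sample - avg) / 8. */
static uint32_t ewma_update(uint32_t avg, uint32_t sample)
{
    return avg + (uint32_t)((int32_t)(sample - avg) >> 3);
}

/* Seed a new flow from the fleet-wide EWMA "with less of a boost":
 * here, start it halfway decayed. Pure assumption. */
static uint32_t seed_new_flow_count(uint32_t ewma_count)
{
    return ewma_count / 2;
}
```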
There is another related problem in the resumption portion of the
algorithm, as the decay of the existing state variables is arbitrary
and way too long in some cases. I think I had solved this by coming up
with an estimate for the amount of decay needed other than count - 2,
doing a calculation from the last time a flow had packets to the next,
but I can't remember how I did it! It is easy if you have a last-time
field per queue and use a normal sqrt with a divide... but my brain
crashes at the reciprocal cache math we have instead....
I am not allergic to a divide. I am not allergic to using a shift for
the target and calculating the interval only relative to bandwidth, as
mentioned elsewhere. At 64k worth of bandwidth we just end up with a
huge interval, no big deal. But I plan to ride along with the two
separately for now.
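Since the exact formula is admittedly lost, here is one cheap reconstruction of the idea: decay the resumed count according to how many codel intervals the flow sat idle, using only a divide and a shift. Halving per elapsed interval is my assumption, chosen only because it captures "more decay the longer the idle period":

```c
#include <stdint.h>

/* Sketch: decay a resumed flow's codel count by the idle time since
 * it last had packets, instead of the fixed count - 2. Halve once per
 * elapsed interval; never decay below 1. The rule is a guess. */
static uint32_t decay_count(uint32_t count, uint64_t idle_ns,
                            uint64_t interval_ns)
{
    uint64_t intervals = idle_ns / interval_ns;
    if (intervals >= 32)
        return 1;            /* idle long enough: fully decayed */
    count >>= intervals;     /* halve once per elapsed interval */
    return count ? count : 1;
}
```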
13) It might be possible to write a faster codel - and one easier to
read - by using a case statement on its 2 core variables. The current
code does not show the 3-way state machine as well as that could, and
for all I know there is something intelligent we could do with the 4th
state.
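The case-statement shape suggested above might look like this: pack the two core booleans into a 2-bit value and switch on it, so all four combinations - including the normally implicit fourth - are spelled out. This deliberately omits the interval-long persistence check real codel does before entering dropping state; it is a sketch of the structure, not a replacement implementation:

```c
/* Sketch of codel's state machine as an explicit 4-way switch on
 * (in dropping state, sojourn above target). Simplified: real codel
 * requires sojourn to stay above target for a full interval first. */
enum codel_action { IDLE, START_DROPPING, KEEP_DROPPING, STOP_DROPPING };

static enum codel_action codel_step(int dropping, int above_target)
{
    switch ((dropping << 1) | above_target) {
    case 0: return IDLE;           /* not dropping, queue is fine */
    case 1: return START_DROPPING; /* not dropping, sojourn too high */
    case 2: return STOP_DROPPING;  /* dropping, but queue has drained */
    case 3: return KEEP_DROPPING;  /* dropping, still over target */
    }
    return IDLE; /* unreachable */
}
```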
* Re: [Cake] cake exploration
2015-04-11 18:45 ` Dave Taht
@ 2015-04-11 18:47 ` Dave Taht
2015-04-11 18:48 ` Dave Taht
2015-04-12 23:14 ` Jonathan Morton
0 siblings, 2 replies; 10+ messages in thread
From: Dave Taht @ 2015-04-11 18:47 UTC (permalink / raw)
To: cake
14) Strict priority queues. Some CBR techniques, notably IPTV, want 0
packet loss, but run at a rate the provider has determined to be below
what the subscriber will use. Sharing that "fairly" will lead to loss
of packets for those applications.
I do not like strict priority queues. I would prefer, for example,
that the CBR application be marked with ECN, and ignored, versus the
high probability that someone will abuse a strict priority queue.
* Re: [Cake] cake exploration
2015-04-11 18:47 ` Dave Taht
@ 2015-04-11 18:48 ` Dave Taht
2015-04-11 19:12 ` Dave Taht
2015-04-12 22:52 ` Jonathan Morton
2015-04-12 23:14 ` Jonathan Morton
1 sibling, 2 replies; 10+ messages in thread
From: Dave Taht @ 2015-04-11 18:48 UTC (permalink / raw)
To: cake
15) Needs to work so an ISP can create service classes for their customers:
DRR 1: cake bandwidth X
DRR 2: cake bandwidth Y
DRR 3:
I have no idea whether this can work at all; last I tried it, DRR
would stall every time fq_codel had no packets to deliver.
A related issue is that there needs to be a way, via a tc or iptables
filter, to map multiple IPs and address ranges so that they return a
single distinct integer for placement into such queue systems - so
that a lookup of someone that has x.y.x.z, q.f.b.a/24, j:k:h/64 and
l:m:n/48 returns a single integer representing the customer, which
can be fed into the correct sub-qdisc above.
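The prefix-to-customer mapping could be sketched as below. A real version would be a tc/iptables extension backed by an LPM trie; this linear scan just shows the many-prefixes-to-one-integer shape, and every prefix and customer id in it is made up:

```c
#include <stdint.h>
#include <stddef.h>

/* Map any of a customer's IPv4 prefixes to one integer, so all of
 * their addresses land in the same per-customer sub-qdisc. */
struct customer_prefix {
    uint32_t prefix;      /* network prefix as a host-order integer */
    uint32_t mask;        /* netmask for that prefix */
    uint32_t customer_id; /* the single integer fed to the qdisc */
};

static int lookup_customer(const struct customer_prefix *tbl, size_t n,
                           uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if ((addr & tbl[i].mask) == tbl[i].prefix)
            return (int)tbl[i].customer_id;
    return -1; /* unknown: fall through to a default queue */
}
```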
I think, but am not sure, that is all my backlog!
* Re: [Cake] cake exploration
2015-04-11 18:48 ` Dave Taht
@ 2015-04-11 19:12 ` Dave Taht
2015-04-12 2:33 ` Dave Taht
2015-04-12 22:52 ` Jonathan Morton
1 sibling, 1 reply; 10+ messages in thread
From: Dave Taht @ 2015-04-11 19:12 UTC (permalink / raw)
To: cake
16) Better VPN handling
5 flows of encapsulated VPN traffic compete badly with 5 flows of
(say) bittorrent traffic.
We could give the ipsec form of VPN traffic a boost by explicitly
recognizing the AH and ESP headers and classifying them into a
different diffserv bin.
Similarly, tinc and openvpn could use a port match.
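The classification above boils down to matching the IPsec protocol numbers (50 for ESP, 51 for AH) plus the default ports for the userspace VPNs (1194 for openvpn, 655 for tinc). A minimal sketch; the tin numbering is hypothetical:

```c
#include <stdint.h>

#define PROTO_ESP 50 /* IPsec ESP protocol number */
#define PROTO_AH  51 /* IPsec AH protocol number */

/* Sketch: bump recognizable VPN traffic into a boosted diffserv tin.
 * Tin 1 as "boosted" is an assumption, not cake's layout. */
static int classify_vpn_tin(uint8_t ip_proto, uint16_t dport,
                            int default_tin)
{
    if (ip_proto == PROTO_ESP || ip_proto == PROTO_AH)
        return 1;                  /* ipsec by protocol header */
    if (dport == 1194 || dport == 655)
        return 1;                  /* openvpn / tinc by default port */
    return default_tin;
}
```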
On Sat, Apr 11, 2015 at 11:48 AM, Dave Taht <dave.taht@gmail.com> wrote:
> 15) Needs to work so an ISP can create service classes for their customers
>
> DRR 1: cake bandwidth X
> DRR 2: cake bandwidth Y
> DRR 3:
>
> I have no idea whether this can work at all, last I tried it DRR would
> stall everytime fq_codel had no packets to deliver.
>
> A related issue is that there needs to be a way to have a tc or
> iptables filter to map multiple IPs and addresses so that they return
> a single distinct integer for placement into such queue systems
> so that a lookup of someone that has x.y.x.z, q.f.b.a/24, j:k:h/64 and
> l:m:n/48 can return a single integer representing the customer so it
> can be fed into the above correct sub-queuedisc.
>
> I think, but that is not sure, that is all my backlog!
>
> On Sat, Apr 11, 2015 at 11:47 AM, Dave Taht <dave.taht@gmail.com> wrote:
>> 14) strict priority queues. Some CBR techniques, notably IPTV, want 0
>> packet loss, but run at a rate determined by the provider to be below
>> what the subscriber will use. Sharing that "fairly" will lead to loss
>> of packets to those applications.
>>
>> I do not like strict priority queues. I would prefer, for example,
>> that the CBR application be marked with ECN and ignored, versus the
>> high probability that someone will abuse a strict priority queue.
>>
>>
>>
>> On Sat, Apr 11, 2015 at 11:45 AM, Dave Taht <dave.taht@gmail.com> wrote:
>>> 12) Better starting interval and target for codel's maintenance vars in
>>> relationship to existing flows
>>>
>>> Right now sch_fq, sch_pie give priority to flows in their first IW
>>> phases. This makes them vulnerable to DDOS attacks with tons of new
>>> flows.
>>>
>>> sch_fq_codel mitigates this somewhat by starting to hash flows into
>>> the same buckets.
>>>
>>> sch_cake's more perfect hashing gives IW more of a boost.
>>>
>>> A thought was to do a combined ewma of all active flows and to hand
>>> their current codel settings to new flows as they arrive, with less of
>>> a boost.
>>>
>>> This MIGHT work better when you have short RTTs generally on local
>>> networks. Other thoughts appreciated.
>>>
>>> There is another related problem in the resumption portion of the
>>> algorithm: the decay of the existing state variables is arbitrary
>>> and way too long in some cases. I think I had solved this by coming up
>>> with an estimate for the amount of decay needed other than count - 2,
>>> calculating from the last time a flow had packets to the next,
>>> but I can't remember how I did it! It is easy if you have a last-time
>>> per queue and use a normal sqrt with a divide... but my brain crashes
>>> at the reciprocal cache math we have instead....
>>>
>>> I am not allergic to a divide. I am not allergic to using a shift for
>>> the target and calculating the interval only relative to bandwidth, as
>>> mentioned elsewhere. At 64k worth of bandwidth we just end up with a
>>> huge interval, no big deal. But plan to ride along with the two
>>> separately for now.
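One possible shape for the elapsed-time decay described above, as a sketch. The halving-per-idle-interval rule here is an arbitrary illustration, not what any shipping codel does, and not necessarily what was originally worked out:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the item-12 idea: instead of the fixed
 * "count - 2" on resumption, shrink codel's drop count according to how
 * long the queue sat idle since it last had packets.  One halving per
 * elapsed interval is an arbitrary choice for illustration. */
static uint32_t decay_count(uint32_t count, uint64_t now_ns,
			    uint64_t last_ns, uint64_t interval_ns)
{
	uint64_t idle = now_ns - last_ns;

	while (count > 1 && idle >= interval_ns) {
		count >>= 1;       /* one halving per idle interval */
		idle -= interval_ns;
	}
	return count;
}
```

This form needs no divide at all, which may sidestep the reciprocal-cache math entirely.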
>>>
>>> 13) It might be possible to write a faster codel - and an easier to
>>> read one - by using a case statement on its 2 core variables. The
>>> current code does not show the 3-way state machine as well as it
>>> could, and for all I know there is something intelligent we could do
>>> with the 4th state.
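The case-statement idea might look like the sketch below: pack the two core booleans into two bits and switch on the result, making all four states visible. This is an illustration of the restructuring, not real codel code; in particular, real codel only enters the dropping state after the sojourn time has exceeded target for a full interval, which this sketch glosses over:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of item 13: make codel's state machine explicit by
 * switching on (dropping, above_target) packed into two bits.  The
 * "dropping but below target" slot is the spare 4th state the text
 * wonders about. */
enum act { STAY_IDLE, START_DROPPING, KEEP_DROPPING, LEAVE_DROPPING };

static enum act codel_step(bool dropping, bool above_target)
{
	switch ((dropping << 1) | above_target) {
	case 0: return STAY_IDLE;      /* not dropping, below target */
	case 1: return START_DROPPING; /* above target (real codel waits
					* a full interval before this) */
	case 2: return LEAVE_DROPPING; /* dropping, fell below target */
	case 3: return KEEP_DROPPING;  /* dropping, still above target */
	}
	return STAY_IDLE; /* unreachable */
}
```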
>>>
>>> On Sat, Apr 11, 2015 at 11:44 AM, Dave Taht <dave.taht@gmail.com> wrote:
>>>> Stuff on my backlog of researchy stuff.
>>>>
>>>> 1) cake_drop_monitor - I wanted a way to throw drop AND mark
>>>> notifications up to userspace, including the packet's time of entry
>>>> and the time of drop, as well as the IP headers and the next-hop
>>>> destination MAC address.
>>>>
>>>> There are many use cases for this:
>>>>
>>>> A) - testing the functionality of the algorithm and being able to
>>>> collect and analyze drops as they happen.
>>>>
>>>> NET_DROP_MONITOR did not cut it but I have not looked at it in a year.
>>>> It drives me crazy to be dropping packets all over the system and to
>>>> not be able to track down where they happened.
>>>>
>>>> This is the primary reason why I had switched back to 64 bit timestamps, btw.
>>>>
>>>> B) Having the drop notifications might be useful in tuning or steering
>>>> traffic to different routes.
>>>>
>>>> C) It is way easier to do a graph of the drop pattern with this info
>>>> thrown to userspace.
>>>>
>>>> 2) Dearly wanted to actually be doing the timestamping AND hashing in
>>>> the native skb struct on entry to the system itself, not in the qdisc.
>>>> Measuring the latency from ingress at the wire to egress would result
>>>> in much better cpu overload behavior. I am totally aware of how
>>>> unlikely mainline linux would be to take this option, but things have
>>>> evolved over there, so leveraging the rxhash and skb->timestamp
>>>> fields seems a possibility...
>>>>
>>>> I think this would let us get along better with netem also, but would
>>>> have to go look again.
>>>>
>>>> Call that cake-rxhash. :)
>>>>
>>>> 3) In my benchmark of the latest cake3, ECN traffic was not as good as
>>>> expected, but that might have been an anomaly of the test. Need to
>>>> test ECN thoroughly this time, almost in preference to looking at drop
>>>> behavior. Toke probably has ECN off by default right now. On, after
>>>> this test run?
>>>>
>>>> 4) Testing higher rates and looking at cwnd for codel is important.
>>>> The dropoff toke noted in his paper is real. Also there is possibly
>>>> some ideal ratio between number of flows and bandwidth that makes more
>>>> sense than a fixed number of flows. Also I keep harping on the darn
>>>> resumption algo... but need to test with lousier tcps like windows.
>>>>
>>>> 5) Byte Mode-ish handling
>>>>
>>>> Dropping a single 64-byte packet does little good. You will find in
>>>> the 50-flow tests that a ton of traffic is acks, not being dropped,
>>>> and pie does better in this case than does fq, as it shoots wildly
>>>> at everything but usually misses the fat packets, where DRR will
>>>> merrily store up an entire MTU worth of useless acks when only one
>>>> is needed.
>>>>
>>>> So just trying to drop more little packets might be helpful in some cases.
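One literal reading of "drop more little packets" is to batch drops by byte parity: when the AQM fires on a small packet, drop enough consecutive small packets to approximate one full-size drop. The sketch below is an illustrative assumption about what such a policy might compute, not anything cake actually does:

```c
#include <assert.h>

/* Hypothetical sketch of item 5's "byte mode": when the head packet is
 * small, how many consecutive drops would it take to match the byte
 * impact of dropping one full-size packet?  The ceiling division and
 * the MTU target are illustrative choices. */
static unsigned int drops_for_byte_parity(unsigned int pkt_len,
					  unsigned int mtu)
{
	if (pkt_len == 0 || pkt_len >= mtu)
		return 1;
	return (mtu + pkt_len - 1) / pkt_len; /* ceil(mtu / pkt_len) */
}
```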
>>>>
>>>> 6) Ack thinning. I gave what is conventionally called "stretch acks" a
>>>> new name, as stretch acks have a deserved reputation for sucking.
>>>> Well, they don't suck anymore in linux, and what I was mostly
>>>> thinking was to drop no more than 2 in a row...
>>>>
>>>> One thing this would help with is packing wifi aggregates - which
>>>> have a hard limit on the number of packets in a TXOP (42), and a
>>>> byte limit on 802.11n of 64k. Sending 41 acks from one flow, when
>>>> you could send just the last 2, seems like a big win for packing a
>>>> TXOP.
>>>>
>>>> (this is something Eric proposed, and given the drop rates we now see
>>>> from wifi and the wild and woolly internet I am inclined to agree that
>>>> it is worth fiddling with)
>>>>
>>>> (I am not huge on it, though)
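The "drop no more than 2 in a row" rule could be sketched as below. This is purely illustrative: a real thinner would also always keep the newest ack (which carries the latest cumulative ack number and any ECE/SACK state), which this toy version does not guarantee:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of the item-6 rule: walking a run of pure acks
 * from one flow, drop acks but never more than 2 consecutively. */
static bool should_drop_ack(unsigned int consecutive_dropped)
{
	return consecutive_dropped < 2;
}

/* How many of n queued acks survive under the rule above. */
static unsigned int acks_kept(unsigned int n)
{
	unsigned int kept = 0, run = 0;

	for (unsigned int i = 0; i < n; i++) {
		if (should_drop_ack(run)) {
			run++;     /* dropped */
		} else {
			kept++;    /* forwarded; reset the run */
			run = 0;
		}
	}
	return kept;
}
```

On the 42-packet TXOP example, this keeps 14 of 42 acks rather than 2, so the drop-no-more-than-2 rule is gentler than the "send only the last 2" framing in the text.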
>>>>
>>>> 7) MAC-addr hashing on the next hop instead of the 5-tuple. When used
>>>> on an internal, switched network, it would in some cases be better to
>>>> try to maximize port usage rather than hash on the 5-tuple.
>>>>
>>>> I have never got around to writing a mac hash I liked; my goal
>>>> originally was to write one that eventually found a minimal perfect
>>>> hash solution, as mac addrs tend to be pretty stable on a network
>>>> and rarely change.
>>>>
>>>> Warning: minimal perfect hash attempts are a wet-paint thing! I really
>>>> want an FPGA solver for them.... don't go play with the code out there,
>>>> you will lose days to it... you have been warned.
>>>>
>>>> http://cmph.sourceforge.net/concepts.html
>>>>
>>>> I would like there to be a generic mac hashing thing in tc, actually.
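An ordinary (non-minimal, non-perfect) MAC hash of the kind such a tc facility might start from could be sketched as an FNV-1a fold over the six octets. The constants are the standard FNV-1a parameters; everything else here is an illustrative assumption:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of a generic MAC-address hash: FNV-1a over the 6
 * octets, reduced to a queue index.  This is NOT a minimal perfect hash
 * (see the cmph link above), just an ordinary deterministic one. */
static uint32_t mac_hash(const uint8_t mac[6], uint32_t nqueues)
{
	uint32_t h = 0x811c9dc5u;        /* FNV-1a offset basis */

	for (int i = 0; i < 6; i++) {
		h ^= mac[i];
		h *= 0x01000193u;        /* FNV prime */
	}
	return h % nqueues;
}
```

Because MAC addresses on a LAN are nearly static, the minimal-perfect-hash dream above amounts to periodically re-solving for a collision-free table over the currently seen set; this sketch is just the fallback ordinary hash.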
>>>>
>>>> 8) Parallel FIB lookup
>>>>
>>>> If you assume that you have tons of queues routing packets from
>>>> ingress to egress, on tons of cpus, you can actually do the FIB lookup
>>>> in parallel also. There is some old stuff on virtual queue
>>>> and virtual clock fqing which makes for tighter...
>>>>
>>>> 9) Need a codel *library* that works at the mac80211 layer. I think
>>>> codel*.h suffices but am not sure. And for that matter, codel itself
>>>> seems like it would need a calculated target and a few other things to
>>>> work right on wifi.
>>>>
>>>> As for the hashing...
>>>>
>>>> Personally I do not think that the 8-way set associative hash is what
>>>> wifi needs for cake; I tend to think we need to "pack" aggregates with
>>>> as many different flows as possible, and randomize how we pack
>>>> them... I think.... maybe....
>>>>
>>>> 10) I really don't like BQL with multi-queue hardware queues. More
>>>> backpressure is needed in that case than we get.
>>>>
>>>> 11) GRO peeling
>>>>
>>>> Offloads suck
>>>>
>>>> --
>>>> Dave Täht
>>>> Let's make wifi fast, less jittery and reliable again!
>>>>
>>>> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>> Let's make wifi fast, less jittery and reliable again!
>>>
>>> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
>>
>>
>>
>> --
>> Dave Täht
>> Let's make wifi fast, less jittery and reliable again!
>>
>> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
>
>
>
> --
> Dave Täht
> Let's make wifi fast, less jittery and reliable again!
>
> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
--
Dave Täht
Let's make wifi fast, less jittery and reliable again!
https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
* Re: [Cake] cake exploration
2015-04-11 19:12 ` Dave Taht
@ 2015-04-12 2:33 ` Dave Taht
0 siblings, 0 replies; 10+ messages in thread
From: Dave Taht @ 2015-04-12 2:33 UTC (permalink / raw)
To: cake
17) The ATM compensation in cake is entirely untested. And it is
unclear how best to handle PPPoE.
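For reference, the arithmetic any ATM compensation has to perform is well defined even if cake's implementation is untested: AAL5 appends an 8-byte trailer, pads to a multiple of the 48-byte cell payload, and each cell costs 53 bytes on the wire. The sketch below is a standalone restatement of that framing, with the PPPoE per-frame overhead passed in as a parameter since the right value depends on the link (8 bytes for the PPPoE+PPP headers is a common figure, but an assumption here):

```c
#include <assert.h>

/* Hypothetical sketch of ATM/AAL5 cell framing cost: frame plus
 * encapsulation overhead plus the 8-byte AAL5 trailer, padded into
 * 48-byte cell payloads, at 53 wire bytes per cell. */
static unsigned int atm_wire_len(unsigned int frame_len,
				 unsigned int encap_overhead)
{
	unsigned int payload = frame_len + encap_overhead + 8; /* AAL5 trailer */
	unsigned int cells = (payload + 47) / 48;              /* round up */

	return cells * 53;
}
```

The nasty part is the rounding: a 64-byte ack and a 40-byte ack can cost the same two cells, so any shaper that bills by IP length rather than cell count will drift.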
--
Dave Täht
Let's make wifi fast, less jittery and reliable again!
https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
* Re: [Cake] cake exploration
2015-04-11 18:44 [Cake] cake exploration Dave Taht
2015-04-11 18:45 ` Dave Taht
@ 2015-04-12 22:44 ` Jonathan Morton
1 sibling, 0 replies; 10+ messages in thread
From: Jonathan Morton @ 2015-04-12 22:44 UTC (permalink / raw)
To: cake
Okay, lots of things to think about all at once. Cherry-picking:
> 5) Byte Mode-ish handling
>
> Dropping a single 64 byte packet does little good. You will find in
> the 50 flow tests that a ton of traffic is acks, not being dropped,
> and pie does better in this case than does fq, as it shoots
> wildly at everything, but usually misses the fat packets, where DRR
> will merrily store up an entire
> MTU worth of useless acks when only one is needed.
>
> So just trying to drop more little packets might be helpful in some cases.
>
> 6) Ack thinning. I gave what is conventionally called "stretch acks" a
> new name, as stretch acks have a deserved reputation for sucking.
> Well, they don't suck anymore in linux, and what I was mostly
> thinking was to drop no more than 2 in a row...
>
> One thing this would help with is in packing wifi aggregates - which
> have hard limits on the number of packets in a TXOP (42), and a byte
> limit on wireless n of 64k. Sending 41 acks from
> one flow, when you could send the last 2, seems like a big win on
> packing a TXOP.
>
> (this is something eric proposed, and given the drop rates we now see
> from wifi and the wild and wooly internet I am inclined to agree that
> it is worth fiddling with)
>
> (I am not huge on it, though)
Okay, dropping individual 64-byte packets does little good. But transmitting 64-byte packets costs very little. On the gripping hand, receiving and processing 64-byte packets is probably disproportionately expensive; much of our forwarding CPU overhead is likely per-packet, not per-byte. (This could be tested by reducing the MTU and seeing how much the CPU saturation threshold moves.)
Here’s something important: ECN-Echo (ECE) is carried on acks.
As far as wifi is concerned, if you’ve got 42 acks from one flow aggregating into one transmit opportunity, then I’d say that’s a very rare and degenerate case. Maybe it happens occasionally on real traffic, but it would be a transient condition - not occurring continuously. So it only matters if there’s competing traffic for the same station, and that’s something we can tackle a different way.
> 7) Macaddr hashing on the nexthop instead of the 5tuple. When used on
> an internal, switched network, it would be better to try and maximize
> the port usage rather than the 5 tuple in some cases.
We’ve sort-of almost got that already with “dsthost” mode.
> Personally I do not think that the 8-way set associative hash is what
> wifi needs for cake; I tend to think we need to "pack" aggregates with
> as many different flows as possible, and randomize how we pack
> them... I think.... maybe….
Wifi needs per-flow fairness on a per-station basis, ie. not hashing the entire 5-tuple at once, but processing different parts of it separately.
That’s potentially a good thing in the more general case, actually; at the moment, with multiple hosts served by a single cake instance, one user can outcompete another by simply using more flows.
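The "process different parts of the tuple separately" idea amounts to a two-level lookup: hash the host part to pick a per-host bucket, then the full 5-tuple to pick a flow queue inside it, with DRR at the host level capping what extra flows can buy. A sketch, with arbitrary mixing constants and table sizes (not how cake later implemented host isolation):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of two-level host/flow isolation. */
struct qindex {
	uint32_t host; /* per-host DRR bucket */
	uint32_t flow; /* flow queue within that bucket */
};

static uint32_t mix(uint32_t x)
{
	x *= 0x9e3779b1u;   /* golden-ratio multiplier, arbitrary choice */
	return x ^ (x >> 16);
}

static struct qindex pick_queue(uint32_t host_ip, uint32_t tuple_hash,
				uint32_t nhosts, uint32_t nflows)
{
	struct qindex q;

	q.host = mix(host_ip) % nhosts;            /* host part only */
	q.flow = mix(tuple_hash ^ host_ip) % nflows; /* full tuple */
	return q;
}
```

Two flows from the same host always land in the same host bucket, so opening more flows only subdivides that host's own share.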
> 13) It might be possible to write a faster codel - and easier to read
> by using a case statement on the 2 core variables in it. The current
> code does not show the 3 way state machine as well as that could, and
> for all I know there is something intelligent we could do with the 4th
> state.
Yes. Easier to read code may even turn out to be faster (for a variety of reasons), and will certainly be easier for hardware folks to get their heads around.
- Jonathan Morton
* Re: [Cake] cake exploration
2015-04-11 18:48 ` Dave Taht
2015-04-11 19:12 ` Dave Taht
@ 2015-04-12 22:52 ` Jonathan Morton
1 sibling, 0 replies; 10+ messages in thread
From: Jonathan Morton @ 2015-04-12 22:52 UTC (permalink / raw)
To: cake
> On 11 Apr, 2015, at 21:48, Dave Taht <dave.taht@gmail.com> wrote:
>
> 15) Needs to work so an ISP can create service classes for their customers
>
> DRR 1: cake bandwidth X
> DRR 2: cake bandwidth Y
> DRR 3:
>
> I have no idea whether this can work at all, last I tried it DRR would
> stall everytime fq_codel had no packets to deliver.
Sounds like a classful version of cake. I may have mentioned the need for something like it before, myself.
Actually, I think the required algorithms were already in cake1. The key point is that you must track the backlog per class, and only accept the bandwidth grant from a class if there’s a backlog to service with it. Even then, I had to backtrack if Codel goes and drops the last packet in the class, so that I could either select another class or reschedule the timer.
Cake3 simplifies that slightly, in that the “reschedule the timer” option is taken care of before class selection begins.
- Jonathan Morton
* Re: [Cake] cake exploration
2015-04-11 18:47 ` Dave Taht
2015-04-11 18:48 ` Dave Taht
@ 2015-04-12 23:14 ` Jonathan Morton
2015-04-13 15:54 ` Dave Taht
1 sibling, 1 reply; 10+ messages in thread
From: Jonathan Morton @ 2015-04-12 23:14 UTC (permalink / raw)
To: Dave Taht; +Cc: cake
> On 11 Apr, 2015, at 21:47, Dave Taht <dave.taht@gmail.com> wrote:
>
> 14) strict priority queues. Some CBR techniques, notably IPTV, want 0
> packet loss, but run at a rate determined by the provider to be below
> what the subscriber will use. Sharing that "fairly" will lead to loss
> of packets to those applications.
>
> I do not like strict priority queues. I would prefer, for example,
> that the CBR application be marked with ECN and ignored, versus the
> high probability that someone will abuse a strict priority queue.
The new priority mechanism in cake3 actually still supports a hard rate-limit function, albeit with a small amount of slop in it. You would simply need to force the “bandwidth share” quantum value to zero, which would mean that the class involved only gets quanta when it’s running within its limit.
A sufficiently large “priority share” quantum value would also behave an awful lot like strict priority. This is aided by the fact that cake3 still charges the bandwidth of high priority classes to all lower priority classes - but note that if the normal strictly-decreasing structure of the classes is violated, it becomes possible to force some of the high-priority classes to operate permanently in bandwidth-sharing mode, robbing them of most of their original benefit.
I feel fairly strongly that this type of traffic should be handled in one of two ways:
- Mark it with an appropriate DSCP, such as CS3 or VA, and accept contention if it occurs.
- Permanently mark off the CBR bandwidth as unavailable to normal traffic, and configure cake to use the remainder; use a separate mechanism to have the CBR traffic bypass cake. This would be particularly appropriate for a shared broadcast stream.
As a variation on the second option, it may be that the CBR stream is only present intermittently. In that case, cake can be reconfigured on the fly by an external mechanism, to use either the full or reduced bandwidth; the bypass mechanism should remain in place meanwhile.
- Jonathan Morton
* Re: [Cake] cake exploration
2015-04-12 23:14 ` Jonathan Morton
@ 2015-04-13 15:54 ` Dave Taht
0 siblings, 0 replies; 10+ messages in thread
From: Dave Taht @ 2015-04-13 15:54 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
On Sun, Apr 12, 2015 at 4:14 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 11 Apr, 2015, at 21:47, Dave Taht <dave.taht@gmail.com> wrote:
>>
>> 14) strict priority queues. Some CBR techniques, notably IPTV, want 0
>> packet loss, but run at a rate determined by the provider to be below
>> what the subscriber will use. Sharing that "fairly" will lead to loss
>> of packets to those applications.
>>
>> I do not like strict priority queues. I would prefer, for example,
>> that the CBR application be marked with ECN and ignored, versus the
>> high probability that someone will abuse a strict priority queue.
>
> The new priority mechanism in cake3 actually still supports a hard rate-limit function, albeit with a small amount of slop in it. You would simply need to force the “bandwidth share” quantum value to zero, which would mean that the class involved only gets quanta when it’s running within its limit.
>
> A sufficiently large “priority share” quantum value would also behave an awful lot like strict priority. This is aided by the fact that cake3 still charges the bandwidth of high priority classes to all lower priority classes - but note that if the normal strictly-decreasing structure of the classes is violated, it becomes possible to force some of the high-priority classes to operate permanently in bandwidth-sharing mode, robbing them of most of their original benefit.
>
> I feel fairly strongly that this type of traffic should be handled in one of two ways:
>
> - Mark it with an appropriate DSCP, such as CS3 or VA, and accept contention if it occurs.
>
> - Permanently mark off the CBR bandwidth as unavailable to normal traffic, and configure cake to use the remainder; use a separate mechanism to have the CBR traffic bypass cake. This would be particularly appropriate for a shared broadcast stream.
In the one IPTV case I have a grip on, it is a joined multicast
stream, so it is not present when the TV is not active.
How it is done in Japan I do not know.
Free.fr's DSL device actually hides the TV traffic entirely on a
different DSL vlan(-ish) thing in the device driver, so the TV stream
is not visible to the overlying qdisc at all, and I have no idea how
to handle that.
> As a variation on the second option, it may be that the CBR stream is only present intermittently. In that case, cake can be reconfigured on the fly by an external mechanism, to use either the full or reduced bandwidth; the bypass mechanism should remain in place meanwhile.
Well, I perhaps misspoke about it being CBR. I would argue that in
many cases it is VBR.
> - Jonathan Morton
>
--
Dave Täht
Open Networking needs **Open Source Hardware**
https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67
end of thread, other threads:[~2015-04-13 15:54 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-11 18:44 [Cake] cake exploration Dave Taht
2015-04-11 18:45 ` Dave Taht
2015-04-11 18:47 ` Dave Taht
2015-04-11 18:48 ` Dave Taht
2015-04-11 19:12 ` Dave Taht
2015-04-12 2:33 ` Dave Taht
2015-04-12 22:52 ` Jonathan Morton
2015-04-12 23:14 ` Jonathan Morton
2015-04-13 15:54 ` Dave Taht
2015-04-12 22:44 ` Jonathan Morton