Cake - FQ_codel the next generation
* [Cake] cake byte limits too high by 10x
@ 2015-05-24  5:14 Dave Taht
  2015-05-25  4:47 ` Jonathan Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Taht @ 2015-05-24  5:14 UTC (permalink / raw)
  To: cake

At 100Mbit we had 5 megabytes of max queuing. I don't think this was
Jonathan's intent, as the default if no rate was specified was 1Mbyte.

Even what I did below is kind of wrong, but it did have satisfying
results for kicking in the cake ECN overload-and-switch-to-drop
behavior, and stomping on slow start before it got too out of hand.

TCP's behavior is quadratic in the buffering... (it is too late at
night for me to think harder on that), and a queue is there to absorb
bursts (and also too late to think on this).

Anyway, with this change we end up at 100Mbit with ~340 full-size
packets max under the byte limit, and about 2000 acks. Compare this to
the packet limits in the sqm system (800), which were developed
primarily by testing against 5-50Mbit workloads.  On the one hand, I
have hope we can always use less memory with a byte-limited qdisc
while still preserving good reverse-direction performance (yea! makes
for saner small router behavior). On the other hand, codel does not
react fast enough to major bursts without engaging the
out-of-bufferspace cake_drop, which is pretty darn cpu intensive. On the
gripping hand, there is no right outer limit for a wildly variable
802.11ac wifi queue, with speeds from one to 1.5Gbit, but I sure would
like a cake-mq that handled the existing queues right with less
memory.

        if(q->rate_bps)
        {
                u64 t = q->rate_bps * q->classes[0].cparams.interval;

                /* The *10 is the addition; the divisor used to be
                 * NSEC_PER_SEC / 4. Dividing before multiplying keeps
                 * the constant within the 32 bits do_div() takes. */
                do_div(t, (u32)(NSEC_PER_SEC / 4) * 10);
                q->buffer_limit = t;

                if(q->buffer_limit < 131072)
                        q->buffer_limit = 131072; // 64k too low a lower bound
        } else {
                q->buffer_limit = 1 << 20;
        }

        printk(KERN_WARNING "cake buffer_limit: %u\n", q->buffer_limit);


-- 
Dave Täht
Open Networking needs **Open Source Hardware**

https://plus.google.com/u/0/+EricRaymond/posts/JqxCe2pFr67


* Re: [Cake] cake byte limits too high by 10x
  2015-05-24  5:14 [Cake] cake byte limits too high by 10x Dave Taht
@ 2015-05-25  4:47 ` Jonathan Morton
  2015-05-25 19:46   ` Dave Taht
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Morton @ 2015-05-25  4:47 UTC (permalink / raw)
  To: Dave Taht; +Cc: cake


> On 24 May, 2015, at 08:14, Dave Taht <dave.taht@gmail.com> wrote:
> 
> at 100Mbit we had 5 megabytes of max queuing. I don't think this was
> Jonathan's intent, as the default if no rate was specified was 1Mbyte.
> 
> … On the other hand codel does not
> react fast enough to major bursts without engaging the out of
> bufferspace cake_drop which is pretty darn cpu intensive.

The 1-megabyte default is not intended to reflect the highest possible link rate.  Frankly, if you’re triggering cake_drop() without involving truly unresponsive flows, then the buffer limit is too small - you need to give Codel the time it needs to come up to speed.  I definitely should make cake_drop() more efficient, but that’s not the problem here.

I consider it *far* less important to control the short-term length of individual queues, compared to the latency observed by competing latency-sensitive traffic.

And I also think that ECN and packet-drops are too crude a tool to control queue length to the extent you want.  Hence ELR - but that’s still in the future.  (How much funding do you think we can get for that?)

> On the
> gripping hand there becomes no right outer limit for a wildly variable
> 802.11ac wifi queue, with speeds from one to 1.5Gbit, but I sure would
> like a cake-mq that handled the existing queues right with less
> memory.

Something that I haven’t had time to implement yet is a dual-mode FQ, performing both flow isolation and host isolation.  That seems like a step towards what wifi needs, as well as more directly addressing the case where swarms and sensitive traffic are running on different endpoints, without Diffserv assistance.

However, the overall design I have in mind for a wifi-aware qdisc is a little bit inside-out compared to cake.  Cake goes:

shaper -> priority -> flows -> signalling -> queues

…or, with host isolation added:

shaper -> priority -> hosts -> flows -> signalling -> queues

What wifi needs is a bit different:

(hosts/aggregates?) -> (shapers/minstrel?) -> priority -> flows -> signalling -> queues

Of course, since the priority layer is buried three+ layers deep, it’s obviously not going to use hardware 802.11e support.

 - Jonathan Morton



* Re: [Cake] cake byte limits too high by 10x
  2015-05-25  4:47 ` Jonathan Morton
@ 2015-05-25 19:46   ` Dave Taht
  2015-05-29 12:24     ` Jonathan Morton
                       ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dave Taht @ 2015-05-25 19:46 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake

On Sun, May 24, 2015 at 9:47 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 24 May, 2015, at 08:14, Dave Taht <dave.taht@gmail.com> wrote:
>>
>> at 100Mbit we had 5 megabytes of max queuing. I don't think this was
>> Jonathan's intent, as the default if no rate was specified was 1Mbyte.
>>
>> … On the other hand codel does not
>> react fast enough to major bursts without engaging the out of
>> bufferspace cake_drop which is pretty darn cpu intensive.
>
> The 1-megabyte default is not intended to reflect the highest possible link rate.  Frankly, if you’re triggering cake_drop() without involving truly unresponsive flows, then the buffer limit is too small - you need to give Codel the time it needs to come up to speed.  I definitely should make cake_drop() more efficient, but that’s not the problem here.

At this point I think that codel is too unresponsive to deal with
tons of flows in slow start, even if fq'd. Pie does better.

So I disagree. I think dropping at a sane, and pretty small, buffer
limit is in line with the theory and assumptions everyone has made
about the right sizes for buffering (which is usually in the range of
50-100 packets at most speeds!), and what I just did is a win over
excessive buffering - and a HUGE win on devices with many interfaces
and memory limitations. 300+k of buffering is quite a lot at these
speeds - the algorithm(s) settle down at about 90k at 100mbit, which
leaves plenty of room for finer levels of control and an outside limit
that is relatively sane. Fq_codel (with offloads at the 1000 packet
default) could hit 64Megabytes, and even without offloads would
hit 1.5Mbytes.

A codel-ish algorithm that reacted faster on overload overall would be
nice, and being number of flow sensitive in cake seems like a partial
way to get there, and I have a few other ideas under test.

> I consider it *far* less important to control the short-term length of individual queues, compared to the latency observed by competing latency-sensitive traffic.

I share the opinion that fq is more important than aqm. AND: both are needed.

However, there are many other objectives that need to be met also:
keeping the pipe filled and utilization high, for starters. The ideal
buffer length is 1 packet, all the time, with no idle periods. Trying to
smooth out bursts. Being resistant to attacks. Etc. THIS STUFF IS
HARD!

We need better tools to look at it. In particular, examples of typical
workloads would be saner than engineering to the dslreports or flent
tests.

>
> And I also think that ECN and packet-drops are too crude a tool to control queue length to the extent you want.  Hence ELR - but that’s still in the future.  (How much funding do you think we can get for that?)

At the moment, I am merely hoping that demand for those doing the work
increases enough so that those potential funders (as opposed to
buyers) that need it most are willing to sink an investment into those
doing the work - and are satisfied merely by having the ideas and
code in deployable form earlier.

Most of what I have fought personally is that with "employment" corps
expect to also own the ideas, and here there is no such possibility. I
have chosen to live in a yurt until the "problem" is solved enough to
move into hardware...

With open source methods, if corps stay maneuverable enough (as for
example free.fr, openwrt derivatives, and google fiber are), they can
capitalize on new inventions faster - but "owning the ideas" always
seems to end up being a higher priority in many mindsets.

There aren't enough minds on the planet, working together, to solve
these problems working alone.

>> On the
>> gripping hand there becomes no right outer limit for a wildly variable
>> 802.11ac wifi queue, with speeds from one to 1.5Gbit, but I sure would
>> like a cake-mq that handled the existing queues right with less
>> memory.
>
> Something that I haven’t had time to implement yet is a dual-mode FQ, performing both flow isolation and host isolation.  That seems like a step towards what wifi needs, as well as more directly addressing the case where swarms and sensitive traffic are running on different endpoints, without Diffserv assistance.

Pretty sure that we need way more information down at the mac80211 layer.

> However, the overall design I have in mind for a wifi-aware qdisc is a little bit inside-out compared to cake.  Cake goes:
>
> shaper -> priority -> flows -> signalling -> queues
>
> …or, with host isolation added:
>
> shaper -> priority -> hosts -> flows -> signalling -> queues
>
> What wifi needs is a bit different:
>
> (hosts/aggregates?) -> (shapers/minstrel?) -> priority -> flows -> signalling -> queues

Yep.

I don't see a huge need for "shaping" on wifi. I do see a huge need -
as a starting point - for an airtime-fair per-station queue that is
aggregation aware, and slightly higher up, something that is aware of
the overall workload on the AP. You HAVE to accept some additional
delays in queuing when you have lots of stations. Some of what I was
prototyping is applicable to that. I should start pushing branches for
each idea, I guess.


> Of course, since the priority layer is buried three+ layers deep, it’s obviously not going to use hardware 802.11e support.

I don't see a use for the QoS stuff as currently structured in cake.
Too high up.

>
>  - Jonathan Morton
>





* Re: [Cake] cake byte limits too high by 10x
  2015-05-25 19:46   ` Dave Taht
@ 2015-05-29 12:24     ` Jonathan Morton
  2015-05-29 12:36     ` Jonathan Morton
  2015-05-29 13:02     ` Jonathan Morton
  2 siblings, 0 replies; 7+ messages in thread
From: Jonathan Morton @ 2015-05-29 12:24 UTC (permalink / raw)
  To: Dave Taht; +Cc: cake


> However there are many other objectives that need to be met also -
> keeping the pipe filled and utilization high for starters. The ideal buffer
> length is 1 packet, all the time, no idle periods. trying to smooth out
> bursts. being resistant to attacks. etc. THIS STUFF IS HARD!

It's not hard. It's impossible - as long as we only have this one bit per
RTT per flow signalling capacity. Which is why I came up with ELR.

Until we get ELR, we'll have to put up with oscillating congestion window
sizes, and the inherent trade-off between utilisation and peak queue depth
they imply. ConEx and DCTCP don't solve that either; they just allow
different amounts of downward pressure at the upper end.

Even Codel implicitly acknowledges this oscillation in its design. Someone
described it recently as a self-adjusting bang-bang controller, and that's
basically fair; it has distinct "on" and "off" phases with some hysteresis
between them, just like your refrigerator. It works well for the given
conditions, but it too is inherently incapable of providing perfectly
smooth control.

And then there's the vastly different dynamics of slow start versus
congestion avoidance. Codel with ECN and either Westwood+ or CUBIC are
actually pretty good in the congestion avoidance phase. Slow start behaves
entirely differently, but it's hard to detect the optimal point at which it
should be told to back off, without risking being too aggressive at
signalling to congestion avoidance.

An observation: 50 simultaneous slow start flows with IW10 is equivalent to
one slow start flow with IW500.

- Jonathan Morton


* Re: [Cake] cake byte limits too high by 10x
  2015-05-25 19:46   ` Dave Taht
  2015-05-29 12:24     ` Jonathan Morton
@ 2015-05-29 12:36     ` Jonathan Morton
  2015-05-29 13:02     ` Jonathan Morton
  2 siblings, 0 replies; 7+ messages in thread
From: Jonathan Morton @ 2015-05-29 12:36 UTC (permalink / raw)
  To: Dave Taht; +Cc: cake


> being number of flow sensitive

I still don't entirely understand what you mean by this. It's already flow
number sensitive because it operates per flow - for N saturated flows it'll
initially send N signals per interval - and it gets more so every time I
find a way to improve the flow isolation.

- Jonathan Morton


* Re: [Cake] cake byte limits too high by 10x
  2015-05-25 19:46   ` Dave Taht
  2015-05-29 12:24     ` Jonathan Morton
  2015-05-29 12:36     ` Jonathan Morton
@ 2015-05-29 13:02     ` Jonathan Morton
  2015-05-29 17:49       ` Dave Taht
  2 siblings, 1 reply; 7+ messages in thread
From: Jonathan Morton @ 2015-05-29 13:02 UTC (permalink / raw)
  To: Dave Taht; +Cc: cake


> > What wifi needs is a bit different:
> >
> > (hosts/aggregates?) -> (shapers/minstrel?) -> priority -> flows ->
> > signalling -> queues

> Yep.

> I don't see a huge need for "shaping" on wifi. I do see a huge need - as
> a starting point - an airtime fair per station queue that is aggregation
> aware, and slightly higher up, something that is aware of the overall
> workload on the AP.

Which is basically what I'm thinking of. "Airtime fair" is also known as
"proportional fair". You need to know the data rate to each station for
that.

But knowing the data rate via minstrel, and the average expected time
between transmit opportunities to each station (which could probably be
calculated rather than measured), would allow inferring certain things that
I currently do using the shaper in cake, such as the appropriate size for
the aggregate, and the amount of bandwidth that we can allow priority in.
The only reason I can't reliably do that in cake with the shaper disabled
is because I can't sense the hardware link rate, which is exactly what
minstrel is in charge of.

Incidentally, I suspect that aggregates of equal temporal length are
valuable for multi station MIMO.

It's also pretty obvious that selective link layer acks need to be acted
on, and aggregates reformed for each transmit opportunity, whether a retry
or not. I'm very disappointed to hear that those things often aren't
already done, but it's probably a consequence of leaving that functionality
to the hardware vendors so far. It's even possible that the current mess of
binary blobs is a consequence of the weak support for aggregation in the
kernel.

Probably not so much raw code (which autocorrected to coffee TWICE) can be
reused from cake, not least due to simply operating at a different layer of
the stack, but I think a lot of the conceptual advances can.

- Jonathan Morton


* Re: [Cake] cake byte limits too high by 10x
  2015-05-29 13:02     ` Jonathan Morton
@ 2015-05-29 17:49       ` Dave Taht
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Taht @ 2015-05-29 17:49 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake

On Fri, May 29, 2015 at 6:02 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>> > What wifi needs is a bit different:
>> >
>> > (hosts/aggregates?) -> (shapers/minstrel?) -> priority -> flows ->
>> > signalling -> queues
>
>> Yep.
>
>> I don't see a huge need for "shaping" on wifi. I do see a huge need - as a
>> starting point - an airtime fair per station queue that is aggregation
>> aware, and slightly higher up, something that is aware of the overall
>> workload on the AP.
>
> Which is basically what I'm thinking of. "Airtime fair" is also known as
> "proportional fair". You need to know the data rate to each station for
> that.
>
> But knowing the data rate via minstrel, and the average expected time
> between transit opportunities to each station (which could probably be
> calculated rather than measured), would allow inferring certain things that
> I currently do using the shaper in cake, such as the appropriate size for
> the aggregate, and the amount of bandwidth that we can allow priority in.
> The only reason I can't reliably do that in cake with the shaper disabled is
> because I can't sense the hardware link rate, which is exactly what minstrel
> is in charge of.
>
> Incidentally, I suspect that aggregates of equal temporal length are
> valuable for multi station MIMO.
>
> It's also pretty obvious that selective link layer acks need to be acted on,
> and aggregates reformed for each transmit opportunity, whether a retry or
> not. I'm very disappointed to hear that those things often aren't already
> done, but it's probably a consequence of leaving that functionality to the
> hardware vendors so far. It's even possible that the current mess of binary
> blobs is a consequence of the weak support for aggregation in the kernel.

Hardware retry was very commonly used with very little control. So far
as I know the ath9k driver is now nearly pure software retry, with
very few smarts about it, but the infrastructure is there.

Reforming aggregates (and also giving up) is something that you have,
like 10us (don't quote me, relevant spec not at hand) to do. Even in a
double buffered scenario, older hardware would have trouble doing that
much work in that much time - and of course, nobody "got" how
important it was to do aggregates better in the multi-station
environment until recently.

Some hardware offloads aggregation entirely (iwl). And I've been
meaning to check into the mwl driver stack since that is a new chipset
I don't grok yet... and there are a few other new candidates.....

> Probably not so much raw code (which autocorrected to coffee TWICE) can be
> reused from cake, not least due to simply operating at a different layer of
> the stack, but I think a lot of the conceptual advances can.

While I would like cake to one day stabilize, it seems to be a good
vehicle to experiment in also, at least for a while ;). It is (to me at
least, at the moment) more important that the ideas get thrashed out -
despite the pent-up demand for a better shaper elsewhere.

Still, when (if) more resources arrive for make-wifi-fast, it will
make sense to produce a wifi "ap" std qdisc that interacts with
whatever ends up in mac80211, properly to give the two tiers of
per-station fairness and per flow "breakup" on a per station basis -
(I lean not towards calling it fq, I tend to think of it as "better
packing into an aggregate") - and freezing cake.

There are other issues up and down the stack on all sides.

> - Jonathan Morton



-- 
Dave Täht
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast
