[Bloat] lwn.net's tcp small queues vs wifi aggregation solved

Mon Jun 25 19:54:18 EDT 2018

On Mon, Jun 25, 2018 at 6:38 AM Toke Høiland-Jørgensen <toke at toke.dk> wrote:

> Michael Richardson <mcr at sandelman.ca> writes:
>
> > Jonathan Morton <chromatix99 at gmail.com> wrote:
> >     >>> I would instead frame the problem as "how can we get hardware to
> >     >>> incorporate extra packets, which arrive between the request and
> >     >>> grant phases of the MAC, into the same TXOP?"  Then we no longer
> >     >>> need to think probabilistically, or induce unnecessary delay in
> >     >>> the case that no further packets arrive.
> >     >>
> >     >> I've never looked at the ring/buffer/descriptor structure of the
> >     >> ath9k, but with most ethernet devices, they would just continue
> >     >> reading descriptors until it was empty.   Is there some reason
> >     >> that something similar can not occur?
> >     >>
> >     >> Or is the problem at a higher level?
> >     >> Or is that we don't want to enqueue packets so early, because
> >     >> it's a source of bloat?
> >
> >     > The question is of when the aggregate frame is constructed and
> >     > "frozen", using only the packets in the queue at that instant.
> >     > When the MAC grant occurs, transmission must begin immediately, so
> >     > most hardware prepares the frame in advance of that moment - but
> >     > how far in advance?
> >
> > Oh, I understand now.  The aggregate frame has to be constructed, and
> > it's this frame that is actually in the xmit queue.  I'm guessing that
> > it's in the hardware, because if it was in the driver, then we could
> > perhaps do something?
>
> No, it's in the driver for ath9k. So it would be possible to delay it
> slightly to try to build a larger one. The timing constraints are too
> tight to do it reactively when the request is granted, though; so
> delaying would result in idleness if there are no other flows to queue
> before then...
>
> Even for devices that build aggregates in firmware or hardware (as all
> AC chipsets do), it might be possible to throttle the queues at higher
> levels to try to get better batching. It's just not obvious that there's
> an algorithm that can do this in a way that will "do no harm" for other
> types of traffic, for instance...
>

Isn't this sort of delay a natural consequence of a busy channel?

What matters is not conserving txops *all the time*, but only when the
channel is busy and there aren't more txops available....

So when you are trying to transmit on a busy channel, the contention time
will naturally increase, since you won't be able to get a transmit
opportunity immediately.  So you should queue up more packets into an
aggregate in that case.

We only care about conserving txops when they are scarce, not when they are
abundant.
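
To make that concrete, here is a rough sketch of the effect in Python. It is
a toy model, not ath9k behaviour: the function name, the parameters and the
fixed arrival rate are all made up for illustration, and it assumes the
aggregate is built from whatever is queued at the moment the grant arrives
(which, per the discussion above, real hardware cannot quite do).

```python
def simulate(arrival_interval_us, contention_wait_us, txop_airtime_us,
             n_txops, max_aggregate=64):
    """Toy model: return the aggregate size used in each of n_txops TXOPs.

    Assumptions (illustrative only): one packet arrives every
    arrival_interval_us, every channel access costs a fixed
    contention_wait_us, and the aggregate is filled with everything
    queued at grant time, capped at max_aggregate.
    """
    now = 0
    next_arrival = 0
    queued = 0
    sizes = []
    for _ in range(n_txops):
        # Nothing to send yet: sit idle until the next packet shows up.
        if queued == 0 and next_arrival > now:
            now = next_arrival
        # Contend for the channel; a busier channel means a longer wait.
        now += contention_wait_us
        # Every packet that arrived up to the grant joins the queue.
        while next_arrival <= now:
            queued += 1
            next_arrival += arrival_interval_us
        # Build the aggregate from what is queued right now.
        size = min(queued, max_aggregate)
        queued -= size
        sizes.append(size)
        # The transmission itself occupies the channel.
        now += txop_airtime_us
    return sizes


if __name__ == "__main__":
    for wait in (50, 500, 1000):
        sizes = simulate(100, wait, 200, 50)
        print(wait, sum(sizes) / len(sizes))
```

Nothing in this model delays a packet on purpose: with sparse traffic every
aggregate holds a single packet, and the aggregates only grow when the
contention wait does.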

This principle is why a window system as crazy as X11 is competitive: it
naturally becomes more efficient in the face of load (more and more
requests batch up and are handled at maximum efficiency, so the system is
at maximum efficiency at full load).

Or am I missing something here?

Jim