From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jim Gettys
Date: Mon, 25 Jun 2018 19:54:18 -0400
To: Toke Høiland-Jørgensen
Cc: Michael Richardson, bloat
References: <8736xgsdcp.fsf@toke.dk> <838b212e-7a8c-6139-1306-9e60bfda926b@gmail.com> <8f80b36b-ef81-eadc-6218-350132f4d56a@pollere.com> <9dbb8dc8-bec6-8252-c063-ff0ba5fd7c1a@pollere.com> <25305.1529678986@localhost> <47EC21F5-94D2-4982-B0BE-FA1FA30E7C88@gmail.com> <18224.1529704505@localhost> <87muvjnobj.fsf@toke.dk>
In-Reply-To: <87muvjnobj.fsf@toke.dk>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000081be88056f801a74"
Subject: Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved
List-Id: General list for discussing Bufferbloat
X-BeenThere: bloat@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
X-List-Received-Date: Mon, 25 Jun 2018 23:54:36 -0000

--00000000000081be88056f801a74
Content-Type: text/plain; charset="UTF-8"

On Mon, Jun 25, 2018 at 6:38 AM Toke Høiland-Jørgensen <toke@toke.dk> wrote:

> Michael Richardson <mcr@sandelman.ca> writes:
>
> > Jonathan Morton <chromatix99@gmail.com> wrote:
> >     >>> I would instead frame the problem as "how can we get hardware to
> >     >>> incorporate extra packets, which arrive between the request and grant
> >     >>> phases of the MAC, into the same TXOP?"  Then we no longer need to
> >     >>> think probabilistically, or induce unnecessary delay in the case that
> >     >>> no further packets arrive.
> >     >>
> >     >> I've never looked at the ring/buffer/descriptor structure of the ath9k, but
> >     >> with most ethernet devices, they would just continue reading descriptors
> >     >> until it was empty.  Is there some reason that something similar can not
> >     >> occur?
> >     >>
> >     >> Or is the problem at a higher level?
> >     >> Or is that we don't want to enqueue packets so early, because it's a source
> >     >> of bloat?
> >
> >     > The question is of when the aggregate frame is constructed and
> >     > "frozen", using only the packets in the queue at that instant.  When
> >     > the MAC grant occurs, transmission must begin immediately, so most
> >     > hardware prepares the frame in advance of that moment - but how far in
> >     > advance?
> >
> > Oh, I understand now.  The aggregate frame has to be constructed, and it's
> > this frame that is actually in the xmit queue.  I'm guessing that it's in the
> > hardware, because if it was in the driver, then we could perhaps do
> > something?
>
> No, it's in the driver for ath9k. So it would be possible to delay it
> slightly to try to build a larger one. The timing constraints are too
> tight to do it reactively when the request is granted, though; so
> delaying would result in idleness if there are no other flows to queue
> before then...
>
> Even for devices that build aggregates in firmware or hardware (as all
> AC chipsets do), it might be possible to throttle the queues at higher
> levels to try to get better batching. It's just not obvious that there's
> an algorithm that can do this in a way that will "do no harm" for other
> types of traffic, for instance...

Isn't this sort of delay a natural consequence of a busy channel?

What matters is not conserving txops *all the time*, but only when the
channel is busy and there aren't more txops available....

So when you are trying to transmit on a busy channel, that contention time
will naturally increase, since you won't be able to get a transmit
opportunity immediately.  So you should queue up more packets into an
aggregate in that case.

We only care about conserving txops when they are scarce, not when they are
abundant.

This principle is why a window system as crazy as X11 is competitive: it
naturally becomes more efficient in the face of load (more and more requests
batch up and are handled at maximum efficiency, so the system is at maximum
efficiency at full load).

Or am I missing something here?

Jim

--00000000000081be88056f801a74
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


--00000000000081be88056f801a74--
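
The batching argument in the message above can be illustrated with a toy
simulation. This is only a sketch under stated assumptions, not a model of
ath9k or of real 802.11 contention: packets arrive at a fixed interval, each
transmit opportunity (txop) is granted a fixed contention delay after it is
requested, the aggregate is built from whatever is queued at grant time (the
idealized case discussed in the thread), and airtime is ignored. The function
name `simulate` and all of its parameters are hypothetical.

```python
def simulate(arrival_interval_us, contention_us, n_packets, max_aggregate=32):
    """Return the size of each aggregate sent in a toy txop model.

    Assumptions (all illustrative, none taken from a real driver):
    - packet i arrives at time i * arrival_interval_us
    - a txop is requested as soon as a packet is waiting
    - the txop is granted contention_us later
    - the aggregate holds every packet queued at grant time, capped at
      max_aggregate
    - transmission time itself is ignored
    """
    arrivals = [i * arrival_interval_us for i in range(n_packets)]
    sizes = []
    next_pkt = 0  # index of the oldest packet not yet transmitted
    now = 0       # time at which the previous txop was granted

    while next_pkt < n_packets:
        # Request a txop once the channel is free and a packet is waiting.
        request_time = max(now, arrivals[next_pkt])
        grant_time = request_time + contention_us

        # Aggregate everything that has arrived by the grant, up to the cap.
        size = 0
        while (next_pkt < n_packets
               and arrivals[next_pkt] <= grant_time
               and size < max_aggregate):
            next_pkt += 1
            size += 1

        sizes.append(size)
        now = grant_time

    return sizes


if __name__ == "__main__":
    for contention_us in (0, 1000):
        sizes = simulate(100, contention_us, 100)
        print(contention_us, len(sizes), sizes)
```

Under these assumptions, `simulate(100, 0, 100)` uses one txop per packet,
while `simulate(100, 1000, 100)` packs 11 packets into each full aggregate, so
the number of txops consumed falls as contention rises without any explicit
delay being added.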