From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <shemminger@vyatta.com>
Received: from mail-03-iad.dyndns.com (mxout-136-iad.mailhop.org
	[216.146.32.136])
	by lists.bufferbloat.net (Postfix) with ESMTP id 265202E0392
	for <bloat@lists.bufferbloat.net>; Tue, 15 Mar 2011 15:19:50 -0700 (PDT)
Received: from scan-02-iad.mailhop.org (scan-02-iad.local [10.150.0.207])
	by mail-03-iad.dyndns.com (Postfix) with ESMTP id B69578343B4
	for <bloat@lists.bufferbloat.net>; Tue, 15 Mar 2011 22:19:52 +0000 (UTC)
X-Spam-Score: 0.0 ()
X-Mail-Handler: MailHop by DynDNS
X-Originating-IP: 76.74.103.46
Received: from mail.vyatta.com (mail.vyatta.com [76.74.103.46])
	by mail-03-iad.dyndns.com (Postfix) with ESMTP id 0DC2B83439E
	for <bloat@lists.bufferbloat.net>; Tue, 15 Mar 2011 22:19:52 +0000 (UTC)
Received: from localhost (localhost.localdomain [127.0.0.1])
	by mail.vyatta.com (Postfix) with ESMTP id 3222518293E9;
	Tue, 15 Mar 2011 15:19:49 -0700 (PDT)
X-Virus-Scanned: amavisd-new at tahiti.vyatta.com
Received: from mail.vyatta.com ([127.0.0.1])
	by localhost (mail.vyatta.com [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id Rf5A6TK4FIsw; Tue, 15 Mar 2011 15:19:48 -0700 (PDT)
Received: from nehalam (pool-74-107-135-205.ptldor.fios.verizon.net
	[74.107.135.205])
	by mail.vyatta.com (Postfix) with ESMTPSA id 15005182914C;
	Tue, 15 Mar 2011 15:19:48 -0700 (PDT)
Date: Tue, 15 Mar 2011 15:19:46 -0700
From: Stephen Hemminger <shemminger@vyatta.com>
To: Jonathan Morton <chromatix99@gmail.com>
Message-ID: <20110315151946.31e86b46@nehalam>
In-Reply-To: <219C7840-ED79-49EA-929D-96C5A6200401@gmail.com>
References: <4D7F4121.40307@freedesktop.org>
	<AANLkTinu4tvVpNPDE51ERRYP7gdbJixS3_jeV6Vr6D+6@mail.gmail.com>
	<A53D1457-D873-4946-A19A-B68B4AC80BF5@gmail.com>
	<20110315175942.GA10064@goldfish>
	<1300212877.2087.2155.camel@tardy>
	<20110315183111.GB2542@tuxdriver.com>
	<29B06777-CC5F-4802-8727-B04F58CDA9E3@gmail.com>
	<20110315205146.GF2542@tuxdriver.com>
	<219C7840-ED79-49EA-929D-96C5A6200401@gmail.com>
Organization: Vyatta
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.22.0; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Random idea in reaction to all the discussion of TCP
 flavours - timestamps?
X-BeenThere: bloat@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: General list for discussing Bufferbloat <bloat.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/bloat>,
	<mailto:bloat-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/bloat>
List-Post: <mailto:bloat@lists.bufferbloat.net>
List-Help: <mailto:bloat-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/bloat>,
	<mailto:bloat-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Tue, 15 Mar 2011 22:19:51 -0000

On Wed, 16 Mar 2011 00:01:41 +0200
Jonathan Morton <chromatix99@gmail.com> wrote:

> 
> On 15 Mar, 2011, at 10:51 pm, John W. Linville wrote:
> 
> >>> If you don't throttle _both_
> >>> the _enqueue_ and the _dequeue_, then you could be keeping a nice,
> >>> near-empty tx queue on the host and still have a long, bloated queue
> >>> building at the device.
> >> 
> >> Don't devices at least let you query how full their queue is?
> > 
> > I suppose it depends on what you mean?  Presumably drivers know that,
> > or at least can figure it out.  The accuracy of that might depend on
> > the exact mechanism, how often the tx rings are replinished, etc.
> > 
> > However, I'm not aware of any API that would let something in the
> > stack (e.g. a qdisc) query the device driver for the current device
> > queue depth.  At least, I don't think Linux has one -- do other
> > kernels/stacks provide that?
> 
> I get the impression that eBDP is supposed to work relatively close to the device driver, rather than in the core network stack.  As such it's not a qdisc, but instead manages a parameter used by a well-behaved device driver.  (The number of well-behaved device drivers appears to be small at present.)
> 
> So there's a queue in the qdisc, and there's a queue in the hardware, and eBDP tries to make the latter smaller when possible, allowing the former (which is potentially much more intelligent) to do more work.
> 
> There is a tradeoff with wireless devices: if the buffer is bigger, more packets can be aggregated into a single timeslot and a greater packet loss rate can be hidden by local retransmission, but the latency gets bigger.  So bigger buffers are required when the network is running fast, and smaller buffers when it is running slow.  Packets which don't fit in the hardware buffer go to the qdisc instead.
> 
> Meanwhile the qdisc can re-order packets (eg. SFQ) so that one packet from each of a number of different flows is presented to the device in turn.  This tends to increase fairness and smoothness, and makes the delay on interactive traffic much less dependent on the queue length occupied by bulk flows.  It can also detect congestion (eg. nRED, SFB) and mark packets to cause TCPs to back off.  But the qdisc can only operate effectively, for both of these tasks, if the hardware buffers are as small as possible.
> 
> In short:
> 
>  - Network-stack queues can be large as long as they are smart.
> 
>  - Hardware buffers can be dumb but should be as small as possible.
> 
> Knowing the occupancy of the hardware buffer is useful if the size of the buffer cannot be changed, because it is then possible to simply decline to fill the buffer more than a certain amount.  If you can also assume that packets are sent in order of submission, or by some other easy rule, then you can also infer the time that the oldest packet has spent there, and use it to tune the future occupancy limit even if you can't cancel the old packet.
> 
> Cancelling old packets is potentially desirable because it allows TCPs and applications to retransmit (which they will do anyway) without fear of exacerbating a wireless congestion collapse.  I do appreciate that not all hardware will support this, however, and it should be totally unnecessary for wired links.

Have you looked at actual hardware interfaces. They usually are designed to
be "fire and go" with little to no checking by CPU. This is intentional because
of the overhead of bus and CPU access. Once packets go into the tx ring there
is no choice but to send or shutdown the device.


--