Date: Tue, 15 Mar 2011 15:19:46 -0700
From: Stephen Hemminger
To: Jonathan Morton
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Random idea in reaction to all the discussion of TCP flavours - timestamps?
Message-ID: <20110315151946.31e86b46@nehalam>
In-Reply-To: <219C7840-ED79-49EA-929D-96C5A6200401@gmail.com>

On Wed, 16 Mar 2011 00:01:41 +0200
Jonathan Morton wrote:

> On 15 Mar, 2011, at 10:51 pm, John W. Linville wrote:
>
> >>> If you don't throttle _both_ the _enqueue_ and the _dequeue_,
> >>> then you could be keeping a nice, near-empty tx queue on the
> >>> host and still have a long, bloated queue building at the
> >>> device.
> >>
> >> Don't devices at least let you query how full their queue is?
> >
> > I suppose it depends on what you mean? Presumably drivers know
> > that, or at least can figure it out. The accuracy of that might
> > depend on the exact mechanism, how often the tx rings are
> > replenished, etc.
> >
> > However, I'm not aware of any API that would let something in the
> > stack (e.g. a qdisc) query the device driver for the current
> > device queue depth. At least, I don't think Linux has one -- do
> > other kernels/stacks provide that?
>
> I get the impression that eBDP is supposed to work relatively close
> to the device driver, rather than in the core network stack. As such
> it's not a qdisc, but instead manages a parameter used by a
> well-behaved device driver. (The number of well-behaved device
> drivers appears to be small at present.)
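As I understand it, the eBDP idea reduces to something like the sketch
below: time each packet's transmit completion, keep a smoothed estimate
of the per-packet service time, and cap the hardware queue so that it
drains within a fixed delay target. This is only a sketch of the
technique, not code from any real driver; the constants and every name
in it are made up.

/* Hypothetical eBDP-style queue sizing, run once per tx completion. */
#define EWMA_WEIGHT      8        /* smoothing: 1/8 weight to new sample */
#define TARGET_DELAY_US  20000    /* drain target: 20 ms (assumed)       */
#define RING_MAX         256      /* absolute hardware ring size         */

static int serv_avg_us;                 /* smoothed per-packet service time */
static int tx_limit = RING_MAX;         /* current cap on queued packets    */

static void ebdp_tx_complete(int serv_us)
{
        int limit;

        /* EWMA of the measured service time. */
        serv_avg_us += (serv_us - serv_avg_us) / EWMA_WEIGHT;
        if (serv_avg_us <= 0)
                return;

        /* Queue no more than one delay-target's worth of packets. */
        limit = TARGET_DELAY_US / serv_avg_us;
        if (limit < 2)
                limit = 2;              /* never starve the link */
        if (limit > RING_MAX)
                limit = RING_MAX;
        tx_limit = limit;               /* driver stops accepting packets
                                         * beyond this */
}

With something like that in place, everything above tx_limit stays in
the qdisc, which is where the smarter queue management can get at it.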
>
> So there's a queue in the qdisc, and there's a queue in the hardware,
> and eBDP tries to make the latter smaller when possible, allowing the
> former (which is potentially much more intelligent) to do more work.
>
> There is a tradeoff with wireless devices: if the buffer is bigger,
> more packets can be aggregated into a single timeslot and a greater
> packet-loss rate can be hidden by local retransmission, but the
> latency gets bigger. So bigger buffers are required when the network
> is running fast, and smaller buffers when it is running slow. Packets
> which don't fit in the hardware buffer go to the qdisc instead.
>
> Meanwhile the qdisc can re-order packets (e.g. SFQ) so that one
> packet from each of a number of different flows is presented to the
> device in turn. This tends to increase fairness and smoothness, and
> makes the delay on interactive traffic much less dependent on the
> queue length occupied by bulk flows. It can also detect congestion
> (e.g. nRED, SFB) and mark packets to cause TCPs to back off. But the
> qdisc can only operate effectively, for both of these tasks, if the
> hardware buffers are as small as possible.
>
> In short:
>
> - Network-stack queues can be large as long as they are smart.
>
> - Hardware buffers can be dumb but should be as small as possible.
>
> Knowing the occupancy of the hardware buffer is useful if the size of
> the buffer cannot be changed, because it is then possible simply to
> decline to fill the buffer beyond a certain point. If you can also
> assume that packets are sent in order of submission, or by some other
> simple rule, then you can infer how long the oldest packet has been
> waiting, and use that to tune the future occupancy limit even if you
> can't cancel the old packet.
>
> Cancelling old packets is potentially desirable because it allows
> TCPs and applications to retransmit (which they will do anyway)
> without fear of exacerbating a wireless congestion collapse. I do
> appreciate that not all hardware will support this, however, and it
> should be totally unnecessary for wired links.

Have you looked at actual hardware interfaces? They are usually
designed to be "fire and forget", with little or no checking by the
CPU. This is intentional, because of the overhead of bus and CPU
access. Once packets go into the tx ring there is no choice but to
send them or shut down the device.
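Roughly, the transmit path looks like the sketch below, which also
shows the one lever a driver does have: refusing to post more work.
Every name and the byte budget here are hypothetical, just to make the
point concrete.

struct tx_ring {
        unsigned int head, tail, size;    /* descriptor indices        */
        unsigned int inflight_bytes;      /* posted, not yet completed */
};

#define INFLIGHT_BUDGET (64 * 1024)       /* assumed cap on queued bytes */

/* Stand-ins for the hardware-specific steps. */
static void post_descriptor(struct tx_ring *r, const void *pkt,
                            unsigned int len)
{
        (void)r; (void)pkt; (void)len;    /* fill and publish a descriptor */
}

static void ring_doorbell(struct tx_ring *r)
{
        (void)r;                          /* MMIO write that starts DMA */
}

static int xmit(struct tx_ring *r, const void *pkt, unsigned int len)
{
        /* Ring full, or enough bytes already committed to the NIC:
         * push back on the stack instead of deepening the queue. */
        if ((r->head + 1) % r->size == r->tail ||
            r->inflight_bytes + len > INFLIGHT_BUDGET)
                return -1;                /* caller stops the tx queue */

        post_descriptor(r, pkt, len);     /* hand the buffer to DMA    */
        r->inflight_bytes += len;
        r->head = (r->head + 1) % r->size;
        ring_doorbell(r);                 /* from here the NIC owns it:
                                           * no cancel, no query       */
        return 0;
}

/* Completion interrupt: the only feedback, and it arrives after the
 * packet is already on the wire. */
static void tx_complete(struct tx_ring *r, unsigned int len)
{
        r->inflight_bytes -= len;
        r->tail = (r->tail + 1) % r->size;
}

Refusing at enqueue time is about the only control that doesn't defeat
the point of the ring: everything already posted is committed, so the
way to keep latency down is to keep the in-flight budget small and let
the qdisc hold the rest.

--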