From: Jim Gettys
Organization: Bell Labs
Date: Mon, 07 Mar 2011 17:01:49 -0500
To: Justin McCann
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Getting current interface queue sizes
Message-ID: <4D7555CD.1090304@freedesktop.org>

On 03/07/2011 04:18 PM, Justin McCann wrote:
> On Mon, Mar 7, 2011 at 1:28 PM, Jim Gettys wrote:
>
>     Cisco is far from unique. I found it impossible to get this
>     information from Linux. Dunno about other operating systems.
>     It's one of the things we need to fix in general.
>
> So I'm not the only one. :) I'm looking to get this for Linux, and am
> willing to implement it if necessary, and was looking for the One True
> Way. I assume reporting back through netlink is the way to go.

Please do. The lack of these stats meant I had to base my conclusions
on the experiments I described in my blog using ethtool, and on
extrapolation to wireless (since Linux wireless drivers have not
implemented the ring control commands in the past). I first went
looking for the simple instantaneous number of packets queued, and came
up empty. (Two rough sketches of what is and is not queryable today are
appended at the end of this message.)

The hardware drivers, of course, are often running multiple queues
these days. There went several more days of head scratching and
emailing people, when a simple number would have answered the question
so much faster.

>     Exactly what the right metric(s) is (are) is interesting, of
>     course. The problem with only providing instantaneous queue depth
>     is that while it tells you that you are currently suffering, it
>     won't really help you detect transient bufferbloat due to web
>     traffic, etc., unless you sample at a very high rate. I really
>     care about those frequent 100-200 ms impulses I see in my
>     traffic. So a bit of additional information would be goodness.
>
> My PhD research is focused on automatically diagnosing these sorts of
> hiccups on a local host.
> I collect a common set of statistics across the entire local stack
> every 100 ms, then run a diagnosis algorithm to detect which parts of
> the stack (connections, applications, interfaces) aren't doing their
> job sending/receiving packets.

I don't mind (for diagnosis) a tool querying once every 100 ms; the
problem is that transient bufferbloat happens on much finer time scales
than this, and you can easily miss these short impulses, which
themselves last only tens to a hundred milliseconds. So instantaneous
measurements by themselves will miss a lot.

> Among the research questions: what stats are necessary/sufficient for
> this kind of diagnosis, what should their semantics be, and what's the
> largest useful sample interval?

Dunno. I suspect we should take this class of questions to the bloat
list until it's time to write code, as we'll drown out people doing
development here on the bloat-devel list.

> It turns out that when send/recv stops altogether, the queue lengths
> indicate where things are being held up, leading to this discussion.
> I have them for TCP (via web100), but since my diagnosis rules are
> generic, I'd like to get them for the interfaces as well. I don't
> expect that the Ethernet driver would stop transmitting for a few
> hundred milliseconds at a time, but a wireless driver might have to.

Certainly true under some circumstances, particularly on 3G. On 3G,
unless you are really obnoxious to the carrier (which I don't
recommend), once you go idle it will take quite a while before you can
get airtime again.

    - Jim
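
For concreteness, here is a rough, untested sketch of the one piece
that is queryable today: the ring parameters exposed by the
ETHTOOL_GRINGPARAM command over the SIOCETHTOOL ioctl (the same data
"ethtool -g" prints). Note that this reports configured and maximum
TX/RX ring sizes, i.e. capacity, not the instantaneous occupancy we
could not find; wireless drivers that never implemented the ring ops
will simply fail the ioctl.

/* Sketch (untested): read a NIC's TX/RX ring sizes via the ethtool ioctl.
 * This shows configured capacity, not how many packets are queued now. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/types.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(int argc, char **argv)
{
    const char *ifname = argc > 1 ? argv[1] : "eth0";

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct ethtool_ringparam ring;
    memset(&ring, 0, sizeof(ring));
    ring.cmd = ETHTOOL_GRINGPARAM;

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (void *)&ring;

    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("SIOCETHTOOL");  /* e.g. EOPNOTSUPP on many wireless drivers */
        close(fd);
        return 1;
    }

    printf("%s: tx ring %u/%u, rx ring %u/%u (current/max descriptors)\n",
           ifname, ring.tx_pending, ring.tx_max_pending,
           ring.rx_pending, ring.rx_max_pending);
    close(fd);
    return 0;
}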
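
And a sketch of the other piece that already exists: the qdisc-level
backlog counters that "tc -s qdisc" displays, read here with an
RTM_GETQDISC dump over rtnetlink (parsing only the legacy flat
TCA_STATS attribute to keep it short). This covers the queueing
discipline only, not the driver ring or hardware queues, which is
exactly the missing part discussed above. Running it in a loop every
100 ms would approximate the kind of sampling Justin describes, with
the caveat above about missing shorter impulses. Untested, and error
handling is trimmed.

/* Sketch (untested): read the qdisc backlog for one interface via an
 * rtnetlink RTM_GETQDISC dump.  This reports only what the queueing
 * discipline holds; driver rings and hardware queues are not included. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/pkt_sched.h>   /* struct tc_stats */

int main(int argc, char **argv)
{
    const char *ifname = argc > 1 ? argv[1] : "eth0";
    unsigned int ifindex = if_nametoindex(ifname);

    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) { perror("socket"); return 1; }

    /* Ask the kernel to dump all qdiscs; filter by ifindex on our side. */
    struct {
        struct nlmsghdr nlh;
        struct tcmsg    tcm;
    } req;
    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct tcmsg));
    req.nlh.nlmsg_type  = RTM_GETQDISC;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.tcm.tcm_family  = AF_UNSPEC;

    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    if (sendto(fd, &req, req.nlh.nlmsg_len, 0,
               (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        perror("sendto");
        return 1;
    }

    char buf[16384];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0) {
        int len = (int)n;
        struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
        for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
            if (nlh->nlmsg_type == NLMSG_DONE)
                goto done;
            if (nlh->nlmsg_type != RTM_NEWQDISC)
                continue;

            struct tcmsg *t = NLMSG_DATA(nlh);
            if (ifindex && (unsigned int)t->tcm_ifindex != ifindex)
                continue;

            const char *kind = "?";
            struct tc_stats st;
            memset(&st, 0, sizeof(st));

            /* Walk the attributes; TCA_STATS carries a flat tc_stats
             * with qlen (packets) and backlog (bytes). */
            struct rtattr *rta = TCA_RTA(t);
            int alen = TCA_PAYLOAD(nlh);
            for (; RTA_OK(rta, alen); rta = RTA_NEXT(rta, alen)) {
                if (rta->rta_type == TCA_KIND)
                    kind = RTA_DATA(rta);
                else if (rta->rta_type == TCA_STATS)
                    memcpy(&st, RTA_DATA(rta),
                           RTA_PAYLOAD(rta) < sizeof(st) ?
                               RTA_PAYLOAD(rta) : sizeof(st));
            }
            printf("%s qdisc %s: backlog %u bytes, %u packets\n",
                   ifname, kind, st.backlog, st.qlen);
        }
    }
done:
    close(fd);
    return 0;
}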