From: Jim Gettys
Organization: Bell Labs
Date: Mon, 07 Mar 2011 17:01:49 -0500
To: Justin McCann
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Getting current interface queue sizes
Message-ID: <4D7555CD.1090304@freedesktop.org>

On 03/07/2011 04:18 PM, Justin McCann wrote:
> On Mon, Mar 7, 2011 at 1:28 PM, Jim Gettys wrote:
>
>     Cisco is far from unique. I found it impossible to get this
>     information from Linux. Dunno about other operating systems.
>     It's one of the things we need to fix in general.
>
> So I'm not the only one. :) I'm looking to get this for Linux, and am
> willing to implement it if necessary, and was looking for the One True
> Way. I assume reporting back through netlink is the way to go.

Please do. The lack of these stats meant I had to base my conclusions
on the experiments I described in my blog using ethtool, and on
extrapolation to wireless (since Linux wireless drivers have not
implemented the ring control commands in the past). I first went
looking for the simple instantaneous number of packets queued, and came
up empty. (Two rough sketches of what is and is not queryable today are
appended at the end of this message.)

The hardware drivers, of course, are often running multiple queues
these days. There went several more days of head scratching and
emailing people, when a simple number would have answered the question
so much faster.

>     Exactly what the right metric(s) is (are) is interesting, of
>     course. The problem with only providing instantaneous queue depth
>     is that while it tells you that you are currently suffering, it
>     won't really help you detect transient bufferbloat due to web
>     traffic, etc., unless you sample at a very high rate. I really
>     care about those frequent 100-200 ms impulses I see in my
>     traffic. So a bit of additional information would be goodness.
>
> My PhD research is focused on automatically diagnosing these sorts of
> hiccups on a local host.
> I collect a common set of statistics across the entire local stack
> every 100 ms, then run a diagnosis algorithm to detect which parts of
> the stack (connections, applications, interfaces) aren't doing their
> job sending/receiving packets.

I don't mind (for diagnosis) a tool querying once every 100 ms; the
problem is that transient bufferbloat happens on much finer time scales
than this, and you can easily miss these short impulses, which
themselves last only tens to a hundred milliseconds. So instantaneous
measurements by themselves will miss a lot.

> Among the research questions: what stats are necessary/sufficient for
> this kind of diagnosis, what should their semantics be, and what's the
> largest useful sample interval?

Dunno. I suspect we should take this class of questions to the bloat
list until it's time to write code, as we'll drown out people doing
development here on the bloat-devel list.

> It turns out that when send/recv stops altogether, the queue lengths
> indicate where things are being held up, leading to this discussion.
> I have them for TCP (via web100), but since my diagnosis rules are
> generic, I'd like to get them for the interfaces as well. I don't
> expect that the Ethernet driver would stop transmitting for a few
> hundred milliseconds at a time, but a wireless driver might have to.

Certainly true under some circumstances, particularly on 3G. On 3G,
unless you are really obnoxious to the carrier (which I don't
recommend), once you go idle it will take quite a while before you can
get airtime again.

    - Jim
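
For concreteness, here is a rough, untested sketch of the one piece
that is queryable today: the ring parameters exposed by the
ETHTOOL_GRINGPARAM command over the SIOCETHTOOL ioctl (the same data
"ethtool -g" prints). Note that this reports configured and maximum
TX/RX ring sizes, i.e. capacity, not the instantaneous occupancy we
could not find; wireless drivers that never implemented the ring ops
will simply fail the ioctl.

/* Sketch (untested): read a NIC's TX/RX ring sizes via the ethtool ioctl.
 * This shows configured capacity, not how many packets are queued now. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/types.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(int argc, char **argv)
{
    const char *ifname = argc > 1 ? argv[1] : "eth0";

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct ethtool_ringparam ring;
    memset(&ring, 0, sizeof(ring));
    ring.cmd = ETHTOOL_GRINGPARAM;

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (void *)&ring;

    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("SIOCETHTOOL");  /* e.g. EOPNOTSUPP on many wireless drivers */
        close(fd);
        return 1;
    }

    printf("%s: tx ring %u/%u, rx ring %u/%u (current/max descriptors)\n",
           ifname, ring.tx_pending, ring.tx_max_pending,
           ring.rx_pending, ring.rx_max_pending);
    close(fd);
    return 0;
}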
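
And a sketch of the other piece that already exists: the qdisc-level
backlog counters that "tc -s qdisc" displays, read here with an
RTM_GETQDISC dump over rtnetlink (parsing only the legacy flat
TCA_STATS attribute to keep it short). This covers the queueing
discipline only, not the driver ring or hardware queues, which is
exactly the missing part discussed above. Running it in a loop every
100 ms would approximate the kind of sampling Justin describes, with
the caveat above about missing shorter impulses. Untested, and error
handling is trimmed.

/* Sketch (untested): read the qdisc backlog for one interface via an
 * rtnetlink RTM_GETQDISC dump.  This reports only what the queueing
 * discipline holds; driver rings and hardware queues are not included. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/pkt_sched.h>   /* struct tc_stats */

int main(int argc, char **argv)
{
    const char *ifname = argc > 1 ? argv[1] : "eth0";
    unsigned int ifindex = if_nametoindex(ifname);

    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) { perror("socket"); return 1; }

    /* Ask the kernel to dump all qdiscs; filter by ifindex on our side. */
    struct {
        struct nlmsghdr nlh;
        struct tcmsg    tcm;
    } req;
    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct tcmsg));
    req.nlh.nlmsg_type  = RTM_GETQDISC;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.tcm.tcm_family  = AF_UNSPEC;

    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    if (sendto(fd, &req, req.nlh.nlmsg_len, 0,
               (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        perror("sendto");
        return 1;
    }

    char buf[16384];
    ssize_t n;
    while ((n = recv(fd, buf, sizeof(buf), 0)) > 0) {
        int len = (int)n;
        struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
        for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
            if (nlh->nlmsg_type == NLMSG_DONE)
                goto done;
            if (nlh->nlmsg_type != RTM_NEWQDISC)
                continue;

            struct tcmsg *t = NLMSG_DATA(nlh);
            if (ifindex && (unsigned int)t->tcm_ifindex != ifindex)
                continue;

            const char *kind = "?";
            struct tc_stats st;
            memset(&st, 0, sizeof(st));

            /* Walk the attributes; TCA_STATS carries a flat tc_stats
             * with qlen (packets) and backlog (bytes). */
            struct rtattr *rta = TCA_RTA(t);
            int alen = TCA_PAYLOAD(nlh);
            for (; RTA_OK(rta, alen); rta = RTA_NEXT(rta, alen)) {
                if (rta->rta_type == TCA_KIND)
                    kind = RTA_DATA(rta);
                else if (rta->rta_type == TCA_STATS)
                    memcpy(&st, RTA_DATA(rta),
                           RTA_PAYLOAD(rta) < sizeof(st) ?
                               RTA_PAYLOAD(rta) : sizeof(st));
            }
            printf("%s qdisc %s: backlog %u bytes, %u packets\n",
                   ifname, kind, st.backlog, st.qlen);
        }
    }
done:
    close(fd);
    return 0;
}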