On Mon, Mar 7, 2011 at 1:28 PM, Jim Gettys wrote:

> Cisco is far from unique. I found it impossible to get this information
> from Linux. Dunno about other operating systems.
> It's one of the things we need to fix in general.

So I'm not the only one. :) I'm looking to get this for Linux, and am willing to implement it if necessary, and was looking for the One True Way. I assume reporting back through netlink is the way to go.

> Exactly what the right metric(s) is (are), is interesting, of course. The
> problem with only providing instantaneous queue depth is that while it tells
> you you are currently suffering, it won't really help you detect transient
> bufferbloat due to web traffic, etc, unless you sample at a very high rate.
> I really care about those frequent 100-200ms impulses I see in my traffic.
> So a bit of additional information would be goodness.

My PhD research is focused on automatically diagnosing these sorts of hiccups on a local host. I collect a common set of statistics across the entire local stack every 100 ms, then run a diagnosis algorithm to detect which parts of the stack (connections, applications, interfaces) aren't doing their job sending or receiving packets. Among the research questions: what stats are necessary/sufficient for this kind of diagnosis, what their semantics should be, and what the largest useful sample interval is.

It turns out that when send/recv stops altogether, the queue lengths indicate where things are being held up, which is what led to this discussion. I have them for TCP (via web100), but since my diagnosis rules are generic, I'd like to get them for the interfaces as well. I don't expect that an Ethernet driver would stop transmitting for a few hundred ms at a time, but a wireless driver might have to.

Justin
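
P.S. In case it helps make the 100 ms sampling concrete, here is a minimal sketch of what I have in mind for the interface side. It just scrapes the backlog counters out of "tc -s qdisc show dev <iface>" once per interval; a real implementation would query rtnetlink directly, and the interface name and interval below are only placeholders.

#!/usr/bin/env python
# Rough sketch: poll the qdisc backlog for one interface every 100 ms by
# scraping `tc -s qdisc show`. A real implementation would query rtnetlink
# (RTM_GETQDISC) directly instead of parsing tool output.
import re
import subprocess
import time

IFACE = "eth0"        # placeholder interface name
INTERVAL = 0.1        # 100 ms sample interval

# tc prints a line like: "backlog 128520b 85p requeues 0"
backlog_re = re.compile(r"backlog\s+(\S+)b\s+(\d+)p")

while True:
    out = subprocess.check_output(["tc", "-s", "qdisc", "show", "dev", IFACE])
    m = backlog_re.search(out.decode())
    if m:
        # bytes and packets currently sitting in the qdisc
        print("%.3f backlog=%sB %sp" % (time.time(), m.group(1), m.group(2)))
    time.sleep(INTERVAL)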