Mice [was Re: QoS for system critical packets on wireless]

Wed Jun 22 15:39:48 EDT 2011

On Wed, Jun 22, 2011 at 11:17 AM, Dave Taht <dave.taht at gmail.com> wrote:
> The biggest fallout of the diffserv work I was trying was observing
> that most packets fell into one of 3 buckets:
>
> 1) System control and 'MICE' are < less than 1% of all packets. Mice
> includes a bunch of messages like ARP, NTP, UDP, and most of the icmp6
> portion of the stack, in what I'm doing currently. Mice are
> desperately needed for the network to continue to function.
>
> 2) unresponsive streams and udp (varying, but generally pretty low)
> 3) responsive tcp streams (closer to 99%)

Dave,

I want to avoid hijacking your thread, because I think the QoS issues
you're raising are important, but I think you've misused the term
'mice'. Mice are short-lived TCP flows (see
http://www.google.com/search?q=mice+elephants+tcp), not these
control/management-related low-bandwidth protocols. Obviously the
control/management protocols need priority on the network. "Mice"
really belongs in category 3), which often gets broken down into 3a)
mice and 3b) elephants.

To resurrect an ancient thread, what Kathie Nichols was referring to
is a more sinister problem: short-lived TCP flows make up a lot of the
*number of flows* on the Internet (think of all the little JS, CSS and
image pulls a web page causes) but not the majority of bytes.
You can make long-lived high-bandwidth flows behave without starvation
using ECN and the like, and a lot (most?) of AQM research has focused
mainly on that.

The problem is, how do you handle tons of flows which each transmit
just a few packets? A high-bandwidth flow using SACK recovers easily
when one or more of its packets are head-dropped. However, if your
queue is full of packets from many different flows that each consist
of just a few packets, dropping *even one* might cause that flow to
hit a retransmission timeout, since too few packets follow the loss to
generate the duplicate ACKs that fast retransmit relies on. The
overall connection-completion latency (SYN to FIN) goes up
significantly, which is especially bad if the connections are done
serially.
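
To make the timeout argument concrete, here is a rough Python sketch.
It rests on a deliberately simplified model of my own, not on any real
TCP stack: the whole flow is sent back to back in one window, exactly
one packet is dropped, every later packet arrives and triggers one
duplicate ACK, and the sender needs three duplicate ACKs before it
fast-retransmits. The function names are made up for illustration.

```python
DUPACK_THRESHOLD = 3  # usual fast-retransmit trigger: three duplicate ACKs


def recovery_after_single_drop(flow_packets, dropped_index):
    """Return 'fast-retransmit' or 'timeout' for one dropped packet.

    Simplified model (an assumption, not a real TCP implementation):
    all packets of the flow are sent back to back in one window, only
    packet `dropped_index` (0-based) is lost, and each packet arriving
    after the hole produces exactly one duplicate ACK.
    """
    if not 0 <= dropped_index < flow_packets:
        raise ValueError("dropped_index must be inside the flow")
    dupacks = flow_packets - dropped_index - 1  # packets sent after the hole
    return "fast-retransmit" if dupacks >= DUPACK_THRESHOLD else "timeout"


def timeout_fraction(flow_packets):
    """Fraction of possible drop positions that force a timeout."""
    outcomes = [recovery_after_single_drop(flow_packets, i)
                for i in range(flow_packets)]
    return outcomes.count("timeout") / flow_packets


if __name__ == "__main__":
    for size in (3, 10, 100):
        print(size, timeout_fraction(size))
```

In this model a 3-packet flow times out wherever the drop lands, while
a 100-packet flow times out only if one of its last three packets is
lost.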

When you have a relatively small number of high-bandwidth flows,
dropping packets can quickly change the fill rate at the bottleneck
queue. When you have a lot of small flows, dropping packets from only
a few flows doesn't make much difference: each flow doesn't
contribute as much to the overall sustained rate, so it can't reduce
it by much either.
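
A back-of-the-envelope version of that arithmetic, as a Python sketch.
The assumptions are mine and quite idealized: every flow sends at the
same rate, and a flow that loses a packet multiplies its rate by a
fixed backoff factor (0.5, i.e. halving). Short flows that never leave
slow start won't behave this neatly, so treat the numbers as
illustrative only.

```python
import math


def aggregate_reduction(num_flows, flows_hit, backoff=0.5):
    """Fractional drop in the total arrival rate at the bottleneck.

    Assumes (for illustration) that all `num_flows` flows send at the
    same rate and each of the `flows_hit` flows that loses a packet
    multiplies its own rate by `backoff`.
    """
    if num_flows <= 0 or not 0 <= flows_hit <= num_flows:
        raise ValueError("need 0 <= flows_hit <= num_flows and num_flows > 0")
    per_flow_share = 1.0 / num_flows
    return flows_hit * per_flow_share * (1.0 - backoff)


def flows_to_hit(num_flows, target_reduction, backoff=0.5):
    """Flows that must back off to cut the total rate by `target_reduction`."""
    needed = target_reduction * num_flows / (1.0 - backoff)
    # round() guards against floating-point noise before taking the ceiling
    return min(num_flows, math.ceil(round(needed, 9)))


if __name__ == "__main__":
    print(aggregate_reduction(4, 2))      # 2 of 4 flows halve their rate
    print(aggregate_reduction(1000, 2))   # 2 of 1000 flows halve their rate
    print(flows_to_hit(1000, 0.25))
```

Under these assumptions, hitting 2 of 4 flows cuts the aggregate rate
by a quarter, while hitting 2 of 1000 cuts it by a tenth of a percent;
getting the same quarter out of 1000 flows means making 500 of them
back off.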

I'm not really read up on all the various queueing disciplines. Maybe
when you have a large number of flows in your queue, you need to drop
more packets (from many of them) to get a corresponding drop in the
fill rate. Tracking flows per IP address pair rather than per TCP/UDP
connection may also make up the difference when you've got a lot of
separate TCP connections between the same hosts.
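
The IP-to-IP versus TCP/UDP distinction can be sketched in Python as
two different flow keys applied to the same queue. The packet
representation (plain dicts), the helper names, and the sample queue
are all hypothetical; this is not how any particular qdisc classifies
packets.

```python
from collections import Counter


def five_tuple_key(pkt):
    """Flow = one TCP/UDP connection."""
    return (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])


def ip_pair_key(pkt):
    """Flow = everything between one source and one destination address."""
    return (pkt["src"], pkt["dst"])


def packets_per_flow(queue, key):
    """Count queued packets per flow under the given flow definition."""
    return Counter(key(pkt) for pkt in queue)


def sample_queue():
    """Made-up queue: one client with six two-packet web connections,
    plus another client with a single four-packet connection."""
    queue = []
    for sport in range(50000, 50006):
        queue += [dict(src="192.0.2.10", dst="198.51.100.1", proto="tcp",
                       sport=sport, dport=80)] * 2
    queue += [dict(src="192.0.2.20", dst="198.51.100.1", proto="tcp",
                   sport=40000, dport=80)] * 4
    return queue


if __name__ == "__main__":
    q = sample_queue()
    print(packets_per_flow(q, five_tuple_key).most_common(1))
    print(packets_per_flow(q, ip_pair_key).most_common(1))
```

Keyed per connection, the sample queue holds seven flows and the
heaviest is the four-packet one. Keyed per address pair, it holds two
flows and the client with six short connections is the heaviest, with
twelve packets queued.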

I think most consumer ISPs' approach to elephant flows is to simply
rate-limit them after a period of time. So much for that rate you
thought you were paying for.

      Justin