[Bloat] Random idea in reaction to all the discussion of TCP flavours - timestamps?

Tue Mar 15 18:01:41 EDT 2011

On 15 Mar, 2011, at 10:51 pm, John W. Linville wrote:

>>> If you don't throttle _both_
>>> the _enqueue_ and the _dequeue_, then you could be keeping a nice,
>>> near-empty tx queue on the host and still have a long, bloated queue
>>> building at the device.
>> 
>> Don't devices at least let you query how full their queue is?
> 
> I suppose it depends on what you mean?  Presumably drivers know that,
> or at least can figure it out.  The accuracy of that might depend on
> the exact mechanism, how often the tx rings are replenished, etc.
> 
> However, I'm not aware of any API that would let something in the
> stack (e.g. a qdisc) query the device driver for the current device
> queue depth.  At least, I don't think Linux has one -- do other
> kernels/stacks provide that?

I get the impression that eBDP is supposed to work relatively close to the device driver, rather than in the core network stack.  As such it's not a qdisc, but instead manages a parameter used by a well-behaved device driver.  (The number of well-behaved device drivers appears to be small at present.)

So there's a queue in the qdisc, and there's a queue in the hardware, and eBDP tries to make the latter smaller when possible, allowing the former (which is potentially much more intelligent) to do more work.

There is a tradeoff with wireless devices: if the buffer is bigger, more packets can be aggregated into a single timeslot and a greater packet loss rate can be hidden by local retransmission, but the latency increases.  So bigger buffers are warranted when the link is running fast, and smaller buffers when it is running slow, since the same number of packets takes longer to drain at a lower rate.  Packets which don't fit in the hardware buffer go to the qdisc instead.
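
To put rough numbers on that tradeoff, here is a minimal sketch of sizing the hardware buffer from the current link rate and a delay budget.  This is my own illustration and not the eBDP algorithm itself; the function name, the 1500-byte packet size and the clamp values are all assumptions.

```python
def hw_buffer_limit(rate_bps, target_delay_ms, pkt_bytes=1500,
                    min_pkts=2, max_pkts=256):
    """Number of packets the link can drain within target_delay_ms.

    Hypothetical helper: integer arithmetic only, clamped so the buffer
    never drops below min_pkts or grows beyond max_pkts.
    """
    bits_per_pkt = 8 * pkt_bytes
    # bits drained in the delay budget, divided by bits per packet
    pkts = (rate_bps * target_delay_ms) // (bits_per_pkt * 1000)
    return max(min_pkts, min(max_pkts, pkts))


# The same 10 ms budget at two link rates: a fast link earns a deeper
# buffer, a slow one is held at the floor.
fast = hw_buffer_limit(54_000_000, 10)
slow = hw_buffer_limit(1_000_000, 10)
```

A real driver would also have to account for aggregation and retransmission overhead, which this ignores.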

Meanwhile the qdisc can re-order packets (e.g. SFQ) so that one packet from each of a number of different flows is presented to the device in turn.  This tends to increase fairness and smoothness, and makes the delay on interactive traffic much less dependent on the queue length occupied by bulk flows.  It can also detect congestion (e.g. nRED, SFB) and mark packets to cause TCPs to back off.  But the qdisc can only do either job effectively if the hardware buffers are as small as possible, since anything already sitting in the hardware queue is beyond its reach.
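
The re-ordering idea can be sketched as a plain round-robin over per-flow queues.  This is a simplification and not the real SFQ, which hashes flows into a fixed number of buckets and perturbs the hash periodically; the class name and flow identifiers here are made up for illustration.

```python
from collections import OrderedDict, deque


class FlowRoundRobin:
    """Toy per-flow round-robin scheduler (hypothetical, not SFQ itself)."""

    def __init__(self):
        # flow id -> queue of packets; dict order is the service order
        self.flows = OrderedDict()

    def enqueue(self, flow_id, pkt):
        self.flows.setdefault(flow_id, deque()).append(pkt)

    def dequeue(self):
        if not self.flows:
            return None
        # Serve the flow at the head of the rotation...
        flow_id, queue = self.flows.popitem(last=False)
        pkt = queue.popleft()
        # ...and send it to the back if it still has packets waiting.
        if queue:
            self.flows[flow_id] = queue
        return pkt


sched = FlowRoundRobin()
for name in ("b1", "b2", "b3", "b4"):
    sched.enqueue("bulk", name)
sched.enqueue("ssh", "i1")
order = [sched.dequeue() for _ in range(5)]
```

The interactive packet goes out second, behind one bulk packet rather than four, which is the point: its delay depends on the number of competing flows and not on how deep the bulk flow's queue is.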

In short:

 - Network-stack queues can be large as long as they are smart.

 - Hardware buffers can be dumb but should be as small as possible.

Knowing the occupancy of the hardware buffer is useful if the size of the buffer cannot be changed, because it is then possible to simply decline to fill the buffer more than a certain amount.  If you can also assume that packets are sent in order of submission, or by some other easy rule, then you can also infer the time that the oldest packet has spent there, and use it to tune the future occupancy limit even if you can't cancel the old packet.
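
As a rough sketch of that scheme, assuming strict in-order transmission and a completion notification per packet: record when each packet is submitted, measure how long the oldest one waited when it completes, and adjust the occupancy limit accordingly.  The class, its parameters and the halve-or-increment policy are all my own assumptions, chosen only to make the idea concrete.

```python
from collections import deque


class OccupancyLimiter:
    """Hypothetical self-imposed fill limit for a fixed-size hardware buffer."""

    def __init__(self, target_delay, limit=64, min_limit=1, max_limit=256):
        self.target_delay = target_delay
        self.limit = limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.in_flight = deque()  # submission times, oldest first

    def can_submit(self):
        # Decline to fill the buffer beyond the current limit.
        return len(self.in_flight) < self.limit

    def submit(self, now):
        self.in_flight.append(now)

    def complete(self, now):
        """Call when the oldest packet has been sent (FIFO assumed)."""
        sojourn = now - self.in_flight.popleft()
        if sojourn > self.target_delay:
            # Packet waited too long: shrink the limit quickly.
            self.limit = max(self.min_limit, self.limit // 2)
        else:
            # Delay is acceptable: probe upwards slowly.
            self.limit = min(self.max_limit, self.limit + 1)
        return sojourn


# Times in milliseconds, target delay 10 ms.
lim = OccupancyLimiter(target_delay=10, limit=8)
lim.submit(0)
first = lim.complete(25)    # waited 25 ms, so the limit is halved
lim.submit(30)
second = lim.complete(35)   # waited 5 ms, so the limit creeps back up
```

Packets refused by can_submit() would simply stay in the qdisc, where something smarter can decide what to do with them.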

Cancelling old packets is potentially desirable because it allows TCPs and applications to retransmit (which they will do anyway) without fear of exacerbating a wireless congestion collapse.  I do appreciate that not all hardware will support this, however, and it should be totally unnecessary for wired links.

 - Jonathan