[Bloat] Thoughts on Stochastic Fair Blue

Thu Mar 24 05:40:27 PDT 2011

On 24 Mar, 2011, at 3:03 am, Juliusz Chroboczek wrote:

> (I'm the original author of sch_sfb.)
> 
>> Having read some more documents and code, I have some extra insight into
>> SFB that might prove helpful.  Note that I haven't actually tried it
>> yet, but it looks good anyway.  In control-systems parlance, this is
>> effectively a multichannel I-type controller, where RED is
>> a single-channel P-type controller.
> 
> Methinks that it would be worthwile to implement plain BLUE, in order to
> see how it compares.  (Of course, once Jim comes down from Mount Sinai
> and hands us RED-Lite, it might also be worth thinking about SFRed.)

I'd be interested to see if you can make a BLUE implementation which doesn't throw a wobbler with lossy child qdiscs.  Because there's only one queue, you should be able to query the child's queue length instead of maintaining it internally.

I'd *also* be interested in an SFB implementation which also has the packet-reordering characteristics of SFQ built-in, so that applying child qdiscs to it would be unnecessary.  I'm just about to try putting this combination together manually on a live network.

Finally, it might also be interesting and useful to add bare-bones ECN support to the existing "dumb" qdiscs, such as SFQ and the FIFO family.  Simply start marking (and dropping non-supporting flows) when the queue is more than half full.

>> My first thought after reading just the paper was that unconditionally
>> dropping the packets which increase the marking probability was suspect.
>> It should be quite possible to manage a queue using just ECN, without
>> any packet loss, in simple cases such as a single bulk TCP flow.  Thus
>> I am pleased to see that the SFB code in the debloat tree has separate
>> thresholds for increasing the marking rate and tail-dropping.  They are
>> fairly close together, but they are at least distinct.
> 
> I hesitated for a long time before doing that, and would dearly like to
> see some conclusive experimental data that this is a good idea.  The
> trouble is -- when the drop rate is too low, we risk receiving a burst
> of packets from a traditional TCP sender.  Having the drop threshold
> larger than the increase threshold will get such bursts into our
> buffer.  I'm not going to explain on this particular list why such
> bursts are ungood ;-)

Actually, we *do* need to support momentary bursts of packets, although with ECN we should expect these excursions to be smaller and less frequent than without it.  The primary cause of a large packet burst is presumably from packet loss recovery, although some broken TCPs can produce them with no provocation.

At the bare minimum, you need to support ECN-marking the first triggering packet rather than dropping it.  The goal here is to have functioning congestion control without packet loss (a concept which should theoretically please the Cisco crowd).  With BLUE as described in the paper, a packet would always be dropped before ECN marking started, and that is what I was concerned about.  With even a small extra buffer on top, the TCP has some chance to back off before loss occurs.

With packet reordering like SFQ, the effects of bursts of packets on a single flow are (mostly) isolated to that flow.  I think it's better to accommodate them than to squash them, especially as dropping packets will lead to more bursts as the sending TCP tries to compensate and recover.

> The other, somewhat unrelated, issue you should be aware of is that
> ECN marking has some issues in highly congested networks [1]; this is
> the reason why sch_sfb will start dropping after the mark probability
> has gone above 1/2.

I haven't had time to read the paper thoroughly, but I don't argue with this - if the marking probability goes above 1/2 then you probably have an unresponsive flow anyway.  I can't imagine any sane TCP responding so aggressively to the equivalent of a 50% packet loss.

>> the length of the queue - which does not appear to be self-tuned by
>> the flow rate.  However, the default values appear to be sensible.
> 
> Please clarify.

The consensus seems to be that queue length should depend on bandwidth - if we assume that link latency is negligible, then the RTT is usually dominated by the general Internet, assumed constant at 100ms.  OTOH, there is another school of thought which says that queue length must *also* depend on the number of flows, with a greater number of flows causing a shortening in optimum queue length (because the bandwidth and thus burst size from an individual flow is smaller).

But tuning the queue length might not actually be necessary, provided the qdisc is sufficiently sophisticated in other ways.  We shall see.

>> The major concern with this arrangement is the incompatibility with
>> qdiscs that can drop packets internally, since this is not necessarily
>> obvious to end-user admins.
> 
> Agreed.  More generally, Linux' qdisc setup is error-prone, and
> certainly beyond the abilities of the people we're targeting; we need to
> get a bunch of reasonable defaults into distributions.  (Please start
> with OpenWRT, whose qos-scripts package[2] is used by a fair number of
> people.)

Something better than pfifo_fast is definitely warranted by default, except on the tiniest embedded devices which cannot cope with the memory requirements.  But those are always a corner case.

>> I also thought of a different way to implement the hash rotation.
>> Instead of shadowing the entire set of buckets, simply replace the hash
>> on one row at a time.  This requires that the next-to-minimum values for
>> q_len and p_mark are used, rather than the strict minima.  It is still
>> necessary to calculate two hash values for each packet, but the memory
>> requirements are reduced at the expense of effectively removing one row
>> from the Bloom filter.
> 
> Interesting idea.

 - Jonathan