[Bloat] when does the CoDel part of fq_codel help in the real world?

Thu Nov 29 03:13:43 EST 2018

Hi Dave,

Am 29.11.18 um 08:33 schrieb Dave Taht:
> "Bless, Roland (TM)" <roland.bless at kit.edu> writes:
> 
>> Hi Luca,
>>
>> Am 27.11.18 um 10:24 schrieb Luca Muscariello:
>>> A congestion controlled protocol such as TCP or others, including QUIC,
>>> LEDBAT and so on
>>> need at least the BDP in the transmission queue to get full link
>>> efficiency, i.e. the queue never empties out.
>>
>> This is not true. There are congestion control algorithms
>> (e.g., TCP LoLa [1] or BBRv2) that can fully utilize the bottleneck link
>> capacity without filling the buffer to its maximum capacity. The BDP
> 
> Just to stay cynical, I would rather like the BBR and Lola folk to look
> closely at asymmetric networks, ack path delay, and lower rates than
> 1Gbit. And what the heck... wifi. :)

Yes, absolutely right from a practical point of view.
The thing is that we have to prioritize our research work
at the moment. LoLa is meant to be a conceptual study rather
than a real-world, full-blown, rock-solid congestion control.
It came out of a research project that focuses on high speed networks,
thus we were experimenting with that. Scaling a CC across several
orders of magnitude w.r.t. speed is a challenge. I think Mario
also used 100 Mbit/s for experiments (but they aren't in that paper)
and it still works fine. However, experimenting with LoLa in real
world environments will always be a problem if flows with
loss-based CC are actually present at the same bottleneck, because LoLa
will back-off (it will not sacrifice its low latency goal for getting
more bandwidth). However, LoLa shows that you can actually get very
close to the goal of limiting queuing delay while achieving high
utilization _and_ fairness at the same time. BTW, there is an ns-3
implementation of LoLa available...

> BBRv1, for example, is hard coded to reduce cwnd to 4, not lower - because
> that works in the data center. Lola, so far as I know, achieves its
> tested results at 1-10Gbits. My world and much of the rest of the world,
> barely gets to a gbit, on a good day, with a tail-wind.
> 
> If either of these TCPs could be tuned to work well and not saturate
> 5Mbit links I would be a happier person. RRUL benchmarks anyone?

I think we need some students to do this...

> I did, honestly, want to run lola, (codebase was broken), and I am
> patiently waiting for BBRv2 to escape (while hoping that the googlers
> actually run some flent tests at edge bandwidths before I tear into it)

The LoLa code is currently being revised by Felix, and I think it will converge
to a more stable state within the next few weeks.

> Personally, I'd settle for SFQ on the CMTSes, fq_codel on the home
> routers, and then let the tcp-ers decide how much delay and loss they
> can tolerate.
> 
> Another thought... I mean... can't we all just agree to make cubic
> more gentle and go fix that, and not have a flag day? "From linux 5.0
> forward cubic shall:
> 
> Stop increasing its window at 250ms of delay greater than
> the initial RTT? 
> 
> Have it occasionally rtt probe a bit, more like BBR?

RTT probing is fine, but in order to measure RTTmin you have
to make sure that the bottleneck queue is empty. This isn't that
trivial, because all flows need to synchronize a bit in order to
achieve that. But both BBR and LoLa have such mechanisms.
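
To make the measurement side of this concrete, here is a minimal sketch of
a windowed minimum-RTT estimator. It is not the BBR or LoLa code; the class
name, the 10-second window, and the sample values are assumptions made for
illustration only. It also shows the limitation discussed above: the filter
can only report the smallest RTT it has seen, so the result equals the true
RTTmin only if the bottleneck queue was actually empty at some point within
the window.

```python
from collections import deque


class WindowedMinRtt:
    """Smallest RTT sample seen within the last `window_s` seconds.

    Hypothetical helper for illustration; not taken from BBR or LoLa.
    """

    def __init__(self, window_s=10.0):
        self.window_s = window_s  # assumed window length, in seconds
        # (timestamp, rtt) pairs; RTT values strictly increase front to back,
        # so the front element is always the current windowed minimum.
        self._samples = deque()

    def update(self, now, rtt):
        """Add one RTT sample taken at time `now`; return the windowed minimum."""
        # Older samples that are not smaller than the new one can never be
        # the minimum again, so drop them from the back.
        while self._samples and self._samples[-1][1] >= rtt:
            self._samples.pop()
        self._samples.append((now, rtt))
        # Expire samples that have fallen out of the window.
        while self._samples[0][0] < now - self.window_s:
            self._samples.popleft()
        return self._samples[0][1]


est = WindowedMinRtt(window_s=10.0)
est.update(0.0, 0.050)           # queue partly filled
est.update(1.0, 0.020)           # queue (hopefully) empty: candidate RTTmin
est.update(5.0, 0.080)           # standing queue builds up again
stale = est.update(12.0, 0.090)  # the 0.020 sample has now expired
```

After the last call the estimate has drifted up to 0.080 s, because no
sample in the window was taken with an empty queue. That is the reason the
flows have to drain the queue together from time to time.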

>> rule of thumb basically stems from the older loss-based congestion
>> control variants that profit from the standing queue that they built
>> over time when they detect a loss:
>> while they back-off and stop sending, the queue keeps the bottleneck
>> output busy and you'll not see underutilization of the link. Moreover,
>> once you get good loss de-synchronization, the buffer size requirement
>> for multiple long-lived flows decreases.
>>
>>> This gives rules of thumb for sizing buffers, which is also very practical
>>> and thanks to flow isolation becomes very accurate.
>>
>> The positive effect of buffers is merely their role to absorb
>> short-term bursts (i.e., mismatch in arrival and departure rates)
>> instead of dropping packets. One does not need a big buffer to
>> fully utilize a link (with perfect knowledge you can keep the link
>> saturated even without a single packet waiting in the buffer).
>> Furthermore, large buffers (e.g., using the BDP rule of thumb)
>> are not useful/practical anymore at very high speed such as 100 Gbit/s:
>> memory is also quite costly at such high speeds...
>>
>> Regards,
>>  Roland
>>
>> [1] M. Hock, F. Neumeister, M. Zitterbart, R. Bless.
>> TCP LoLa: Congestion Control for Low Latencies and High Throughput.
>> Local Computer Networks (LCN), 2017 IEEE 42nd Conference on, pp.
>> 215-218, Singapore, Singapore, October 2017
>> http://doc.tm.kit.edu/2017-LCN-lola-paper-authors-copy.pdf
> 
> 
> This whole thread, although diversive... well, I'd really like everybody
> to get together and try to write a joint paper on the best stuff to do,
> worldwide, to make bufferbloat go away.

Yeah, at least if everyone used LoLa you could eliminate
bufferbloat, but a flag day is impossible and loss-based CC
will not go away so soon. However, self-inflicted queueing
delay from loss-based CCs hurts nowadays and now we know how to do
better...

>>> Which is: 
>>>
>>> 1) find a way to keep the number of backlogged flows at a reasonable value. 
>>> This largely depends on the minimum fair rate an application may need in
>>> the long term.
>>> We discussed a little bit of available mechanisms to achieve that in the
>>> literature.
>>>
>>> 2) fix the largest RTT you want to serve at full utilization and size
>>> the buffer using BDP * N_backlogged.  
>>> Or the other way round: check how much memory you can use 
>>> in the router/line card/device and for a fixed N, compute the largest
>>> RTT you can serve at full utilization. 
>>>
>>> 3) there is still some memory to dimension for sparse flows in addition
>>> to that, but this is not based on BDP. 
>>> It is just enough to compute the total utilization of sparse flows and
>>> use the same simple model Toke has used 
>>> to compute the (de)prioritization probability.
>>>
>>> This procedure would allow sizing FQ_codel as well as SFQ.
>>> It would be interesting to compare the two under this buffer sizing. 
>>> It would also be interesting to compare another mechanism that we have
>>> mentioned during the defense
>>> which is AFD + a sparse flow queue. Which is, BTW, already available in
>>> Cisco nexus switches for data centres.
>>>
>>> I think that the codel part would still provide the ECN feature,
>>> that all the others cannot have.
>>> However, the others, the last one especially, can be implemented in
>>> silicon with reasonable cost.
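
As a worked illustration of step 2 in the quoted procedure, the sketch
below computes the buffer as BDP * N_backlogged and, the other way round,
the largest RTT that a given amount of memory can serve for a fixed N. The
function names and the example numbers are assumptions. The quoted text
does not say whether the BDP uses the link rate or the per-flow fair rate,
so the rate is left as a parameter and either reading can be plugged in.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Bandwidth-delay product in bytes for a rate in bit/s and an RTT in s."""
    return rate_bps * rtt_s / 8.0


def buffer_for_flows(rate_bps, max_rtt_s, n_backlogged):
    """Step 2, forward direction: buffer = BDP * N_backlogged (bytes)."""
    return bdp_bytes(rate_bps, max_rtt_s) * n_backlogged


def max_rtt_for_memory(rate_bps, memory_bytes, n_backlogged):
    """Step 2, reverse direction: largest RTT served at full utilization
    for a fixed memory budget and a fixed number of backlogged flows."""
    return memory_bytes * 8.0 / (rate_bps * n_backlogged)


# Example numbers are illustrative only.
rate = 100e6   # 100 Mbit/s
rtt = 0.100    # 100 ms: largest RTT to serve at full utilization
n = 4          # backlogged flows

buf = buffer_for_flows(rate, rtt, n)          # 5,000,000 bytes
rtt_back = max_rtt_for_memory(rate, buf, n)   # recovers 0.1 s

print(f"buffer: {buf / 1e6:.2f} MB, max RTT for that buffer: {rtt_back * 1e3:.0f} ms")
```

With the same formula, a single BDP at 100 Gbit/s and 100 ms RTT is already
1.25 GB. That is the kind of number behind the remark earlier in the thread
that BDP-sized buffers become impractical at very high speeds.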

Regards,
 Roland