[Ecn-sane] I think a defense of fq_x and co-design of new transports might be good

Dave Taht dave.taht at gmail.com
Sat Jun 15 12:57:33 EDT 2019


It would be a good paper to write. This is a draft of the points I'd
like to cover, not an attempt at a more formal email. I just needed to
get this much out of my system, on the ecn-sane list.

# about fq_x

The fq_x qdiscs (presently fq_codel, fq_pie, sch_cake) have pretty much
the same fq algorithm. It has one new characteristic compared to all
the prior FQ ones - truly sparse flows see no queue at all; otherwise
the observed queue size is f, where f = the number of queue-building
flows. If you have 3 full size packets queued, you have 3f. No
transport currently takes advantage of this fairly tiny difference
between "no queue" and "f queue".

We also use bytes, rather than packets, in our calculations, as that
translates to time.
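
To make the bytes-to-time point concrete, here is a minimal sketch of
the arithmetic. The function names, and the reading of "3f" as "3
full-size packets queued per queue-building flow", are my assumptions
for illustration, not anything taken from the qdisc code.

```python
def drain_time_ms(backlog_bytes: int, link_rate_bps: float) -> float:
    """Time to serialize backlog_bytes onto a link of link_rate_bps, in ms."""
    return backlog_bytes * 8 / link_rate_bps * 1000.0


def fq_wait_ms(f: int, pkts_per_flow: int, pkt_bytes: int,
               link_rate_bps: float) -> float:
    """Rough wait seen by a queue-building flow under FQ.

    Assumes f queue-building flows, each holding pkts_per_flow packets
    of pkt_bytes. A truly sparse flow corresponds to f == 0 ahead of it.
    """
    return drain_time_ms(f * pkts_per_flow * pkt_bytes, link_rate_bps)
```

At a hypothetical 12 Mbit/s, one 1500 byte packet is about 1 ms of
queue, so `fq_wait_ms(3, 1, 1500, 12e6)` comes out at roughly 3 ms,
while a sparse flow (`f == 0`) sees none.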

I'm perpetually throwing around statistics like "95% of all flows
never get out of slow start", that most are sender limited, and so
on. Such flows (especially if paced) thus get 0 delay all the time in
FQ_x, or "0 first packet + pf" for the burst of packets.

This is an essential, fine difference in measurement, unique to fq_x,
that can be tracked receiver side.

... whereas with a single queue, with AQM on, all it takes is one
greedy flow to induce L latency on all flows, which in the case of
pie/codel is > 16/5ms - with plenty of jitter until things settle
down. (I wish there was a way to express in a variable that it has a
bounded range of some sort; ~16ms isn't good, and neither is >16ms or
16+ms.)

dualpi retains that >16ms characteristic for normal flows, and claims
1ms for L4S flows, which is... IMHO simply impossible in a wide
range of circumstances, but I'd just as soon try to focus on improving
FQ_x and co-designed transports in a more ideal world for a while, on
this thread.

For purposes of exposition, let's assume that fq_x is the dominant AQM
algorithm in the world, the only one with
a proven and oft enabled, and *deterministic*, RFC3168 CE response on
overload, where a loss is assumed equivalent to a mark.

In terms of co-designing a transport for it, a transport can then
assume that a CE mark is coming from FQ_x. Knowing that,
there are new curves that can be followed in various phases of the
evolution of a flow.

Abstractly:

* "0 delay" - we have capacity to spare, grow the window
* "some delay" - we have a queue of "f", and thus a thinner setpoint
observable.
* mild jitter between a recent arrival and the rest of the burst (the
sparse flow optimization)
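
As a rough illustration of how a transport might tell those regimes
apart, here is a minimal sketch. The function name, the `noise_ms`
tolerance and the "above f" fallback label are all hypothetical;
`round_ms` is assumed to be the time to serve one packet from each of
the f queue-building flows.

```python
def classify_delay(queue_delay_ms: float, round_ms: float,
                   noise_ms: float = 0.1) -> str:
    """Map a measured queueing delay onto the regimes described above.

    noise_ms is an assumed measurement tolerance, not a value from any
    existing implementation.
    """
    if queue_delay_ms <= noise_ms:
        return "no queue"   # capacity to spare: grow the window
    if queue_delay_ms <= round_ms + noise_ms:
        return "f queue"    # roughly one round of f queue-building flows
    return "above f"        # more than one round is queued (my label)
```

For example, with a 3 ms round, a 2.5 ms measurement lands in "f queue"
and a 10 ms measurement in "above f".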

# Benefits of FQ_x

* FQ_x is robust against abuse. A single flow cannot overwhelm it. Some
level of service is guaranteed for the vast majority of flows
(excepting collisions) in the number of flows configured.
* FQ_x is also robust against different treatments of drop (bbr without
ecn) and CE (l4s).
* FQ_x allows for delay based and hybrid delay based transports (like
BBR) to "just work", without any ecn support at all. The additional
support in "x" pushes queue lengths for drop based algorithms back to
where the most common TCPs can shift back into classic slow start and
congestion avoidance modes, instead of being bound (as they are often
today) in rwnd, etc.
* FQ_x is (add more)

# Some observations regarding a CE mark

Packet loss is a weak signal of a variety of events.

A CE mark is currently a strong signal that you are in FQ_x - the odds
are good that this will be the event that kicks the transport out of
slow start. Knowing you got a CE mark gives you a chance to optimize,
knowing that your queue length is not a fifo, but relative to "f". In
BBR's case in particular, resetting the bandwidth and pacing rate to
the lowest recently observed (in the last 100 ms) "RTT - a little" is
better than the classic RFC3168 response of halving.
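
One possible reading of "lowest recently observed, minus a little" can
be sketched as a windowed minimum over the last 100 ms of rate
samples. Everything here (the class, the function, the 5% `margin`,
and the choice to track delivery rate) is my assumption for
illustration; it is not BBR's actual code.

```python
from collections import deque


class RecentMin:
    """Track the minimum of samples seen within the last window_s seconds."""

    def __init__(self, window_s: float = 0.1):
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, value), oldest first

    def _expire(self, now_s: float) -> None:
        while self.samples and now_s - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def add(self, now_s: float, value: float) -> None:
        self.samples.append((now_s, value))
        self._expire(now_s)

    def minimum(self, now_s: float):
        self._expire(now_s)
        return min((v for _, v in self.samples), default=None)


def pacing_rate_on_ce(recent: RecentMin, now_s: float, current_bps: float,
                      margin: float = 0.05) -> float:
    """On a CE mark, drop to the lowest recent rate minus a small margin.

    Falls back to classic halving when no recent sample exists.
    """
    lowest = recent.minimum(now_s)
    if lowest is None:
        return current_bps / 2
    return lowest * (1.0 - margin)
```

The point of the sketch is only the shape of the response: it backs
off to something recently measured rather than to a fixed fraction.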

One thing that bugs me about RTT based measurements is when the return
path is inflated - in FQ_x it's a decent assumption that both sides of
the path have FQ, so the ack return path is far less inflated, but in
pie/dualpi/codel it certainly can be for a variety of reasons. This is
why the rrul test exists. Ack thinning does help also. The amount of
potential jitter in the return path is enormous, and it is one
benchmark I've not yet seen from anyone on that side.

moving sideways:

I happen to like (in terms of determinism) an even stronger signal
than RFC3168, "loss and mark", where a combination of loss and marks
is even more meaningful than either, and thus the sender should back
off even harder (or, the receiver pretend it got CE in two different
RTTs). When we have queue sizes elsewhere measured in seconds, and a
colossal bufferbloat mess in general, anything that moves a link below
capacity would be great. The deterministic "loss and mark" feature was
in cake until a year or two back but I never got around much to
mucking with a transport's interpretation of it.
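
A minimal sketch of what a sender-side interpretation could look like,
assuming a Reno-style multiplicative decrease. Treating "loss and
mark" as two back-to-back decreases mirrors the "CE in two different
RTTs" idea above; the function name, `beta` and `min_cwnd` are
hypothetical.

```python
def cwnd_after_signal(cwnd: float, ce_seen: bool, loss_seen: bool,
                      beta: float = 0.5, min_cwnd: float = 2.0) -> float:
    """Return the new congestion window after one RTT's worth of signals."""
    if ce_seen and loss_seen:
        factor = beta * beta   # stronger signal: back off twice
    elif ce_seen or loss_seen:
        factor = beta          # RFC3168: a mark is treated like a loss
    else:
        factor = 1.0           # no congestion signal this RTT
    return max(min_cwnd, cwnd * factor)
```

With a window of 40 packets, a mark alone gives 20 and a mark plus a
loss gives 10.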

# The SCE concept in addition to that

With or without SCE, just that much, just that normal CE signal, is
enough to evolve a transport towards more sensitive
delay based signaling. It could be added to cubic, for example...

Anyway...

We have two public implementations of SCE under test - the cake one
uses a ramp, the fq_codel_fast one just uses a setpoint where we have
a consistently measurable queue (1ms), and that setpoint is different
for wifi (1-2 TXOPs)
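
The difference between the two marking styles can be sketched roughly
as follows. This is not the cake or fq_codel_fast code: the function
names and the ramp endpoints are hypothetical, and only the 1ms
setpoint comes from the description above.

```python
def sce_mark_setpoint(sojourn_ms: float, setpoint_ms: float = 1.0) -> bool:
    """Setpoint style: mark every packet once sojourn time reaches the setpoint."""
    return sojourn_ms >= setpoint_ms


def sce_mark_probability_ramp(sojourn_ms: float, ramp_start_ms: float,
                              ramp_end_ms: float) -> float:
    """Ramp style: marking probability rises linearly between two sojourn times."""
    if sojourn_ms <= ramp_start_ms:
        return 0.0
    if sojourn_ms >= ramp_end_ms:
        return 1.0
    return (sojourn_ms - ramp_start_ms) / (ramp_end_ms - ramp_start_ms)
```

With an assumed ramp from 1 ms to 2 ms, a packet that sat in the queue
for 1.5 ms would be marked with probability 0.5, where the setpoint
version would always mark it.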

SCE (presently) kicks in almost immediately upon building a queue.
Often immediately, with IW10 at low bandwidths (without initial
spreading, pacing or chirping). There is also the bulkiness of
draining the oft-large rx ring and the effects of NAPI interrupt
mitigation to deal with - which is usually around 1ms.

Thus it is an extremely strong signal both that there is a queue, and
that fq_x is present. SCE requires support at the receiver - not the
sender - in order to work at all. The receiver can decide what to do
with it. My own first experimental preference was to kick tcp out of
slow start on receipt of any SCE mark, and afterwards, in congestion
avoidance, to treat it as a much more gradual signal, or even ignore
it entirely. I'm grumpy enough about IW10 to still consider that, but
as the current sch_fq code does indeed pace the next burst, perhaps
ignoring SCE on the first few packets of a connection is useful to
consider, also.
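
Those options can be written down as a tiny policy function. This is a
sketch only: the function name, the returned action strings and the
`ignore_first` default of 10 (chosen to match IW10) are hypothetical,
not taken from any implementation.

```python
def on_sce(in_slow_start: bool, packets_received: int,
           ignore_first: int = 10) -> str:
    """Decide how to react to an SCE mark on the packet just received.

    packets_received counts packets seen on this connection so far,
    including the marked one.
    """
    if packets_received <= ignore_first:
        return "ignore"            # still inside the initial window burst
    if in_slow_start:
        return "exit_slow_start"   # any SCE mark ends slow start
    return "gradual_backoff"       # congestion avoidance: much gentler signal
```

Setting `ignore_first=0` gives the stricter variant, where a mark on
the very first packets already ends slow start.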

There is plenty of work on all the congestion avoidance mode stuff
(reusing nonce sum, accecn, etc), but the key point
(for me) was signalling and thinking hard about the fact that fq_x was
present and that f governed the behavior of the queues. Knowing this,
growth and signalling patterns such as ELR, dctcp etc, can change.

# Benefits of SCE

* Plenty of stuff to write here that has been written elsewhere

* Backward compatible
* gradual upgrade
* easy change to fq_x
* SCE re-enables the possibility of low priority congestion control
for background tcp flows


-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

