From: Dave Taht <dave.taht@gmail.com>
To: ECN-Sane <ecn-sane@lists.bufferbloat.net>
Subject: Re: [Ecn-sane] I think a defense of fq_x and co-design of new transports might be good
Date: Sat, 15 Jun 2019 11:28:29 -0700
Message-ID: <CAA93jw4f9uuE4NntoRq5wC+Qnd+==_2d-4u7RoVSUuktsiyLfg@mail.gmail.com>
In-Reply-To: <CAA93jw6ZXnWStsr=CkVyP4V5Rvf8bnjTR91BPrU3Tr4D+M=ELg@mail.gmail.com>
On Sat, Jun 15, 2019 at 9:57 AM Dave Taht <dave.taht@gmail.com> wrote:
>
> It would be a good paper to write. This is a draft of points I'd like
> to cover, not an attempt at a more formal email;
> I just needed to get this much out of my system, on the ecn-sane list.
>
> # about fq_x
>
> fq_x (presently fq_codel, fq_pie, sch_cake) all use pretty much the same
> fq algorithm. It has one new characteristic
> compared to all the prior FQ ones - truly sparse flows see no queue at
> all, otherwise the observed queue size is f,
> where f = the number of queue building flows. If each of them has 3 full
> size packets queued, you have 3f.
Actually, I need a better way to express and clarify this, as 3f assumes
that all your
queue building flows have at least 3 packets in them, which is usually
not the case.
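A minimal sketch of that caveat, assuming an idealized round robin that
serves one equal-sized packet per flow per round. Real fq_x schedulers use
byte-based deficits and separate new/old flow lists, so this is only an
illustration; queue_seen and its arguments are hypothetical names.

```python
def queue_seen(backlogs, rounds):
    """Packets served during the first `rounds` rounds of an idealized
    one-packet-per-flow round robin.

    `backlogs` lists the packets queued by each queue-building flow.
    A flow with fewer than `rounds` packets contributes only what it has.
    """
    return sum(min(backlog, rounds) for backlog in backlogs)


# Four flows that all hold at least 3 packets: the "3f" case (f = 4).
all_deep = queue_seen([3, 4, 3, 5], rounds=3)

# Four flows, two of them holding a single packet: less than 3f.
mixed = queue_seen([3, 1, 1, 5], rounds=3)

# No queue-building flows at all: a sparse flow sees no queue.
empty = queue_seen([], rounds=3)
```

So "3f" is an upper bound under these assumptions, reached only when every
queue-building flow is at least 3 packets deep.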
> No transport currently takes advantage of
> this fairly tiny difference between "no queue" and "f queue".
>
> We use bytes, rather than packets, also, in our calculations as that
> translates to time.
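As a small worked example of why bytes translate to time: queueing delay
is just queued bytes divided by the link rate. The function name and the
10 Mbit/s rate below are assumptions chosen for illustration.

```python
def queue_delay_ms(queued_bytes, link_rate_bps):
    """Time in milliseconds needed to drain `queued_bytes` at `link_rate_bps`."""
    return queued_bytes * 8 * 1000 / link_rate_bps


# One 1514-byte frame versus one 64-byte frame at 10 Mbit/s.
full_size_ms = queue_delay_ms(1514, 10_000_000)
small_ms = queue_delay_ms(64, 10_000_000)
```

The same packet count can therefore mean very different delays, which is
why counting packets alone is misleading.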
>
> I'm perpetually throwing around a statistic like "95% of all flows
> never get out of slow start", that most are sender limited,
> and so on, and thus (especially if paced) get 0 delay all the time in
> FQ_x, or "0 first packet + pf" for the burst of packets.
>
> This is an essential, fine difference in measurement, unique to fq_x,
> that can be tracked on the receiver side.
>
> ... where all it takes with a single queue, with AQM on, is one greedy
> flow, to induce L latency on all flows, which in the case of pie/codel
> is > 16/5ms - with plenty of jitter until things settle down. (I wish
> there were a way to express that a variable has a bounded range
> of some sort; ~16ms isn't good, and neither is >16ms or 16+ms.)
>
> dualpi retains that >16ms characteristic for normal flows, and a
> claimed 1ms for L4S flows, which is... IMHO simply impossible in a wide
> range of circumstances, but I'd just as soon try to focus on improving
> FQ_x and co-designed transports in a more ideal world for a while, on
> this thread.
>
> For purposes of exposition, let's assume that fq_x is the dominant AQM
> algorithm in the world, the only one with
> a proven, oft enabled, and *deterministic* RFC3168 CE response on
> overload, where a loss is assumed equivalent to a mark.
>
> In terms of co-designing a transport for it, a transport can then
> assume that a CE mark is coming from FQ_x. Knowing that,
> there are new curves that can be followed in various phases of the
> evolution of a flow.
>
> Abstractly:
>
> 0 delay - we have capacity to spare, grow the window
> "some delay" - we have a queue of "f", and thus a thinner setpoint observable.
> mild jitter between a recent arrival and the rest of the burst (the
> sparse flow optimization)
>
> # Benefits of FQ_x
>
> * FQ_x is robust against abuse. A single flow cannot overwhelm it. Some
>   level of service is guaranteed for the vast majority of flows
>   (excepting collisions), up to the number of flows configured.
> * FQ_x is also robust against different treatments of drop (bbr without
>   ecn) and CE (l4s).
> * FQ_x allows delay based and hybrid delay based transports (like BBR)
>   to "just work", without any ecn support at all. The additional support
>   in "x" pushes queue lengths for drop based algorithms back to where
>   the most common TCPs can shift back into classic slow start and
>   congestion avoidance modes, instead of being bound (as they are often
>   today) by rwnd, etc.
> * FQ_x is (add more)
>
> # Some observations regarding a CE mark
>
> Packet loss is a weak signal of a variety of events.
>
> A CE mark is currently a strong signal you are in FQ_x - the odds
> are good that this will be the event that kicks the transport out of slow
> start. Knowing you got a CE mark gives you a chance to optimize,
> since your queue length is not that of a fifo, but relative to "f". In
> BBR's case in particular, resetting the bandwidth and pacing rate to
> the lowest recently observed (in the last 100 ms) "RTT - a little" is
> better than the classic RFC3168 response of halving.
>
> One thing that bugs me about RTT based measurements is when the return
> path is inflated - in FQ_x it's a decent assumption that both sides of
> the path have FQ, so the ack return path is far less inflated, but in
> pie/dualpi/codel it certainly can be for a variety of reasons. This is
> why the rrul test exists. Ack thinning does help also. The amount of
> potential jitter in the return path is enormous, and it is one benchmark
> I've not yet seen from anyone on that side.
>
> moving sideways:
>
> I happen to like (in terms of determinism) an even stronger signal
> than RFC3168, "loss and mark", where a combination of loss and marks
> is even more meaningful than either, and thus the sender should back
> off even harder (or the receiver could pretend it got CE in two different
> RTTs). When we have queue sizes elsewhere measured in seconds, and a
> colossal bufferbloat mess in general, anything that moves a link below
> capacity would be great. The deterministic "loss and mark" feature was
> in cake until a year or two back, but I never got around much to
> mucking with a transport's interpretation of it.
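A minimal sketch of one possible sender interpretation of "loss and mark",
assuming a classic multiplicative decrease (beta = 0.5) and treating the
combined signal like CE seen in two different RTTs, i.e. two decreases.
The function, its parameters and the cwnd floor are hypothetical; no
existing transport is claimed to behave this way.

```python
def cwnd_after_signal(cwnd, lost, ce_marked, beta=0.5, min_cwnd=2):
    """Return the new congestion window (in packets) after one RTT's signals.

    loss only or mark only -> one multiplicative decrease
    loss and mark together -> two decreases (the harder backoff)
    neither                -> window unchanged
    """
    if lost and ce_marked:
        factor = beta * beta
    elif lost or ce_marked:
        factor = beta
    else:
        factor = 1.0
    return max(min_cwnd, int(cwnd * factor))


mark_only = cwnd_after_signal(40, lost=False, ce_marked=True)
loss_and_mark = cwnd_after_signal(40, lost=True, ce_marked=True)
```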
>
> # The SCE concept in addition to that
>
> With or without SCE, just that much, just that normal CE signal, is
> enough to evolve a transport towards more sensitive
> delay based signaling. It could be added to cubic, for example...
>
> Anyway...
>
> We have two public implementations of SCE under test - the cake one
> uses a ramp, the fq_codel_fast one just uses
> a setpoint where we have a consistently measurable queue (1ms), and
> that setpoint is different
> for wifi (1-2 TXOPs).
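To make the two marking styles concrete, here is a sketch under stated
assumptions: the 1ms setpoint comes from the text above, while the ramp
endpoints and the linear shape are hypothetical, since the actual cake
ramp parameters are not given here. Neither function is taken from the
cake or fq_codel_fast sources.

```python
def sce_mark_setpoint(sojourn_ms, setpoint_ms=1.0):
    """Setpoint style: mark every packet whose sojourn time reaches the setpoint."""
    return sojourn_ms >= setpoint_ms


def sce_mark_probability_ramp(sojourn_ms, ramp_start_ms, ramp_end_ms):
    """Ramp style: marking probability rises linearly from 0 to 1
    between ramp_start_ms and ramp_end_ms (both hypothetical values)."""
    if sojourn_ms <= ramp_start_ms:
        return 0.0
    if sojourn_ms >= ramp_end_ms:
        return 1.0
    return (sojourn_ms - ramp_start_ms) / (ramp_end_ms - ramp_start_ms)


below = sce_mark_setpoint(0.5)
at_setpoint = sce_mark_setpoint(1.0)
halfway = sce_mark_probability_ramp(3.0, ramp_start_ms=1.0, ramp_end_ms=5.0)
```

The setpoint gives an on/off signal at a known queue depth; the ramp gives
a graded one that a transport could respond to proportionally.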
>
> SCE (presently) kicks in almost immediately upon building a queue.
> Often immediately, with IW10 at low bandwidths (without initial
> spreading, pacing or chirping). There is also the bulkiness of
> draining the oft-large rx ring and the effects
> of NAPI interrupt mitigation to deal with - which is usually around 1ms.
>
> Thus it is an extremely strong signal both that there is a queue, and
> that fq_x is present. SCE requires support at the receiver - not the
> sender - in order to work at all. The receiver can decide what to do
> with it. My own first experimental preference was to kick tcp out of
> slow start on receipt of any SCE mark, but afterwards, in congestion
> avoidance, to treat it as a much more gradual signal, or even ignore it
> entirely. I'm grumpy enough about IW10 to still consider that, but as the
> current
> sch_fq code does indeed pace the next burst, perhaps ignoring SCE on
> the first few packets of a connection is useful to consider, also.
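A sketch of that receiver-side policy as a tiny state machine, assuming
hypothetical names throughout and an arbitrary default of 10 ignored
packets (picked only to line up with IW10). It returns a label for what
to do with each arriving packet; how that is fed back to the sender is
out of scope here.

```python
from dataclasses import dataclass


@dataclass
class SceReceiverPolicy:
    ignore_first_packets: int = 10  # hypothetical: skip SCE during the initial burst
    packets_seen: int = 0
    in_slow_start: bool = True

    def on_packet(self, sce_marked):
        """Classify one arriving packet and update the policy state."""
        self.packets_seen += 1
        if not sce_marked:
            return "none"
        if self.packets_seen <= self.ignore_first_packets:
            return "ignore"            # too early in the connection to trust
        if self.in_slow_start:
            self.in_slow_start = False
            return "exit_slow_start"   # first trusted SCE mark ends slow start
        return "gradual"               # congestion avoidance: a gentler signal


policy = SceReceiverPolicy(ignore_first_packets=2)
actions = [policy.on_packet(marked) for marked in (True, False, True, True)]
```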
>
> There is plenty of work on all the congestion avoidance mode stuff
> (reusing nonce sum, accecn, etc), but the key point
> (for me) was signalling and thinking hard about the fact that fq_x was
> present and that f governed the behavior of the queues. Knowing this,
> growth and signalling patterns such as ELR, dctcp etc, can change.
>
> # Benefits of SCE
>
> * Plenty of stuff to write here that has been written elsewhere
>
> * Backward compatible
> * gradual upgrade
> * easy change to fq_x
> * SCE re-enables the possibility of low priority congestion control
> for background tcp flows
>
>
> --
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740