[Bloat] quick review and rant of "Identifying and Handling Non Queue Building Flows in a Bottleneck Link"

Thu Nov 1 15:13:19 EDT 2018

On Thu, Nov 1, 2018 at 6:25 AM Toke Høiland-Jørgensen <toke at toke.dk> wrote:
>
> Greg White <g.white at CableLabs.com> writes:
>
> > Hi Toke, thanks for the pointer to your paper, I had not seen it
> > before.
>
> You're welcome :)
>
> > I agree that could be a candidate algorithm. It is certainly simple.
> > It may not be the only (or perhaps even the best) solution for the
> > dual-queue case though. I'm thinking in the context of an L4S TCP
> > flow, which can respond quickly to "new" ECN markings and achieve link
> > saturation with ultra low (but not zero) queuing delay. A good
> > property for a queue protection algorithm would be that these L4S
> > flows could be consistently placed in the NQB queue. I think that the
> > simple approach you mentioned would result in many L4S flows being
> > deemed Queue Building.
>
> Yes, I think you are right (depending on traffic mix, of course). It
> might be possible to tweak it to work better, though. E.g., by changing
> the threshold (moving flows to QB if they end up with more than X ms of
> queue). This would only work if you start out all flows at NQB, with the
> associated aggressive marking behaviour; so I'm not sure if a normal TCP
> flow would ever manage to get to QB state before getting clobbered by
> the NQB markings...
>
> > Additionally, we've observed applications that send variable sized
> > "messages" at a fixed rate (e.g. 20 messages/second) where the message
> > sometimes exceeds the MTU and results in two closely spaced (possibly
> > back-to-back) packets. This is a flow that I think should be
> > considered to be NQB, but would get flagged as QB by the simple
> > approach. You described this case in your paper, where you state that
> > the first Q bytes of each burst will be treated as NQB (the first
> > packet in the case we're talking about here), but the rest will be
> > treated as QB. Assuming that message latency is important for these
> > sorts of applications, this is equivalent to saying that the entire
> > burst is considered as QB. In the fq_codel case, the message latency
> > would be something like Q(n-1)(N+1)/R (assuming no other sparse flow
> > arrivals), something like 1.3ms using the example values in your paper
> > (plus n=2, N=10) which may be ok. In the dual-queue case it is a
> > bigger deal, because the remaining packets would be put at the end of
> > the QB queue, which could have a latency of 10 or 20 ms.
>
> Sure, it's by no means a perfect mechanism. But it makes up for that by
> it's simplicity, IMO. And it does work really well for *a lot* of
> today's latency-sensitive traffic.
>
> (In your case of two-MTU messages, you could tune the quantum to allow
> those; but of course you can construct examples that won't work).

sch_fq has a default quantum of 2 mtu, with a initial burst of 10.
There's all sorts of interesting work inside that to "right-size" the
ongoing gso offloads and a major new advance over there on calculating
rtts properly is described here:

https://lwn.net/Articles/766564/

...

there was at one point, a fq_pie implementation that used the rbtree
in sch_fq to
achieve perfect fairness.

we often tune fq_codel's quantum as low as 300, at low rates.

> > So, a queue protection function that provides a bit more (but still
> > limited) allowance for a flow to have packets in queue would likely
> > work better in the dual-queue case.

For inbound shaping sch_cake defaults to 2mtu at higher rates.

This kind of opens a question in that, what is a typical target
bandwidth for l4s applications?

the videoconferencing paper I dissed in my earlier rant focused only
at 2mbits (and drew the conclusion that sfq was the best option for
the cc algo's 300ms desired range)

I have generally been focused on badwidths in the 4mbit-40gbit range.

>
> Yeah, that's tricky, especially if you want it to be very accurate in
> its distinction; which I sort of gather that you do, right?

In part, that is why I would like the language in the problem
statement clarified. To give a concrete counterexample,
think upon the ultimate choice of a crc32 algorithm and the parameters
that drove that choice.

If an accuracy of identifying "sparse flows" or NQB flows can be
specified, then all sorts of algorithms can be thrown into the fray.
AI. Bloom filters. rbtrees. regexes. cookoo++ hashes... and boring
old fq tech. :)

> -Toke
> _______________________________________________
> Bloat mailing list
> Bloat at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740