[Bloat] quick review and rant of "Identifying and Handling Non Queue Building Flows in a Bottleneck Link"

Greg White g.white at CableLabs.com
Mon Nov 5 23:17:38 EST 2018


Hi Dave,

Thanks for the constructive review comments on the text of the draft, including pointers to references that could be included.  I'll take those comments on in the next revision.  

As I know you are aware, the majority of bottleneck links are not linux machines (running tc and feeding packets to a continuous dedicated link), so comparing lines of kernel code as a metric for complexity isn't the most useful.  In the bottleneck links that I am concerned about (and other similar links), it is advantageous to tightly couple the queuing and scheduling mechanisms with the hardware-optimized discontinuous MAC layer, and also take into account engineering tradeoffs that are different than those that apply in a linux machine. 

The point about bittorrent and RMCAT wasn't about latency; it was about throughput.  I have come to share the view, which others have expressed, that it is a valuable property of the internet that congestion control algorithm developers are free to innovate.  FQ in the bottleneck link diminishes this ability in some important ways.  Some applications (cloud backup, mass uploads like iCloud photo library updates, bittorrent, etc.) might not be interested in per-flow fairness with TCP, but would prefer to be less aggressive and thus take a smaller fraction of the link when there are competing TCP flows (note, I am not arguing that the *current* LEDBAT protocol is the right one).  Other applications (RMCAT) might in the future aim to be per-flow fair, but need slightly longer to adapt than FQ allows.

Some folks are just uncomfortable with the departure from the "end-to-end" principle that FQ represents (i.e. a middlebox gets to determine what constitutes a "flow", then enforces that all fat "flows" get equal bandwidth at every instant).  fq_codel is a pragmatic solution for some systems, and it gives a real, undeniable benefit in common cases with current applications/mixes, but it isn't perfect.  Perhaps sch_cake is better in some cases, but it makes a lot more decisions on behalf of the user, decisions that might be ok now but may not be in the future.  In light of all of the above, I'm open to considering other alternatives.  So, I don't see this as a non-problem, and I don't see the potential solutions as being in any way magical.

-Greg


On 11/2/18, 2:13 AM, "Dave Taht" <dave.taht at gmail.com> wrote:

    On Thu, Nov 1, 2018 at 6:25 AM Toke Høiland-Jørgensen <toke at toke.dk> wrote:
    >
    > Greg White <g.white at CableLabs.com> writes:
    >
    > > Hi Toke, thanks for the pointer to your paper, I had not seen it
    > > before.
    >
    > You're welcome :)
    >
    > > I agree that could be a candidate algorithm. It is certainly simple.
    > > It may not be the only (or perhaps even the best) solution for the
    > > dual-queue case though. I'm thinking in the context of an L4S TCP
    > > flow, which can respond quickly to "new" ECN markings and achieve link
    > > saturation with ultra low (but not zero) queuing delay. A good
    > > property for a queue protection algorithm would be that these L4S
    > > flows could be consistently placed in the NQB queue. I think that the
    > > simple approach you mentioned would result in many L4S flows being
    > > deemed Queue Building.
    >
    > Yes, I think you are right (depending on traffic mix, of course). It
    > might be possible to tweak it to work better, though. E.g., by changing
    > the threshold (moving flows to QB if they end up with more than X ms of
    > queue). This would only work if you start out all flows at NQB, with the
    > associated aggressive marking behaviour; so I'm not sure if a normal TCP
    > flow would ever manage to get to QB state before getting clobbered by
    > the NQB markings...
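
    A minimal sketch of that tweak, in Python, purely for illustration (the
    names, the 1 ms threshold and the fixed 100 Mbit/s link rate are all
    assumptions of mine, not anything from a spec or from Toke's paper):

        # Sketch: flows start out NQB, get demoted to QB once their own
        # backlog exceeds a delay threshold.  All constants are illustrative.
        LINK_RATE_BPS = 100e6      # assumed bottleneck rate
        QB_THRESHOLD_MS = 1.0      # the "X ms" from the paragraph above

        flow_backlog = {}          # flow_id -> bytes currently queued

        def on_enqueue(flow_id, pkt_len):
            queued = flow_backlog.get(flow_id, 0) + pkt_len
            flow_backlog[flow_id] = queued
            queue_ms = queued * 8 * 1000 / LINK_RATE_BPS
            # demote once this flow's own backlog exceeds the threshold
            return "QB" if queue_ms > QB_THRESHOLD_MS else "NQB"

        def on_dequeue(flow_id, pkt_len):
            flow_backlog[flow_id] = max(0, flow_backlog.get(flow_id, 0) - pkt_len)

        # demo: a small voip packet stays NQB, a large backlog flips to QB
        print(on_enqueue("voip", 300), on_enqueue("bulk", 64 * 1514))

    At 100 Mbit/s that demotes a flow once it has roughly 12.5 kB of its own
    packets in queue, which gives a feel for why a normal TCP flow would hit
    QB state almost immediately, per Toke's point above.
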
    >
    > > Additionally, we've observed applications that send variable sized
    > > "messages" at a fixed rate (e.g. 20 messages/second) where the message
    > > sometimes exceeds the MTU and results in two closely spaced (possibly
    > > back-to-back) packets. This is a flow that I think should be
    > > considered to be NQB, but would get flagged as QB by the simple
    > > approach. You described this case in your paper, where you state that
    > > the first Q bytes of each burst will be treated as NQB (the first
    > > packet in the case we're talking about here), but the rest will be
    > > treated as QB. Assuming that message latency is important for these
    > > sorts of applications, this is equivalent to saying that the entire
    > > burst is considered as QB. In the fq_codel case, the message latency
    > > would be something like Q(n-1)(N+1)/R (assuming no other sparse flow
    > > arrivals), something like 1.3ms using the example values in your paper
    > > (plus n=2, N=10) which may be ok. In the dual-queue case it is a
    > > bigger deal, because the remaining packets would be put at the end of
    > > the QB queue, which could have a latency of 10 or 20 ms.
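
    As a quick back-of-the-envelope check on that figure (Q = 1514 bytes and
    R = 100 Mbit/s are my assumptions; they are the values that reproduce the
    ~1.3 ms number quoted above):

        # Message latency estimate Q*(n-1)*(N+1)/R from the paragraph above.
        Q = 1514 * 8     # quantum, in bits (assumed 1514-byte quantum)
        n = 2            # packets per message
        N = 10           # competing queue-building flows
        R = 100e6        # link rate in bit/s (assumed)

        latency_ms = Q * (n - 1) * (N + 1) / R * 1000
        print(f"fq_codel message latency ~= {latency_ms:.2f} ms")   # ~1.33 ms
        # versus the 10 or 20 ms estimated above for the end of the QB queue
        # in the dual-queue case.
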
    >
    > Sure, it's by no means a perfect mechanism. But it makes up for that by
    > its simplicity, IMO. And it does work really well for *a lot* of
    > today's latency-sensitive traffic.
    >
    > (In your case of two-MTU messages, you could tune the quantum to allow
    > those; but of course you can construct examples that won't work).
    
    sch_fq has a default quantum of 2 MTU, with an initial burst of 10.
    There's all sorts of interesting work inside that to "right-size" the
    ongoing GSO offloads, and a major new advance over there on calculating
    RTTs properly is described here:
    
    https://lwn.net/Articles/766564/
    
    ...
    
    there was, at one point, an fq_pie implementation that used the rbtree
    in sch_fq to achieve perfect fairness.
    
    we often tune fq_codel's quantum as low as 300, at low rates.
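
    To put some rough numbers on the quantum question (this "rounds" model is
    a deliberate simplification of fq_codel's deficit accounting, for
    intuition only, not the actual dequeue logic):

        # How many scheduling rounds a short "message" needs to fully drain
        # under different quanta, in the simplified model where a flow gets
        # one quantum of credit per round.
        import math

        def rounds_needed(message_bytes, quantum_bytes):
            return math.ceil(message_bytes / quantum_bytes)

        two_mtu_message = 2 * 1514
        for quantum in (300, 1514, 3028):
            print(quantum, rounds_needed(two_mtu_message, quantum))
        # quantum=300  -> 11 rounds: the flow is treated much like a bulk flow
        # quantum=1514 -> 2 rounds: the second packet waits an extra round
        # quantum=3028 -> 1 round: the whole 2-MTU message drains together

    which is roughly Toke's point above about tuning the quantum to allow
    Greg's 2-packet messages, while a 300-byte quantum deliberately trades
    that away for better interleaving at low rates.
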
    
    > > So, a queue protection function that provides a bit more (but still
    > > limited) allowance for a flow to have packets in queue would likely
    > > work better in the dual-queue case.
    
    For inbound shaping, sch_cake defaults to 2 MTU at higher rates.
    
    This kind of opens a question: what is a typical target
    bandwidth for L4S applications?
    
    the videoconferencing paper I dissed in my earlier rant focused only
    on 2 Mbit/s (and drew the conclusion that SFQ was the best option for
    the CC algo's 300 ms desired range)
    
    I have generally been focused on bandwidths in the 4 Mbit/s to 40 Gbit/s range.
    
    >
    > Yeah, that's tricky, especially if you want it to be very accurate in
    > its distinction; which I sort of gather that you do, right?
    
    In part, that is why I would like the language in the problem
    statement clarified. As a concrete example, consider the ultimate
    choice of a crc32 algorithm and the parameters that drove that choice.
    
    If a required accuracy for identifying "sparse flows" or NQB flows can be
    specified, then all sorts of algorithms can be thrown into the fray:
    AI, Bloom filters, rbtrees, regexes, cuckoo++ hashes... and boring
    old FQ tech. :)
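
    For instance, an accuracy target could be expressed as simply as the toy
    scoring below: a maximum rate of genuinely sparse/NQB flows wrongly
    demoted, and of queue-building flows wrongly admitted (the names and
    sample numbers here are invented for illustration):

        # Toy scoring of a classifier against ground truth: each entry is
        # (true_class, classifier_verdict), both "NQB" or "QB".
        def classification_error_rates(flows):
            false_qb  = sum(1 for truth, verdict in flows
                            if truth == "NQB" and verdict == "QB")
            false_nqb = sum(1 for truth, verdict in flows
                            if truth == "QB" and verdict == "NQB")
            n = len(flows)
            return false_qb / n, false_nqb / n

        # e.g. a spec might say "no more than 1% of genuinely sparse flows
        # may be demoted to the QB queue under traffic mix X".
        sample = [("NQB", "NQB"), ("NQB", "QB"), ("QB", "QB"), ("QB", "QB")]
        print(classification_error_rates(sample))   # -> (0.25, 0.0)
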
    
    > -Toke
    > _______________________________________________
    > Bloat mailing list
    > Bloat at lists.bufferbloat.net
    > https://lists.bufferbloat.net/listinfo/bloat
    
    
    
    -- 
    
    Dave Täht
    CTO, TekLibre, LLC
    http://www.teklibre.com
    Tel: 1-831-205-9740
    


