From: Dave Taht
Date: Sat, 15 Jun 2019 11:28:29 -0700
To: ECN-Sane
Subject: Re: [Ecn-sane] I think a defense of fq_x and co-design of new transports might be good
List-Id: Discussion of explicit congestion notification's impact on the Internet

On Sat, Jun 15, 2019 at 9:57 AM Dave Taht wrote:
>
> It would be a good paper to write. This is a draft of points I'd like
> to cover, not an attempt at a more formal email;
> I just needed to get this much out of my system, on the ecn-sane list.
>
> # about fq_x
>
> The fq_x family (presently fq_codel, fq_pie, sch_cake) has pretty much
> the same fq algorithm. It has one new characteristic compared to all
> the prior FQ ones - truly sparse flows see no queue at all; otherwise
> the observed queue size is f, where f = the number of queue building
> flows. If you have 3 full size packets queued, you have 3f.

Actually, I need a means to express and clarify this, as 3f assumes
that all your queue building flows have at least 3 packets in them,
which is usually not the case.

> No transport currently takes advantage of this fairly tiny difference
> between "no queue" and "f queue".
>
> We also use bytes, rather than packets, in our calculations, as that
> translates to time.
>
> I'm perpetually throwing around a statistic like "95% of all flows
> never get out of slow start", that most are sender limited, and so on,
> and thus (especially if paced) get 0 delay all the time in FQ_x, or
> "0 first packet + pf" for the burst of packets.
>
> This is an essential, fine difference in measurement, unique to fq_x,
> that can be tracked receiver side.
>
> ...
> where all it takes with a single queue, with AQM on, is one greedy
> flow to induce L latency on all flows, which in the case of pie/codel
> is > 16/5ms - with plenty of jitter until things settle down. (I wish
> there was a way to express in a variable that it has a bounded range
> of some sort; ~16ms isn't good, and neither is >16ms or 16+ms.)
>
> dualpi retains that >16ms characteristic for normal flows, and claims
> 1ms for its low latency (L4S) queue, which is... IMHO simply
> impossible in a wide range of circumstances, but I'd just as soon try
> to focus on improving FQ_x and co-designed transports in a more ideal
> world for a while, on this thread.
>
> For purposes of exposition, let's assume that fq_x is the dominant AQM
> algorithm in the world, the only one with a proven, oft-enabled, and
> *deterministic* RFC3168 CE response on overload, where a loss is
> assumed equivalent to a mark.
>
> In terms of co-designing a transport for it, a transport can then
> assume that a CE mark is coming from FQ_x. Knowing that, there are new
> curves that can be followed in various phases of the evolution of a
> flow.
>
> Abstractly:
>
> * 0 delay - we have capacity to spare, grow the window
> * "some delay" - we have a queue of "f", and thus a thinner setpoint
>   observable.
> * mild jitter between a recent arrival and the rest of the burst (the
>   sparse flow optimization)
>
> # Benefits of FQ_x
>
> FQ_x is robust against abuse. A single flow cannot overwhelm it. Some
> level of service is guaranteed for the vast majority of flows
> (excepting hash collisions), up to the number of flows configured.
> FQ_x is also robust against different treatments of drop (bbr without
> ecn) and CE (l4s).
> FQ_x allows delay based and hybrid delay based transports (like BBR)
> to "just work", without any ecn support at all.
> The additional (AQM) support in "x" pushes queue lengths for drop
> based algorithms back to where the most common TCPs can shift back
> into classic slow start and congestion avoidance modes, instead of
> being bound (as they are often today) by rwnd, etc.
> FQ_x is (add more)
>
> # Some observations regarding a CE mark
>
> Packet loss is a weak signal of a variety of events.
>
> A CE mark is currently a strong signal that you are in FQ_x - the odds
> are good that this will be the event that kicks the transport out of
> slow start. Knowing you got a CE mark gives you a chance to optimize,
> since your queue length is not that of a fifo, but relative to "f". In
> BBR's case in particular, resetting the bandwidth and pacing rate to
> the lowest recently observed (in the last 100 ms) "RTT - a little" is
> better than the classic RFC3168 response of halving.
>
> One thing that bugs me about RTT based measurements is when the return
> path is inflated - in FQ_x it's a decent assumption that both sides of
> the path have FQ, so the ack return path is far less inflated, but in
> pie/dualpi/codel it certainly can be, for a variety of reasons. This is
> why the rrul test exists. Ack thinning does help also. The amount of
> potential jitter in the return path is enormous, and it is one
> benchmark I've not yet seen from anyone on that side.
>
> Moving sideways:
>
> I happen to like (in terms of determinism) an even stronger signal
> than RFC3168, "loss and mark", where a combination of loss and marks
> is even more meaningful than either, and thus the sender should back
> off even harder (or the receiver should pretend it got CE in two
> different RTTs). When we have queue sizes elsewhere measured in
> seconds, and a colossal bufferbloat mess in general, anything that
> moves a link below capacity would be great. The deterministic "loss
> and mark" feature was in cake until a year or two back, but I never
> got around to mucking much with a transport's interpretation of it.
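
To make the ordering of those signals concrete, here is a minimal
Python sketch of a sender-side window response. It is only an
illustration under stated assumptions: the names (Signal, BACKOFF,
respond) and the 0.25 factor for "loss and mark" are hypothetical and
not taken from cake or from any TCP stack. The only part grounded in
the text is that loss and CE both get the classic RFC3168 halving, and
that "loss and mark" backs off harder than either.

```python
from enum import Enum


class Signal(Enum):
    NONE = 0           # no congestion signal seen this RTT
    LOSS = 1           # packet loss only
    CE = 2             # RFC3168 CE mark only
    LOSS_AND_MARK = 3  # loss and CE mark seen together


# Multiplicative decrease per signal. 0.5 is the classic halving, with
# loss treated as equivalent to a mark. 0.25 is an assumed value that
# only illustrates "back off even harder".
BACKOFF = {
    Signal.LOSS: 0.5,
    Signal.CE: 0.5,
    Signal.LOSS_AND_MARK: 0.25,
}

MIN_CWND = 2  # segments; floor so the flow can keep probing


def respond(cwnd: int, signal: Signal) -> int:
    """Return the new congestion window (in segments) after one RTT."""
    if signal is Signal.NONE:
        return cwnd + 1  # capacity to spare: additive increase
    return max(MIN_CWND, int(cwnd * BACKOFF[signal]))
```

With a 40 segment window, respond(40, Signal.CE) gives 20 and
respond(40, Signal.LOSS_AND_MARK) gives 10. The table-driven shape is
just a convenience for trying other factors.
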
>
> # The SCE concept in addition to that
>
> With or without SCE, just that much, just that normal CE signal, is
> enough to evolve a transport towards more sensitive delay based
> signaling. It could be added to cubic, for example...
>
> Anyway...
>
> We have two public implementations of SCE under test - the cake one
> uses a ramp, the fq_codel_fast one just uses a setpoint where we have
> a consistently measurable queue (1ms), and that setpoint is different
> for wifi (1-2 TXOPs).
>
> SCE (presently) kicks in almost immediately upon building a queue.
> Often, immediately, with IW10 at low bandwidths (without initial
> spreading, pacing or chirping). There is also the bulkiness of
> draining the oft-large rx ring and the effects of NAPI interrupt
> mitigation to deal with - which is usually around 1ms.
>
> Thus it is an extremely strong signal both that there is a queue, and
> that fq_x is present. SCE requires support at the receiver - not the
> sender - in order to work at all. The receiver can decide what to do
> with it. My own first experimental preference was to kick tcp out of
> slow start on receipt of any SCE mark, but afterwards, in congestion
> avoidance, to treat it as a much more gradual signal, or even ignore
> it entirely. I'm grumpy enough about IW10 to still consider that, but
> as the current sch_fq code does indeed pace the next burst, perhaps
> ignoring SCE on the first few packets of a connection is useful to
> consider, also.
>
> There is plenty of work on all the congestion avoidance mode stuff
> (reusing nonce sum, accecn, etc), but the key point (for me) was
> signalling and thinking hard about the fact that fq_x was present and
> that f governed the behavior of the queues. Knowing this, growth and
> signalling patterns such as ELR, dctcp, etc, can change.
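
The SCE handling preferences described above can be sketched as a tiny
per-connection policy. This is a hedged illustration, not the cake or
fq_codel_fast code: the class name ScePolicy, its fields, the 0.95
decrease factor and the choice of 10 ignored packets (one IW10 burst)
are all assumptions, and the sketch collapses receiver feedback and
sender window handling into one object for brevity.

```python
from dataclasses import dataclass


@dataclass
class ScePolicy:
    """Hypothetical per-connection policy for reacting to SCE marks."""
    ignore_first: int = 10     # assumed: skip SCE on the first IW10 burst
    ca_backoff: float = 0.95   # assumed gentle decrease per SCE mark
    in_slow_start: bool = True
    packets_seen: int = 0

    def on_packet(self, cwnd: float, sce: bool) -> float:
        """Return the new cwnd (segments) after one received packet."""
        self.packets_seen += 1
        # Unmarked packets, and marks on the first few packets of a
        # connection, leave the window untouched.
        if not sce or self.packets_seen <= self.ignore_first:
            return cwnd
        if self.in_slow_start:
            # Any later SCE mark kicks the flow out of slow start.
            self.in_slow_start = False
            return cwnd
        # In congestion avoidance SCE is a much more gradual signal.
        return max(2.0, cwnd * self.ca_backoff)
```

Setting ca_backoff to 1.0 corresponds to the "ignore it entirely"
option in congestion avoidance, and ignore_first=0 reacts to SCE from
the very first packet.
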
>
> # Benefits of SCE
>
> * Plenty of stuff to write here that has been written elsewhere
>
> * Backward compatible
> * gradual upgrade
> * easy change to fq_x
> * SCE re-enables the possibility of low priority congestion control
>   for background tcp flows

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740