Date: Wed, 20 Apr 2022 18:21:33 -0400 (EDT)
From: "David P. Reed" <dpreed@deepplum.com>
To: "Sebastian Moeller" <moeller0@gmx.de>
Cc: "Michael Welzl" <michawe@ifi.uio.no>, ecn-sane@lists.bufferbloat.net
Subject: Re: [Ecn-sane] rtt-fairness question

Hi Sebastian -

Actually, fq in fq_codel does achieve throughput-fairness on the bottleneck link, approximately, given TCP. And I do agree that throughput fairness is about all you can define locally.

That is, no matter what the (unloaded) RTT, dropping and ECN-marking all flows equally at the bottleneck link will achieve approximate throughput sharing. The end-to-end windows of independent TCPs will size themselves to the underlying RTT, as they are wont to do, and as desired if you want both good utilization and minimal queueing delay across all paths in the network as a whole (a reasonable definition of a good operating point).

To do this, each router need not know at all what the RTT of the packets flowing through it should be. The router strategy is RTT-agnostic.
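A toy back-of-the-envelope sketch of that point (numbers made up, assuming ideal long-running TCPs and a perfect per-flow scheduler): each flow's window settles to its fair share times its own RTT, and the router never has to compute any of this.

```python
# Illustrative only: equal per-flow throughput at the bottleneck implies each
# TCP's steady-state window is (fair share) x (its own RTT). The router does
# nothing RTT-specific; the windows fall out of the end-to-end control loop.

C_bps = 100e6                                              # assumed bottleneck: 100 Mbit/s
rtts = {"short": 0.010, "medium": 0.080, "long": 0.260}    # seconds, made up

fair_share = C_bps / len(rtts)                             # equal throughput per flow
for name, rtt in rtts.items():
    window_bytes = fair_share * rtt / 8                    # bandwidth-delay product of the share
    print(f"{name:6s}: {fair_share/1e6:.1f} Mbit/s, "
          f"cwnd ~ {window_bytes/1500:.0f} full-size packets")
```

Same throughput, three very different windows - which is exactly the point: the RTT-dependence lives in the endpoints, not in the router.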
My concern was focused on trying to balance RTT among all flows by decision-making in a router watching the packets flow by. That seems like a terrible idea, though I suppose any metric might have some supporters out there. [Snide remark: look at all the "diffserv control points"; some person actually is a fan of each one, though I doubt anyone knows whether each one has an implementation technique that will actually achieve anything like what is stated in the RFCs describing them. It's why I think that diffserv couldn't have resulted from any process like "rough consensus and working code", but instead came from the usual committee-style "standards process" that has produced the millions of useless standards in the world's standards organizations.]

The nice thing about fq_codel and cake, to me, is that they come close to achieving a pragmatic throughput-fairness and eliminate queueing delay on a link - two important factors that allow good end-to-end protocols to be built on top of them. (A key property is reducing the likelihood of load-based starvation of new flows, etc., as long as those flows handle drops and marks by reducing their sending rate compatibly with TCP flows.) Of course, if implemented badly (like refusing to drop or mark some packets based on some theory like "lost packets are just evil") they may not work well.
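To make "pragmatic throughput-fairness" concrete, here is a minimal deficit-round-robin sketch (purely illustrative, nothing like the real fq_codel sources, and with the per-queue CoDel AQM left out): every backlogged flow gets the same byte budget per round, so long-run per-flow throughput equalizes no matter how bursty any one sender is.

```python
from collections import deque

QUANTUM = 1514   # bytes of service per backlogged flow per round (one MTU-ish frame)

def drr_round(queues, deficits, send):
    """One deficit-round-robin pass over per-flow queues of packet lengths."""
    for flow, q in queues.items():
        if not q:
            deficits[flow] = 0                 # idle flows do not bank credit
            continue
        deficits[flow] += QUANTUM              # equal credit for every backlogged flow
        while q and q[0] <= deficits[flow]:
            pkt_len = q.popleft()
            deficits[flow] -= pkt_len
            send(flow, pkt_len)                # real fq_codel would also run CoDel here

# tiny demo: a greedy flow and a polite flow still drain at the same byte rate
queues = {"greedy": deque([1500] * 8), "polite": deque([1500] * 2)}
deficits = {"greedy": 0, "polite": 0}
drr_round(queues, deficits, lambda f, n: print(f, n))
```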
Thinking about measuring the right things rather than the wrong things, to me, is crucial. Optimizing for 100% link utilization is an example of the wrong metric. It should be obvious why, but apparently it is the metric most financial executives in network operators want to see prioritized, with a second metric of "lost packets/sent packets" being optimized to 0. Imagine if roads were required to be 100% utilized by cars at all times... Yup, I've talked to folks at RBOCs in charge of financing (and BT executives, too) who actually respond to that analogy with cars by saying "data is nothing like cars - you must be kidding" and then going back to saying that they want 100% utilization and 0 dropped packets. That's what accountants do to you.


On Wednesday, April 20, 2022 8:54am, "Sebastian Moeller" <moeller0@gmx.de> said:

> Hi David,
>
> > On Apr 19, 2022, at 22:40, David P. Reed <dpreed@deepplum.com> wrote:
> >
> > Sebastian - all your thoughts here seem reasonable.
> >
> > I would point out only two things:
> >
> > 1) 100 ms is a magic number for human perception. It's basically the order of magnitude of humans' ability to respond to unpredictable events outside the human.
>
> Yes, with this I fully agree, "order of magnitude"; the actual numerical value of 100 is for convenience and has no real significance IMHO. Which I should have phrased better. Side-note: such experiments typically require the subject to create a measurable response, which will take additional time on top of the initial event detection, but that still fits within the 100ms order of magnitude much better than a hypothetical 10ms. (For visual events at 10ms the frontal lobe will not even have the information available that something changed; vision is amazingly slow*.)
>
> > That's why it is magic. Now humans can actually perceive intervals much, much shorter (depending on how we pay attention), but usually it is by comparing two events' time ordering. We can even synchronize to external, predictable events with finer resolution (as in Jazz improv or just good chamber music playing). A century of careful scientific research supports this, not just one experiment.
>
> Quite a number of experiments however are misinterpreted (or rather interpreted without the required nuance) on the internet (yes, I know, shocking ;) that the internet can be factually imprecise).
>
> > Which is why one should take it seriously as a useful target. (The fact that one can achieve it across the planet with digital signalling networks makes it a desirable goal for anything interactive between a human and any entity, be it computer or human.) If one can do better, of course, that's great. I like that from my home computer I can get lots of places in under 8 msec (15 msec RTT).
> >
> > 2) given that a particular heavily utilized link might be shared for paths where the light-speed-in-fiber round trip for active flows varies by an order of magnitude, why does one try to make fair RTT (as opposed to all other possible metrics on each flow) among flows.
>
> I think the measure that is equalized here is throughput per flow; it is just that if done competently this will also alleviate the inherent disadvantage that longer RTT flows have compared to shorter RTT flows. But then again, other measures are possible as well, assuming the bottleneck can get at these easily.
>
> > It doesn't make any sense to me why. Going back to human interaction times, it makes sense to me that you might want to be unfair so that most flows get faster than 200 ms RTT, for example, penalizing those who are really close to each other anyway.
> > If the RTT is already low because congestion has been controlled, you can't make it lower. Basically, the ideal queue state is < 1 packet in the bottleneck outbound queues, no matter what the RTT through that queue is.
>
> Well, why RTT-fairness? My answer is similar as for why I like FQ: because equitable sharing is the one strategy that, without information about the flows' relative importance, avoids the pitfall of starving important flows that just happen to have a long RTT or a less aggressive controller... So IMHO RTT fairness does not need to be absolute but simply good enough to keep all flows making decent forward progress. The very moment someone comes in knowing more about the different flows' importance, more optimal capacity sharing becomes possible (like in Vint's example)... in a sense neither FQ nor the "accidental" RTT-fairness it offers are likely optimal, but they are IMHO considerably less likely to be pessimal than any uninformed inequitable sharing.
>
> Regards
> Sebastian
>
> *) Given that vision is essentially our long-range sense**, internal latency typically is not an issue, since events/objects will often be far enough away that detection can afford that extra time.
>
> **) In space and time, just look at the stars ;)
>
> >
> > On Thursday, April 14, 2022 5:25pm, "Sebastian Moeller" <moeller0@gmx.de> said:
> >
> > > Just indulge me here for a few crazy ideas ;)
> > >
> > > > On Apr 14, 2022, at 18:54, David P. Reed <dpreed@deepplum.com> wrote:
> > > >
> > > > Am I to assume, then, that routers need not pay any attention to RTT to achieve RTT-fairness?
> > >
> > > Part of RTT-bias seems caused by the simple fact that tight control loops work better than sloppy ones ;)
> > >
> > > There seem to be three ways to try to remedy that to some degree:
> > > 1) the daft one: define a reference RTT (larger than typically encountered) and have all TCPs respond as if encountering that delay -> until the path RTT exceeds that reference, TCP things should be reasonably fair.
> > >
> > > 2) the flows communicate with the bottleneck honestly: if flows would communicate their RTT to the bottleneck, the bottleneck could partition its resources such that signaling (mark/drop) and buffer size is bespoke per-flow. In theory that can work, but it relies on either the RTT information being non-gameably linked to the protocol's operation* or everybody being fully veridical and honest.
> > > *) think of a protocol that will only work if the best estimate of the RTT is communicated between the two sides continuously
> > >
> > > 3) the router being verbose: if routers communicate the fill-state of their queue (global or per-flow does not matter all that much), flows in theory can do a better job at not putting way too much data in flight, remedying the cost of drops/marks that affects high-RTT flows more than the shorter ones. (The router has little incentive to lie here; if it wanted to punish a flow it would be easier to simply drop its packets and be done with it.)
> > >
> > > IMHO 3), while theoretically the least effective of the three, is the only one that has a reasonable chance of being employed... or rather is already deployed in the form of ECN (with mild effects).
> > >
> > > > How does a server or client (at the endpoint) adjust RTT so that it is fair?
> > >
> > > See 1) above, but who in their right mind would actually implement something like that (TCP Prague did that, but IMHO never in earnest, just to "address" the L4S bullet point of RTT-bias reduction).
> > >
> > > > Now RTT, technically, is just the sum of the instantaneous queue lengths in bytes along the path and the reverse path, plus a fixed wire-level delay. And routers along any path do not have correlated queue sizes.
> > > >
> > > > It seems to me that RTT adjustment requires collective real-time cooperation among all-or-most future users of that path. The path is partially shared by many servers and many users, none of whom directly speak to each other.
> > > >
> > > > And routers have very limited memory compared to their throughput-RTdelay product. So calculating the RTT using spin bits and UIDs for packets seems a bit much to expect all routers to do.
> > >
> > > If posed like this, I guess the better question is what can/should routers be expected to do here: either equitably share their queues, or share queues inequitably such that throughput is equitable. From a pure router point of view the first seems "fairest", but as fq_codel and cake show, within reason equitable capacity sharing is possible (so not perfectly and not for every possible RTT spread).
> > >
> > > >
> > > > So, what process measures the cross-interactions among all the users of all the paths, and what control-loop (presumably stable and TCP-compatible) actually converges to RTT fairness IRL.
> > >
> > > Theoretically nothing; in reality, on a home link FQ + competent AQM goes a long way in that direction.
> > >
> > > >
> > > > Today, the basis of congestion control in the Internet is that each router is a controller of all endpoint flows that share a link, and each router is free to do whatever it takes to reduce its queue length to near zero as an average on all timescales larger than about 1/10 of a second (a magic number that is directly derived from measured human brain time resolution).
> > >
> > > The typical applies: be suspicious of too-round numbers.... 100ms is in no way magic and also not "correct"; it is, however, a decent description of reaction times in a number of perceptual tasks that can be mis-interpreted as showing things like "the brain runs at 10Hz" or similar...
> > >
> > > >
> > > > So, for any two machines separated by less than 1/10 of a light-second in distance, the total queueing delay has to stabilize in about 1/10 of a second. (I'm using a light-second in a fiber medium, not free-space, as the speed of light in fiber is a lot slower than the speed of light on microwaves, as Wall Street has recently started recognizing and investing in.)
> > > >
> > > > I don't see how RTT-fairness can be achieved by some set of bits in the IP header. You can't shorten RTT below about 2/10 of a second in that desired system state. You can only "lengthen" RTT by delaying packets in source or endpoint buffers, because it's unreasonable to manage all the routers.
> > > >
> > > > And the endpoints that share a path can't talk to each other and reach a decision in on the order of 2/10 of a second.
> > > >
> > > > So at the very highest level, what is RTT-fairness's objective function optimizing, and how can it work?
> > > >
> > > > Can it be done without any change to routers?
> > >
> > > Well, the goal here seems to be to undo the RTT-dependence of throughput so a router can equalize per-flow throughput and thereby (from its own vantage point) enforce RTT independence, within the amount of memory available. And that already works today for all identifiable flows, but apparently at a computational cost that larger routers do not want to pay. But you knew all that.
> > >
> > > >
> > > >
> > > > On Tuesday, April 12, 2022 3:07pm, "Michael Welzl" <michawe@ifi.uio.no> said:
> > > >
> > > > On Apr 12, 2022, at 8:52 PM, Sebastian Moeller <moeller0@gmx.de> wrote:
> > > > Question: is QUIC actually using the spin bit as an essential part of the protocol?
> > > > The spec says it's optional: https://www.rfc-editor.org/rfc/rfc9000.html#name-latency-spin-bit
> > > > Otherwise endpoints might just game this if faking their RTT at a router yields an advantage...
> > > > This was certainly discussed in the QUIC WG. Probably perceived as an unclear incentive, but I didn't really follow this.
> > > > Cheers,
> > > > Michael
> > > >
> > > > This is why pping's use of tcp timestamps is elegant, little incentive for the endpoints to fudge....
> > > >
> > > > Regards
> > > > Sebastian
> > > >
> > > > On 12 April 2022 18:00:15 CEST, Michael Welzl <michawe@ifi.uio.no> wrote:
> > > > Hi,
> > > > Who or what are you objecting against? At least nothing that I described does what you suggest.
> > > > BTW, just as a side point, for QUIC, routers can know the RTT today - using the spin bit, which was designed for that specific purpose.
> > > > Cheers,
> > > > Michael
> > > >
> > > > On Apr 12, 2022, at 5:51 PM, David P. Reed <dpreed@deepplum.com> wrote:
> > > > I strongly object to congestion control *in the network* attempting to measure RTT (which is an end-to-end comparative metric). Unless the current RTT is passed in each packet a router cannot enforce fairness. Period.
> > > >
> > > > Today, by packet drops and fair marking, information is passed to the sending nodes (eventually) about congestion. But the router can't know RTT today.
> > > >
> > > > The result of *requiring* RTT fairness would be to make the random bottleneck router (chosen because it is the slowest forwarder on a contended path) become the endpoint controller.
> > > >
> > > > That's the opposite of an "end-to-end resource sharing protocol".
> > > >
> > > > Now, I'm not saying it is impossible - what I'm saying is that it is asking all endpoints to register with an "Internet-wide" RTT real-time tracking and control service.
> > > >
> > > > This would be the technical equivalent of an ITU central control point.
> > > >
> > > > So, either someone will invent something I cannot imagine (a distributed, rapid-convergence algorithm that reflects to *every potential user* of a shared router along the current path the RTTs of ALL other users (and potential users).
> > > >
> > > > IMHO, the wish for RTT fairness is like saying that the entire solar system's gravitational pull should be equalized so that all planets and asteroids have fair access to 1G gravity.
> > > >
> > > > On Friday, April 8, 2022 2:03pm, "Michael Welzl" <michawe@ifi.uio.no> said:
> > > >
> > > > Hi,
> > > > FWIW, we have done some analysis of fairness and convergence of DCTCP in:
> > > > Peyman Teymoori, David Hayes, Michael Welzl, Stein Gjessing: "Estimating an Additive Path Cost with Explicit Congestion Notification", IEEE Transactions on Control of Network Systems, 8(2), pp. 859-871, June 2021. DOI 10.1109/TCNS.2021.3053179
> > > > Technical report (longer version):
> > > > https://folk.universitetetioslo.no/michawe/research/publications/NUM-ECN_report_2019.pdf
> > > > and there's also some in this paper, which first introduced our LGC mechanism:
> > > > https://ieeexplore.ieee.org/document/7796757
> > > > See the technical report on page 9, section D: a simple trick can improve DCTCP's fairness (if that's really the mechanism to stay with… I'm getting quite happy with the results we get with our LGC scheme :-) )
> > > >
> > > > Cheers,
> > > > Michael
> > > >
> > > > On Apr 8, 2022, at 6:33 PM, Dave Taht <dave.taht@gmail.com> wrote:
> > > > I have managed to drop most of my state regarding the state of various
> > > > dctcp-like solutions. At one level it's good to have not been keeping
> > > > up, washing my brain clean, as it were. For some reason or another I
> > > > went back to the original paper last week, and have been pounding
> > > > through this one again:
> > > >
> > > > Analysis of DCTCP: Stability, Convergence, and Fairness
> > > >
> > > > "Instead, we propose subtracting α/2 from the window size for each marked ACK,
> > > > resulting in the following simple window update equation:
> > > >
> > > > One result of which I was most proud recently was of demonstrating
> > > > perfect rtt fairness in a range of 20ms to 260ms with fq_codel
> > > > ( https://forum.mikrotik.com/viewtopic.php?t=179307 ) - and I'm pretty
> > > > interested in 2-260ms, but haven't got around to it.
> > > >
> > > > Now, one early result from the sce vs l4s testing I recall was severe
> > > > latecomer convergence problems - something like 40s to come into flow
> > > > balance - but I can't remember what presentation, paper, or rtt that
> > > > was from. ?
> > > >
> > > > Another one has been various claims towards some level of rtt
> > > > unfairness being ok, but not the actual ratio, nor (going up to the
> > > > paper's proposal above) whether that method had been tried.
> > > >
> > > > My opinion has long been that any form of marking should look more
> > > > closely at the observed RTT than any fixed rate reduction method, and
> > > > compensate the paced rate to suit. But that's presently just reduced
> > > > to an opinion, not having kept up with progress on prague, dctcp-sce,
> > > > or bbrv2. As one example of ignorance, are 2 packets still paced back
> > > > to back? DRR++ + early marking seems to lead to one packet being
> > > > consistently unmarked and the other marked.
> > > >
> > > > --
> > > > I tried to build a better future, a few times:
> > > > https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org
> > > >
> > > > Dave Täht CEO, TekLibre, LLC
> > > > _______________________________________________
> > > > Ecn-sane mailing list
> > > > Ecn-sane@lists.bufferbloat.net
> > > > https://lists.bufferbloat.net/listinfo/ecn-sane
> > > >
> > > > --
> > > > Sent from my Android device with K-9 Mail. Please excuse my brevity.
> > >
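(One concrete note on the DCTCP passage Dave quotes above: the equation itself isn't in the quoted text, but the prose - "subtracting α/2 from the window size for each marked ACK" - amounts to something like the sketch below. This is my paraphrase of that sentence, not the paper's own notation.)

```python
# Paraphrase of the quoted proposal (not the paper's notation): shrink the
# window by alpha/2 segments on every marked ACK, instead of the classic
# once-per-window DCTCP cut of cwnd <- cwnd * (1 - alpha/2).

def on_ack(cwnd, alpha, marked):
    """cwnd in segments; alpha is DCTCP's smoothed fraction of marked packets."""
    if marked:
        return max(1.0, cwnd - alpha / 2.0)   # per-marked-ACK decrease
    return cwnd + 1.0 / cwnd                  # standard additive increase per ACK
```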