Date: Wed, 20 Apr 2022 18:21:33 -0400 (EDT)
From: "David P. Reed"
To: "Sebastian Moeller"
Cc: "Michael Welzl" ,
ecn-sane@lists.bufferbloat.net
References:
<4430DD9F-2556-4D38-8BE2-6609265319AF@ifi.uio.no>
<1649778681.721621839@apps.rackspace.com>
<0026CF35-46DF-4C0C-8FEE-B5309246C1B7@ifi.uio.no>
<08F92DA0-1D59-4E58-A289-3D35103CF78B@gmx.de>
<1649955272.49298319@apps.rackspace.com>
<1650400809.579413230@apps.rackspace.com>
Message-ID: <1650493293.85915194@apps.rackspace.com>
Subject: Re: [Ecn-sane] rtt-fairness question
List-Id: Discussion of explicit congestion notification's impact on the
Internet

Hi Sebastian -

Actually, fq in fq_codel does achieve throughput-fairness on the bottleneck link, approximately, given TCP.
And I do agree that throughput fairness is about all you can define locally.

That is, no matter what the RTT (unloaded), dropping and ECN marking all flows equally at the bottleneck link will achieve approximate throughput sharing. The end-to-end windows of independent TCP will size themselves to the underlying RTT, as they are wont to do, and as desired if you want to get both good utilization and minimize queueing delay across all paths in the network as a whole. (A reasonable definition of a good operating point.)

To do this, each router need not know at all what the RTT of the packets flowing through should be. The router strategy is RTT agnostic.
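
To make that concrete, here is a rough, illustrative sketch (a back-of-the-envelope Python calculation, not anything from fq_codel itself; the RTTs, the marking probability, and the MSS below are made-up numbers, and the throughput rule of thumb is the standard Reno/Mathis approximation, rate ~ MSS / (RTT * sqrt(2p/3))):

# Illustrative only: one shared FIFO with a common marking probability,
# versus an RTT-agnostic per-flow scheduler at the same bottleneck.
from math import sqrt

MSS = 1448                 # bytes per segment (assumed)
RTTS = [0.02, 0.08, 0.32]  # 20 ms, 80 ms, 320 ms unloaded RTTs (assumed)
P_MARK = 0.01              # common drop/mark probability in the shared FIFO (assumed)

def reno_rate(rtt, p):
    # Mathis et al. steady-state approximation: MSS / (RTT * sqrt(2p/3)).
    return MSS / (rtt * sqrt(2.0 * p / 3.0))

shared = [reno_rate(rtt, P_MARK) for rtt in RTTS]
total = sum(shared)
print("Shared FIFO, equal marking probability for every flow:")
for rtt, r in zip(RTTS, shared):
    print("  RTT %3.0f ms -> %5.1f%% of the aggregate" % (rtt * 1e3, 100.0 * r / total))

print("Per-flow scheduler, all flows backlogged:")
for rtt in RTTS:
    # Each backlogged flow is served at link_rate / N regardless of its RTT,
    # provided its window can reach share * RTT.
    print("  RTT %3.0f ms -> %5.1f%% of the aggregate" % (rtt * 1e3, 100.0 / len(RTTS)))

The first loop comes out roughly 76% / 19% / 5% (throughput proportional to 1/RTT, the familiar RTT bias of a single shared queue); the second is 33% each, which is the approximate throughput-fairness the fq part provides without ever looking at RTT.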
My concern was focused on trying to balance RTT among all flows by decision-making in a router watching the packets flowing by. That seems like a terrible idea, though I suppose any metric might have some supporters out there. [Snide remark: look at all the "diffserv control points"; some person actually is a fan of each one, though I doubt anyone knows whether any of them has an implementation technique that will actually achieve anything like what is stated in the RFCs describing them. It's why I think that diffserv couldn't have resulted from any process like "rough consensus and working code", but instead came from the usual committee-style "standards process" that has produced the millions of useless standards in the world's standards organizations.]

The nice thing about fq_codel and cake, to me, is that they come close to achieving a pragmatic throughput-fairness and eliminate queueing delay on a link - two important factors that allow good end-to-end protocols to be built on top of them. (A key property is reducing the likelihood of load-based starvation of new flows, etc., as long as those flows handle drops and marks by reducing their sending rate compatibly with TCP flows.) Of course, if implemented badly (like refusing to drop or mark some packets based on some theory like "lost packets are just evil") they may not work well.
Thinking about measuring the right things rather than the wrong things, to me, is crucial. Optimizing for 100% link utilization is an example of the wrong metric. It should be obvious why, but apparently it is the metric most financial executives in network operators want to see prioritized, with a second metric of "lost packets/sent packets" being optimized to 0. Imagine if roads were required to be 100% utilized by cars at all times... Yup, I've talked to folks at RBOCs in charge of financing (and BT executives, too) who actually respond to that analogy with cars by saying "data is nothing like cars - you must be kidding" and then go back to saying that they want 100% utilization and 0 dropped packets. That's what accountants do to you.

On Wednesday, April 20, 2022 8:54am, "Sebastian Moeller" <moeller0@gmx.de> said:

> Hi David,
>
>
> > On Apr 19, 2022, at 22:40, David P. Reed <dpreed@deepplum.com> wrote:
> >
> > Sebastian - all your thoughts here seem reasonable.
> >
> > I would point out only two things:
> >
> > 1) 100 ms. is a magic number for human perception. It's basically the order
> of magnitude of humans' ability to respond to unpredictable events outside the
> human.
>
> Yes, with this I fully agree, "order of magnitude", the actual numerical value of
> 100 is for convenience and has no real significance IMHO. Which I should have
> phrased better. Side-note such experiments typically require the subject to
> create a measurable response, which will take additional time to the initial
> event detection, but that still fits within the 100ms order of magnitude much
> better than a hypothetical 10ms. (for visual events at 10ms the frontal lobe will
> not even have the information available that something changed, vision is
> amazingly slow*)
>
> > That's why it is magic. Now humans can actually perceive intervals much, much
> shorter (depending on how we pay attention), but usually it is by comparing two
> events' time ordering. We can even synchronize to external, predictable events
> with finer resolution (as in Jazz improv or just good chamber music playing). A
> century of careful scientific research supports this, not just one experiment.
>
> Quite a number of experiments however are misinterpreted (or rather interpreted
> without the required nuance) on the internet (yes, I know shocking ;) that the
> internet can be factually imprecise).
>
>
> > Which is why one should take it seriously as a useful target. (the fact that
> one can achieve it across the planet with digital signalling networks makes it a
> desirable goal for anything interactive between a human and any entity, be it
> computer or human). If one can do better, of course, that's great. I like that
> from my home computer I can get lots of places in under 8 msec (15 msec RTT).
> >
> > 2) given that a particular heavily utilized link might be shared for paths
> where the light-speed-in-fiber round trip for active flows varies by an order of
> magnitude, why does one try to make fair RTT (as opposed to all other possible
> metrics on each flow) among flows.
>
> I think the measure that is equalized here is throughput per flow, it is just
> that if done competently this will also alleviate the inherent disadvantage that
> longer RTT flows have compared to shorter RTT flows. But then again, other
> measures are possible as well assuming the bottleneck can get at these easily.
>
> > It doesn't make any sense to me why. Going back to human interaction times,
> it makes sense to me that you might want to be unfair so that most flows get
> faster than 200 ms. RTT, for example, penalizing those who are really close to
> each other anyway.
> > If the RTT is already low because congestion has been controlled, you can't
> make it lower. Basically, the ideal queue state is < 1 packet in the bottleneck
> outbound queues, no matter what the RTT through that queue is.
>
> Well, why RTT-fairness? My answer is similar as for why I like FQ, because
> equitable sharing is the one strategy that without information about the flows
> relative importance avoids the pitfall of starving important flows that just
> happen to have a long RTT or a less aggressive controller... So IMHO RTT fairness
> does not need to be absolute but simply good enough to keep all flows at making
> decent forward progress. The very moment someone comes in knowing more about the
> different flows' importance, more optimal capacity sharing becomes possible (like
> in Vint's example)... in a sense neither FQ nor the "accidental" RTT-fairness it
> offers are likely optimal but they are IMHO considerably less likely to be
> pessimal than any uninformed inequitable sharing.
>
>
> Regards
> Sebastian
>
>
> *) Given that vision is essentially our long-range sense** that internal latency
> typically is not an issue, since events/objects will often be far enough away that
> detection can afford that extra time
>
> **) In space and time, just look at the stars ;)
>
>
> >
> >
> >
> > On Thursday, April 14, 2022 5:25pm, "Sebastian Moeller"
> <moeller0@gmx.de> said:
> >
> > > Just indulge me here for a few crazy ideas ;)
> > >
> > > > On Apr 14, 2022, at 18:54, David P. Reed
> <dpreed@deepplum.com> wrote:
> > > >
> > > > Am I to assume, then, that routers need not pay any attention to
> RTT to
> > > achieve RTT-fairness?
> > >
> > > Part of RTT-bias seems caused by the simple fact that tight control
> loops work
> > > better than sloppy ones ;)
> > >
> > > There seem to be three ways to try to remedy that to some degree:
> > > 1) the daft one:
> > > define a reference RTT (larger than typically encountered) and have all
> TCPs
> > > respond as if encountering that delay -> until the path RTT exceeds
> that
> > > reference TCP things should be reasonably fair
> > >
> > > 2) the flows communicate with the bottleneck honestly:
> > > if flows would communicate their RTT to the bottleneck the bottleneck
> could
> > > partition its resources such that signaling (mark/drop) and buffer size
> is
> > > bespoke per-flow. In theory that can work, but relies on either the RTT
> > > information being non-gameably linked to the protocol's operation* or
> everybody
> > > being fully veridical and honest
> > > *) think a protocol that will only work if the best estimate of the RTT
> is
> > > communicated between the two sides continuously
> > >
> > > 3) the router being verbose:
> > > If routers communicate the fill-state of their queue (global or per-flow
> does not
> > > matter all that much) flows in theory can do a better job at not putting
> way too
> > > much data in flight remedying the cost of drops/marks that affects high
> RTT flows
> > > more than the shorter ones. (The router has little incentive to lie
> here, if it
> > > wanted to punish a flow it would be easier to simply drop its packets
> and be done
> > > with).
> > >
> > >
> > > IMHO 3, while theoretically the least effective of the three is the only
> one that
> > > has a reasonable chance of being employed... or rather is already
> deployed in the
> > > form of ECN (with mild effects).
> > >
> > > > How does a server or client (at the endpoint) adjust RTT so that it
> is fair?
> > >
> > > See 1) above, but who in their right mind would actually implement
> something like
> > > that (TCP Prague did that, but IMHO never in earnest but just to
> "address" the
> > > L4S bullet point RTT-bias reduction).
> > >
> > > > Now RTT, technically, is just the sum of the instantaneous queue
> lengths in
> > > bytes along the path and the reverse path, plus a fixed wire-level
> delay. And
> > > routers along any path do not have correlated queue sizes.
> > > >
> > > > It seems to me that RTT adjustment requires collective real-time
> cooperation
> > > among all-or-most future users of that path. The path is partially
> shared by many
> > > servers and many users, none of whom directly speak to each other.
> > > >
> > > > And routers have very limited memory compared to their
> throughput-RTdelay
> > > product. So calculating the RTT using spin bits and UIDs for packets
> seems a bit
> > > much to expect all routers to do.
> > >
> > > If posed like this, I guess the better question is, what can/should
> routers be
> > > expected to do here: either equitably share their queues or share queue
> > > inequitably such that throughput is equitable. From a pure router point
> of the
> > > view the first seems "fairest", but as fq_codel and cake show, within
> reason
> > > equitable capacity sharing is possible (so not perfectly and not for
> every
> > > possible RTT spread).
> > >
> > > >
> > > > So, what process measures the cross-interactions among all the
> users of all
> > > the paths, and what control-loop (presumably stable and TCP-compatible)
> actually
> > > converges to RTT fairness IRL.
> > >
> > > Theoretically nothing, in reality on a home link FQ+competent AQM goes a
> long way
> > > in that direction.
> > >
> > >
> > > >
> > > > Today, the basis of congestion control in the Internet is that each
> router is
> > > a controller of all endpoint flows that share a link, and each router is
> free to
> > > do whatever it takes to reduce its queue length to near zero as an
> average on all
> > > timescales larger than about 1/10 of a second (a magic number that is
> directly
> > > derived from measured human brain time resolution).
> > >
> > > The typical applies, be suspicious of too round numbers.... 100ms is in
> no way
> > > magic and also not "correct" it is however a decent description of
> reaction times
> > > in a number of perceptual tasks that can be mis-interpreted as showing
> things like
> > > the brain runs at 10Hz or similar...
> > >
> > >
> > > >
> > > > So, for any two machines separated by less than 1/10 of a
> light-second in
> > > distance, the total queueing delay has to stabilize in about 1/10 of a
> second.
> > > (I'm using a light-second in a fiber medium, not free-space, as the
> speed of light
> > > in fiber is a lot slower than the speed of light on microwaves, as Wall
> Street has
> > > recently started recognizing and investing in).
> > > >
> > > > I don't see how RTT-fairness can be achieved by some set of bits in
> the IP
> > > header. You can't shorten RTT below about 2/10 of a second in that
> desired system
> > > state. You can only "lengthen" RTT by delaying packets in source or
> endpoint
> > > buffers, because it's unreasonable to manage all the routers.
> > > >
> > > > And the endpoints that share a path can't talk to each other and
> reach a
> > > decision in on the order of 2/10 of a second.
> > > >
> > > > So at the very highest level, what is RTT-fairness's objective
> function
> > > optimizing, and how can it work?
> > > >
> > > > Can it be done without any change to routers?
> > >
> > > Well the goal here seems to undo the RTT-dependence of throughput so a
> router can
> > > equalize per flow throughput and thereby (from its own vantage point)
> enforce RTT
> > > independence, within the amount of memory available. And that already
> works today
> > > for all identifiable flows, but apparently at a computational cost that
> larger
> > > routers do not want to pay. But you knew all that
> > >
> > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tuesday, April 12, 2022 3:07pm, "Michael Welzl"
> <michawe@ifi.uio.no>
> > > said:
> > > >
> > > >
> > > >
> > > > On Apr 12, 2022, at 8:52 PM, Sebastian Moeller
> <moeller0@gmx.de>
> > > wrote:
> > > > Question: is QUIC actually using the spin bit as an essential part
> of the
> > > protocol?
> > > > The spec says it's optional:
> > > https://www.rfc-editor.org/rfc/rfc9000.html#name-latency-spin-bit
> > > > Otherwise endpoints might just game this if faking their RTT at a
> router
> > > yields an advantage...
> > > > This was certainly discussed in the QUIC WG. Probably perceived as
> an unclear
> > > incentive, but I didn't really follow this.
> > > > Cheers,
> > > > Michael
> > > >
> > > > This is why pping's use of tcp timestamps is elegant, little
> incentive for
> > > the endpoints to fudge....
> > > >
> > > > Regards
> > > > Sebastian
> > > >
> > > >
> > > > On 12 April 2022 18:00:15 CEST, Michael Welzl
> <michawe@ifi.uio.no>
> > > wrote:
> > > > Hi,
> > > > Who or what are you objecting against? At least nothing that I
> described
> > > does what you suggest.
> > > > BTW, just as a side point, for QUIC, routers can know the RTT today
> - using
> > > the spin bit, which was designed for that specific purpose.
> > > > Cheers,
> > > > Michael
> > > >
> > > >
> > > > On Apr 12, 2022, at 5:51 PM, David P. Reed
> <dpreed@deepplum.com>
> > > wrote:
> > > > I strongly object to congestion control *in the network* attempting
> to
> > > measure RTT (which is an end-to-end comparative metric). Unless the
> current RTT is
> > > passed in each packet a router cannot enforce fairness. Period.
> > > >
> > > > Today, by packet drops and fair marking, information is passed to
> the sending
> > > nodes (eventually) about congestion. But the router can't know RTT
> today.
> > > >
> > > > The result of *requiring* RTT fairness would be to put the random
> bottleneck
> > > router (chosen because it is the slowest forwarder on a contended path)
> become the
> > > endpoint controller.
> > > >
> > > > That's the opposite of an "end-to-end resource sharing protocol".
> > > >
> > > > Now, I'm not saying it is impossible - what I'm saying it is asking
> all
> > > endpoints to register with an "Internet-wide" RTT real-time tracking and
> control
> > > service.
> > > >
> > > > This would be the technical equivalent of an ITU central control
> point.
> > > >
> > > > So, either someone will invent something I cannot imagine (a
> distributed,
> > > rapid-convergence algorithm that reflects to *every potential user* of
> a shared
> > > router along the current path the RTT's of ALL other users (and
> potential users).
> > > >
> > > > IMHO, the wish for RTT fairness is like saying that the entire
> solar system's
> > > gravitational pull should be equalized so that all planets and asteroids
> have fair
> > > access to 1G gravity.
> > > >
> > > >
> > > > On Friday, April 8, 2022 2:03pm, "Michael Welzl"
> <michawe@ifi.uio.no>
> > > said:
> > > >
> > > > Hi,
> > > > FWIW, we have done some analysis of fairness and convergence of
> DCTCP in:
> > > > Peyman Teymoori, David Hayes, Michael Welzl, Stein Gjessing:
> "Estimating an
> > > Additive Path Cost with Explicit Congestion Notification", IEEE
> Transactions on
> > > Control of Network Systems, 8(2), pp. 859-871, June 2021. DOI
> > > 10.1109/TCNS.2021.3053179
> > > > Technical report (longer version):
> > > >
> > >
> https://folk.universitetetioslo.no/michawe/research/publications/NUM-ECN_report_2019.pdf
> > > > and there's also some in this paper, which first introduced
> our LGC
> > > mechanism:
> > > > https://ieeexplore.ieee.org/document/7796757
> > > > See the technical report on page 9, section D: a simple trick can
> improve
> > > DCTCP's fairness (if that's really the mechanism to stay
> with…
> > > I'm getting quite happy with the results we get with our LGC
> scheme :-)
> > > )
> > > >
> > > > Cheers,
> > > > Michael
> > > >
> > > > On Apr 8, 2022, at 6:33 PM, Dave Taht <dave.taht@gmail.com>
> wrote:
> > > > I have managed to drop most of my state regarding the state of
> various
> > > > dctcp-like solutions. At one level it's good to have not been
> keeping
> > > > up, washing my brain clean, as it were. For some reason or another
> I
> > > > went back to the original paper last week, and have been pounding
> > > > through this one again:
> > > >
> > > > Analysis of DCTCP: Stability, Convergence, and Fairness
> > > >
> > > > "Instead, we propose subtracting α/2 from the window size for
> each
> > > marked ACK,
> > > > resulting in the following simple window update equation:
> > > >
> > > > One result of which I was most proud recently was of demonstrating
> > > > perfect rtt fairness in a range of 20ms to 260ms with fq_codel
> > > > https://forum.mikrotik.com/viewtopic.php?t=179307 )- and I'm
> pretty
> > > > interested in 2-260ms, but haven't got around to it.
> > > >
> > > > Now, one early result from the sce vs l4s testing I recall was
> severe
> > > > latecomer convergence problems - something like 40s to come into
> flow
> > > > balance - but I can't remember what presentation, paper, or rtt
> that
> > > > was from. ?
> > > >
> > > > Another one has been various claims towards some level of rtt
> > > > unfairness being ok, but not the actual ratio, nor (going up to
> the
> > > > paper's proposal above) whether that method had been tried.
> > > >
> > > > My opinion has long been that any form of marking should look more
> > > > closely at the observed RTT than any fixed rate reduction method,
> and
> > > > compensate the paced rate to suit. But that's presently just
> reduced
> > > > to an opinion, not having kept up with progress on prague,
> dctcp-sce,
> > > > or bbrv2. As one example of ignorance, are 2 packets still paced
> back
> > > > to back? DRR++ + early marking seems to lead to one packet being
> > > > consistently unmarked and the other marked.
> > > >
> > > > --
> > > > I tried to build a better future, a few times:
> > > > https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org
> > > >
> > > > Dave Täht CEO, TekLibre, LLC
> > > > _______________________________________________
> > > > Ecn-sane mailing list
> > > > Ecn-sane@lists.bufferbloat.net
> > > > https://lists.bufferbloat.net/listinfo/ecn-sane
> > > >
> > > > --
> > > > Sent from my Android device with K-9 Mail. Please excuse my
> brevity.
> > > >
> > >
> > >
>
>
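
An editorial aside on the per-mark DCTCP variant quoted from the "Analysis of DCTCP" paper near the end of the thread: the quoted message breaks off before the actual update equation, so the following is only a hedged sketch of what the quoted sentence describes ("subtracting α/2 from the window size for each marked ACK"), using the conventional DCTCP bookkeeping (cwnd in segments, alpha an EWMA of the recently marked fraction, gain g = 1/16); it is not the equation from the paper itself.

class PerMarkWindow:
    # Sketch of a per-ACK window update in the spirit of the quoted sentence;
    # variable names and the EWMA gain are the usual DCTCP conventions.
    def __init__(self, cwnd=10.0, g=1.0 / 16):
        self.cwnd = cwnd     # congestion window, in segments
        self.alpha = 0.0     # EWMA of the fraction of CE-marked packets
        self.g = g
        self.acked = 0       # ACKs seen in the current observation window
        self.marked = 0      # CE-marked ACKs in the current observation window

    def on_ack(self, ce_marked):
        self.acked += 1
        if ce_marked:
            self.marked += 1
            # Per-mark reaction: shrink by alpha/2 immediately, instead of one
            # multiplicative cut to cwnd * (1 - alpha/2) per RTT.
            self.cwnd = max(1.0, self.cwnd - self.alpha / 2.0)
        else:
            self.cwnd += 1.0 / self.cwnd    # Reno-style additive increase

        if self.acked >= self.cwnd:         # roughly one RTT's worth of ACKs
            frac = self.marked / self.acked
            self.alpha = (1.0 - self.g) * self.alpha + self.g * frac
            self.acked = self.marked = 0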