From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from smtp124.iad3a.emailsrvr.com (smtp124.iad3a.emailsrvr.com
[173.203.187.124])
(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
(No client certificate requested)
by lists.bufferbloat.net (Postfix) with ESMTPS id 6E3FD3B29E
for ; Sat, 15 Jun 2019 16:32:24 -0400 (EDT)
Received: from smtp16.relay.iad3a.emailsrvr.com (localhost [127.0.0.1])
by smtp16.relay.iad3a.emailsrvr.com (SMTP Server) with ESMTP id 44EE243D1;
Sat, 15 Jun 2019 16:32:24 -0400 (EDT)
X-SMTPDoctor-Processed: csmtpprox beta
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=g001.emailsrvr.com;
s=20190322-9u7zjiwi; t=1560630744;
bh=jsoCPcUsupTlRymu9MYl1lf2RSdNKcjQOxE0DDQ83rE=;
h=Date:Subject:From:To:From;
b=z38oWiUgF8xb2c1r62e1mFaO788uAUfO0uxiMMkvM39wqh5lN3Sdk7CUBId0YPfQf
FkaSlEJs/W+qYLMmcvjxBlYvUNhz2wXSJJf43R2/ZwAfUNibR/HYW6ubHE+H37wzCb
8/nj+LLEptUqLXmUH5wXD8DaM+R+bGd70sO3jVSY=
Received: from app62.wa-webapps.iad3a (relay-webapps.rsapps.net
[172.27.255.140])
by smtp16.relay.iad3a.emailsrvr.com (SMTP Server) with ESMTP id 02EF1BA7;
Sat, 15 Jun 2019 16:32:23 -0400 (EDT)
X-Sender-Id: dpreed@deepplum.com
Received: from app62.wa-webapps.iad3a (relay-webapps.rsapps.net
[172.27.255.140]) by 0.0.0.0:25 (trex/5.7.12);
Sat, 15 Jun 2019 16:32:24 -0400
Received: from deepplum.com (localhost.localdomain [127.0.0.1])
by app62.wa-webapps.iad3a (Postfix) with ESMTP id E414D60046;
Sat, 15 Jun 2019 16:32:23 -0400 (EDT)
Received: by apps.rackspace.com
(Authenticated sender: dpreed@deepplum.com, from: dpreed@deepplum.com)
with HTTP; Sat, 15 Jun 2019 16:32:23 -0400 (EDT)
X-Auth-ID: dpreed@deepplum.com
Date: Sat, 15 Jun 2019 16:32:23 -0400 (EDT)
From: "David P. Reed"
To: "Dave Taht"
Cc: "ECN-Sane"
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_20190615163223000000_48702"
Importance: Normal
X-Priority: 3 (Normal)
X-Type: html
In-Reply-To:
References:
Message-ID: <1560630743.930819555@apps.rackspace.com>
X-Mailer: webmail/16.4.5-RC
Subject: Re: [Ecn-sane] I think a defense of fq_x and co-design of new transports might be good
X-BeenThere: ecn-sane@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Discussion of explicit congestion notification's impact on the
Internet
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 15 Jun 2019 20:32:24 -0000
------=_20190615163223000000_48702
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

Most web servers I see (such as the recommended NGINX configurations) do not seem to be in slow start much of the time.

I'd like to see some actual data, rather than hand-waving or references to 10-year-old papers.

Google is moving rapidly to protocols that run on UDP and have vestigial congestion control, if any. (And AFAICT, there is no research whatever regarding congestion behavior under load that saturates the last-mile link.)

It bugs the heck out of me that the congestion control community doesn't look at the "real world", just simulations and benchmarks that are of dubious reality.

On Saturday, June 15, 2019 12:57pm, "Dave Taht" <dave.taht@gmail.com> said:

> it would be a good paper to write. This is a draft of points I'd like
> to cover, not an attempt at a more formal email,
> I just needed to get this much out of my system, on the ecn-sane list.
>
> # about fq_x
>
> fq_x (presently fq_codel, fq_pie, sch_cake) have pretty much the same
> fq algorithm. It has one new characteristic
> compared to all the prior FQ ones - truly sparse flows see no queue at
> all, otherwise the observed queue size is f,
> where f = the number of queue building flows. If you have 3 full size
> packets queued, you have 3f. No transport currently takes advantage of
> this fairly tiny difference between "no queue" and "f queue".
>
> We use bytes, rather than packets, also, in our calculations as that
> translates to time.
>
> I'm perpetually throwing around a statistic like "95% of all flows
> never get out of slow start", that most are sender limited,
> and so on, and thus (especially if paced) get 0 delay all the time in
> FQ_x, or "0 first packet + pf" for the burst of packets.
>
> this is an essential, fine difference in measurement that can be
> tracked receiver side unique to fq_x.
>
> ... where all it takes with a single queue, with AQM on, is one greedy
> flow, to induce L latency on all flows, which in the case of pie/codel
> is > 16/5ms - with plenty of jitter until things settle down. ( I wish
> there was a way to express in a variable that it has a bounded range
> of some sort, a ~16ms isn't good, >16ms or 16+ms neither )
>
> dualpi retains that >16ms characteristic for normal flows, and a
> claimed 1ms for dualpi, which is... IMHO simply impossible in a wide
> range of circumstances, but I'd just as soon try to focus on improving
> FQ_x and co-designed transports in a more ideal world for a while, on
> this thread.
>
> For purposes of exposition, let's assume that fq_x is the dominant AQM
> algorithm in the world, the only one with
> a proven and oft enabled, and *deterministic*, RFC3168 CE response on
> overload, where a loss is assumed equivalent to a mark.
>
> In terms of co-designing a transport for it, a transport can then
> assume that a CE mark is coming from FQ_x. Knowing that,
> there are new curves that can be followed in various phases of the
> evolution of a flow.
>
> Abstractly:
>
> 0 delay - we have capacity to spare, grow the window
> "some delay" - we have a queue of "f", and thus a thinner setpoint observable.
> mild jitter between a recent arrival and the rest of the burst (the
> sparse flow optimization)
>
> # Benefits of FQ_x
>
> FQ_x is robust against abuse. A single flow cannot overwhelm it. Some
> level of service is guaranteed for the vast
> majority of flows (excepting collisions) in the number of flows configured.
> FQ_x is also robust against different treatments of drop (bbr without
> ecn) and CE (l4s)
> FQ_x allows for delay based and hybrid delay based (like BBR) to "just
> work", without any ecn support at all. The additional support in "x"
> pushes queue lengths for drop based algorithms back to where the most
> common TCPs can shift back
> into classic slow start and congestion avoidance modes, instead of
> being bound (as they are often today) in rwind, etc.
> FQ_x is (add more)
>
> # Some observations regarding a CE mark
>
> Packet loss is a weak signal of a variety of events.
>
> A CE mark is currently a strong signal you are in FQ_x - the odds
> are good, this will be the event that kicks the transport out of slow
> start. Now knowing you got a CE mark, gives you a chance to optimize,
> knowing that your queue length is not a fifo, but relative to "f". In
> BBR's case in particular, resetting the bandwidth and pacing rate to
> the lowest recently observed (in the last 100 ms) "RTT - a little" is
> better than the classic RFC3168 response of halving.
>
> One thing that bugs me about RTT based measurements is when the return
> path is inflated - in FQ_x it's a decent assumption that both sides of
> the path have FQ, so the ack return path is far less inflated, but in
> pie/dualpi/codel it certainly can be for a variety of reasons. This is
> why the rrul test exists. ack thinning does help also. the amount of
> potential jitter in the return path is enormous, and one benchmark
> I've not yet seen from anyone on that side.
>
> moving sideways:
>
> I happen to like (in terms of determinism) an even stronger signal
> than RFC3168, "loss and mark", where a combination of loss and marks
> is even more meaningful than either, and thus the sender should back
> off even harder (or, the receiver pretend it got CE in two different
> RTTs). when we have queue sizes elsewhere measured in seconds, and a
> colossal bufferbloat mess in general, anything that moves a link below
> capacity would be great. The deterministic "loss and mark" feature was
> in cake until a year or two back but I never got around much to
> mucking with a transport's interpretation of it.
>
> # The SCE concept in addition to that
>
> With or without SCE, just that much, just that normal CE signal, is
> enough to evolve a transport towards more sensitive
> delay based signaling. It could be added to cubic, for example...
>
> Anyway...
>
> We have two public implementations of SCE under test - the cake one
> uses a ramp, the fq_codel_fast one just uses
> a setpoint where we have a consistently measurable queue (1ms), and
> that setpoint is different
> for wifi (1-2 TXOPs)
>
> SCE (presently) kicks in almost immediately upon building a queue.
> Often, immediately! with IW10 at low bandwidths, (without initial
> spreading, pacing or chirping). There is also the bulkiness of
> draining the oft-large rx ring and the effects
> of NAPI interrupt mitigation to deal with - which is usually around 1ms.
>
> Thus it is an extremely strong signal both that there is a queue, and
> that fq_x is present. SCE requires support at the receiver - not the
> sender - in order to work at all. The receiver can decide what to do
> with it. My own first experimental preference was to kick tcp out of
> slow start on receipt of any SCE mark, but afterwards in congestion
> avoidance as a much more gradual signal, or even ignore it entirely.
> I'm grumpy enough about IW10 to still consider that, but as the
> current sch_fq code does indeed pace the next burst, perhaps ignoring
> SCE on the first few packets of a connection is useful to consider, also.
>
> There is plenty of work on all the congestion avoidance mode stuff
> (reusing nonce sum, accecn, etc), but the key point
> (for me) was signalling and thinking hard about the fact that fq_x was
> present and that f governed the behavior of the queues. Knowing this,
> growth and signalling patterns such as ELR, dctcp etc, can change.
>
> # Benefits of SCE
>
> * Plenty of stuff to write here that has been written elsewhere
>
> * Backward compatible
> * gradual upgrade
> * easy change to fq_x
> * SCE re-enables the possibility of low priority congestion control
> for background tcp flows
>
>
> --
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane
>
------=_20190615163223000000_48702--