From: Michael Welzl <michawe@ifi.uio.no>
To: Sebastian Moeller
Cc: Dave Taht, bloat <bloat@lists.bufferbloat.net>
Date: Mon, 11 Jul 2022 08:24:49 +0200
Subject: Re: [Bloat] [iccrg] Musings on the future of Internet Congestion Control

Hi Sebastian,

Neither our paper nor I am advocating one particular solution - we point at a problem and suggest that research on ways to solve the under-utilization problem might be worthwhile.

Jumping from this to discussing the pros and cons of a potential concrete solution is quite a leap… More below:

> On Jul 10, 2022, at 11:29 PM, Sebastian Moeller wrote:
>
> Hi Michael,
>
>> On Jul 10, 2022, at 22:01, Michael Welzl wrote:
>>
>> Hi!
>>
>>> On Jul 10, 2022, at 7:27 PM, Sebastian Moeller wrote:
>>>
>>> Hi Michael,
>>>
>>> so I reread your paper and stewed a bit on it.
>>
>> Many thanks for doing that! :)
>>
>>> I believe that I do not buy some of your premises.
>>
>> you say so, but I don't really see much disagreement here. Let's see:
>>
>>> e.g. you write:
>>>
>>> "We will now examine two factors that make the present situation particularly worrisome. First, the way the infrastructure has been evolving gives TCP an increasingly large operational space in which it does not see any feedback at all. Second, most TCP connections are extremely short. As a result, it is quite rare for a TCP connection to even see a single congestion notification during its lifetime."
>>> And seem to see a problem that flows might be able to finish their data transfer business while still in slow start. I see the same data, but see no problem. Unless we have an oracle that tells each sender (over a shared bottleneck) exactly how much to send at any given point in time, different control loops will interact on those intermediary nodes.
>>
>> You really say that you don't see the solution. The problem is that capacities are underutilized, which means that flows take longer (sometimes, much longer!) to finish than they theoretically could, if we had a better solution.
>
> [SM] No, IMHO the underutilization is the direct consequence of requiring a gradual filling of the "pipes" to probe the available capacity. I see no way this could be done differently with the traffic sources/sinks being uncoordinated entities at the edge, and I see no way of coordinating all end points and handling all paths. In other words, we can fine-tune parameters to tweak the probing a bit, make it more or less aggressive/fast, but the fact that we need to probe capacity somehow means underutilization cannot be avoided unless we find a way of coordinating all of the sinks and sources. But being sufficiently dumb, all I can come up with is an all-knowing oracle or faster-than-light communication, and neither strikes me as realistic ;)

There's quite a spectrum of possibilities between an oracle or "coordinating all of the sinks and sources" on one hand, and quite "blindly" probing from a constant IW on the other. The "fine tuning" that you mention is interesting research, IMO!

>>> I might be limited in my depth of thought here, but having each flow probe for capacity seems exactly the right approach... and doubling CWND or rate every RTT is pretty aggressive already (making slow start shorter by reaching capacity faster within the slow-start framework requires either starting with a higher initial value (what increasing IW tries to achieve?) or using a larger increase factor than 2 per RTT). I consider increased IW a milder approach than the alternative. And once one accepts that a gradual rate increase is the way forward, it falls out logically that some flows will finish before they reach steady-state capacity, especially if that flow's available capacity is large. So what exactly is the problem with short flows not reaching capacity, and what alternative exists that does not lead to carnage if more aggressive start-up phases drive the bottleneck load into emergency drop territory?
>>
>> There are various ways to do this [snip: a couple of concrete suggestions from me, and answers about what problems they might have, with requests for references from you]

I'm sorry, but I wasn't really going to have a discussion about these particular possibilities. My point was only that many possible directions exist - being completely "blind" isn't the only possible approach.
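Just to put a rough number on what "blind" start-up costs a short flow, here is a tiny back-of-the-envelope script (plain Python; the link rate, RTT, MSS and IW below are made-up example numbers, not measurements from our paper):

# Back-of-the-envelope: how long does an idealized slow-start flow take to
# deliver a given number of bytes, compared with a hypothetical sender that
# could use the full path capacity right away?  All numbers are assumptions.

MSS  = 1448          # payload bytes per segment (assumption)
IW   = 10            # initial window in segments
RTT  = 0.05          # 50 ms round-trip time (assumption)
RATE = 100e6 / 8     # 100 Mbit/s bottleneck, in bytes per second (assumption)

def slow_start_time(flow_bytes):
    """RTTs needed when cwnd doubles every RTT, capped by the path capacity."""
    cwnd = IW * MSS                  # bytes that can be sent in this RTT
    per_rtt_cap = RATE * RTT         # bytes the path can carry per RTT
    sent, rtts = 0.0, 0
    while sent < flow_bytes:
        sent += min(cwnd, per_rtt_cap)
        cwnd *= 2                    # slow start: double every RTT
        rtts += 1
    return rtts * RTT

def ideal_time(flow_bytes):
    """Lower bound: one RTT of start-up plus pure serialization at full rate."""
    return RTT + flow_bytes / RATE

for size in (50_000, 1_000_000, 10_000_000):     # 50 KB, 1 MB, 10 MB
    print(f"{size:>10} B: slow start {slow_start_time(size):.2f} s,"
          f" ideal {ideal_time(size):.2f} s")

This ignores queueing, losses and application behaviour entirely; the only point is that, for a short flow, almost all of the room for improvement lies in the start-up phase.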
Instead of answering your comments to my suggestions, let me give you one single concrete piece here: our reference 6, as one example of the kind of research that we consider worthwhile for the future:

"X. Nie, Y. Zhao, Z. Li, G. Chen, K. Sui, J. Zhang, Z. Ye, and D. Pei, "Dynamic TCP initial windows and congestion control schemes through reinforcement learning," IEEE JSAC, vol. 37, no. 6, 2019."
https://1989chenguo.github.io/Publications/TCP-RL-JSAC19.pdf

This work learns a useful value of IW over time, rather than using a constant. One author works at Baidu, the paper uses data from Baidu, and it says:

"TCP-RL has been deployed in one of the top global search engines for more than a year. Our online and testbed experiments show that for short flow transmission, compared with the common practice of IW = 10, TCP-RL can reduce the average transmission time by 23% to 29%."

- so it's probably fair to assume that this was (and perhaps still is) active in Baidu.
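Their system does this with reinforcement learning over per-user-group network state, so the snippet below is emphatically NOT their algorithm - it is just a few invented lines (Python again, made-up names and values) to convey the flavour of choosing IW from measured flow completion times instead of hard-coding it:

import random
from collections import defaultdict

# Toy illustration only: pick an IW per client group from a small candidate
# set, mostly using the one with the lowest mean observed completion time.
# TCP-RL itself uses reinforcement learning with a much richer state/reward.

CANDIDATE_IWS = [10, 20, 40, 80]   # segments; made-up candidate values
EPSILON = 0.1                      # probability of exploring a random IW

# group -> IW -> (number of flows, summed completion time)
stats = defaultdict(lambda: {iw: (0, 0.0) for iw in CANDIDATE_IWS})

def choose_iw(group):
    """Mostly exploit the best-known IW for this client group, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(CANDIDATE_IWS)
    def mean_fct(iw):
        count, total = stats[group][iw]
        return total / count if count else float("inf")
    return min(CANDIDATE_IWS, key=mean_fct)

def report_flow(group, iw, completion_time):
    """Feed back the completion time measured for a flow that used this IW."""
    count, total = stats[group][iw]
    stats[group][iw] = (count + 1, total + completion_time)

A real deployment would of course need safeguards (per-path caps, reacting to losses in the first RTT, and so on) - the point is merely that the "constant" in IW does not have to be a constant.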
>>> And as an aside, a PEP (performance enhancing proxy) that does not enhance performance is useless at best and likely harmful (rather a PDP, performance degrading proxy).
>>
>> You've made it sound worse by changing the term, for whatever that's worth. If they never help, why has anyone ever called them PEPs in the first place?
>
> [SM] I would guess because "marketing" was unhappy with "engineering" emphasizing the side-effects/potential problems and focused on the best-case scenario? ;)

It appears that you want to just ill-talk PEPs. There are plenty of useful things that they can do, and yes, I personally think they're the way of the future - but **not** in their current form, where they must "lie" to TCP, cause ossification, etc. PEPs have never been considered as part of the congestion control design - when they came on the scene, in the IETF, they were despised for breaking the architecture, and then all the trouble with how they need to play tricks was discovered (spoofing IP addresses, making assumptions about header fields, and whatnot). That doesn't mean that a very different kind of PEP - one which is authenticated and speaks an agreed-upon protocol - couldn't be a good solution.

You're bound to ask me for concrete things next, and if I give you something concrete (e.g., a paper on PEPs), you'll find something bad about it - but this is not a constructive direction for this conversation. Please note that I'm not saying "PEPs are always good": I only say that, in my personal opinion, they're a worthwhile direction of future research. That's a very different statement.

>> Why do people buy these boxes?
>
> [SM] Because e.g. for GEO links, latency is in a range where default, unadulterated TCP will likely choke on itself, and when faced with requiring customers to change/tune their TCPs or having a "PEP" fudge it, the ease of use of fudging won the day. That is a generous explanation (as this fudging is beneficial to both the operator and most end-users); I can come up with less charitable theories if you want ;) .
>
>>> The network so far has been doing reasonably well with putting more protocol smarts at the ends than in the parts in between.
>>
>> Truth is, PEPs are used a lot: at cellular edges, at satellite links… because the network is *not* always doing reasonably well without them.
>
> [SM] Fair enough, I accept that there are use cases for those, but again, only if they actually enhance the "experience" will users be happy to accept them.

… and that's the only reason to deploy them, given that (as the name suggests) they're meant to increase performance. I'd be happy to learn more about why you appear to hate them so much (even just anecdotal).

> The goals of the operators and the paying customers are not always aligned here; a PEP might be advantageous more to the operator than to the end-user (theoretically also the other direction, but since operators pay for PEPs they are unlikely to deploy those). Think mandatory image recompression or forced video quality downscaling... (and sure, these are not as clear-cut as I pitched them: if after an emergency a PEP allows most/all users in a cell to still send somewhat degraded images, that is better than the network choking itself on a few high-quality images, assuming images from the emergency are somewhat useful).

What is this - are you inventing a (to me, frankly, strange) scenario where PEPs do some evil for customers yet help operators, or is there an anecdote here?

>>> I have witnessed the arguments in the "L4S wars" about how little processing one can ask the more central network nodes to perform; e.g. flow queueing, which would solve a lot of the issues (e.g. a hyper-aggressive slow-start flow would mostly hurt itself if it overshoots its capacity), seems to be a complete no-go.
>>
>> That's to do with scalability, which depends on how close to the network's edge one is.
>
> [SM] I have heard the alternative explanation that it has to do with what operators of core links request from their vendors and what features they are willing to pay for... but this is very anecdotal, as I have little insight into big-iron vendors or core-link operators.
>
>>> I personally think what we should do is have the network supply more information to the end points to control their behavior better. E.g. if we would mandate a max_queue-fill-percentage field in a protocol header and have each node write max(current_value_of_the_field, queue-filling_percentage_of_the_current_node) into every packet, end points could estimate how close to congestion the path is (e.g. by looking at the rate of %queueing changes) and tailor their growth/shrinkage rates accordingly, both during slow start and during congestion avoidance.
>>
>> That could well be one way to go. Nice if we provoked you to think!
>
> [SM] You mostly made me realize what the recent increases in IW actually aim to accomplish ;)

That's fine! Increasing IW is surely a part of the solution space - though I advocate doing something else (as in the example above) than just increasing the constant in a worldwide standard.

> and that current slow start seems actually better than its reputation; it solves a hard problem surprisingly well.

Actually, given that the large majority of flows end somewhere in slow start, what makes you say that it solves it "well"?

> The max(path_queue%) idea has been kicking around in my head ever since reading a paper about storing queue occupancy in packets to help CC along (sorry, I do not recall the authors or the title right now), so it is not even my own original idea, but simply something I borrowed from smarter engineers because I found the data convincing and the theory sane. (Also because I grudgingly accept that latency increases measured over the internet are a tad too noisy to be easily useful* and too noisy for a meaningful controller based on the latency rate of change**)
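For what it's worth, here is roughly the kind of sender logic I could picture such a field enabling. It is only a sketch: the field name, the thresholds and the growth factors are all invented, and nothing like this is specified anywhere - it is just to make the mechanism concrete:

# Sketch only: a sender adapting its per-RTT window growth to a hypothetical
# "max queue fill" header field.  Every node on the path would write
# max(field, its own queue fill percentage) into the field, and the receiver
# would echo it back.  All names, thresholds and factors here are invented.

def growth_factor(fill_pct, prev_fill_pct):
    """Map the echoed path queue occupancy (0-100) and its trend to a
    multiplicative per-RTT congestion window change."""
    rising = fill_pct > prev_fill_pct
    if fill_pct < 20 and not rising:
        return 2.0    # path looks empty: keep classic slow-start doubling
    if fill_pct < 50:
        return 1.5    # a queue is forming somewhere: probe more gently
    if fill_pct < 80 and not rising:
        return 1.1    # substantial standing queue: barely grow
    return 0.9        # nearly full, or full and rising: back off before loss

class Sender:
    def __init__(self, iw_bytes=14480):
        self.cwnd = iw_bytes      # congestion window in bytes
        self.prev_fill = 0        # queue fill percentage seen one RTT ago

    def per_rtt_update(self, echoed_fill_pct):
        """Call once per RTT with the largest queue-fill value echoed back
        during that RTT."""
        self.cwnd *= growth_factor(echoed_fill_pct, self.prev_fill)
        self.prev_fill = echoed_fill_pct

Whether many flows reacting to the same signal like this would converge to something stable and fair is exactly the kind of question that needs research - but at least a sender would no longer be flying completely blind during start-up.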
>>> But alas we seem to go down the path of a relatively dumb 1-bit signal giving us an under-defined queue-filling state instead, and to estimate relative queue-filling dynamics from that we need many samples (so literally too little too late, or L3T2), but I digress.
>>
>> Yeah you do :-)
>
> [SM] Less than you let on ;). If L4S gets ratified [snip]

I'm really not interested in an L4S debate.

Cheers,
Michael