From: Sebastian Moeller
Date: Sun, 10 Jul 2022 23:29:39 +0200
To: Michael Welzl
Cc: Dave Täht, bloat
Subject: Re: [Bloat] [iccrg] Musings on the future of Internet Congestion Control

Hi Michael,

> On Jul 10, 2022, at 22:01, Michael Welzl wrote:
> 
> Hi !
> 
>> On Jul 10, 2022, at 7:27 PM, Sebastian Moeller wrote:
>> 
>> Hi Michael,
>> 
>> so I reread your paper and stewed a bit on it.
> 
> Many thanks for doing that! :)
> 
>> I believe that I do not buy some of your premises.
> 
> you say so, but I don't really see much disagreement here. Let's see:
> 
>> e.g. you write:
>> 
>> "We will now examine two factors that make the present situation particularly worrisome. First, the way the infrastructure has been evolving gives TCP an increasingly large operational space in which it does not see any feedback at all. Second, most TCP connections are extremely short. As a result, it is quite rare for a TCP connection to even see a single congestion notification during its lifetime."
>> 
>> And seem to see a problem in that flows might be able to finish their data transfer business while still in slow start. I see the same data, but see no problem. Unless we have an oracle that tells each sender (over a shared bottleneck) exactly how much to send at any given point in time, different control loops will interact on those intermediary nodes.
> 
> You really say that you don't see the solution. The problem is that capacities are underutilized, which means that flows take longer (sometimes, much longer!) to finish than they theoretically could, if we had a better solution.

[SM] No. IMHO the underutilization is the direct consequence of requiring a gradual filling of the "pipes" to probe the available capacity. I see no way this could be done differently with the traffic sources/sinks being uncoordinated entities at the edge, and I see no way of coordinating all end points and handling all paths. In other words, we can fine-tune parameters to tweak the probing a bit, make it more or less aggressive/fast, but the fact that we need to probe capacity somehow means underutilization cannot be avoided unless we find a way of coordinating all of the sinks and sources. But being sufficiently dumb, all I can come up with is an all-knowing oracle or faster-than-light communication, and neither strikes me as realistic ;)

>> I might be limited in my depth of thought here, but having each flow probe for capacity seems exactly the right approach... and doubling CWND or rate every RTT is pretty aggressive already (making slow start shorter by reaching capacity faster within the slow-start framework requires either starting with a higher initial value (which is what increasing IW tries to achieve?) or using a larger increase factor than 2 per RTT). I consider increased IW a milder approach than the alternative. And once one accepts that a gradual rate increase is the way forward, it falls out logically that some flows will finish before they reach steady-state capacity, especially if a flow's available capacity is large. So what exactly is the problem with short flows not reaching capacity, and what alternative exists that does not lead to carnage if more aggressive start-up phases drive the bottleneck load into emergency-drop territory?
> 
> There are various ways to do this; one is to cache information and re-use it, assuming that - at least sometimes - new flows will see the same path again.

[SM] And, equally important, that a flow's capacity share along that path did not change because other flows appeared on the same path. This is a case of speculation which, depending on link and path type, will work out well more or less often; the question then becomes whether the improvement from successful speculation is worth the cost of unsuccessful speculation (mostly the case where the estimate is wildly above the path capacity). Personally I think that having each flow start searching for achievable capacity from the "bottom" seems more robust and reliable. I would agree though that better managing the typical overshoot of slow start is a worthy goal (one which, if tackled successfully, might allow a faster capacity search).
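Coming back to the doubling/IW point above, here is a quick back-of-the-envelope sketch (Python; the link rate, RTT, MSS and IW numbers are illustrative assumptions of mine, not anything taken from your paper) of how many RTTs classic doubling needs before CWND covers the path's BDP, and how much data has left the sender by then:

# Back-of-the-envelope: how many RTTs does classic doubling need before the
# congestion window covers the path's BDP, and how much data has been sent
# by then?  Link rate, RTT and IW below are illustrative, not measurements.
import math

def slow_start_rounds(rate_bps, rtt_s, iw_segments, mss=1448):
    bdp_segments = rate_bps * rtt_s / 8 / mss               # path BDP in segments
    rounds = max(0, math.ceil(math.log2(bdp_segments / iw_segments)))
    sent_segments = iw_segments * (2 ** (rounds + 1) - 1)   # cumulative segments sent
    return bdp_segments, rounds, sent_segments

# Example: 100 Mbit/s bottleneck, 50 ms RTT, IW = 4 vs. IW = 10
for iw in (4, 10):
    bdp, rounds, sent = slow_start_rounds(100e6, 0.05, iw)
    print(f"IW={iw:2d}: BDP ~{bdp:4.0f} segments, ~{rounds} RTTs to reach it, "
          f"~{sent * 1448 / 1e3:.0f} kB sent by then")

Any transfer shorter than the printed byte count finishes inside slow start on such a path; raising IW merely shaves a doubling or two off the front of that, which is exactly why I read it as the mild tweak rather than a different approach.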
> Another is to let parallel flows share information.

[SM] Sounds sweet, but since not even two back-to-back packets sent over the internet from A to B are guaranteed to take exactly the same path, confirming that flows actually share a sufficiently similar path seems tricky. Also, stipulating that two flows actually do share a common path over, say, a capacity-limiting node: assume 99 other flows plus our parallel, already-established flow in equilibrium; our new flow could then probably start with a CWND close to the established flow's. But if the bottleneck is fully occupied with our parallel established flow, the newcomer's limit would be 50% of the existing flow's rate, and only if that flow actually has enough time to give way... Could you elaborate how that could work, please?

> Yet another is to just be blindly more aggressive.

[SM] Sure, that works if the "cost" of admitting too much data is acceptable. Alas, from an end user's perspective, I have flows where I do not care much if they overcommit and then throttle themselves (think background bulk transfers), but I would get unhappy if their over-aggression interfered with other flows that are more important to me (that is part of why I am a happy flow-queueing user; FQ helps a lot in confining the fall-out from overly aggressive flows mainly to those flows themselves).

> Yet another, chirping.

[SM] I would love for that to work, but I have seen no convincing data yet demonstrating it over the existing internet. We know already from other papers that inter-packet delay is a somewhat unreliable estimator for capacity, so using it, even in clever ways, requires some accumulation and smoothing, and I wonder how much faster/better this actually is compared to existing slow start with a sufficiently high starting IW. As a meta-criticism, I am somewhat surprised how little splash paced chirping seems to be making, given how positively its inventors presented it; that might be the typical inertia of the field, or an indication that PC might not yet be ready for show-time. However, if you are thinking of something other than paced chirping here, could you share a reference, please?
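To make my distrust of inter-packet delay concrete: the classic packet-pair estimate is just segment size over arrival spacing, and the little sketch below (Python; the gap and jitter values are invented purely for illustration, not measurements) shows how a few hundred microseconds of receiver-side jitter swing the estimate by tens of Mbit/s, which is why anything built on it needs accumulation and smoothing.

# Packet-pair capacity estimation: two back-to-back segments get spread out
# by the bottleneck, so  estimated_rate = segment_size / inter-arrival_gap.
# The gap and jitter values below are invented purely for illustration.
MSS = 1448  # bytes

def packet_pair_estimate(gap_s, mss=MSS):
    return mss * 8 / gap_s          # bits per second

true_gap = 116e-6                   # ~116 us spacing ~= a 100 Mbit/s bottleneck
for jitter in (0.0, 50e-6, 100e-6, 300e-6, -50e-6):
    est = packet_pair_estimate(true_gap + jitter)
    print(f"jitter {jitter * 1e6:+4.0f} us -> estimate {est / 1e6:6.1f} Mbit/s")

Chirps improve on a single pair by sending whole trains at varying gaps, but each individual gap still carries this kind of noise.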
>> And as an aside, a PEP (performance enhancing proxy) that does not enhance performance is useless at best and likely harmful (rather a PDP, performance degrading proxy).
> 
> You've made it sound worse by changing the term, for whatever that's worth. If they never help, why has anyone ever called them PEPs in the first place?

[SM] I would guess because "marketing" was unhappy with "engineering" emphasizing the side-effects/potential problems, and focused on the best-case scenario? ;)

> Why do people buy these boxes?

[SM] Because, e.g. for GEO links, latency is in a range where default, unadulterated TCP will likely choke on itself, and when faced with either requiring customers to change/tune their TCPs or having a "PEP" fudge it, the ease of fudging won the day. That is the generous explanation (as this fudging is beneficial to both the operator and most end-users); I can come up with less charitable theories if you want ;).

>> The network so far has been doing reasonably well with putting more protocol smarts at the ends than in the parts in between.
> 
> Truth is, PEPs are used a lot: at cellular edges, at satellite links… because the network is *not* always doing reasonably well without them.

[SM] Fair enough, I accept that there are use cases for those, but again, only if they actually enhance the "experience" will users be happy to accept them. The goals of the operators and of the paying customers are not always aligned here; a PEP might be advantageous more to the operator than to the end-user (theoretically also the other way around, but since operators pay for PEPs they are unlikely to deploy those). Think mandatory image recompression or forced video quality downscaling... (and sure, these cases are not as clear-cut as I pitched them: if after an emergency a PEP allows most/all users in a cell to still send somewhat degraded images, that is better than the network choking itself on a few high-quality images, assuming images from the emergency are somewhat useful).

>> I have witnessed the arguments in the "L4S wars" about how little processing one can ask the more central network nodes to perform; e.g. flow queueing, which would solve a lot of the issues (a hyper-aggressive slow-start flow would mostly hurt itself if it overshoots its capacity), seems to be a complete no-go.
> 
> That's to do with scalability, which depends on how close to the network's edge one is.

[SM] I have heard the alternative explanation that it has to do with what operators of core links request from their vendors and what features they are willing to pay for... but this is very anecdotal, as I have little insight into big-iron vendors or core-link operators.

>> I personally think what we should do is have the network supply more information to the end points so they can control their behavior better. E.g. if we would mandate a max_queue-fill-percentage field in a protocol header and have each node write max(current_value_of_the_field, queue-filling_percentage_of_the_current_node) into every packet, end points could estimate how close to congestion the path is (e.g. by looking at the rate of change of that %queueing value) and tailor their growth/shrinkage rates accordingly, both during slow start and during congestion avoidance.
> 
> That could well be one way to go. Nice if we provoked you to think!

[SM] You mostly made me realize what the recent increases in IW actually aim to accomplish ;) and that current slow start is actually better than its reputation; it solves a hard problem surprisingly well. The max(path_queue%) idea has been kicking around in my head ever since reading a paper about storing queue occupancy into packets to help CC along (sorry, I do not recall the authors or the title right now), so it is not even my own original idea, but simply something I borrowed from smarter engineers because I found the data convincing and the theory sane. (Also because I grudgingly accept that latency increases measured over the internet are a tad too noisy to be easily useful* and too noisy for a meaningful controller based on the latency rate of change**.)
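For concreteness, the per-hop rule I have in mind is no more than the sketch below (Python; the field name, the thresholds and the CWND reactions are invented placeholders of mine, not a worked-out proposal): every node stamps the worst queue fill it has seen into the packet, the receiver echoes it back, and the sender steers its probing from the level and trend of that signal.

# Sketch of the max_queue-fill-percentage idea: every node stamps the worst
# queue fill seen so far into the packet; the receiver feeds the value back
# and the sender steers its probing from the level and trend of that signal.
# Field name, thresholds and reactions are illustrative placeholders only.

def forward_at_node(pkt, queue_len, queue_limit):
    fill_pct = 100 * queue_len // queue_limit
    pkt["max_queue_fill_pct"] = max(pkt.get("max_queue_fill_pct", 0), fill_pct)
    return pkt

def sender_reaction(prev_pct, curr_pct, cwnd):
    # Grow boldly while the path reports near-empty queues, ease off as the
    # reported fill (or its per-RTT rate of change) climbs.
    slope = curr_pct - prev_pct
    if curr_pct < 25 and slope <= 0:
        return cwnd * 2                              # slow-start-like doubling
    elif slope > 0:
        return cwnd * (1 + (100 - curr_pct) / 200)   # damped growth
    else:
        return cwnd                                  # hold

# One packet crossing three hypothetical hops with different queue states:
pkt = {}
for qlen, qlim in ((5, 1000), (380, 500), (20, 600)):
    pkt = forward_at_node(pkt, qlen, qlim)
print(pkt)                                 # {'max_queue_fill_pct': 76}
print(sender_reaction(40, 76, cwnd=20.0))  # fill rising towards 76% -> 22.4

The point is not these particular constants, but that a multi-bit "worst queue on the path" signal lets a sender see congestion building before loss or even a CE mark, during slow start as well as during congestion avoidance.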
>> But alas we seem to go down the path of a relatively dumb 1-bit signal giving us an under-defined queue-filling state instead, and to estimate relative queue-filling dynamics from that we need many samples (so literally too little too late, or L3T2), but I digress.
> 
> Yeah you do :-)

[SM] Less than you let on ;). If L4S gets ratified (increasingly likely, mostly for political*** reasons) it gets considerably harder to get yet another queue-size-related bit into the IP header...

Regards
	Sebastian

*) I participate in discussions about using active latency measurements to adapt traffic shapers for variable-rate links, which exposes quite a number of latency- and throughput-related issues, albeit at an amateur's level of understanding on my part: https://github.com/lynxthecat/CAKE-autorate

**) I naively think that to make slow start exit gracefully we need a quick and reliable measure of pre-congestion, and latency increases are so noisy that neither quick nor reliable can be achieved, let alone both at the same time.

***) Well aware that "political" is a "problematic" word in view of the IETF, but L4S certainly will not be ratified on its merits, because these have not (yet?) been conclusively demonstrated; I am not claiming the merits cannot be realized, just that currently there is not sufficient hard data to make a reasonable prediction.

> 
> Cheers,
> Michael