From: Sebastian Moeller
Date: Sun, 10 Jul 2022 23:29:39 +0200
To: Michael Welzl
Cc: Dave Täht, bloat
Subject: Re: [Bloat] [iccrg] Musings on the future of Internet Congestion Control

Hi Michael,

> On Jul 10, 2022, at 22:01, Michael Welzl wrote:
> 
> Hi !
> 
>> On Jul 10, 2022, at 7:27 PM, Sebastian Moeller wrote:
>> 
>> Hi Michael,
>> 
>> so I reread your paper and stewed a bit on it.
> 
> Many thanks for doing that! :)
> 
>> I believe that I do not buy some of your premises.
> 
> you say so, but I don't really see much disagreement here. Let's see:
> 
>> e.g. you write:
>> 
>> "We will now examine two factors that make the present situation particularly worrisome. First, the way the infrastructure has been evolving gives TCP an increasingly large operational space in which it does not see any feedback at all. Second, most TCP connections are extremely short. As a result, it is quite rare for a TCP connection to even see a single congestion notification during its lifetime."
>> 
>> And seem to see a problem in that flows might be able to finish their data transfer business while still in slow start. I see the same data, but see no problem. Unless we have an oracle that tells each sender (over a shared bottleneck) exactly how much to send at any given point in time, different control loops will interact on those intermediary nodes.
> 
> You really say that you don't see the solution. The problem is that capacities are underutilized, which means that flows take longer (sometimes, much longer!) to finish than they theoretically could, if we had a better solution.

[SM] No. IMHO the underutilization is the direct consequence of requiring a gradual filling of the "pipes" to probe the available capacity. I see no way this could be done differently with the traffic sources/sinks being uncoordinated entities at the edge, and I see no way of coordinating all end points and handling all paths. In other words, we can fine-tune parameters to tweak the probing a bit, make it more or less aggressive/fast, but the fact that we need to probe capacity somehow means underutilization cannot be avoided unless we find a way of coordinating all of the sinks and sources. But being sufficiently dumb, all I can come up with is an all-knowing oracle or faster-than-light communication, and neither strikes me as realistic ;)

>> I might be limited in my depth of thought here, but having each flow probe for capacity seems exactly the right approach... and doubling CWND or rate every RTT is pretty aggressive already (making slow start shorter by reaching capacity faster within the slow-start framework requires either starting with a higher initial value (which is what increasing IW tries to achieve?) or using a larger increase factor than 2 per RTT). I consider increased IW a milder approach than the alternative. And once one accepts that a gradual rate increase is the way forward, it falls out logically that some flows will finish before they reach steady-state capacity, especially if a flow's available capacity is large. So what exactly is the problem with short flows not reaching capacity, and what alternative exists that does not lead to carnage if more aggressive start-up phases drive the bottleneck load into emergency-drop territory?
> 
> There are various ways to do this; one is to cache information and re-use it, assuming that - at least sometimes - new flows will see the same path again.

[SM] And, equally important, that a flow's capacity share along that path did not change because other flows appeared on the same path. This is a case of speculation which, depending on link and path type, will work out well more or less often; the question then becomes whether the improvement from successful speculation is worth the cost of unsuccessful speculation (mostly the case where the estimate is wildly above the path capacity). Personally I think that having each flow start searching for achievable capacity from the "bottom" seems more robust and reliable. I would agree though that better managing the typical overshoot of slow start is a worthy goal (one which, if tackled successfully, might allow a faster capacity search).
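Coming back to the doubling/IW point above, here is a quick back-of-the-envelope sketch (Python; the link rate, RTT, MSS and IW numbers are illustrative assumptions of mine, not anything taken from your paper) of how many RTTs classic doubling needs before CWND covers the path's BDP, and how much data has left the sender by then:

# Back-of-the-envelope: how many RTTs does classic doubling need before the
# congestion window covers the path's BDP, and how much data has been sent
# by then?  Link rate, RTT and IW below are illustrative, not measurements.
import math

def slow_start_rounds(rate_bps, rtt_s, iw_segments, mss=1448):
    bdp_segments = rate_bps * rtt_s / 8 / mss               # path BDP in segments
    rounds = max(0, math.ceil(math.log2(bdp_segments / iw_segments)))
    sent_segments = iw_segments * (2 ** (rounds + 1) - 1)   # cumulative segments sent
    return bdp_segments, rounds, sent_segments

# Example: 100 Mbit/s bottleneck, 50 ms RTT, IW = 4 vs. IW = 10
for iw in (4, 10):
    bdp, rounds, sent = slow_start_rounds(100e6, 0.05, iw)
    print(f"IW={iw:2d}: BDP ~{bdp:4.0f} segments, ~{rounds} RTTs to reach it, "
          f"~{sent * 1448 / 1e3:.0f} kB sent by then")

Any transfer shorter than the printed byte count finishes inside slow start on such a path; raising IW merely shaves a doubling or two off the front of that, which is exactly why I read it as the mild tweak rather than a different approach.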
> Another is to let parallel flows share information.

[SM] Sounds sweet, but since not even two back-to-back packets sent over the internet from A to B are guaranteed to take exactly the same path, confirming that flows actually share a sufficiently similar path seems tricky. Also, stipulating that two flows actually do share a common path over, say, a capacity-limiting node: assume 99 other flows plus our parallel, already-established flow in equilibrium; our new flow could then probably start with a CWND close to the established flow's. But if the bottleneck is fully occupied with our parallel established flow, the newcomer's limit would be 50% of the existing flow's rate, and only if that flow actually has enough time to give way... Could you elaborate how that could work, please?

> Yet another is to just be blindly more aggressive.

[SM] Sure, that works if the "cost" of admitting too much data is acceptable. Alas, from an end user's perspective, I have flows where I do not care much if they overcommit and then throttle themselves (think background bulk transfers), but I would get unhappy if their over-aggression interfered with other flows that are more important to me (that is part of why I am a happy flow-queueing user; FQ helps a lot in confining the fall-out from overly aggressive flows mainly to those flows themselves).

> Yet another, chirping.

[SM] I would love for that to work, but I have seen no convincing data yet demonstrating it over the existing internet. We know already from other papers that inter-packet delay is a somewhat unreliable estimator for capacity, so using it, even in clever ways, requires some accumulation and smoothing, and I wonder how much faster/better this actually is compared to existing slow start with a sufficiently high starting IW. As a meta-criticism, I am somewhat surprised how little splash paced chirping seems to be making, given how positively its inventors presented it; that might be the typical inertia of the field, or an indication that PC might not yet be ready for show-time. However, if you are thinking of something other than paced chirping here, could you share a reference, please?
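To make my distrust of inter-packet delay concrete: the classic packet-pair estimate is just segment size over arrival spacing, and the little sketch below (Python; the gap and jitter values are invented purely for illustration, not measurements) shows how a few hundred microseconds of receiver-side jitter swing the estimate by tens of Mbit/s, which is why anything built on it needs accumulation and smoothing.

# Packet-pair capacity estimation: two back-to-back segments get spread out
# by the bottleneck, so  estimated_rate = segment_size / inter-arrival_gap.
# The gap and jitter values below are invented purely for illustration.
MSS = 1448  # bytes

def packet_pair_estimate(gap_s, mss=MSS):
    return mss * 8 / gap_s          # bits per second

true_gap = 116e-6                   # ~116 us spacing ~= a 100 Mbit/s bottleneck
for jitter in (0.0, 50e-6, 100e-6, 300e-6, -50e-6):
    est = packet_pair_estimate(true_gap + jitter)
    print(f"jitter {jitter * 1e6:+4.0f} us -> estimate {est / 1e6:6.1f} Mbit/s")

Chirps improve on a single pair by sending whole trains at varying gaps, but each individual gap still carries this kind of noise.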
>> And as an aside, a PEP (performance enhancing proxy) that does not enhance performance is useless at best and likely harmful (rather a PDP, performance degrading proxy).
> 
> You've made it sound worse by changing the term, for whatever that's worth. If they never help, why has anyone ever called them PEPs in the first place?

[SM] I would guess because "marketing" was unhappy with "engineering" emphasizing the side-effects/potential problems, and focused on the best-case scenario? ;)

> Why do people buy these boxes?

[SM] Because, e.g. for GEO links, latency is in a range where default, unadulterated TCP will likely choke on itself, and when faced with either requiring customers to change/tune their TCPs or having a "PEP" fudge it, the ease of fudging won the day. That is the generous explanation (as this fudging is beneficial to both the operator and most end-users); I can come up with less charitable theories if you want ;).

>> The network so far has been doing reasonably well with putting more protocol smarts at the ends than in the parts in between.
> 
> Truth is, PEPs are used a lot: at cellular edges, at satellite links… because the network is *not* always doing reasonably well without them.

[SM] Fair enough, I accept that there are use cases for those, but again, only if they actually enhance the "experience" will users be happy to accept them. The goals of the operators and of the paying customers are not always aligned here; a PEP might be advantageous more to the operator than to the end-user (theoretically also the other way around, but since operators pay for PEPs they are unlikely to deploy those). Think mandatory image recompression or forced video quality downscaling... (and sure, these cases are not as clear-cut as I pitched them: if after an emergency a PEP allows most/all users in a cell to still send somewhat degraded images, that is better than the network choking itself on a few high-quality images, assuming images from the emergency are somewhat useful).

>> I have witnessed the arguments in the "L4S wars" about how little processing one can ask the more central network nodes to perform; e.g. flow queueing, which would solve a lot of the issues (a hyper-aggressive slow-start flow would mostly hurt itself if it overshoots its capacity), seems to be a complete no-go.
> 
> That's to do with scalability, which depends on how close to the network's edge one is.

[SM] I have heard the alternative explanation that it has to do with what operators of core links request from their vendors and what features they are willing to pay for... but this is very anecdotal, as I have little insight into big-iron vendors or core-link operators.

>> I personally think what we should do is have the network supply more information to the end points so they can control their behavior better. E.g. if we would mandate a max_queue-fill-percentage field in a protocol header and have each node write max(current_value_of_the_field, queue-filling_percentage_of_the_current_node) into every packet, end points could estimate how close to congestion the path is (e.g. by looking at the rate of change of that %queueing value) and tailor their growth/shrinkage rates accordingly, both during slow start and during congestion avoidance.
> 
> That could well be one way to go. Nice if we provoked you to think!

[SM] You mostly made me realize what the recent increases in IW actually aim to accomplish ;) and that current slow start is actually better than its reputation; it solves a hard problem surprisingly well. The max(path_queue%) idea has been kicking around in my head ever since reading a paper about storing queue occupancy into packets to help CC along (sorry, I do not recall the authors or the title right now), so it is not even my own original idea, but simply something I borrowed from smarter engineers because I found the data convincing and the theory sane. (Also because I grudgingly accept that latency increases measured over the internet are a tad too noisy to be easily useful* and too noisy for a meaningful controller based on the latency rate of change**.)
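For concreteness, the per-hop rule I have in mind is no more than the sketch below (Python; the field name, the thresholds and the CWND reactions are invented placeholders of mine, not a worked-out proposal): every node stamps the worst queue fill it has seen into the packet, the receiver echoes it back, and the sender steers its probing from the level and trend of that signal.

# Sketch of the max_queue-fill-percentage idea: every node stamps the worst
# queue fill seen so far into the packet; the receiver feeds the value back
# and the sender steers its probing from the level and trend of that signal.
# Field name, thresholds and reactions are illustrative placeholders only.

def forward_at_node(pkt, queue_len, queue_limit):
    fill_pct = 100 * queue_len // queue_limit
    pkt["max_queue_fill_pct"] = max(pkt.get("max_queue_fill_pct", 0), fill_pct)
    return pkt

def sender_reaction(prev_pct, curr_pct, cwnd):
    # Grow boldly while the path reports near-empty queues, ease off as the
    # reported fill (or its per-RTT rate of change) climbs.
    slope = curr_pct - prev_pct
    if curr_pct < 25 and slope <= 0:
        return cwnd * 2                              # slow-start-like doubling
    elif slope > 0:
        return cwnd * (1 + (100 - curr_pct) / 200)   # damped growth
    else:
        return cwnd                                  # hold

# One packet crossing three hypothetical hops with different queue states:
pkt = {}
for qlen, qlim in ((5, 1000), (380, 500), (20, 600)):
    pkt = forward_at_node(pkt, qlen, qlim)
print(pkt)                                 # {'max_queue_fill_pct': 76}
print(sender_reaction(40, 76, cwnd=20.0))  # fill rising towards 76% -> 22.4

The point is not these particular constants, but that a multi-bit "worst queue on the path" signal lets a sender see congestion building before loss or even a CE mark, during slow start as well as during congestion avoidance.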
>> But alas we seem to go down the path of a relatively dumb 1-bit signal giving us an under-defined queue-filling state instead, and to estimate relative queue-filling dynamics from that we need many samples (so literally too little too late, or L3T2), but I digress.
> 
> Yeah you do :-)

[SM] Less than you let on ;). If L4S gets ratified (increasingly likely, mostly for political*** reasons) it gets considerably harder to get yet another queue-size-related bit into the IP header...

Regards
	Sebastian

*) I participate in discussions about using active latency measurements to adapt traffic shapers for variable-rate links, which exposes quite a number of latency- and throughput-related issues, albeit at an amateur's level of understanding on my part: https://github.com/lynxthecat/CAKE-autorate

**) I naively think that to make slow start exit gracefully we need a quick and reliable measure of pre-congestion, and latency increases are so noisy that neither quick nor reliable can be achieved, let alone both at the same time.

***) Well aware that "political" is a "problematic" word in view of the IETF, but L4S certainly will not be ratified on its merits, because these have not (yet?) been conclusively demonstrated; I am not claiming the merits cannot be realized, just that currently there is not sufficient hard data to make a reasonable prediction.

> 
> Cheers,
> Michael