From: Michael Welzl <michawe@ifi.uio.no>
Date: Mon, 11 Jul 2022 10:49:25 +0200
Subject: Re: [Bloat] [iccrg] Musings on the future of Internet Congestion Control
To: Sebastian Moeller <moeller0@gmx.de>
Cc: Dave Taht, bloat <bloat@lists.bufferbloat.net>

Hi! A few answers below -

> On Jul 11, 2022, at 9:33 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
>
> Hi Michael,
>
>> On Jul 11, 2022, at 08:24, Michael Welzl <michawe@ifi.uio.no> wrote:
>>
>> Hi Sebastian,
>>
>> Neither our paper nor I advocate one particular solution - we point at a problem and suggest that research on ways to solve the under-utilization problem might be worthwhile.
>
> [SM2] That is easy to agree upon, as is agreeing on improving slow start and trying to reduce underutilization, but actually doing it is hard; personally I am more interested in the hard part, so I might have misunderstood the gist of the discussion you want to start with that publication.

What you’re doing is jumping ahead. I suggest doing this with research rather than an email discussion, but that’s what we’re now already into.
>> Jumping from this to discussing the pros and cons of a potential concrete solution is quite a leap…
>>
>> More below:
>>
>>> On Jul 10, 2022, at 11:29 PM, Sebastian Moeller <moeller0@gmx.de> wrote:
>>>
>>> Hi Michael,
>>>
>>>> On Jul 10, 2022, at 22:01, Michael Welzl <michawe@ifi.uio.no> wrote:
>>>>
>>>> Hi!
>>>>
>>>>> On Jul 10, 2022, at 7:27 PM, Sebastian Moeller <moeller0@gmx.de> wrote:
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> so I reread your paper and stewed a bit on it.
>>>>
>>>> Many thanks for doing that! :)
>>>>
>>>>> I believe that I do not buy some of your premises.
>>>>
>>>> you say so, but I don’t really see much disagreement here. Let’s see:
>>>>
>>>>> e.g. you write:
>>>>>
>>>>> "We will now examine two factors that make the present situation particularly worrisome. First, the way the infrastructure has been evolving gives TCP an increasingly large operational space in which it does not see any feedback at all. Second, most TCP connections are extremely short. As a result, it is quite rare for a TCP connection to even see a single congestion notification during its lifetime."
>>>>>
>>>>> And you seem to see a problem in flows being able to finish their data transfer business while still in slow start. I see the same data, but see no problem. Unless we have an oracle that tells each sender (over a shared bottleneck) exactly how much to send at any given point in time, different control loops will interact on those intermediary nodes.
>>>>
>>>> You really say that you don’t see the problem. The problem is that capacities are underutilized, which means that flows take longer (sometimes, much longer!) to finish than they theoretically could, if we had a better solution.
>>>
>>> [SM] No, IMHO the underutilization is the direct consequence of requiring a gradual filling of the "pipes" to probe the available capacity. I see no way this could be done differently with the traffic sources/sinks being uncoordinated entities at the edge, and I see no way of coordinating all end points and handling all paths. In other words, we can fine-tune parameters to tweak the probing a bit, make it more or less aggressive/fast, but the fact that we need to probe capacity somehow means underutilization cannot be avoided unless we find a way of coordinating all of the sinks and sources. But being sufficiently dumb, all I can come up with is an all-knowing oracle or faster-than-light communication, and neither strikes me as realistic ;)
>>
>> There’s quite a spectrum of possibilities between an oracle or “coordinating all of the sinks and sources” on one hand, and quite “blindly” probing from a constant IW on the other.
>
> [SM] You say "blindly", I say "starting from a conservative but reliable prior"... And what I see is that qualitatively significantly better approaches are not really possible, so we need to discuss small quantitative changes.

More about the term “blind” below:

>> The “fine tuning” that you mention is interesting research, IMO!
>
> [SM] To me, the paper did not read as though you were soliciting ideas for small gradual improvements.

It calls for being drastic in the way we think about things, because it makes the argument that PEPs (different kinds of them!) might in fact be the right approach - but it doesn’t say that “only drastic solutions are good solutions”. Our “The Way Forward” section has 3 subsections; one of them is on end-to-end approaches, where we call out the RL-IW approach I mention below as one good way ahead. I would categorize this as “small and gradual”.
>>>>> I might be limited in my depth of thought here, but having each flow probe for capacity seems exactly the right approach... and doubling CWND or rate every RTT is pretty aggressive already (making slow start shorter by reaching capacity faster within the slow-start framework requires either starting with a higher initial value (what increasing IW tries to achieve?) or using a larger increase factor than 2 per RTT). I consider increased IW the milder approach of the two. And once one accepts that gradually increasing the rate is the way forward, it falls out logically that some flows will finish before they reach steady-state capacity, especially if a flow's available capacity is large. So what exactly is the problem with short flows not reaching capacity, and what alternative exists that does not lead to carnage if more-aggressive start-up phases drive the bottleneck load into emergency drop territory?
>>>>
>>>> There are various ways to do this
>>
>> [snip: a couple of concrete suggestions from me, and answers about what problems they might have, with requests for references from you]
>>
>> I’m sorry, but I wasn’t really going to have a discussion about these particular possibilities. My point was only that many possible directions exist - being completely “blind” isn’t the only possible approach.
>
> [SM] Again, I do not consider "blind" to be an appropriate qualification here.

IW is a global constant (not truly configured the same everywhere, most probably for good reason! but the standard suggests a globally unique value). From then on, the cwnd is doubled a couple of times. No feedback about the path’s capacity exists - and then, the connection is over.

Okay, there is ONE thing that such a flow gets: the RTT. “Blind except for RTT measurements”, then.

Importantly, such a flow never learns how large its cwnd *could* have become without ever causing a problem. Perhaps 10 times more? 100 times?
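To put rough numbers on this - just a back-of-the-envelope sketch in Python, assuming the standard IW of 10, per-RTT doubling, and no losses, pacing quirks or delayed ACKs:

def rtts_until_capacity(bottleneck_bps, rtt_s, mss=1448, iw=10):
    # packets that must be in flight to fill the pipe (the BDP)
    bdp_pkts = bottleneck_bps * rtt_s / (8 * mss)
    cwnd, rtts, sent_pkts = iw, 0, 0
    while cwnd < bdp_pkts:       # still probing "blindly"
        sent_pkts += cwnd
        cwnd *= 2
        rtts += 1
    return rtts, sent_pkts * mss

# a 100 Mbit/s bottleneck at 50 ms RTT:
print(rtts_until_capacity(100e6, 0.05))
# -> (6, 912240): six RTTs (300 ms) pass before the flow could even
#    fill the pipe, and any flow shorter than ~0.9 MB ends before that.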
>> Instead of answering your comments to my suggestions, let me give you one single concrete piece here: our reference 6, as one example of the kind of research that we consider worthwhile for the future:
>>
>> X. Nie, Y. Zhao, Z. Li, G. Chen, K. Sui, J. Zhang, Z. Ye, and D. Pei, “Dynamic TCP initial windows and congestion control schemes through reinforcement learning,” IEEE JSAC, vol. 37, no. 6, 2019.
>> https://1989chenguo.github.io/Publications/TCP-RL-JSAC19.pdf
>
> [SM] From the title I predict that this is going to lean into the "cache" idea, trying to improve the average hit rate of said cache...
>
>> This work learns a useful value of IW over time, rather than using a constant. One author works at Baidu, the paper uses data from Baidu, and it says:
>>
>> “TCP-RL has been deployed in one of the top global search engines for more than a year. Our online and testbed experiments show that for short flow transmission, compared with the common practice of IW = 10, TCP-RL can reduce the average transmission time by 23% to 29%.”
>>
>> - so it’s probably fair to assume that this was (and perhaps still is) active in Baidu.
>
> [SM] This seems to confirm my prediction... however, the paper seems to be written pretty much exclusively from the view of an operator of server farms; I am not sure this approach will actually do any good for leaf end-points in e.g. home networks (that is, for their sending behavior). I tend to prefer symmetric solutions, but if data center traffic can reach higher utilization without compromising end-user quality of experience and fairness, what is not to like about it? It is, however, fully within the existing slow-start framework, no?

Yes!
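The flavor of such a scheme, stripped to a toy (the paper itself uses reinforcement learning over per-user-group features; the names, bounds and update rule below are all invented for illustration):

from collections import defaultdict

IW_MIN, IW_MAX = 10, 100                 # illustrative bounds
iw_table = defaultdict(lambda: IW_MIN)   # e.g. keyed by client /24 prefix

def initial_window(prefix):
    return iw_table[prefix]

def flow_finished(prefix, loss_in_first_rtts):
    # crude AIMD on the prior: grow a group's IW while its flows
    # start cleanly, halve it when a first-RTT loss is observed
    if loss_in_first_rtts:
        iw_table[prefix] = max(IW_MIN, iw_table[prefix] // 2)
    else:
        iw_table[prefix] = min(IW_MAX, iw_table[prefix] + 2)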
>>>>> And as an aside, a PEP (performance enhancing proxy) that does not enhance performance is useless at best and likely harmful (rather a PDP, performance degrading proxy).
>>>>
>>>> You’ve made it sound worse by changing the term, for whatever that’s worth. If they never help, why has anyone ever called them PEPs in the first place?
>>>
>>> [SM] I would guess because "marketing" was unhappy with "engineering" emphasizing the side-effects/potential problems, and focused on the best-case scenario instead? ;)
>>
>> It appears that you just want to bad-mouth PEPs.
>
> [SM] Not really; I just wanted to point out that I expect the term PEP to come from the entities selling those products, and in our current environment it is clear that products are named and promoted by emphasizing the potential benefit they can bring, not the additional risks they might carry (e.g. fission power plants were sold on the idea of essentially unlimited cheap emission-free energy, and not on the concurrent problem of waste disposal over time frames on the order of human civilisation since the bronze age). I have no beef with that, but I do not think the "positive" name can be taken as a sign that PEPs are generally liked or live up to their name (note I am also not saying that they do not, just that the name PEP is a rather unreliable predictor here).

I don’t even think that this name has that kind of history. My point was that they’re called PEPs because they’re *meant* to improve performance; that’s what they’re designed for. You describe “a PEP that does not enhance performance”, which, to me, is like talking about a web server that doesn’t serve web pages. Sure, not all PEPs may always work well, but they should - that’s their raison d’être.

>> There are plenty of useful things that they can do and yes, I personally think they’re the way of the future - but **not** in their current form, where they must “lie” to TCP, cause ossification,
>
> [SM] Here I happily agree: if we can get the negative side-effects removed, that would be great. However, is that actually feasible, or just desirable?
>
>> etc. PEPs have never been considered as part of the congestion control design - when they came on the scene, in the IETF, they were despised for breaking the architecture, and then all the trouble with how they need to play tricks was discovered (spoofing IP addresses, making assumptions about header fields, and whatnot). That doesn’t mean that a very different kind of PEP - one which is authenticated and speaks an agreed-upon protocol - couldn’t be a good solution.
>
> [SM] Again, I agree it could in theory, especially if well-architected.

That’s what I’m advocating.

>> You’re bound to ask me for concrete things next, and if I give you something concrete (e.g., a paper on PEPs), you’ll find something bad about it
>
> [SM] Them are the rules of the game... however, if we play the game that way, I will come out of it having learned something new and potentially changing my opinion.
>
>> - but this is not a constructive direction for this conversation. Please note that I’m not saying “PEPs are always good”: I only say that, in my personal opinion, they’re a worthwhile direction of future research. That’s a very different statement.
>
> [SM] Fair enough. I am less optimistic, but happy to be disappointed in my pessimism.
>
>>>> Why do people buy these boxes?
>>>
>>> [SM] Because e.g. for GEO links, latency is in a range where default, unadulterated TCP will likely choke on itself, and when faced with requiring customers to change/tune their TCPs or having a "PEP" fudge it, the ease of fudging won the day. That is the generous explanation (as this fudging is beneficial to both the operator and most end-users); I can come up with less charitable theories if you want ;) .
>>>
>>>>> The network so far has been doing reasonably well with putting more protocol smarts at the ends than in the parts in between.
>>>>
>>>> Truth is, PEPs are used a lot: at cellular edges, at satellite links… because the network is *not* always doing reasonably well without them.
>>>
>>> [SM] Fair enough, I accept that there are use cases for those, but again, only if they actually enhance the "experience" will users be happy to accept them.
>>
>> … and that’s the only reason to deploy them, given that (as the name suggests) they’re meant to increase performance. I’d be happy to learn more about why you appear to hate them so much (even just anecdotes).
>>
>>> The goals of the operators and the paying customers are not always aligned here; a PEP might be more advantageous to the operator than to the end-user (theoretically also in the other direction, but since operators pay for PEPs, they are unlikely to deploy those) - think mandatory image recompression or forced video quality downscaling... (and sure, these examples are not as clear-cut as I pitched them: if, after an emergency, a PEP allows most/all users in a cell to still send somewhat degraded images, that is better than the network choking itself on a few high-quality images, assuming images from the emergency are somewhat useful).
>>
>> What is this - are you inventing a (to me, frankly, strange) scenario where PEPs do some evil to customers yet help operators,
>
> [SM] This is no invention, but how capitalism works, sorry. The party paying for the PEP decides on using it based on the advantages it offers them. E.g. a mobile carrier that (in the past) forcibly downgraded the quality of streaming video over mobile links without giving the paying end-user the option of choosing either choppy high-resolution or smooth low-resolution video. By the way, that does not make the operator evil; it is just that operator and paying-customer goals and desires are not all that well aligned (e.g. the operator wants to maximize revenue, the customer to minimize cost).
You claim that these goals and desires are not well aligned (and a PEP is then an instrument in this evil) - do you have any proof, or even anecdotes, to support that claim?

I would think that operators generally try to make their customers happy (or they would switch to different operators). Yes, there may be some misalignments of incentives, but I believe these are more subtle points. E.g., who wants a choppy high-resolution video? Do such users really exist?

>> or is there an anecdote here?
>
> [SM] I think the video downscaling thing actually happened in the German market, but I am not sure of the exact details, so I might be misinterpreting things a bit here. However, the observation about alignment of goals I believe to be universally true.

I’d be interested in hearing more. Was there an outcry of customers who wanted their choppy high-resolution video back? :-) :-)

>>>>> I have witnessed the arguments in the "L4S wars" about how little processing one can ask the more central network nodes to perform; e.g. flow queueing, which would solve a lot of the issues (e.g. a hyper-aggressive slow-start flow would mostly hurt itself if it overshoots its capacity), seems to be a complete no-go.
>>>>
>>>> That’s to do with scalability, which depends on how close to the network’s edge one is.
>>>
>>> [SM] I have heard the alternative explanation that it has to do with what operators of core links request from their vendors and what features they are willing to pay for... but this is very anecdotal, as I have little insight into big-iron vendors or core-link operators.
>>>
>>>>> I personally think what we should do is have the network supply more information to the end points so they can control their behavior better. E.g. if we would mandate a max_queue-fill-percentage field in a protocol header and have each node write max(current_value_of_the_field, queue-filling_percentage_of_the_current_node) into every packet, end points could estimate how close to congestion the path is (e.g. by looking at the rate of change of the %queueing values) and tailor their growth/shrinkage rates accordingly, both during slow start and during congestion avoidance.
>>>>
>>>> That could well be one way to go. Nice if we provoked you to think!
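For concreteness, here is a toy sketch of that mechanism (the field name, encoding and sender policy are all invented for illustration):

from dataclasses import dataclass

@dataclass
class Packet:
    max_queue_fill: int = 0   # hypothetical header field, in percent

def on_forward(pkt, queue_len, queue_limit):
    # every node on the path stamps the worst queue fill seen so far
    fill = 100 * queue_len // queue_limit
    pkt.max_queue_fill = max(pkt.max_queue_fill, fill)

def growth_factor(fill_samples):
    # sender side: damp slow-start growth as the worst queue on the
    # path fills up, or fills up quickly, instead of blind doubling
    level = fill_samples[-1]
    trend = fill_samples[-1] - fill_samples[0]
    if level > 80 or trend > 20:
        return 1.0                 # hold cwnd: congestion is imminent
    return 2.0 if level < 20 else 1.5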
>>> [SM] You mostly made me realize what the recent increases in IW actually aim to accomplish ;)
>>
>> That’s fine! Increasing IW is surely a part of the solution space - though I advocate doing something else (as in the example above) than just increasing the constant in a worldwide standard.
>
> [SM] Happy to agree; I am not saying that increasing IW is something I unconditionally support, just that I see what it offers.
>
>>> and that current slow start seems actually better than its reputation; it solves a hard problem surprisingly well.
>>
>> Actually, given that the large majority of flows end somewhere in slow start, what makes you say that it solves it “well”?
>
> [SM] As I said, I accept that there is no silver bullet, and hence some gradual probing with increasing CWND/rate is unavoidable, which immediately implies that some flows will end before reaching capacity.

You say “some” but the data says “the large majority”.

> So the fact that flows end in slow start is not a problem but part of the solution. I see no way of ever having all flows immediately start at their "stable" long-term capacity share (something that does not exist in the first place in environments with uncorrelated and unpredictable cross traffic). But short of that, almost all flows will need more round trips to finish than theoretically minimally possible. I tried to make that point before; I am not saying current slow start is 100% perfect, but I do not expect the possible fine-tuning to get us close enough to the theoretical performance of an "oracle" solution to count as a "revolutionary" improvement.

It doesn’t need to be revolutionary; I think that ways to learn / cache the IW are already quite useful.

Now, you repeatedly mentioned that caching may not work because flows don’t always traverse the same path. True … but then, what about all the flows that do traverse the same bottleneck (to the same receiver, or set of receivers in a home), which is usually at the edge? That bottleneck may often be the same. Now, if we just had an in-network device that could divide the path into a “core” segment, where it’s safe to use a pretty large IW value, and a downstream segment, where the IW value may need to be smaller but a certain workable range might be known to the device, because that device sits right at the edge…

>>> The max(path_queue%) idea has been kicking around in my head ever since reading a paper about storing queue occupancy in packets to help CC along (sorry, I do not recall the authors or the title right now), so it is not even my own original idea, but simply something I borrowed from smarter engineers, simply because I found the data convincing and the theory sane. (Also because I grudgingly accept that latency increases measured over the internet are a tad too noisy to be easily useful* and too noisy for a meaningful controller based on the rate of change of latency**)
>>>
>>>>> But alas, we seem to go down the path of a relatively dumb 1-bit signal giving us an under-defined queue-filling state instead, and to estimate relative queue-filling dynamics from that we need many samples (so literally too little too late, or L3T2), but I digress.
>>>>
>>>> Yeah you do :-)
>>>
>>> [SM] Less than you let on ;). If L4S gets ratified
>>
>> [snip]
>>
>> I’m really not interested in an L4S debate.
>
> [SM] I understand; however, I see clear reasons why L4S is detrimental to your stated goals, as it will make getting more information from the network less likely. I also tried to explain why I believe my proposal to be a theoretically viable way forward to improve slow-start dynamics. Maybe show why my proposal is bunk while completely ignoring L4S? Or is that the kind of "particular solution" you do not want to discuss at the current stage?

I’d say the latter. We could spend weeks of time and tons of emails discussing explicit-feedback-based schemes… instead, if you think your idea is good, why not build it, test it, and evaluate its trade-offs?

I don’t see L4S as being *detrimental* to our stated goals, BTW - but, as it stands, I see limitations in its usefulness, because TCP Prague (AFAIK) only changes Congestion Avoidance, at least up to now. I’m getting the impression that Congestion Avoidance with a greedy sender is a rare animal. Non-greedy (i.e., the sender takes a break) is a different thing again - various implementations exist, as do proposals for how to handle this … a flow with pauses is not too different from multiple consecutive short flows. Well, it always uses the same 5-tuple, which makes caching strategies more likely to succeed.
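There are precedents for that kind of caching - TCP control block sharing (RFC 2140) and Linux’s tcp_metrics cache go in this direction. A toy of the idea (the names, discount and timeout policy below are pure guesswork):

import time

cwnd_cache = {}   # destination -> (last validated cwnd in packets, timestamp)

def on_close(dst, last_validated_cwnd):
    cwnd_cache[dst] = (last_validated_cwnd, time.monotonic())

def initial_cwnd(dst, default_iw=10, ttl_s=600.0):
    entry = cwnd_cache.get(dst)
    if entry is None:
        return default_iw
    cwnd, stamp = entry
    if time.monotonic() - stamp > ttl_s:
        return default_iw            # stale: paths and cross traffic change
    return max(default_iw, cwnd // 2)    # restart from a discounted prior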
> Anyway, thanks for your time. I fear I have made my points in the last mail already and am mostly repeating myself, so I would not feel offended in any way if you let this sub-discussion sleep and wait for more topical discussion entries.
>
> Regards
> Sebastian

Cheers,
Michael