Date: Sun, 31 Jul 2022 16:22:26 -0400 (EDT)
From: "David P. Reed" <dpreed@deepplum.com>
To: "Sebastian Moeller" <moeller0@gmx.de>
Cc: starlink@lists.bufferbloat.net
Subject: Re: [Starlink] Finite-Buffer M/G/1 Queues with Time and Space Priorities
In-Reply-To: <05EB1373-AD05-4CB6-BD92-C444038D3A67@gmx.de>
References: <1659123485.059828918@apps.rackspace.com> <05EB1373-AD05-4CB6-BD92-C444038D3A67@gmx.de>
Message-ID: <1659298946.200926837@apps.rackspace.com>

Sebastian - of course we agree far more than we disagree, and this seems like healthy debate, focused on actual user benefit at scale, which is where I hope the Internet focuses. The "bufferbloat" community has been really kicking it for users, in my opinion.

[Off topic: I don't have time for IETF's process, now that it is such a parody of captured bureaucratic process, rather than "rough consensus and working code", now missing for at least 2 decades of corporate control of the IAB. None of us "old farts" ever thought emulating ITU-style bureaucracy would solve any important problem the Internet was trying to solve.]

On Sunday, July 31, 2022 7:58am, "Sebastian Moeller" <moeller0@gmx.de> said:

> Hi David,
>
> interesting food for thought...
>
> > On Jul 29, 2022, at 21:38, David P. Reed via Starlink <starlink@lists.bufferbloat.net> wrote:
> >
> > From: "Bless, Roland (TM)" <roland.bless@kit.edu>
> > models from queueing theory is that they only work for load < 1, whereas we are using the network with load values ~1 (i.e., around one) due to congestion control feedback loops that drive the bottleneck link to saturation (unless you consider application limited traffic sources).
> >
> > Let me remind people here that there is some kind of really weird thinking going on here about what should be typical behavior in the Internet when it is working well.
> >
> > No, the goal of the Internet is not to saturate all bottlenecks at maximum capacity. That is the opposite of the goal, and it is the opposite of a sane operating point.
> >
> > Every user seeks low response time, typically a response time on the order of the unloaded delay in the network, for ALL traffic (whether it's the response to a file transfer or a voice frame or a WWW request). *
> >
> > Queueing is always suboptimal, if you can achieve goodput without introducing any queueing delay. Because a queue built up at any link delays *all* traffic sharing that link, the overall cost to all users goes up radically when multiple streams share a link, because the queueing *delay* gets multiplied by the number of flows affected!
> >
> > So the most desirable operating point (which Kleinrock and his students recently demonstrated with his "power metric") is to have each queue in every link average < 1 packet in length. (Big or small packets, doesn't matter, actually.)
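[For readers who haven't run into Kleinrock's "power" metric: it is usually written as throughput divided by delay. A quick M/M/1 sketch, in my own notation and offered only as illustration, shows why "average < 1 packet" falls out of it. With utilization \rho, service rate \mu, arrival rate \lambda = \rho\mu and mean delay T:

  P(\rho) = \frac{\lambda}{T}, \qquad T_{M/M/1} = \frac{1/\mu}{1-\rho}
  \;\Rightarrow\; P(\rho) = \mu^{2}\,\rho\,(1-\rho),

which peaks at \rho^{*} = 1/2, where the mean number in the system is \bar{N} = \rho/(1-\rho) = 1. In this simple model the power-optimal point keeps, on average, about one packet in queue-plus-server - essentially no standing queue.]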
> > Now the bigger issue is that this is unachievable when the flows in the network are bursty. Poisson being the least bursty, and easiest to analyze, of the random processes generating flows. Typical Internet usage is incredibly bursty at all time scales, though - the burstiness is fractal when observed for real (at least if you look at time scales from 1 ms to 1 day as your unit of analysis). Fractal random processes of this sort are not Poisson at all.
>
> [SM] In this context I like the framing from the CoDel ACM paper, with the queue acting as shock absorber for bursts; as you indicate, bursts are unavoidable in a network with unsynchronized senders. So it seems prudent to engineer with bursts as a use-case (however undesirable) in mind, compared to simply declaring bursts undesirable and requiring endpoints not to be bursty, as L4S seems to do*.
>
> > So what is the best one ought to try to do?
> >
> > Well, "keeping utilization at 100%" is never what real network operators seek. Never, ever. Instead, congestion control is focused on latency control, not optimizing utilization.
>
> [SM] I thought that these are not orthogonal goals and one needs to pick an operating point in the throughput<->latency gradient somehow? This becomes relevant for smaller links like internet access links more than for backbone links. It is relatively easy to drive my 100/40 link into saturation by normal usage, so I have a clear goal of keeping latency acceptable under saturating loads.
>
> > The only folks who seem to focus on utilization are the bean counting fraternity, because they seem to think the only cost is the wires, so you want the wires to be full.
>
> [SM] Pithy, yet I am sure the bean counters also account for the cost of ports/interfaces ;)
>
> > That, in my opinion, and even in most accounting systems that consider the whole enterprise rather than the wires/fibers/airtime alone, is IGNORANT and STUPID.
> >
> > However, academics and vendors of switches care nothing about latency at network scale. They focus on wirespeed as the only metric.
> >
> > Well, in the old Bell Telephone days, the metric of the Bell System that really mattered was not utilization on every day. Instead it was avoiding outages due to peak load. That often was "Mother's Day" - a few hours out of one day once a year.
> > Because an outage on Mother's Day (busy signals) meant major frustration!
>
> [SM] If one designs for a (rare) worst-case scenario, one is in the clear most of the time. I wish that were possible with my internet access link, though... I get a sync of 116.7/37.0 Mbps which I shape down to a gross 105.0/36.0; it turns out it is not that hard to saturate that link occasionally with just normal usage by a family of five, so I clearly am far away from 90% reserve capacity, and I have little chance of expanding the capacity by a factor of 10 within my budget...
>
> > Why am I talking about this?
> >
> > Because I have been trying for decades (and I am not alone) to apply a "Clue-by-Four" to the thick skulls of folks who don't think about the Internet at scale, or even won't think about an Enterprise Internet at scale (or Starlink at scale). And it doesn't sink in.
> >
> > Andrew Odlyzko, a brilliant mathematician at Bell Labs for most of his career, also tried to point out that the utilization of the "bottleneck links" in any enterprise, up to the size of ATT in the old days, was typically tuned to < 10% of saturation at almost any time. Why? Because the CEO freaked out at the quality of service of this critical infrastructure (which means completing tasks quickly, when load is unusual) and fired people.
> >
> > And in fact, the wires are the cheapest resource - the computers and people connected by those resources that can't do work while waiting for queueing delay are vastly more expensive to leave idle. Networks don't do "work" that matters. Queueing isn't "efficient". It's evil.
> >
> > Which is why dropping packets rather than queueing them is *good*, if the sender will slow down and can resend them. Intentionally dropped packets should be nonzero under load, if an outsider is observing to measure quality.
> >
> > I call this brain-miswiring about optimizing throughput to fill a bottleneck link the Hotrodder Fallacy. That's the idea that one should optimize like a drag racer optimizes his car - to burn up the tires and the engine to meet an irrelevant metric for automobiles. A nice hobby that has never improved any actual vehicle. (Even F1 racing is far more realistic, given you want your cars to last for the lifetime of the race.)
> >
> > A problem with much of the "network research" community is that it never has actually looked at what networks are used for and tried to solve those problems. Instead, they define irrelevant problems and encourage all students and professors to pursue irrelevancy.
> >
> > Now let's look at RRUL. While it nicely looks at latency for small packets under load, it actually disregards the performance of the load streams, which are only used to "fill the pipe".
>
> [SM] I respectfully disagree. They are used to simulate those "fill the pipe" flows that do happen in edge networks... think multiple machines downloading multi-gigabyte update packages (OS, games, software, ...) whenever they feel like it.
> The sparse latency measurement flows simulate low-rate/sparse interactive traffic...
> But note that, depending on the situation, a nominally sparse flow can use up quite some capacity. I talked to a gamer who observed, in Riot Games' Valorant, in a multi-player online game with 10-20 players, traffic at 20 Mbps with cyclic bursts 128 times a second. On a slow link that becomes a noticeable capacity hog.
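[A quick back-of-envelope on those numbers - my arithmetic, not anything from the measurement, and assuming the 20 Mbps figure is that flow's average rate:

  \frac{20\ \text{Mbit/s}}{128\ \text{bursts/s}} \approx 156\ \text{kbit} \approx 19.5\ \text{kB} \approx 13 \times 1500\text{-byte packets per burst}, \quad \text{one burst every } 1/128\ \text{s} \approx 7.8\ \text{ms}.

Serialized at 36 Mbit/s (the uplink figure quoted above, used purely for illustration), each such burst occupies the wire for roughly 4.3 ms out of every 7.8 ms interval - so a nominally "sparse" flow can claim more than half of a slow link while it is active.]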
> > Fortunately, they are TCP, so they rate limit themselves by window adjustment. But they are speed unlimited TCP streams that are meaningless.
>
> [SM] Flent will however present information about those flows if instructed to do so (IIRC by the --socket-stats argument):
>
>                                            avg     median   99th %            # data pts
> Ping (ms) ICMP 1.1.1.1 (extra)        :   13.26    11.70    29.30   ms        1393
> Ping (ms) avg                         :   32.17      N/A      N/A   ms        1607
> Ping (ms)::ICMP                       :   32.76    30.60    48.02   ms        1395
> Ping (ms)::UDP 0 (0)                  :   32.64    30.52    46.55   ms        1607
> Ping (ms)::UDP 1 (0)                  :   31.39    29.90    45.98   ms        1607
> Ping (ms)::UDP 2 (0)                  :   32.85    30.82    47.04   ms        1607
> Ping (ms)::UDP 3 (0)                  :   31.72    30.25    46.49   ms        1607
> Ping (ms)::UDP 4 (0)                  :   31.37    29.78    45.61   ms        1607
> Ping (ms)::UDP 5 (0)                  :   31.36    29.74    45.13   ms        1607
> Ping (ms)::UDP 6 (0)                  :   32.85    30.71    47.34   ms        1607
> Ping (ms)::UDP 7 (0)                  :   33.16    31.08    47.93   ms        1607
> TCP download avg                      :    7.82      N/A      N/A   Mbits/s   1607
> TCP download sum                      :   62.55      N/A      N/A   Mbits/s   1607
> TCP download::0 (0)                   :    7.86     7.28    13.81   Mbits/s   1607
> TCP download::1 (0)                   :    8.18     7.88    13.98   Mbits/s   1607
> TCP download::2 (0)                   :    7.62     7.05    13.81   Mbits/s   1607
> TCP download::3 (0)                   :    7.73     7.37    13.23   Mbits/s   1607
> TCP download::4 (0)                   :    7.58     7.07    13.51   Mbits/s   1607
> TCP download::5 (0)                   :    7.92     7.37    14.03   Mbits/s   1607
> TCP download::6 (0)                   :    8.07     7.58    14.33   Mbits/s   1607
> TCP download::7 (0)                   :    7.59     6.96    13.94   Mbits/s   1607
> TCP totals                            :   93.20      N/A      N/A   Mbits/s   1607
> TCP upload avg                        :    3.83      N/A      N/A   Mbits/s   1607
> TCP upload sum                        :   30.65      N/A      N/A   Mbits/s   1607
> TCP upload::0 (0)                     :    3.82     3.86     9.57   Mbits/s   1607
> TCP upload::0 (0)::tcp_cwnd           :   14.31    14.00    23.00              856
> TCP upload::0 (0)::tcp_delivery_rate  :    3.67     3.81     4.95              855
> TCP upload::0 (0)::tcp_pacing_rate    :    4.72     4.85     6.93              855
> TCP upload::0 (0)::tcp_rtt            :   42.48    41.36    65.32              851
> TCP upload::0 (0)::tcp_rtt_var        :    2.83     2.38     9.90              851
> TCP upload::1 (0)                     :    3.90     3.94    16.49   Mbits/s   1607
> TCP upload::1 (0)::tcp_cwnd           :   14.46    14.00    23.00              857
> TCP upload::1 (0)::tcp_delivery_rate  :    3.75     3.83     5.74              856
> TCP upload::1 (0)::tcp_pacing_rate    :    4.81     4.89     8.15              856
> TCP upload::1 (0)::tcp_rtt            :   42.12    41.07    63.10              852
> TCP upload::1 (0)::tcp_rtt_var        :    2.74     2.36     8.36              852
> TCP upload::2 (0)                     :    3.85     3.96     5.11   Mbits/s   1607
> TCP upload::2 (0)::tcp_cwnd           :   14.15    14.00    22.00              852
> TCP upload::2 (0)::tcp_delivery_rate  :    3.69     3.81     4.93              851
> TCP upload::2 (0)::tcp_pacing_rate    :    4.73     4.91     6.55              851
> TCP upload::2 (0)::tcp_rtt            :   41.73    41.09    56.97              851
> TCP upload::2 (0)::tcp_rtt_var        :    2.59     2.29     7.71              851
> TCP upload::3 (0)                     :    3.81     3.95     5.32   Mbits/s   1607
> TCP upload::3 (0)::tcp_cwnd           :   13.90    14.00    21.00              851
> TCP upload::3 (0)::tcp_delivery_rate  :    3.66     3.82     4.89              851
> TCP upload::3 (0)::tcp_pacing_rate    :    4.67     4.87     6.36              851
> TCP upload::3 (0)::tcp_rtt            :   41.44    41.09    56.46              847
> TCP upload::3 (0)::tcp_rtt_var        :    2.74     2.46     8.27              847
> TCP upload::4 (0)                     :    3.77     3.88     5.35   Mbits/s   1607
> TCP upload::4 (0)::tcp_cwnd           :   13.86    14.00    21.00              852
> TCP upload::4 (0)::tcp_delivery_rate  :    3.61     3.75     4.87              852
> TCP upload::4 (0)::tcp_pacing_rate    :    4.63     4.83     6.46              852
> TCP upload::4 (0)::tcp_rtt            :   41.74    41.18    57.27              850
> TCP upload::4 (0)::tcp_rtt_var        :    2.73     2.45     8.38              850
> TCP upload::5 (0)                     :    3.83     3.93     5.60   Mbits/s   1607
> TCP upload::5 (0)::tcp_cwnd           :   13.98    14.00    22.00              851
> TCP upload::5 (0)::tcp_delivery_rate  :    3.68     3.80     5.05              851
> TCP upload::5 (0)::tcp_pacing_rate    :    4.69     4.82     6.65              851
> TCP upload::5 (0)::tcp_rtt            :   41.50    40.91    56.42              847
> TCP upload::5 (0)::tcp_rtt_var        :    2.68     2.34     8.24              847
> TCP upload::6 (0)                     :    3.86     3.97     5.60   Mbits/s   1607
> TCP upload::6 (0)::tcp_cwnd           :   14.27    14.00    22.00              850
> TCP upload::6 (0)::tcp_delivery_rate  :    3.71     3.83     5.07              850
> TCP upload::6 (0)::tcp_pacing_rate    :    4.74     4.90     6.77              850
> TCP upload::6 (0)::tcp_rtt            :   42.03    41.66    55.81              850
> TCP upload::6 (0)::tcp_rtt_var        :    2.71     2.49     7.85              850
> TCP upload::7 (0)                     :    3.81     3.92     5.18   Mbits/s   1607
> TCP upload::7 (0)::tcp_cwnd           :   14.01    14.00    22.00              850
> TCP upload::7 (0)::tcp_delivery_rate  :    3.67     3.82     4.94              849
> TCP upload::7 (0)::tcp_pacing_rate    :    4.57     4.69     6.52              850
> TCP upload::7 (0)::tcp_rtt            :   42.62    42.16    56.20              847
> TCP upload::7 (0)::tcp_rtt_var        :    2.50     2.19     8.02              847
> cpu_stats_root@192.168.42.1::load     :    0.31     0.30     0.75             1286
>
> While the tcp_rtt is smoothed, it still tells something about the latency of the load-bearing flows.

I agree. I have used flent enough to have poked around at its options, and yes, the data is there.

But RRUL's assumption that there is always "more" to send on the load-generating TCP flows explores only one kind of case. Suppose you have three streaming video watchers at HD rates, and they don't quite fill up the downlink, yet they actually are "buffering" sufficiently most of the time. How well do they share the downlink so that none of them pause unnecessarily? Maybe FQ_codel works, maybe it doesn't work so well. If you want a variant, imagine a small business office with 20 staff doing Zoom conferences with customers. Zoom on the uplink side is actually bursty to some extent. It can tolerate sub-100 msec outages.

My point is that there are real kinds of burstiness besides "click driven", and they have different time-scales of variability. Studying these seems vastly more useful than one more paper based on RRUL as the only point of study. (And even more vastly better than just measuring the pipe's max throughput and utilization.)

> > Actual situations (like what happens when someone starts using BitTorrent while another in the same household is playing a twitch multi-user FPS) don't actually look like RRUL. Because in fact the big load is ALSO fractal. BitTorrent demand isn't constant over time - far from it. It's bursty.
>
> [SM] And this is where having an FQ scheduler for ingress and egress really helps,

I think FQ_codel is great! An insight that I get from it is that "flow fairness" plus dropping works pretty well for today's Internet traffic to provide very good responsiveness that the user sees.

However, I think QUIC, which lacks "flows" that are visible at the queue manager, will become problematic. Not necessarily at the "Home Router" end - but at the cloud endpoints that serve many, many users.
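To make "flow fairness plus dropping" concrete, here is a toy sketch of the idea (my own illustration, not fq_codel's actual algorithm - fq_codel uses deficit round-robin over bytes plus CoDel's delay-based dropping): hash packets into per-flow queues, serve non-empty queues round-robin, and shed overload from the fattest queue, so a bulk flow's backlog punishes itself rather than the sparse flows.

# Toy flow-fair queue (illustration only, not fq_codel).
from collections import deque, namedtuple

Packet = namedtuple("Packet", "flow_id size")

class ToyFQ:
    def __init__(self, n_buckets=1024, capacity_pkts=100):
        self.buckets = [deque() for _ in range(n_buckets)]
        self.capacity = capacity_pkts
        self.backlog = 0
        self.rr = 0  # round-robin pointer over buckets

    def enqueue(self, pkt):
        b = hash(pkt.flow_id) % len(self.buckets)
        self.buckets[b].append(pkt)
        self.backlog += 1
        if self.backlog > self.capacity:
            # Overflow: shed load from the flow hogging the most queue,
            # so a bulk flow's burst hurts itself, not the sparse flows.
            fattest = max(self.buckets, key=len)
            fattest.popleft()
            self.backlog -= 1

    def dequeue(self):
        # Serve non-empty buckets in round-robin order (packet fairness here;
        # fq_codel uses deficit round-robin over bytes).
        for _ in range(len(self.buckets)):
            q = self.buckets[self.rr]
            self.rr = (self.rr + 1) % len(self.buckets)
            if q:
                self.backlog -= 1
                return q.popleft()
        return None

# Usage: a bulk flow floods the queue while a sparse flow trickles in;
# the sparse flow's packet still comes out promptly.
fq = ToyFQ(capacity_pkts=20)
for _ in range(50):
    fq.enqueue(Packet("bulk", 1500))
fq.enqueue(Packet("ping", 64))
print([fq.dequeue().flow_id for _ in range(3)])  # "ping" shows up within the first couple of dequeues

Even with 20 bulk packets standing in queue, the sparse flow's packet is served almost immediately - which is essentially the responsiveness benefit described above.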
> it can isolate most of the fall-out from bursty traffic onto the bursty traffic itself. However, occasionally a user actually evaluates the bursty traffic as more important than the rest (my example from above with bursty real-time traffic of a game), in which case FQ tends to result in unhappiness if the capacity share of the affected flow is such that the bursts are partly dropped (and even if they are just spread out in time too much).
>
> > Everything is bursty at different timescales in the Internet. There are no CBR flows.
>
> [SM] Probably true, but I think on the scale of a few seconds/minutes things can be "constant" enough, no?

Even in the case of your family's single edge connection, your peak-to-average ratio over intervals of one hour or more, and probably minute-to-minute as well, shows burstiness. There may be a few times each day where the "Mother's Day" event happens, but I bet your average usage in every hour, and probably every minute, is < 10%. What happens when (if your family works like many) you sit down to dinner? And then get back to work?

I bet you buy the fastest "up-to" speed you can afford, but not because your average is very high at all.

Right?

> > So if we want to address the real congestion problems, we need realistic thinking about what the real problem is.
> >
> > Unfortunately this is not achieved by the kind of thinking that created diffserv, sadly. Because everything is bursty, just with different timescales in some cases. Even "flash override" priority traffic is incredibly bursty.
>
> [SM] I thought the rationale for "flash override" is not that its traffic pattern is any different (smoother) from other traffic classes, but simply that delivery of such marked packets has highest priority and the network should do what it can to expedite such packets; if that comes at the cost of other packets, so be it... (Some link technologies even allow pre-empting packets already in transfer to expedite higher-priority packets.) Personally, I like strict precedence; it is both unforgiving and easy to predict, and pretty much useless for a shared medium like the internet, at least as an end-to-end policy.

Actually, if one uses EDF (earliest deadline first) scheduling, it is provably optimal in a particular sense: if it is possible to meet all the deadlines with some schedule, then the EDF schedule will meet all the deadlines.

That is from the early 1970's work on "real-time scheduling" disciplines.

"Strict precedence" is the case of EDF where the deadline for ALL packets is the send time plus a network-wide "latency bound". If you used EDF with a latency bound of, say, 100 msec for all packets, and each packet was sequenced in deadline order, the network would be VERY predictable and responsive.
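A minimal sketch of that single-latency-bound EDF idea (my own illustration of the scheme described above, not any deployed scheduler):

# Sketch of EDF queueing with one network-wide latency bound (illustration only).
import heapq
import itertools

LATENCY_BOUND = 0.100  # seconds; the network-wide bound discussed above

class EdfLink:
    def __init__(self):
        self._heap = []                # (deadline, tiebreak, packet)
        self._seq = itertools.count()  # FIFO tiebreak for equal deadlines

    def enqueue(self, packet, send_time):
        # Every packet's deadline is its send time plus the global bound.
        deadline = send_time + LATENCY_BOUND
        heapq.heappush(self._heap, (deadline, next(self._seq), packet))

    def dequeue(self, now):
        # Transmit the earliest-deadline packet still worth sending;
        # anything already past its deadline is dropped rather than queued
        # further (the sender's AIMD backoff does the rest).
        while self._heap:
            deadline, _, packet = heapq.heappop(self._heap)
            if deadline >= now:
                return packet
        return None

# Usage: a packet sent at t=0.00 is served before one sent at t=0.02,
# regardless of the order in which they reached this queue.
link = EdfLink()
link.enqueue("late-sender", send_time=0.02)
link.enqueue("early-sender", send_time=0.00)
print(link.dequeue(now=0.05))  # -> "early-sender"

The design choice that matters is that the queue is ordered by deadline rather than by arrival, so the packet that has been in the network longest is always served first, and anything that has blown its bound is shed instead of adding to the queue.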
The imagined need for "flash override" priority would go away, unless the emergency somehow required sub-100 msec latency, if all flows backed off using TCP AIMD backoff, and queues in bottleneck links were dropped.

No one's actually tried this at scale, but theory all suggests it would be brilliantly stable and predictable.

(How it would work if the network is constantly being DDoSed everywhere isn't a fair question - no scheduling algorithm can work under constant DDoS; you need meta-network policing to find culprits and shut them off ASAP.)

> > Coming back to Starlink - Starlink apparently is being designed by folks who really do not understand these fundamental ideas. Instead, they probably all worked in researchy environments where the practical realities of being part of a worldwide public Internet were ignored.
>
> [SM] Also in a world where user-facing tests and evaluations will emphasize maximal throughput rates a lot, as these are easy to measure and follow the simple "larger is better" principle consumers are trained to understand.
>
> > (The FCC folks are almost as bad. I have found no-one at FCC engineering who understands fractal burstiness - even w.r.t. the old Bell System.)
>
> *) It might appear that I have a bone to pick with L4S (which I have), but it really is just a great example of engineering malpractice, especially not designing for the existing internet, but assuming one can simply "require" a more L4S-compatible internet through the power of IETF drafts. Case in point: L4S wants to bound the maximum burst duration for compliant senders, which, even if it worked, still leaves the problem that unsynchronized senders can and will still occasionally add up to extended periods at line rate.

I totally agree with the L4S bias you have. It seems wrong-headed to require every participant in the Internet to behave when you don't even tell them why they have to behave or what behaving means. My concern about IETF bureaucracy emulation applies here, as well.

[If BT wants to try L4S across all of BT's customers and take the flak when it fails miserably, it becomes "working code" when it finally works. Then they can get a rough consensus, rather than they and others "dictating" what L4S must be.

They have not simulated actual usage (I admit that simulating "actual usage" is hard to do, when you don't even know what your users are actually doing today, as I've mentioned above). That suggests a "pilot" experimental process. Even ECN, RED, diffserv and MBONE were pilots. Which is where it was learned that they don't work at scale. Which is why no one seriously deploys them, to this day, because if they actually worked, there would be a race among users to demand them.]

> > _______________________________________________
> > Starlink mailing list
> > Starlink@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/starlink