From: Sebastian Moeller
Date: Thu, 27 Jun 2019 09:49:53 +0200
To: "David P. Reed"
Cc: Jonathan Morton, "ecn-sane@lists.bufferbloat.net", Brian E Carpenter, tsvwg IETF list
Subject: Re: [Ecn-sane] [tsvwg] per-flow scheduling

Hi David,

thanks for your response.
> On Jun 26, 2019, at 18:31, David P. Reed wrote:
>
> It's the limiting case, but also the optimal state given "perfect knowledge".
>
> Yes, it requires that the source-destination pairs sharing the link in question coordinate their packet admission times so they don't "collide" at the link. Ideally the next packet would arrive during the previous packet's transmission, so it is ready-to-go when that packet's transmission ends.
>
> Such exquisite coordination is feasible when future behavior by source and destination at the interface is known, which requires an Oracle.
> That's the same kind of condition most information theoretic and queueing theoretic optimality requires.

Ah, great, I had feared I had missed something.

>
> But this is worth keeping in mind as the overall joint goal of all users.
>
> In particular, "link utilization" isn't a user goal at all. The link is there and is being paid for whether it is used or not (looking from the network structure as a whole). Its capacity exists to move packets out of the way. An ideal link satisfies the requirement that it never creates a queue because of anything other than imperfect coordination of the end-to-end flows mapped onto it. That's why the router should not be measured by "link utilization" any more than a tunnel in a city during commuting hours should be measured by cars moved per hour. Clearly a tunnel can be VERY congested and moving many cars if they are attached to each other bumper to bumper - the latency through the tunnel would then be huge. If the cars were tipped on their ends and stacked, even more throughput would be achieved through the tunnel, and the delay of rotating them and packing them would add even more delay.

+1; this is the core of the movement under the "bufferbloat" moniker: putting latency back into the spotlight where it belongs (at least for common interactive network usage; bulk transfer is a different kettle of fish). Given the relatively low rates of common internet access links, running at capacity, while not a primary goal, still happens often enough to require special treatment to keep the latency increase under load under control. Both FQ solutions and L4S offer remedies for that case. (Being a non-expert home user myself, this case is also prominent on my radar; with my ISP's backbone and peerings/transits being well managed, the access link is the one point where queueing happens, just as you describe.)

>
> The idea that "link utilization" of 100% must be achieved is why we got bufferbloat designed into routers.

While I do not subscribe to this view (and actually trade away some "top speed" to keep latency sane), a considerable fraction of home users seem obsessed with maxing out their access links and comparing achievable rates; whether such behaviour should be encouraged is a different question.

> It's a worm's eye perspective. To this day, Arista Networks brags about how its bufferbloated feature design optimizes switch utilization (https://packetpushers.net/aristas-big-buffer-b-s/). And it selects benchmarks to "prove" it. Andy Bechtolsheim apparently is such a big name that he can sell defective gear at a premium price, letting the datacenters who buy it discover that those switches get "clogged up" by TCP traffic when they are the "bottleneck link". Fortunately, they are fast, so they are less frequently the bottleneck in datacenter daily use.
>
> In trying to understand what is going on with congestion signalling, any buffering at the entry to the link should be due only to imperfect information being fed back to the endpoints generating traffic. Because a misbehaving endpoint generates Denial of Service for all other users.

This is a good point, and one of the reasons why I conceptually like flow queueing, as it provides the tools to isolate bad actors; "trust, but verify" comes to mind as a principle. I also note that the _only_ currently known L4S roll-out target (Low Latency DOCSIS) actually mandates a mechanism they call "queue protection", which to me looks pretty much like an FQ system that carefully tries not to call itself FQ (it monitors each flow's queueing contribution and, if a flow exceeds a threshold, pushes it into the RFC 3168 queue; to this layman that means it needs to separately track the packets of each flow in the common queue to be able to redirect them). (I sketch further below, after this exchange, the kind of per-flow bookkeeping I have in mind.)

>
> Priority mechanisms focused on protecting high-paying users from low-paying ones don't help much - they only help at overloaded states of the network.

In principle I agree; in practice things get complicated. Mixing latency-indifferent, capacity-devouring applications like BitTorrent with, say, VoIP packets (fixed rates, but latency sensitive) over too narrow a link will make it clear that giving the VoIP packet precedence/priority over the bulk-transfer packet is a sane policy (this becomes an issue due to the difficulty of running a narrow link below capacity). I am sure you are aware of all of this; I just need to spell it out for my thinking process.

> Which isn't to say that priority does nothing - it's just that stable assignment of a sharing level to priority levels isn't easy. (See Paris Metro Pricing, where there are only two classes, and the problem of deciding how to manage the access to the "first class" section - the idea that 15 classes with different metrics can be handled simply and interoperably between differently managed autonomous systems seems to be an incredibly impractical goal).

+1; any prioritization scheme should be extremely simple, so that an end user can easily predict its behavior. Also, IMHO three classes of latency behaviour will go a long way: "normal", "don't care", and "important" should be enough. (L4S IMHO only offers "important" and "normal", so it does not offer an easy way to downgrade, say, bulk background transfers like BitTorrent; this is going to be an issue, since BitTorrent triggers its back-off only at roughly 100 ms of induced latency increase, while L4S's RFC 3168 queue uses a PIE offspring to keep induced latency well below 100 ms, but I digress.)

> Even in the priority case, buffering is NOT a desirable end user thing.

+1; IMHO again a reason for FQ: misbehaving flows will not spoil the fun for everybody else.

>
> My personal view is that the manager of a network needs to configure the network so that no link ever gets overloaded, if possible. The response to overload should be to tell the relevant flows to all slow down (not just one, because if there are 100 flows that start up roughly at the same time, causing MD on one does very little).
> This is an example of something where per-flow stuff in the router actually makes the router helpful in the large scheme of things. Maybe all flows should be equally informed, as flows. Which means the router needs to know how to signal multiple flows, while not just hammering all the packets of a single flow.

This case is very real, but not as frequent on the client side as on the "server side", in "load balancers" and the like.
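To make the per-flow bookkeeping I keep alluding to a bit more concrete, here is a deliberately naive sketch (Python; purely illustrative, and emphatically not how fq_codel, cake, or the DOCSIS queue-protection code is actually implemented): packets are hashed by the address information the endpoints already supply into per-flow queues, and the queues are served round-robin, so an over-eager flow only ever grows its own backlog.

from collections import deque, OrderedDict

class ToyFQ:
    def __init__(self, max_backlog=100):
        self.flows = OrderedDict()   # flow key -> deque of queued packets
        self.max_backlog = max_backlog

    @staticmethod
    def flow_key(pkt):
        # Only the address information the endpoints already supply.
        return (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], pkt["proto"])

    def enqueue(self, pkt):
        q = self.flows.setdefault(self.flow_key(pkt), deque())
        if len(q) >= self.max_backlog:
            return False             # drop: only this flow pays for its excess
        q.append(pkt)
        return True

    def dequeue(self):
        # One round-robin step: serve the first non-empty flow, then rotate it
        # to the back so every flow gets its turn; forget empty (idle) flows.
        for key in list(self.flows):
            q = self.flows[key]
            if q:
                pkt = q.popleft()
                self.flows.move_to_end(key)
                return pkt
            del self.flows[key]
        return None

# A greedy flow cannot starve a sparse one:
fq = ToyFQ()
for i in range(10):
    fq.enqueue({"src": "10.0.0.1", "sport": 5000, "dst": "10.0.0.2",
                "dport": 80, "proto": "tcp", "seq": i})
fq.enqueue({"src": "10.0.0.3", "sport": 6000, "dst": "10.0.0.2",
            "dport": 443, "proto": "udp", "seq": 0})
print(fq.dequeue()["src"], fq.dequeue()["src"])   # 10.0.0.1 10.0.0.3

The point of the toy is only that flow isolation falls out of the address information that is already in every packet; no deep packet inspection and no cooperation from the sender is needed.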
>
> My point here is simple:
>
> 1) the endpoints tell the routers what flows are going through a link already. That's just the address information. So that information can be used for fairness pretty well, especially if short term memory (a bloom filter, perhaps) can track a sufficiently large number of flows.
>
> 2) The per-flow decisions related to congestion control within a flow are necessarily end-to-end in nature - the router can only tell the ends what is going on, but the ends (together - their admissions rates and consumption rates are coupled to the use being made) must be informed and decide. The congestion management must combine information about the source and the destination future behavior (even if it is just taking recent history and projecting it as an estimate of future behavior at source and destination). Which is why it is quite natural to have routers signal the destination, which then signals the source, which changes its behavior.

In an ideal world the router would also signal the sender, as that would at least halve the time it takes for the congestion information to reach the most relevant party; but as I understand it, this is a) not generally possible and b) prone to abuse.

>
> 3) there are definitely other ways to improve latency for IP and protocols built on top of it - routing some flows over different paths under congestion is one. Call that per-flow routing. Another is scattering a flow over several paths (but that seems problematic for today's TCP which assumes all packets take the same path).

This is about re-ordering, no?

>
> 4) A different, but very coupled view of IP is that any application-relevant buffering should be driven into the endpoints - at the source, buffering is useful to deal with variability in the rate of production of data to be sent. At the destination, buffering is useful to minimize jitter, matching to the consumption behavior of the application. But these buffers should not be pushed into the network where they cause congestion for other flows sharing resources.
> So buffering in the network should ONLY deal with the uncertainty in resource competition.

This, at least in my understanding, is one of the underlying ideas of the L4S approach; what is your take on how well L4S achieves that goal?

>
> This tripartite breakdown of buffering is protocol independent. It applies to TCP, NTP, RTP, QUIC/UDP, ... It's what we (that is me) had in mind when we split UDP out of TCP, allowing UDP based protocols to manage source and destination buffering in the application for all the things we thought UDP would be used for - packet speech, computer-computer remote procedure calls (what would be QUIC today), SATNET/interplanetary Internet connections, ...

Like many great insights that look obvious in retrospect, I would guess that might have been controversial at its time?

>
> Sadly, in the many years since the late 1970's the tendency to think file transfers between infinite speed storage devices over TCP are the only relevant use of the Internet has penetrated the router design community. I can't seem to get anyone to recognize how far we are from that. No one runs benchmarks for such behavior, no one even measures anything other than the "hot rod" maximum throughput cases.

I would guess that this obsession might be market-driven: as long as customers only look at the top-speed numbers, increasing that number will be the priority. (A toy illustration of the kind of measurement I would rather see as standard follows below, before my sign-off.)
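To illustrate what measuring something other than the top-speed number could look like, here is a toy probe (Python; purely illustrative, the ping target and download URL are placeholders to be replaced, and real tools such as flent do this properly): it samples ping RTTs on the idle link, then again while a bulk download keeps the link busy, and prints both medians, i.e. latency under load.

#!/usr/bin/env python3
# Toy "latency under load" probe. Purely illustrative; replace the placeholder
# ping target and download URL with real, reachable ones before running.
import statistics
import subprocess
import threading
import time
import urllib.request

PING_TARGET = "192.0.2.1"         # placeholder (TEST-NET-1); will not answer as-is
BULK_URL = "http://example.com/"  # placeholder; should point at a large file

def ping_once(host):
    # One ICMP echo via the system ping; returns the RTT in ms, or None.
    out = subprocess.run(["ping", "-c", "1", host], capture_output=True, text=True)
    for token in out.stdout.split():
        if token.startswith("time="):
            return float(token[len("time="):])
    return None

def sample_rtts(host, seconds):
    rtts = []
    deadline = time.time() + seconds
    while time.time() < deadline:
        rtt = ping_once(host)
        if rtt is not None:
            rtts.append(rtt)
        time.sleep(0.2)
    return rtts

def bulk_download(url, seconds):
    # Keep the link busy for roughly `seconds` by re-fetching the URL.
    deadline = time.time() + seconds
    while time.time() < deadline:
        try:
            urllib.request.urlopen(url, timeout=5).read()
        except OSError:
            pass

idle = sample_rtts(PING_TARGET, 10)
loader = threading.Thread(target=bulk_download, args=(BULK_URL, 10))
loader.start()
loaded = sample_rtts(PING_TARGET, 10)
loader.join()

if idle and loaded:
    print("idle RTT median:   %6.1f ms" % statistics.median(idle))
    print("loaded RTT median: %6.1f ms" % statistics.median(loaded))
else:
    print("no RTT samples; adjust PING_TARGET / BULK_URL")

If the second number is much larger than the first, the bottleneck buffer is doing exactly what we have been complaining about.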
Again thanks for your insights

Sebastian

>
> And many egos seem to think that working on the hot rod cases is going to make their career or sell product. (e.g. the sad case of Arista).
>
>
> On Wednesday, June 26, 2019 8:48am, "Sebastian Moeller" said:
>
> >
> >
> > > On Jun 23, 2019, at 00:09, David P. Reed wrote:
> > >
> > > [...]
> > >
> > > per-flow scheduling is appropriate on a shared link. However, the end-to-end argument would suggest that the network not try to divine which flows get preferred.
> > > And beyond the end-to-end argument, there's a practical problem - since the ideal state of a shared link means that it ought to have no local backlog in the queue, the information needed to schedule "fairly" isn't in the queue backlog itself. If there is only one packet, what's to schedule?
> > >
> > > [...]
> >
> > Excuse my stupidity, but the "only one single packet" case is the theoretical limiting case, no?
> > Because even on a link not running at capacity this effectively requires a mechanism to "synchronize" all senders (whose packets traverse the hop we are looking at), as no other packet is allowed to reach the hop unless the "current" one has been passed to the PHY; otherwise we transiently queue 2 packets (I note that this rationale should hold for any small N). The more packets per second a hop handles, the less likely it will be that a newcomer avoids running into already existing packet(s), that is, transiently growing the queue.
> > Not having a CS background, I fail to see how this required synchronized state can exist outside of a few steady state configurations where things change slowly enough that the seemingly required synchronization can actually happen (given that the feedback loop, e.g. through ACKs, seems somewhat jittery). Since packets never know which path they take and which hop is going to be critical, there seems to be no a priori way to synchronize all senders; heck, I fail to see whether it would be possible at all to guarantee synchronized behavior on more than one hop (unless all hops are extremely uniform).
> > I happen to believe that L4S suffers from the same conceptual issue (plus overly generic promises, from the RITE website: "We are so used to the unpredictability of queuing delay, we don't know how good the Internet would feel without it. The RITE project has developed simple technology to make queuing delay a thing of the past—not just for a select few apps, but for all." This seems to be missing a "conditions apply" statement.)
> >
> > Best Regards
> > Sebastian