From: Sebastian Moeller
Date: Thu, 27 Jun 2019 09:54:21 +0200
To: "David P. Reed"
Cc: ecn-sane@lists.bufferbloat.net, Brian E Carpenter, tsvwg IETF list
Reed" X-Mailer: Apple Mail (2.3445.104.11) X-Provags-ID: V03:K1:LZg08R1ceFJ46DANecr6E+3rXQ4sfKQNr4h0eJWwOrxWWGB2GsM uHCP48KqmGJn3y1yF4Guh9ZLOGwiH/AmWEqp39LywG+Qw/GVCYGVUxtcabWZ6rZiGLn8p0L j0F1J3Dg6v8UyV4n9Ac/H9GvcNKc2M6wu3+pLZ3zyXst1gTJ5mQKLH8deLoL/IWQdqZZfYP ugTe48QO5WhGDG0945stQ== X-Spam-Flag: NO X-UI-Out-Filterresults: notjunk:1;V03:K0:9MwcDir6vzs=:mtA6WxXYFVOR6H+YDV4YLV CE+U/H2Y6+kge+CzOYd3ZKrwU/gyo9LLihwzZ6VwEOzhQ6xpuicE1gF0z9CTU/DpW4WnzlJjW OO3oAiVAbBNk0TFhhVExFM9oRbxP8/StbzItqooVaknQ0SYD53R181pfKKcnUC5oYRgX6iIC6 dhx7xokryXgGfTS9QxBeduCcmulfMmlVVaFppmH6GTUpCCloNjFRZ920Czc4qm5/5uK1/AyYM X2KLgUeW045nmgOBu4Otrgpdyu0wKWDvs/zk63xrWWi3SXiI35BIyPi6rPi5rFrTvu6ykkaAX xx2AoKSfsNDGdLeMszXNSs15sF6+9Ubt3NJFJ0vqkDyiHV638Am4YqEn2qi3O1LnOj+ji+dK3 XflQwu1uEooYCdD2FzSjju+HBIMUt7F06M9h16OfK04NS/cMX8C/VIGvNwMlvNZgS/9kDm53j R1K9N5Ea0bqNgG0fi2jkYwKhW2AjiXCp40V+F1t0+aigf6/uAnIHVvX9tXqceZzQkBQLgwrJv 4fpE7iOR3+NFVeeDHe6Gp3ynKBG1RK2ETwOpoMiSmbPOp0+tN5OcW5tRvGD2vS3a4qnJ1tat2 e2Anb4pf9QtRi6m0gM7msvvVX6ryxkIbTDM6dAJDigUZlDSvH5tV2q5H94rfD8hThqAKvZkjT HO72QnRc9ippy/urFmeuAecuACkYyMnO0f8/+hMNFjMynuM6z78ELkoqwvKj+kvT2PRo9tgKM 9SPIqOBO2fpqgU2XYT3s3gZQOzySfB8BLtyR99bkYXtLdCJA/aY/lUiBX/sDTeVRNPuMcJPsU o0o0t47Nmq+nKCITvVTHJvHG0Qy/5oLdL2LBi+fEgI+gvCNe6M++h2btu9MENtiHH7t5VIO6M kRDk4lTI1sIb4UhwNe0qNVRx/Hegwdt4BpqDx3QYno2IWcHgWNt67zIXEhA+cewng7RDUjNq+ T/YkvexDmL7eD1h6223DC++deLosSLHpICOY0j/wDw/py8bAVSjK9 Subject: Re: [Ecn-sane] [tsvwg] per-flow scheduling X-BeenThere: ecn-sane@lists.bufferbloat.net X-Mailman-Version: 2.1.20 Precedence: list List-Id: Discussion of explicit congestion notification's impact on the Internet List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 27 Jun 2019 07:54:28 -0000 Hi David, > On Jun 26, 2019, at 18:53, David P. Reed wrote: >=20 > A further minor thought, maybe one that needs not be said: > =20 > Flows aren't "connections". Routers are not involved in connection = state management, which is purely part of the end to end protocol. = Anything about "connections" that a router might need to know to handle = a packet should be packaged into the IP header of each packet in a = standard form. I read this, that your are not opposed to using IP packet data = to convey information to intermediate routers then? In a way (and please = correct me if this is wrong /too simplistic), L4S intends to use the = ECT(1) codepoint for enspoints to signal to router's their behavior = towards CE congestion signals (reduce window/rate by 50% versus a = smaller step down). > Routers can "store" this information associated with the source, = destination pair if they want, for a short time, subject to well = understood semantics when they run out of storage. This fits into an = end-to-end argument as an optiimization of a kind, as long as the = function of such information is very narrowly and generally defined to = benefit all users of IP-based protocols. Okay, that I read as fq-syatems are not in violation of e2e = then. Best Regards Sebastian > =20 > For example, remembering the last time a packet of a particular flow = was received after forwarding it, for a short time, to calculate = fairness, that seems like a very useful idea, as long as forgetting the = last time of receipt is not unfair. > =20 > This use of the flow's IP headers to carry info into router queueing = and routing decisions is analogous to the "Fate Sharing" principle of = protocol design that DDC describes. 
> On Wednesday, June 26, 2019 12:31pm, "David P. Reed" said:
> 
> It's the limiting case, but also the optimal state given "perfect knowledge".
> 
> Yes, it requires that the source-destination pairs sharing the link in question coordinate their packet admission times so they don't "collide" at the link. Ideally the next packet would arrive during the previous packet's transmission, so it is ready to go when that packet's transmission ends.
> 
> Such exquisite coordination is feasible only when the future behavior of source and destination at the interface is known, which requires an Oracle. That's the same kind of condition most information-theoretic and queueing-theoretic optimality results require.
> 
> But this is worth keeping in mind as the overall joint goal of all users.
> 
> In particular, "link utilization" isn't a user goal at all. The link is there and is being paid for whether it is used or not (looking from the network structure as a whole). Its capacity exists to move packets out of the way. An ideal link satisfies the requirement that it never creates a queue because of anything other than imperfect coordination of the end-to-end flows mapped onto it. That's why a router should not be measured by "link utilization" any more than a tunnel in a city during commuting hours should be measured by cars moved per hour (a queueing sketch below makes this quantitative). Clearly a tunnel can be VERY congested and moving many cars if they are attached to each other bumper to bumper - the latency through the tunnel would then be huge. If the cars were tipped on their ends and stacked, even more throughput would be achieved, and the rotating and packing would add even more delay.
> 
> The idea that "link utilization" of 100% must be achieved is why we got bufferbloat designed into routers. It's a worm's-eye perspective. To this day, Arista Networks brags about how its bufferbloated feature design optimizes switch utilization (https://packetpushers.net/aristas-big-buffer-b-s/), and it selects benchmarks to "prove" it. Andy Bechtolsheim apparently is such a big name that he can sell defective gear at a premium price, letting the datacenters who buy it discover that those switches get "clogged up" by TCP traffic when they are the "bottleneck link". Fortunately, they are fast, so they are less frequently the bottleneck in daily datacenter use.
> 
> In trying to understand what is going on with congestion signalling, any buffering at the entry to the link should be due only to imperfect information being fed back to the endpoints generating traffic, because a misbehaving endpoint creates denial of service for all other users.
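As a back-of-the-envelope illustration of the tunnel analogy (the queueing sketch referenced above): in a textbook M/M/1 queue - Poisson arrivals and exponential service are my simplifying assumptions - the mean time in the system is (1/mu)/(1 - rho), so delay explodes as utilization rho approaches 100%:

    # Mean sojourn time in an M/M/1 queue: T = 1/(mu - lambda),
    # equivalently (1/mu)/(1 - rho) with rho = lambda/mu.
    # Illustrative link: 1 Gbit/s, 1500-byte packets.

    MU = 1e9 / (1500 * 8)  # ~83,333 packets/s service rate

    def mean_delay_ms(rho: float) -> float:
        """Average time a packet spends queued plus in service."""
        assert 0.0 <= rho < 1.0, "delay is unbounded at 100% utilization"
        return 1000.0 / (MU * (1.0 - rho))

    for rho in (0.5, 0.9, 0.99, 0.999):
        print(f"utilization {rho:6.1%}: mean delay {mean_delay_ms(rho):8.3f} ms")

    # 50% -> ~0.024 ms, 99.9% -> ~12 ms: a roughly 500x latency price
    # for the last few percent of "utilization".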
> Priority mechanisms focused on protecting high-paying users from low-paying ones don't help much - they only help in overloaded states of the network. Which isn't to say that priority does nothing - it's just that stable assignment of a sharing level to priority levels isn't easy. (See Paris Metro Pricing, where there are only two classes, and the problem of deciding how to manage access to the "first class" section - the idea that 15 classes with different metrics can be handled simply and interoperably between differently managed autonomous systems seems an incredibly impractical goal.)
> 
> Even in the priority case, buffering is NOT a desirable end-user thing.
> 
> My personal view is that the manager of a network needs to configure the network so that no link ever gets overloaded, if possible. The response to overload should be to tell the relevant flows to all slow down (not just one, because if there are 100 flows that start up at roughly the same time, causing MD on one does very little). This is an example of something where per-flow machinery in the router actually makes the router helpful in the large scheme of things. Maybe all flows should be equally informed, as flows. Which means the router needs to know how to signal multiple flows, while not just hammering all the packets of a single flow. This case is very real, but not as frequent on the client side as on the "server side", in "load balancers" and the like.
> 
> My point here is simple:
> 
> 1) The endpoints already tell the routers what flows are going through a link. That's just the address information. So that information can be used for fairness pretty well, especially if a short-term memory (a bloom filter, perhaps; see the sketch below) can track a sufficiently large number of flows.
> 
> 2) The per-flow decisions related to congestion control within a flow are necessarily end-to-end in nature - the router can only tell the ends what is going on, but the ends (together - their admission rates and consumption rates are coupled to the use being made) must be informed and decide. The congestion management must combine information about the source's and the destination's future behavior (even if that is just taking recent history and projecting it as an estimate of future behavior at source and destination). Which is why it is quite natural to have routers signal the destination, which then signals the source, which changes its behavior.
> 
> 3) There are definitely other ways to improve latency for IP and the protocols built on top of it - routing some flows over different paths under congestion is one; call it per-flow routing. Another is scattering a flow over several paths (but that seems problematic for today's TCP, which assumes all packets take the same path).
> 
> 4) A different, but closely coupled, view of IP is that any application-relevant buffering should be driven into the endpoints: at the source, buffering is useful to deal with variability in the rate of production of the data to be sent; at the destination, buffering is useful to minimize jitter, matching the consumption behavior of the application. But these buffers should not be pushed into the network, where they cause congestion for other flows sharing resources. So buffering in the network should ONLY deal with the uncertainty in resource competition.
> 
> This tripartite breakdown of buffering is protocol independent. It applies to TCP, NTP, RTP, QUIC/UDP, ... It's what we (that is, me) had in mind when we split UDP out of TCP, allowing UDP-based protocols to manage source and destination buffering in the application, for all the things we thought UDP would be used for: packet speech, computer-computer remote procedure calls (what would be QUIC today), SATNET/interplanetary Internet connections, ...
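Regarding point 1 above, a hedged sketch of what such a short-term flow memory might look like. The two-hash filter, the sizes, and the rotate-to-forget policy are my own illustrative choices, not any deployed router's data structure:

    import hashlib
    import time

    class FlowMemory:
        """Approximate, fixed-size memory of recently seen flows."""

        ROTATE = 1.0  # seconds between wholesale forgetting; assumed value

        def __init__(self, bits: int = 1 << 16):
            self.bits = bits
            self.array = bytearray(bits // 8)
            self.last_rotate = time.monotonic()

        def _indexes(self, flow_key: bytes):
            # Two hash positions derived from one 8-byte digest.
            digest = hashlib.blake2b(flow_key, digest_size=8).digest()
            yield int.from_bytes(digest[:4], "big") % self.bits
            yield int.from_bytes(digest[4:], "big") % self.bits

        def saw_recently(self, flow_key: bytes) -> bool:
            """Record this flow; report whether it was (probably) seen
            within the current rotation window."""
            now = time.monotonic()
            if now - self.last_rotate > self.ROTATE:
                self.array = bytearray(self.bits // 8)  # forget everything
                self.last_rotate = now
            seen = True
            for idx in self._indexes(flow_key):
                byte, bit = divmod(idx, 8)
                if not self.array[byte] & (1 << bit):
                    seen = False
                    self.array[byte] |= 1 << bit
            return seen

    mem = FlowMemory()
    key = b"10.0.0.1|10.0.0.2|6|443|53124"  # made-up 5-tuple serialization
    print(mem.saw_recently(key))  # False: first packet of this flow
    print(mem.saw_recently(key))  # True: the flow is now remembered

Zeroing the whole filter on a fixed schedule is the bluntest way to "forget the last time of receipt"; since every flow is forgotten at the same instant, no single flow is disadvantaged, which speaks to the fairness caveat in point 1.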
> 
> Sadly, in the many years since the late 1970's, the tendency to think that file transfers between infinite-speed storage devices over TCP are the only relevant use of the Internet has penetrated the router design community. I can't seem to get anyone to recognize how far we are from that. No one runs benchmarks for such behavior; no one even measures anything other than the "hot rod" maximum-throughput cases.
> 
> And many egos seem to think that working on the hot-rod cases is going to make their career or sell product (e.g. the sad case of Arista).
> 
> On Wednesday, June 26, 2019 8:48am, "Sebastian Moeller" said:
> 
> > 
> > > On Jun 23, 2019, at 00:09, David P. Reed wrote:
> > > 
> > > [...]
> > > 
> > > Per-flow scheduling is appropriate on a shared link. However, the end-to-end argument would suggest that the network not try to divine which flows get preferred.
> > > And beyond the end-to-end argument, there's a practical problem - since the ideal state of a shared link means that it ought to have no local backlog in the queue, the information needed to schedule "fairly" isn't in the queue backlog itself. If there is only one packet, what's to schedule?
> > > 
> > > [...]
> > 
> > Excuse my stupidity, but the "only one single packet" case is the theoretical limiting case, no?
> > Because even on a link not running at capacity this effectively requires a mechanism to "synchronize" all senders (whose packets traverse the hop we are looking at): no other packet may reach the hop before the "current" one has been passed to the PHY, otherwise we transiently queue 2 packets (I note that this rationale should hold for any small N; see the toy simulation at the end of this mail). The more packets per second a hop handles, the less likely it becomes that a newcomer avoids running into already existing packets, that is, transiently growing the queue.
> > Not having a CS background, I fail to see how this required synchronized state can exist outside of a few steady-state configurations where things change slowly enough that the seemingly required synchronization can actually happen (given that the feedback loop, e.g. through ACKs, seems somewhat jittery). Since packets never know which path they will take and which hop is going to be critical, there seems to be no a priori way to synchronize all senders; heck, I fail to see whether it would be possible at all to guarantee synchronized behavior on more than one hop (unless all hops are extremely uniform).
> > I happen to believe that L4S suffers from the same conceptual issue (plus overly generic promises; from the RITE website: "We are so used to the unpredictability of queuing delay, we don't know how good the Internet would feel without it. The RITE project has developed simple technology to make queuing delay a thing of the past - not just for a select few apps, but for all." This seems to be missing a "conditions apply" statement).
> > 
> > Best Regards
> > Sebastian
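P.S.: A toy simulation of the "transiently queue 2 packets" argument above. Arrival times are random (Poisson is my modeling assumption; merging two independent Poisson senders gives one Poisson stream at the summed rate) and the bottleneck is a plain FIFO. Even at 60% load the backlog repeatedly exceeds one packet:

    import random

    random.seed(42)

    LINK_RATE = 83_333.0       # packets/s; ~1 Gbit/s with 1500-byte packets
    SERVICE = 1.0 / LINK_RATE  # seconds to transmit one packet
    LOAD = 0.6                 # combined load of two 30% senders

    # One second of unsynchronized arrivals at 60% utilization.
    t, arrivals = 0.0, []
    while t < 1.0:
        t += random.expovariate(LOAD * LINK_RATE)
        arrivals.append(t)

    server_free_at = 0.0  # when the link finishes its current packet
    in_system = []        # departure times of packets queued or in service
    max_backlog = 0

    for a in arrivals:
        in_system = [d for d in in_system if d > a]  # drop departed packets
        start = max(a, server_free_at)               # wait if the link is busy
        server_free_at = start + SERVICE
        in_system.append(server_free_at)
        max_backlog = max(max_backlog, len(in_system))

    print(f"{len(arrivals)} arrivals, max packets in system: {max_backlog}")
    # Far from saturation, yet the queue repeatedly exceeds one packet;
    # holding it at "one packet, ever" would need oracle-level
    # coordination of all senders, as argued above.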