From: Dave Taht
Date: Thu, 18 Jul 2019 09:06:22 -0700
To: "David P. Reed"
Cc: "ecn-sane@lists.bufferbloat.net", Bob Briscoe, tsvwg IETF list
Subject: Re: [Ecn-sane] per-flow scheduling

On Thu, Jul 18, 2019 at 8:02 AM David P. Reed wrote:
>
> Dave -
>
> The context of my remarks was about the end-to-end arguments for placing
> function in the Internet.
>
> To that end, that "you do not mind putting storage for low priority
> packets in the routers" doesn't matter, for two important reasons:
>
> 1) the idea that one should "throw in a feature" because people "don't
> mind" is exactly what leads to feature creep of the worst kind - features
> that serve absolutely no real purpose. That's what we rigorously objected
> to in the late 1970's. No, we would NOT throw in features just because
> they were "requested" and we didn't mind.

I dig it. :) If only the 5G folk had had your approach...

> 2) you have made no argument that the function cannot be done properly at
> the ends, and no argument that putting it in the network is necessary for
> the ends to achieve storage.

You are correct.

> On Wednesday, July 17, 2019 7:23pm, "Dave Taht" said:
>
> > On Wed, Jul 17, 2019 at 3:34 PM David P.
Reed wrote:
> >>
> >> A follow up point that I think needs to be made is one more end-to-end
> >> argument:
> >>
> >> It is NOT the job of the IP transport layer to provide free storage for
> >> low priority packets. The end-to-end argument here says: the ends can
> >> and must hold packets until they are either delivered or not relevant
> >> (in RTP, they become irrelevant when they get older than their desired
> >> delivery time, if you want an example of the latter), SO, the network
> >> should not provide the function of storage beyond the minimum needed to
> >> deal with transients.
> >>
> >> That means, unfortunately, that the dream of some kind of "background"
> >> path that stores "low priority" packets in the network fails the
> >> end-to-end argument test.
> >
> > I do not mind reserving a tiny portion of the network for "background"
> > traffic. This is different (I think?) than storing low priority packets
> > in the network. A background traffic "queue" of 1 packet would be
> > fine....
> >
> >> If you think about this, it even applies to some imaginary
> >> interplanetary IP layer network. Queueing delay is not a feature of any
> >> end-to-end requirement.
> >>
> >> What may be desired at the router/link level in an interplanetary IP
> >> layer is holding packets because a link is actually down, or using
> >> link-level error correction coding or retransmission to bring the error
> >> rate down to an acceptable level before declaring it down. But that's
> >> quite different - it's the link level protocol, which aims to deliver
> >> minimum queueing delay under tough conditions, without buffering more
> >> than needed for that (the number of bits that fit in the light-speed
> >> transmission at the transmission rate).
> >
> > As I outlined in my MIT wifi talk - 1 layer of retry at the wifi mac
> > layer made it work, in 1998, and that seemed a very acceptable
> > compromise at the time.
Present-day
> > retries at the mac layer, not congestion controlled, are totally out of
> > hand.
> >
> > In thinking about starlink's mac, and mobility, I gradually came to the
> > conclusion that 1 retry from satellites 550km up (3.6ms rtt) was needed,
> > as much as I disliked the idea.
> >
> > I still dislike retries at layer 2, even for nearby sats. really
> > complicates things. so for all I know I'll be advocating ripping 'em out
> > in starlink, if they are indeed in there, next week.
> >
> >> So, the main reason I'm saying this is because again, there are those
> >> who want to implement the TCP function of reliable delivery of each
> >> packet in the links. That's a very bad idea.
> >
> > It was tried in the arpanet, and didn't work well there. There's a good
> > story about many of the flaws of the Arpanet's design, including that
> > problem, in the latter half of Kleinrock's second book on queueing
> > theory, at least the first edition...
> >
> > Wifi (and 3g/4g/5g) re-introduced the same problem with retransmits and
> > block acks at layer 2.
> >
> > and after dissecting my ecn battlemesh data and observing what the
> > retries at the mac layer STILL do on wifi with the current default wifi
> > codel target (20ms AFTER two txops are in the hardware) currently
> > achieve (50ms, which is 10x worse than what we could do, and still
> > better performance under load than any other shipping physical layer we
> > have with fifos)... and after thinking hard about nagle's thought that
> > "every application has a right to one packet in the network", and this
> > very long thread reworking the end to end argument in a similar, but not
> > quite identical direction, I'm coming to a couple conclusions I'd
> > possibly not quite expressed well before.
> >
> > 1) transports should treat an RFC3168 CE coupled with loss (drop and
> > mark) as an even stronger signal of congestion than either alone, and
> > that this bit of the codel algorithm, when ecn is in use, is wrong, and
> > has always been wrong:
> >
> > https://github.com/dtaht/fq_codel_fast/blob/master/codel_impl.h#L178
> >
> > (we added this arbitrarily to codel on the 5th day of development in
> > 2012. Using FQ masked its effects on light traffic.)
> >
> > What it should do instead is peek the queue and drop until it hits a
> > markable packet, at the very least.
> >
> > Pie has an arbitrary drop-at-10% figure, which does lighten the load
> > some... cake used to have drop and mark also, until a year or two
> > back...
> >
> > 2) At low rates and high contention, we really need pacing and
> > fractional cwnd.
> >
> > (while I would very much like to see a dynamic reduction of MSS tried,
> > that too has a bottom limit)
> >
> > even then, drop as per bullet 1.
> >
> > 3) In the end, I could see a world with SCE marks, and CE being
> > obsoleted in favor of drop, or CE only being exerted on really light
> > loads, similar to (or less than!) the arbitrary 10% figure that pie
> > uses.
> >
> > 4) in all cases, I vastly prefer somehow ultimately shifting greedy
> > transports to RTT rather than drop or CE as their primary congestion
> > control indicator. FQ makes that feasible today. With enough FQ deployed
> > for enough congestive scenarios and hardware, and RTT becoming the core
> > indicator for more transports, single-queued designs become possible in
> > the distant future.
> >
> >>
> >> On Wednesday, July 17, 2019 6:18pm, "David P. Reed" said:
> >>
> >> > I do want to toss in my personal observations about the "end-to-end
> >> > argument" related to per-flow-scheduling. (Such arguments are, of
> >> > course, a class of arguments to which my name is attached. Not that I
> >> > am a judge/jury of such questions...)
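[For concreteness, the "peek the queue and drop until it hits a markable packet" behaviour from point 1 above could look something like the toy sketch below. This is Python pseudocode, not the actual fq_codel_fast C; the packet representation and function name are invented for illustration only.]

```python
from collections import deque

def signal_congestion(queue):
    """Toy sketch: when the AQM decides to signal congestion, drop
    non-ECN-capable packets from the head of the queue until a markable
    (ECT) packet is found, then CE-mark and deliver that packet --
    rather than unconditionally marking whatever is at the head."""
    dropped = 0
    while queue:
        pkt = queue.popleft()
        if pkt["ect"]:          # ECN-capable transport: mark and deliver
            pkt["ce"] = True
            return pkt, dropped
        dropped += 1            # not markable: drop it and keep looking
    return None, dropped        # queue drained without finding ECT

# Two non-ECT packets ahead of one ECT packet:
q = deque([{"ect": False, "ce": False},
           {"ect": False, "ce": False},
           {"ect": True,  "ce": False}])
pkt, dropped = signal_congestion(q)
```

Here the congestion signal costs the queue two drops before a mark is applied, so light non-ECN traffic still sees loss rather than riding free behind marked flows.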
> >> >
> >> > A core principle of the Internet design is to move function out of
> >> > the network, including routers and middleboxes, if those functions
> >> >
> >> > a) can be properly accomplished by the endpoints, and
> >> > b) are not relevant to all uses of the Internet transport fabric
> >> > being used by the ends.
> >> >
> >> > The rationale here has always seemed obvious to me. Like Bob Briscoe
> >> > suggests, we were very wary of throwing features into the network
> >> > that would preclude unanticipated future interoperability needs, new
> >> > applications, and new technology in the infrastructure of the
> >> > Internet as a whole.
> >> >
> >> > So what are we talking about here? (Ignoring the fine points of SCE,
> >> > some of which I think are debatable - especially the focus on TCP
> >> > alone, since much traffic will likely move away from TCP in the near
> >> > future.)
> >> >
> >> > A second technical requirement (necessary invariant) of the
> >> > Internet's transport is that the entire Internet depends on
> >> > rigorously stopping queueing delay from building up anywhere except
> >> > at the endpoints, where the ends can manage it. This is absolutely
> >> > critical, though it is peculiar in that many engineers, especially
> >> > those who work at the IP layer and below, have a mental model of
> >> > routing as essentially being about building up queueing delay (in
> >> > order to manage priority in some trivial way by building up the
> >> > queue on purpose, apparently).
> >> >
> >> > This second technical requirement cannot be resolved merely by the
> >> > endpoints. The reason is that the endpoints cannot know accurately
> >> > what host-host paths share common queues.
> >> >
> >> > This lack of a way to "cooperate" among independent users of a queue
> >> > cannot be solved by a purely end-to-end solution.
(Well, I suppose some genius might
> >> > invent a way, but I have not seen one in my 36 years of closely
> >> > watching the Internet in operation since it went live in 1983.)
> >> >
> >> > So, what the end-to-end argument would tend to do here, in my
> >> > opinion, is to provide the most minimal mechanism in the devices
> >> > that are capable of building up a queue, in order to allow all the
> >> > ends sharing that queue to do their job - which is to stop filling
> >> > up the queue!
> >> >
> >> > Only the endpoints can prevent filling up queues. And depending on
> >> > the protocol, they may need to make very different, yet compatible
> >> > choices.
> >> >
> >> > This is a question of design at the architectural level. And the
> >> > future matters.
> >> >
> >> > So there is an end-to-end argument to be made here, but it is a
> >> > subtle one.
> >> >
> >> > The basic mechanism for controlling queue depth has been, and
> >> > remains, quite simple: dropping packets. This has two impacts:
> >> > 1) immediately reducing queueing delay, and 2) signalling to
> >> > endpoints that are paying attention that they have contributed to
> >> > an overfull queue.
> >> >
> >> > The optimum queueing delay in a steady state would always be one
> >> > packet or less. Kleinrock has shown this in the last few years. Of
> >> > course there aren't steady states. But we don't want a mechanism
> >> > that can't converge to that steady state *quickly*, for all queues
> >> > in the network.
> >> >
> >> > Another issue is that endpoints are not aware of the fact that
> >> > packets can take multiple paths to any destination. In the future,
> >> > alternate path choices can be made by routers (when we get smarter
> >> > routing algorithms based on traffic engineering).
> >> >
> >> > So again, some minimal kind of information must be exposed to
> >> > endpoints that will continue to communicate.
Again, the routers must be able
> >> > to help a wide variety of endpoints with different use cases to
> >> > decide how to move queue buildup out of the network itself.
> >> >
> >> > Now the decision made by the endpoints must be made in the context
> >> > of information about fairness. Maybe this is what is not obvious.
> >> >
> >> > The most obvious notion of fairness is equal shares among (source
> >> > host, dest host) pairs. There are drawbacks to that, but the benefit
> >> > is that it affects the IP layer alone, and deals with lots of
> >> > boundary cases, like the case where a single host opens a zillion
> >> > TCP connections or uses lots of UDP source ports or destinations to
> >> > somehow "cheat" by appearing to have "lots of flows".
> >> >
> >> > Another way to deal with dividing up flows is to ignore higher-level
> >> > protocol information entirely, and put the flow identification in
> >> > the IP layer. A 32-bit or 64-bit random number could be added as an
> >> > "option" to IP to somehow extend the flow space.
> >> >
> >> > But that is not the most important thing today.
> >> >
> >> > I write this to say:
> >> > 1) some kind of per-flow queueing, during the transient state where
> >> > a queue is overloaded before packets are dropped, would provide much
> >> > needed information to the ends of every flow sharing a common queue.
> >> > 2) per-flow queueing, minimized to a very low level, using IP
> >> > envelope address information (plus maybe UDP and TCP addresses for
> >> > those protocols in an extended address-based flow definition) is
> >> > totally compatible with end-to-end arguments, but ONLY if the
> >> > decisions made are certain to drive queueing delay out of the
> >> > router to the endpoints.
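[The host-pair notion of fairness above can be sketched in a few lines: classify on the (source IP, destination IP) envelope alone, so that a host opening a zillion TCP connections or spraying UDP ports still lands in one queue. A toy sketch, assuming nothing about any particular qdisc; the hash function and queue count are illustrative choices, not anything a real implementation mandates.]

```python
import zlib

NQUEUES = 1024  # illustrative queue count, not any real qdisc's default

def host_pair_queue(src_ip: str, dst_ip: str) -> int:
    """Map a packet to a queue using only the IP envelope: every flow
    between the same two hosts shares one queue, regardless of transport
    protocol or port numbers."""
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % NQUEUES

# A host opening many connections to one server cannot "cheat" its way
# into extra queues - every 5-tuple collapses to the same host pair:
q1 = host_pair_queue("10.0.0.1", "10.0.0.2")
q2 = host_pair_queue("10.0.0.1", "10.0.0.2")
```

This is what makes the scheme purely an IP-layer mechanism: the boundary cases Reed lists disappear because ports never enter the classification.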
> >> >
> >> > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" said:
> >> >
> >> >> Dear Bob, dear IETF team,
> >> >>
> >> >>> On Jun 19, 2019, at 16:12, Bob Briscoe wrote:
> >> >>>
> >> >>> Jake, all,
> >> >>>
> >> >>> You may not be aware of my long history of concern about how
> >> >>> per-flow scheduling within endpoints and networks will limit the
> >> >>> Internet in future. I find per-flow scheduling a violation of the
> >> >>> e2e principle in such a profound way - the dynamic choice of the
> >> >>> spacing between packets - that most people don't even associate
> >> >>> it with the e2e principle.
> >> >>
> >> >> This does not rhyme well with the stated L4S advantage of allowing
> >> >> packet reordering (due to mandating RACK for all L4S tcp
> >> >> endpoints). Because surely changing the order of packets messes up
> >> >> "the dynamic choice of the spacing between packets" in a
> >> >> significant way. IMHO either L4S is great because it will give
> >> >> intermediate hops more leeway to re-order packets, or "a sender's
> >> >> packet spacing" is sacred; please make up your mind which it is.
> >> >>
> >> >>> I detected that you were talking about FQ in a way that might
> >> >>> have assumed my concern with it was just about implementation
> >> >>> complexity. If you (or anyone watching) is not aware of the
> >> >>> architectural concerns with per-flow scheduling, I can enumerate
> >> >>> them.
> >> >>
> >> >> Please do not hesitate to do so after your deserved holiday, and
> >> >> please state a superior alternative.
> >> >>
> >> >> Best Regards
> >> >> Sebastian
> >> >>
> >> >>> I originally started working on what became L4S to prove that it
> >> >>> was possible to separate out reducing queuing delay from
> >> >>> throughput scheduling.
When Koen and I
> >> >>> started working together on this, we discovered we had identical
> >> >>> concerns on this.
> >> >>>
> >> >>> Bob
> >> >>>
> >> >>> --
> >> >>> ________________________________________________________________
> >> >>> Bob Briscoe                               http://bobbriscoe.net/
> >> >>>
> >> >>> _______________________________________________
> >> >>> Ecn-sane mailing list
> >> >>> Ecn-sane@lists.bufferbloat.net
> >> >>> https://lists.bufferbloat.net/listinfo/ecn-sane

--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740