From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Taht
Date: Wed, 17 Jul 2019 17:20:59 -0700
To: "David P. Reed"
Cc: "ecn-sane@lists.bufferbloat.net", Bob Briscoe, tsvwg IETF list
Subject: Re: [Ecn-sane] per-flow scheduling
References: <350f8dd5-65d4-d2f3-4d65-784c0379f58c@bobbriscoe.net> <40605F1F-A6F5-4402-9944-238F92926EA6@gmx.de> <1563401917.00951412@apps.rackspace.com> <1563402855.88484511@apps.rackspace.com>
List-Id: Discussion of explicit congestion notification's impact on the Internet

On Wed, Jul 17, 2019 at 4:23 PM Dave Taht wrote:
>
> On Wed, Jul 17, 2019 at 3:34 PM David P. Reed wrote:
> >
> > A follow up point that I think needs to be made is one more end-to-end argument:
> >
> > It is NOT the job of the IP transport layer to provide free storage for low priority packets. The end-to-end argument here says: the ends can and must hold packets until they are either delivered or not relevant (in RTP, they become irrelevant when they get older than their desired delivery time, if you want an example of the latter), SO, the network should not provide the function of storage beyond the minimum needed to deal with transients.
> >
> > That means, unfortunately, that the dream of some kind of "background" path that stores "low priority" packets in the network fails the end-to-end argument test.
>
> I do not mind reserving a tiny portion of the network for "background"
> traffic. This is different (I think?) than storing low priority packets
> in the network. A background traffic "queue" of 1 packet would be fine....
>
> > If you think about this, it even applies to some imaginary interplanetary IP layer network.
> > Queueing delay is not a feature of any end-to-end requirement.
> >
> > What may be desired at the router/link level in an interplanetary IP layer is holding packets because a link is actually down, or using link-level error correction coding or retransmission to bring the error rate down to an acceptable level before declaring it down. But that's quite different - it's the link level protocol, which aims to deliver minimum queueing delay under tough conditions, without buffering more than needed for that (the number of bits that fit in the light-speed transmission at the transmission rate).
>
> As I outlined in my MIT wifi talk, one layer of retry at the wifi mac
> layer made it work, in 1998, and that seemed a very acceptable
> compromise at the time. Present-day retries at that layer, not
> congestion controlled, are totally out of hand.
>
> In thinking about starlink's mac, and mobility, I gradually came to the
> conclusion that one retry from satellites 550km up (3.6ms rtt) was
> needed, as much as I disliked the idea.
>
> I still dislike retries at layer 2, even for nearby sats. It really
> complicates things. So for all I know I'll be advocating ripping 'em
> out in starlink, if they are indeed in there, next week.
>
> > So, the main reason I'm saying this is because again, there are those who want to implement the TCP function of reliable delivery of each packet in the links. That's a very bad idea.
>
> It was tried in the arpanet, and didn't work well there. There's a good
> story about many of the flaws of the Arpanet's design, including that
> problem, in the latter half of Kleinrock's second book on queueing
> theory, at least the first edition...
>
> Wifi (and 3g/4g/5g) re-introduced the same problem with retransmits and
> block acks at layer 2.
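The "one retry, then give up" compromise above can be sketched as a toy link-layer send loop. This is purely illustrative: `link_send`, `tx`, and `max_retries` are hypothetical names, not any real wifi or starlink MAC API, and real MACs batch retries per txop rather than per packet.

```python
def link_send(tx, max_retries=1):
    """Toy link-layer sender: one initial attempt plus at most
    `max_retries` retries, then give up and leave recovery to the
    end-to-end transport. (Illustrative sketch only.)"""
    for _ in range(1 + max_retries):
        if tx():  # tx() attempts one transmission; True means acked
            return True
    return False  # give up; the transport's retransmit logic takes over

# A link that loses the first attempt but succeeds on the single retry:
attempts = iter([False, True])
print(link_send(lambda: next(attempts)))  # True
```

The point of capping retries this low is that anything the link cannot deliver in one extra attempt is better handled end to end, instead of adding uncontrolled queueing delay at layer 2.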
>
> And after dissecting my ecn battlemesh data and observing what the
> retries at the mac layer STILL do on wifi with the current default wifi
> codel target (20ms AFTER two txops are in the hardware) currently
> achieve (50ms, which is 10x worse than what we could do, and still
> better performance under load than any other shipping physical layer we
> have with fifos)... and after thinking hard about nagle's thought that
> "every application has a right to one packet in the network", and this
> very long thread reworking the end-to-end argument in a similar, but
> not quite identical direction, I'm coming to a couple conclusions I'd
> possibly not quite expressed well before.
>
> 1) transports should treat an RFC3168 CE coupled with loss (drop and
> mark) as an even stronger signal of congestion than either alone, and
> this bit of the codel algorithm, when ecn is in use, is wrong, and has
> always been wrong:
>
> https://github.com/dtaht/fq_codel_fast/blob/master/codel_impl.h#L178
>
> (we added this arbitrarily to codel on the 5th day of development in
> 2012. Using FQ masked its effects on light traffic)
>
> What it should do instead is peek the queue and drop until it hits a
> markable packet, at the very least.

I didn't say this well. It should drop otherwise-markable packets until
it exits the loop, and then mark the one it delivers from that flow, if
it delivers one from that flow. That gets rid of all the extra mass ecn
creates... but I should go code it up again and see what happens on
wifi. Worst case I prove yet again that reasoning about the behavior of
queues is futile.

> Pie has an arbitrary drop-at-10% figure, which does lighten the load
> some... cake used to have drop-and-mark also until a year or two
> back...
>
> 2) At low rates and high contention, we really need pacing and fractional cwnd.
>
> (while I would very much like to see a dynamic reduction of MSS tried,
> that too has a bottom limit)
>
> even then, drop as per bullet 1.
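A minimal sketch of that proposed dequeue change, as I read it. This is hypothetical Python, not the actual fq_codel_fast C code: packets are dicts with `ect`/`ce` flags, and a fixed `to_drop` count stands in for codel's control-law state.

```python
def dequeue_drop_then_mark(queue, to_drop):
    """Sketch of 'drop otherwise-markable packets until the loop exits,
    then mark the one delivered': while the (simplified) control law
    still calls for drops, drop packets outright even when they are
    ECN-capable, and CE-mark only the packet actually delivered from
    the flow. Packets are dicts like {'ect': True, 'ce': False}."""
    while queue and to_drop > 0:
        queue.pop(0)      # drop, even if this packet was markable
        to_drop -= 1
    if not queue:
        return None       # flow emptied; nothing to deliver or mark
    pkt = queue.pop(0)
    if pkt['ect']:
        pkt['ce'] = True  # one mark on the delivered packet; no extra "mass"
    return pkt
```

Contrast with the current behavior at the linked line, where an ECN-capable packet is marked and kept instead of dropped, so an ecn flow under pressure keeps its queue occupancy where a non-ecn flow would have shed it.
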
>
> 3) In the end, I could see a world with SCE marks, and CE being
> obsoleted in favor of drop, or CE only being exerted on really light
> loads similar to (or less than!) what the arbitrary 10% figure for pie
> uses
>
> 4) in all cases, I vastly prefer somehow ultimately shifting greedy
> transports to RTT rather than drop or CE as their primary congestion
> control indicator. FQ makes that feasible today. With enough FQ
> deployed for enough congestive scenarios and hardware, and RTT
> becoming the core indicator for more transports, single-queued designs
> become possible in the distant future.
>
> >
> > On Wednesday, July 17, 2019 6:18pm, "David P. Reed" said:
> >
> > > I do want to toss in my personal observations about the "end-to-end argument" related to per-flow scheduling. (Such arguments are, of course, a class of arguments to which my name is attached. Not that I am a judge/jury of such questions...)
> > >
> > > A core principle of the Internet design is to move function out of the network, including routers and middleboxes, if those functions
> > >
> > > a) can be properly accomplished by the endpoints, and
> > > b) are not relevant to all uses of the Internet transport fabric being used by the ends.
> > >
> > > The rationale here has always seemed obvious to me. Like Bob Briscoe suggests, we were very wary of throwing features into the network that would preclude unanticipated future interoperability needs, new applications, and new technology in the infrastructure of the Internet as a whole.
> > >
> > > So what are we talking about here? (Ignoring the fine points of SCE, some of which I think are debatable - especially the focus on TCP alone, since much traffic will likely move away from TCP in the near future.)
> > >
> > > A second technical requirement (necessary invariant) of the Internet's transport is that the entire Internet depends on rigorously stopping queueing delay from building up anywhere except at the endpoints, where the ends can manage it. This is absolutely critical, though it is peculiar in that many engineers, especially those who work at the IP layer and below, have a mental model of routing as essentially being about building up queueing delay (in order to manage priority in some trivial way by building up the queue on purpose, apparently).
> > >
> > > This second technical requirement cannot be resolved merely by the endpoints. The reason is that the endpoints cannot know accurately what host-host paths share common queues.
> > >
> > > This lack of a way to "cooperate" among independent users of a queue cannot be solved by a purely end-to-end solution. (Well, I suppose some genius might invent a way, but I have not seen one in my 36 years closely watching the Internet in operation since it went live in 1983.)
> > >
> > > So, what the end-to-end argument would tend to do here, in my opinion, is to provide the most minimal mechanism in the devices that are capable of building up a queue, in order to allow all the ends sharing that queue to do their job - which is to stop filling up the queue!
> > >
> > > Only the endpoints can prevent filling up queues. And depending on the protocol, they may need to make very different, yet compatible choices.
> > >
> > > This is a question of design at the architectural level. And the future matters.
> > >
> > > So there is an end-to-end argument to be made here, but it is a subtle one.
> > >
> > > The basic mechanism for controlling queue depth has been, and remains, quite simple: dropping packets.
> > > This has two impacts: 1) immediately reducing queueing delay, and 2) signalling to endpoints that are paying attention that they have contributed to an overfull queue.
> > >
> > > The optimum queueing delay in a steady state would always be one packet or less. Kleinrock has shown this in the last few years. Of course there aren't steady states. But we don't want a mechanism that can't converge to that steady state *quickly*, for all queues in the network.
> > >
> > > Another issue is that endpoints are not aware of the fact that packets can take multiple paths to any destination. In the future, alternate path choices can be made by routers (when we get smarter routing algorithms based on traffic engineering).
> > >
> > > So again, some minimal kind of information must be exposed to endpoints that will continue to communicate. Again, the routers must be able to help a wide variety of endpoints with different use cases to decide how to move queue buildup out of the network itself.
> > >
> > > Now the decision made by the endpoints must be made in the context of information about fairness. Maybe this is what is not obvious.
> > >
> > > The most obvious notion of fairness is equal shares among (source host, dest host) pairs. There are drawbacks to that, but the benefit of it is that it affects the IP layer alone, and deals with lots of boundary cases, like the case where a single host opens a zillion TCP connections or uses lots of UDP source ports or destinations to somehow "cheat" by appearing to have "lots of flows".
> > >
> > > Another way to deal with dividing up flows is to ignore higher-level protocol information entirely, and put the flow identification in the IP layer. A 32-bit or 64-bit random number could be added as an "option" to IP to somehow extend the flow space.
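The host-pair notion of fairness above can be illustrated with a toy classifier. This is an assumption-laden sketch, not any deployed qdisc's hash: by keying only on the source and destination addresses, a host opening a zillion connections still lands in one bucket and gets one share.

```python
import hashlib

def host_pair_bucket(src_ip, dst_ip, n_buckets=1024):
    """Toy flow classifier keyed on the (source host, dest host) pair
    alone: ports and protocol are deliberately ignored, so opening
    many TCP connections or spraying UDP ports cannot manufacture
    extra 'flows'. (Illustrative sketch only.)"""
    key = f"{src_ip}|{dst_ip}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

# Every connection between the same two hosts shares one bucket,
# no matter how many ports it uses:
b1 = host_pair_bucket("10.0.0.1", "192.0.2.7")
b2 = host_pair_bucket("10.0.0.1", "192.0.2.7")
print(b1 == b2)  # True
```

A 5-tuple hash (the more common FQ choice) would instead feed ports and protocol into `key`, trading this cheat-resistance for per-connection isolation.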
> > >
> > > But that is not the most important thing today.
> > >
> > > I write this to say:
> > > 1) some kind of per-flow queueing, during the transient state where a queue is overloaded before packets are dropped, would provide much-needed information to the ends of every flow sharing a common queue.
> > > 2) per-flow queueing, minimized to a very low level, using IP envelope address information (plus maybe UDP and TCP addresses for those protocols in an extended address-based flow definition) is totally compatible with end-to-end arguments, but ONLY if the decisions made are certain to drive queueing delay out of the router to the endpoints.
> > >
> > >
> > > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" said:
> > >
> > >> Dear Bob, dear IETF team,
> > >>
> > >>
> > >>> On Jun 19, 2019, at 16:12, Bob Briscoe wrote:
> > >>>
> > >>> Jake, all,
> > >>>
> > >>> You may not be aware of my long history of concern about how per-flow scheduling within endpoints and networks will limit the Internet in future. I find per-flow scheduling a violation of the e2e principle in such a profound way - the dynamic choice of the spacing between packets - that most people don't even associate it with the e2e principle.
> > >>
> > >> This does not rhyme well with the L4S stated advantage of allowing packet reordering (due to mandating RACK for all L4S tcp endpoints). Because surely changing the order of packets messes up "the dynamic choice of the spacing between packets" in a significant way. IMHO either L4S is great because it will give intermediate hops more leeway to re-order packets, or "a sender's packet spacing" is sacred; please make up your mind which it is.
> > >>
> > >>> I detected that you were talking about FQ in a way that might have assumed my concern with it was just about implementation complexity. If you (or anyone watching) is not aware of the architectural concerns with per-flow scheduling, I can enumerate them.
> > >>
> > >> Please do not hesitate to do so after your deserved holiday, and please state a superior alternative.
> > >>
> > >> Best Regards
> > >> Sebastian
> > >>
> > >>
> > >>>
> > >>> I originally started working on what became L4S to prove that it was possible to separate out reducing queuing delay from throughput scheduling. When Koen and I started working together on this, we discovered we had identical concerns on this.
> > >>>
> > >>>
> > >>> Bob
> > >>>
> > >>> --
> > >>> ________________________________________________________________
> > >>> Bob Briscoe http://bobbriscoe.net/
> > >>>
> > >>> _______________________________________________
> > >>> Ecn-sane mailing list
> > >>> Ecn-sane@lists.bufferbloat.net
> > >>> https://lists.bufferbloat.net/listinfo/ecn-sane
>
> --
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740

--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740