From mboxrd@z Thu Jan 1 00:00:00 1970
From: Dave Taht
Date: Wed, 17 Jul 2019 17:20:59 -0700
To: "David P. Reed"
Cc: "ecn-sane@lists.bufferbloat.net", Bob Briscoe, tsvwg IETF list
Subject: Re: [Ecn-sane] per-flow scheduling
References: <350f8dd5-65d4-d2f3-4d65-784c0379f58c@bobbriscoe.net> <40605F1F-A6F5-4402-9944-238F92926EA6@gmx.de> <1563401917.00951412@apps.rackspace.com> <1563402855.88484511@apps.rackspace.com>
List-Id: Discussion of explicit congestion notification's impact on the Internet

On Wed, Jul 17, 2019 at 4:23 PM Dave Taht wrote:
>
> On Wed, Jul 17, 2019 at 3:34 PM David P. Reed wrote:
> >
> > A follow up point that I think needs to be made is one more end-to-end argument:
> >
> > It is NOT the job of the IP transport layer to provide free storage for low priority packets. The end-to-end argument here says: the ends can and must hold packets until they are either delivered or not relevant (in RTP, they become irrelevant when they get older than their desired delivery time, if you want an example of the latter), SO, the network should not provide the function of storage beyond the minimum needed to deal with transients.
> >
> > That means, unfortunately, that the dream of some kind of "background" path that stores "low priority" packets in the network fails the end-to-end argument test.
>
> I do not mind reserving a tiny portion of the network for "background"
> traffic. This is different (I think?) than storing low priority packets
> in the network. A background traffic "queue" of 1 packet would be fine....
>
> > If you think about this, it even applies to some imaginary interplanetary IP layer network.
> > Queueing delay is not a feature of any end-to-end requirement.
> >
> > What may be desired at the router/link level in an interplanetary IP layer is holding packets because a link is actually down, or using link-level error correction coding or retransmission to bring the error rate down to an acceptable level before declaring it down. But that's quite different - it's the link level protocol, which aims to deliver minimum queueing delay under tough conditions, without buffering more than needed for that (the number of bits that fit in the light-speed transmission at the transmission rate).
>
> As I outlined in my MIT wifi talk, one layer of retry at the wifi mac
> layer made it work, in 1998, and that seemed a very acceptable
> compromise at the time. Present-day retries at that layer, not
> congestion controlled, are totally out of hand.
>
> In thinking about starlink's mac, and mobility, I gradually came to the
> conclusion that one retry from satellites 550km up (3.6ms rtt) was
> needed, as much as I disliked the idea.
>
> I still dislike retries at layer 2, even for nearby sats. It really
> complicates things. So for all I know I'll be advocating ripping 'em
> out in starlink, if they are indeed in there, next week.
>
> > So, the main reason I'm saying this is because again, there are those who want to implement the TCP function of reliable delivery of each packet in the links. That's a very bad idea.
>
> It was tried in the arpanet, and didn't work well there. There's a good
> story about many of the flaws of the Arpanet's design, including that
> problem, in the latter half of Kleinrock's second book on queueing
> theory, at least the first edition...
>
> Wifi (and 3g/4g/5g) re-introduced the same problem with retransmits and
> block acks at layer 2.
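The "one retry, then give up" compromise above can be sketched as a toy link-layer send loop. This is purely illustrative: `link_send`, `tx`, and `max_retries` are hypothetical names, not any real wifi or starlink MAC API, and real MACs batch retries per txop rather than per packet.

```python
def link_send(tx, max_retries=1):
    """Toy link-layer sender: one initial attempt plus at most
    `max_retries` retries, then give up and leave recovery to the
    end-to-end transport. (Illustrative sketch only.)"""
    for _ in range(1 + max_retries):
        if tx():  # tx() attempts one transmission; True means acked
            return True
    return False  # give up; the transport's retransmit logic takes over

# A link that loses the first attempt but succeeds on the single retry:
attempts = iter([False, True])
print(link_send(lambda: next(attempts)))  # True
```

The point of capping retries this low is that anything the link cannot deliver in one extra attempt is better handled end to end, instead of adding uncontrolled queueing delay at layer 2.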
>
> And after dissecting my ecn battlemesh data and observing what the
> retries at the mac layer STILL do on wifi with the current default wifi
> codel target (20ms AFTER two txops are in the hardware) currently
> achieve (50ms, which is 10x worse than what we could do, and still
> better performance under load than any other shipping physical layer we
> have with fifos)... and after thinking hard about nagle's thought that
> "every application has a right to one packet in the network", and this
> very long thread reworking the end-to-end argument in a similar, but
> not quite identical direction, I'm coming to a couple conclusions I'd
> possibly not quite expressed well before.
>
> 1) transports should treat an RFC3168 CE coupled with loss (drop and
> mark) as an even stronger signal of congestion than either alone, and
> this bit of the codel algorithm, when ecn is in use, is wrong, and has
> always been wrong:
>
> https://github.com/dtaht/fq_codel_fast/blob/master/codel_impl.h#L178
>
> (we added this arbitrarily to codel on the 5th day of development in
> 2012. Using FQ masked its effects on light traffic)
>
> What it should do instead is peek the queue and drop until it hits a
> markable packet, at the very least.

I didn't say this well. It should drop otherwise-markable packets until
it exits the loop, and then mark the one it delivers from that flow, if
it delivers one from that flow. That gets rid of all the extra mass ecn
creates... but I should go code it up again and see what happens on
wifi. Worst case I prove yet again that reasoning about the behavior of
queues is futile.

> Pie has an arbitrary drop-at-10% figure, which does lighten the load
> some... cake used to have drop-and-mark also until a year or two
> back...
>
> 2) At low rates and high contention, we really need pacing and fractional cwnd.
>
> (while I would very much like to see a dynamic reduction of MSS tried,
> that too has a bottom limit)
>
> even then, drop as per bullet 1.
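A minimal sketch of that proposed dequeue change, as I read it. This is hypothetical Python, not the actual fq_codel_fast C code: packets are dicts with `ect`/`ce` flags, and a fixed `to_drop` count stands in for codel's control-law state.

```python
def dequeue_drop_then_mark(queue, to_drop):
    """Sketch of 'drop otherwise-markable packets until the loop exits,
    then mark the one delivered': while the (simplified) control law
    still calls for drops, drop packets outright even when they are
    ECN-capable, and CE-mark only the packet actually delivered from
    the flow. Packets are dicts like {'ect': True, 'ce': False}."""
    while queue and to_drop > 0:
        queue.pop(0)      # drop, even if this packet was markable
        to_drop -= 1
    if not queue:
        return None       # flow emptied; nothing to deliver or mark
    pkt = queue.pop(0)
    if pkt['ect']:
        pkt['ce'] = True  # one mark on the delivered packet; no extra "mass"
    return pkt
```

Contrast with the current behavior at the linked line, where an ECN-capable packet is marked and kept instead of dropped, so an ecn flow under pressure keeps its queue occupancy where a non-ecn flow would have shed it.
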
>
> 3) In the end, I could see a world with SCE marks, and CE being
> obsoleted in favor of drop, or CE only being exerted on really light
> loads similar to (or less than!) what the arbitrary 10% figure for pie
> uses
>
> 4) in all cases, I vastly prefer somehow ultimately shifting greedy
> transports to RTT rather than drop or CE as their primary congestion
> control indicator. FQ makes that feasible today. With enough FQ
> deployed for enough congestive scenarios and hardware, and RTT
> becoming the core indicator for more transports, single-queued designs
> become possible in the distant future.
>
> >
> > On Wednesday, July 17, 2019 6:18pm, "David P. Reed" said:
> >
> > > I do want to toss in my personal observations about the "end-to-end argument" related to per-flow scheduling. (Such arguments are, of course, a class of arguments to which my name is attached. Not that I am a judge/jury of such questions...)
> > >
> > > A core principle of the Internet design is to move function out of the network, including routers and middleboxes, if those functions
> > >
> > > a) can be properly accomplished by the endpoints, and
> > > b) are not relevant to all uses of the Internet transport fabric being used by the ends.
> > >
> > > The rationale here has always seemed obvious to me. Like Bob Briscoe suggests, we were very wary of throwing features into the network that would preclude unanticipated future interoperability needs, new applications, and new technology in the infrastructure of the Internet as a whole.
> > >
> > > So what are we talking about here? (Ignoring the fine points of SCE, some of which I think are debatable - especially the focus on TCP alone, since much traffic will likely move away from TCP in the near future.)
> > >
> > > A second technical requirement (necessary invariant) of the Internet's transport is that the entire Internet depends on rigorously stopping queueing delay from building up anywhere except at the endpoints, where the ends can manage it. This is absolutely critical, though it is peculiar in that many engineers, especially those who work at the IP layer and below, have a mental model of routing as essentially being about building up queueing delay (in order to manage priority in some trivial way by building up the queue on purpose, apparently).
> > >
> > > This second technical requirement cannot be resolved merely by the endpoints. The reason is that the endpoints cannot know accurately what host-host paths share common queues.
> > >
> > > This lack of a way to "cooperate" among independent users of a queue cannot be solved by a purely end-to-end solution. (Well, I suppose some genius might invent a way, but I have not seen one in my 36 years closely watching the Internet in operation since it went live in 1983.)
> > >
> > > So, what the end-to-end argument would tend to do here, in my opinion, is to provide the most minimal mechanism in the devices that are capable of building up a queue, in order to allow all the ends sharing that queue to do their job - which is to stop filling up the queue!
> > >
> > > Only the endpoints can prevent filling up queues. And depending on the protocol, they may need to make very different, yet compatible choices.
> > >
> > > This is a question of design at the architectural level. And the future matters.
> > >
> > > So there is an end-to-end argument to be made here, but it is a subtle one.
> > >
> > > The basic mechanism for controlling queue depth has been, and remains, quite simple: dropping packets.
> > > This has two impacts: 1) immediately reducing queueing delay, and 2) signalling to endpoints that are paying attention that they have contributed to an overfull queue.
> > >
> > > The optimum queueing delay in a steady state would always be one packet or less. Kleinrock has shown this in the last few years. Of course there aren't steady states. But we don't want a mechanism that can't converge to that steady state *quickly*, for all queues in the network.
> > >
> > > Another issue is that endpoints are not aware of the fact that packets can take multiple paths to any destination. In the future, alternate path choices can be made by routers (when we get smarter routing algorithms based on traffic engineering).
> > >
> > > So again, some minimal kind of information must be exposed to endpoints that will continue to communicate. Again, the routers must be able to help a wide variety of endpoints with different use cases to decide how to move queue buildup out of the network itself.
> > >
> > > Now the decision made by the endpoints must be made in the context of information about fairness. Maybe this is what is not obvious.
> > >
> > > The most obvious notion of fairness is equal shares among (source host, dest host) pairs. There are drawbacks to that, but the benefit of it is that it affects the IP layer alone, and deals with lots of boundary cases, like the case where a single host opens a zillion TCP connections or uses lots of UDP source ports or destinations to somehow "cheat" by appearing to have "lots of flows".
> > >
> > > Another way to deal with dividing up flows is to ignore higher-level protocol information entirely, and put the flow identification in the IP layer. A 32-bit or 64-bit random number could be added as an "option" to IP to somehow extend the flow space.
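The host-pair notion of fairness above can be illustrated with a toy classifier. This is an assumption-laden sketch, not any deployed qdisc's hash: by keying only on the source and destination addresses, a host opening a zillion connections still lands in one bucket and gets one share.

```python
import hashlib

def host_pair_bucket(src_ip, dst_ip, n_buckets=1024):
    """Toy flow classifier keyed on the (source host, dest host) pair
    alone: ports and protocol are deliberately ignored, so opening
    many TCP connections or spraying UDP ports cannot manufacture
    extra 'flows'. (Illustrative sketch only.)"""
    key = f"{src_ip}|{dst_ip}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets

# Every connection between the same two hosts shares one bucket,
# no matter how many ports it uses:
b1 = host_pair_bucket("10.0.0.1", "192.0.2.7")
b2 = host_pair_bucket("10.0.0.1", "192.0.2.7")
print(b1 == b2)  # True
```

A 5-tuple hash (the more common FQ choice) would instead feed ports and protocol into `key`, trading this cheat-resistance for per-connection isolation.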
> > >
> > > But that is not the most important thing today.
> > >
> > > I write this to say:
> > > 1) some kind of per-flow queueing, during the transient state where a queue is overloaded before packets are dropped, would provide much-needed information to the ends of every flow sharing a common queue.
> > > 2) per-flow queueing, minimized to a very low level, using IP envelope address information (plus maybe UDP and TCP addresses for those protocols in an extended address-based flow definition) is totally compatible with end-to-end arguments, but ONLY if the decisions made are certain to drive queueing delay out of the router to the endpoints.
> > >
> > >
> > > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" said:
> > >
> > >> Dear Bob, dear IETF team,
> > >>
> > >>
> > >>> On Jun 19, 2019, at 16:12, Bob Briscoe wrote:
> > >>>
> > >>> Jake, all,
> > >>>
> > >>> You may not be aware of my long history of concern about how per-flow scheduling within endpoints and networks will limit the Internet in future. I find per-flow scheduling a violation of the e2e principle in such a profound way - the dynamic choice of the spacing between packets - that most people don't even associate it with the e2e principle.
> > >>
> > >> This does not rhyme well with the L4S stated advantage of allowing packet reordering (due to mandating RACK for all L4S tcp endpoints). Because surely changing the order of packets messes up "the dynamic choice of the spacing between packets" in a significant way. IMHO either L4S is great because it will give intermediate hops more leeway to re-order packets, or "a sender's packet spacing" is sacred; please make up your mind which it is.
> > >>
> > >>> I detected that you were talking about FQ in a way that might have assumed my concern with it was just about implementation complexity. If you (or anyone watching) is not aware of the architectural concerns with per-flow scheduling, I can enumerate them.
> > >>
> > >> Please do not hesitate to do so after your deserved holiday, and please state a superior alternative.
> > >>
> > >> Best Regards
> > >> Sebastian
> > >>
> > >>
> > >>>
> > >>> I originally started working on what became L4S to prove that it was possible to separate out reducing queuing delay from throughput scheduling. When Koen and I started working together on this, we discovered we had identical concerns on this.
> > >>>
> > >>>
> > >>> Bob
> > >>>
> > >>> --
> > >>> ________________________________________________________________
> > >>> Bob Briscoe http://bobbriscoe.net/
> > >>>
> > >>> _______________________________________________
> > >>> Ecn-sane mailing list
> > >>> Ecn-sane@lists.bufferbloat.net
> > >>> https://lists.bufferbloat.net/listinfo/ecn-sane
>
> --
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740

--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740