From: Dave Taht
Date: Wed, 17 Jul 2019 16:23:26 -0700
To: "David P. Reed"
Cc: ecn-sane@lists.bufferbloat.net, Bob Briscoe, tsvwg IETF list
Subject: Re: [Ecn-sane] per-flow scheduling

On Wed, Jul 17, 2019 at 3:34 PM David P. Reed wrote:
>
> A follow up point that I think needs to be made is one more end-to-end argument:
>
> It is NOT the job of the IP transport layer to provide free storage for low priority packets. The end-to-end argument here says: the ends can and must hold packets until they are either delivered or not relevant (in RTP, they become irrelevant when they get older than their desired delivery time, if you want an example of the latter), SO, the network should not provide the function of storage beyond the minimum needed to deal with transients.
>
> That means, unfortunately, that the dream of some kind of "background" path that stores "low priority" packets in the network fails the end-to-end argument test.

I do not mind reserving a tiny portion of the network for "background" traffic. This is different (I think?) from storing low priority packets in the network. A background traffic "queue" of 1 packet would be fine....

> If you think about this, it even applies to some imaginary interplanetary IP layer network.
> Queueing delay is not a feature of any end-to-end requirement.
>
> What may be desired at the router/link level in an interplanetary IP layer is holding packets because a link is actually down, or using link-level error correction coding or retransmission to bring the error rate down to an acceptable level before declaring it down. But that's quite different - it's the link level protocol, which aims to deliver minimum queueing delay under tough conditions, without buffering more than needed for that (the number of bits that fit in the light-speed transmission at the transmission rate).

As I outlined in my MIT wifi talk, one layer of retry at the wifi mac layer made it work in 1998, and that seemed a very acceptable compromise at the time. Present-day retries at that layer, not congestion controlled, are totally out of hand.

In thinking about Starlink's mac, and mobility, I gradually came to the conclusion that one retry from satellites 550km up (3.6ms rtt) was needed, as much as I disliked the idea. I still dislike retries at layer 2, even for nearby sats - it really complicates things. So for all I know I'll be advocating ripping 'em out in Starlink, if they are indeed in there, next week.

> So, the main reason I'm saying this is because again, there are those who want to implement the TCP function of reliable delivery of each packet in the links.

That's a very bad idea. It was tried in the Arpanet, and didn't work well there. There's a good story about many of the flaws of the Arpanet's design, including that problem, in the latter half of Kleinrock's second book on queueing theory, at least in the first edition... Wifi (and 3g/4g/5g) re-introduced the same problem with retransmits and block acks at layer 2.
And after dissecting my ecn battlemesh data, and observing what the retries at the mac layer STILL do on wifi with the current default wifi codel target (20ms, AFTER two txops are already in the hardware) currently achieve (50ms, which is 10x worse than what we could do, and still better performance under load than any other shipping physical layer we have with fifos)... and after thinking hard about Nagle's thought that "every application has a right to one packet in the network", and this very long thread reworking the end-to-end argument in a similar, but not quite identical, direction, I'm coming to a couple of conclusions I'd possibly not quite expressed well before.

1) Transports should treat an RFC3168 CE mark coupled with loss (drop and mark) as an even stronger signal of congestion than either alone, and this bit of the codel algorithm, when ecn is in use, is wrong, and has always been wrong:

https://github.com/dtaht/fq_codel_fast/blob/master/codel_impl.h#L178

(We added this arbitrarily to codel on the 5th day of development in 2012. Using FQ masked its effects on light traffic.) What it should do instead is peek the queue and drop until it hits a markable packet, at the very least. Pie has an arbitrary drop-at-10% figure, which does lighten the load some... cake used to have drop and mark also, until a year or two back...

2) At low rates and high contention, we really need pacing and fractional cwnd. (While I would very much like to see a dynamic reduction of MSS tried, that too has a bottom limit.) Even then, drop as per bullet 1.

3) In the end, I could see a world with SCE marks, and CE being obsoleted in favor of drop, or CE only being exerted on really light loads, similar to (or less than!) the arbitrary 10% figure pie uses.

4) In all cases, I vastly prefer somehow ultimately shifting greedy transports to RTT rather than drop or CE as their primary congestion control indicator. FQ makes that feasible today.
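Going back to point 1: a minimal sketch of the "peek and drop until you hit a markable packet" idea, using a toy packet array rather than the real fq_codel_fast structures (the struct, function name, and queue layout here are all hypothetical, just to show the policy):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy packet: ect = ECN-capable transport, ce = congestion experienced. */
struct pkt {
    bool ect;
    bool ce;
};

/* On a congestion signal, drop non-ECT packets from the head of the
 * queue until an ECN-capable one is found, then CE-mark that packet,
 * instead of unconditionally marking the head whenever ecn is on.
 * Returns the index of the first surviving packet (a real caller would
 * free q[0..result-1]); returns n if nothing was markable. */
static size_t drop_until_markable(struct pkt *q, size_t n)
{
    size_t i = 0;
    while (i < n && !q[i].ect)
        i++;                 /* not ECN-capable: drop it */
    if (i < n)
        q[i].ce = true;      /* first ECT packet absorbs the mark */
    return i;
}
```

This way a congestion episode always costs something real (drops) when the head of the queue is non-ECT traffic, rather than letting a mark substitute for a drop.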
With enough FQ deployed for enough congestive scenarios and hardware, and RTT becoming the core indicator for more transports, single queued designs become possible in the distant future.

> On Wednesday, July 17, 2019 6:18pm, "David P. Reed" said:
>
> > I do want to toss in my personal observations about the "end-to-end argument" related to per-flow-scheduling. (Such arguments are, of course, a class of arguments to which my name is attached. Not that I am a judge/jury of such questions...)
> >
> > A core principle of the Internet design is to move function out of the network, including routers and middleboxes, if those functions
> >
> > a) can be properly accomplished by the endpoints, and
> > b) are not relevant to all uses of the Internet transport fabric being used by the ends.
> >
> > The rationale here has always seemed obvious to me. Like Bob Briscoe suggests, we were very wary of throwing features into the network that would preclude unanticipated future interoperability needs, new applications, and new technology in the infrastructure of the Internet as a whole.
> >
> > So what are we talking about here (ignoring the fine points of SCE, some of which I think are debatable - especially the focus on TCP alone, since much traffic will likely move away from TCP in the near future).
> >
> > A second technical requirement (necessary invariant) of the Internet's transport is that the entire Internet depends on rigorously stopping queueing delay from building up anywhere except at the endpoints, where the ends can manage it. This is absolutely critical, though it is peculiar in that many engineers, especially those who work at the IP layer and below, have a mental model of routing as essentially being about building up queueing delay (in order to manage priority in some trivial way by building up the queue on purpose, apparently).
> >
> > This second technical requirement cannot be resolved merely by the endpoints. The reason is that the endpoints cannot know accurately what host-host paths share common queues.
> >
> > This lack of a way to "cooperate" among independent users of a queue cannot be solved by a purely end-to-end solution. (Well, I suppose some genius might invent a way, but I have not seen one in my 36 years closely watching the Internet in operation since it went live in 1983.)
> >
> > So, what the end-to-end argument would tend to do here, in my opinion, is to provide the most minimal mechanism in the devices that are capable of building up a queue in order to allow all the ends sharing that queue to do their job - which is to stop filling up the queue!
> >
> > Only the endpoints can prevent filling up queues. And depending on the protocol, they may need to make very different, yet compatible choices.
> >
> > This is a question of design at the architectural level. And the future matters.
> >
> > So there is an end-to-end argument to be made here, but it is a subtle one.
> >
> > The basic mechanism for controlling queue depth has been, and remains, quite simple: dropping packets. This has two impacts: 1) immediately reducing queueing delay, and 2) signalling to endpoints that are paying attention that they have contributed to an overfull queue.
> >
> > The optimum queueing delay in a steady state would always be one packet or less. Kleinrock has shown this in the last few years. Of course there aren't steady states. But we don't want a mechanism that can't converge to that steady state *quickly*, for all queues in the network.
> >
> > Another issue is that endpoints are not aware of the fact that packets can take multiple paths to any destination.
> > In the future, alternate path choices can be made by routers (when we get smarter routing algorithms based on traffic engineering).
> >
> > So again, some minimal kind of information must be exposed to endpoints that will continue to communicate. Again, the routers must be able to help a wide variety of endpoints with different use cases to decide how to move queue buildup out of the network itself.
> >
> > Now the decision made by the endpoints must be made in the context of information about fairness. Maybe this is what is not obvious.
> >
> > The most obvious notion of fairness is equal shares among source host, dest host pairs. There are drawbacks to that, but the benefit of it is that it affects the IP layer alone, and deals with lots of boundary cases like the case where a single host opens a zillion TCP connections or uses lots of UDP source ports or destinations to somehow "cheat" by appearing to have "lots of flows".
> >
> > Another way to deal with dividing up flows is to ignore higher level protocol information entirely, and put the flow identification in the IP layer. A 32-bit or 64-bit random number could be added as an "option" to IP to somehow extend the flow space.
> >
> > But that is not the most important thing today.
> >
> > I write this to say:
> > 1) some kind of per-flow queueing, during the transient state where a queue is overloaded before packets are dropped, would provide much needed information to the ends of every flow sharing a common queue.
> > 2) per-flow queueing, minimized to a very low level, using IP envelope address information (plus maybe UDP and TCP addresses for those protocols in an extended address-based flow definition) is totally compatible with end-to-end arguments, but ONLY if the decisions made are certain to drive queueing delay out of the router to the endpoints.
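The host-pair fairness notion above is cheap to sketch. Here is a toy classifier (the constants, hash, and function name are made up for illustration, not from any shipping qdisc) that hashes only the IP source and destination addresses, so a single host opening a zillion TCP connections, or spraying UDP ports, still lands in one queue:

```c
#include <assert.h>
#include <stdint.h>

#define NQUEUES 1024u   /* arbitrary queue count, for illustration */

/* Hash only the host pair - never the ports - so extra connections
 * can't manufacture extra "flows". A real implementation would mix in
 * a random per-boot salt so hosts can't engineer collisions. */
static uint32_t host_pair_queue(uint32_t saddr, uint32_t daddr)
{
    uint32_t h = saddr * 2654435761u ^ daddr * 2246822519u;
    h ^= h >> 16;           /* fold high bits down before reducing */
    return h % NQUEUES;
}
```

A 32- or 64-bit flow label carried in the IP header, as suggested above, would simply become another input to the same kind of hash.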
> >
> >
> > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" said:
> >
> >> Dear Bob, dear IETF team,
> >>
> >>
> >>> On Jun 19, 2019, at 16:12, Bob Briscoe wrote:
> >>>
> >>> Jake, all,
> >>>
> >>> You may not be aware of my long history of concern about how per-flow scheduling within endpoints and networks will limit the Internet in future. I find per-flow scheduling a violation of the e2e principle in such a profound way - the dynamic choice of the spacing between packets - that most people don't even associate it with the e2e principle.
> >>
> >> This does not rhyme well with the L4S stated advantage of allowing packet reordering (due to mandating RACK for all L4S tcp endpoints). Because surely changing the order of packets messes up "the dynamic choice of the spacing between packets" in a significant way. IMHO, either L4S is great because it will give intermediate hops more leeway to re-order packets, or "a sender's packet spacing" is sacred; please make up your mind which it is.
> >>
> >>> I detected that you were talking about FQ in a way that might have assumed my concern with it was just about implementation complexity. If you (or anyone watching) is not aware of the architectural concerns with per-flow scheduling, I can enumerate them.
> >>
> >> Please do not hesitate to do so after your deserved holiday, and please state a superior alternative.
> >>
> >> Best Regards
> >> Sebastian
> >>
> >>
> >>> I originally started working on what became L4S to prove that it was possible to separate out reducing queuing delay from throughput scheduling. When Koen and I started working together on this, we discovered we had identical concerns on this.
> >>>
> >>>
> >>> Bob
> >>>
> >>> --
> >>> ________________________________________________________________
> >>> Bob Briscoe                               http://bobbriscoe.net/
> >>>
> >>> _______________________________________________
> >>> Ecn-sane mailing list
> >>> Ecn-sane@lists.bufferbloat.net
> >>> https://lists.bufferbloat.net/listinfo/ecn-sane

-- 
Dave Täht
CTO, TekLibre, LLC http://www.teklibre.com
Tel: 1-831-205-9740