From: Dave Taht
Date: Thu, 18 Jul 2019 09:06:22 -0700
To: "David P. Reed"
Cc: "ecn-sane@lists.bufferbloat.net", Bob Briscoe, tsvwg IETF list
Subject: Re: [Ecn-sane] per-flow scheduling

On Thu, Jul 18, 2019 at 8:02 AM David P. Reed wrote:
>
> Dave -
>
> The context of my remarks was about the end-to-end arguments for placing
> function in the Internet.
>
> To that end, that "you do not mind putting storage for low priority
> packets in the routers" doesn't matter, for two important reasons:
>
> 1) the idea that one should "throw in a feature" because people "don't
> mind" is exactly what leads to feature creep of the worst kind - features
> that serve absolutely no real purpose. That's what we rigorously objected
> to in the late 1970's. No, we would NOT throw in features just because
> they were "requested" and we didn't mind.

I dig it. :) If only the 5G folk had had your approach...

> 2) you have made no argument that the function cannot be done properly at
> the ends, and no argument that putting it in the network is necessary for
> the ends to achieve storage.

You are correct.

> On Wednesday, July 17, 2019 7:23pm, "Dave Taht" said:
>
> > On Wed, Jul 17, 2019 at 3:34 PM David P.
Reed wrote:
> >>
> >> A follow up point that I think needs to be made is one more end-to-end
> >> argument:
> >>
> >> It is NOT the job of the IP transport layer to provide free storage for
> >> low priority packets. The end-to-end argument here says: the ends can
> >> and must hold packets until they are either delivered or not relevant
> >> (in RTP, they become irrelevant when they get older than their desired
> >> delivery time, if you want an example of the latter), SO, the network
> >> should not provide the function of storage beyond the minimum needed to
> >> deal with transients.
> >>
> >> That means, unfortunately, that the dream of some kind of "background"
> >> path that stores "low priority" packets in the network fails the
> >> end-to-end argument test.
> >
> > I do not mind reserving a tiny portion of the network for "background"
> > traffic. This is different (I think?) than storing low priority packets
> > in the network. A background traffic "queue" of 1 packet would be
> > fine....
> >
> >> If you think about this, it even applies to some imaginary
> >> interplanetary IP layer network. Queueing delay is not a feature of any
> >> end-to-end requirement.
> >>
> >> What may be desired at the router/link level in an interplanetary IP
> >> layer is holding packets because a link is actually down, or using
> >> link-level error correction coding or retransmission to bring the error
> >> rate down to an acceptable level before declaring it down. But that's
> >> quite different - it's the link level protocol, which aims to deliver
> >> minimum queueing delay under tough conditions, without buffering more
> >> than needed for that (the number of bits that fit in the light-speed
> >> transmission at the transmission rate).
> >
> > As I outlined in my MIT wifi talk - 1 layer of retry at the wifi mac
> > layer made it work, in 1998, and that seemed a very acceptable
> > compromise at the time.
Present-day
> > retries at the mac layer, not congestion controlled, are totally out of
> > hand.
> >
> > In thinking about starlink's mac, and mobility, I gradually came to the
> > conclusion that 1 retry from satellites 550km up (3.6ms rtt) was needed,
> > as much as I disliked the idea.
> >
> > I still dislike retries at layer 2, even for nearby sats. really
> > complicates things. so for all I know I'll be advocating ripping 'em out
> > in starlink, if they are indeed in there, next week.
> >
> >> So, the main reason I'm saying this is because again, there are those
> >> who want to implement the TCP function of reliable delivery of each
> >> packet in the links. That's a very bad idea.
> >
> > It was tried in the arpanet, and didn't work well there. There's a good
> > story about many of the flaws of the Arpanet's design, including that
> > problem, in the latter half of Kleinrock's second book on queueing
> > theory, at least the first edition...
> >
> > Wifi (and 3g/4g/5g) re-introduced the same problem with retransmits and
> > block acks at layer 2.
> >
> > and after dissecting my ecn battlemesh data and observing what the
> > retries at the mac layer STILL do on wifi with the current default wifi
> > codel target (20ms AFTER two txops are in the hardware) currently
> > achieve (50ms, which is 10x worse than what we could do, and still
> > better performance under load than any other shipping physical layer we
> > have with fifos)... and after thinking hard about nagle's thought that
> > "every application has a right to one packet in the network", and this
> > very long thread reworking the end to end argument in a similar, but not
> > quite identical direction, I'm coming to a couple conclusions I'd
> > possibly not quite expressed well before.
> >
> > 1) transports should treat an RFC3168 CE coupled with loss (drop and
> > mark) as an even stronger signal of congestion than either alone, and
> > that this bit of the codel algorithm, when ecn is in use, is wrong, and
> > has always been wrong:
> >
> > https://github.com/dtaht/fq_codel_fast/blob/master/codel_impl.h#L178
> >
> > (we added this arbitrarily to codel on the 5th day of development in
> > 2012. Using FQ masked its effects on light traffic.)
> >
> > What it should do instead is peek the queue and drop until it hits a
> > markable packet, at the very least.
> >
> > Pie has an arbitrary drop-at-10% figure, which does lighten the load
> > some... cake used to have drop and mark also, until a year or two
> > back...
> >
> > 2) At low rates and high contention, we really need pacing and
> > fractional cwnd.
> >
> > (while I would very much like to see a dynamic reduction of MSS tried,
> > that too has a bottom limit)
> >
> > even then, drop as per bullet 1.
> >
> > 3) In the end, I could see a world with SCE marks, and CE being
> > obsoleted in favor of drop, or CE only being exerted on really light
> > loads, similar to (or less than!) the arbitrary 10% figure that pie
> > uses.
> >
> > 4) in all cases, I vastly prefer somehow ultimately shifting greedy
> > transports to RTT rather than drop or CE as their primary congestion
> > control indicator. FQ makes that feasible today. With enough FQ deployed
> > for enough congestive scenarios and hardware, and RTT becoming the core
> > indicator for more transports, single-queued designs become possible in
> > the distant future.
> >
> >>
> >> On Wednesday, July 17, 2019 6:18pm, "David P. Reed" said:
> >>
> >> > I do want to toss in my personal observations about the "end-to-end
> >> > argument" related to per-flow-scheduling. (Such arguments are, of
> >> > course, a class of arguments to which my name is attached. Not that I
> >> > am a judge/jury of such questions...)
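[For concreteness, the "peek the queue and drop until it hits a markable packet" behaviour from point 1 above could look something like the toy sketch below. This is Python pseudocode, not the actual fq_codel_fast C; the packet representation and function name are invented for illustration only.]

```python
from collections import deque

def signal_congestion(queue):
    """Toy sketch: when the AQM decides to signal congestion, drop
    non-ECN-capable packets from the head of the queue until a markable
    (ECT) packet is found, then CE-mark and deliver that packet --
    rather than unconditionally marking whatever is at the head."""
    dropped = 0
    while queue:
        pkt = queue.popleft()
        if pkt["ect"]:          # ECN-capable transport: mark and deliver
            pkt["ce"] = True
            return pkt, dropped
        dropped += 1            # not markable: drop it and keep looking
    return None, dropped        # queue drained without finding ECT

# Two non-ECT packets ahead of one ECT packet:
q = deque([{"ect": False, "ce": False},
           {"ect": False, "ce": False},
           {"ect": True,  "ce": False}])
pkt, dropped = signal_congestion(q)
```

Here the congestion signal costs the queue two drops before a mark is applied, so light non-ECN traffic still sees loss rather than riding free behind marked flows.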
> >> >
> >> > A core principle of the Internet design is to move function out of
> >> > the network, including routers and middleboxes, if those functions
> >> >
> >> > a) can be properly accomplished by the endpoints, and
> >> > b) are not relevant to all uses of the Internet transport fabric
> >> > being used by the ends.
> >> >
> >> > The rationale here has always seemed obvious to me. Like Bob Briscoe
> >> > suggests, we were very wary of throwing features into the network
> >> > that would preclude unanticipated future interoperability needs, new
> >> > applications, and new technology in the infrastructure of the
> >> > Internet as a whole.
> >> >
> >> > So what are we talking about here? (Ignoring the fine points of SCE,
> >> > some of which I think are debatable - especially the focus on TCP
> >> > alone, since much traffic will likely move away from TCP in the near
> >> > future.)
> >> >
> >> > A second technical requirement (necessary invariant) of the
> >> > Internet's transport is that the entire Internet depends on
> >> > rigorously stopping queueing delay from building up anywhere except
> >> > at the endpoints, where the ends can manage it. This is absolutely
> >> > critical, though it is peculiar in that many engineers, especially
> >> > those who work at the IP layer and below, have a mental model of
> >> > routing as essentially being about building up queueing delay (in
> >> > order to manage priority in some trivial way by building up the
> >> > queue on purpose, apparently).
> >> >
> >> > This second technical requirement cannot be resolved merely by the
> >> > endpoints. The reason is that the endpoints cannot know accurately
> >> > what host-host paths share common queues.
> >> >
> >> > This lack of a way to "cooperate" among independent users of a queue
> >> > cannot be solved by a purely end-to-end solution.
(Well, I suppose some genius might
> >> > invent a way, but I have not seen one in my 36 years of closely
> >> > watching the Internet in operation since it went live in 1983.)
> >> >
> >> > So, what the end-to-end argument would tend to do here, in my
> >> > opinion, is to provide the most minimal mechanism in the devices
> >> > that are capable of building up a queue, in order to allow all the
> >> > ends sharing that queue to do their job - which is to stop filling
> >> > up the queue!
> >> >
> >> > Only the endpoints can prevent filling up queues. And depending on
> >> > the protocol, they may need to make very different, yet compatible
> >> > choices.
> >> >
> >> > This is a question of design at the architectural level. And the
> >> > future matters.
> >> >
> >> > So there is an end-to-end argument to be made here, but it is a
> >> > subtle one.
> >> >
> >> > The basic mechanism for controlling queue depth has been, and
> >> > remains, quite simple: dropping packets. This has two impacts:
> >> > 1) immediately reducing queueing delay, and 2) signalling to
> >> > endpoints that are paying attention that they have contributed to
> >> > an overfull queue.
> >> >
> >> > The optimum queueing delay in a steady state would always be one
> >> > packet or less. Kleinrock has shown this in the last few years. Of
> >> > course there aren't steady states. But we don't want a mechanism
> >> > that can't converge to that steady state *quickly*, for all queues
> >> > in the network.
> >> >
> >> > Another issue is that endpoints are not aware of the fact that
> >> > packets can take multiple paths to any destination. In the future,
> >> > alternate path choices can be made by routers (when we get smarter
> >> > routing algorithms based on traffic engineering).
> >> >
> >> > So again, some minimal kind of information must be exposed to
> >> > endpoints that will continue to communicate.
Again, the routers must be able
> >> > to help a wide variety of endpoints with different use cases to
> >> > decide how to move queue buildup out of the network itself.
> >> >
> >> > Now the decision made by the endpoints must be made in the context
> >> > of information about fairness. Maybe this is what is not obvious.
> >> >
> >> > The most obvious notion of fairness is equal shares among (source
> >> > host, dest host) pairs. There are drawbacks to that, but the benefit
> >> > is that it affects the IP layer alone, and deals with lots of
> >> > boundary cases, like the case where a single host opens a zillion
> >> > TCP connections or uses lots of UDP source ports or destinations to
> >> > somehow "cheat" by appearing to have "lots of flows".
> >> >
> >> > Another way to deal with dividing up flows is to ignore higher-level
> >> > protocol information entirely, and put the flow identification in
> >> > the IP layer. A 32-bit or 64-bit random number could be added as an
> >> > "option" to IP to somehow extend the flow space.
> >> >
> >> > But that is not the most important thing today.
> >> >
> >> > I write this to say:
> >> > 1) some kind of per-flow queueing, during the transient state where
> >> > a queue is overloaded before packets are dropped, would provide much
> >> > needed information to the ends of every flow sharing a common queue.
> >> > 2) per-flow queueing, minimized to a very low level, using IP
> >> > envelope address information (plus maybe UDP and TCP addresses for
> >> > those protocols in an extended address-based flow definition) is
> >> > totally compatible with end-to-end arguments, but ONLY if the
> >> > decisions made are certain to drive queueing delay out of the
> >> > router to the endpoints.
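[The host-pair notion of fairness above can be sketched in a few lines: classify on the (source IP, destination IP) envelope alone, so that a host opening a zillion TCP connections or spraying UDP ports still lands in one queue. A toy sketch, assuming nothing about any particular qdisc; the hash function and queue count are illustrative choices, not anything a real implementation mandates.]

```python
import zlib

NQUEUES = 1024  # illustrative queue count, not any real qdisc's default

def host_pair_queue(src_ip: str, dst_ip: str) -> int:
    """Map a packet to a queue using only the IP envelope: every flow
    between the same two hosts shares one queue, regardless of transport
    protocol or port numbers."""
    key = f"{src_ip}->{dst_ip}".encode()
    return zlib.crc32(key) % NQUEUES

# A host opening many connections to one server cannot "cheat" its way
# into extra queues - every 5-tuple collapses to the same host pair:
q1 = host_pair_queue("10.0.0.1", "10.0.0.2")
q2 = host_pair_queue("10.0.0.1", "10.0.0.2")
```

This is what makes the scheme purely an IP-layer mechanism: the boundary cases Reed lists disappear because ports never enter the classification.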
> >> >
> >> > On Wednesday, July 17, 2019 5:33pm, "Sebastian Moeller" said:
> >> >
> >> >> Dear Bob, dear IETF team,
> >> >>
> >> >>> On Jun 19, 2019, at 16:12, Bob Briscoe wrote:
> >> >>>
> >> >>> Jake, all,
> >> >>>
> >> >>> You may not be aware of my long history of concern about how
> >> >>> per-flow scheduling within endpoints and networks will limit the
> >> >>> Internet in future. I find per-flow scheduling a violation of the
> >> >>> e2e principle in such a profound way - the dynamic choice of the
> >> >>> spacing between packets - that most people don't even associate
> >> >>> it with the e2e principle.
> >> >>
> >> >> This does not rhyme well with the stated L4S advantage of allowing
> >> >> packet reordering (due to mandating RACK for all L4S tcp
> >> >> endpoints). Because surely changing the order of packets messes up
> >> >> "the dynamic choice of the spacing between packets" in a
> >> >> significant way. IMHO either L4S is great because it will give
> >> >> intermediate hops more leeway to re-order packets, or "a sender's
> >> >> packet spacing" is sacred; please make up your mind which it is.
> >> >>
> >> >>> I detected that you were talking about FQ in a way that might
> >> >>> have assumed my concern with it was just about implementation
> >> >>> complexity. If you (or anyone watching) is not aware of the
> >> >>> architectural concerns with per-flow scheduling, I can enumerate
> >> >>> them.
> >> >>
> >> >> Please do not hesitate to do so after your deserved holiday, and
> >> >> please state a superior alternative.
> >> >>
> >> >> Best Regards
> >> >> Sebastian
> >> >>
> >> >>> I originally started working on what became L4S to prove that it
> >> >>> was possible to separate out reducing queuing delay from
> >> >>> throughput scheduling.
When Koen and I
> >> >>> started working together on this, we discovered we had identical
> >> >>> concerns on this.
> >> >>>
> >> >>> Bob
> >> >>>
> >> >>> --
> >> >>> ________________________________________________________________
> >> >>> Bob Briscoe                               http://bobbriscoe.net/
> >> >>>
> >> >>> _______________________________________________
> >> >>> Ecn-sane mailing list
> >> >>> Ecn-sane@lists.bufferbloat.net
> >> >>> https://lists.bufferbloat.net/listinfo/ecn-sane

--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740