From: Michael Welzl <michawe@ifi.uio.no>
To: Dave Taht <dave.taht@gmail.com>
Cc: bloat <bloat@lists.bufferbloat.net>
Date: Tue, 27 Nov 2018 22:17:14 +0100
Subject: Re: [Bloat] when does the CoDel part of fq_codel help in the real world?

Hi,

A few answers below:

> On Nov 27, 2018, at 9:10 PM, Dave Taht <dave.taht@gmail.com> wrote:
>
> On Mon, Nov 26, 2018 at 1:56 PM Michael Welzl <michawe@ifi.uio.no> wrote:
>>
>> Hi folks,
>>
>> That "Michael" dude was me :)
>>
>> About the stuff below, a few comments. First, an impressive effort to dig all of this up - I also thought that this was an interesting conversation to have!
>>
>> However, I would like to point out that thesis defense conversations are meant to be provocative, by design - when I said that CoDel doesn't usually help and that long queues would be the right thing for all applications, I certainly didn't REALLY REALLY mean that. The idea was just to be thought-provoking - and indeed I found this interesting: e.g., if you think about a short HTTP/1 connection, a large buffer just gives it a greater chance to get all packets across, and the perceived latency from the reduced round trips after not dropping anything may in fact be less than with a smaller (or CoDel'ed) buffer.
>
> I really did want Toke to have a hard time. Thanks for putting his
> back against the wall!
>
> And I'd rather this be a discussion of Toke's views... I do tend to
> think he thinks FQ solves more than it does... and I wish we had a
> sound analysis as to why 1024 queues work so much better for us than
> 64 or fewer on the workloads we have. I tend to think in part it's
> because that acts as a 1000x1 rate-shifter - but should it scale up?
> Or down? Is what we did with cake (1024, set-associative) useful, or
> excessive? I'm regularly seeing 64,000 queues on 10 Gbit and faster
> hardware, due to 64 hardware queues with fq_codel on each, on that
> sort of gear. I think that's too much and renders the AQM
> ineffective, but I lack data...
>
> but, to rant a bit...
>
> I tend to believe FQ solves 97% of the problem, AQM 2.9%, and ECN 0.09%.

I think the sparse flow optimization plays a major role in FQ_CoDel.
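
(To make concrete which mechanism I mean: below is a rough Python sketch of the two-tier deficit round robin that RFC 8290 describes, with "new" (sparse) flows served ahead of "old" (bulk) flows. The names and the fixed quantum are illustrative only - this is not the kernel code, and it leaves out details such as how an emptied new flow is parked on the old-flow list.)

from collections import deque

QUANTUM = 1514  # roughly one MTU of credit per round (illustrative)

class FqSketch:
    """Toy model of fq_codel's two-tier DRR: sparse ("new") flows are
    served before bulk ("old") flows, which is where much of the win
    for short transactions comes from."""

    def __init__(self, num_queues=1024):
        self.queues = [deque() for _ in range(num_queues)]
        self.credit = [0] * num_queues
        self.active = [False] * num_queues
        self.new_flows = deque()   # flows that just became active
        self.old_flows = deque()   # flows that have spent a quantum

    def enqueue(self, flow_hash, pkt_len):
        idx = flow_hash % len(self.queues)
        self.queues[idx].append(pkt_len)
        if not self.active[idx]:
            # A flow that was idle re-enters as "new" and is served
            # ahead of the bulk flows until it spends one quantum.
            self.active[idx] = True
            self.credit[idx] = QUANTUM
            self.new_flows.append(idx)

    def dequeue(self):
        while self.new_flows or self.old_flows:
            lst = self.new_flows if self.new_flows else self.old_flows
            idx = lst[0]
            if self.credit[idx] <= 0:
                # Quantum spent: demote to the old-flow rotation (DRR).
                self.credit[idx] += QUANTUM
                lst.popleft()
                self.old_flows.append(idx)
            elif not self.queues[idx]:
                # Nothing left for this flow; drop it from the rotation.
                lst.popleft()
                self.active[idx] = False
            else:
                pkt_len = self.queues[idx].popleft()
                self.credit[idx] -= pkt_len
                return idx, pkt_len
        return None

# A bulk flow with a 20-packet backlog vs. a single sparse packet:
fq = FqSketch()
for _ in range(20):
    fq.enqueue(42, 1514)          # bulk sender
fq.enqueue(7, 100)                # sparse flow, e.g. a DNS query or SYN
print([fq.dequeue() for _ in range(3)])
# -> [(42, 1514), (7, 100), (42, 1514)]: the sparse packet waits behind
#    at most one quantum of the bulk flow, not behind its whole backlog.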
> BUT: Amdahl's law says once you reduce one part of the problem to 0,
> everything else takes 100%. :)
>
> It often seems that I, having been the sole and very lonely FQ advocate
> here in 2011, have reversed the situation (in this group!), and I'm
> oft the AQM advocate *here* now.

Well, I'm with you, I do agree that an AQM is useful! It's just that there are not SO many cases where a single flow builds a standing queue only for itself and this really matters for that particular application.
But these cases absolutely do exist! (and several examples were mentioned - also the VPN case, etc.)

> It's sort of like all the people still quoting the e2e argument back
> at me when Dave Reed (at least, and perhaps the other co-authors now)
> has bought into this level of network interference between the
> endpoints, and had no religion - or the "RED in a Different Light"
> paper being rejected because it attempted to overturn other religion -
> and I'll be damned if I'll let fq_codel, sch_fq, pie, l4s, scream, nada,
>
> I admit to getting kind of crusty and set in my ways, but so long as
> people put code in front of me along with the paper, I still think
> that when the facts change, so do my opinions.
>
> Pacing is *really impressive* and I'd like to see that enter
> everything, not just packet processing - I've been thinking hard
> about the impact of CPU bursts (like resizing a hash table), and other
> forms of work that we currently do on computers that have a
> "dragster-like" peak performance and a great average, but horrible
> pathologies - and I think the world would be better off if we built
> more

+1

> Anyway...
>
> Once you have FQ and a sound outer limit on buffer size (100 ms),
> depredations like Comcast's 680 ms buffers no longer matter. There's
> still plenty of room to innovate. BBR works brilliantly vs fq_codel
> (and you can even turn ECN on, which it doesn't respect, and still get
> a great result). LoLa would probably work well too, except that the git
> tree was busted when I last tried it and it hasn't been tested much in
> the 1 Mbit - 1 Gbit range.
>
>> But corner cases aside, in fact I very much agree with the answers to my question Pete gives below, and also with the points others have made in answering this thread. Jonathan Morton even mentioned ECN - after Dave's recent over-reaction to ECN I made a point of not bringing up ECN *yet* again
>
> Not going to go into it (much) today! We ended up starting another
> project on ECN that operates under my core ground rule - "show me
> the code" - and life over there and on that mailing list has been
> pleasantly quiet: https://www.bufferbloat.net/projects/ecn-sane/wiki/
>
> I did get back on the tsvwg mailing list recently because of some
> ludicrously inaccurate misstatements about fq_codel. I also made a
> strong appeal to the L4S people to, in general, "stop thanking me" in
> their documents. To me that reads as an endorsement, where all I did
> was participate in the process until I gave up and hit my "show me the
> code" moment - which was about 5 years ago, and hasn't moved the
> needle since, except in mutating standards documents.
>
> The other document I didn't like was an arbitrary attempt to just set
> the ECN backoff figure to 0.8 when the sanest thing, given the
> deployment and pacing... was to aim for the right number - anyway...
> in that case I just wanted off the "thank you" list.

So let's draw a line between L4S and "the other document you didn't like", which was our ABE.
L4S is a more drastic attempt at getting things right. I haven't been contributing to it much; I like it for what it's trying to achieve, but I don't have a strong opinion on it.
Myself, I thought that much smaller changes might have a better chance of getting the incentives right, to support ECN deployment - which was the change to 0.8.
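
(For readers who haven't followed ABE: the whole change is the size of the multiplicative decrease applied on an ECN mark. A minimal sketch follows - the function and constant names are mine, the 0.8 is the draft's suggested value, and a real stack of course applies this inside its full congestion-control machinery:)

BETA_LOSS = 0.5   # conventional multiplicative decrease after a loss
BETA_ECN = 0.8    # ABE's suggested, gentler decrease after an ECN-CE mark

def cwnd_after_congestion(cwnd, ecn_marked, abe_enabled=True):
    """Return the new congestion window after a congestion signal.

    Rationale: an AQM marks early, while the queue is still short, so
    halving the window gives away more capacity than is needed to drain
    that queue; backing off less on a mark is meant to improve the
    incentive to enable ECN."""
    beta = BETA_ECN if (ecn_marked and abe_enabled) else BETA_LOSS
    return max(2, int(cwnd * beta))

# With a 100-segment window: a loss costs 50 segments of rate,
# an ABE-style ECN backoff only 20.
print(cwnd_after_congestion(100, ecn_marked=False))  # 50
print(cwnd_after_congestion(100, ecn_marked=True))   # 80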
Looking at our own document again, I am surprised to see that you are indeed in our acknowledgement list:
https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn-12
We added everyone who we thought made useful suggestions - it wasn't meant as a sign of endorsement. But, before RFC publication, there is still an opportunity to remove your name.
=> I apologize and will remove you.

> I like to think the more or less RFC 3168-compliant deployment of ECN
> is thus far going brilliantly, but I lack data. I would certainly like
> a hostile reviewer's evaluation of cake's ECN method and, for that
> matter, PIE's, honestly - from real traffic! There's an RFC-compliant
> version of PIE being pushed into the kernel, once it gets through some
> of Stephen's nits.
>
> And I'd really prefer all future discussions of "ECN benefits" to come
> with code and data and be discussed over on the ecn-sane mailing list,
> or *not discussed here* if no code is available.

You keep complaining about lack of code. At least for ABE:
- I think the code is in FreeBSD now
- There is a slightly older Linux patch. I agree it would be nice to continue with this code... I don't have someone doing this right now.
Anyway, all code, along with measurement results, is available from:
http://heim.ifi.uio.no/michawe/research/projects/abe/index.html

>> ... but yes indeed, being able to use ECN to tell an application to back off instead of requiring a packet drop is also one of the benefits.
>
> One thus far misunderstood and under-analyzed aspect of our work is
> the switch to head dropping.
>
> To me the switch to head dropping essentially killed the tail-loss RTO
> problem and eliminated most of the need for ECN.

I doubt that: TCP will need to retransmit that packet at the head, and that takes an RTT - all the packets after it will have to wait in the receiver buffer before the application gets them.
But I don't have measurements to prove my point, so I'm just hand-waving...
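
(To make the hand-waving slightly more concrete, here's a toy back-of-the-envelope comparison of what we're each pointing at; the RTO floor and the 2*RTT stand-in for a tail-loss probe are ballpark assumptions, not measurements:)

def recovery_delay_ms(rtt_ms, loss_at_tail, rto_min_ms=200):
    """Very rough recovery time for a single lost segment.

    - A loss with packets still in flight behind it (what head/mid-queue
      dropping produces) triggers dup-ACKs/SACK and is repaired by fast
      retransmit: roughly one extra RTT before the stalled data can be
      delivered to the application.
    - A lost *last* segment of a burst produces no dup-ACKs, so the
      classic sender sits out an RTO (or, in modern stacks, a tail-loss
      probe) before retransmitting."""
    if loss_at_tail:
        return max(rto_min_ms, 2 * rtt_ms)   # crude stand-in for RTO/TLP
    return rtt_ms                            # fast retransmit: ~1 RTT

for rtt in (20, 80):
    print(rtt, recovery_delay_ms(rtt, loss_at_tail=False),
          recovery_delay_ms(rtt, loss_at_tail=True))
# At 20 ms RTT: ~20 ms vs ~200 ms. Both points hold: head drop avoids
# the RTO, but the retransmission RTT (and the head-of-line wait in the
# receiver buffer) does not go away.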
> Forward progress and prompt signalling always happen. That otherwise
> wonderful piece Stuart Cheshire did at Apple elided the actual
> dropping-mode version of fq_codel, which, as best I recall, was about
> 12? 15 ms? long and totally invisible to the application.
>
>> (I think people easily miss the latency benefit of not dropping a packet, and thereby eliminating head-of-line blocking - packet drops require an extra RTT for retransmission, which can be quite a long time. This is about measuring latency at the right layer...)
>
> See above. And yeah, perversely, I agree with your last statement.

Perversely? Come on :)

> A Slashdot web page download takes 78 separate flows and 2.2 seconds to
> complete. Worst-case completion time - if you had *tail* loss - would be
> about 80 ms longer than that, on a tiny fraction of loads. The rest of it
> is absorbed into those 2.2 seconds.

Yes - and these separate flows get their own buckets in FQ_CoDel. Which is great - just not much effect from CoDel there.
But I'm NOT arguing that per-flow AQM is a bad thing, absolutely not!

> EVEN with HTTP/2.0, I would be extremely surprised to learn that many
> websites fit it all into one TCP transaction.
>
> There are very few other examples of TCP traffic requiring a low-latency
> response. I happen to be very happy with the ECN support in mosh, btw,
> not that anybody's ever looked at it since we did it.
>
> And I'd really prefer all future discussions of "ECN benefits" to come
> with code and data and be discussed over on the ecn-sane mailing list,
> or not discussed here if no code is available.
>
>> BTW, Anna Brunstrom was also very quick to give me the HTTP/2.0 example in the break after the defense. Also, TCP will generally not work very well when queues get very long... the RTT estimate gets way off.
>
> I like to think that the SYN/ACK and SSL negotiation handshake under
> fq_codel gives a much more accurate estimate of the actual RTT than we
> ever had before.

Another good point - this is indeed useful!

>> All in all, I think this is a fun thought to consider for a bit, but not really something worth spending people's time on, IMO: big buffers are bad, period. All else are corner cases.
>
> I've said it elsewhere, and perhaps we should resume, but an RFC
> merely stating the obvious about maximal buffer limits, and getting
> ISPs to do that, would be a boon.
>
>> I'll use the opportunity to tell folks that I was also pretty impressed with Toke's thesis as well as his performance at the defense. Among the many cool things he's developed (or contributed to), my personal favorite is the airtime fairness scheduler. But there were many more. Really good stuff.
>
> I so wish the world had about 1000 more Tokes in training. How can we
> make that happen?

I don't know... in academia, the mix of really contributing to the kernel on the one side, and getting academic results on the other, is a rare thing.
Not that we advisors (at least the people I consider friends) would be against that! But it's not easy to find someone who can pull this off.

Cheers,
Michael