From: Naeem Khademi
To: Dave Taht
Cc: bloat, "aqm@ietf.org", "iccrg@irtf.org", "tsvwg@ietf.org"
Date: Tue, 26 Nov 2013 14:51:10 +0100
Subject: Re: [Bloat] [iccrg] Clearing up misunderstandings about Linux versioning
List-Id: General list for discussing Bufferbloat

Hi Dave

Thanks for sharing this, and great to know that you're working on replicating the "AQM Kids..." work presented at ICCRG. FYI, we used Linux 3.10.4 on the AQM box for our real-life tests (also mentioned in Table 4, page 8 of the TR), and judging from what you've written below it should contain most of the recent code for the latest AQMs.

Cheers,
Naeem

On Tue, Nov 26, 2013 at 2:49 AM, Dave Taht <dave.taht@gmail.com> wrote:
I am in the process of trying to reproduce the recent RITE papers on
immediate congestion notification and the iccrg ARED work and need
to explain something to researchers so that I don't have to work so hard.

Linux kernel releases are numbered X.Y.Z-Q. All kernel versions
contain bugs. Linux 3.2.0 was, IMHO, the nadir of the network stack in
Linux.

The "X" is the major version number. It's only changed 3 times. "Y"
is the minor version number. These come out roughly quarterly and
consist of new development of "features". "Z" is critical patches
backported from newer releases. -Q is usually the vendor's kernel
build number which often contains more patches.
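
A minimal sketch of how the X.Y.Z-Q scheme described above could be split apart when recording the kernel used for an experiment. The helper name `parse_kernel_release` and the sample release string are hypothetical, and treating a missing Z as 0 is an assumption made only for this illustration.

```shell
# Hypothetical helper: split a release string such as "3.2.18-23-generic"
# into the X, Y, Z and Q fields described above.
parse_kernel_release() {
    rel="$1"
    case "$rel" in
        *-*) q="${rel#*-}" ;;   # everything after the first dash
        *)   q="" ;;            # no vendor suffix present
    esac
    ver="${rel%%-*}"            # the X.Y.Z part, before the first dash
    x="${ver%%.*}"              # major version
    rest="${ver#*.}"            # Y.Z (or just Y)
    y="${rest%%.*}"             # minor version
    case "$rest" in
        *.*) z="${rest#*.}" ;;  # stable patch level
        *)   z=0 ;;             # assumption: a missing Z is reported as 0
    esac
    echo "X=$x Y=$y Z=$z Q=$q"
}

# Made-up vendor string, for illustration only.
parse_kernel_release "3.2.18-23-generic"
```

On a test machine the same function could be fed the output of `uname -r`, so that the full string ends up in the paper's experiment table.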

These numbers, clearly identified in every academic paper on
networking and every presentation ever published, would make me a
happier guy. A pointer to the git tree actually used would make me
even happier, and a tree with all the patches (like DCTCP in this case)
applied would cause me to dance for joy, and sing hallelujah!

Anyway, on the "Z" part of X.Y.Z:

Periodically a "long term stable" release is picked and receives updates
for as long as someone is funded to do it.

*the only things that enter into a long term stable release* are fixes
for security bugs, crash bugs, and truly egregious bugs that can be
somewhat easily fixed.

But the rest of the development goes into X.Y+1.

Sane people never run an X.Y.0 release on hardware/data they care about.

So... anyway... when I was told that the recent paper on DCTCP had
been done against Linux 3.2.18, "which came out in July, 2013!" ...

I was partially happy - pretty stable release - but
my heart sank as I knew that very, very, very few of the relevant fixes
for bufferbloat and the tcp stack had landed in anything prior to
Linux 3.6. Those fixes had mostly qualified as "features". Several in
fact have been in such continuous development that I'd not want to
generalize from fq_codel in 3.5 vs what's in 3.8 now, as one example.

So it's my hope that folk will try to follow more closely the X.Y series
of kernels rather than the 3.2.Z series of kernels in the future. I'm
very happy with what happened in the 3.12 series in particular and
look forward to working against it in the near future. The 3.13 work is
just beginning, too.

In the hope that showing the mechanics of researching which fixes
did land in an old stable release will help with future papers,
here's how to look:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
git clone linux-stable linux-3.2.18
cd linux-3.2.18
git checkout v3.2.18
git checkout -b ritepaper
# just the networking bits, not the driver bits, for this example
git log net include/net
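
The log above covers the whole history up to v3.2.18. A minimal sketch of narrowing it to only what changed between two tags, assuming it is run inside the clone made above; the helper name `net_fixes_between` is hypothetical, and the `v3.2` tag in the usage line is assumed to be the base of the 3.2.Z series.

```shell
# Hypothetical helper: list commits that touched the networking core
# between two tags, oldest first. Run it inside a linux-stable clone.
net_fixes_between() {
    base="$1"   # older tag, e.g. v3.2
    tip="$2"    # newer tag, e.g. v3.2.18
    # "--" separates revisions from paths, so a missing path is not an error
    git log --oneline --reverse "$base..$tip" -- net include/net
}

# Usage (inside the clone made above):
#   net_fixes_between v3.2 v3.2.18
```

Restricting the range this way shows only what the stable series added, which is usually the question when judging how far an old Z release is from mainline.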

BQL, enhanced SFQ, SFQRED, codel, fq_codel, tcp small queues,
retirement of some odd tcp logic, and hundreds of other changes were
made to the stack since 3.2.0, and were NOT backported to 3.2.Z.

Some very relevant bugfixes did indeed land in 3.2.18 that were not in 3.2!
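
Several of the stable commits sampled in this mail carry an "[ Upstream commit ... ]" line naming the mainline change they backport. A minimal sketch of searching for that marker, assuming it is run inside a linux-stable clone; the helper name `find_backport` is hypothetical, and not every stable commit necessarily uses this exact wording, so an empty result does not prove a fix is absent.

```shell
# Hypothetical helper: find stable-branch commits whose message names a
# given upstream (mainline) commit. Run it inside a linux-stable clone.
find_backport() {
    upstream="$1"   # full upstream commit id
    range="$2"      # revision range, e.g. v3.2..v3.2.18
    # --fixed-strings treats the pattern literally rather than as a regex
    git log --oneline --fixed-strings \
        --grep="Upstream commit $upstream" "$range"
}

# Usage, with an upstream id quoted in the sampling in this mail:
#   find_backport 1cebce36d660c83bd1353e41f3e66abd4686f215 v3.2..v3.2.18
```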

If you have an experiment that used TCP and ran against an even older
release, perhaps some of these major bugs might explain your results.
Here's a sampling:

commit 4b9b05fd95c502521eaef111ba0f83c58b391587
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed May 2 02:28:41 2012 +0000

    tcp: change tcp_adv_win_scale and tcp_rmem[2]

<snip snip>

    This also means tcp advertises a too optimistic window for a given
    allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
    sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
    especially when application is slow to drain its receive queue or in
    case of losses (netperf is fast, scp is slow). This is a major latency
    source.

commit b713f6c7d317c136f03c132203d0900f4a0de084
Author: Yuchung Cheng <ycheng@google.com>
Date:   Mon Apr 30 06:00:18 2012 +0000

    tcp: fix infinite cwnd in tcp_complete_cwr()

    [ Upstream commit 1cebce36d660c83bd1353e41f3e66abd4686f215 ]

    When the cwnd reduction is done, ssthresh may be infinite
    if TCP enters CWR via ECN or F-RTO. If cwnd is not undone, i.e.,
    undo_marker is set, tcp_complete_cwr() falsely set cwnd to the
    infinite ssthresh value. The correct operation is to keep cwnd
    intact because it has been updated in ECN or F-RTO.

commit 65355aea86b2a70cbc7cbe14466702bc5a4e2217
Author: Neal Cardwell <ncardwell@google.com>
Date:   Tue Apr 10 07:59:20 2012 +0000

    tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample

    [ Upstream commit 18a223e0b9ec8979320ba364b47c9772391d6d05 ]

    Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
    unscaled RTT samples.

    The intent in the code was to only use the 'm' measurement if it was a
    new minimum.  However, since 'm' had not yet been shifted left 3 bits
    but 'new_sample' had, this comparison would nearly always succeed,
    leading us to erroneously set our receive-side RTT estimate to the 'm'
    sample when that sample could be nearly 8x too high to use.

    The overall effect is to often cause the receive-side RTT estimate to
    be significantly too large (up to 40% too large for brief periods in
    my tests).


commit 1ee5fa1e9970a16036e37c7b9d5ce81c778252fc

    [PATCH] sch_red: fix red_change()

    Now RED is classful, we must check q->qdisc->q.qlen, and if queue is
    empty, we start an idle period, not end it.
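
The scaled-versus-unscaled comparison described in the tcp_rcv_rtt_update() commit above can be illustrated with plain shell arithmetic. This is a toy sketch, not kernel code: the variable names `m` and `new_sample` follow the commit message, and the tick values are made up for the illustration.

```shell
# Toy illustration only; not kernel code. All values are made up.
new_sample=$((12 << 3))   # stored estimate: 12 ticks, kept scaled by 8 (= 96)
m=20                      # fresh measurement in plain ticks, NOT a new minimum

# Buggy comparison: unscaled m against scaled new_sample.
if [ "$m" -lt "$new_sample" ]; then buggy=accepted; else buggy=rejected; fi

# Fixed comparison: shift m left 3 bits so both sides use the same scale.
if [ $((m << 3)) -lt "$new_sample" ]; then fixed=accepted; else fixed=rejected; fi

echo "buggy: $buggy, fixed: $fixed"
```

With these numbers the mixed-scale test accepts a 20-tick sample as a "new minimum" even though the stored estimate is 12 ticks, while the same-scale test rejects it. Any sample below 8 times the stored value would pass the mixed-scale test, which matches the "nearly 8x too high" remark in the commit message.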



--
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html
_______________________________________________
iccrg mailing list
iccrg@irtf.org
https://www.irtf.org/mailman/listinfo/iccrg
