[Bloat] [iccrg] Clearing up misunderstandings about Linux versioning

Naeem Khademi naeem.khademi at gmail.com
Tue Nov 26 08:51:10 EST 2013


Hi Dave

Thanks for sharing this; it's great to know that you're working on
replicating the "AQM Kids..." work presented at ICCRG. FYI, we used Linux
3.10.4 on the AQM box for our real-life tests (also mentioned in Table 4,
page 8 of the TR), and judging from what you've written below, it should
contain most of the recent code for the latest AQMs.

Cheers,
Naeem

On Tue, Nov 26, 2013 at 2:49 AM, Dave Taht <dave.taht at gmail.com> wrote:

> I am in the process of trying to reproduce the recent RITE papers on
> immediate congestion notification and the ICCRG ARED work and need
> to explain something to researchers so that I don't have to work so hard.
>
> Linux kernel releases are numbered X.Y.Z-Q. All kernel versions
> contain bugs. Linux 3.2.0 was, IMHO, the nadir of the network stack in
> Linux.
>
> The "X" is the major version number; it has changed only 3 times. "Y"
> is the minor version number. These releases come out roughly quarterly
> and consist of new development of "features". "Z" counts critical patches
> backported from newer releases. "-Q" is usually the vendor's kernel
> build number, which often contains more patches.
>
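To make the X.Y.Z-Q scheme above concrete, here is a minimal sketch that splits a release string into its four parts. The function name and the regular expression are illustrative assumptions, not anything from the kernel tooling, and the example strings are made up.

```python
import re
from typing import NamedTuple, Optional


class KernelRelease(NamedTuple):
    major: int            # X
    minor: int            # Y
    patch: int            # Z (taken as 0 when omitted, e.g. "3.12")
    build: Optional[str]  # Q: everything after the first "-", if present


# Assumed shape: X.Y[.Z][-Q]. A "-rcN" suffix would also land in `build`,
# so this sketch does not tell release candidates from vendor builds.
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-(.+))?$")


def parse_kernel_release(release: str) -> KernelRelease:
    """Split a release string such as '3.2.18-23' into X, Y, Z and Q."""
    match = _RELEASE_RE.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, build = match.groups()
    return KernelRelease(int(major), int(minor), int(patch or 0), build)


print(parse_kernel_release("3.2.18-23"))
print(parse_kernel_release("3.10.4"))
```

Reporting all four fields in a paper (for example from the output of `uname -r` on the test machine) is the kind of identification asked for below.
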
> Having these numbers clearly identified in every academic paper on
> networking, and every presentation, ever published, would make me a
> happier guy. A pointer to the git tree actually used would make me
> even happier, and one with all the patches (like DCTCP in this case)
> applied would cause me to dance for joy and sing hallelujah!
>
> Anyway, on the "Z" part of X.Y.Z:
>
> Periodically a "long term stable" release is picked and receives updates
> for as long as someone is funded to do it.
>
> *The only things that enter into a long term stable release* are fixes
> for security bugs, crash bugs, and truly egregious bugs that can be
> somewhat easily fixed.
>
> But the rest of the development goes into X.Y+1.
>
> Sane people never run a X.Y.0 release on hardware/data they care about.
>
> So... anyway... when I was told that the recent paper on DCTCP had
> been done against Linux 3.2.18, "which came out in july, 2013!" ...
>
> I was partially happy - pretty stable release - but
> my heart sank as I knew that very, very, very few of the relevant fixes
> for bufferbloat and the tcp stack had landed in anything prior to
> Linux 3.6. Those fixes had mostly qualified as "features". Several in
> fact have been in such continuous development that I'd not want to
> generalize from fq_codel in 3.5 vs what's in 3.8 now, as one example.
>
> So it's my hope that folk will try to follow the X.Y series of kernels
> more closely, rather than the 3.2.Z series, in the future. I'm
> very happy with what happened in the 3.12 series in particular and
> look forward to working against it in the near future. The 3.13 work is
> just beginning, too.
>
> In the hope that showing the mechanics of researching which fixes
> did land in an old stable release will help with future papers,
> here's how to look:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> git clone linux-stable linux-3.2.18
> cd linux-3.2.18
> git checkout v3.2.18
> git checkout -b ritepaper
> # just the networking bits, not the driver bits, for this example
> git log net include/net
>
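As a follow-on to the commands above, here is a hedged sketch that narrows the log to just the commits added between the original release tag and a later stable tag. It assumes a local clone such as the `linux-3.2.18` directory created above; the helper names are hypothetical, and only standard `git log` options (`--oneline`, a `A..B` revision range, and `--` before paths) are used.

```python
import subprocess
from typing import List, Sequence


def stable_log_command(base_tag: str, stable_tag: str,
                       paths: Sequence[str]) -> List[str]:
    """Build a `git log` command for commits in stable_tag but not base_tag."""
    return ["git", "log", "--oneline", f"{base_tag}..{stable_tag}", "--", *paths]


def list_backported_commits(repo_dir: str, base_tag: str, stable_tag: str,
                            paths: Sequence[str]) -> List[str]:
    """Run the command inside repo_dir and return one summary line per commit."""
    result = subprocess.run(
        stable_log_command(base_tag, stable_tag, paths),
        cwd=repo_dir,
        check=True,            # raise if git exits with an error
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

Calling `list_backported_commits("linux-3.2.18", "v3.2", "v3.2.18", ["net", "include/net"])` should list the networking fixes that reached 3.2.18 after 3.2 was released, which is the set sampled further down.
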
> BQL, enhanced SFQ, SFQRED, codel, fq_codel, tcp small queues,
> retirement of some odd tcp logic, and hundreds of other changes were
> made to the stack since 3.2.0, and were NOT backported to 3.2.Z.
>
> Some very relevant bugfixes did indeed land in 3.2.18 that were not in 3.2!
>
> If you have an experiment using TCP that was run against an even older
> release, some of these major bugs might explain your results.
> Here's a sampling:
>
> commit 4b9b05fd95c502521eaef111ba0f83c58b391587
> Author: Eric Dumazet <edumazet at google.com>
> Date:   Wed May 2 02:28:41 2012 +0000
>
>     tcp: change tcp_adv_win_scale and tcp_rmem[2]
>
> <snip snip>
>
>     This also means tcp advertises a too optimistic window for a given
>     allocated rcvspace : When receiving frames, sk_rmem_alloc can hit
>     sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often,
>     especially when application is slow to drain its receive queue or in
>     case of losses (netperf is fast, scp is slow). This is a major latency
>     source.
>
> commit b713f6c7d317c136f03c132203d0900f4a0de084
> Author: Yuchung Cheng <ycheng at google.com>
> Date:   Mon Apr 30 06:00:18 2012 +0000
>
>     tcp: fix infinite cwnd in tcp_complete_cwr()
>
>     [ Upstream commit 1cebce36d660c83bd1353e41f3e66abd4686f215 ]
>
>     When the cwnd reduction is done, ssthresh may be infinite
>     if TCP enters CWR via ECN or F-RTO. If cwnd is not undone, i.e.,
>     undo_marker is set, tcp_complete_cwr() falsely set cwnd to the
>     infinite ssthresh value. The correct operation is to keep cwnd
>     intact because it has been updated in ECN or F-RTO.
>
> commit 65355aea86b2a70cbc7cbe14466702bc5a4e2217
> Author: Neal Cardwell <ncardwell at google.com>
> Date:   Tue Apr 10 07:59:20 2012 +0000
>
>     tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample
>
>     [ Upstream commit 18a223e0b9ec8979320ba364b47c9772391d6d05 ]
>
>     Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and
>     unscaled RTT samples.
>
>     The intent in the code was to only use the 'm' measurement if it was a
>     new minimum.  However, since 'm' had not yet been shifted left 3 bits
>     but 'new_sample' had, this comparison would nearly always succeed,
>     leading us to erroneously set our receive-side RTT estimate to the 'm'
>     sample when that sample could be nearly 8x too high to use.
>
>     The overall effect is to often cause the receive-side RTT estimate to
>     be significantly too large (up to 40% too large for brief periods in
>     my tests).
>
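The scaled-versus-unscaled comparison described in the commit message above can be illustrated with a toy model. This is a simplified sketch, not the kernel code: the function names are hypothetical, and the only detail taken from the commit text is that the stored estimate is shifted left by 3 bits (8x) while the sample `m` is not.

```python
RTT_SHIFT = 3  # the stored estimate is 8 * RTT, per the commit message


def update_min_buggy(est_scaled: int, m: int) -> int:
    """Compare the unscaled sample against the scaled estimate (the bug)."""
    if m < est_scaled:           # nearly always true: m is up to 8x "smaller"
        return m << RTT_SHIFT
    return est_scaled


def update_min_fixed(est_scaled: int, m: int) -> int:
    """Scale the sample first, then keep it only if it is a new minimum."""
    m_scaled = m << RTT_SHIFT
    if m_scaled < est_scaled:
        return m_scaled
    return est_scaled


est = 10 << RTT_SHIFT            # current estimate: 10 time units
sample = 50                      # a much larger sample, not a new minimum
print(update_min_buggy(est, sample) >> RTT_SHIFT)  # 50: estimate inflated
print(update_min_fixed(est, sample) >> RTT_SHIFT)  # 10: estimate kept
```

With the buggy comparison, any sample below 8x the current estimate replaces it, which matches the "nearly 8x too high" remark in the message.
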
>
> commit 1ee5fa1e9970a16036e37c7b9d5ce81c778252fc
>
>     [PATCH] sch_red: fix red_change()
>
>     Now RED is classful, we must check q->qdisc->q.qlen, and if queue is
>     empty, we start an idle period, not end it.
>
>
>
> --
> Dave Täht
>
> Fixing bufferbloat with cerowrt:
> http://www.teklibre.com/cerowrt/subscribe.html
> _______________________________________________
> iccrg mailing list
> iccrg at irtf.org
> https://www.irtf.org/mailman/listinfo/iccrg
>