From: Dave Taht
To: "aqm@ietf.org", "tsvwg@ietf.org", bloat, "iccrg@irtf.org"
Date: Mon, 25 Nov 2013 17:49:29 -0800
Subject: [Bloat] Clearing up misunderstandings about Linux versioning
List-Id: General list for discussing Bufferbloat

I am in the process of trying to reproduce the recent RITE papers on immediate congestion notification and the ICCRG ARED work, and need to explain something to researchers so
that I don't have to work so hard.

Linux kernel releases are numbered X.Y.Z-Q. All kernel versions contain bugs. Linux 3.2.0 was, IMHO, the nadir of the network stack in Linux.

"X" is the major version number. It's only changed 3 times.

"Y" is the minor version number. These come out roughly quarterly and consist of new development of "features".

"Z" is critical patches backported from newer releases.

"-Q" is usually the vendor's kernel build number, which often contains more patches.

These numbers, clearly identified, in every academic paper on networking, and every presentation, ever published, would make me a happier guy. A pointer to the git tree actually used would make me even happier, and one with all the patches (like DCTCP in this case) applied would cause me to dance for joy and sing hallelujah!

Anyway, on the "Z" part of X.Y.Z:

Periodically a "long term stable" release is picked and receives updates for as long as someone is funded to do it. *The only things that enter into a long term stable release* are fixes for security bugs, crash bugs, and truly egregious bugs that can be somewhat easily fixed. The rest of the development goes into X.Y+1.

Sane people never run an X.Y.0 release on hardware/data they care about.

So... anyway... when I was told that the recent paper on DCTCP had been done against Linux 3.2.18, "which came out in july, 2013!" ... I was partially happy - pretty stable release - but my heart sank, as I knew that very, very, very few of the relevant fixes for bufferbloat and the tcp stack had landed in anything prior to Linux 3.6. Those fixes had mostly qualified as "features". Several, in fact, have been in such continuous development that I'd not want to generalize from fq_codel in 3.5 to what's in 3.8 now, as one example.

So it's my hope that folk will try to follow more closely the X.Y series of kernels rather than the 3.2.Z series of kernels in the future.
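For anyone writing up an experiment, the X.Y.Z-Q breakdown described above can be captured mechanically. What follows is only a rough sketch, not anything from the kernel tree: `parse_kver` is a hypothetical helper name, the output format is made up for illustration, and treating a bare X.Y as X.Y.0 is an assumption. Vendor release strings vary a lot, so check the result by eye before putting it in a paper.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not an existing tool): split a
# kernel release string of the form X.Y[.Z][-Q] into labelled parts.
parse_kver() {
    release=$1
    base=${release%%-*}            # text before the first "-": X.Y.Z
    case $release in
        *-*) q=${release#*-} ;;    # text after the first "-": vendor build
        *)   q= ;;                 # no vendor suffix present
    esac
    x=${base%%.*}                  # major version
    rest=${base#*.}                # "Y" or "Y.Z"
    y=${rest%%.*}                  # minor version
    case $rest in
        *.*) z=${rest#*.} ;;       # stable patch level
        *)   z=0 ;;                # assumption: bare X.Y means X.Y.0
    esac
    printf 'major=%s minor=%s stable=%s vendor=%s\n' "$x" "$y" "$z" "$q"
}

# Report the running kernel, e.g. for a paper's experimental setup section.
parse_kver "$(uname -r)"
```

Calling `parse_kver 3.2.18-generic` (the `-generic` suffix is just an example vendor tag) prints `major=3 minor=2 stable=18 vendor=generic`. Recording that line alongside the results, plus the git tree used, answers most of the questions above.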
I'm very happy with what happened in the 3.12 series in particular and look forward to working against it in the near future. The 3.13 work is just beginning, too.

In the hope that showing the mechanics of researching what fixes did land in an old stable release will help on future papers, here's how to look:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
git clone linux-stable linux-3.2.18
cd linux-3.2.18
git checkout v3.2.18
git checkout -b ritepaper
git log net include/net # just the networking bits, not the driver bits for this example

BQL, enhanced SFQ, SFQRED, codel, fq_codel, tcp small queues, retirement of some odd tcp logic, and hundreds of other changes were made to the stack since 3.2.0, and were NOT backported to 3.2.Z.

Some very relevant bugfixes did indeed land in 3.2.18 that were not in 3.2! If you have an experiment that used TCP that is against an even older release, perhaps some of these major bugs might explain your results. Here's a sampling:

commit 4b9b05fd95c502521eaef111ba0f83c58b391587
Author: Eric Dumazet
Date:   Wed May 2 02:28:41 2012 +0000

    tcp: change tcp_adv_win_scale and tcp_rmem[2]

    This also means tcp advertises a too optimistic window for a given allocated rcvspace : When receiving frames, sk_rmem_alloc can hit sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often, especially when application is slow to drain its receive queue or in case of losses (netperf is fast, scp is slow). This is a major latency source.

commit b713f6c7d317c136f03c132203d0900f4a0de084
Author: Yuchung Cheng
Date:   Mon Apr 30 06:00:18 2012 +0000

    tcp: fix infinite cwnd in tcp_complete_cwr()

    [ Upstream commit 1cebce36d660c83bd1353e41f3e66abd4686f215 ]

    When the cwnd reduction is done, ssthresh may be infinite if TCP enters CWR via ECN or F-RTO. If cwnd is not undone, i.e., undo_marker is set, tcp_complete_cwr() falsely set cwnd to the infinite ssthresh value.
    The correct operation is to keep cwnd intact because it has been updated in ECN or F-RTO.

commit 65355aea86b2a70cbc7cbe14466702bc5a4e2217
Author: Neal Cardwell
Date:   Tue Apr 10 07:59:20 2012 +0000

    tcp: fix tcp_rcv_rtt_update() use of an unscaled RTT sample

    [ Upstream commit 18a223e0b9ec8979320ba364b47c9772391d6d05 ]

    Fix a code path in tcp_rcv_rtt_update() that was comparing scaled and unscaled RTT samples.

    The intent in the code was to only use the 'm' measurement if it was a new minimum. However, since 'm' had not yet been shifted left 3 bits but 'new_sample' had, this comparison would nearly always succeed, leading us to erroneously set our receive-side RTT estimate to the 'm' sample when that sample could be nearly 8x too high to use.

    The overall effect is to often cause the receive-side RTT estimate to be significantly too large (up to 40% too large for brief periods in my tests).

commit 1ee5fa1e9970a16036e37c7b9d5ce81c778252fc

    [PATCH] sch_red: fix red_change()

    Now RED is classful, we must check q->qdisc->q.qlen, and if queue is empty, we start an idle period, not end it.

-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html