From: Jonathan Morton
Date: Tue, 17 Mar 2015 22:08:39 +0200
To: codel@lists.bufferbloat.net, cerowrt-devel@lists.bufferbloat.net
Subject: [Codel] The next slice of cake
Message-Id: <7081A75C-899A-4DB7-8D77-935A37B362D8@gmail.com>
List-Id: CoDel AQM discussions

After far too long, it looks like I’ll have the opportunity to work on sch_cake a bit more. So here’s a little bit of a “state of the union” speech about what we’ve got and what I’m planning to add to it.

So far we’ve got a deficit-mode, non-bursting shaper that works pretty well, and an integrated implementation of fq_codel that tunes itself (that is, the target delay) to the bandwidth set on the shaper. The configuration is “as easy as cake”; the intention is that you can just specify one parameter (the bandwidth to shape at) and leave everything else at the defaults; there simply aren’t very many visible knobs, because they aren’t needed.

We’ve also got Diffserv classification, and that part hasn’t been so successful. Each class grabs all traffic with some subset of the codepoints, and stuffs it into a separate shaper+fq_codel instance, and the higher-priority shapers steal bandwidth from the lower ones to enforce priority.
High-priority classes can only use a limited amount of bandwidth, exactly as specified in generic Diffserv PHBs.

It works, perfectly as designed, but the resulting behaviour isn’t particularly desirable from an end-user perspective. In particular, people run tests using best-effort traffic to see how much bandwidth they’re getting, resulting in complaints that cake had to be given a bigger number to get the correct throughput - which of course also stops it from functioning correctly when background traffic is added to the mix. So that needed a rethink.

Incidentally, the existing Diffserv implementation can be disabled by specifying the “besteffort” keyword. This lumps all traffic into a single class, handled by a single shaper at the configured rate. Cake already works pretty well in that mode; sometimes I turn the shaper down to analogue-modem speeds and note, with some satisfaction, that everything *still* works. Except YouTube, but that’s only because streaming video really does need more than analogue-modem bandwidth.

As for performance, I’m able to make my ancient Pentium-MMX shape at over 50 Mbps, summing traffic in both directions between two bridged Fast Ethernet cards. This limitation is probably a combination of timer latency and context-switch overhead. I don’t expect it to improve much, unless we find a way to seriously reduce those overheads (which are already quite low for a modern desktop OS). A faster machine with better timers gets better performance, of course.

So there are two big things I want to change in the next version:

The easy part (at least in terms of how many unknowns there are) is adjusting the flow-queueing part so that it uses set-associative hashing instead of straight hashing when selecting a queue.
This should reduce the incidence of hash collisions considerably for a given number of flow queues, or conversely provide equivalent collision performance with a smaller number of queues.

The more interesting part is to rework the Diffserv prioritiser so that it behaves more usefully. I think I’ve hit upon the right idea which should make this work in practice - instead of individually hard-shaping each class, use the shaper logic as a threshold function between high and low priority, and implement a single shaper to handle all traffic. The priority function can then be handled by a weighted DRR system - which is already in place, but doesn’t do much - with just that small modification for changing the weights based on the shaper state.

So high-priority traffic gets high priority - but only if it limits itself to a reasonable bandwidth. Above that bandwidth, it gets low priority, but is still able to use the full shaped bandwidth if nobody else contends for it. And (unlike, say, HFSC) we need precisely two parameters per class to do this, both specified as ratios rather than hard bandwidth numbers: a bandwidth share (which determines both the shaper setting and the low-priority-mode DRR weighting) and a priority factor (which determines the high-priority-mode DRR weighting). So if those knobs end up being exposed to userspace, they’ll be easier to understand and thus use correctly.

All of this feeds my main goal with Diffserv, which is to start giving applications natural incentives to mark their traffic appropriately. Each class has both an advantage, and a tradeoff which must be accepted to realise that advantage. If you need absolutely minimal latency, you can choose a high-priority class, but you’ll have to be frugal about bandwidth. If you need maximum throughput, you’ll have to put up with reduced priority compared to latency-sensitive traffic.
And if you want to be altruistic, you can choose to mark your stuff as bulk, background traffic, and it’ll be treated accordingly. All of this is in accordance with existing RFCs.

A small caveat: cake is not designed for wifi. It’s designed for links that can at least be treated as full-duplex to a close approximation. Shared-medium links *can* behave like that, if they’re shaped to a miserly enough degree, but we really need something different for wifi - although several of cake’s components and ideas could be used in such a qdisc.

Roll on cake3.

 - Jonathan Morton
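
Appendix, sketch 1: to make the set-associative hashing idea above a bit more concrete, here is a minimal illustration in Python rather than kernel C. It is not the sch_cake code. The class name, the table size, the 8-way set size, the use of CRC32 as the hash, and the "share the first queue of the set" fallback are all assumptions made for the example.

```python
import zlib


class SetAssocFlowTable:
    """Toy set-associative flow-to-queue mapping (all sizes are assumed values)."""

    def __init__(self, num_queues: int = 1024, ways: int = 8):
        assert num_queues % ways == 0
        self.ways = ways
        self.num_sets = num_queues // ways
        # tags[i] is the flow key that currently owns queue i, or None if free.
        self.tags = [None] * num_queues
        self.collisions = 0

    def lookup(self, flow_key: bytes) -> int:
        """Return the queue index for a flow, claiming a free queue if needed."""
        base = (zlib.crc32(flow_key) % self.num_sets) * self.ways
        slots = range(base, base + self.ways)
        # 1. The flow already owns a queue in its set.
        for i in slots:
            if self.tags[i] == flow_key:
                return i
        # 2. Otherwise claim any free queue in the same set.
        for i in slots:
            if self.tags[i] is None:
                self.tags[i] = flow_key
                return i
        # 3. Set is full: a genuine collision. Share the first queue of the set.
        self.collisions += 1
        return base

    def release(self, queue_idx: int) -> None:
        """Mark a queue as free again, e.g. once it has drained."""
        self.tags[queue_idx] = None
```

With straight hashing (ways=1 here), two flows collide whenever their hashes land on the same queue. With a set of several queues, they only collide once every queue in the set is already owned by a different flow, which is the effect described in the message.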
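
Appendix, sketch 2: the "shaper as a threshold" idea for the Diffserv classes can be illustrated in the same way. Again this is a Python sketch under stated assumptions, not the planned implementation: the names TrafficClass, weight, charge and quantum are hypothetical, the high-priority weight is taken to be the priority factor itself, the low-priority weight is taken to be the bandwidth share itself, and the 1514-byte base quantum is an assumed value. The single shaper that paces all traffic is left out.

```python
from dataclasses import dataclass


@dataclass
class TrafficClass:
    name: str
    share: float          # fraction of the shaped rate; doubles as the low-priority weight
    priority: float       # DRR weight used while the class stays under its share
    next_ok: float = 0.0  # time at which the class is back under its threshold

    def weight(self, now: float) -> float:
        """DRR weight right now: high while under the threshold, low above it."""
        return self.priority if now >= self.next_ok else self.share

    def charge(self, now: float, nbytes: int, link_rate: float) -> None:
        """Account for a sent packet as if the class were shaped to share * link_rate.

        Nothing is delayed here; the clock only decides which weight applies.
        """
        start = max(now, self.next_ok)  # no credit for idle time, so no bursts
        self.next_ok = start + nbytes / (self.share * link_rate)


def quantum(tc: TrafficClass, now: float, base: int = 1514) -> int:
    """Bytes this class may send per DRR round (base is an assumed MTU-sized unit)."""
    return max(1, int(base * tc.weight(now)))
```

For example, a class with share=0.25 and priority=8.0 on a 1,000,000 byte/s link gets the large quantum until it has sent faster than a quarter of the link rate, then drops to the small one. Because DRR is work-conserving, the small quantum still lets it use the whole link when no other class has traffic queued.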