From: Jonathan Morton <chromatix99@gmail.com>
Message-Id: <7081A75C-899A-4DB7-8D77-935A37B362D8@gmail.com>
Date: Tue, 17 Mar 2015 22:08:39 +0200
To: codel@lists.bufferbloat.net,
 cerowrt-devel@lists.bufferbloat.net
Subject: [Codel] The next slice of cake
List-Id: CoDel AQM discussions <codel.lists.bufferbloat.net>

After far too long, it looks like I’ll have the opportunity to work on
sch_cake a bit more.  So here’s a little bit of a “state of the union”
speech about what we’ve got and what I’m planning to add to it.

So far we’ve got a deficit-mode, non-bursting shaper that works pretty
well, and an integrated implementation of fq_codel that tunes itself
(that is, the target delay) to the bandwidth set on the shaper.  The
configuration is “as easy as cake”; the intention is that you can just
specify one parameter (the bandwidth to shape at) and leave everything
else at the defaults; there simply aren’t very many visible knobs,
because they aren’t needed.
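
To illustrate what “deficit-mode, non-bursting” can mean, here is a
minimal sketch in Python.  It is not the sch_cake code; the class and
method names are made up for illustration, and it assumes that
“deficit-mode” means a packet is sent as soon as the schedule allows
and its transmission time is paid off afterwards.

```python
class Shaper:
    """Non-bursting shaper sketch: idle time is never banked as credit."""

    def __init__(self, rate_bps):
        self.rate_bps = rate_bps   # shaped bandwidth, bits per second
        self.next_send = 0.0       # earliest time the next packet may leave

    def try_send(self, now, length_bytes):
        """Return True if a packet of this size may leave at time `now`."""
        if now < self.next_send:
            return False           # still paying off the previous packet
        # Schedule from `now`, not from the stale `next_send`, so a long
        # idle period does not turn into a burst later on.
        self.next_send = now + length_bytes * 8 / self.rate_bps
        return True
```

After any idle period the shaper lets exactly one packet through and
then makes the next one wait for that packet’s transmission time at
the shaped rate, which is the opposite of a token bucket that has
filled up while idle.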

We’ve also got Diffserv classification, and that part hasn’t been so
successful.  Each class grabs all traffic with some subset of the
codepoints, and stuffs it into a separate shaper+fq_codel instance, and
the higher-priority shapers steal bandwidth from the lower ones to
enforce priority.  High-priority classes can only use a limited amount
of bandwidth, exactly as specified in generic Diffserv PHBs.
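
As a rough picture of the classification step, the following Python
sketch maps a packet’s DSCP to a class.  The three class names and the
choice of which codepoints land where are hypothetical and are not
cake’s actual table; only the codepoint values themselves (CS1 = 8,
EF = 46) and the position of the DSCP in the TOS byte are standard.

```python
# Hypothetical class set, for illustration only.
BACKGROUND, BEST_EFFORT, LOW_LATENCY = range(3)

# Each class "grabs" a subset of codepoints; anything unlisted is
# treated as best-effort.
DSCP_TO_CLASS = {
    8: BACKGROUND,     # CS1
    46: LOW_LATENCY,   # EF
}

def classify(tos_byte):
    """Map an IPv4 TOS / IPv6 Traffic Class byte to a class index."""
    dscp = tos_byte >> 2   # DSCP is the upper six bits; the low two are ECN
    return DSCP_TO_CLASS.get(dscp, BEST_EFFORT)
```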

It works, perfectly as designed, but the resulting behaviour isn’t
particularly desirable from an end-user perspective.  In particular,
people run tests using best-effort traffic to see how much bandwidth
they’re getting, resulting in complaints that cake had to be given a
bigger number to get the correct throughput - which of course also
stops it from functioning correctly when background traffic is added to
the mix.  So that needed a rethink.

Incidentally, the existing Diffserv implementation can be disabled by
specifying the “besteffort” keyword.  This lumps all traffic into a
single class, handled by a single shaper at the configured rate.  Cake
already works pretty well in that mode; sometimes I turn the shaper
down to analogue-modem speeds and note, with some satisfaction, that
everything *still* works.  Except YouTube, but that’s only because
streaming video really does need more than analogue-modem bandwidth.

As for performance, I’m able to make my ancient Pentium-MMX shape at
over 50 Mbps, summing traffic in both directions between two bridged
Fast Ethernet cards.  This limitation is probably a combination of
timer latency and context-switch overhead.  I don’t expect it to
improve much, unless we find a way to seriously reduce those overheads
(which are already quite low for a modern desktop OS).  A faster
machine with better timers gets better performance, of course.

So there are two big things I want to change in the next version:

The easy part (at least in terms of how many unknowns there are) is
adjusting the flow-queueing part so that it uses set-associative
hashing instead of straight hashing when selecting a queue.  This
should reduce the incidence of hash collisions considerably for a given
number of flow queues, or conversely provide equivalent collision
performance with a smaller number of queues.
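
For anyone unfamiliar with the idea, here is a small Python sketch of
set-associative queue selection.  It is only an illustration under my
own assumptions: the table size, the number of ways, the class name and
the policy of sharing the first way when a set is full are all
hypothetical, not decisions taken for cake.

```python
class SetAssocFlowTable:
    """Flow-to-queue lookup where each hash bucket is a small set of queues."""

    def __init__(self, num_queues=1024, ways=8):
        assert num_queues % ways == 0
        self.ways = ways
        self.num_sets = num_queues // ways
        self.tags = [None] * num_queues   # flow currently owning each queue

    def lookup(self, flow_key):
        """Return a queue index for the flow, claiming a free way if needed."""
        base = (hash(flow_key) % self.num_sets) * self.ways
        free = None
        for q in range(base, base + self.ways):
            if self.tags[q] == flow_key:
                return q                  # flow already owns this queue
            if self.tags[q] is None and free is None:
                free = q                  # remember the first unused way
        if free is None:
            return base                   # set is full: a real collision
        self.tags[free] = flow_key
        return free

    def release(self, q):
        """Mark a queue as unowned once it has drained."""
        self.tags[q] = None
```

With straight hashing, two flows whose hashes agree modulo the number
of queues always share a queue.  Here they only share one once every
way in their set is taken by some other active flow.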

The more interesting part is to rework the Diffserv prioritiser so that
it behaves more usefully.  I think I’ve hit upon the right idea, which
should make this work in practice: instead of individually hard-shaping
each class, use the shaper logic as a threshold function between high
and low priority, and implement a single shaper to handle all traffic.
The priority function can then be handled by a weighted DRR system -
which is already in place, but doesn’t do much - with just that small
modification for changing the weights based on the shaper state.

So high-priority traffic gets high priority - but only if it limits
itself to a reasonable bandwidth.  Above that bandwidth, it gets low
priority, but is still able to use the full shaped bandwidth if nobody
else contends for it.  And (unlike say HFSC) we need precisely two
parameters per class to do this, both specified as ratios rather than
hard bandwidth numbers: a bandwidth share (which determines both the
shaper setting and the low-priority-mode DRR weighting) and a priority
factor (which determines the high-priority-mode DRR weighting).  So if
those knobs end up being exposed to userspace, they’ll be easier to
understand and thus use correctly.
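
A minimal Python sketch of that scheme follows.  Everything in it is an
assumption made for illustration: the names, the 1514-byte base quantum
and the exact DRR bookkeeping are hypothetical.  The single shaper for
all traffic is left out; it would simply decide when dequeue() gets
called.

```python
from collections import deque

class TrafficClass:
    def __init__(self, name, bandwidth_share, priority_factor):
        self.name = name
        self.bandwidth_share = bandwidth_share   # ratio of the link rate
        self.priority_factor = priority_factor   # DRR weight while under share
        self.queue = deque()                     # packet lengths in bytes
        self.deficit = 0.0
        self.next_ok = 0.0    # time at which the class is back under its share

    def quantum(self, now, base_quantum):
        """Threshold function: pick the DRR weight from the shaper state."""
        if now >= self.next_ok:
            return base_quantum * self.priority_factor   # high-priority mode
        return base_quantum * self.bandwidth_share       # low-priority mode

    def account(self, now, length_bytes, link_rate_bps):
        """Charge a sent packet against the class's share of the link."""
        class_rate = link_rate_bps * self.bandwidth_share
        self.next_ok = max(self.next_ok, now) + length_bytes * 8 / class_rate

class Scheduler:
    def __init__(self, classes, link_rate_bps, base_quantum=1514):
        self.classes = classes
        self.link_rate_bps = link_rate_bps
        self.base_quantum = base_quantum
        self.cur = 0          # class currently being served by the DRR

    def dequeue(self, now):
        """Return (class name, length) of the next packet, or None if idle."""
        if not any(c.queue for c in self.classes):
            return None
        while True:
            c = self.classes[self.cur]
            if c.queue and c.deficit > 0:
                length = c.queue.popleft()
                c.deficit -= length
                c.account(now, length, self.link_rate_bps)
                return c.name, length
            if c.queue:
                # Out of credit: top up with the weight for the current mode.
                c.deficit += c.quantum(now, self.base_quantum)
            self.cur = (self.cur + 1) % len(self.classes)
```

Nothing here ever refuses to send a class’s packets; going over the
share only shrinks that class’s DRR weight, so it can still fill the
link when no other class is backlogged.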

All of this feeds my main goal with Diffserv, which is to start giving
applications natural incentives to mark their traffic appropriately.
Each class has both an advantage, and a tradeoff which must be accepted
to realise that advantage.  If you need absolutely minimal latency, you
can choose a high-priority class, but you’ll have to be frugal about
bandwidth.  If you need maximum throughput, you’ll have to put up with
reduced priority compared to latency-sensitive traffic.  And if you
want to be altruistic, you can choose to mark your stuff as bulk,
background traffic, and it’ll be treated accordingly.  All of this is
in accordance with existing RFCs.

A small caveat: cake is not designed for wifi.  It’s designed for links
that can at least be treated as full-duplex to a close approximation.
Shared-medium links *can* behave like that, if they’re shaped to a
miserly enough degree, but we really need something different for wifi
- although several of cake’s components and ideas could be used in such
a qdisc.

Roll on cake3.

 - Jonathan Morton