From: Sebastian Moeller
Date: Mon, 30 Mar 2015 20:23:35 +0200
To: Jonathan Morton
Cc: codel@lists.bufferbloat.net
Subject: Re: [Codel] The next slice of cake

On Mar 30, 2015, at 19:28, Jonathan Morton wrote:

> I made a lot of progress over the weekend, and got cake3 running on a sacrificial box today. No crashes yet!
>
> The set-associative hashing is in, currently hard-coded to 8-way, which should be more than adequate for typical home workloads (especially if BitTorrent is marked as background). I'm now adding statistics outputs so that we can all see how well it works under real loads, distinguishing between direct hits (the fast path, where the hash points directly to a correctly allocated queue), indirect hits (where a search through the set locates an allocated queue), misses (where an empty queue was allocated) and collisions (where all the queues were already occupied by other flows). Note that "allocation" in this case simply means updating the tag on the queue to match the packet, not allocating memory. These statistics should help with tuning the number of flow queues in each class.
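
	Just to check that I understand the set-associative scheme, below is how I picture the lookup and the four counters. It is a toy userspace sketch with names of my own invention, certainly not your actual cake3 code:

/* Toy userspace sketch of an 8-way set-associative flow lookup with the
 * four counters described above. All names and structures are my own
 * invention for the sake of discussion, not the actual cake3 code. */
#include <stdint.h>
#include <stdio.h>

#define SET_WAYS   8
#define NUM_QUEUES 1024			/* must be a multiple of SET_WAYS */

struct flow_queue {
	uint32_t tag;			/* flow hash currently owning this queue */
	uint32_t backlog;		/* bytes queued; 0 means the queue is idle */
};

struct hash_stats {
	uint64_t direct_hits, indirect_hits, misses, collisions;
};

static struct flow_queue queues[NUM_QUEUES];
static struct hash_stats stats;

/* Map a flow hash to a queue index, searching only within one set of 8. */
static uint32_t select_queue(uint32_t flow_hash)
{
	uint32_t base = (flow_hash % (NUM_QUEUES / SET_WAYS)) * SET_WAYS;
	uint32_t dir = base + (flow_hash % SET_WAYS);
	uint32_t i;

	/* Fast path: the directly indexed queue already carries our tag. */
	if (queues[dir].backlog && queues[dir].tag == flow_hash) {
		stats.direct_hits++;
		return dir;
	}
	/* Indirect hit: another queue in the set carries our tag. */
	for (i = base; i < base + SET_WAYS; i++) {
		if (queues[i].backlog && queues[i].tag == flow_hash) {
			stats.indirect_hits++;
			return i;
		}
	}
	/* Miss: claim an idle queue (preferring the direct slot) by
	 * re-tagging it -- no memory is allocated, only the tag changes. */
	for (i = 0; i < SET_WAYS; i++) {
		uint32_t j = base + ((dir - base + i) % SET_WAYS);
		if (!queues[j].backlog) {
			queues[j].tag = flow_hash;
			stats.misses++;
			return j;
		}
	}
	/* Collision: all 8 queues belong to other active flows. */
	stats.collisions++;
	return dir;
}

int main(void)
{
	/* Same flow looked up twice: first a miss, then a direct hit. */
	uint32_t q = select_queue(42);
	queues[q].backlog += 1500;
	select_queue(42);
	printf("misses=%llu direct_hits=%llu\n",
	       (unsigned long long)stats.misses,
	       (unsigned long long)stats.direct_hits);
	return 0;
}

	If I read that correctly, the collision counter should stay at zero as long as no more than eight concurrently busy flows ever land in the same set, which for typical home workloads should essentially always hold.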
> The new Diffserv logic is also in and apparently working well. I'm able to observe both the priority-balanced and bandwidth-balanced behaviours under appropriate circumstances. It is, however, fairly sensitive to the baseline quantum value from which the outer DRR weights are derived. Setting it to the MTU lets the high-priority classes burst too much, temporarily starving the lower classes. It seems to work a lot better using 256 as a baseline, but I hope the overhead of going around the DRR loop several times per packet isn't too high. It might be sane to vary the baseline quantum based on the bandwidth setting, but we can experiment with that later.
>
>> We keep track of how many bytes of system memory are consumed by a packet in 'skb->truesize'.
>
> The buffer-occupancy queue limit is in, using the field quoted above. Previously, it would just count packets. As with previous versions of cake (and fq_codel), it checks the limit immediately after enqueuing each packet, and drops one packet from the head of the longest queue (by byte count) if it's violated.

	Does this mean that, temporarily, the largest size the queue actually occupies is the limit plus the last packet's truesize? And what about the case where a full-MTU packet comes in while the eligible packet at the head of the longest queue is, say, only 64 bytes?

> Calculating the appropriate size to use here was an interesting problem. I start by estimating the BDP (from the shaper bandwidth and the Codel interval), quadrupling that and enforcing a lower limit of 64KB. If it's set to "unlimited bandwidth", I set the size to an arbitrary 1MB. Then I look at the packet limit, multiply that by the MTU and clamp the resulting size down to that - providing a straightforward mechanism to set a limit on memory usage.
>
> This all comes with a *fifth* almost-copy of codel.h, which incorporates the more efficient dequeue callback from codel4.h, but puts back the separate treatment of interval and target. Cake auto-tunes both of them to related values, but the relationship isn't as simple as scaling one from the other, and I'm not comfortable with the mental contortions involved in the new reasoning.

	I thought the main argument was that interval needs to be roughly in the right range, and that target is supposed to be between 5 and 10 percent of interval.

> I also added a lookup table for the first few inverse square roots, because I found (using a spreadsheet) that these are very inaccurate if you only do one Newton iteration on them - something like 1.0, 0.500, 0.563, 0.488 instead of 1.0, 0.707, 0.577, 0.500. Nebulous problems with "insufficiently tight control" might possibly be traced to this. With the first fifteen entries calculated up-front using four iterations and stored, accuracy is vastly improved and, in practice, almost all square-root calculations will now reduce to a simple table lookup. The memory cost is trivial, and Newton is good enough for exceptionally large drop counts.
>
> Along with the hashing statistics, I'm also adding three measures of latency to replace the one from the first version. This no longer relies on data provided by Codel (so there's no space penalty per flow for collecting it, and the inner loop in the stats dump is also eliminated), but performs the necessary calculation just after dequeuing each packet, storing the aggregated values per class instead of per flow. Each measure is an EWMA (using optimised weights which reduce to shifts and adds), but two of them are biased - one towards high latencies (so it'll mostly track the fattest flows), and one towards low latencies (so it'll mostly track sparse flows).
>
> Still lots of little things to do…
>
> - Jonathan Morton

	Wahoo, this sounds great. I can't wait to do a "test-drive" with this ;) (once you are looking for testers, let me know).

Best Regards
	Sebastian
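
P.S.: Out of curiosity I tried to reproduce your inverse-square-root numbers with a little floating-point toy. This is my own code and naming, not your fixed-point codel.h, so treat it purely as a sketch of how I understand the idea: the first few entries are pre-computed with four Newton iterations each, and only counts beyond the cache fall back to a single Newton step seeded from the previous value.

/* Floating-point toy of the cached inverse square roots, as I understand
 * the idea -- my own code, not the fixed-point codel.h implementation. */
#include <stdio.h>

#define INVSQRT_CACHE 16		/* counts 0..15 pre-computed and stored */

static double invsqrt_cache[INVSQRT_CACHE];

/* One Newton-Raphson step refining x towards 1/sqrt(n). */
static double newton_step(double x, double n)
{
	return x * (3.0 - n * x * x) / 2.0;
}

/* Pre-compute the small entries with four iterations each, seeding each
 * entry from its predecessor. */
static void invsqrt_cache_init(void)
{
	invsqrt_cache[0] = 1.0;
	for (int n = 1; n < INVSQRT_CACHE; n++) {
		double x = invsqrt_cache[n - 1];
		for (int i = 0; i < 4; i++)
			x = newton_step(x, n);
		invsqrt_cache[n] = x;
	}
}

/* 1/sqrt(count): table lookup for small counts, a single Newton step
 * seeded from the previous value for exceptionally large drop counts. */
static double codel_invsqrt(unsigned int count, double prev)
{
	if (count < INVSQRT_CACHE)
		return invsqrt_cache[count];
	return newton_step(prev, (double)count);
}

int main(void)
{
	double one_step = 1.0;		/* the old single-iteration estimate */

	invsqrt_cache_init();
	/* One Newton step per increment drifts towards roughly 1.0, 0.500,
	 * 0.563, 0.488; the cached values stay near 1.0, 0.707, 0.577, 0.500. */
	for (unsigned int count = 1; count <= 4; count++) {
		if (count > 1)
			one_step = newton_step(one_step, count);
		printf("count=%u cached=%.3f one-step=%.3f\n",
		       count, codel_invsqrt(count, one_step), one_step);
	}
	return 0;
}

At three decimal places the cached column already matches 1/sqrt(count), while the single-step column shows the same drift you describe.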