From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <moeller0@gmx.de>
Received: from mout.gmx.net (mout.gmx.net [212.227.17.20])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mout.gmx.net",
	Issuer "TeleSec ServerPass DE-1" (verified OK))
	by huchra.bufferbloat.net (Postfix) with ESMTPS id 1EDEA21F13F
	for <codel@lists.bufferbloat.net>; Tue, 31 Mar 2015 00:13:02 -0700 (PDT)
Received: from u-089-d060.biologie.uni-tuebingen.de ([134.2.89.60]) by
	mail.gmx.com (mrgmx102) with ESMTPSA (Nemesis) id
	0LaaVn-1ZJdTx2mvn-00mHyD; Tue, 31 Mar 2015 09:12:58 +0200
Content-Type: text/plain; charset=windows-1252
Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\))
From: Sebastian Moeller <moeller0@gmx.de>
In-Reply-To: <8753FC87-8C86-4EFE-A4B6-6EDB82AE5D26@gmail.com>
Date: Tue, 31 Mar 2015 09:12:58 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <4219E429-4E6F-4CC5-88CA-E33076A4A895@gmx.de>
References: <7081A75C-899A-4DB7-8D77-935A37B362D8@gmail.com>
	<5509957B.30600@pollere.com>
	<491C7497-BE3E-452B-A797-C7FC1102E7ED@gmail.com>
	<CAA93jw7JVjvj9QYCLYT+UHOLDPSmdYB4uAp8wf+B=TL9pOrxrA@mail.gmail.com>
	<CAA93jw4PZ2ARfbeVhoGJTXodu1qy+q11W-Pug3P5OLd=feTzkg@mail.gmail.com>
	<B6540AAC-023D-433B-92B6-B435AA165874@gmx.de>
	<750FA673-E20D-4D48-9386-097D32CD31FB@gmail.com>
	<51665FD7-629A-4B8C-B258-2E9AF8E3B5D0@gmx.de>
	<D9E4FCCD-3281-4556-8402-90D9A828BE30@gmail.com>
	<DB44BB9F-C05C-4186-AF15-AE20A0D3682E@gmx.de>
	<8753FC87-8C86-4EFE-A4B6-6EDB82AE5D26@gmail.com>
To: Jonathan Morton <chromatix99@gmail.com>
X-Mailer: Apple Mail (2.1878.6)
X-Provags-ID: V03:K0:ijjZ/uiG280C26H3HFB7AIJ5+5arXxh7pSpWyez/1b/lK8Z4IAl
	PwiLNdD8MIMS8NUhTiueNKoeTpgDlQzQndhNvuvnR29UWnoySkq+8qAiJKsZtILtp8eKEoV
	uJ1gs7XNTbvZj87xTDjpQv7XlofJqdI4v+PDF1ddKBJYLsiEz4H3SqvYJohcHhSeNcS6iRH
	yA2iFZq5uMNuQ6rT+6jhA==
X-UI-Out-Filterresults: notjunk:1;
Cc: "codel@lists.bufferbloat.net" <codel@lists.bufferbloat.net>
Subject: Re: [Codel] The next slice of cake
X-BeenThere: codel@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: CoDel AQM discussions <codel.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/codel>,
	<mailto:codel-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/codel>
List-Post: <mailto:codel@lists.bufferbloat.net>
List-Help: <mailto:codel-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/codel>,
	<mailto:codel-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Tue, 31 Mar 2015 07:13:31 -0000

Hi Jonathan,

On Mar 31, 2015, at 05:22 , Jonathan Morton <chromatix99@gmail.com> =
wrote:

>=20
>> On 30 Mar, 2015, at 21:23, Sebastian Moeller <moeller0@gmx.de> wrote:
>>=20
>>>> We keep track of how many bytes of system memory are consumed by a =
packet in 'skb->truesize'.
>>>=20
>>> The buffer-occupancy queue limit is in, using the field quoted =
above.  Previously, it would just count packets.  As with previous =
versions of cake (and fq_codel), it checks the limit immediately after =
enqueuing each packet, and drops one packet from the head of the longest =
queue (by byte count) if it=92s violated.
>>=20
>> Does this mean the largest size the queue actually occupies is limit =
plus the last packet=92s true size temporarily? What about the case a =
full MTU packet comes in while the eligible packet at the head of the =
longest queue is say 64 byte?
>=20
> Yes - I could make it capable of dropping multiple packets to cope =
better with the latter case.
>=20
> The former case is not a problem really, since the packet is already =
in memory when we=92re given it to enqueue; to fix it we=92d have to =
anticipate the largest possible arriving packet and pre-emptively leave =
space for it.

	Well, we at least need to drop as much data as we add at the =
end, otherwise a UDP flood can certainly make us exceed all memory =
(queue is filled with small packets, then a flood of full MTU packets =
arrives, boom). So on resource starved machines it would be nice if we =
could set a =93hard" limit on the memory consumption by cake (and yes =
this leaves the buffers leading into cake).

>=20
>> I thought the main argument was, that interval needs to be roughly in =
the right range and target is supposed to be between 5 to 10 percent of =
interval.
>=20
> Yes, but why?

	I think this is explained well in sections 4.3 and 4.4 of =
https://datatracker.ietf.org/doc/draft-ietf-aqm-codel/?include_text=3D1

>=20
> I have seen other arguments, both recently and otherwise, which modify =
target to accommodate the MAC grant latency of some medium, while =
interval is still described as a rough estimate of the RTT.

	According to Kathleen Nichols =
http://pollere.net/Pdfdocs/noteburstymacs.pdf the reasoning of the RFC =
draft should also apply to the MAC issue. I admit that in sqm-scripts I =
went the easy way and sized target to allow ~1.5 Packets and then just =
added the target duration to interval, but according to Kathleen Nichols =
this will basically sacrifice more latency than necessary to reclaim =
some bandwidth. Now I thought that adding the slow link transfer time to =
the interval appealed to me as effectively all RTTs are increased by =
that amount, but empirically we want to fiddle with target. I guess I =
will go and implement a smarter adjustment, that first ramps target from =
5 to 10% of interval (the RFC argues for target in the 5-10% range), and =
only then starts to increase interval, so that adjusted_interval =3D =
adjusted_target * 10. But increasing target above a full MTU transfer =
time is still necessary otherwise people;e on slow links sacrifice =
(too?) much of their already scarce bandwidth. (I believe Dave argued =
that this problem does not really exist n the ns version of the ACM =
paper, but since ns2 is not a full model of the actual linux kernel, the =
do not drop for sing;e packet queues heuristic in codel does not seem to =
do the right thing in real life (I could easily have gotten this wrong, =
though))
	I guess I should go measure and only then come back to this =
discussion.


>=20
> The existence of both explanations gives me epic cognitive-dissonance =
headaches, which I can=92t resolve *unless* I am able to vary target and =
interval independently.  Since the cost to do so appears to be small, I =
put that facility back in.

	I am all for manual overrides, I could also live with a mode =
where if only one is given, the other is calculated according to the =
above ides.

>=20
>>>> We keep track of how many bytes of system memory are consumed by a =
packet in 'skb->truesize'.
>>>=20
>>> The buffer-occupancy queue limit is in, using the field quoted =
above.  Previously, it would just count packets.  As with previous =
versions of cake (and fq_codel), it checks the limit immediately after =
enqueuing each packet, and drops one packet from the head of the longest =
queue (by byte count) if it=92s violated.
>>=20
>> did this include the overload protection fallback to dropping rather
>> than ecn-ing packets?
>=20
> Codel already has such a protection mechanism, but I haven=92t yet =
converted it to byte mode.

	Ah, good this is one more important step to make smallish =
routers survive a udp flood; not that I think a router should work great =
during a flood/attack, but it certainly should not crash...

>=20
>> We explored the sqrt cache idea in the very first implementations of
>> codel which you can revisit in the earliest, exciting days of the
>> existence of the codel mailing list.
>>=20
>> I am not allergic to trying it again. What we had found then was the
>> count variable was increasing without bound (with the original
>> published ns2 version), for which we substituted another modification
>> that works well at low bandwidths but less well at higher ones.
>>=20
>> My sole mathematical contribution to the invsqrt approximation in
>> linux codel was the discovery that newton's method had much larger
>> errors than 2% when run in reverse (eg, when decreasing count by more
>> than 1, as various forms of the algorithm have done while trying to
>> come up with a better estimate for the the resumption portion of the
>> algorithm)
>=20
> A quick check in that spreadsheet suggests that this inaccuracy only =
occurs for low values of count, in basically the same places as running =
it forward is also inaccurate.  So having the pre-calculated lookup =
table for those first values works, and Newton is still used directly =
for larger counts if they occur.
>=20
>> Note that I tried hard in codel4 (I think it was) to reduce the
>> size of the codel stats to a single 32byte cache-line on 32 bit =
arches
>> - down from 48 or so bytes - and succeeded with one word to spare. I
>> am not allergic to keeping lots of statistics but we are also aiming
>> for speed on lame processors here=85
>=20
> I=92ve managed to add these statistics without increasing the size of =
the per-flow data, only the per-class data.  The weakness of the CPUs =
involved is why I=92m aiming for shifts and adds (which an ARM in =
particular can do extremely efficiently, but a MIPS isn=92t hurt much =
either) rather than multiplies in the fast path.  There=92s a stray =
multiply in the hash function which I still want to excise.
>=20
> It might also be possible, later on, to add a kernel config switch to =
drop out the stats computations.  It probably won=92t make very much =
difference, but when every little helps...
>=20
> - Jonathan Morton
>=20


Best Regards
	Sebastian