From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mout.gmx.net (mout.gmx.net [212.227.17.20]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "mout.gmx.net", Issuer "TeleSec ServerPass DE-1" (verified OK)) by huchra.bufferbloat.net (Postfix) with ESMTPS id 1EDEA21F13F for ; Tue, 31 Mar 2015 00:13:02 -0700 (PDT) Received: from u-089-d060.biologie.uni-tuebingen.de ([134.2.89.60]) by mail.gmx.com (mrgmx102) with ESMTPSA (Nemesis) id 0LaaVn-1ZJdTx2mvn-00mHyD; Tue, 31 Mar 2015 09:12:58 +0200 Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 7.3 \(1878.6\)) From: Sebastian Moeller In-Reply-To: <8753FC87-8C86-4EFE-A4B6-6EDB82AE5D26@gmail.com> Date: Tue, 31 Mar 2015 09:12:58 +0200 Content-Transfer-Encoding: quoted-printable Message-Id: <4219E429-4E6F-4CC5-88CA-E33076A4A895@gmx.de> References: <7081A75C-899A-4DB7-8D77-935A37B362D8@gmail.com> <5509957B.30600@pollere.com> <491C7497-BE3E-452B-A797-C7FC1102E7ED@gmail.com> <750FA673-E20D-4D48-9386-097D32CD31FB@gmail.com> <51665FD7-629A-4B8C-B258-2E9AF8E3B5D0@gmx.de> <8753FC87-8C86-4EFE-A4B6-6EDB82AE5D26@gmail.com> To: Jonathan Morton X-Mailer: Apple Mail (2.1878.6) X-Provags-ID: V03:K0:ijjZ/uiG280C26H3HFB7AIJ5+5arXxh7pSpWyez/1b/lK8Z4IAl PwiLNdD8MIMS8NUhTiueNKoeTpgDlQzQndhNvuvnR29UWnoySkq+8qAiJKsZtILtp8eKEoV uJ1gs7XNTbvZj87xTDjpQv7XlofJqdI4v+PDF1ddKBJYLsiEz4H3SqvYJohcHhSeNcS6iRH yA2iFZq5uMNuQ6rT+6jhA== X-UI-Out-Filterresults: notjunk:1; Cc: "codel@lists.bufferbloat.net" Subject: Re: [Codel] The next slice of cake X-BeenThere: codel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: CoDel AQM discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 31 Mar 2015 07:13:31 -0000 Hi Jonathan, On Mar 31, 2015, at 05:22 , Jonathan Morton = wrote: >=20 >> On 30 Mar, 2015, at 21:23, Sebastian Moeller wrote: >>=20 >>>> We keep track of how many bytes of system memory are consumed by a = packet in 'skb->truesize'. >>>=20 >>> The buffer-occupancy queue limit is in, using the field quoted = above. Previously, it would just count packets. As with previous = versions of cake (and fq_codel), it checks the limit immediately after = enqueuing each packet, and drops one packet from the head of the longest = queue (by byte count) if it=92s violated. >>=20 >> Does this mean the largest size the queue actually occupies is limit = plus the last packet=92s true size temporarily? What about the case a = full MTU packet comes in while the eligible packet at the head of the = longest queue is say 64 byte? >=20 > Yes - I could make it capable of dropping multiple packets to cope = better with the latter case. >=20 > The former case is not a problem really, since the packet is already = in memory when we=92re given it to enqueue; to fix it we=92d have to = anticipate the largest possible arriving packet and pre-emptively leave = space for it. Well, we at least need to drop as much data as we add at the = end, otherwise a UDP flood can certainly make us exceed all memory = (queue is filled with small packets, then a flood of full MTU packets = arrives, boom). So on resource starved machines it would be nice if we = could set a =93hard" limit on the memory consumption by cake (and yes = this leaves the buffers leading into cake). >=20 >> I thought the main argument was, that interval needs to be roughly in = the right range and target is supposed to be between 5 to 10 percent of = interval. >=20 > Yes, but why? I think this is explained well in sections 4.3 and 4.4 of = https://datatracker.ietf.org/doc/draft-ietf-aqm-codel/?include_text=3D1 >=20 > I have seen other arguments, both recently and otherwise, which modify = target to accommodate the MAC grant latency of some medium, while = interval is still described as a rough estimate of the RTT. According to Kathleen Nichols = http://pollere.net/Pdfdocs/noteburstymacs.pdf the reasoning of the RFC = draft should also apply to the MAC issue. I admit that in sqm-scripts I = went the easy way and sized target to allow ~1.5 Packets and then just = added the target duration to interval, but according to Kathleen Nichols = this will basically sacrifice more latency than necessary to reclaim = some bandwidth. Now I thought that adding the slow link transfer time to = the interval appealed to me as effectively all RTTs are increased by = that amount, but empirically we want to fiddle with target. I guess I = will go and implement a smarter adjustment, that first ramps target from = 5 to 10% of interval (the RFC argues for target in the 5-10% range), and = only then starts to increase interval, so that adjusted_interval =3D = adjusted_target * 10. But increasing target above a full MTU transfer = time is still necessary otherwise people;e on slow links sacrifice = (too?) much of their already scarce bandwidth. (I believe Dave argued = that this problem does not really exist n the ns version of the ACM = paper, but since ns2 is not a full model of the actual linux kernel, the = do not drop for sing;e packet queues heuristic in codel does not seem to = do the right thing in real life (I could easily have gotten this wrong, = though)) I guess I should go measure and only then come back to this = discussion. >=20 > The existence of both explanations gives me epic cognitive-dissonance = headaches, which I can=92t resolve *unless* I am able to vary target and = interval independently. Since the cost to do so appears to be small, I = put that facility back in. I am all for manual overrides, I could also live with a mode = where if only one is given, the other is calculated according to the = above ides. >=20 >>>> We keep track of how many bytes of system memory are consumed by a = packet in 'skb->truesize'. >>>=20 >>> The buffer-occupancy queue limit is in, using the field quoted = above. Previously, it would just count packets. As with previous = versions of cake (and fq_codel), it checks the limit immediately after = enqueuing each packet, and drops one packet from the head of the longest = queue (by byte count) if it=92s violated. >>=20 >> did this include the overload protection fallback to dropping rather >> than ecn-ing packets? >=20 > Codel already has such a protection mechanism, but I haven=92t yet = converted it to byte mode. Ah, good this is one more important step to make smallish = routers survive a udp flood; not that I think a router should work great = during a flood/attack, but it certainly should not crash... >=20 >> We explored the sqrt cache idea in the very first implementations of >> codel which you can revisit in the earliest, exciting days of the >> existence of the codel mailing list. >>=20 >> I am not allergic to trying it again. What we had found then was the >> count variable was increasing without bound (with the original >> published ns2 version), for which we substituted another modification >> that works well at low bandwidths but less well at higher ones. >>=20 >> My sole mathematical contribution to the invsqrt approximation in >> linux codel was the discovery that newton's method had much larger >> errors than 2% when run in reverse (eg, when decreasing count by more >> than 1, as various forms of the algorithm have done while trying to >> come up with a better estimate for the the resumption portion of the >> algorithm) >=20 > A quick check in that spreadsheet suggests that this inaccuracy only = occurs for low values of count, in basically the same places as running = it forward is also inaccurate. So having the pre-calculated lookup = table for those first values works, and Newton is still used directly = for larger counts if they occur. >=20 >> Note that I tried hard in codel4 (I think it was) to reduce the >> size of the codel stats to a single 32byte cache-line on 32 bit = arches >> - down from 48 or so bytes - and succeeded with one word to spare. I >> am not allergic to keeping lots of statistics but we are also aiming >> for speed on lame processors here=85 >=20 > I=92ve managed to add these statistics without increasing the size of = the per-flow data, only the per-class data. The weakness of the CPUs = involved is why I=92m aiming for shifts and adds (which an ARM in = particular can do extremely efficiently, but a MIPS isn=92t hurt much = either) rather than multiplies in the fast path. There=92s a stray = multiply in the hash function which I still want to excise. >=20 > It might also be possible, later on, to add a kernel config switch to = drop out the stats computations. It probably won=92t make very much = difference, but when every little helps... >=20 > - Jonathan Morton >=20 Best Regards Sebastian