From: Sebastian Moeller
Date: Mon, 30 Mar 2015 20:23:35 +0200
To: Jonathan Morton
Cc: codel@lists.bufferbloat.net
Subject: Re: [Codel] The next slice of cake

On Mar 30, 2015, at 19:28, Jonathan Morton wrote:

> I made a lot of progress over the weekend, and got cake3 running on a sacrificial box today. No crashes yet!
>
> The set-associative hashing is in, currently hard-coded to 8-way, which should be more than adequate for typical home workloads (especially if BitTorrent is marked as background). I'm now adding statistics outputs so that we can all see how well it works under real loads, distinguishing between direct hits (the fast path, where the hash points directly to a correctly allocated queue), indirect hits (where a search through the set locates an allocated queue), misses (where an empty queue was allocated) and collisions (where all the queues were already occupied by other flows). Note that "allocation" in this case simply means updating the tag on the queue to match the packet, not allocating memory. These statistics should help with tuning the number of flow queues in each class.
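
	Just to check that I understand the set-associative scheme, below is how I picture the lookup and the four counters. It is a toy userspace sketch with names of my own invention, certainly not your actual cake3 code:

/* Toy userspace sketch of an 8-way set-associative flow lookup with the
 * four counters described above. All names and structures are my own
 * invention for the sake of discussion, not the actual cake3 code. */
#include <stdint.h>
#include <stdio.h>

#define SET_WAYS   8
#define NUM_QUEUES 1024			/* must be a multiple of SET_WAYS */

struct flow_queue {
	uint32_t tag;			/* flow hash currently owning this queue */
	uint32_t backlog;		/* bytes queued; 0 means the queue is idle */
};

struct hash_stats {
	uint64_t direct_hits, indirect_hits, misses, collisions;
};

static struct flow_queue queues[NUM_QUEUES];
static struct hash_stats stats;

/* Map a flow hash to a queue index, searching only within one set of 8. */
static uint32_t select_queue(uint32_t flow_hash)
{
	uint32_t base = (flow_hash % (NUM_QUEUES / SET_WAYS)) * SET_WAYS;
	uint32_t dir = base + (flow_hash % SET_WAYS);
	uint32_t i;

	/* Fast path: the directly indexed queue already carries our tag. */
	if (queues[dir].backlog && queues[dir].tag == flow_hash) {
		stats.direct_hits++;
		return dir;
	}
	/* Indirect hit: another queue in the set carries our tag. */
	for (i = base; i < base + SET_WAYS; i++) {
		if (queues[i].backlog && queues[i].tag == flow_hash) {
			stats.indirect_hits++;
			return i;
		}
	}
	/* Miss: claim an idle queue (preferring the direct slot) by
	 * re-tagging it -- no memory is allocated, only the tag changes. */
	for (i = 0; i < SET_WAYS; i++) {
		uint32_t j = base + ((dir - base + i) % SET_WAYS);
		if (!queues[j].backlog) {
			queues[j].tag = flow_hash;
			stats.misses++;
			return j;
		}
	}
	/* Collision: all 8 queues belong to other active flows. */
	stats.collisions++;
	return dir;
}

int main(void)
{
	/* Same flow looked up twice: first a miss, then a direct hit. */
	uint32_t q = select_queue(42);
	queues[q].backlog += 1500;
	select_queue(42);
	printf("misses=%llu direct_hits=%llu\n",
	       (unsigned long long)stats.misses,
	       (unsigned long long)stats.direct_hits);
	return 0;
}

	If I read that correctly, the collision counter should stay at zero as long as no more than eight concurrently busy flows ever land in the same set, which for typical home workloads should essentially always hold.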
> The new Diffserv logic is also in and apparently working well. I'm able to observe both the priority-balanced and bandwidth-balanced behaviours under appropriate circumstances. It is, however, fairly sensitive to the baseline quantum value from which the outer DRR weights are derived. Setting it to the MTU lets the high-priority classes burst too much, temporarily starving the lower classes. It seems to work a lot better using 256 as a baseline, but I hope the overhead of going around the DRR loop several times per packet isn't too high. It might be sane to vary the baseline quantum based on the bandwidth setting, but we can experiment with that later.
>
>> We keep track of how many bytes of system memory are consumed by a packet in 'skb->truesize'.
>
> The buffer-occupancy queue limit is in, using the field quoted above. Previously, it would just count packets. As with previous versions of cake (and fq_codel), it checks the limit immediately after enqueuing each packet, and drops one packet from the head of the longest queue (by byte count) if it's violated.

	Does this mean that, temporarily, the largest size the queue actually occupies is the limit plus the last packet's truesize? And what about the case where a full-MTU packet comes in while the eligible packet at the head of the longest queue is, say, only 64 bytes?

> Calculating the appropriate size to use here was an interesting problem. I start by estimating the BDP (from the shaper bandwidth and the Codel interval), quadrupling that and enforcing a lower limit of 64KB. If it's set to "unlimited bandwidth", I set the size to an arbitrary 1MB. Then I look at the packet limit, multiply that by the MTU and clamp the resulting size down to that - providing a straightforward mechanism to set a limit on memory usage.
>
> This all comes with a *fifth* almost-copy of codel.h, which incorporates the more efficient dequeue callback from codel4.h, but puts back the separate treatment of interval and target. Cake auto-tunes both of them to related values, but the relationship isn't as simple as scaling one from the other, and I'm not comfortable with the mental contortions involved in the new reasoning.

	I thought the main argument was that interval needs to be roughly in the right range, and that target is supposed to be between 5 and 10 percent of interval.

> I also added a lookup table for the first few inverse square roots, because I found (using a spreadsheet) that these are very inaccurate if you only do one Newton iteration on them - something like 1.0, 0.500, 0.563, 0.488 instead of 1.0, 0.707, 0.577, 0.500. Nebulous problems with "insufficiently tight control" might possibly be traced to this. With the first fifteen entries calculated up-front using four iterations and stored, accuracy is vastly improved and, in practice, almost all square-root calculations will now reduce to a simple table lookup. The memory cost is trivial, and Newton is good enough for exceptionally large drop counts.
>
> Along with the hashing statistics, I'm also adding three measures of latency to replace the one from the first version. This no longer relies on data provided by Codel (so there's no space penalty per flow for collecting it, and the inner loop in the stats dump is also eliminated), but performs the necessary calculation just after dequeuing each packet, storing the aggregated values per class instead of per flow. Each measure is an EWMA (using optimised weights which reduce to shifts and adds), but two of them are biased - one towards high latencies (so it'll mostly track the fattest flows), and one towards low latencies (so it'll mostly track sparse flows).
>
> Still lots of little things to do…
>
> - Jonathan Morton

	Wahoo, this sounds great. I can't wait to do a "test-drive" with this ;) (once you are looking for testers, let me know).

Best Regards
	Sebastian
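
P.S.: Out of curiosity I tried to reproduce your inverse-square-root numbers with a little floating-point toy. This is my own code and naming, not your fixed-point codel.h, so treat it purely as a sketch of how I understand the idea: the first few entries are pre-computed with four Newton iterations each, and only counts beyond the cache fall back to a single Newton step seeded from the previous value.

/* Floating-point toy of the cached inverse square roots, as I understand
 * the idea -- my own code, not the fixed-point codel.h implementation. */
#include <stdio.h>

#define INVSQRT_CACHE 16		/* counts 0..15 pre-computed and stored */

static double invsqrt_cache[INVSQRT_CACHE];

/* One Newton-Raphson step refining x towards 1/sqrt(n). */
static double newton_step(double x, double n)
{
	return x * (3.0 - n * x * x) / 2.0;
}

/* Pre-compute the small entries with four iterations each, seeding each
 * entry from its predecessor. */
static void invsqrt_cache_init(void)
{
	invsqrt_cache[0] = 1.0;
	for (int n = 1; n < INVSQRT_CACHE; n++) {
		double x = invsqrt_cache[n - 1];
		for (int i = 0; i < 4; i++)
			x = newton_step(x, n);
		invsqrt_cache[n] = x;
	}
}

/* 1/sqrt(count): table lookup for small counts, a single Newton step
 * seeded from the previous value for exceptionally large drop counts. */
static double codel_invsqrt(unsigned int count, double prev)
{
	if (count < INVSQRT_CACHE)
		return invsqrt_cache[count];
	return newton_step(prev, (double)count);
}

int main(void)
{
	double one_step = 1.0;		/* the old single-iteration estimate */

	invsqrt_cache_init();
	/* One Newton step per increment drifts towards roughly 1.0, 0.500,
	 * 0.563, 0.488; the cached values stay near 1.0, 0.707, 0.577, 0.500. */
	for (unsigned int count = 1; count <= 4; count++) {
		if (count > 1)
			one_step = newton_step(one_step, count);
		printf("count=%u cached=%.3f one-step=%.3f\n",
		       count, codel_invsqrt(count, one_step), one_step);
	}
	return 0;
}

At three decimal places the cached column already matches 1/sqrt(count), while the single-step column shows the same drift you describe.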