From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-la0-x236.google.com (mail-la0-x236.google.com [IPv6:2a00:1450:4010:c03::236]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by huchra.bufferbloat.net (Postfix) with ESMTPS id 7466021F224 for ; Mon, 30 Mar 2015 10:28:18 -0700 (PDT) Received: by lagg8 with SMTP id g8so109514839lag.1 for ; Mon, 30 Mar 2015 10:28:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=8UAxDQ5IQXBypRYWh0YQ6dfPG7RbG3uggICa43hieks=; b=jDCfvN1Zk45TH0jzPkkLZE1ArmXzSp/5jjTTEkLNG5huDRn9JTZ1OiHa5w7mN7UJhS aawFnNq5BTTZgIDMqUndLhcDYovTVQpWrjn6u2Et6WqXpP5RcaICIW7h0xkdDxkaQ5kz LYNk8uYEgIYB+1QD/YvqswEsZU4t6qxrWgaNObP8rgpyd0O1Rcr48RmtgcdR69keswla MqGkFgWO3LrW9oiFHH0f00FBPD3KJgFnHClUOCa6iwoYEexzUZyTS/76Z83mJzAs3Ayw YiJeMui5qzsdbZ5u2DRL3RRQstuzMrgrXw9/yNafMNoRvY2Ph829scoo4oUEHrpPCu0d ugmg== X-Received: by 10.152.45.37 with SMTP id j5mr27883636lam.31.1427736495661; Mon, 30 Mar 2015 10:28:15 -0700 (PDT) Received: from bass.home.chromatix.fi (87-95-163-98.bb.dnainternet.fi. [87.95.163.98]) by mx.google.com with ESMTPSA id dz1sm2140312lbc.47.2015.03.30.10.28.12 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 30 Mar 2015 10:28:13 -0700 (PDT) Content-Type: text/plain; charset=windows-1252 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2070.6\)) From: Jonathan Morton In-Reply-To: <51665FD7-629A-4B8C-B258-2E9AF8E3B5D0@gmx.de> Date: Mon, 30 Mar 2015 20:28:11 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: References: <7081A75C-899A-4DB7-8D77-935A37B362D8@gmail.com> <5509957B.30600@pollere.com> <491C7497-BE3E-452B-A797-C7FC1102E7ED@gmail.com> <750FA673-E20D-4D48-9386-097D32CD31FB@gmail.com> <51665FD7-629A-4B8C-B258-2E9AF8E3B5D0@gmx.de> To: Sebastian Moeller X-Mailer: Apple Mail (2.2070.6) Cc: "codel@lists.bufferbloat.net" Subject: Re: [Codel] The next slice of cake X-BeenThere: codel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: CoDel AQM discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 30 Mar 2015 17:28:47 -0000 I made a lot of progress over the weekend, and got cake3 running on a = sacrificial box today. No crashes yet! The set-associative hashing is in, currently hard-coded to 8-way, which = should be more than adequate for typical home workloads (especially if = BitTorrent is marked as background). I=92m now adding statistics = outputs so that we can all see how well it works under real loads, = distinguishing between direct hits (the fast path, where the hash points = directly to a correctly allocated queue), indirect hits (where a search = through the set locates an allocated queue), misses (where an empty = queue was allocated) and collisions (where all the queues were already = occupied by other flows). Note that =93allocation=94 in this case = simply means updating the tag on the queue to match the packet, not = allocating memory. These statistics should help with tuning the number = of flow queues in each class. The new Diffserv logic is also in and apparently working well. I=92m = able to observe both the priority-balanced and bandwidth-balanced = behaviours under appropriate circumstances. It is, however, fairly = sensitive to the baseline quantum value from which the outer DRR weights = are derived. Setting it to the MTU lets the high-priority classes burst = too much, temporarily starving the lower classes. It seems to work a = lot better using 256 as a baseline, but I hope the overhead of going = around the DRR loop several times per packet isn=92t too high. It might = be sane to vary the baseline quantum based on the bandwidth setting, but = we can experiment with that later. > We keep track of how many bytes of system memory are consumed by a = packet in 'skb->truesize'. The buffer-occupancy queue limit is in, using the field quoted above. = Previously, it would just count packets. As with previous versions of = cake (and fq_codel), it checks the limit immediately after enqueuing = each packet, and drops one packet from the head of the longest queue (by = byte count) if it=92s violated. Calculating the appropriate size to use here was an interesting problem. = I start by estimating the BDP (from the shaper bandwidth and the Codel = interval), quadrupling that and enforcing a lower limit of 64KB. If = it=92s set to =93unlimited bandwidth=94, I set the size to an arbitrary = 1MB. Then I look at the packet limit, multiply that by the MTU and = clamp the resulting size down to that - providing a straightforward = mechanism to set a limit on memory usage. This all comes with a *fifth* almost-copy of codel.h, which incorporates = the more efficient dequeue callback from codel4.h, but puts back the = separate treatment of interval and target. Cake auto-tunes both of them = to related values, but the relationship isn=92t as simple as scaling one = from the other, and I=92m not comfortable with the mental contortions = involved in the new reasoning. I also added a lookup table for the first few inverse square roots, = because I found (using a spreadsheet) that these are very inaccurate if = you only do one Newton iteration on them - something like 1.0, 0.500, = 0.563, 0.488 instead of 1.0, 0.707, 0.577, 0.500. Nebulous problems = with =93insufficiently tight control=94 might possibly be traced to = this. With the first fifteen entries calculated up-front using four = iterations and stored, accuracy is vastly improved and, in practice, = almost all square-root calculations will now reduce to a simple table = lookup. The memory cost is trivial, and Newton is good enough for = exceptionally large drop counts. Along with the hashing statistics, I=92m also adding three measures of = latency to replace the one from the first version. This no longer = relies on data provided by Codel (so there=92s no space penalty per flow = for collecting it, and the inner loop in the stats dump is also = eliminated), but performs the necessary calculation just after dequeuing = each packet, storing the aggregated values per class instead of per = flow. Each measure is an EWMA (using optimised weights which reduce to = shifts and adds), but two of them are biased - one towards high = latencies (so it=92ll mostly track the fattest flows), and one towards = low latencies (so it=92ll mostly track sparse flows). Still lots of little things to do=85 - Jonathan Morton=