From: Toke Høiland-Jørgensen
To: Kevin Darbyshire-Bryant
Cc: Jonathan Morton, cake@lists.bufferbloat.net
Subject: Re: [Cake] A few puzzling Cake results
Date: Wed, 18 Apr 2018 14:57:29 +0200
Message-ID: <8736zs3c5i.fsf@toke.dk>
In-Reply-To: <1B7176CA-41BC-4CF0-838D-871F0C858CF3@darbyshire-bryant.me.uk>
X-Clacks-Overhead: GNU Terry Pratchett

Kevin Darbyshire-Bryant writes:

>> On 18 Apr 2018, at 12:25, Toke Høiland-Jørgensen wrote:
>>
>> Toke Høiland-Jørgensen writes:
>>
>>> Jonathan Morton writes:
>>>
>>>>> On 17 Apr, 2018, at 12:42 pm, Toke Høiland-Jørgensen wrote:
>>>>>
>>>>> - The TCP RTT of the 32 flows is *way* higher for Cake. FQ-CoDel
>>>>> controls TCP flow latency to around 65 ms, while for Cake it is all
>>>>> the way up around the 180 ms mark. Is the CoDel version in Cake too
>>>>> lenient, or what is going on here?
>>>>
>>>> A recent change was to increase the target dynamically so that at
>>>> least 4 MTUs per flow could fit in each queue without AQM activity.
>>>> That should improve throughput in high-contention scenarios, but it
>>>> does come at the expense of intra-flow latency when it's relevant.
>>>
>>> Ah, right, that might explain it. In the 128-flow case each flow has
>>> less than 100 Kbps available to it, so four MTUs are going to take a
>>> while to dequeue...
>>
>> OK, so I went and looked at the code and found this:
>>
>> bool over_target = sojourn > p->target &&
>>                    sojourn > p->mtu_time * bulk_flows * 4;
>>
>> Which means that we scale the allowed sojourn time for each flow by the
>> time of four packets *times the number of bulk flows*.
>>
>> So if there is one active bulk flow, we allow each flow to queue four
>> packets. But if there are ten active bulk flows, we allow *each* flow to
>> queue *40* packets.
>>
>> This completely breaks the isolation of different flows, and makes the
>> scaling of Cake *worse* than plain CoDel.
>>
>> So why on earth would we do that?
>
> The thread that led to that change:
>
> https://lists.bufferbloat.net/pipermail/cake/2017-December/003159.html
>
> Commits: 0d8f30faa3d4bb2bc87a382f18d8e0f3e4e56eac, and the change to
> 4 * bulk_flows in 49776da5b93f03c8548e26f2d7982d553d1d226c

Ah, thanks for digging that up! I must not have been paying attention
during that discussion ;)

Well, from reading the thread, this is an optimisation for severe
overload in ingress mode at very low bandwidths.
And the change basically amounts to throwing up our hands and saying
"screw it, we don't care about the intra-flow latency improvements of an
AQM". Which is, I guess, technically a valid choice when weighing the
tradeoffs, but I maintain that it is the wrong one.

Incidentally, removing the multiplication by the number of bulk flows
restores TCP intra-flow latency to be on par with (or even a bit better
than) FQ-CoDel, and it no longer scales with the number of active flows.

-Toke