From: Jonathan Morton
Date: Tue, 3 Nov 2015 18:43:12 +0200
To: Toke Høiland-Jørgensen
Cc: cake@lists.bufferbloat.net
Subject: Re: [Cake] Long-RTT broken again
Message-Id: <50C2A7B7-1B81-41E1-B534-CA449296FE77@gmail.com>
In-Reply-To: <874mh3pai9.fsf@toke.dk>
References: <87pozspckj.fsf@toke.dk> <6A2609D9-7747-487B-9484-ECC69C50DE96@gmx.de> <874mh3pai9.fsf@toke.dk>

> On 3 Nov, 2015, at 13:50, Toke Høiland-Jørgensen wrote:
>
>> The question remains why a 15MB buffer (which comfortably exceeds the
>> traditional FIFO rule of thumb for 1 second * 100Mbps) is apparently
>> insufficient according to Toke’s tests, even with the target increased
>> as requested.
>
> Because it's not a 15MB buffer; it's a 10240 packet buffer. So anything
> from ~.5 to ~15MB. In this case, it's a bidirectional test, so about
> half the packets will be tiny ACKs. I guess doing byte accounting would
> actually be better here, regardless of what happens to the overall
> limit... :)

Cake does the queue accounting in bytes, and calculates 15MB (by
default) as the upper limit. It’s *not* meant to be a packet buffer.

However, the bytes counted are those allocated, not the on-wire packet
sizes, because this limit is meant to avoid consuming all of a small
router’s RAM for one queue.
This isn’t just about hard OOM, which the kernel probably has handling
for already, but sharing RAM between different queues on the same
device; note the different behaviour of the upload and download streams
in the results given.

The only way this could behave like a “packet buffer” instead of a
byte-accounted queue is if there is a fixed size allocation per packet,
regardless of the size of said packet. There are hints that this might
actually be the case, and that the allocation is a hugely wasteful (for
an ack) 2KB. (This also means that it’s not a 10240 packet buffer, but
about 7500.)

But in a bidirectional TCP scenario with ECN, only about a third of the
packets should be acks (ignoring the relatively low number of ICMP and
UDP probes); ECN causes an ack to be sent immediately, but then normal
delayed-ack processing should resume. This makes 6KB allocated per ~3KB
transmitted. The effective buffer size is thus 7.5MB, which is still
compliant with the traditional rule of thumb (BDP / sqrt(flows)), given
that there are four bulk flows each way.

This effect is therefore not enough to explain the huge deficit Toke
measured. The calculus also changes by only a small factor if we ignore
delayed acks, making 8KB allocated per 3KB transmitted.

So, again - what’s going on? Are there any clues in packet traces with
sequence analysis?

I’ll put in a configurable memory limit anyway, but I really do want to
understand why this is happening.

 - Jonathan Morton
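
For reference, the buffer arithmetic in the message can be checked with a
short sketch. This is only an illustration under stated assumptions: the
2KB per-packet allocation is the suspected figure, not a confirmed one;
the 1500-byte data packet, the decimal units, and every name below are
hypothetical and do not come from Cake's source.

```python
import math

# Figures from the message, or assumptions where marked.
LIMIT_BYTES = 15_000_000       # Cake's default upper limit (15MB, decimal units assumed)
ALLOC_PER_PACKET = 2_000       # suspected fixed allocation per packet (2KB, unconfirmed)
DATA_PACKET_WIRE = 1_500       # assumed on-wire size of a full data packet
LINK_BYTES_PER_S = 100_000_000 / 8   # 100Mbps link
RTT_S = 1.0                    # 1 second, as in the FIFO rule of thumb quoted
BULK_FLOWS = 4                 # four bulk flows each way


def max_packets() -> int:
    """Packets that fit if every packet costs a fixed allocation."""
    return LIMIT_BYTES // ALLOC_PER_PACKET


def effective_buffer(data_packets_per_ack: int) -> float:
    """On-wire data bytes that fit in the limit for a given data:ack mix.

    Only data bytes are counted as transmitted, matching the ~3KB figure.
    """
    allocated = (data_packets_per_ack + 1) * ALLOC_PER_PACKET
    transmitted = data_packets_per_ack * DATA_PACKET_WIRE
    return LIMIT_BYTES * transmitted / allocated


def rule_of_thumb() -> float:
    """Traditional buffer sizing: BDP / sqrt(flows)."""
    return LINK_BYTES_PER_S * RTT_S / math.sqrt(BULK_FLOWS)


if __name__ == "__main__":
    print("packets that fit:", max_packets())
    print("effective buffer, delayed acks:", effective_buffer(2))
    print("effective buffer, ack per packet:", effective_buffer(1))
    print("BDP / sqrt(flows):", rule_of_thumb())
```

Under these assumptions the delayed-ack case yields 7.5MB of effective
buffer against a rule-of-thumb figure of 6.25MB, and dropping delayed
acks shrinks the effective buffer by a factor of 0.75.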