From: Jonathan Morton
Date: Fri, 20 May 2016 15:18:11 +0300
To: moeller0
Cc: cake@lists.bufferbloat.net, codel@lists.bufferbloat.net
Subject: Re: [Cake] Proposing COBALT

>> One of the major reasons why Codel fails on UDP floods is that its drop schedule is time-based. This is the correct behaviour for TCP flows, which respond adequately to one congestion signal per RTT, regardless of the packet rate. However, it means it is easily overwhelmed by high-packet-rate unresponsive (or anti-responsive, as with TCP acks) floods, which an attacker or lab test can easily produce on a high-bandwidth ingress, especially using small packets.
>
> In essence I agree, but want to point out that the protocol itself does not really matter, but rather the observed behavior of a flow.
> Civilized UDP applications (that expect their data to be carried over the best-effort internet) will also react to drops similarly to decent TCP flows, and crappy TCP implementations might not. I would guess that, given the maturity of TCP stacks, misbehaving TCP flows will be rarer than misbehaving UDP flows (which might, for example, be well-behaved fixed-rate isochronous flows that simply should never have been sent over the internet).

Codel properly handles both actual TCP flows and other flows implementing TCP-friendly congestion control. The intent of COBALT is for BLUE to activate whenever Codel clearly cannot cope, rather than on a protocol-specific basis. This happens to dovetail neatly with the way BLUE works anyway.

>> BLUE's up-trigger should be on a packet drop due to overflow (only), targeting the individual subqueue managed by that particular BLUE instance. It is not correct to trigger BLUE globally when an overall overflow occurs. Note also that BLUE has a timeout between triggers, which I think should be scaled according to the estimated RTT.
>
> That sounds nice in that no additional state is required. But with the current fq_codel, I believe the packet causing the memory-limit overrun is not necessarily from the flow that actually caused the problem to begin with; and doesn't fq_codel actually search for the fattest flow and drop from there? But I guess that selection procedure could be run with BLUE as well.

Yes, both fq_codel and Cake search for the longest extant queue and drop packets from that on overflow. It is this longest queue which would receive the BLUE up-trigger at that point, which is not necessarily the queue for the arriving packet.

>> BLUE's down-trigger is on the subqueue being empty when a packet is requested from it, again on a timeout. To ensure this occurs, it may be necessary to retain subqueues in the DRR list while BLUE's drop probability is nonzero.
>
> Question: doesn't this mean the affected flow will be throttled quite harshly? Will BLUE slowly decrease the drop probability p if the flow behaves? If so, could BLUE just disengage if p drops below a threshold?

Given that, within COBALT, BLUE will normally only trigger on unresponsive flows, an aggressive up-trigger response from BLUE is in fact desirable. Codel is far too meek to handle this situation; we should not seek to emulate it when designing a scheme to work around its limitations.

BLUE's down-trigger decreases the drop probability by a smaller amount (say 1/4000) than the up-trigger increases it (say 1/400). These figures are the best-performing configuration from the original paper, which is very readable, and behaviour doesn't seem to be especially sensitive to the precise values (though only highly-aggregated traffic was considered, and probably on a long timescale). For an actual implementation, I would choose convenient binary fractions, such as 1/256 up and 1/4096 down, and a relatively short trigger timeout.

If the relative load from the flow decreases, BLUE's action will begin to leave the subqueue empty when serviced, causing BLUE's drop probability to fall off gradually, potentially until it reaches zero. At this point the subqueue is naturally reset and will react normally to subsequent traffic using it.
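Putting the above together, a rough sketch of the per-subqueue BLUE state might look something like this. The names, the fixed-point format and the 10 ms trigger timeout are purely illustrative placeholders of mine, not actual Cake or fq_codel code:

/* Illustrative sketch only -- names and the 10 ms timeout are assumptions,
 * not actual Cake or fq_codel code. */

#include <stdint.h>
#include <stdbool.h>

#define BLUE_P_ONE      (1u << 16)           /* fixed-point 1.0                */
#define BLUE_P_INC      (BLUE_P_ONE / 256)   /* up-trigger step (1/256)        */
#define BLUE_P_DEC      (BLUE_P_ONE / 4096)  /* down-trigger step (1/4096)     */
#define BLUE_TIMEOUT_NS (10 * 1000000ULL)    /* trigger timeout; 10 ms assumed */

struct cobalt_blue {
    uint32_t p;            /* drop probability, 16-bit fixed point */
    uint64_t last_trigger; /* time of the last up- or down-trigger */
};

/* Up-trigger: this subqueue suffered a drop due to buffer overflow
 * (i.e. it was the longest queue at that moment). */
static void blue_overflow_trigger(struct cobalt_blue *b, uint64_t now)
{
    if (now - b->last_trigger < BLUE_TIMEOUT_NS)
        return;
    b->last_trigger = now;
    b->p = (b->p + BLUE_P_INC < BLUE_P_ONE) ? b->p + BLUE_P_INC : BLUE_P_ONE;
}

/* Down-trigger: the subqueue was found empty at dequeue time.  For this
 * to fire, the subqueue must stay on the DRR list while p > 0. */
static void blue_empty_trigger(struct cobalt_blue *b, uint64_t now)
{
    if (now - b->last_trigger < BLUE_TIMEOUT_NS)
        return;
    b->last_trigger = now;
    b->p = (b->p > BLUE_P_DEC) ? b->p - BLUE_P_DEC : 0;
}

/* Random drop decision: prng16 is a uniform 16-bit random value supplied
 * by the caller.  Once p decays to zero the subqueue behaves normally. */
static bool blue_should_drop(const struct cobalt_blue *b, uint32_t prng16)
{
    return prng16 < b->p;
}

Codel would continue to run on the same subqueue as usual; in this sketch BLUE only adds drops on top of it while its probability is nonzero, which matches the intent of having it take over when Codel clearly cannot cope.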
The BLUE paper: http://www.eecs.umich.edu/techreports/cse/99/CSE-TR-387-99.pdf

>> Note that this does nothing to improve the situation regarding fragmented packets. I think the correct solution in that case is to divert all fragments (including the first) into a particular queue dependent only on the host pair, by assuming zero for src and dst ports and a "special" protocol number.
>
> I believe the RFC recommends using the SRC IP, DST IP, Protocol, Identity tuple, as otherwise all fragmented flows between a host pair will hash into the same bucket…

I disagree with that recommendation, because the Identity field will be different for each fragmented packet, even if many such packets belong to the same flow. This would spread these packets across many subqueues and give them an unfair advantage over normal flows, which is the opposite of what we want.

Normal traffic does not include large numbers of fragmented packets (I would expect a mere handful, from certain one-shot request-response protocols which can produce large responses), so it is better to shunt them to a single queue per host pair.

 - Jonathan Morton
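P.S. Purely to illustrate the fragment handling described above, here is a hypothetical sketch (not actual Cake code; the names and the reserved FRAG_PROTO value are placeholders) of building the flow key so that all fragments between a host pair share one subqueue:

/* Hypothetical sketch of fragment-aware flow-key construction; names and
 * the FRAG_PROTO value are placeholders, not actual Cake code. */

#include <stdint.h>
#include <arpa/inet.h>
#include <netinet/ip.h>

#define FRAG_PROTO 0xff  /* placeholder "special" protocol number */

struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

static void fill_flow_key(const struct iphdr *iph, struct flow_key *key)
{
    uint16_t frag = ntohs(iph->frag_off);

    key->saddr = iph->saddr;
    key->daddr = iph->daddr;

    if (frag & (IP_MF | IP_OFFMASK)) {
        /* Any fragment, including the first (MF set, offset zero): key on
         * the host pair only, with zeroed ports and the special protocol
         * number, so all fragments land in the same subqueue. */
        key->sport = 0;
        key->dport = 0;
        key->proto = FRAG_PROTO;
    } else {
        /* Unfragmented packet: normal 5-tuple (port extraction from the
         * transport header elided here). */
        key->sport = 0; /* fill from transport header */
        key->dport = 0; /* fill from transport header */
        key->proto = iph->protocol;
    }
}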