Subject: Re: [Codel] [Cake] Proposing COBALT
From: moeller0
To: Jonathan Morton
Cc: cake@lists.bufferbloat.net, codel@lists.bufferbloat.net
Date: Fri, 20 May 2016 15:22:07 +0200

Hi Jonathan,

> On May 20, 2016, at 14:18, Jonathan Morton wrote:
>
>>> One of the major reasons why Codel fails on UDP floods is that its drop schedule is time-based. This is the correct behaviour for TCP flows, which respond adequately to one congestion signal per RTT, regardless of the packet rate. However, it means it is easily overwhelmed by high-packet-rate unresponsive (or anti-responsive, as with TCP acks) floods, which an attacker or lab test can easily produce on a high-bandwidth ingress, especially using small packets.
>>
>> In essence I agree, but I want to point out that the protocol itself does not really matter, only the observed behaviour of a flow. Civilized UDP applications (which expect their data to be carried over the best-effort internet) will also react to drops much as decent TCP flows do, and crappy TCP implementations might not. I would guess that, given the maturity of TCP stacks, misbehaving TCP flows will be rarer than misbehaving UDP flows (which might, for example, be well-behaved fixed-rate isochronous flows that simply should never have been sent over the internet).
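
	To make the "time-based" point concrete, here is a toy model of the drop schedule (my own simplified sketch of codel's control law, not the actual fq_codel code):

/*
 * Toy model of codel's drop schedule (my simplified sketch; the real
 * fq_codel code uses fixed-point inverse square roots). Once codel is
 * in the dropping state, the next drop is scheduled interval/sqrt(count)
 * after the previous one, so the number of congestion signals per second
 * depends only on elapsed time, not on the arrival rate in packets per
 * second. An unresponsive flood of small packets therefore loses only a
 * tiny fraction of what it sends.
 */
#include <math.h>
#include <stdint.h>

#define INTERVAL_NS (100ULL * 1000 * 1000) /* codel's default 100 ms interval */

struct codel_sched {
	uint32_t count;     /* drops since entering the dropping state */
	uint64_t drop_next; /* time of the next scheduled drop, in ns */
};

/* Called when a scheduled drop is carried out. */
static void codel_next_drop(struct codel_sched *s)
{
	s->count++;
	s->drop_next += (uint64_t)(INTERVAL_NS / sqrt((double)s->count));
}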
>
> Codel properly handles both actual TCP flows and other flows supporting TCP-friendly congestion control. The intent of COBALT is for BLUE to activate whenever Codel clearly cannot cope, rather than on a protocol-specific basis. This happens to dovetail neatly with the way BLUE works anyway.

	Well, as I said, I agree; I only wanted to smart-alec around the TCP-versus-UDP-flood distinction. And I fully agree that the behaviour should depend on observed flow behaviour and not on header values…

>
>>> BLUE's up-trigger should be on a packet drop due to overflow (only), targeting the individual subqueue managed by that particular BLUE instance. It is not correct to trigger BLUE globally when an overall overflow occurs. Note also that BLUE has a timeout between triggers, which should, I think, be scaled according to the estimated RTT.
>>
>> That sounds nice in that no additional state is required. But with the current fq_codel, I believe, the packet causing the memory-limit overrun is not necessarily from the flow that actually caused the problem to begin with; doesn't fq_codel actually search for the fattest flow and drop from there? But I guess that selection procedure could be run with BLUE as well.
>
> Yes, both fq_codel and Cake search for the longest extant queue and drop packets from that on overflow. It is this longest queue which would receive the BLUE up-trigger at that point, which is not necessarily the queue for the arriving packet.
>
>>> BLUE's down-trigger is on the subqueue being empty when a packet is requested from it, again on a timeout. To ensure this occurs, it may be necessary to retain subqueues in the DRR list while BLUE's drop probability is nonzero.
>>
>> Question: doesn't this mean the affected flow will be throttled quite harshly? Will BLUE slowly decrease the drop probability p if the flow behaves? If so, could BLUE just disengage once p drops below a threshold?
>
> Given that within COBALT, BLUE will normally only trigger on unresponsive flows, an aggressive up-trigger response from BLUE is in fact desirable.

	Sure; by that point the flow has had ample (or at least some) time to react but didn't, so a sliding tackle is warranted.

> Codel is far too meek to handle this situation; we should not seek to emulate it when designing a scheme to work around its limitations.

	And again, since we triggered BLUE by crossing a threshold, we know that codel's way of asking nicely whether the flow might reduce its bandwidth led nowhere…

>
> BLUE's down-trigger decreases the drop probability by a smaller amount (say 1/4000) than the up-trigger increases it (say 1/400). These figures are the best-performing configuration from the original paper, which is very readable, and behaviour doesn't seem to be especially sensitive to the precise values (though only highly-aggregated traffic was considered, and probably on a long timescale). For an actual implementation, I would choose convenient binary fractions, such as 1/256 up and 1/4096 down, and a relatively short trigger timeout.
>
> If the relative load from the flow decreases, BLUE's action will begin to leave the subqueue empty when serviced, causing BLUE's drop probability to fall off gradually, potentially until it reaches zero. At this point the subqueue is naturally reset and will react normally to subsequent traffic using it.

	But if we reach a queue length of codel's target (for some small amount of time), would that not be the best point in time to hand back to codel? Otherwise we push the queue to zero, only to have codel come in and let it grow back to target (well, approximately).
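
	Anyway, just to make sure we picture the same mechanism, here is a sketch of the BLUE half using your 1/256 up and 1/4096 down figures; the names, the fixed-point layout, and the 10 ms hold-off are my guesses, not actual cake/COBALT code:

/*
 * Sketch of per-subqueue BLUE state as discussed above (my names, not
 * actual cake/COBALT code). p is the drop probability; it is bumped on
 * an overflow drop and decayed when the subqueue runs empty at dequeue,
 * each subject to a hold-off timeout between triggers.
 */
#include <stdbool.h>
#include <stdint.h>

#define BLUE_ONE     (1u << 16)        /* fixed point: p == BLUE_ONE means 1.0 */
#define BLUE_INC     (BLUE_ONE / 256)  /* up-trigger step,   1/256  */
#define BLUE_DEC     (BLUE_ONE / 4096) /* down-trigger step, 1/4096 */
#define BLUE_HOLD_NS (10ULL * 1000 * 1000) /* trigger timeout; 10 ms is a guess */

struct blue_state {
	uint32_t p;            /* current drop probability */
	uint64_t last_trigger; /* time of the last trigger, in ns */
};

/* Up-trigger: this subqueue just lost a packet to a buffer overflow. */
static void blue_on_overflow_drop(struct blue_state *b, uint64_t now)
{
	if (now - b->last_trigger < BLUE_HOLD_NS)
		return;
	b->p += BLUE_INC;
	if (b->p > BLUE_ONE)
		b->p = BLUE_ONE;
	b->last_trigger = now;
}

/* Down-trigger: the subqueue was empty when a packet was requested. */
static void blue_on_empty(struct blue_state *b, uint64_t now)
{
	if (now - b->last_trigger < BLUE_HOLD_NS)
		return;
	b->p = (b->p > BLUE_DEC) ? b->p - BLUE_DEC : 0;
	b->last_trigger = now;
}

/* On enqueue: drop with probability p (rnd is uniform over 0..65535). */
static bool blue_should_drop(const struct blue_state *b, uint16_t rnd)
{
	return rnd < b->p;
}

	(With the subqueue retained in the DRR list while p > 0, as you say, so that the down-trigger actually gets a chance to fire and reset the state.)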
>
> The BLUE paper: http://www.eecs.umich.edu/techreports/cse/99/CSE-TR-387-99.pdf

	If I had time I would read that now ;)

>
>>> Note that this does nothing to improve the situation regarding fragmented packets. I think the correct solution in that case is to divert all fragments (including the first) into a particular queue dependent only on the host pair, by assuming zero for src and dst ports and a "special" protocol number.
>>
>> I believe the RFC recommends using the SRC IP, DST IP, Protocol, Identification tuple, as otherwise all fragmented flows between a host pair will hash into the same bucket…
>
> I disagree with that recommendation, because the Identification field will be different for each fragmented packet,

	Ah, I see; from RFC 791 (https://tools.ietf.org/html/rfc791):

		The identification field is used to distinguish the fragments of one datagram from those of another. The originating protocol module of an internet datagram sets the identification field to a value that must be unique for that source-destination pair and protocol for the time the datagram will be active in the internet system. The originating protocol module of a complete datagram sets the more-fragments flag to zero and the fragment offset to zero.

	I agree the identification field decidedly does the wrong thing, spreading even a single flow over all hash buckets. That leaves my proposal from earlier: extract the ports from the packet marked MF=1 with fragment offset 0, store them keyed by the identification field, and use the stored values to calculate the hash for all other packets of the same fragmented datagram… That sounds expensive enough to initially punt and use your idea, but it is certainly not ideal.

> even if many such packets belong to the same flow. This would spread these packets across many subqueues and give them an unfair advantage over normal flows, which is the opposite of what we want.
>
> Normal traffic does not include large numbers of fragmented packets (I would expect a mere handful, from certain one-shot request-response protocols which can produce large responses), so it is better to shunt them to a single queue per host-pair.

	This kind of special-casing can easily be abused as an attack vector… really, if at all possible, even fragmented flows should be hashed properly. If you are unlucky and set the wrong MTU for a PPPoE link, for example, all full-MTU packets will be fragmented, and it would be nice to show grace even under load ;)

Best Regards
	Sebastian

>
> - Jonathan Morton
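
P.S.: For what it is worth, here is your port-zeroing idea for fragments as I read it, sketched Linux-style; FRAG_PROTO and the names are mine, purely illustrative:

/*
 * Sketch of the port-zeroing idea for fragments (my names; FRAG_PROTO
 * is a placeholder for the "special" protocol number). Any IPv4 packet
 * with MF set or a nonzero fragment offset is keyed on the host pair
 * only, so all fragments between two hosts share one subqueue.
 */
#include <stdint.h>
#include <arpa/inet.h>
#include <linux/ip.h>

#define FRAG_PROTO 0xff /* placeholder "special" protocol number */

struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t  proto;
};

static void flow_key_fill(const struct iphdr *iph, uint16_t sport,
			  uint16_t dport, struct flow_key *key)
{
	key->saddr = iph->saddr;
	key->daddr = iph->daddr;
	/* frag_off carries the MF flag (0x2000) and the 13-bit offset */
	if (iph->frag_off & htons(0x3fff)) {
		/* a fragment: zero the ports, use the special protocol */
		key->sport = key->dport = 0;
		key->proto = FRAG_PROTO;
	} else {
		key->sport = sport;
		key->dport = dport;
		key->proto = iph->protocol;
	}
}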