Subject: Re: [Codel] [Cake] Proposing COBALT
From: moeller0
To: Jonathan Morton
Cc: cake@lists.bufferbloat.net, codel@lists.bufferbloat.net
Date: Fri, 20 May 2016 15:22:07 +0200

Hi Jonathan,

> On May 20, 2016, at 14:18, Jonathan Morton wrote:
>
>>> One of the major reasons why Codel fails on UDP floods is that its drop schedule is time-based. This is the correct behaviour for TCP flows, which respond adequately to one congestion signal per RTT, regardless of the packet rate. However, it means it is easily overwhelmed by high-packet-rate unresponsive (or anti-responsive, as with TCP acks) floods, which an attacker or lab test can easily produce on a high-bandwidth ingress, especially using small packets.
>>
>> In essence I agree, but I want to point out that the protocol itself does not really matter, only the observed behaviour of a flow. Civilized UDP applications (which expect their data to be carried over the best-effort internet) will also react to drops much as decent TCP flows do, and crappy TCP implementations might not. I would guess that, given the maturity of TCP stacks, misbehaving TCP flows will be rarer than misbehaving UDP flows (which might, for example, be well-behaved fixed-rate isochronous flows that simply should never have been sent over the internet).
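
	To make the "time-based" point concrete, here is a toy model of the drop schedule (my own simplified sketch of codel's control law, not the actual fq_codel code):

/*
 * Toy model of codel's drop schedule (my simplified sketch; the real
 * fq_codel code uses fixed-point inverse square roots). Once codel is
 * in the dropping state, the next drop is scheduled interval/sqrt(count)
 * after the previous one, so the number of congestion signals per second
 * depends only on elapsed time, not on the arrival rate in packets per
 * second. An unresponsive flood of small packets therefore loses only a
 * tiny fraction of what it sends.
 */
#include <math.h>
#include <stdint.h>

#define INTERVAL_NS (100ULL * 1000 * 1000) /* codel's default 100 ms interval */

struct codel_sched {
	uint32_t count;     /* drops since entering the dropping state */
	uint64_t drop_next; /* time of the next scheduled drop, in ns */
};

/* Called when a scheduled drop is carried out. */
static void codel_next_drop(struct codel_sched *s)
{
	s->count++;
	s->drop_next += (uint64_t)(INTERVAL_NS / sqrt((double)s->count));
}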
>
> Codel properly handles both actual TCP flows and other flows supporting TCP-friendly congestion control. The intent of COBALT is for BLUE to activate whenever Codel clearly cannot cope, rather than on a protocol-specific basis. This happens to dovetail neatly with the way BLUE works anyway.

	Well, as I said, I agree; I only wanted to smart-alec around the TCP-versus-UDP-flood distinction. And I fully agree that the behaviour should depend on observed flow behaviour and not on header values…

>
>>> BLUE's up-trigger should be on a packet drop due to overflow (only), targeting the individual subqueue managed by that particular BLUE instance. It is not correct to trigger BLUE globally when an overall overflow occurs. Note also that BLUE has a timeout between triggers, which should, I think, be scaled according to the estimated RTT.
>>
>> That sounds nice in that no additional state is required. But with the current fq_codel, I believe, the packet causing the memory-limit overrun is not necessarily from the flow that actually caused the problem to begin with; doesn't fq_codel actually search for the fattest flow and drop from there? But I guess that selection procedure could be run with BLUE as well.
>
> Yes, both fq_codel and Cake search for the longest extant queue and drop packets from that on overflow. It is this longest queue which would receive the BLUE up-trigger at that point, which is not necessarily the queue for the arriving packet.
>
>>> BLUE's down-trigger is on the subqueue being empty when a packet is requested from it, again on a timeout. To ensure this occurs, it may be necessary to retain subqueues in the DRR list while BLUE's drop probability is nonzero.
>>
>> Question: doesn't this mean the affected flow will be throttled quite harshly? Will BLUE slowly decrease the drop probability p if the flow behaves? If so, could BLUE just disengage once p drops below a threshold?
>
> Given that within COBALT, BLUE will normally only trigger on unresponsive flows, an aggressive up-trigger response from BLUE is in fact desirable.

	Sure; by that point the flow has had ample (or at least some) time to react but didn't, so a sliding tackle is warranted.

> Codel is far too meek to handle this situation; we should not seek to emulate it when designing a scheme to work around its limitations.

	And again, since we triggered BLUE by crossing a threshold, we know that codel's way of asking nicely whether the flow might reduce its bandwidth led nowhere…

>
> BLUE's down-trigger decreases the drop probability by a smaller amount (say 1/4000) than the up-trigger increases it (say 1/400). These figures are the best-performing configuration from the original paper, which is very readable, and behaviour doesn't seem to be especially sensitive to the precise values (though only highly-aggregated traffic was considered, and probably on a long timescale). For an actual implementation, I would choose convenient binary fractions, such as 1/256 up and 1/4096 down, and a relatively short trigger timeout.
>
> If the relative load from the flow decreases, BLUE's action will begin to leave the subqueue empty when serviced, causing BLUE's drop probability to fall off gradually, potentially until it reaches zero. At this point the subqueue is naturally reset and will react normally to subsequent traffic using it.

	But if we reach a queue length of codel's target (for some small amount of time), would that not be the best point in time to hand back to codel? Otherwise we push the queue to zero, only to have codel come in and let it grow back to target (well, approximately).
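
	Anyway, just to make sure we picture the same mechanism, here is a sketch of the BLUE half using your 1/256 up and 1/4096 down figures; the names, the fixed-point layout, and the 10 ms hold-off are my guesses, not actual cake/COBALT code:

/*
 * Sketch of per-subqueue BLUE state as discussed above (my names, not
 * actual cake/COBALT code). p is the drop probability; it is bumped on
 * an overflow drop and decayed when the subqueue runs empty at dequeue,
 * each subject to a hold-off timeout between triggers.
 */
#include <stdbool.h>
#include <stdint.h>

#define BLUE_ONE     (1u << 16)        /* fixed point: p == BLUE_ONE means 1.0 */
#define BLUE_INC     (BLUE_ONE / 256)  /* up-trigger step,   1/256  */
#define BLUE_DEC     (BLUE_ONE / 4096) /* down-trigger step, 1/4096 */
#define BLUE_HOLD_NS (10ULL * 1000 * 1000) /* trigger timeout; 10 ms is a guess */

struct blue_state {
	uint32_t p;            /* current drop probability */
	uint64_t last_trigger; /* time of the last trigger, in ns */
};

/* Up-trigger: this subqueue just lost a packet to a buffer overflow. */
static void blue_on_overflow_drop(struct blue_state *b, uint64_t now)
{
	if (now - b->last_trigger < BLUE_HOLD_NS)
		return;
	b->p += BLUE_INC;
	if (b->p > BLUE_ONE)
		b->p = BLUE_ONE;
	b->last_trigger = now;
}

/* Down-trigger: the subqueue was empty when a packet was requested. */
static void blue_on_empty(struct blue_state *b, uint64_t now)
{
	if (now - b->last_trigger < BLUE_HOLD_NS)
		return;
	b->p = (b->p > BLUE_DEC) ? b->p - BLUE_DEC : 0;
	b->last_trigger = now;
}

/* On enqueue: drop with probability p (rnd is uniform over 0..65535). */
static bool blue_should_drop(const struct blue_state *b, uint16_t rnd)
{
	return rnd < b->p;
}

	(With the subqueue retained in the DRR list while p > 0, as you say, so that the down-trigger actually gets a chance to fire and reset the state.)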
>
> The BLUE paper: http://www.eecs.umich.edu/techreports/cse/99/CSE-TR-387-99.pdf

	If I had time I would read that now ;)

>
>>> Note that this does nothing to improve the situation regarding fragmented packets. I think the correct solution in that case is to divert all fragments (including the first) into a particular queue dependent only on the host pair, by assuming zero for src and dst ports and a "special" protocol number.
>>
>> I believe the RFC recommends using the SRC IP, DST IP, Protocol, Identification tuple, as otherwise all fragmented flows between a host pair will hash into the same bucket…
>
> I disagree with that recommendation, because the Identification field will be different for each fragmented packet,

	Ah, I see; from RFC 791 (https://tools.ietf.org/html/rfc791):

		The identification field is used to distinguish the fragments of one datagram from those of another. The originating protocol module of an internet datagram sets the identification field to a value that must be unique for that source-destination pair and protocol for the time the datagram will be active in the internet system. The originating protocol module of a complete datagram sets the more-fragments flag to zero and the fragment offset to zero.

	I agree the identification field decidedly does the wrong thing, spreading even a single flow over all hash buckets. That leaves my proposal from earlier: extract the ports from the packet marked MF=1 with fragment offset 0, store them keyed by the identification field, and use the stored values to calculate the hash for all other packets of the same fragmented datagram… That sounds expensive enough to initially punt and use your idea, but it is certainly not ideal.

> even if many such packets belong to the same flow. This would spread these packets across many subqueues and give them an unfair advantage over normal flows, which is the opposite of what we want.
>
> Normal traffic does not include large numbers of fragmented packets (I would expect a mere handful, from certain one-shot request-response protocols which can produce large responses), so it is better to shunt them to a single queue per host-pair.

	This kind of special-casing can easily be abused as an attack vector… really, if at all possible, even fragmented flows should be hashed properly. If you are unlucky and set the wrong MTU for a PPPoE link, for example, all full-MTU packets will be fragmented, and it would be nice to show grace even under load ;)

Best Regards
	Sebastian

>
> - Jonathan Morton
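
P.S.: For what it is worth, here is your port-zeroing idea for fragments as I read it, sketched Linux-style; FRAG_PROTO and the names are mine, purely illustrative:

/*
 * Sketch of the port-zeroing idea for fragments (my names; FRAG_PROTO
 * is a placeholder for the "special" protocol number). Any IPv4 packet
 * with MF set or a nonzero fragment offset is keyed on the host pair
 * only, so all fragments between two hosts share one subqueue.
 */
#include <stdint.h>
#include <arpa/inet.h>
#include <linux/ip.h>

#define FRAG_PROTO 0xff /* placeholder "special" protocol number */

struct flow_key {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t  proto;
};

static void flow_key_fill(const struct iphdr *iph, uint16_t sport,
			  uint16_t dport, struct flow_key *key)
{
	key->saddr = iph->saddr;
	key->daddr = iph->daddr;
	/* frag_off carries the MF flag (0x2000) and the 13-bit offset */
	if (iph->frag_off & htons(0x3fff)) {
		/* a fragment: zero the ports, use the special protocol */
		key->sport = key->dport = 0;
		key->proto = FRAG_PROTO;
	} else {
		key->sport = sport;
		key->dport = dport;
		key->proto = iph->protocol;
	}
}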