From: Jonathan Morton
Date: Sun, 23 Jun 2019 02:07:08 +0300
To: "David P. Reed"
Cc: Brian E Carpenter, ecn-sane@lists.bufferbloat.net, tsvwg IETF list
Subject: Re: [Ecn-sane] [tsvwg] per-flow scheduling

> On 23 Jun, 2019, at 1:09 am, David P. Reed wrote:
>
> per-flow scheduling is appropriate on a shared link.
> However, the end-to-end argument would suggest that the network not try to divine which flows get preferred.
> And beyond the end-to-end argument, there's a practical problem - since the ideal state of a shared link means that it ought to have no local backlog in the queue, the information needed to schedule "fairly" isn't in the queue backlog itself. If there is only one packet, what's to schedule?

This is a great straw-man. Allow me to deconstruct it.

The concept that DRR++ has empirically proved is that flows can be classified into two categories - sparse and saturating - very easily, by the heuristic that a saturating flow's arrival rate exceeds its available delivery rate, while the opposite is true for a sparse flow.

An excessive arrival rate results in a standing queue; with Reno, the excess arrival rate after capacity is reached is precisely 1 segment per RTT, which is very small next to modern link capacities. If there is no overall standing queue, then by definition all of the flows passing through are currently sparse. DRR++ (as implemented in fq_codel and Cake) ensures that all sparse traffic is processed with minimum delay and no AQM activity, while saturating traffic is metered out fairly and given appropriate AQM signals.

> In fact, what the ideal queueing discipline would do is send signals to the endpoints that provide information as to what each flow's appropriate share is, and/or how far its current share is from what's fair.

The definition of which flows are sparse and which are saturating shifts dynamically in response to endpoint behaviour.

> Well, presumably the flows have definable average rates.

Today's TCP traffic exhibits the classic sawtooth behaviour - which has a different shape and period with CUBIC than with Reno, but is fundamentally similar. The sender probes capacity by increasing its send rate until a congestion signal is fed back to it, at which point it drops back sharply. With efficient AQM action, a TCP flow will therefore spend most of its time "sparse", using less than the available path capacity, with occasional excursions into "saturating" territory which are fairly promptly terminated by AQM signals.

So TCP does *not* have a definable "average rate". It grows to fill the available capacity, just like the number of cars on a motorway network.

The recent work on high-fidelity ECN (including SCE) aims to eliminate the sawtooth, so that dropping out of "saturating" mode happens sooner and by only a small margin, wasting less capacity and reducing peak delays - very close to the ideal control you describe. But it is still necessary to withhold these signals from "sparse" flows, which would otherwise back off and waste capacity, and to deliver them only to "saturating" flows that have just begun to build a queue. It is also necessary to protect these well-behaved "modern" flows from "legacy" endpoint behaviour, and vice versa. DRR++ does all of that very naturally.

> Merely re-ordering the packets on a link is just not very effective at achieving fairness.

I'm afraid this assertion is simply false. DRR++ does precisely that, and achieves near-perfect fairness.

It is important, however, to define "flow" correctly relative to the measure of fairness you want to achieve. Traditionally the unique 5-tuple is used to define a "flow", but this means applications can game the system by opening multiple flows. For an ISP, a better definition might be that each subscriber's traffic is one "flow"; or there is a tweak to DRR++ which allows a two-layer fairness definition, implemented successfully in Cake.
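To make the mechanism concrete, here is a toy sketch in Python, written just for this discussion - it is not the fq_codel or Cake source, and the flow names, quantum and packet representation are invented for illustration. It shows the core DRR++ idea: a flow arriving into an empty queue sits on a "new" list and is served ahead of flows on the "old" list, while a byte deficit meters the saturating flows out fairly. Cake's two-layer mode conceptually nests the same structure, sharing capacity first between subscribers and then between each subscriber's flows; the real implementations also handle several corner cases that this sketch glosses over.

from collections import deque

QUANTUM = 1514  # byte credit per scheduling round, roughly one full-size packet

class Flow:
    def __init__(self):
        self.pkts = deque()   # queued packet lengths, in bytes
        self.deficit = 0      # DRR byte credit

class DRRPP:
    def __init__(self):
        self.flows = {}           # flow id -> Flow
        self.new_flows = deque()  # flows currently classed as sparse: served first
        self.old_flows = deque()  # flows currently classed as saturating

    def enqueue(self, fid, pkt_len):
        f = self.flows.setdefault(fid, Flow())
        if not f.pkts and fid not in self.new_flows and fid not in self.old_flows:
            # A flow arriving into an empty queue is treated as sparse.
            f.deficit = QUANTUM
            self.new_flows.append(fid)
        f.pkts.append(pkt_len)

    def dequeue(self):
        while self.new_flows or self.old_flows:
            lst = self.new_flows or self.old_flows
            fid = lst[0]
            f = self.flows[fid]
            if f.deficit <= 0:
                # Out of credit: the flow is consuming its fair share or more,
                # so it joins the back of the saturating list and waits its turn.
                f.deficit += QUANTUM
                lst.popleft()
                self.old_flows.append(fid)
                continue
            if not f.pkts:
                # Queue drained: the flow leaves the schedule; its next packet
                # will re-enter it as sparse.
                lst.popleft()
                continue
            pkt_len = f.pkts.popleft()
            f.deficit -= pkt_len
            return fid, pkt_len
        return None   # nothing queued at all

A tiny usage example, showing a sparse arrival overtaking a standing queue:

sched = DRRPP()
for _ in range(3):
    sched.enqueue("bulk", 1514)      # a flow with a standing queue
print(sched.dequeue())               # ('bulk', 1514); bulk is now out of credit
sched.enqueue("dns", 80)             # a sparse flow arrives behind the backlog
print(sched.dequeue())               # ('dns', 80) - served ahead of bulk's remaining packets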
> So the end-to-end approach would suggest moving most of the scheduling back to the endpoints of each flow, with the role of the routers being to extract information about the competing flows that are congesting the network, and forwarding those signals (via drops or marking) to the endpoints. That's because, in the end-to-end argument that applies here - the router cannot do the entire function of managing congestion or priority.

It must be remembered that congestion signals require one RTT to circulate from the bottleneck, via the receiver, back to the sender, and for their effects then to be felt at the bottleneck. That is typically a much longer response time (say 100ms for a general Internet path) than can be achieved by packet scheduling (sub-millisecond for a 20Mbps link), and it therefore effects only a looser control (by fundamental control theory). Both mechanisms are useful, and they complement each other.

My personal interpretation of the end-to-end principle is that endpoints generally do not, cannot, and *should not* be aware of the topology of the network between them, nor of any other traffic that might be sharing that network. The network itself takes care of those details, and may send standardised control-feedback signals to the endpoints to inform them of actions they need to take. These currently take the form of ICMP error packets and the ECN field, the latter substituted by packet drops on Not-ECT flows.

 - Jonathan Morton
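P.S. To put rough numbers on those two timescales - a back-of-envelope sketch only, assuming a 1500-byte MTU and the 20Mbps / 100ms figures above:

MTU_BITS = 1500 * 8      # one full-size packet
LINK_BPS = 20e6          # 20 Mbit/s bottleneck link
RTT_S    = 0.100         # nominal general-Internet round trip

serialisation = MTU_BITS / LINK_BPS   # the granularity a packet scheduler works at
print(f"scheduler granularity: {serialisation * 1e3:.1f} ms per packet")   # ~0.6 ms
print(f"end-to-end signal loop: {RTT_S * 1e3:.0f} ms")                     # 100 ms
print(f"ratio: roughly {RTT_S / serialisation:.0f}x slower")               # ~167x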