From: Sebastian Moeller
To: Bob Briscoe
Cc: "De Schepper, Koen (Nokia - BE/Antwerp)", "Black, David", "ecn-sane@lists.bufferbloat.net", "tsvwg@ietf.org", Dave Taht
Date: Fri, 26 Jul 2019 00:00:11 +0200
Subject: Re: [Ecn-sane] [tsvwg] Comments on L4S drafts
Message-Id: <3A454B00-AEBC-48B6-9A8A-922C66E884A7@gmx.de>
In-Reply-To: <619092c0-640f-56c2-19c9-1cc486180c8b@bobbriscoe.net>

Dear Bob,

thanks for your time and insight. More comments below. I will try to follow your style.

> On Jul 25, 2019, at 23:17, Bob Briscoe wrote:
>
> Sebastien,
>
> Sry, I sent that last reply too early, and not bottom posted.
> Both corrected below (tagged [BB]):
>
> On 25/07/2019 16:51, Bob Briscoe wrote:
>> Sebastien,
>>
>> On 21/07/2019 16:48, Sebastian Moeller wrote:
>>> Dear Bob,
>>>
>>>> On Jul 21, 2019, at 21:14, Bob Briscoe wrote:
>>>>
>>>> Sebastien,
>>>>
>>>> On 21/07/2019 17:08, Sebastian Moeller wrote:
>>>>
>>>>> Hi Bob,
>>>>>
>>>>>> On Jul 21, 2019, at 14:30, Bob Briscoe wrote:
>>>>>>
>>>>>> David,
>>>>>>
>>>>>> On 19/07/2019 21:06, Black, David wrote:
>>>>>>
>>>>>>> Two comments as an individual, not as a WG chair:
>>>>>>>
>>>>>>>> Mostly, they're things that an end-host algorithm needs
>>>>>>>> to do in order to behave nicely, that might be good things anyways
>>>>>>>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>>>>>>>> work well w/ small RTT, be robust to reordering). I am curious which
>>>>>>>> ones you think are too rigid ... maybe they can be loosened?
>>>>>>>>
>>>>>>> [1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience. Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit. The environments that I have in mind include a lot of data centers.
>>>>>>> Process-wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).
>>>>>>>
>>>>>>> For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.
>>>>>>>
>>>>>> As you know, we have been at pains to address every concern about L4S that has come up over the years, and I thought we had addressed this one to your satisfaction.
>>>>>>
>>>>>> The reliable transports you are concerned about require ordered delivery by the underlying fabric, so they can only ever exist in a controlled environment. In such a controlled environment, your ECT1+DSCP idea (below) could be used to isolate the L4S experiment from these transports and their firmware/hardware constraints.
>>>>>>
>>>>>> On the public Internet, the DSCP commonly gets wiped at the first hop. So requiring a DSCP as well as ECT1 to separate off L4S would serve no useful purpose: it would still lead to ECT1 packets without the DSCP sent from scalable congestion controls (which is behind Jonathan's concern in response to you).
>>>>>>
>>>>> And this is why IPv4's protocol field / IPv6's next header field are the classifier you actually need...
>>>>> You are changing a significant portion of TCP's observable behavior, so it can be argued that TCP-Prague is TCP by name only. This "classifier" still lives in the IP header, so no deeper layers need to be accessed; it is non-leaky in that the classifier is unambiguously present independent of the value of the ECN bits; and it is also compatible with SCE-style ECN signaling. Since I believe the most/only likely roll-out of L4S is going to be at the ISPs' access nodes (BRAS/BNG/CMTS/whatever), middleboxes should not be an unsurmountable problem, as ISPs control their own middleboxes and often even the CPEs, so protocol ossification is not going to be a showstopper for this part of the roll-out.
>>>>>
>>>>> Best Regards
>>>>> Sebastian
>>>>>
>>>> I think you've understood this from reading an abbreviated description of the requirement on the list, rather than the spec. The spec. solely says:
>>>> A scalable congestion control MUST detect loss by counting in time-based units
>>>> That's all. No more, no less.
>>>>
>>>> People call this the "RACK requirement", purely because the idea came from RACK. There is no requirement to do RACK, and the requirement applies to all transports, not just TCP.
>>>>
>>> Fair enough, but my argument was not really about RACK at all; it applies more to the linear response to CE-marks that ECT(1) promises in the L4S approach. You are making changes to TCP's congestion controller that make it cease to be "TCP-friendly" (for arguably good reasons). So why insist on pretending that this is still TCP? Give it a new protocol ID already and all your classification needs are solved. As a bonus you do not need to use the same signal (CE) to elicit two different responses, but you could use the re-gained ECT(1) code point similarly to SCE to put the new fine-grained congestion signal into... while using CE in the RFC3168 compliant sense.
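To make the "non-leaky classifier" argument concrete, here is a minimal sketch that compares classifying on the ECN field with classifying on the IPv4 Protocol field. It rests on assumptions that are not in this thread: that an ECN-based L4S classifier treats both ECT(1) and CE as L4S, and that a dedicated transport would carry its own protocol number. The value 253 is only a placeholder taken from the range reserved for experimentation, and all function names are hypothetical.

```python
# ECN codepoints in the two low-order bits of the IPv4 TOS byte (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

TCP = 6
# Hypothetical stand-in for a dedicated L4S transport protocol number.
# 253 is reserved for experimentation; it is not an actual assignment.
HYPOTHETICAL_L4S_PROTO = 253


def make_ipv4_header(ecn: int, proto: int) -> bytes:
    """Build a minimal 20-byte IPv4 header (checksum and addresses zeroed)."""
    return bytes([0x45, ecn & 0b11, 0, 20, 0, 0, 0, 0, 64, proto]) + bytes(10)


def classify_by_ecn(header: bytes) -> str:
    """ECN-field classifier: ECT(1) and CE are both treated as L4S."""
    ecn = header[1] & 0b11
    return "L4S" if ecn in (ECT1, CE) else "classic"


def classify_by_protocol(header: bytes) -> str:
    """Protocol-field classifier: the result does not depend on the ECN bits."""
    return "L4S" if header[9] == HYPOTHETICAL_L4S_PROTO else "classic"


# A CE-marked packet that belongs to an RFC 3168 TCP flow.
marked_classic = make_ipv4_header(ecn=CE, proto=TCP)
print(classify_by_ecn(marked_classic), classify_by_protocol(marked_classic))
```

Once an upstream hop has set CE, the ECN-based classifier can no longer tell which kind of sender the packet came from, whereas the protocol-based one still can. The sketch says nothing about the deployment problems (middleboxes, encapsulation) that the protocol field brings with it.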
>
> [BB] The protocol ID identifies the wire protocol, not the congestion control behaviour. If we had used a different protocol ID for each congestion control behaviour, we'd have run out of protocol IDs long ago (semi serious ;)

[SM] Yes, I know, but you are proposing a massively incompatible "congestion control behaviour" for L4S that is not TCP-friendly; otherwise you would not need to deal with isolating your new style flows from the rest. For convenience (and since most of the other components are TCP-like) you package the whole thing as a congestion control module for TCP. My argument is, do not do that.

As an aside, with this approach you are still at the mercy of OS and router manufacturers (okay, Linux should be easy, but what is the plan of attack to get L4S behaviour into Windows' TCP implementation?). To me it seems your best bet would be to create a library that does your L4S type response on top of UDP (you get resequencing tolerance for free ;) ). As long as you supply that library for all important OSes, application writers can opt in without the need for OSes to change, but that is an aside.

>
> This is a re-run of a debate that has already been had (in Jul 2015 - Nov 2016), which is recorded in the appendix of ecn-l4s-id here:
> https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#appendix-B.4

[SM] I read it there; I just believe that the final choice of identifier was not the optimal one (I know this is all about trade-offs, I just happen to have different priorities than the L4S project; IMHO all the power to L4S as long as it does stay opt-in and has ZERO side-effects on existing internet users).

> Quoted and annotated below:
>
>> B.4. Protocol ID
>>
>> It has been suggested that a new ID in the IPv4 Protocol field or the
>> IPv6 Next Header field could identify L4S packets.
>> However this approach is ruled out by numerous problems:
>>
>> o A new protocol ID would need to be paired with the old one for
>> each transport (TCP, SCTP, UDP, etc.);

[SM] That is somewhat weak: you are currently only pushing a TCP version, and you might want a UDP version (see above). How many applications use anything but TCP or UDP?

>>
>> o In IPv6, there can be a sequence of Next Header fields, and it
>> would not be obvious which one would be expected to identify a
>> network service like L4S;
>>
> In particular, the protocol ID / next header stays next to the upper layer header as a PDU gets encapsulated, possibly many times. So the protocol ID is not necessarily (rarely?) in the outer, particularly in IPv6, and it might be encrypted in IPSec.

[SM] So, at a peering/transit point, which encapsulations are actually realistic? I would have thought that more or less raw IP packets are required to make the necessary routing decisions at a network's edge; the same argument holds for the internet access links. At which points besides the ingress and egress of a network do you expect queueing to happen routinely? From my limited experience it really is at ingress/egress/transit, so which other hops will actually be realistic targets for an L4S-AQM?

I also am not yet convinced that ISPs will really want to signal that their peering/transits are under-sized, so I am dubious that these will ever get L4S/SCE style signaling (but I hope I am overly pessimistic here).

>
>> o A new protocol ID would rarely provide an end-to-end service,
>> because it is well-known that new protocol IDs are often blocked
>> by numerous types of middlebox;

[SM] Yes, that is the strongest of these four arguments, at least to my layman's eyes.
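The "sequence of Next Header fields" point can be illustrated with a minimal sketch that walks an IPv6 extension header chain until it reaches the upper-layer protocol. The function and helper names are hypothetical, only the common extension headers are handled, and truncated packets are not guarded against. The sketch shows what a classifier would have to do to find a protocol ID, not how any L4S AQM is implemented.

```python
# IPv6 Next Header values for the extension headers handled in this sketch.
HOP_BY_HOP, ROUTING, FRAGMENT, ESP, AH, DEST_OPTS = 0, 43, 44, 50, 51, 60
IPV6_FIXED_HEADER_LEN = 40


def ipv6_fixed_header(next_header: int) -> bytes:
    """Minimal 40-byte IPv6 fixed header (lengths and addresses zeroed)."""
    return bytes([0x60, 0, 0, 0, 0, 0, next_header, 64]) + bytes(32)


def upper_layer_protocol(packet: bytes):
    """Follow the Next Header chain; return the first non-extension value,
    or None if the chain disappears behind ESP encryption."""
    next_header = packet[6]
    offset = IPV6_FIXED_HEADER_LEN
    while True:
        if next_header in (HOP_BY_HOP, ROUTING, DEST_OPTS):
            # Hdr Ext Len counts 8-octet units beyond the first 8 octets.
            length = (packet[offset + 1] + 1) * 8
        elif next_header == FRAGMENT:
            length = 8
        elif next_header == AH:
            # AH Payload Len counts 4-octet units minus 2.
            length = (packet[offset + 1] + 2) * 4
        elif next_header == ESP:
            return None
        else:
            return next_header
        next_header = packet[offset]
        offset += length


# Hop-by-hop options, then destination options, then TCP (protocol 6).
hop_by_hop = bytes([DEST_OPTS, 0]) + bytes(6)
dest_opts = bytes([6, 0]) + bytes(6)
packet = ipv6_fixed_header(HOP_BY_HOP) + hop_by_hop + dest_opts + bytes(20)
print(upper_layer_protocol(packet))
```

For an IPv6-in-IPv6 tunnel the same walk stops at 41 in the outer header, and for ESP it cannot proceed at all, which is the encapsulation concern raised in [BB]'s annotation.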
>>
>> o The approach is not a solution for AQMs below the IP layer;
>>
> That last point means that the protocol ID is not designed to always propagate to the outer on encap and back from the outer on decap, whereas the ECN field is (and it's the only field that is).

[SM] Fair enough. As indicated above, I do not really see hops that deal in non-IP packets ever actually using L4S/SCE type signalling, so is that really a big problem?

>
> more....
>>>
>>>> It then means that a packet with ECT1 in the IP field can be forwarded without resequencing (no requirement - it just /can/ be).
>>>>
>>> Packets always "can" be forwarded without resequencing; the question is whether the end-points are going to like that...
>>> And IMHO even RACK, with its reordering window of at most one RTT, does not give intermediate hops much to work with. Without knowing the full RTT, a cautious hop might allow itself one retransmission slot (so its own contribution to the RTT), but as far as I can tell they do that already. And tracking the RTT will require keeping per-flow statistics, which also seems like it can get computationally expensive quickly... (I probably misunderstand how RACK works, but I fail to see how it will really allow more re-ordering, but that is also orthogonal to the L4S issues I try to raise).
>>>
> [BB] No-one's suggesting reordering degree will adapt to measured RTT at run-time.

[SM] I know, as that would defeat the purpose, but that also puts severe limits on how much re-ordering budget a given link actually has.

>
> See the original discussion on this point here:
> Vicious or Virtuous circle? Adapting reordering window to reordering degree
>
> In summary, the uncertainty for the network is a feature not a bug. It means it has to keep reordering degree lower than the lowest likely RTT (or some fraction of it) that is expected for that link technology at the design stage.
> This will keep reordering low, but not unnecessarily low (i.e. not 3 packets at the link rate).

[SM] As I state above, a given link realistically will only be allowed one of its own local RTTs worth of re-ordering (other links might re-order as well, so no link can claim the full e2e RTT's worth of re-ordering all for itself). So all I can see for each link is one or (if the link feels lucky) two re-transmit opportunities before the link needs to stall to resequence packets again. Now, that might already be enough (and a sufficiently "batchy" link might transfer more than 3 packets in one haul).

I naively thought that a link would only ever stall those flows with out-of-order packets and happily fill its upstream pipe with packets from unaffected flows, but that seems not to be happening.

>
>>>
>>>> This is a network layer 'unordered delivery' property, so it's appropriate to flag at the IP layer.
>>>>
>>> But at that point you are multiplexing multiple things into the poor ECT(1) codepoint: the promise of a certain "linear" back-off behavior on encountered congestion AND an "allow relaxed ordering" flag ("detect loss by counting in time-based units" does not seem to be fully equivalent to a generic tolerance to 'unordered delivery' as far as I understand). That seems to be asking too much of a simple number...
> [BB] In a purist sense, it is a valid architectural criticism that we overload one codepoint with two architecturally distinct functions:
> • low queuing delay
> • low resequencing delay
> But then, one has to consider the value vs cost of 2 independent identifiers for two things that are unlikely to ever need to be distinguished.
> If an app wants low delay, would it want only low queuing delay and not low resequencing delay?

[SM] Sorry, I can well envision apps that do not care about "low queuing delay" but would be happy to give laxer reordering requirements to the network (like a bulk data transfer that just wants to keep pushing packets through). Is that unrealistic?

>
> You could contrive a case where the receiver is memory-challenged and needs the network to do the resequencing.

[SM] Well, packets are sent in sequence, so the idea is not to burden the network with undue work, but rather to faithfully transmit what the endpoints send.
(On a tangent, somewhere else you argued against FQ as it will take the dynamic packet spacing decisions away from the sending endpoint, but surely changing the order of packets is a far more grave intervention than just changing the interpacket intervals, no?)

> But it's not a reasonable expectation for the network to do a function that will cause HoL blocking for other applications in the process of helping you with your memory problems.
>
> Given we are header-bit-challenged, it would not be unreasonable for the WG to decide to conflate these two architectural identifiers into one.
>
> Bob
>
>>> Best Regards
>>> Sebastian
>>>
>>>> Bob
>>>>
>>>> --
>>>> ________________________________________________________________
>>>> Bob Briscoe
>>>> http://bobbriscoe.net/
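
As an illustration of the reordering discussion above, the following minimal sketch contrasts a 3DupACK-style rule with a time-based rule for declaring a packet lost. It is a deliberately simplified model under illustrative assumptions that do not come from the drafts: sequence numbers equal send order, every arrival is acknowledged instantly, and the time-based rule is a plain timer rather than RACK itself. All names are hypothetical.

```python
def lost_by_dupack(deliveries, dupthresh=3):
    """Packets declared lost once `dupthresh` later-sent packets have arrived."""
    delivered, lost = set(), set()
    for _, seq in deliveries:
        delivered.add(seq)
        for hole in range(max(delivered)):
            if hole in delivered:
                continue
            later = sum(1 for d in delivered if d > hole)
            if later >= dupthresh:
                lost.add(hole)
    return lost


def lost_by_time(deliveries, reorder_window):
    """Packets declared lost once a later-sent packet has been ahead of them
    for longer than `reorder_window` (same time unit as the deliveries)."""
    delivered, lost = set(), set()
    overtaken_at = {}  # hole -> arrival time of the first later-sent packet
    for now, seq in deliveries:
        # Timers that expired before this arrival fire first.
        for hole, since in overtaken_at.items():
            if hole not in delivered and now - since > reorder_window:
                lost.add(hole)
        delivered.add(seq)
        for hole in range(seq):
            if hole not in delivered:
                overtaken_at.setdefault(hole, now)
    # Holes that never arrived at all are genuine losses.
    lost.update(h for h in overtaken_at if h not in delivered)
    return lost


# (arrival time in ms, sequence number): packet 0 is overtaken by four
# packets but still arrives 0.8 ms after the first of them.
deliveries_ms = [(0.0, 1), (0.2, 2), (0.4, 3), (0.6, 4), (0.8, 0)]
print(lost_by_dupack(deliveries_ms))
print(lost_by_time(deliveries_ms, reorder_window=1.0))
```

In this trace the packet-counting rule declares packet 0 lost although it arrives shortly afterwards, while the timer with a 1 ms window does not. Shrinking the window to 0.5 ms makes the time-based rule trip as well, which is the "re-ordering budget" a link has to stay within.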