[Ecn-sane] [tsvwg] Comments on L4S drafts

Sebastian Moeller moeller0 at gmx.de
Thu Jul 25 18:00:11 EDT 2019


Dear Bob,

Thanks for your time and insight. More comments below. I will try to follow your style.

> On Jul 25, 2019, at 23:17, Bob Briscoe <ietf at bobbriscoe.net> wrote:
> 
> Sebastien,
> 
> Sry, I sent that last reply too early, and not bottom posted. Both corrected below (tagged [BB]):
> 
> 
> On 25/07/2019 16:51, Bob Briscoe wrote:
>> Sebastien,
>> 
>> 
>> On 21/07/2019 16:48, Sebastian Moeller wrote:
>>> Dear Bob, 
>>> 
>>> 
>>>> On Jul 21, 2019, at 21:14, Bob Briscoe <ietf at bobbriscoe.net> wrote:
>>>> 
>>>> Sebastien,
>>>> 
>>>> On 21/07/2019 17:08, Sebastian Moeller wrote:
>>>> 
>>>>> Hi Bob,
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 21, 2019, at 14:30, Bob Briscoe <ietf at bobbriscoe.net> wrote:
>>>>>> 
>>>>>> David,
>>>>>> 
>>>>>> On 19/07/2019 21:06, Black, David wrote:
>>>>>> 
>>>>>> 
>>>>>>> Two comments as an individual, not as a WG chair:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> Mostly, they're things that an end-host algorithm needs
>>>>>>>> to do in order to behave nicely, that might be good things anyways
>>>>>>>> without regard to L4S in the network (coexist w/ Reno, avoid RTT bias,
>>>>>>>> work well w/ small RTT, be robust to reordering).  I am curious which
>>>>>>>> ones you think are too rigid ... maybe they can be loosened?
>>>>>>>> 
>>>>>>>> 
>>>>>>> [1] I have profoundly objected to L4S's RACK-like requirement (use time to detect loss, and in particular do not use 3DupACK) in public on multiple occasions, because in reliable transport space, that forces use of TCP Prague, a protocol with which we have little to no deployment or operational experience.  Moreover, that requirement raises the bar for other protocols in a fashion that impacts endpoint firmware, and possibly hardware in some important (IMHO) environments where investing in those changes delivers little to no benefit.  The environments that I have in mind include a lot of data centers.  Process wise, I'm ok with addressing this objection via some sort of "controlled environment" escape clause text that makes this RACK-like requirement inapplicable in a "controlled environment" that does not need that behavior (e.g., where 3DupACK does not cause problems and is not expected to cause problems).
>>>>>>> 
>>>>>>> For clarity, I understand the multi-lane link design rationale behind the RACK-like requirement and would agree with that requirement in a perfect world ... BUT ... this world is not perfect ... e.g., 3DupACK will not vanish from "running code" anytime soon.
>>>>>>> 
>>>>>>> 
>>>>>> As you know, we have been at pains to address every concern about L4S that has come up over the years, and I thought we had addressed this one to your satisfaction.
>>>>>> 
>>>>>> The reliable transports you are concerned about require ordered delivery by the underlying fabric, so they can only ever exist in a controlled environment. In such a controlled environment, your ECT1+DSCP idea (below) could be used to isolate the L4S experiment from these transports and their firmware/hardware constraints.
>>>>>> 
>>>>>> On the public Internet, the DSCP commonly gets wiped at the first hop. So requiring a DSCP as well as ECT1 to separate off L4S would serve no useful purpose: it would still lead to ECT1 packets without the DSCP sent from scalable congestion controls (which is behind Jonathan's concern in response to you).
>>>>>> 
>>>>>> 
>>>>> 	And this is why IPv4's protocol field / IPv6's next header field is the classifier you actually need... You are changing a significant portion of TCP's observable behavior, so it can be argued that TCP-Prague is TCP in name only. This "classifier" still lives in the IP header, so no deeper layers need to be accessed; it is non-leaky in that the classifier is unambiguously present independent of the value of the ECN bits; and it is also compatible with SCE-style ECN signaling. Since I believe the most likely (if not the only) roll-out of L4S is going to be at the ISPs' access nodes (BRAS/BNG/CMTS/whatever), middleboxes should not be an insurmountable problem: ISPs control their own middleboxes and often even the CPEs, so protocol ossification is not going to be a showstopper for this part of the roll-out.
>>>>> 
>>>>> Best Regards
>>>>> 	Sebastian
>>>>> 
>>>>> 
>>>>> 
>>>> I think you've understood this from reading an abbreviated description of the requirement on the list, rather than the spec. The spec. solely says:
>>>> 	A scalable congestion control MUST detect loss by counting in time-based units
>>>> That's all. No more, no less. 
>>>> 
>>>> People call this the "RACK requirement", purely because the idea came from RACK. There is no requirement to do RACK, and the requirement applies to all transports, not just TCP.
>>>> 
>>> 	Fair enough, but my argument was not really about RACK at all; it applies more to the linear response to CE-marks that ECT(1) promises in the L4S approach. You are making changes to TCP's congestion controller that make it cease to be "TCP-friendly" (for arguably good reasons). So why insist on pretending that this is still TCP? Give it a new protocol ID already and all your classification needs are solved. As a bonus you do not need to use the same signal (CE) to elicit two different responses; instead you could use the regained ECT(1) code point, similarly to SCE, to carry the new fine-grained congestion signal, while using CE in the RFC3168-compliant sense.
> 
> [BB] The protocol ID identifies the wire protocol, not the congestion control behaviour. If we had used a different protocol ID for each congestion control behaviour, we'd have run out of protocol IDs long ago (semi serious ;)


	[SM] Yes, I know, but you are proposing a massively incompatible "congestion control behaviour" for L4S that is not TCP-friendly, otherwise you would not need to deal with isolating your new style flows from the rest. For convenience (and since most of the other components are TCP-like) you package the whole thing as a congestion control module for TCP. My argument is, do not do that.
	As an aside, with this approach you are still at the mercy of OS and router manufacturers. Linux should be easy, but what is the plan of attack to get L4S behaviour into Windows' TCP implementation? To me it seems your best bet would be to create a library that implements your L4S-type response on top of UDP (you get resequencing tolerance for free ;) ). As long as you supply that library for all important OSes, application writers can opt in without the need for OSes to change.
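
To make the "not TCP-friendly" point concrete, here is a minimal sketch contrasting the two reactions to CE marks. It assumes a DCTCP-style rule as a stand-in for the scalable response; the function names, the gain of 1/16 and the one-segment floor are illustrative assumptions, not TCP Prague's actual code.

```python
from typing import Tuple


def classic_response(cwnd: float, ce_seen: bool) -> float:
    """RFC 3168 style: any CE mark in a round trip is treated like a loss,
    so the window is halved (floor of one segment is an assumption)."""
    return max(cwnd / 2.0, 1.0) if ce_seen else cwnd


def scalable_response(cwnd: float, alpha: float, marked_fraction: float,
                      gain: float = 1.0 / 16.0) -> Tuple[float, float]:
    """DCTCP style (assumed stand-in): smooth the fraction of CE-marked
    packets, then reduce the window in proportion to that smoothed value."""
    alpha = (1.0 - gain) * alpha + gain * marked_fraction
    new_cwnd = max(cwnd * (1.0 - alpha / 2.0), 1.0)
    return new_cwnd, alpha


if __name__ == "__main__":
    # Same congestion signal, very different reactions.
    print(classic_response(100.0, ce_seen=True))
    print(scalable_response(100.0, alpha=0.0, marked_fraction=0.1))
```

With the same marking pattern the classic flow gives up half its window while the scalable flow barely slows down, which is why the two kinds of flows need to be kept apart in the first place.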



> 
> This is a re-run of a debate that has already been had (in Jul 2015 - Nov 2016), which is recorded in the appendix of ecn-l4s-id here:
> https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-l4s-id-07#appendix-B.4

	[SM] Read it there, I just believe that the final choice of identifier was not the optimal one (I know this is all about trade-offs, I just happen to have different priorities than the L4S project; IMHO all the power to L4S as long as it does stay opt-in and has ZERO side-effects on existing internet users).
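
The two identifier choices under discussion can be sketched as follows. The ECN codepoint values are those of RFC 3168; the protocol number 253 is one of the values reserved for experimentation and is used here purely as a hypothetical "L4S protocol ID", and all function names are illustrative.

```python
# ECN field values (two least-significant bits of the TOS / Traffic Class byte).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# Hypothetical: an experimental protocol number standing in for a new L4S ID.
HYPOTHETICAL_L4S_PROTO = 253
TCP_PROTO = 6


def ecn_field(tos_byte: int) -> int:
    """Extract the 2-bit ECN field from the TOS / Traffic Class byte."""
    return tos_byte & 0b11


def classify_by_ecn(tos_byte: int) -> str:
    """L4S-style classifier: ECT(1) and CE go to the low-latency queue.
    A CE packet may have started life as ECT(0), so CE is ambiguous."""
    return "l4s" if ecn_field(tos_byte) in (ECT1, CE) else "classic"


def classify_by_protocol(proto: int) -> str:
    """Protocol-ID classifier: the answer does not depend on the ECN bits."""
    return "l4s" if proto == HYPOTHETICAL_L4S_PROTO else "classic"


if __name__ == "__main__":
    for name, value in (("Not-ECT", NOT_ECT), ("ECT(1)", ECT1),
                        ("ECT(0)", ECT0), ("CE", CE)):
        print(name, classify_by_ecn(value))
    print(classify_by_protocol(TCP_PROTO), classify_by_protocol(HYPOTHETICAL_L4S_PROTO))
```

The sketch only illustrates the classification step; it says nothing about the middlebox and encapsulation objections raised in appendix B.4.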


> Quoted and annotated below:
> 
>> B.4.  Protocol ID
>> 
>>    It has been suggested that a new ID in the IPv4 Protocol field or the
>>    IPv6 Next Header field could identify L4S packets.  However this
>>    approach is ruled out by numerous problems:
>> 
>>    o  A new protocol ID would need to be paired with the old one for
>>       each transport (TCP, SCTP, UDP, etc.);

	[SM] That is somewhat weak, as a) you are currently only pushing a TCP version, and b) you might want a UDP version as well (see above). How many applications use anything but TCP or UDP?

>> 
>>    o  In IPv6, there can be a sequence of Next Header fields, and it
>>       would not be obvious which one would be expected to identify a
>>       network service like L4S;
>> 
> In particular, the protocol ID / next header stays next to the upper layer header as a PDU gets encapsulated, possibly many times. So the protocol ID is not necessarily (rarely?) in the outer, particularly in IPv6, and it might be encrypted in IPSec.

	[SM] So, at a peering/transit point, which encapsulations are actually realistic? I would have thought that more or less raw IP packets are required to make the necessary routing decisions at a network's edge, same argument holds for the internet access links. At which points besides the ingress and egress of a network do you expect queueing to happen routinely? From my limited experience it really is at ingress/egress/transit, so which other hops will actually be realistic targets for an L4S-AQM?
	I also am not yet convinced that ISPs will really want to signal that their peering/transits are under-sized, so I am dubious that these will ever get L4S/SCE style signaling (but I hope I am overly pessimistic here).


> 
>>    o  A new protocol ID would rarely provide an end-to-end service,
>>       because It is well-known that new protocol IDs are often blocked
>>       by numerous types of middlebox;

	[SM] Yes, that is the strongest of these four arguments, at least to my layman's eyes.


>> 
>>    o  The approach is not a solution for AQMs below the IP layer;
>> 
>> 
> That last point means that the protocol ID is not designed to always propagate to the outer on encap and back from the outer on decap, whereas the ECN field is (and it's the only field that is).

	[SM] Fair enough, as indicated above, I am not really seeing hops that deal in non-IP packets to actually ever use L4S/SCE type signalling, so is that really a big problem?


> 
> more....
>>> 
>>> 
>>> 
>>>> It then means that a packet with ECT1 in the IP field can be forwarded without resequencing (no requirement - it just /can/ be).
>>>> 
>>> 	Packets always "can" be forwarded without resequencing, the question is whether the end-points are going to like that... 
>>> And IMHO even RACK, with its reordering window of at most one RTT, gives intermediate hops not much to work with: without knowing the full RTT, a cautious hop might allow itself one retransmission slot (so its own contribution to the RTT), but as far as I can tell they do that already. And tracking the RTT will require keeping per-flow statistics, which also seems like it can get computationally expensive quickly... (I probably misunderstand how RACK works, but I fail to see how it will really allow more re-ordering; that is also orthogonal to the L4S issues I try to raise).
>>> 
> [BB] No-one's suggesting reordering degree will adapt to measured RTT at run-time. 

	[SM] I know, as that would defeat the purpose, but that also puts severe limits on how much re-ordering budget a given link actually has.

> 
> See the original discussion on this point here:
> Vicious or Virtuous circle? Adapting reordering window to reordering degree
> 
> In summary, the uncertainty for the network is a feature not a bug. It means it has to keep reordering degree lower than the lowest likely RTT (or some fraction of it) that is expected for that link technology at the design stage. This will keep reordering low, but not too unnecessarily low (i.e. not 3 packets at the link rate).

	[SM] As I state above, a given link realistically will only be allowed one of its own local RTTs' worth of re-ordering (other links might re-order as well, so no link can claim the full e2e RTT's worth of re-ordering all for itself). So all I can see for each link is one or (if the link feels lucky) two retransmit opportunities before the link needs to stall to resequence packets again. Now, that might already be enough (and a sufficiently "batchy" link might transfer more than 3 packets in one haul).
	I naively thought that a link would only ever stall those flows with out-of-order packets and happily fill its upstream pipe with packets from unaffected flows, but that seems not to be happening.
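
The difference between counting duplicate ACKs and "counting in time-based units" can be sketched as below. This is a simplified illustration, not RACK itself: the function names, the 1500-byte packet size and the idea of timing from the moment a hole is first noticed are all assumptions.

```python
def lost_by_dupack_count(dup_acks: int, threshold: int = 3) -> bool:
    """Classic rule: declare loss after a fixed number of duplicate ACKs."""
    return dup_acks >= threshold


def lost_by_time(now: float, hole_seen_at: float, reorder_window: float) -> bool:
    """Time-based rule: declare loss only once the hole in the sequence
    space has stayed open for longer than the reordering window (seconds)."""
    return (now - hole_seen_at) > reorder_window


def three_packet_budget(link_rate_bps: float, packet_bytes: int = 1500) -> float:
    """Time (seconds) the 3-DupACK rule effectively leaves a link for
    resequencing when packets arrive back to back at the link rate."""
    return 3 * packet_bytes * 8 / link_rate_bps


if __name__ == "__main__":
    # The packet-count budget shrinks as the link gets faster;
    # a time-based budget does not.
    for rate in (10e6, 1e9, 100e9):
        print(rate, three_packet_budget(rate))
    print(lost_by_time(now=0.0005, hole_seen_at=0.0, reorder_window=0.001))
```

Under the count-based rule the reordering budget scales inversely with link rate, whereas under the time-based rule it is whatever fraction of the lowest likely RTT the link designer assumes.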


> 
>>> 
>>>> This is a network layer 'unordered delivery' property, so it's appropriate to flag at the IP layer. 
>>>> 
>>> 	But at that point you are multiplexing multiple things into the poor ECT(1) codepoint: the promise of a certain "linear" back-off behavior on encountered congestion AND an "allow relaxed ordering" signal ("detect loss by counting in time-based units" does not seem to be fully equivalent to a generic tolerance of 'unordered delivery', as far as I understand). That seems to be asking too much of a simple number...
> [BB] In a purist sense, it is a valid architectural criticism that we overload one codepoint with two architecturally distinct functions:
> 	• low queuing delay
> 	• low resequencing delay
> But then, one has to consider the value vs cost of 2 independent identifiers for two things that are unlikely to ever need to be distinguished. If an app wants low delay, would it want only low queuing delay and not low resequencing delay? 

	[SM] Sorry, I can well envision apps that do not care about "low queuing delay" but would be happy to give laxer reordering requirements to the network (like a bulk data transfer, that just wants to keep pushing packets through). Is that unrealistic? 

> 
> You could contrive a case where the receiver is memory-challenged and needs the network to do the resequencing.

	[SM] Well, packets are sent in sequence, so the idea is not to burden the network with undue work, but rather to faithfully transmit what the endpoints send. 
(On a tangent, somewhere else you argued against FQ as it will take the dynamic packet spacing decisions away from the sending endpoint, but surely changing the order of packets is a far more grave intervention than just changing the interpacket intervals, no?)

> But it's not a reasonable expectation for the network to do a function that will cause HoL blocking for other applications in the process of helping you with your memory problems.
> 
> Given we are header-bit-challenged, it would not be unreasonable for the WG to decide to conflate these two architectural identifiers into one.
> 
> 
> Bob
> 
>>> 
>>> Best Regards
>>> 	Sebastian
>>> 
>>> 
>>>> 
>>>> Bob
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> ________________________________________________________________
>>>> Bob Briscoe                               
>>>> 
>>>> http://bobbriscoe.net/
>>> _______________________________________________
>>> Ecn-sane mailing list
>>> 
>>> Ecn-sane at lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/ecn-sane
>> 
>> -- 
>> ________________________________________________________________
>> Bob Briscoe                               
>> http://bobbriscoe.net/
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               
> http://bobbriscoe.net/


