* [Ecn-sane] IETF 110 quick summary @ 2021-03-08 23:47 Pete Heist 2021-03-08 23:57 ` Dave Taht 0 siblings, 1 reply; 22+ messages in thread From: Pete Heist @ 2021-03-08 23:47 UTC (permalink / raw) To: ECN-Sane Just responding to Dave's ask for a quick IETF 110 summary on ecn-sane, after one day. We presented the data on ECN at MAPRG (https://datatracker.ietf.org/doc/draft-heist-tsvwg-ecn-deployment-observations/ ). It basically just showed that ECN is in use by endpoints (more as a proportion across paths than a proportion of flows), that RFC3168 AQMs do exist out there and are signaling, and that the ECN field can be misused. There weren't any questions, maybe because we were the last to present and were already short on time. We also applied that to L4S by first explaining that risk is the product of severity and prevalence, and tried to increase the awareness about the flow domination problem when L4S flows meet non-L4S flows (ECN or not) in a 3168 queue. Spreading this information seems to go slowly, as we're still hearing "oh really?", which leads me to believe 1) that people are tuning this debate out, and 2) it just takes a long time to comprehend, and to believe. It's still our stance that L4S can't be deployed due to its signalling design, or if it is, the end result is likely to be more bleaching and confusion with the DS field. There was a question I'd already heard before about why fq_codel is being deployed at an ISP, so I tried to cover that over in tsvwg. Basically, fq_codel is not ideal for this purpose, lacking host and subscriber fairness, but it's available and effective, so it's a good start. Wednesday's TSVWG session will be entirely devoted to L4S drafts. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-08 23:47 [Ecn-sane] IETF 110 quick summary Pete Heist @ 2021-03-08 23:57 ` Dave Taht 2021-03-09 2:13 ` Holland, Jake 2021-03-09 8:21 ` Pete Heist 0 siblings, 2 replies; 22+ messages in thread From: Dave Taht @ 2021-03-08 23:57 UTC (permalink / raw) To: Pete Heist; +Cc: ECN-Sane Thx very much for the update. I wanted to note that preseem does a lot of work with wisps and I wish they'd share more data on it, as well as our ever present mention of free.fr. Another data point is that apple's early rollout of ecn was kind of a failure, and there are now so many workarounds in the os for it as to make coherent testing impossible. I do wish there was more work on ecn enabling bbr, as presently it does negotiate ecn often and then completely ignores it. You can see this in traces from dropbox in particular. On Mon, Mar 8, 2021 at 3:47 PM Pete Heist <pete@heistp.net> wrote: > > Just responding to Dave's ask for a quick IETF 110 summary on ecn-sane, > after one day. We presented the data on ECN at MAPRG > (https://datatracker.ietf.org/doc/draft-heist-tsvwg-ecn-deployment-observations/ > ). It basically just showed that ECN is in use by endpoints (more as a > proportion across paths than a proportion of flows), that RFC3168 AQMs > do exist out there and are signaling, and that the ECN field can be > misused. There weren't any questions, maybe because we were the last to > present and were already short on time. > > We also applied that to L4S by first explaining that risk is the > product of severity and prevalence, and tried to increase the awareness > about the flow domination problem when L4S flows meet non-L4S flows > (ECN or not) in a 3168 queue. Spreading this information seems to go > slowly, as we're still hearing "oh really?", which leads me to believe > 1) that people are tuning this debate out, and 2) it just takes a long > time to comprehend, and to believe. 
It's still our stance that L4S > can't be deployed due to its signalling design, or if it is, the end > result is likely to be more bleaching and confusion with the DS field. > > There was a question I'd already heard before about why fq_codel is > being deployed at an ISP, so I tried to cover that over in tsvwg. > Basically, fq_codel is not ideal for this purpose, lacking host and > subscriber fairness, but it's available and effective, so it's a good > start. > > Wednesday's TSVWG session will be entirely devoted to L4S drafts. > > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane -- "For a successful technology, reality must take precedence over public relations, for Mother Nature cannot be fooled" - Richard Feynman dave@taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-08 23:57 ` Dave Taht @ 2021-03-09 2:13 ` Holland, Jake 2021-03-09 4:06 ` Steven Blake ` (2 more replies) 2021-03-09 8:21 ` Pete Heist 1 sibling, 3 replies; 22+ messages in thread From: Holland, Jake @ 2021-03-09 2:13 UTC (permalink / raw) To: Dave Taht, Pete Heist; +Cc: ECN-Sane The presentations were pretty great, but they were really short on time. In the chat a person or 2 was surprised about the way L4S will impact NECT competing traffic when competing in a queue. I agree some of the people who have tuned out the discussion are learning things from these presentations, and I thought Jonathan's slot was a good framing of the real question, and Pete's study was also very helpful. I seem to recall a thread in the wake of Apple's ECN enabling about one of the Linux distros considering turning ECN on by default for outbound connections, in which one of them found that it completely wrecked his throughput, and so it got tabled with unfortunately no pcap posted. Any recollection of where that was? I was guessing it might be one of the misbehaviors from the network that Apple encountered. I also thought Apple had a sysctl to disable the hold-downs and always use ECN in spite of the heuristics, did that not work? -Jake On 3/8/21, 3:57 PM, "Dave Taht" <dave.taht@gmail.com> wrote: Thx very much for the update. I wanted to note that preseem does a lot of work with wisps and I wish they'd share more data on it, as well as our ever present mention of free.fr. Another data point is that apple's early rollout of ecn was kind of a failure, and there are now so many workarounds in the os for it as to make coherent testing impossible. I do wish there was more work on ecn enabling bbr, as presently it does negotiate ecn often and then completely ignores it. You can see this in traces from dropbox in particular. 
On Mon, Mar 8, 2021 at 3:47 PM Pete Heist <pete@heistp.net> wrote: > > Just responding to Dave's ask for a quick IETF 110 summary on ecn-sane, > after one day. We presented the data on ECN at MAPRG > (https://datatracker.ietf.org/doc/draft-heist-tsvwg-ecn-deployment-observations/ > ). It basically just showed that ECN is in use by endpoints (more as a > proportion across paths than a proportion of flows), that RFC3168 AQMs > do exist out there and are signaling, and that the ECN field can be > misused. There weren't any questions, maybe because we were the last to > present and were already short on time. > > We also applied that to L4S by first explaining that risk is the > product of severity and prevalence, and tried to increase the awareness > about the flow domination problem when L4S flows meet non-L4S flows > (ECN or not) in a 3168 queue. Spreading this information seems to go > slowly, as we're still hearing "oh really?", which leads me to believe > 1) that people are tuning this debate out, and 2) it just takes a long > time to comprehend, and to believe. It's still our stance that L4S > can't be deployed due to its signalling design, or if it is, the end > result is likely to be more bleaching and confusion with the DS field. > > There was a question I'd already heard before about why fq_codel is > being deployed at an ISP, so I tried to cover that over in tsvwg. > Basically, fq_codel is not ideal for this purpose, lacking host and > subscriber fairness, but it's available and effective, so it's a good > start. > > Wednesday's TSVWG session will be entirely devoted to L4S drafts.
> > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane -- "For a successful technology, reality must take precedence over public relations, for Mother Nature cannot be fooled" - Richard Feynman dave@taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729 _______________________________________________ Ecn-sane mailing list Ecn-sane@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/ecn-sane ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 2:13 ` Holland, Jake @ 2021-03-09 4:06 ` Steven Blake 2021-03-09 9:57 ` Pete Heist 2021-03-09 8:43 ` Pete Heist 2021-03-09 11:06 ` Jonathan Morton 2 siblings, 1 reply; 22+ messages in thread From: Steven Blake @ 2021-03-09 4:06 UTC (permalink / raw) To: Holland, Jake; +Cc: ECN-Sane If I'm a random network operator, not participating in any L4S experiments, and L4S traffic traversing my network hits a bottleneck, what happens? Consider all of the cases (no AQM tail-drop, AQM-drop, AQM-classic ECN). My understanding was that TCP-Prague's classic bottleneck detection code wasn't fully baked. On Tue, 2021-03-09 at 02:13 +0000, Holland, Jake wrote: > The presentations were pretty great, but they were really short > on time. In the chat a person or 2 was surprised about the way > L4S will impact NECT competing traffic when competing in a queue. > I agree some of the people who have tuned out the discussion are > learning things from these presentations, and I thought Jonathan's > slot was a good framing of the real question, and Pete's study was > also very helpful. > > I seem to recall a thread in the wake of Apple's ECN enabling about > one of the Linux distros considering turning ECN on by default for > outbound connections, in which one of them found that it completely > wrecked his throughput, and so it got tabled with unfortunately > no pcap posted. > > Any recollection of where that was? I was guessing it might be > one of the misbehaviors from the network that Apple encountered. > > I also thought Apple had a sysctl to disable the hold-downs and > always use ECN in spite of the heuristics, did that not work? > > -Jake > > On 3/8/21, 3:57 PM, "Dave Taht" <dave.taht@gmail.com> wrote: > > Thx very much for the update. I wanted to note that > preseem does a lot of work with wisps and I wish they'd share more > data on it, as well as our ever present mention of free.fr. 
> > Another data point is that apple's early rollout of ecn was kind of > a failure, and there are now so many workarounds in the os for it as > to make coherent testing impossible. > > I do wish there was more work on ecn enabling bbr, as presently > it does negotiate ecn often and then completely ignores it. You can > see this in traces from dropbox in particular. > > > > On Mon, Mar 8, 2021 at 3:47 PM Pete Heist <pete@heistp.net> wrote: > > Just responding to Dave's ask for a quick IETF 110 summary on ecn-sane, > > after one day. We presented the data on ECN at MAPRG > > ( > > https://datatracker.ietf.org/doc/draft-heist-tsvwg-ecn-deployment-observations/ > > > > ). It basically just showed that ECN is in use by endpoints (more > > as a > > proportion across paths than a proportion of flows), that RFC3168 > > AQMs > > do exist out there and are signaling, and that the ECN field can be > > misused. There weren't any questions, maybe because we were the > > last to > > present and were already short on time. > > > > We also applied that to L4S by first explaining that risk is the > > product of severity and prevalence, and tried to increase the > > awareness > > about the flow domination problem when L4S flows meet non-L4S flows > > (ECN or not) in a 3168 queue. Spreading this information seems to > > go > > slowly, as we're still hearing "oh really?", which leads me to > > believe > > 1) that people are tuning this debate out, and 2) it just takes a > > long > > time to comprehend, and to believe. It's still our stance that L4S > > can't be deployed due to its signalling design, or if it is, the > > end > > result is likely to be more bleaching and confusion with the DS > > field. > > > > There was a question I'd already heard before about why fq_codel is > > being deployed at an ISP, so I tried to cover that over in tsvwg.
> > Basically, fq_codel is not ideal for this purpose, lacking host and > > subscriber fairness, but it's available and effective, so it's a > > good > > start. > > > > Wednesday's TSVWG session will be entirely devoted to L4S drafts. Regards, // Steve ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 4:06 ` Steven Blake @ 2021-03-09 9:57 ` Pete Heist 2021-03-09 13:53 ` Jonathan Morton 0 siblings, 1 reply; 22+ messages in thread From: Pete Heist @ 2021-03-09 9:57 UTC (permalink / raw) To: Steven Blake; +Cc: ECN-Sane On Mon, 2021-03-08 at 23:06 -0500, Steven Blake wrote: > If I'm a random network operator, not participating in any L4S > experiments, and L4S traffic traversing my network hits a bottleneck, > what happens? Consider all of the cases (no AQM tail-drop, AQM-drop, > AQM-classic ECN). > > My understanding was that TCP-Prague's classic bottleneck detection > code wasn't fully baked. Hi Steven, I'll take a crack at this as I see it anyway: *No AQM tail-drop & AQM-drop* Both _should_ be OK, as L4S transports, at least Prague, treat drop with a 50% MD (barring one bug which has been fixed). We have tested with straight tail-drop FIFOs and drop-based AQMs and afaik so far it was safe, even if performance wasn't ideal in all cases. *AQM-classic ECN, single queue* Severity: L4S flows drive competing flows, ECN capable or not, down to somewhere around minimum cwnd. FCT for shorter flows is also harmed, but some flows can do better, if they complete before getting out of SS. Prevalence: We're not sure how many single queue AQMs are enabled, so it's unclear how often this would be a problem. Maybe rarely, but it's hard to believe that there are zero single queue 3168 AQMs enabled out there. *AQM-classic ECN, FQ* Severity: Same as AQM-classic ECN single queue, _when there is a problem_. Prevalence: FQ protects competing flows, unless L4S and non-L4S traffic ends up in the same queue. This can happen with a hash collision, or maybe more commonly, with tunneled traffic in tunnels that support copying the ECN bits from the inner to the outer. If anyone thinks of any other reasons we haven't considered why competing flows would share the same 5-tuple and thus the same queue, do mention it. 
:) We've tried to get a handle on the percentage of random paths with fq_codel deployed. In one environment we measured around 10%, but as an estimate for the general Internet that's still uncertain by an order of magnitude either way, given that the study was relatively small (https://tools.ietf.org/html/draft-heist-tsvwg-ecn-deployment-observations-02#section-3.2 ). Lastly, a performance problem rather than a safety problem: when L4S flows traverse ANY fq_codel bottleneck, they impose delays on themselves, since they don't respond to CE in the way the AQM expects. That leads to intra-flow latency spikes, explained here: https://github.com/heistp/l4s-tests/#intra-flow-latency-spikes So, this will happen on whatever percentage of paths fq_codel, or any other RFC3168 AQM, is deployed on. Delay spikes after rate reductions can be higher in Codel due to how the algorithm works. > On Tue, 2021-03-09 at 02:13 +0000, Holland, Jake wrote: > > The presentations were pretty great, but they were really short > > on time. In the chat a person or 2 was surprised about the way > > L4S will impact NECT competing traffic when competing in a queue. > > I agree some of the people who have tuned out the discussion are > > learning things from these presentations, and I thought Jonathan's > > slot was a good framing of the real question, and Pete's study was > > also very helpful. > > > > I seem to recall a thread in the wake of Apple's ECN enabling about > > one of the Linux distros considering turning ECN on by default for > > outbound connections, in which one of them found that it completely > > wrecked his throughput, and so it got tabled with unfortunately > > no pcap posted. > > > > Any recollection of where that was? I was guessing it might be > > one of the misbehaviors from the network that Apple encountered. > > > > I also thought Apple had a sysctl to disable the hold-downs and > > always use ECN in spite of the heuristics, did that not work?
> > > > -Jake > > > > On 3/8/21, 3:57 PM, "Dave Taht" <dave.taht@gmail.com> wrote: > > > > Thx very much for the update. I wanted to note that > > preseem does a lot of work with wisps and I wish they'd share more > > data on it, as well as our ever present mention of free.fr. > > > > Another data point is that apple's early rollout of ecn was kind of > > a failure, and there are now so many workarounds in the os for it as > > to make coherent testing impossible. > > > > I do wish there was more work on ecn enabling bbr, as presently > > it does negotiate ecn often and then completely ignores it. You can > > see this in traces from dropbox in particular. > > > > > > > > On Mon, Mar 8, 2021 at 3:47 PM Pete Heist <pete@heistp.net> wrote: > > > Just responding to Dave's ask for a quick IETF 110 summary on ecn-sane, > > > after one day. We presented the data on ECN at MAPRG > > > ( > > > https://datatracker.ietf.org/doc/draft-heist-tsvwg-ecn-deployment-observations/ > > > > > > ). It basically just showed that ECN is in use by endpoints (more > > > as a > > > proportion across paths than a proportion of flows), that RFC3168 > > > AQMs > > > do exist out there and are signaling, and that the ECN field can be > > > misused. There weren't any questions, maybe because we were the > > > last to > > > present and were already short on time. > > > > > > We also applied that to L4S by first explaining that risk is the > > > product of severity and prevalence, and tried to increase the > > > awareness > > > about the flow domination problem when L4S flows meet non-L4S flows > > > (ECN or not) in a 3168 queue. Spreading this information seems to > > > go > > > slowly, as we're still hearing "oh really?", which leads me to > > > believe > > > 1) that people are tuning this debate out, and 2) it just takes a > > > long > > > time to comprehend, and to believe.
It's still our stance that L4S > > > can't be deployed due to its signalling design, or if it is, the > > > end > > > result is likely to be more bleaching and confusion with the DS > > > field. > > > > > > There was a question I'd already heard before about why fq_codel is > > > being deployed at an ISP, so I tried to cover that over in tsvwg. > > > Basically, fq_codel is not ideal for this purpose, lacking host and > > > subscriber fairness, but it's available and effective, so it's a > > > good > > > start. > > > > > > Wednesday's TSVWG session will be entirely devoted to L4S drafts. > > > Regards, > > // Steve > > > > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane ^ permalink raw reply [flat|nested] 22+ messages in thread
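Pete's point about tunneled traffic sharing a queue can be illustrated with a toy model of FQ flow classification (a sketch only: fq_codel's real classifier uses the kernel flow dissector with a seeded Jenkins hash, not Python's hash(); addresses and queue count here are illustrative):

```python
# Toy model of fq_codel-style flow classification: a flow queue is
# chosen by hashing the 5-tuple of the *outer* header. This only
# demonstrates why tunneled flows collide; it is not the kernel's hash.

NUM_QUEUES = 1024

def flow_queue(src, dst, proto, sport, dport):
    # stand-in for the kernel's flow dissector + hash
    return hash((src, dst, proto, sport, dport)) % NUM_QUEUES

# Two distinct TCP flows seen natively: different 5-tuples, so they
# almost always land in different queues (a hash collision is possible
# but rare at 1024 queues).
q1 = flow_queue("192.0.2.1", "198.51.100.9", 6, 40001, 443)
q2 = flow_queue("192.0.2.2", "198.51.100.9", 6, 40002, 443)

# The same two flows inside one ESP/GRE-style tunnel: the bottleneck
# sees only the outer header, which is identical for both, so both
# inner flows share one queue -- and one AQM state -- regardless of
# what they carry.
outer = ("203.0.113.5", "203.0.113.9", 50, 0, 0)  # proto 50 = ESP
q3 = flow_queue(*outer)
q4 = flow_queue(*outer)
assert q3 == q4  # guaranteed collision: FQ cannot separate them
```

This is why the FQ protection Pete describes stops at the tunnel boundary: one L4S flow inside the tunnel shares its queue, and its AQM signaling, with every conventional flow in the same tunnel.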
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 9:57 ` Pete Heist @ 2021-03-09 13:53 ` Jonathan Morton 2021-03-09 14:27 ` Sebastian Moeller 2021-03-09 17:31 ` Steven Blake 0 siblings, 2 replies; 22+ messages in thread From: Jonathan Morton @ 2021-03-09 13:53 UTC (permalink / raw) To: Pete Heist; +Cc: Steven Blake, ECN-Sane > On 9 Mar, 2021, at 11:57 am, Pete Heist <pete@heistp.net> wrote: > > FQ protects competing flows, unless L4S and non-L4S traffic ends up in > the same queue. This can happen with a hash collision, or maybe more > commonly, with tunneled traffic in tunnels that support copying the ECN > bits from the inner to the outer. If anyone thinks of any other reasons > we haven't considered why competing flows would share the same 5-tuple > and thus the same queue, do mention it. Bob Briscoe's favourite defence to this, at the moment, seems to be that multiple flows sharing one tunnel are *also* disadvantaged when they share an FQ AQM bottleneck with multiple other flows that are not tunnelled, and which the FQ mechanism *can* distinguish. Obviously this is specious, but it's worth pinning down exactly *why* so we can explain it back to him (and more importantly, anyone else paying attention). Bob's scenario involves entirely conventional traffic, and a saturated bottleneck managed by an FQ-AQM (fq_codel), which is itself shared with at least one other flow. We assume that all AQMs in existing networks are ECN enabled (as distinct from the also-common policers which only drop). The FQ mechanism treats the tunnel as a single flow, and shares out bandwidth equally on that basis. So the throughput available to the tunnel as a whole is one share of the total, no matter how many flows occupy the tunnel. Additionally, the same AQM mark/drop rate is applied to everything in the tunnel, causing the flows using it to adopt an RTT-fair relationship to each other. 
The disadvantage experienced by the tunnel (relative to a plain AQM) is proportional to the number of flows using the tunnel, and only indirectly related to the number of other flows using the bottleneck. This I would classify as Minor severity, since it is a moderate, sustained effect. It increases in effect only linearly with the load on the tunnel, which is the same as at any ordinary bottleneck - and this is routinely tolerated. Note that if the tunnel is the only traffic using the bottleneck, the situation is equivalent to a plain, single-queue AQM. This is an important degenerate case, which we can come back to later. Also, in principle the effect can be avoided by either not using the tunnel, or by dividing the flows between multiple tunnels that the FQ mechanism *can* distinguish. This puts the risk into either an "involved participant" or "interested observer" category, unless the tunnel has been imposed on the user without knowledge or consent. What this means is that the tunnel user might reasonably consider the security or privacy benefit of the tunnel to outweigh the performance defect it incurs, and thereby choose to continue using it. Now, let us add one L4S flow to the tunnel, replacing one of the conventional flows in it, but keeping everything else the same. The conventional flows *outside* the tunnel are unaffected, because they are protected by the FQ-AQM. But the conventional flows *inside* the tunnel, which the FQ-AQM cannot protect because it cannot distinguish them, are immediately squashed to minimum cwnd or thereabouts, which may be considerably less than the fair-share BDP within that allocated by the tunnel. The L4S flow thereby grows to dominate the tunnel traffic as described elsewhere. This is clearly a Major severity effect, as the conventional traffic in the tunnel is seriously impaired. 
Note that if the tunnel shared a plain AQM bottleneck, without FQ, with other conventional flows outside the tunnel, these other flows would *also* be squashed by the L4S flow in the tunnel. This is because the AQM must increase its signalling rate considerably to control the L4S flow, and it applies the same signalling rate to all traffic. The FQ-AQM only increases signalling to the flow requiring it. Returning to the degenerate case where the tunnel is the only traffic using the bottleneck, the situation remains the same within the tunnel, and the behaviour is again equivalent to a plain AQM, with the L4S flow dominating and the conventional traffic severely impaired. The tunnel as a whole now occupies the full bottleneck rather than merely a fraction of it, but almost all of this extra capacity is used by the L4S flow, and can't be effectively used by the conventional flows within the tunnel. It is therefore clear that the effect is caused by the L4S flow meeting a conventional AQM, and not by the FQ mechanism. Furthermore, the effect of an L4S flow within a tunnel is *over and above* any effects imposed on the tunnel as a whole by an FQ-AQM. The main proposed solution to this is to upgrade the AQM at the bottleneck, so that it understands the ECT(1) signal distinguishing the L4S traffic from conventional traffic. But this imposes the burden of mitigating the problem on the existing network, an "innocent bystander". This is therefore clearly not an appropriate strategy; L4S should instead ensure that it reacts appropriately to congestion signals produced by existing networks, which by RFC-3168 compliance treat ECT(1) as equivalent to ECT(0). If L4S cannot do this reliably - and we doubt that it can - then it must either be redesigned to use an unambiguous signal, or explicitly confined to networks which have been prepared for it by removing/upgrading all conventional AQMs. 
We have proposed two possible methods of redesigning L4S, both of which have been rejected by the L4S team. - Jonathan Morton ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 13:53 ` Jonathan Morton @ 2021-03-09 14:27 ` Sebastian Moeller 2021-03-09 14:35 ` Dave Taht 2021-03-09 17:31 ` Steven Blake 1 sibling, 1 reply; 22+ messages in thread From: Sebastian Moeller @ 2021-03-09 14:27 UTC (permalink / raw) To: Jonathan Morton; +Cc: Pete Heist, ECN-Sane Hi Jonathan, > On Mar 9, 2021, at 14:53, Jonathan Morton <chromatix99@gmail.com> wrote: > >> On 9 Mar, 2021, at 11:57 am, Pete Heist <pete@heistp.net> wrote: >> >> FQ protects competing flows, unless L4S and non-L4S traffic ends up in >> the same queue. This can happen with a hash collision, or maybe more >> commonly, with tunneled traffic in tunnels that support copying the ECN >> bits from the inner to the outer. If anyone thinks of any other reasons >> we haven't considered why competing flows would share the same 5-tuple >> and thus the same queue, do mention it. > > Bob Briscoe's favourite defence to this, at the moment, seems to be that multiple flows sharing one tunnel are *also* disadvantaged when they share an FQ AQM bottleneck with multiple other flows that are not tunnelled, and which the FQ mechanism *can* distinguish. Obviously this is specious, but it's worth pinning down exactly *why* so we can explain it back to him (and more importantly, anyone else paying attention). [SM] I think the way forward in this would be to embrace the IPv6 flow label and include it in the hash (sure, it will not help with IPv4 tunnels). That way even tunneled flows can reveal themselves to upper layers and get per-flow treatment (or they can decide to keep to their secret ways, their choice). I think that trying to abuse the flow label will result in massive reordering for the tunneled flow, so abuse might still be a risk (but it seems hard for an abuser to gain more usable capacity).
How do such tunnels behave in the prevalent FIFOs: do they actually get a share depending on their number of hidden constituent flows, or are they treated as a single flow? And in either case, isn't that a policy question the operator of the bottleneck should be able to control? I snipped the rest of your excellent analysis, as I only want to bring up the flow label to side-step that issue partially. This does not solve L4S' misdesign, but it will take some of the wind out of the sails of Bob's argument... Best Regards Sebastian ^ permalink raw reply [flat|nested] 22+ messages in thread
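Sebastian's flow-label idea can be sketched the same way (illustrative hash only, not the kernel's; per-inner-flow outer labels are what RFC 6438 recommends for tunnels and ECMP, though nothing obliges a tunnel endpoint to set them):

```python
# Sketch of including the IPv6 flow label in the FQ hash. If a tunnel
# endpoint sets a distinct outer flow label per inner flow, an FQ
# bottleneck can then separate flows it would otherwise see as one.
# Illustrative hash and addresses, not the kernel's implementation.

NUM_QUEUES = 1024

def queue_5tuple(src, dst, proto):
    # tunnel traffic: inner ports are invisible, outer addresses identical
    return hash((src, dst, proto)) % NUM_QUEUES

def queue_with_flow_label(src, dst, proto, flow_label):
    return hash((src, dst, proto, flow_label)) % NUM_QUEUES

outer = ("2001:db8::1", "2001:db8::2", 41)  # one tunnel, two inner flows

# Without the flow label, both inner flows map to the same queue:
assert queue_5tuple(*outer) == queue_5tuple(*outer)

# With per-inner-flow labels, they can land in separate queues:
qa = queue_with_flow_label(*outer, 0x12345)
qb = queue_with_flow_label(*outer, 0x54321)
print("separated:", qa != qb)  # usually True (hash collisions still possible)
```

As Sebastian notes, this only helps where the encapsulator cooperates (and not at all for IPv4 outer headers); it side-steps Bob's tunnel scenario rather than fixing L4S's signalling ambiguity.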
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 14:27 ` Sebastian Moeller @ 2021-03-09 14:35 ` Dave Taht 0 siblings, 0 replies; 22+ messages in thread From: Dave Taht @ 2021-03-09 14:35 UTC (permalink / raw) To: Sebastian Moeller; +Cc: Jonathan Morton, ECN-Sane I would certainly like to see more exploration of when and where the ipv6 flow label gets peed on. but as it is yet another untried idea... On Tue, Mar 9, 2021 at 6:27 AM Sebastian Moeller <moeller0@gmx.de> wrote: > > Hi Jonathan, > > > > On Mar 9, 2021, at 14:53, Jonathan Morton <chromatix99@gmail.com> wrote: > > > >> On 9 Mar, 2021, at 11:57 am, Pete Heist <pete@heistp.net> wrote: > >> > >> FQ protects competing flows, unless L4S and non-L4S traffic ends up in > >> the same queue. This can happen with a hash collision, or maybe more > >> commonly, with tunneled traffic in tunnels that support copying the ECN > >> bits from the inner to the outer. If anyone thinks of any other reasons > >> we haven't considered why competing flows would share the same 5-tuple > >> and thus the same queue, do mention it. > > > > Bob Briscoe's favourite defence to this, at the moment, seems to be that multiple flows sharing one tunnel are *also* disadvantaged when they share an FQ AQM bottleneck with multiple other flows that are not tunnelled, and which the FQ mechanism *can* distinguish. Obviously this is specious, but it's worth pinning down exactly *why* so we can explain it back to him (and more importantly, anyone else paying attention). > > [SM] I think the way forward in this would be to embrace the IPv6 flow label, and include it into the hash (sure will not help with IPv4 tunnels). That way even tunneled flows can reveal them selves to upper layers and get per-flow treatment (or they can decide to keep to their secret ways, their choice). 
I think that trying to abuse the flow label will result in massive reordering for the tunneled flow, so might still be a risk (but it seems hard for an abuser to gain more usable capacity). > How do such tunnels behave in the prevalent FIFO's, do they actually get a share depending on their number of hidden constituent flows, or are they treated as a single flow? And in either case, isn't that not a policy question the operator of the bottleneck should be able to control? > > I snipped the rest of your excellent analysis, as I only want to bring up the flow-label to side-step that issue partially. This does not solve L4S' misdesign, but it will take Bob's argument the wind out of the sails to some degree... > > Best Regards > Sebastian > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane -- "For a successful technology, reality must take precedence over public relations, for Mother Nature cannot be fooled" - Richard Feynman dave@taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 13:53 ` Jonathan Morton 2021-03-09 14:27 ` Sebastian Moeller @ 2021-03-09 17:31 ` Steven Blake 2021-03-09 17:50 ` Steven Blake 1 sibling, 1 reply; 22+ messages in thread From: Steven Blake @ 2021-03-09 17:31 UTC (permalink / raw) To: Jonathan Morton; +Cc: ECN-Sane TL;DR: L4S traffic sharing a queue with AQM-Classic ECN will crush non-L4S traffic. Thanks, this lines up with my prior understanding (wanted to make sure I wasn't missing any arguments from the zillions of back-and-forth emails on the tsvwg list). And I'm glad that at least they appear to behave correctly in the face of packet discards. The disaster scenario is that their experiment introduces performance issues in some unsuspecting operators' networks, causing them to start bleaching ECN bits. Their whole safety plan depends on the claim that Classic RFC 3168 ECN is not deployed (except in fq_codel on the edge; who cares? they can patch their code). If that were the case, it would make more sense for them to try to move classic ECN to historic and redefine ECT(0) to signal L4S traffic (à la DCTCP). It's also been clear that this is not really an effort to conduct an experiment. On Tue, 2021-03-09 at 15:53 +0200, Jonathan Morton wrote: > > On 9 Mar, 2021, at 11:57 am, Pete Heist <pete@heistp.net> wrote: > > > > FQ protects competing flows, unless L4S and non-L4S traffic ends up > > in > > the same queue. This can happen with a hash collision, or maybe > > more > > commonly, with tunneled traffic in tunnels that support copying the > > ECN > > bits from the inner to the outer. If anyone thinks of any other > > reasons > > we haven't considered why competing flows would share the same 5-tuple > > and thus the same queue, do mention it.
> > Bob Briscoe's favourite defence to this, at the moment, seems to be > that multiple flows sharing one tunnel are *also* disadvantaged when > they share an FQ AQM bottleneck with multiple other flows that are > not tunnelled, and which the FQ mechanism *can* > distinguish. Obviously this is specious, but it's worth pinning down > exactly *why* so we can explain it back to him (and more importantly, > anyone else paying attention). > > Bob's scenario involves entirely conventional traffic, and a > saturated bottleneck managed by an FQ-AQM (fq_codel), which is itself > shared with at least one other flow. We assume that all AQMs in > existing networks are ECN enabled (as distinct from the also-common > policers which only drop). The FQ mechanism treats the tunnel as a > single flow, and shares out bandwidth equally on that basis. So the > throughput available to the tunnel as a whole is one share of the > total, no matter how many flows occupy the tunnel. Additionally, the > same AQM mark/drop rate is applied to everything in the tunnel, > causing the flows using it to adopt an RTT-fair relationship to each > other. > > The disadvantage experienced by the tunnel (relative to a plain AQM) > is proportional to the number of flows using the tunnel, and only > indirectly related to the number of other flows using the > bottleneck. This I would classify as Minor severity, since it is a > moderate, sustained effect. It increases in effect only linearly > with the load on the tunnel, which is the same as at any ordinary > bottleneck - and this is routinely tolerated. > > Note that if the tunnel is the only traffic using the bottleneck, the > situation is equivalent to a plain, single-queue AQM. This is an > important degenerate case, which we can come back to later. Also, in > principle the effect can be avoided by either not using the tunnel, > or by dividing the flows between multiple tunnels that the FQ > mechanism *can* distinguish. 
This puts the risk into either an > "involved participant" or "interested observer" category, unless the > tunnel has been imposed on the user without knowledge or > consent. What this means is that the tunnel user might reasonably > consider the security or privacy benefit of the tunnel to outweigh > the performance defect it incurs, and thereby choose to continue > using it. > > Now, let us add one L4S flow to the tunnel, replacing one of the > conventional flows in it, but keeping everything else the same. The > conventional flows *outside* the tunnel are unaffected, because they > are protected by the FQ-AQM. But the conventional flows *inside* the > tunnel, which the FQ-AQM cannot protect because it cannot distinguish > them, are immediately squashed to minimum cwnd or thereabouts, which > may be considerably less than the fair-share BDP within that > allocated to the tunnel. The L4S flow thereby grows to dominate the > tunnel traffic as described elsewhere. This is clearly a Major > severity effect, as the conventional traffic in the tunnel is > seriously impaired. > > Note that if the tunnel shared a plain AQM bottleneck, without FQ, > with other conventional flows outside the tunnel, these other flows > would *also* be squashed by the L4S flow in the tunnel. This is > because the AQM must increase its signalling rate considerably to > control the L4S flow, and it applies the same signalling rate to all > traffic. The FQ-AQM only increases signalling to the flow requiring > it. > > Returning to the degenerate case where the tunnel is the only traffic > using the bottleneck, the situation remains the same within the > tunnel, and the behaviour is again equivalent to a plain AQM, with > the L4S flow dominating and the conventional traffic severely > impaired. 
The tunnel as a whole now occupies the full bottleneck > rather than merely a fraction of it, but almost all of this extra > capacity is used by the L4S flow, and can't be effectively used by > the conventional flows within the tunnel. > > It is therefore clear that the effect is caused by the L4S flow > meeting a conventional AQM, and not by the FQ > mechanism. Furthermore, the effect of an L4S flow within a tunnel is > *over and above* any effects imposed on the tunnel as a whole by an > FQ-AQM. > > The main proposed solution to this is to upgrade the AQM at the > bottleneck, so that it understands the ECT(1) signal distinguishing > the L4S traffic from conventional traffic. But this imposes the > burden of mitigating the problem on the existing network, an > "innocent bystander". This is therefore clearly not an appropriate > strategy; L4S should instead ensure that it reacts appropriately to > congestion signals produced by existing networks, which by RFC-3168 > compliance treat ECT(1) as equivalent to ECT(0). > > If L4S cannot do this reliably - and we doubt that it can - then it > must either be redesigned to use an unambiguous signal, or explicitly > confined to networks which have been prepared for it by > removing/upgrading all conventional AQMs. We have proposed two > possible methods of redesigning L4S, both of which have been rejected > by the L4S team. > > - Jonathan Morton Regards, // Steve ^ permalink raw reply [flat|nested] 22+ messages in thread
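The FQ accounting Jonathan walks through above reduces to a toy model. This is a sketch only, not fq_codel's actual DRR++ implementation: the scheduler sees only outer 5-tuples, so N flows inside one tunnel split a single flow's share, while each untunneled flow gets a full share of its own.

```python
# Toy model of FQ capacity sharing at a saturated bottleneck.
# The scheduler hashes only the outer 5-tuple, so a tunnel carrying
# several inner flows is allocated one share, not one per inner flow.

def fq_shares(capacity, outer_flows):
    """outer_flows maps an outer-flow name to its count of inner flows."""
    per_queue = capacity / len(outer_flows)   # equal share per outer 5-tuple
    shares = {}
    for name, inner in outer_flows.items():
        # One AQM signalling rate applies to the whole tunnel, so with
        # similar RTTs the inner flows split the share roughly equally.
        shares[name] = [per_queue / inner] * inner
    return shares

# Two plain flows and one tunnel hiding four flows, on a 100 Mbit/s link:
result = fq_shares(100.0, {"plain-A": 1, "plain-B": 1, "tunnel": 4})
print(result["plain-A"])   # one full share, ~33.3
print(result["tunnel"])    # four inner flows each get ~8.3
```

This also shows the degenerate case: with the tunnel as the only outer flow, the model collapses to a single-queue AQM, which is where the L4S domination problem appears in full.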
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 17:31 ` Steven Blake @ 2021-03-09 17:50 ` Steven Blake 2021-03-09 18:07 ` Rodney W. Grimes ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Steven Blake @ 2021-03-09 17:50 UTC (permalink / raw) To: Jonathan Morton; +Cc: ECN-Sane On Tue, 2021-03-09 at 12:31 -0500, Steven Blake wrote: > Their whole safety plan depends on the claim that Classic RFC 3168 > ECN > is not deployed (except in fq_codel on the edge; who cares? they can > patch their code). If that were the case, it would make more sense > for > them to try to move classic ECN to historic and redefine ECT(0) to > signal L4S traffic (ala DCTCP). Actually, that is the ideal outcome. ECT(0) signals ECT-Capable, ECT(1) and CE signal two levels of congestion. In other words, SCE everywhere. Maybe that is an argument that you can throw at them: if it is safe to ignore classic ECN, might as well move straight to SCE with non-ECT traffic shunted off to a separate queue(s). Regards, // Steve ^ permalink raw reply [flat|nested] 22+ messages in thread
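Steve's "SCE everywhere" scheme amounts to a two-level marker at the AQM. A minimal sketch, assuming a simple threshold-style marker with made-up delay thresholds (real SCE proposals use ramp marking rather than hard thresholds):

```python
# Two-level congestion signalling with SCE semantics:
# ECT(0) = ECN-capable, ECT(1) = some congestion (SCE),
# CE = serious congestion.

ECT0, ECT1, CE, NECT = "ECT(0)", "ECT(1)", "CE", "Not-ECT"

def mark(codepoint, queue_delay_ms, sce_thresh=1.0, ce_thresh=5.0):
    if codepoint == NECT:
        return codepoint            # non-ECT traffic is handled by drop
    if queue_delay_ms >= ce_thresh:
        return CE                   # hard signal: all ECN endpoints back off
    if queue_delay_ms >= sce_thresh and codepoint == ECT0:
        return ECT1                 # soft signal: only SCE senders react
    return codepoint

print(mark(ECT0, 0.5))   # ECT(0): no congestion
print(mark(ECT0, 2.0))   # ECT(1): mild congestion, classic senders ignore it
print(mark(ECT0, 8.0))   # CE: both classic and SCE senders reduce
```

Because a classic RFC 3168 receiver simply ignores the ECT(1) codepoint, the soft signal is backward compatible at the endpoints; the cost is that it cannot serve as a queue classifier, which is the objection raised later in the thread.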
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 17:50 ` Steven Blake @ 2021-03-09 18:07 ` Rodney W. Grimes 2021-03-09 18:13 ` Pete Heist 2021-03-09 18:44 ` Holland, Jake 2 siblings, 0 replies; 22+ messages in thread From: Rodney W. Grimes @ 2021-03-09 18:07 UTC (permalink / raw) To: Steven Blake; +Cc: Jonathan Morton, ECN-Sane > On Tue, 2021-03-09 at 12:31 -0500, Steven Blake wrote: > > > Their whole safety plan depends on the claim that Classic RFC 3168 > > ECN > > is not deployed (except in fq_codel on the edge; who cares? they can > > patch their code). If that were the case, it would make more sense > > for > > them to try to move classic ECN to historic and redefine ECT(0) to > > signal L4S traffic (ala DCTCP). > > Actually, that is the ideal outcome. ECT(0) signals ECT-Capable, ECT(1) > and CE signal two levels of congestion. In other words, SCE everywhere. > > Maybe that is an argument that you can throw at them: if it is safe to > ignore classic ECN, might as well move straight to SCE with non-ECT > traffic shunted off to a separate queue(s). Would you be willing to float that in front of them? We have discussed this internally between Jonathan, Pete and myself, it is a viable solution. And iirc our discussion concluded that using ECT(0) to signal ECT or SCE treatment would be rather low risk. Right now any time we (SCE) try to float anything it's shot down without any due consideration or discussion, sadly. > Regards, > // Steve -- Rod Grimes rgrimes@freebsd.org ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 17:50 ` Steven Blake 2021-03-09 18:07 ` Rodney W. Grimes @ 2021-03-09 18:13 ` Pete Heist 2021-03-09 19:51 ` Holland, Jake 2021-03-09 18:44 ` Holland, Jake 2 siblings, 1 reply; 22+ messages in thread From: Pete Heist @ 2021-03-09 18:13 UTC (permalink / raw) To: Steven Blake; +Cc: ECN-Sane On Tue, 2021-03-09 at 12:50 -0500, Steven Blake wrote: > On Tue, 2021-03-09 at 12:31 -0500, Steven Blake wrote: > > > Their whole safety plan depends on the claim that Classic RFC 3168 > > ECN > > is not deployed (except in fq_codel on the edge; who cares? they can > > patch their code). If that were the case, it would make more sense > > for > > them to try to move classic ECN to historic and redefine ECT(0) to > > signal L4S traffic (ala DCTCP). > > Actually, that is the ideal outcome. ECT(0) signals ECT-Capable, ECT(1) > and CE signal two levels of congestion. In other words, SCE everywhere. > > Maybe that is an argument that you can throw at them: if it is safe to > ignore classic ECN, might as well move straight to SCE with non-ECT > traffic shunted off to a separate queue(s). You've hit on what IMO is a serious inconsistency in section B.5 of the L4S-ID draft, which at one point explored that option: ----- B.5. ECN capability alone This approach uses ECN capability alone as the L4S identifier. It would only have been feasible if RFC 3168 ECN had not been widely deployed. This was the case when the choice of L4S identifier was being made and this appendix was first written. Since then, RFC 3168 ECN has been widely deployed and L4S did not take this approach anyway. So this approach is not discussed further, because it is no longer a feasible option. ---- On the one hand, the argument is that 3168 is *not* widely deployed when it comes to safety with existing AQMs, and on the other hand, it *is* widely deployed when it comes to selection of the identifier. I think this finally needs bringing up, maybe tomorrow. 
We had a conversation late last year around instead making a discontinuous upgrade to ECN/SCE by redefining ECT(0) to be the identifier, and I spent some time thinking about it. It's not without issues, but I wouldn't mind hearing others' thoughts on it before I pollute it with mine. Pete > Regards, > > // Steve > > > > > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 18:13 ` Pete Heist @ 2021-03-09 19:51 ` Holland, Jake 2021-03-09 20:53 ` Pete Heist 0 siblings, 1 reply; 22+ messages in thread From: Holland, Jake @ 2021-03-09 19:51 UTC (permalink / raw) To: Pete Heist, Steven Blake; +Cc: ECN-Sane On 3/9/21, 10:13 AM, "Pete Heist" <pete@heistp.net> wrote: > On the one hand, the argument is that 3168 is *not* widely deployed > when it comes to safety with existing AQMs, and on the other hand, it > *is* widely deployed when it comes to selection of the identifier. I > think this finally needs bringing up, maybe tomorrow. I think they rephrased section B.2 to match up with this. Although B.5 probably does need some editorial work, I think the technical explanation is mostly the same as what's covered in B.2, so bringing this up probably has limited utility. I won't deny that there's some weird shifting of the purported reasoning behind stable conclusions, but I'll suggest that IMHO you're better off keeping the focus on the real crux of the issue, which I think is correctly articulated as harm to bystanders by deploying a new codepoint assignment for ECT(1) without first proving it can be used effectively without harm by most traffic under the prior meaning of that codepoint. (I'm less sure about the tunnels, which seem to be considered both so common that FQ can't address their latency and also ignorable wrt harm from sharing classic 3168 and TCP Prague traffic. Raising this point might at least bring them around on the idea that tunnels could be split by flows when it's useful, but probably also has limited utility overall.) > We had a conversation late last year around instead making a > discontinuous upgrade to ECN/SCE by redefining ECT(0) to be the > identifier, and I spent some time thinking about it. It's not without > issues, but I wouldn't mind hearing other's thoughts on it before I > pollute it with mine. 
They did at least update the draft to speak to this point in l4s-id B.3. I think the biggest objection on their side was that it's not a good classifier with chained aqms, and this problem gets worse as deployment increases. I still kinda like it as the least harmful, mostly only helpful option (assuming endpoints who negotiate support will also do better RACK-like support for reordering and switches will stop trying to do it). While it doesn't provide a great classifier, it at least provides a crappy one that doesn't hurt that much when you're wrong. -Jake ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 19:51 ` Holland, Jake @ 2021-03-09 20:53 ` Pete Heist 0 siblings, 0 replies; 22+ messages in thread From: Pete Heist @ 2021-03-09 20:53 UTC (permalink / raw) To: Holland, Jake; +Cc: ECN-Sane On Tue, 2021-03-09 at 19:51 +0000, Holland, Jake wrote: > On 3/9/21, 10:13 AM, "Pete Heist" <pete@heistp.net> wrote: > > On the one hand, the argument is that 3168 is *not* widely deployed > > when it comes to safety with existing AQMs, and on the other hand, > > it > > *is* widely deployed when it comes to selection of the identifier. > > I > > think this finally needs bringing up, maybe tomorrow. > > I think they rephrased section B.2 to match up with this. > > Although B.5 probably does need some editorial work, I think the > technical explanation is mostly the same as what's covered in B.2, > so bringing this up probably has limited utility. Ok, I'll trust that. I think they mainly meant there that they didn't want Apple devices polluting their green field L queue. > I won't deny that there's some weird shifting of the purported > reasoning behind stable conclusions, but I'll suggest that IMHO > you're better off keeping the focus on the real crux of the issue, > which I think is correctly articulated as harm to bystanders by > deploying a new codepoint assignment for ECT(1) without first proving > it can be used effectively without harm by most traffic under the > prior meaning of that codepoint. > > (I'm less sure about the tunnels, which seem to be considered both > so common that FQ can't address their latency and also ignorable wrt > harm from sharing classic 3168 and TCP Prague traffic. Raising > this point might at least bring them around on the idea that tunnels > could be split by flows when it's useful, but probably also has > limited utility overall.) These are good points. 
It's true that when we've tried to present arguments in the past that waver from these fundamental safety issues, they've almost never landed, and end up being better left unsaid, or just written on the list. > > We had a conversation late last year around instead making a > > discontinuous upgrade to ECN/SCE by redefining ECT(0) to be the > > identifier, and I spent some time thinking about it. It's not > > without > > issues, but I wouldn't mind hearing others' thoughts on it before I > > pollute it with mine. > > They did at least update the draft to speak to this point in > l4s-id B.3. I think the biggest objection on their side was that > it's > not a good classifier with chained aqms, and this problem gets worse > as deployment increases. > > I still kinda like it as the least harmful, mostly only helpful > option (assuming endpoints who negotiate support will also do better > RACK-like support for reordering and switches will stop trying to do > it). While it doesn't provide a great classifier, it at least > provides a crappy one that doesn't hurt that much when you're wrong. By now I think of that idea as B.Jake. While I understood their loss of classification argument, it's a definite improvement on flow starvation. :)
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 17:50 ` Steven Blake 2021-03-09 18:07 ` Rodney W. Grimes 2021-03-09 18:13 ` Pete Heist @ 2021-03-09 18:44 ` Holland, Jake 2021-03-09 19:09 ` Jonathan Morton 2 siblings, 1 reply; 22+ messages in thread From: Holland, Jake @ 2021-03-09 18:44 UTC (permalink / raw) To: Steven Blake, Jonathan Morton; +Cc: ECN-Sane On 3/9/21, 9:50 AM, "Steven Blake" <slblake@petri-meat.com> wrote: > Actually, that is the ideal outcome. ECT(0) signals ECT-Capable, ECT(1) > and CE signal two levels of congestion. In other words, SCE everywhere. > > Maybe that is an argument that you can throw at them: if it is safe to > ignore classic ECN, might as well move straight to SCE with non-ECT > traffic shunted off to a separate queue(s). The L4S drafts address this somewhat already, I think. The main argument is probably best-articulated in Appendix B.2 of l4s-id: https://datatracker.ietf.org/doc/html/draft-ietf-tsvwg-ecn-l4s-id-14#appendix-B.2 From more of a summary of (my recollection of) live discussion, the main reason it's rejected is that classic ECN traffic will not respond as quickly as L4S, so you could not get lower latency than classic ECN offers in a shared queue with competing classic ECN traffic, which would impede adoption by providing no latency benefit plus some throughput penalty for upgrading. (Also of note: L4S is trying to target bigger devices upstream of the home gateway, where flow-aware queuing is less practical, and also where most of the congestion and buffering delay occurs for those who are not throttling at their home gateway.) This was one point where a lot of the people in tsvwg explicitly expressed that you really do need a classifier to improve on classic aqm latency, hence preferred ECT(1)-as-input. (I think even some who do not agree it should be deployed due to safety concerns did agree with this point.) So I don't expect raising that point would be helpful. 
-Jake ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 18:44 ` Holland, Jake @ 2021-03-09 19:09 ` Jonathan Morton 2021-03-09 19:27 ` Holland, Jake 0 siblings, 1 reply; 22+ messages in thread From: Jonathan Morton @ 2021-03-09 19:09 UTC (permalink / raw) To: Holland, Jake; +Cc: Steven Blake, ECN-Sane > On 9 Mar, 2021, at 8:44 pm, Holland, Jake <jholland@akamai.com> wrote: > > …classic ECN traffic will not respond as quickly as L4S… I know it wasn't you making this claim, Jake, but I have to point out that it's completely false. Classic ECN transports actually respond *more* quickly to a CE mark than L4S transports. Let's walk through the processes. RFC-3168 TCP: A single CE mark is applied to a data segment. The receiver immediately sends an ACK with ECE set, and keeps ECE set on all further ACKs until a CWR cancels it. The sender gets the ECE, reduces the cwnd immediately, and sends the next data segment with CWR set to confirm it. Proportional Rate Reduction may be used to spread out the reduction in actual in-flight data. This takes at most one RTT. From the queue's perspective, one RTT (the minimum possible) elapses before the arrival rate from the sender halves (due to PRR). After two RTTs maximum, the in-flight data has reached a new, substantially lower value than the original. L4S TCP (Prague): A single CE mark is applied to a data segment. The receiver updates the CE mark counter in the next ACK. The sender sees the new counter value, and feeds it into a low-pass filter which operates on discrete time intervals. When the filter is next processed, on average a single CE mark results in half a segment being removed from the cwnd. Half the time, this results in no externally visible change to the data in flight. The other half, it is a very slight response. 
From the queue's perspective, one RTT plus (on average) half the filter window passes before any possible response reaches it, and half the time it's no response anyway, the other half a single segment reduction. Meanwhile, at least one segment has been added to the cwnd in the RTT-plus time since the mark was applied (due to Reno-style growth). I do not see how TCP Prague's response can be described as "faster" than that of standard TCP. - Jonathan Morton ^ permalink raw reply [flat|nested] 22+ messages in thread
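Jonathan's comparison can be put into numbers with a simplified model. The constants are illustrative: the gain g is the EWMA gain from DCTCP, which Prague's filter resembles but does not necessarily match. One CE mark halves a classic sender's cwnd within an RTT, while a DCTCP-style sender cuts cwnd by only cwnd*alpha/2, a small fraction of one segment when marks are rare.

```python
# cwnd response (in segments) to a single CE mark.

def classic_response(cwnd):
    # RFC 3168 behaviour: one CE mark triggers multiplicative decrease
    # within one RTT (PRR spreads the reduction out, but it is immediate).
    return cwnd / 2

DCTCP_GAIN = 1 / 16   # EWMA gain 'g' from DCTCP; illustrative here

def dctcp_style_response(cwnd, alpha):
    # DCTCP/Prague-style: cwnd shrinks in proportion to the smoothed
    # fraction of marked segments.  A single mark in a window of cwnd
    # segments barely moves alpha, so the cut is a fraction of a segment.
    frac_marked = 1 / cwnd
    alpha = (1 - DCTCP_GAIN) * alpha + DCTCP_GAIN * frac_marked
    return cwnd * (1 - alpha / 2), alpha

print(classic_response(100))               # 50.0 segments
new_cwnd, alpha = dctcp_style_response(100, alpha=0.0)
print(round(new_cwnd, 3))                  # ~99.97: a ~0.03-segment cut
```

With cwnd at 100 segments, the classic sender sheds 50 segments of in-flight data while the DCTCP-style sender sheds about a thirtieth of one, which is the point of the walkthrough above: the high-fidelity response to any *single* mark is far slower, not faster.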
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 19:09 ` Jonathan Morton @ 2021-03-09 19:27 ` Holland, Jake 2021-03-09 19:42 ` Jonathan Morton 0 siblings, 1 reply; 22+ messages in thread From: Holland, Jake @ 2021-03-09 19:27 UTC (permalink / raw) To: Jonathan Morton; +Cc: Steven Blake, ECN-Sane Sorry Jonathan, I think I didn't convey some context properly... On 3/9/21, 11:09 AM, "Jonathan Morton" <chromatix99@gmail.com> wrote: >> On 9 Mar, 2021, at 8:44 pm, Holland, Jake <jholland@akamai.com> wrote: >> >> …classic ECN traffic will not respond as quickly as L4S… > >I know it wasn't you making this claim, Jake, but I have to point out that it's completely false. Classic ECN transports actually respond *more* quickly to a CE mark than L4S transports. Here I meant to talk about an SCE-style low-congestion signal (in either 1->0 or 0->1 direction), which would be ignored by a classic endpoint but which a high-fidelity endpoint would respond to. So I'm not referring to a CE mark here, but rather an SCE mark, as I thought Steve was proposing with this bit: >> Maybe that is an argument that you can throw at them: if it is safe to >> ignore classic ECN, might as well move straight to SCE with non-ECT >> traffic shunted off to a separate queue(s). Sorry for any confusion there, I'm not in favor of talking past each other and I think we probably agree here if I've understood correctly. What I was trying to say is that an SCE response (specifically including an L4S-using-SCE response, though I think you had some intriguing alternate ideas to reduce the effect) would be faster than a classic response that ignores SCE and waits for a CE. I do agree with your explanation that a classic CC responds faster to a CE mark than TCP Prague, that's just not what I was trying to talk about. -Jake ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 19:27 ` Holland, Jake @ 2021-03-09 19:42 ` Jonathan Morton 0 siblings, 0 replies; 22+ messages in thread From: Jonathan Morton @ 2021-03-09 19:42 UTC (permalink / raw) To: Holland, Jake; +Cc: Steven Blake, ECN-Sane > On 9 Mar, 2021, at 9:27 pm, Holland, Jake <jholland@akamai.com> wrote: > >>> …classic ECN traffic will not respond as quickly as L4S… >> >> I know it wasn't you making this claim, Jake, but I have to point out that it's completely false. Classic ECN transports actually respond *more* quickly to a CE mark than L4S transports. > > Here I meant to talk about an SCE-style low-congestion signal (in > either 1->0 or 0->1 direction), which would be ignored by a classic > endpoint but which a high-fidelity endpoint would respond to. > > So I'm not referring to a CE mark here, but rather an SCE mark, as > I thought Steve was proposing with this bit: > >>> Maybe that is an argument that you can throw at them: if it is safe to >>> ignore classic ECN, might as well move straight to SCE with non-ECT >>> traffic shunted off to a separate queue(s). > > Sorry for any confusion there, I'm not in favor of talking past each > other and I think we probably agree here if I've understood correctly. > > What I was trying to say is that an SCE response (specifically > including an L4S-using-SCE response, though I think you had some > intriguing alternate ideas to reduce the effect) would be faster > than a classic response that ignores SCE and waits for a CE. Okay, that does make more sense. I probably wouldn't use "faster" or "not as quickly" to describe that, however. Such a description only makes sense if you pre-suppose a queue depth that rises monotonically over time. AIMD and HFCC responses do tend to need different operating points to work efficiently. HFCC can settle on a steady-state cwnd that is quite close to the true BDP. 
AIMD needs the peak queue depth to be significantly higher, to accommodate the deep sawtooth without losing too much goodput. So it entirely makes sense to set the thresholds for the two types of signalling accordingly. > I do agree with your explanation that a classic CC responds faster to > a CE mark than TCP Prague, that's just not what I was trying to talk > about. Sure. But the phrasing sounded so much like arguments that have indeed come from the L4S team - I'm sure you remember all the marketing BS that had to be cut out of their drafts, and there's still a lot of stuff there which I think is not supported (at best) by the data. - Jonathan Morton ^ permalink raw reply [flat|nested] 22+ messages in thread
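The operating-point argument above can be made concrete with a little arithmetic, assuming an idealised Reno sawtooth with illustrative numbers: to keep the link busy through the W to W/2 cycle, AIMD's peak cwnd (and therefore the peak queue) must reach about twice the BDP, while a high-fidelity sender can hold cwnd just above the BDP.

```python
# Why an AIMD sawtooth needs a deeper peak queue than a
# high-fidelity congestion control (HFCC) at the same bottleneck.

def aimd_peak_for_full_link(bdp):
    # The sawtooth oscillates between W/2 and W; the trough must stay
    # at or above the BDP to avoid idling the link, so the peak is 2*BDP.
    return 2 * bdp

def aimd_avg_cwnd(peak):
    # Mean of a linear ramp from peak/2 up to peak.
    return 0.75 * peak

bdp = 100   # path BDP in segments
peak = aimd_peak_for_full_link(bdp)
print(peak)                         # 200: peak cwnd, so ~one BDP of queue
print(aimd_avg_cwnd(peak) - bdp)    # ~50 segments queued on average
# An HFCC sender can instead hold cwnd just above the BDP, keeping the
# standing queue a few segments deep, which is why different signalling
# thresholds for the two behaviours make sense.
```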
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 2:13 ` Holland, Jake 2021-03-09 4:06 ` Steven Blake @ 2021-03-09 8:43 ` Pete Heist 2021-03-09 15:57 ` Holland, Jake 2021-03-09 11:06 ` Jonathan Morton 2 siblings, 1 reply; 22+ messages in thread From: Pete Heist @ 2021-03-09 8:43 UTC (permalink / raw) To: Holland, Jake; +Cc: ECN-Sane On Tue, 2021-03-09 at 02:13 +0000, Holland, Jake wrote: > The presentations were pretty great, but they were really short > on time. In the chat a person or 2 was surprised about the way > L4S will impact NECT competing traffic when competing in a queue. > I agree some of the people who have tuned out the discussion are > learning things from these presentations, and I thought Jonathan's > slot was a good framing of the real question, and Pete's study was > also very helpful. I'm glad to hear that. At least it adds something to your earlier work, from a different vantage point, albeit much smaller. I know that studies from entirely disinterested parties would be good too, but that might be the hard part, they're disinterested! :) > I seem to recall a thread in the wake of Apple's ECN enabling about > one of the Linux distros considering turning ECN on by default for > outbound connections, in which one of them found that it completely > wrecked his throughput, and so it got tabled with unfortunately > no pcap posted. > > Any recollection of where that was? I was guessing it might be > one of the misbehaviors from the network that Apple encountered. That is odd and would be good to know about. I enabled ECN on my Linux laptop a long time ago and haven't noticed a problem that I'm aware of. I wish the distros would reconsider enabling it, unless there are active reasons it shouldn't be deployed, but they may now just be in a holding pattern on it. > I also thought Apple had a sysctl to disable the hold-downs and > always use ECN in spite of the heuristics, did that not work? 
> > -Jake > > On 3/8/21, 3:57 PM, "Dave Taht" <dave.taht@gmail.com> wrote: > Thx very much for the update. I wanted to note that preseem does a lot of work with wisps and I wish they'd share more data on it, as well as our ever present mention of free.fr. > Another data point is that apple's early rollout of ecn was kind of a failure, and there are now so many workarounds in the os for it as to make coherent testing impossible. > I do wish there was more work on ecn enabling bbr, as presently it does negotiate ecn often and then completely ignores it. You can see this in traces from dropbox in particular. > On Mon, Mar 8, 2021 at 3:47 PM Pete Heist <pete@heistp.net> wrote: > > Just responding to Dave's ask for a quick IETF 110 summary on ecn-sane, after one day. We presented the data on ECN at MAPRG (https://datatracker.ietf.org/doc/draft-heist-tsvwg-ecn-deployment-observations/ ). It basically just showed that ECN is in use by endpoints (more as a proportion across paths than a proportion of flows), that RFC3168 AQMs do exist out there and are signaling, and that the ECN field can be misused. There weren't any questions, maybe because we were the last to present and were already short on time. > > We also applied that to L4S by first explaining that risk is the product of severity and prevalence, and tried to increase the awareness about the flow domination problem when L4S flows meet non-L4S flows (ECN or not) in a 3168 queue. Spreading this information seems to go slowly, as we're still hearing "oh really?", which leads me to believe 1) that people are tuning this debate out, and 2) it just takes a long time to comprehend, and to believe. It's still our stance that L4S can't be deployed due to its signalling design, or if it is, the end result is likely to be more bleaching and confusion with the DS field. > > There was a question I'd already heard before about why fq_codel is being deployed at an ISP, so I tried to cover that over in tsvwg. Basically, fq_codel is not ideal for this purpose, lacking host and subscriber fairness, but it's available and effective, so it's a good start. > > Wednesday's TSVWG session will be entirely devoted to L4S drafts. > > _______________________________________________ > > Ecn-sane mailing list > > Ecn-sane@lists.bufferbloat.net > > https://lists.bufferbloat.net/listinfo/ecn-sane > -- "For a successful technology, reality must take precedence over public relations, for Mother Nature cannot be fooled" - Richard Feynman > dave@taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729 > _______________________________________________ > Ecn-sane mailing list > Ecn-sane@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/ecn-sane ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Ecn-sane] IETF 110 quick summary 2021-03-09 8:43 ` Pete Heist @ 2021-03-09 15:57 ` Holland, Jake 0 siblings, 0 replies; 22+ messages in thread From: Holland, Jake @ 2021-03-09 15:57 UTC (permalink / raw) To: Pete Heist; +Cc: ECN-Sane On 3/9/21, 12:43 AM, "Pete Heist" <pete@heistp.net> wrote: > I'm glad to hear that. At least it adds something to your work before, > from a different vantage point, albeit much smaller. I know that > studies from entirely disinterested parties would be good too, but that > might the hard part, they're disinterested! :) Yeah, sorry I couldn't manage to re-run my scripts successfully yet. I'm still curious to figure out if there's been any deployment motion, but the attempts that are easy to try haven't succeeded, and I haven't had time to refactor it to insist on an answer. (I re-tried a few times, hoping it was a cluster capacity issue that would sort itself out or that it would complete if I batched the jobs smaller, but no joy.) >> I seem to recall a thread in the wake of Apple's ECN enabling about >> one of the Linux distros considering turning ECN on by default for >> outbound connections, in which one of them found that it completely >> wrecked his throughput, and so it got tabled with unfortunately >> no pcap posted. >> >> Any recollection of where that was? I was guessing it might be >> one of the misbehaviors from the network that Apple encountered. > > That is odd and would be good to know about. I enabled ECN on my Linux > laptop a long time ago and haven't noticed a problem that I'm aware of. > I wish the distros would reconsider enabling it, unless there are > active reasons it shouldn't be deployed, but they may now just be in a > holding pattern on it. There's apparently a few misbehaving boxes out there, so using ECN from the wrong network location can leave you messed up, which I thought was why Apple had the heuristics running to detect pathologies and respond by turning off ECT for a while. 
But it's pretty hard to pin down what's happening (and from where)
without a zillion clients running ECN from all over the world.

Anyway, I thought I remembered Dave posting a link to that thread to
one of the lists (maybe this one?) and commenting in the thread (my
vague recollection was he was asking for a pcap but was told they had
already moved on and couldn't get one easily, or some such). I also
thought I remembered that someone in that thread was maybe considering
(or maybe just suggesting) adding something like the Apple heuristics,
so I was curious if anything ever happened.

-Jake

^ permalink raw reply [flat|nested] 22+ messages in thread
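[Editor's note: for readers unfamiliar with the heuristics being discussed, here is a hypothetical sketch of the general shape of such a per-destination fallback. This is not Apple's actual implementation; the class name, threshold, and backoff period are invented. The idea is simply: keep requesting ECN, but after repeated failures of ECN-negotiated connections to a host, stop setting ECT toward it for a while.]

```python
import time

class EcnFallback:
    """Hypothetical per-destination ECN fallback (illustrative only)."""

    def __init__(self, failure_threshold=3, backoff_secs=3600):
        self.failure_threshold = failure_threshold
        self.backoff_secs = backoff_secs
        self.failures = {}        # dest -> consecutive ECN setup failures
        self.disabled_until = {}  # dest -> time when ECN may be retried

    def use_ecn(self, dest, now=None):
        """Should the next connection to dest request ECN?"""
        now = time.time() if now is None else now
        return now >= self.disabled_until.get(dest, 0)

    def report_failure(self, dest, now=None):
        """An ECN-negotiated connection to dest failed or stalled."""
        now = time.time() if now is None else now
        self.failures[dest] = self.failures.get(dest, 0) + 1
        if self.failures[dest] >= self.failure_threshold:
            # Pathology suspected: back off ECN toward this destination.
            self.disabled_until[dest] = now + self.backoff_secs
            self.failures[dest] = 0

    def report_success(self, dest):
        self.failures[dest] = 0
```

Something of this shape would let a distro enable ECN by default while containing the damage from the "misbehaving boxes" mentioned above, at the cost of extra state and some lost ECN coverage during backoff.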
* Re: [Ecn-sane] IETF 110 quick summary
  2021-03-09  2:13 ` Holland, Jake
  2021-03-09  4:06 ` Steven Blake
  2021-03-09  8:43 ` Pete Heist
@ 2021-03-09 11:06 ` Jonathan Morton
  2 siblings, 0 replies; 22+ messages in thread

From: Jonathan Morton @ 2021-03-09 11:06 UTC (permalink / raw)
To: Holland, Jake; +Cc: Dave Taht, Pete Heist, ECN-Sane

> On 9 Mar, 2021, at 4:13 am, Holland, Jake <jholland@akamai.com> wrote:
>
> In the chat a person or 2 was surprised about the way
> L4S will impact NECT competing traffic when competing in a queue.

I think that was mostly Martin Duke. I caught up with him in the IETF
Gather space immediately afterwards and discussed this with him one to
one, and he now seems to understand more clearly what we were
presenting. I was pleased to hear that he's also familiar with the
"risk matrix" formulation I presented.

> We also applied that to L4S by first explaining that risk is the
> product of severity and prevalence…

And also, crucially, the concept of "externalised risk", i.e. the
distinction between involved participants, interested observers, and
innocent bystanders. L4S has innocent bystanders (existing networks
and their users, who have no idea that L4S even exists nor how to
troubleshoot ECN-related problems) incur most of the risk of Bad
Things happening. This is an "externalised risk" which is very
difficult to manage after the fact, and must be minimised to a much
greater extent than other risks.

SCE ensures that innocent bystanders incur virtually no risk, in that
bad interactions only occur for people actually using SCE over an
SCE-enabled path, which is where mitigations can actually be practical
to employ - in the limit, by switching off SCE. This is much easier to
accept in a risk analysis. We didn't get to that slide, however, due
to shortage of time.

 - Jonathan Morton

^ permalink raw reply [flat|nested] 22+ messages in thread
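[Editor's note: the severity-times-prevalence framing above can be written down in a few lines. The scenario names and numbers below are purely illustrative and are not figures from the presentation; the point is that risk borne by bystanders must be weighted more heavily than risk borne by participants, even at the same raw score.]

```python
# Risk as the product of severity and prevalence, tracked per bearer.

def risk(severity, prevalence):
    """severity on an arbitrary 0-10 scale, prevalence as a fraction 0-1."""
    return severity * prevalence

# Illustrative inputs only: (severity, prevalence, who bears the risk).
scenarios = {
    "L4S flow meets RFC3168 AQM": (8, 0.2, "innocent bystanders"),
    "SCE flow meets SCE-enabled path": (2, 0.1, "involved participants"),
}

for name, (sev, prev, bearer) in scenarios.items():
    print(f"{name}: risk={risk(sev, prev):.2f}, borne by {bearer}")
```

A high-severity, externalised risk (bystanders can't mitigate what they don't know exists) dominates even a modest prevalence, which is the argument for minimising it far more aggressively than participant risk.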
* Re: [Ecn-sane] IETF 110 quick summary
  2021-03-08 23:57 ` Dave Taht
  2021-03-09  2:13 ` Holland, Jake
@ 2021-03-09  8:21 ` Pete Heist
  1 sibling, 0 replies; 22+ messages in thread

From: Pete Heist @ 2021-03-09 8:21 UTC (permalink / raw)
To: Dave Taht; +Cc: ECN-Sane

On Mon, 2021-03-08 at 15:57 -0800, Dave Taht wrote:
> Thx very much for the update. I wanted to note that
> preseem does a lot of work with wisps and I wish they'd share more
> data on it, as well as our ever present mention of free.fr.
>
> Another data point is that apple's early rollout of ecn was kind of
> a failure, and there are now so many workarounds in the os for it as
> to make coherent testing impossible.

Is there any info on how it was a failure, or what workarounds there
are?

> I do wish there was more work on ecn enabling bbr, as presently
> it does negotiate ecn often and then completely ignores it. You can
> see this in traces from dropbox in particular.

I've only tested BBR briefly. I'd expect:

- BBR harming itself through fq_codel as TCP RTT goes up, but if it
  also uses that as a signal to back off, I don't know the end result
- Harm to competing traffic in a tunnel through fq_codel

> On Mon, Mar 8, 2021 at 3:47 PM Pete Heist <pete@heistp.net> wrote:
> >
> > Just responding to Dave's ask for a quick IETF 110 summary on
> > ecn-sane, after one day. We presented the data on ECN at MAPRG
> > (https://datatracker.ietf.org/doc/draft-heist-tsvwg-ecn-deployment-observations/
> > ). It basically just showed that ECN is in use by endpoints (more
> > as a proportion across paths than a proportion of flows), that
> > RFC3168 AQMs do exist out there and are signaling, and that the
> > ECN field can be misused. There weren't any questions, maybe
> > because we were the last to present and were already short on time.
> >
> > We also applied that to L4S by first explaining that risk is the
> > product of severity and prevalence, and tried to increase the
> > awareness about the flow domination problem when L4S flows meet
> > non-L4S flows (ECN or not) in a 3168 queue. Spreading this
> > information seems to go slowly, as we're still hearing "oh
> > really?", which leads me to believe 1) that people are tuning this
> > debate out, and 2) it just takes a long time to comprehend, and to
> > believe. It's still our stance that L4S can't be deployed due to
> > its signalling design, or if it is, the end result is likely to be
> > more bleaching and confusion with the DS field.
> >
> > There was a question I'd already heard before about why fq_codel is
> > being deployed at an ISP, so I tried to cover that over in tsvwg.
> > Basically, fq_codel is not ideal for this purpose, lacking host and
> > subscriber fairness, but it's available and effective, so it's a
> > good start.
> >
> > Wednesday's TSVWG session will be entirely devoted to L4S drafts.
> >
> > _______________________________________________
> > Ecn-sane mailing list
> > Ecn-sane@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/ecn-sane

^ permalink raw reply [flat|nested] 22+ messages in thread
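[Editor's note: for readers following along, the ECN field discussed throughout this thread is the two low-order bits of the IP TOS/traffic-class byte, with codepoints defined by RFC 3168. The decoder function below is just an illustration; the codepoint table itself is from the RFC. An RFC3168 AQM remarks ECT packets to CE instead of dropping them; a sender that negotiates ECN but then ignores CE marks, as described above for BBR, defeats that signal.]

```python
# ECN codepoints in the two low-order bits of the IP TOS byte (RFC 3168).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",  # endpoint not ECN-capable
    0b01: "ECT(1)",   # ECN-capable transport (repurposed as the L4S identifier)
    0b10: "ECT(0)",   # ECN-capable transport
    0b11: "CE",       # congestion experienced, set by an AQM on the path
}

def ecn_codepoint(tos_byte):
    """Return the ECN codepoint name for an IP TOS/traffic-class byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]

print(ecn_codepoint(0x02))  # ECT(0)
print(ecn_codepoint(0x03))  # CE
print(ecn_codepoint(0xB8))  # Not-ECT (EF DSCP, ECN bits zero)
```

The "flow domination" problem arises because a 3168 queue signals L4S (ECT(1)) and classic (ECT(0)) flows identically with CE, while the two respond with very different back-off behavior.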
end of thread, other threads:[~2021-03-09 20:53 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-08 23:47 [Ecn-sane] IETF 110 quick summary Pete Heist
2021-03-08 23:57 ` Dave Taht
2021-03-09  2:13 ` Holland, Jake
2021-03-09  4:06 ` Steven Blake
2021-03-09  9:57 ` Pete Heist
2021-03-09 13:53 ` Jonathan Morton
2021-03-09 14:27 ` Sebastian Moeller
2021-03-09 14:35 ` Dave Taht
2021-03-09 17:31 ` Steven Blake
2021-03-09 17:50 ` Steven Blake
2021-03-09 18:07 ` Rodney W. Grimes
2021-03-09 18:13 ` Pete Heist
2021-03-09 19:51 ` Holland, Jake
2021-03-09 20:53 ` Pete Heist
2021-03-09 18:44 ` Holland, Jake
2021-03-09 19:09 ` Jonathan Morton
2021-03-09 19:27 ` Holland, Jake
2021-03-09 19:42 ` Jonathan Morton
2021-03-09  8:43 ` Pete Heist
2021-03-09 15:57 ` Holland, Jake
2021-03-09 11:06 ` Jonathan Morton
2021-03-09  8:21 ` Pete Heist
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox