Discussion of explicit congestion notification's impact on the Internet
* [Ecn-sane] Fwd: my backlogged comments on the ECT(1) interim call
       [not found] <CAA93jw5g8vMNEoV899tX=89HzHSG5s2E3sGU1EFVdqRkWryCqw@mail.gmail.com>
@ 2020-04-27 19:26 ` Dave Taht
  2020-04-29  9:31   ` Bob Briscoe
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Taht @ 2020-04-27 19:26 UTC (permalink / raw)
  To: ECN-Sane

just because I read this list more often than tsvwg.

---------- Forwarded message ---------
From: Dave Taht <dave.taht@gmail.com>
Date: Mon, Apr 27, 2020 at 12:24 PM
Subject: my backlogged comments on the ECT(1) interim call
To: tsvwg IETF list <tsvwg@ietf.org>
Cc: bloat <bloat@lists.bufferbloat.net>


It looks like the majority of what I say below is not related to the
fate of the "bit". The push to take the bit was
strong with this one, and me... can't we deploy more of what we
already got in places where it matters?

...

so: A) PLEA: From 10 years now, of me working on bufferbloat, working
on real end-user and wifi traffic and real networks....

I would like folk here to stop benchmarking two flows that run for a long time
and in one direction only... and thus exclusively in tcp congestion
avoidance mode.

Please. just. stop. Real traffic looks nothing like that. The internet
looks nothing like that.
The netops folk I know just roll their eyes up at benchmarks like this
that prove nothing and tell me to go to ripe meetings instead.
When y'all talk about "not looking foolish for not mandating ecn now",
you've already lost that audience with benchmarks like these.

Sure, setup a background flow(s)  like that, but then hit the result
with a mix of
far more normal traffic? Please? networks are never used unidirectionally
and both directions congesting is frequent. To illustrate that problem...

I have a really robust benchmark that we have used throughout the bufferbloat
project that I would like everyone to run in their environments, the flent
"rrul" test. Everybody on both sides has big enough testbeds setup that a few
hours spent on doing that - and please add in asymmetric networks especially -
and perusing the results ought to be enlightening to everyone as to the kind
of problems real people have, on real networks.

Can the L4S and SCE folk run the rrul test some day soon? Please?
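(For anyone who hasn't run it: a typical invocation looks something like
the following, with the netserver host as a placeholder, and the same
command rerun against rrul_be and the asymmetric-rate setups gives
directly comparable plots:

  flent rrul -p all_scaled -l 60 -H netperf.example.org \
        -t "200-10-baseline" -o rrul-200-10-baseline.png
)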

I rather liked this benchmark that tested another traffic mix,

( https://www.cablelabs.com/wp-content/uploads/2014/06/DOCSIS-AQM_May2014.pdf )

although it had many flaws (like not doing dns lookups), I wish it
could be dusted off and used to compare this
new fangled ecn enabled stuff with the kind of results you can merely get
with packet loss and rtt awareness. It would be so great to be able
to directly compare all these new algorithms against this benchmark.

Adding in a non ecn'd udp based routing protocol on heavily
oversubscribed 100mbit link is also enlightening.

I'd rather like to see that benchmark improved for a more modernized
home traffic mix
where it is projected there may be 30 devices on the network on average,
in a few years.

If there is any one thing y'all can do to reduce my blood pressure and
keep me engaged here whilst you
debate the end of the internet as I understand it, it would be to run
the rrul test as part of all your benchmarks.

thank you.

B) Stuart Cheshire regaled us with several anecdotes - one concerning
his problems
with comcast's 1Gbit/35mbit service being unusable, under load, for
videoconferencing. This is true. The overbuffering at the CMTSes
still has to be seen to be believed, at all rates. At lower rates
it's possible to shape this, with another device (which is what
the entire SQM deployment does in self defense and why cake has a
specific docsis ingress mode), but it is cpu intensive
and requires x86 hardware to do well at rates above 500Mbits, presently.
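For concreteness, that self-defense shaping looks roughly like this on
such a box (interface names and rates are only examples):

  # inbound, via an ifb that ingress traffic is redirected to
  ip link add ifb4eth0 type ifb && ip link set ifb4eth0 up
  tc qdisc add dev eth0 handle ffff: ingress
  tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \
        match u32 0 0 action mirred egress redirect dev ifb4eth0
  tc qdisc replace dev ifb4eth0 root cake bandwidth 900mbit docsis ingress
  # outbound
  tc qdisc replace dev eth0 root cake bandwidth 35mbit docsis ack-filter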

So I wish the CMTS makers (Arris and Cisco) were in this room. Are they?

(Stuart, if you'd like a box that can make your comcast link pleasurable
under all workloads, whenever you get back to los gatos, I've got a few
lying around. Was so happy to get a few ietfers this past week to apply
what's off the shelf for end users today. :)

C) I am glad Bob said that L4S is finally looking at asymmetric
networks, and starting to tackle ack-filtering and accecn issues
there.

But... I would have *started there*. Asymmetric access is the predominant form
of all edge technologies.

I would love to see flent rrul test results for 1gig/35mbit, 100/10, 200/10
services, in particular. (from SCE also!). "lifeline" service (11/2)
would be good
to have results on. It would be especially good to have baseline
comparison data from the measured, current deployment
of the CMTSes at these rates, to start with, with no queue management in
play, then pie on the uplink, then fq_codel on the uplink, and then
this ecn stuff, and so on.
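Roughly, that uplink progression would look like this (rates and queue
depths are only illustrative):

  # baseline: rate limit with a deep FIFO, no queue management
  tc qdisc replace dev eth0 root handle 1: htb default 10
  tc class add dev eth0 parent 1: classid 1:10 htb rate 10mbit
  tc qdisc add dev eth0 parent 1:10 pfifo limit 1000
  # then rerun the same flent tests with the leaf qdisc swapped out:
  tc qdisc replace dev eth0 parent 1:10 pie
  tc qdisc replace dev eth0 parent 1:10 fq_codel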

D) The two CPE makers in the room have dismissed both fq and sce as
being too difficult to implement. They did say that dualpi was
actually implemented in software, not hardware.

I would certainly like them to benchmark what they plan to offer in L4S
vs what is already available in the edgerouter X, as one low end
example among thousands.

I also have to note, at higher speeds, all the buffering moves into
the wifi and the results are currently ugly. I imagine
they are exploring how to fix their wifi stacks also? I wish more folk
were using RVR + latency benchmarks like this one:

http://flent-newark.bufferbloat.net/~d/Airtime%20based%20queue%20limit%20for%20FQ_CoDel%20in%20wireless%20interface.pdf

Same goes for the LTE folk.

E) Andrew McGregor mentioned how great it would be for a closeted musician to
be able to play in real time with someone across town. that has been my goal
for nearly 30 years now!! And although I rather enjoyed his participation in
my last talk on the subject (
https://blog.apnic.net/2020/01/22/bufferbloat-may-be-solved-but-its-not-over-yet/
) conflating
a need for ecn and l4s signalling for low latency audio applications
with what I actually said in that talk, kind of hurt. I achieved
"my 2ms fiber based guitarist to fiber based drummer dream" 4+ years
back with fq_codel and diffserv, no ecn required,
no changes to the specs, no "mandating packets be undroppable" - and
would like to rip the opus codec out of that mix one day.

F) I agree with Jana that changing the definition of RFC3168 so that
the RED algorithm (which is not PIE or anything fancy) often present in
network switches today can be used for DCTCP, works. But you should say
"configuring RED to have L4S marking style" and document that.

Sometimes I try to point out many switches have a form of DRR in them,
and it's helpful to use that in conjunction with whatever diffserv
markings you trust in your network.

To this day I wish someone would publish how much DCTCP-style
signalling they use on a DC network relative to their other traffic.

To this day I keep hoping that someone will publish a suitable set of
RED parameters for the most common switches and ethernet chips, for
correct DCTCP usage.

Mellanox's example
( https://community.mellanox.com/s/article/howto-configure-ecn-on-mellanox-ethernet-switches--spectrum-x
) is not DCTCP-specific.
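For illustration, the kind of recipe I mean is a step-threshold RED
configured for immediate marking, something like this on a Linux box
(the thresholds and rate here are made-up examples, not a recommendation):

  tc qdisc add dev eth0 root red \
        limit 400000 min 30000 max 30001 avpkt 1500 \
        burst 21 probability 1.0 bandwidth 10mbit ecn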

As I said, many switches have a form of DRR in them; as per the
Mellanox example, you could segregate two RED queues that way. But
from what I see above there is no way to differentiate ECT(0) from
ECT(1) in that switch. (?)

I do keep trying to point out the size of the end user ecn enabled
deployment, starting with the data I have from free.fr. Are we
building a network for AIs or people?

G) Jana also made a point about 2 queues "being enough" (I might be
mis-remembering the exact point). Mellanox's ethernet chips at 10Gig expose
64 hardware queues; some new Intel hardware exposes 2000+. How do these
queues work relative to these algorithms?

We have generally found hw mq to be far less of a benefit than the
manufacturers think, especially as regards lower latency or reduced
cpu usage (as cache crossing is a bear). There is a lot of software
work in this area left to be done; however, they are needed to match
queues to cpus (and tenants).
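(For example, on Linux you can see how many hardware queues a NIC
exposes, and what is attached to them, with something like

  ethtool -l eth0
  tc -s qdisc show dev eth0

where the mq qdisc will list one child per hardware tx queue.)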

Until sch_pie gained timestamping support recently, the rate estimator
did not work correctly in a hw mq environment. Haven't looked over
dualpi in this respect.





--
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729


-- 
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729


* Re: [Ecn-sane] Fwd: my backlogged comments on the ECT(1) interim call
  2020-04-27 19:26 ` [Ecn-sane] Fwd: my backlogged comments on the ECT(1) interim call Dave Taht
@ 2020-04-29  9:31   ` Bob Briscoe
  2020-04-29  9:46     ` [Ecn-sane] [tsvwg] " Sebastian Moeller
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Bob Briscoe @ 2020-04-29  9:31 UTC (permalink / raw)
  To: Dave Taht; +Cc: ECN-Sane, tsvwg IETF list

Dave,

Please don't tar everything with the same brush. Inline...

On 27/04/2020 20:26, Dave Taht wrote:
> just because I read this list more often than tsvwg.
>
> ---------- Forwarded message ---------
> From: Dave Taht <dave.taht@gmail.com>
> Date: Mon, Apr 27, 2020 at 12:24 PM
> Subject: my backlogged comments on the ECT(1) interim call
> To: tsvwg IETF list <tsvwg@ietf.org>
> Cc: bloat <bloat@lists.bufferbloat.net>
>
>
> It looks like the majority of what I say below is not related to the
> fate of the "bit". The push to take the bit was
> strong with this one, and me... can't we deploy more of what we
> already got in places where it matters?
>
> ...
>
> so: A) PLEA: From 10 years now, of me working on bufferbloat, working
> on real end-user and wifi traffic and real networks....
>
> I would like folk here to stop benchmarking two flows that run for a long time
> and in one direction only... and thus exclusively in tcp congestion
> avoidance mode.

[BB] All the results that the L4S team has ever published include short 
flow mixes either with or without long flows.
     2020: http://folk.uio.no/asadsa/ecn-fbk/results_v2.2/full_heatmap_rrr/
     2019: http://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf#subsection.4.2
     2019: https://www.files.netdevconf.info/f/febbe8c6a05b4ceab641/?dl=1
     2015: http://bobbriscoe.net/projects/latency/dctth_preprint.pdf#subsection.7.2

I think this implies you have never actually looked at our data, which 
would be highly concerning if true.

Regarding asymmetric links, as you will see in the 2015 and 2019 papers, 
our original tests were conducted over Al-Lu's broadband testbed with 
real ADSL lines, real home routers, etc. When we switched to a Linux 
testbed, we checked we were getting identical results to the testbed 
that used real broadband kit, but I admit we omitted to emulate the 
asymmetric upstream. As I said, we can add asymmetric tests back again, 
and we should.

Nonetheless, when testing Accurate ECN feedback specifically we have 
been watching for the reverse path, given AccECN is designed to handle 
ACK thinning, so we have to test that, esp. over WiFi.

>
> Please. just. stop. Real traffic looks nothing like that. The internet
> looks nothing like that.

[BB] Right from the start, we also tested L4S with numerous real 
applications on the same real broadband equipment testbed. Here's a 
paper that accompanied the demo we did at the Multimedia Systems 
conference in 2015 (remote camera in racing car over cloud-rendered VR 
goggles, cloud-rendered sub-view from a panoramic camera at a football 
match controlled by finger-gestures, web sessions, game traffic, and 
video streaming sessions all in parallel over a 40Mb/s broadband link):

https://riteproject.files.wordpress.com/2015/10/uld4all-demo_mmsys.pdf

We had tested that demo on the Al-Lu testbed with real equipment, but 
obviously the testbed we took to the conference had to be portable.


> The netops folk I know just roll their eyes up at benchmarks like this
> that prove nothing and tell me to go to ripe meetings instead.
> When y'all talk about "not looking foolish for not mandating ecn now",
> you've already lost that audience with benchmarks like these.
>
> Sure, setup a background flow(s)  like that, but then hit the result
> with a mix of
> far more normal traffic? Please? networks are never used unidirectionally
> and both directions congesting is frequent.

[BB] You may not be aware of the following work going on in the IETF at 
the mo. to do ACK thinning in the transport layer in order to address 
reverse path congestion:

https://github.com/quicwg/base-drafts/issues/1978
https://tools.ietf.org/html/draft-iyengar-quic-delayed-ack
https://tools.ietf.org/html/draft-fairhurst-quic-ack-scaling
https://tools.ietf.org/html/draft-gomez-tcpm-delack-suppr-reqs


> To illustrate that problem...
>
> I have a really robust benchmark that we have used throughout the bufferbloat
> project that I would like everyone to run in their environments, the flent
> "rrul" test. Everybody on both sides has big enough testbeds setup that a few
> hours spent on doing that - and please add in asymmetric networks especially -
> and perusing the results ought to be enlightening to everyone as to the kind
> of problems real people have, on real networks.
>
> Can the L4S and SCE folk run the rrul test some day soon? Please?

[BB] Does this measure the delay of every packet, so we can measure 
delay percentiles? I've asked you this a couple of times on these lists 
over the years. It looks like it still uses ping. You will see that all 
our results measure the delay of /every/ data packet.

Real time applications are sensitive to the higher percentiles of delay. 
If anyone is extracting delay percentiles from data that's so sparsely 
sampled, their results will be meaningless.


>
> I rather liked this benchmark that tested another traffic mix,
>
> ( https://www.cablelabs.com/wp-content/uploads/2014/06/DOCSIS-AQM_May2014.pdf )
>
> although it had many flaws (like not doing dns lookups), I wish it
> could be dusted off and used to compare this
> new fangled ecn enabled stuff with the kind of results you can merely get
> with packet loss and rtt awareness. It would be so great to be able
> to directly compare all these new algorithms against this benchmark.
>
> Adding in a non ecn'd udp based routing protocol on heavily
> oversubscribed 100mbit link is also enlightening.
>
> I'd rather like to see that benchmark improved for a more modernized
> home traffic mix
> where it is projected there may be 30 devices on the network on average,
> in a few years.

[BB] That was the idea of the MMSYS demo above (we didn't bother with 
the devices that would be low data rate, 'cos we had other game-like 
traffic that stood in for that).

And incidentally, that's where we discovered and fixed problems with DNS 
requests and SYNs.

A benchmark for this sort of scenario would certainly be useful.

>
> If there is any one thing y'all can do to reduce my blood pressure and
> keep me engaged here whilst you
> debate the end of the internet as I understand it, it would be to run
> the rrul test as part of all your benchmarks.

PS. Links to all the above are off the L4S landing page:
     https://riteproject.eu/dctth/

Cheers



Bob
> [...]

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/



* Re: [Ecn-sane] [tsvwg] Fwd: my backlogged comments on the ECT(1) interim call
  2020-04-29  9:31   ` Bob Briscoe
@ 2020-04-29  9:46     ` Sebastian Moeller
  2020-04-29 10:32       ` Bob Briscoe
  2020-04-29  9:49     ` [Ecn-sane] " Jonathan Morton
  2020-05-16 16:32     ` Dave Taht
  2 siblings, 1 reply; 10+ messages in thread
From: Sebastian Moeller @ 2020-04-29  9:46 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: Dave Täht, ECN-Sane, tsvwg IETF list

Hi Bob,


> On Apr 29, 2020, at 11:31, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Dave,
> 
> Please don't tar everything with the same brush. Inline...
> 
> On 27/04/2020 20:26, Dave Taht wrote:
>> just because I read this list more often than tsvwg.
>> 
>> ---------- Forwarded message ---------
>> From: Dave Taht <dave.taht@gmail.com>
>> Date: Mon, Apr 27, 2020 at 12:24 PM
>> Subject: my backlogged comments on the ECT(1) interim call
>> To: tsvwg IETF list <tsvwg@ietf.org>
>> Cc: bloat <bloat@lists.bufferbloat.net>
>> 
>> 
>> It looks like the majority of what I say below is not related to the
>> fate of the "bit". The push to take the bit was
>> strong with this one, and me... can't we deploy more of what we
>> already got in places where it matters?
>> 
>> ...
>> 
>> so: A) PLEA: From 10 years now, of me working on bufferbloat, working
>> on real end-user and wifi traffic and real networks....
>> 
>> I would like folk here to stop benchmarking two flows that run for a long time
>> and in one direction only... and thus exclusively in tcp congestion
>> avoidance mode.
> 
> [BB] All the results that the L4S team has ever published include short flow mixes either with or without long flows.
>     2020: http://folk.uio.no/asadsa/ecn-fbk/results_v2.2/full_heatmap_rrr/
>     2019: http://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf#subsection.4.2
>     2019: https://www.files.netdevconf.info/f/febbe8c6a05b4ceab641/?dl=1
>     2015: http://bobbriscoe.net/projects/latency/dctth_preprint.pdf#subsection.7.2
> 
> I think this implies you have never actually looked at our data, which would be highly concerning if true.

	[SM] Bob, please take the time to read what Dave is asking for here, it is rather specific, and as far as I can tell has never been tested in all the years.

> 
> Regarding asymmetric links, as you will see in the 2015 and 2019 papers, our original tests were conducted over Al-Lu's broadband testbed with real ADSL lines, real home routers, etc. When we switched to a Linux testbed, we checked we were getting identical results to the testbed that used real broadband kit, but I admit we omitted to emulate the asymmetric upstream. As I said, we can add asymmetric tests back again, and we should.

	[SM] You tested an asymmetric link, with no AQM on the uplink and also no saturating traffic on the uplink; this is not the test Dave has been championing for years: a fully saturating load of 4 or more capacity-seeking flows per direction.
	As far as I can tell, in all of the testing you did you never got around to testing this condition, which is rather important for end-users: the whole family/group is using the internet access link to its max. It is exactly that condition where latencies typically go through the roof and people resort to crude behavioral solutions (do not game while someone video-conferences, and the like).
	The pure fact that you never tested this really IMHO demonstrates that magnitude of testing is no good proxy for quality of testing, especially since Dave repeatedly asked for bi-directionally saturating loads to be tested.

Best Regards
	Sebastian









* Re: [Ecn-sane] Fwd: my backlogged comments on the ECT(1) interim call
  2020-04-29  9:31   ` Bob Briscoe
  2020-04-29  9:46     ` [Ecn-sane] [tsvwg] " Sebastian Moeller
@ 2020-04-29  9:49     ` Jonathan Morton
  2020-04-29 13:37       ` Bob Briscoe
  2020-05-16 16:32     ` Dave Taht
  2 siblings, 1 reply; 10+ messages in thread
From: Jonathan Morton @ 2020-04-29  9:49 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: Dave Taht, ECN-Sane, tsvwg IETF list

> On 29 Apr, 2020, at 12:31 pm, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
>> Can the L4S and SCE folk run the rrul test some day soon? Please?
> 
> [BB] Does this measure the delay of every packet, so we can measure delay percentiles?

As shown in our test results, Flent (which implements the RRUL test) is indeed now capable of tracking the latency experienced by individual TCP flows.  It does not do so at the packet level, but at the socket level.

Of course it is also possible to capture the traffic and analyse the traces offline, if you really do want packet-level detail.
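For example, something along the lines of

  tcpdump -i eth0 -s 128 -w rrul.pcap
  tshark -r rrul.pcap -T fields \
        -e frame.time_epoch -e tcp.stream -e tcp.analysis.ack_rtt

(interface and fields being only examples) gives per-packet timestamps
and per-ACK RTT samples from which to compute whatever percentiles one likes.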

 - Jonathan Morton



* Re: [Ecn-sane] [tsvwg] Fwd: my backlogged comments on the ECT(1) interim call
  2020-04-29  9:46     ` [Ecn-sane] [tsvwg] " Sebastian Moeller
@ 2020-04-29 10:32       ` Bob Briscoe
  2020-04-29 11:21         ` Sebastian Moeller
  0 siblings, 1 reply; 10+ messages in thread
From: Bob Briscoe @ 2020-04-29 10:32 UTC (permalink / raw)
  To: Sebastian Moeller; +Cc: Dave Täht, ECN-Sane, tsvwg IETF list

Sebastian,

On 29/04/2020 10:46, Sebastian Moeller wrote:
> Hi Bob,
>
>
>> On Apr 29, 2020, at 11:31, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>
>> Dave,
>>
>> Please don't tar everything with the same brush. Inline...
>>
>> On 27/04/2020 20:26, Dave Taht wrote:
>>> just because I read this list more often than tsvwg.
>>>
>>> ---------- Forwarded message ---------
>>> From: Dave Taht <dave.taht@gmail.com>
>>> Date: Mon, Apr 27, 2020 at 12:24 PM
>>> Subject: my backlogged comments on the ECT(1) interim call
>>> To: tsvwg IETF list <tsvwg@ietf.org>
>>> Cc: bloat <bloat@lists.bufferbloat.net>
>>>
>>>
>>> It looks like the majority of what I say below is not related to the
>>> fate of the "bit". The push to take the bit was
>>> strong with this one, and me... can't we deploy more of what we
>>> already got in places where it matters?
>>>
>>> ...
>>>
>>> so: A) PLEA: From 10 years now, of me working on bufferbloat, working
>>> on real end-user and wifi traffic and real networks....
>>>
>>> I would like folk here to stop benchmarking two flows that run for a long time
>>> and in one direction only... and thus exclusively in tcp congestion
>>> avoidance mode.
>> [BB] All the results that the L4S team has ever published include short flow mixes either with or without long flows.
>>      2020: http://folk.uio.no/asadsa/ecn-fbk/results_v2.2/full_heatmap_rrr/
>>      2019: http://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf#subsection.4.2
>>      2019: https://www.files.netdevconf.info/f/febbe8c6a05b4ceab641/?dl=1
>>      2015: http://bobbriscoe.net/projects/latency/dctth_preprint.pdf#subsection.7.2
>>
>> I think this implies you have never actually looked at our data, which would be highly concerning if true.
> 	[SM] Bob, please take the time to read what Dave is asking for here, it is rather specific, and as far as I can tell has never been tested in all the years.
>
>> Regarding asymmetric links, as you will see in the 2015 and 2019 papers, our original tests were conducted over Al-Lu's broadband testbed with real ADSL lines, real home routers, etc. When we switched to a Linux testbed, we checked we were getting identical results to the testbed that used real broadband kit, but I admit we omitted to emulate the asymmetric upstream. As I said, we can add asymmetric tests back again, and we should.
> 	[SM] You tested an asymmetric link, with no AQM on the uplink and also no saturating traffic on the uplink, this is not the test Dave has been championing for years a fully saturating load by 4 or more capacity-seeking flows per direction.
> 	As far as I can tell in all of the testing you did, you never got around to test this  for end-user rather important condition: the whole family/group is using the internet access link to its max. It is exactly that condition where latencies typically go through the roof and people resort to crude behavioral solutions (do not game while someone vide-conferences, and the like).
> 	The pure fact that you never tested this really IMHO demonstrates that magnitude of testing is no good proxy for quality of testing, especially since Dave repeatedly asked for bi-directionally saturating loads to be tested.

[BB] Fair enough, it's true there was not saturating traffic in the uplink.

Until we introduce ACK congestion control (host) or RFC3449-compliant 
ACK thinning (network), such tests would just prove the need for better 
ACK congestion control or good ACK thinning, and not properly test the 
thing we're both trying to test.

IOW, if you want to test a solution to problem A, there's no point using 
a scenario that deliberately shows up problem B, for which a solution 
isn't included in the test environment.

As is made clear in the informative part of the ECN++ draft ( 
https://tools.ietf.org/html/draft-ietf-tcpm-generalized-ecn-05#section-4.4.1 
), AccECN feedback provides an easy basis to integrate ACK congestion 
control with data congestion control. However, when you're digging the 
foundations for others to build on - you can't do all the building work 
as well - you have to rely on other people to take up that mantle (hence 
the pointers I gave to the other work on sender control of ACK thinning, 
that are now snipped from this thread).

This is something else I meant to say to Dave. When testing distributed 
systems, you have to test each change one at a time, to discover which 
changes are good and which are bad. That's why there is a role for tests 
with simple traffic mixes, even tho they're not realistic. Yes, it's 
also necessary to show a system works as a whole in a realistic setting 
(hence the tests I pointed to with real applications), but for testing 
purposes, changing multiple things at once gives little insight into 
potential problems or actual problems.

Regards



Bob



>
> Best Regards
> 	Sebastian
>
>
>
>
>
>
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/



* Re: [Ecn-sane] [tsvwg] Fwd: my backlogged comments on the ECT(1) interim call
  2020-04-29 10:32       ` Bob Briscoe
@ 2020-04-29 11:21         ` Sebastian Moeller
  0 siblings, 0 replies; 10+ messages in thread
From: Sebastian Moeller @ 2020-04-29 11:21 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: Dave Täht, ECN-Sane, tsvwg IETF list

Hi Bob,

more below in-line.

> On Apr 29, 2020, at 12:32, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> Sebastian,
> 
> On 29/04/2020 10:46, Sebastian Moeller wrote:
>> Hi Bob,
>> 
>> 
>>> On Apr 29, 2020, at 11:31, Bob Briscoe <ietf@bobbriscoe.net> wrote:
>>> 
>>> Dave,
>>> 
>>> Please don't tar everything with the same brush. Inline...
>>> 
>>> On 27/04/2020 20:26, Dave Taht wrote:
>>>> just because I read this list more often than tsvwg.
>>>> 
>>>> ---------- Forwarded message ---------
>>>> From: Dave Taht <dave.taht@gmail.com>
>>>> Date: Mon, Apr 27, 2020 at 12:24 PM
>>>> Subject: my backlogged comments on the ECT(1) interim call
>>>> To: tsvwg IETF list <tsvwg@ietf.org>
>>>> Cc: bloat <bloat@lists.bufferbloat.net>
>>>> 
>>>> 
>>>> It looks like the majority of what I say below is not related to the
>>>> fate of the "bit". The push to take the bit was
>>>> strong with this one, and me... can't we deploy more of what we
>>>> already got in places where it matters?
>>>> 
>>>> ...
>>>> 
>>>> so: A) PLEA: From 10 years now, of me working on bufferbloat, working
>>>> on real end-user and wifi traffic and real networks....
>>>> 
>>>> I would like folk here to stop benchmarking two flows that run for a long time
>>>> and in one direction only... and thus exclusively in tcp congestion
>>>> avoidance mode.
>>> [BB] All the results that the L4S team has ever published include short flow mixes either with or without long flows.
>>>     2020: http://folk.uio.no/asadsa/ecn-fbk/results_v2.2/full_heatmap_rrr/
>>>     2019: http://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf#subsection.4.2
>>>     2019: https://www.files.netdevconf.info/f/febbe8c6a05b4ceab641/?dl=1
>>>     2015: http://bobbriscoe.net/projects/latency/dctth_preprint.pdf#subsection.7.2
>>> 
>>> I think this implies you have never actually looked at our data, which would be highly concerning if true.
>> 	[SM] Bob, please take the time to read what Dave is asking for here, it is rather specific, and as far as I can tell has never been tested in all the years.
>> 
>>> Regarding asymmetric links, as you will see in the 2015 and 2019 papers, our original tests were conducted over Al-Lu's broadband testbed with real ADSL lines, real home routers, etc. When we switched to a Linux testbed, we checked we were getting identical results to the testbed that used real broadband kit, but I admit we omitted to emulate the asymmetric upstream. As I said, we can add asymmetric tests back again, and we should.
>> 	[SM] You tested an asymmetric link, with no AQM on the uplink and also no saturating traffic on the uplink, this is not the test Dave has been championing for years a fully saturating load by 4 or more capacity-seeking flows per direction.
>> 	As far as I can tell in all of the testing you did, you never got around to test this  for end-user rather important condition: the whole family/group is using the internet access link to its max. It is exactly that condition where latencies typically go through the roof and people resort to crude behavioral solutions (do not game while someone vide-conferences, and the like).
>> 	The pure fact that you never tested this really IMHO demonstrates that magnitude of testing is no good proxy for quality of testing, especially since Dave repeatedly asked for bi-directionally saturating loads to be tested.
> 
> [BB] Fair enough, it's true there was not saturating traffic in the uplink.

	[SM] Yes, please report the outcome of such a test, as this is not an irrelevant corner-case, but one of the conditions where L4S needs to shine if you want it to succeed. The idea here would obviously be to instantiate one L4S AQM instance per direction, just as the RFCs propose L4S to be rolled out.

QUESTION: I take it that you actually have tested topologies with one DualQ instance per direction, but you never attempted to saturate both directions simultaneously, or did you only ever test topologies with a single AQM instance?


> 
> Until we introduce ACK congestion control (host) or RFC3449-compliant ACK thinning (network), such tests would just prove the need for better ACK congestion control or good ACK thinning, and not properly test the thing we're both trying to test.

	[SM] Pardon me? Please go and try standard SQM with either HTB+fq_codel or just cake and revisit the idea of ACK traffic being an insurmountable challenge. And for your entertainment cake actually has an optional ACK filter...
	What I want to say is: I have tested that condition already, and the relatively low-latency solution I use copes with that rather nasty traffic pattern pretty well; from reading the L4S drafts my understanding is that L4S should also perform admirably here.
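(A minimal egress-side sketch of the former, with made-up rates:

  tc qdisc replace dev eth0 root handle 1: htb default 10
  tc class add dev eth0 parent 1: classid 1:10 htb rate 9500kbit
  tc qdisc add dev eth0 parent 1:10 fq_codel

with the mirror image applied to an ifb for ingress.)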

> 
> IOW, if you want to test a solution to problem A, there's no point using a scenario that deliberately shows up problem B, for which a solution isn't included in the test environment.

	[SM] So you are saying that L4S is not suitable for bi-directionally saturating traffic? I have long complained about the lack of testing against adversarial traffic patterns in L4S, but here we are not talking adversarial traffic, but rather the bread-and-butter traffic that a solution claiming to deliver low latency, low loss and high throughput needs to cope with gracefully.
	IMHO, ACK thinning is a red herring, as the acceptable performance of SQM (without ACK thinning) under these conditions demonstrates. 

> 
> As is made clear in the informative part of the ECN++ draft ( https://tools.ietf.org/html/draft-ietf-tcpm-generalized-ecn-05#section-4.4.1 ), AccECN feedback provides an easy basis to integrate ACK congestion control with data congestion control.

	[SM] All fine, but if L4S really depends on this for normal operation, this needs to be honestly described in the L4S RFC IMHO, especially what the expected consequences are of not fixing ACK congestion.

> However, when you're digging the foundations for others to build on -

	[SM] How about we let history be the judge of that?

> you can't do all the building work as well - you have to rely on other people to take up that mantle (hence the pointers I gave to the other work on sender control of ACK thinning, that are now snipped from this thread).

	[SM] Well, if you promise the world, the onus is on you to deliver it. If, as it seems, you just agreed that L4S as currently designed and implemented will not gracefully deal with bi-directionally saturating loads... I am puzzled why you believe in releasing L4S into the wild before that open flank is closed.

> 
> This is something else I meant to say to Dave. When testing distributed systems, you have to test each change one at a time, to discover which changes are good and which are bad.

	[SM] And then you need to test whether what you learned in the isolated/reduced cases also holds for the interactions...

> That's why there is a role for tests with simple traffic mixes, even tho they're not realistic. Yes, it's also necessary to show a system works as a whole in a realistic setting (hence the tests I pointed to with real applications), but for testing purposes, changing multiple things at once gives little insight into potential problems or actual problems.

	[SM] I am puzzled: if you consider bi-directionally saturating traffic to be theoretically problematic for L4S, why did you not actually go and confirm that, and then try to figure out remedies INSIDE of the L4S framework (and with that I mean the IETF-relevant portions; if you consider TCP Prague part of L4S, it would be nice to see an RFC draft for that)?

Regards
	Sebastian


> 
> Regards
> 
> 
> 
> Bob
> 
> 
> 
>> 
>> Best Regards
>> 	Sebastian
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/



* Re: [Ecn-sane] Fwd: my backlogged comments on the ECT(1) interim call
  2020-04-29  9:49     ` [Ecn-sane] " Jonathan Morton
@ 2020-04-29 13:37       ` Bob Briscoe
  2020-04-29 15:07         ` [Ecn-sane] [tsvwg] " Sebastian Moeller
  2020-04-29 16:03         ` [Ecn-sane] " Jonathan Morton
  0 siblings, 2 replies; 10+ messages in thread
From: Bob Briscoe @ 2020-04-29 13:37 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Dave Taht, ECN-Sane, tsvwg IETF list

Jonathan,

On 29/04/2020 10:49, Jonathan Morton wrote:
>> On 29 Apr, 2020, at 12:31 pm, Bob Briscoe<ietf@bobbriscoe.net>  wrote:
>>
>>> Can the L4S and SCE folk run the rrul test some day soon? Please?
>> [BB] Does this measure the delay of every packet, so we can measure delay percentiles?
> As shown in our test results, Flent (which implements the RRUL test) is indeed now capable of tracking the latency experienced by individual TCP flows.  It does not do so at the packet level, but at the socket level.
>
> Of course it is also possible to capture the traffic and analyse the traces offline, if you really do want packet-level detail.

1. So, why do you continue to use this approach? It hides all the larger 
delays underneath a moving average. This seems more designed to hide 
inconsistent delay results than to measure them. If a real-time 
application waited only for the median delay before rendering, it would 
have to discard the 50% of traffic in the upper median! TCP also 
delivers nothing until the more delayed packets have been delivered to 
get the stream in order. So median (or mean) delay is a meaningless metric.


2. I also notice you didn't address Dave's point about using short flows 
as well as long. I don't believe I have ever seen a test from you or 
PeteH with anything but long flows.


3. I should add that I know you personally have tried to address the 
asymmetric capacity problem with the ACK thinning in CAKE.




Bob


>   - Jonathan Morton
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/



* Re: [Ecn-sane] [tsvwg] Fwd: my backlogged comments on the ECT(1) interim call
  2020-04-29 13:37       ` Bob Briscoe
@ 2020-04-29 15:07         ` Sebastian Moeller
  2020-04-29 16:03         ` [Ecn-sane] " Jonathan Morton
  1 sibling, 0 replies; 10+ messages in thread
From: Sebastian Moeller @ 2020-04-29 15:07 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: Jonathan Morton, ECN-Sane, tsvwg IETF list

Hi Bob,

> On Apr 29, 2020, at 15:37, Bob Briscoe <ietf@bobbriscoe.net> wrote:
> 
> [...]
> 3. I should add that I know you personally have tried to address the asymmetric capacity problem with the ACK thinning in CAKE.

ACK thinning in cake happened because Ryan Mounce, on a rather unsavory asymmetric DOCSIS link (120/2.5, or 48:1), wanted to free some upstream capacity for data (with cake in the path his DOCSIS modem stopped ACK filtering, since it never saw the queue necessary to collect the ACKs to "merge"). Also, ACK compression was eating into the downstream rates as well (turns out ACK "clocking" is a rather optimistic concept to rely on once bandwidth gets tight and jitter gets high).
The root cause here was the arguably misconfigured asymmetry of his link by his ISP, and that can unfortunately not be addressed by cake's ACK filter, only worked around.
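(For reference, enabling that filter on such a thin uplink is a one-line
cake option, the rate here being only an example:

  tc qdisc replace dev eth0 root cake bandwidth 2500kbit docsis ack-filter
)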

That said, in your case, with the first mover being CableLabs/DOCSIS, I would have assumed that you (or CableLabs) would simply have grandfathered the ACK filter into the design of the upstream shaper instance. Did you not?

Best Regards
	Sebastian


* Re: [Ecn-sane] Fwd: my backlogged comments on the ECT(1) interim call
  2020-04-29 13:37       ` Bob Briscoe
  2020-04-29 15:07         ` [Ecn-sane] [tsvwg] " Sebastian Moeller
@ 2020-04-29 16:03         ` Jonathan Morton
  1 sibling, 0 replies; 10+ messages in thread
From: Jonathan Morton @ 2020-04-29 16:03 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: Dave Taht, ECN-Sane, tsvwg IETF list

>> As shown in our test results, Flent (which implements the RRUL test) is indeed now capable of tracking the latency experienced by individual TCP flows.  It does not do so at the packet level, but at the socket level.
>> 
>> Of course it is also possible to capture the traffic and analyse the traces offline, if you really do want packet-level detail.
> 
> 1. So, why do you continue to use this approach? It hides all the larger delays underneath a moving average. This seems more designed to hide inconsistent delay results than to measure them. If a real-time application waited only for the median delay before rendering, it would have to discard the 50% of traffic in the upper median! TCP also delivers nothing until the more delayed packets have been delivered to get the stream in order. So median (or mean) delay is a meaningless metric.

Because we are not laser-focused on the five-nines tail of latency as you are, but instead take a broader view that recognises some tradeoffs.  For us, the 99th percentile is interesting, but anything beyond that is not worth sacrificing other desirable properties for.  And, as I said, we can extract whatever per-packet statistics we like from a packet trace.

The time-series plot, however, tells us much more about the gross behaviour.  It tells us, for example, that the reason you have such good five-nines behaviour is because you drop out of slow-start extremely early, so you end up having to grow linearly to the BDP instead of using some faster function.  I bet when you get around to implementing Paced Chirping to correct that, you'll find that your five-nines latency takes a hit from the queue pacing the probe trains.  That's a prediction I'd like to see tested.

> 2. I also notice you didn't address Dave's point about using short flows as well as long. I don't believe I have ever seen a test from you or PeteH with anything but long flows.

That's mostly a tools problem.  Flent doesn't currently have a way to generate that type of traffic and incorporate it into a test run.  It's on our todo list though.

 - Jonathan Morton


* Re: [Ecn-sane] Fwd: my backlogged comments on the ECT(1) interim call
  2020-04-29  9:31   ` Bob Briscoe
  2020-04-29  9:46     ` [Ecn-sane] [tsvwg] " Sebastian Moeller
  2020-04-29  9:49     ` [Ecn-sane] " Jonathan Morton
@ 2020-05-16 16:32     ` Dave Taht
  2 siblings, 0 replies; 10+ messages in thread
From: Dave Taht @ 2020-05-16 16:32 UTC (permalink / raw)
  To: Bob Briscoe; +Cc: ECN-Sane, tsvwg IETF list

On Wed, Apr 29, 2020 at 2:31 AM Bob Briscoe <ietf@bobbriscoe.net> wrote:
>
> Dave,
>
> Please don't tar everything with the same brush. Inline...
>
> On 27/04/2020 20:26, Dave Taht wrote:
> > just because I read this list more often than tsvwg.
> >
> > ---------- Forwarded message ---------
> > From: Dave Taht <dave.taht@gmail.com>
> > Date: Mon, Apr 27, 2020 at 12:24 PM
> > Subject: my backlogged comments on the ECT(1) interim call
> > To: tsvwg IETF list <tsvwg@ietf.org>
> > Cc: bloat <bloat@lists.bufferbloat.net>
> >
> >
> > It looks like the majority of what I say below is not related to the
> > fate of the "bit". The push to take the bit was
> > strong with this one, and me... can't we deploy more of what we
> > already got in places where it matters?
> >
> > ...
> >
> > so: A) PLEA: From 10 years now, of me working on bufferbloat, working
> > on real end-user and wifi traffic and real networks....
> >
> > I would like folk here to stop benchmarking two flows that run for a long time
> > and in one direction only... and thus exclusively in tcp congestion
> > avoidance mode.
>
> [BB] All the results that the L4S team has ever published include short
> flow mixes either with or without long flows.
>      2020: http://folk.uio.no/asadsa/ecn-fbk/results_v2.2/full_heatmap_rrr/
>      2019:
> http://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf#subsection.4.2
>      2019: https://www.files.netdevconf.info/f/febbe8c6a05b4ceab641/?dl=1
>      2015:
> http://bobbriscoe.net/projects/latency/dctth_preprint.pdf#subsection.7.2
>
> I think this implies you have never actually looked at our data, which
> would be highly concerning if true.

I have never had access to your *data*. Just papers that cherry-pick
results that support your arguments. No repeatable experiments, no
open source code; the only thing consistent about them has been...
irreproducible results. Once upon a time I was invited to keynote
a talk at sigcomm (
https://conferences.sigcomm.org/sigcomm/2014/doc/slides/137.pdf ),
where I had an opportunity to lay into not just the
sad state of network research today but all of science (they've not
invited me back).

So in researching the state of the art since I last checked in, I did
go and read y'alls more recent stuff. Taking on this one :

http://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf#subsection.4.2

The experimental testbed design is decent. The actual experiment laid
out in that section was a test of everything... except the
behaviors of the traffic types I care about most: voip,
videoconferencing and web. I found the graphs in the appendix too
difficult to compare and unreadable, and I would have preferred
comparison plots.

A) Referring to some page or another of my above paper... It came with
"ludicrous constants". For a 40Mbit link, it had:

Buffer: 40,000 pkt, ECN enabled
Pie: Configured to drop at 25% probability # We put in 10% as an
escape valve in the rfc, why 25%? Did it engage?
fq_codel: default constants -
dualpi: Target delay: 15 ms, TUpdate: 16 ms, L4S T: 1 ms, WRR Cweight:
10%, α: 0.16, β: 3.2, k: 2, Classic ECN drop: 25%

The source code I have for dualpi has a 1000 packet buffer. The dualpi
example code
(when last I looked at it) had 0 probability of drop. A naive user
would just use that default.

Secondly, your experiment seems to imply y'all think drop will never
happen in the ll queue, even when ping -Q 1 -s 1000 -f is sufficient
to demonstrate otherwise.

OK, so this gets me to...

Most of the cpe and home router hardware I work with doesn't have much
more than 64MB of memory, into which you also have to fit a full
operating system, routing table, utilities and so on. GRO is a thing,
so the peak amount of memory a 40,000 packet buffer might use is
40000 * 1500 * 64 = 3,840,000,000 bytes. ~4GB of memory. Worst case.
For each interface in the system. For a 40Mbit simulation. Despite
decades of work on making OSes reliable, running out of memory
in any given component tends to have bad side effects.
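(Back of the envelope:

  echo $((40000 * 1500 * 64))   # = 3840000000 bytes, i.e. ~4GB

per interface, in the GRO worst case.)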

OK, had this been a repeatable experiment, I'd have plugged in real
world values and repeated it. I think on some bug report or another
I suggested y'all switch to byte, rather than packet, limits for the
code: as you will see, especially with mixed up and down traffic on
the rrul_be test, traffic tends to either exhaust a short fixed-length
packet fifo, or clog it up if it's longer. Byte limits (and especially
bql) are a much better approximation to time, and work vastly better
with mixed up/down traffic, and in the presence of GRO.
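(BQL's per-queue byte limit is visible, and cap-able, in sysfs, e.g.:

  cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
  echo 131072 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max

with the interface and value above being only examples.)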

If I have any one central tenet: edge gateways need to transit all
kinds of traffic in both directions, efficiently. And not crash.

OK... so lacking support for byte limits in the code, and not having
4GB of memory to spare... and not being able to plug in real world
values into your test framework...

So what happens with 1000 packets?

Well, the
SCE team just ran that benchmark. The full results are published, and
repeatable. And dismal, for dualpi, compared to the state of the art.
I'll
write more on that, but the results are plain as day.

B) "were connected to amodem using 100Mbps Fast Ethernet; the xDSL
linewas configured at 48Mbps downstream and 12Mbps up-stream; the
links between network elements consistedof at least 1GigE connections"

So you tested 4/1 down/up asymmetry, but you didn't try an asymmetric
up/down load. The 1Gbit/35mbit rrul_be test just performed by
that team, as well as the 200/10 test - both shipping values in the
field - demonstrated the problems that induces. Problems so severe that
low rate videoconferencing on such a system, when busy, was
impossible.

While I would certainly recommend that ISPs NEVER ship anything with
more than a 10x1 ratio, it happens. More than 10x1 is the
current "standard" in the cable industry. Please start testing with that?


>
> Regarding asymmetric links, as you will see in the 2015 and 2019 papers,
> our original tests were conducted over Al-Lu's broadband testbed with
> real ADSL lines, real home routers, etc. When we switched to a Linux
> testbed, we checked we were getting identical results to the testbed
> that used real broadband kit, but I admit we omitted to emulate the
> asymmetric upstream. As I said, we can add asymmetric tests back again,
> and we should.

Thank you. I've also asked that y'all plug in realistic values for
present day buffering
both in the cmts and cable modems, and use the rrul_be, rtt_fair_var,
and rrul tests
as a basic starting point for a background traffic load.

DSL is *different* and more like fiber, in that it is
an isochronous stream that has an error rate, but no retransmits.

Request/grant systems, such as wifi and cable, operate vastly differently.

Worse, wifi and LTE especially have a tendency to retry a lot, which
leads to very counter-intuitive behaviors that long ago made me
dismiss reno/cubic/dctcp as appropriate, and conclude that BBR-like
cc protocols using a mixture of indicators, especially including rtt,
are the only way forward for these kinds of systems.

Packet aggregation is a thing.

We need to get MUCH better about dropping packets in the retry portion
of the wireless macs, especially for
voip/videoconferencing/gaming traffic.

There's a paper on that, and work is in progress.

>
> Nonetheless, when testing Accurate ECN feedback specifically we have
> been watching for the reverse path, given AccECN is designed to handle
> ACK thinning, so we have to test that, esp. over WiFi.

In self defense, before anybody uses it any further in testing 'round here:

I would like to note that my netem "slot model" is only a start
towards emulating request/grant systems better; coupled with
*careful*, incremental, repeatable analysis via the trace stuff also
now in linux netem, it can be used to improve the congestion control
behavior of transports. See:

https://lore.kernel.org/netdev/20190123200454.260121-3-priyarjha@google.com/#t

the slot model alone does not, emphatically, model wifi correctly at
all for any but the most limited scenarios. Grants and requests are
coupled in wifi, and are driven by endpoint behavior. It's complete
GIGO after the first exchange if you trust in the slot model naively,
without recreating traces for every mod in your transport, and
retesting, retesting, retesting.
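(For reference, the syntax is along the lines of

  tc qdisc add dev eth0 root netem delay 2ms slot 800us 10ms packets 42 bytes 64k

with every number above arbitrary - per the caveats, it is only a
starting point, not a wifi model.)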

The linux commit for netem's slotting model provides a reference for a
1-2 station 802.11n overly ideal emulation; it
was incorrect and unscalable, and I wish people would stop
copy/pasting that into any future work on the subject.
802.11ac is very different, and 802.11ax different too. As one
example, the limited number of packets you can fit
into an 802.11n txop makes SFQ (which ubnt uses) a better choice than
DRR, but DRR is a better approach for 802.11ac and later. (IMHO).

However! most emulations of wifi assume that it's lossy (like a 1%
rate) which is also totally wrong. So the slot model
was progress. I don't know enough about lte, but retries there are
left to the operator to define, and they are usually set really high.

I've long said there is no such thing as a rate in wireless -
bandwidth/interval is a fiction, because over any given set of
intervals, in request/grant/retry prone systems, bandwidth varies from
0 to a lot on very irregular timescales.

Eliding the rest of this message.

--
"For a successful technology, reality must take precedence over public
relations, for Mother Nature cannot be fooled" - Richard Feynman

dave@taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729

