[Ecn-sane] abc congestion control on time varying wireless links

Discussion of explicit congestion notification's impact on the Internet
 help / color / mirror / Atom feed

* [Ecn-sane] abc congestion control on time varying wireless links
@ 2019-12-11 19:17 Dave Taht
  2019-12-11 19:19 ` Prateesh Goyal
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Taht @ 2019-12-11 19:17 UTC (permalink / raw)
  To: bloat, ECN-Sane

https://arxiv.org/pdf/1905.03429.pdf

the principal item of interest is section 3.1.2 where the accelerate
and brake concepts and math are described.

-- 
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ecn-sane] abc congestion control on time varying wireless links
  2019-12-11 19:17 [Ecn-sane] abc congestion control on time varying wireless links Dave Taht
@ 2019-12-11 19:19 ` Prateesh Goyal
  2019-12-11 19:54   ` Dave Taht
  0 siblings, 1 reply; 9+ messages in thread
From: Prateesh Goyal @ 2019-12-11 19:19 UTC (permalink / raw)
  To: Dave Taht; +Cc: bloat, ECN-Sane, Hari Balakrishnan, Mohammad Alizadeh

[-- Attachment #1: Type: text/plain, Size: 384 bytes --]

Adding Hari, Mohammad

On Wed, Dec 11, 2019 at 2:17 PM Dave Taht <dave.taht@gmail.com> wrote:

> https://arxiv.org/pdf/1905.03429.pdf
>
> the principal item of interest is section 3.1.2 where the accelerate
> and brake concepts and math are described.
>
> --
> Make Music, Not War
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-435-0729
>

[-- Attachment #2: Type: text/html, Size: 883 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ecn-sane] abc congestion control on time varying wireless links
  2019-12-11 19:19 ` Prateesh Goyal
@ 2019-12-11 19:54   ` Dave Taht
  2019-12-11 20:10     ` Dave Taht
                       ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Dave Taht @ 2019-12-11 19:54 UTC (permalink / raw)
  To: Prateesh Goyal; +Cc: bloat, ECN-Sane, Hari Balakrishnan, Mohammad Alizadeh

On Wed, Dec 11, 2019 at 11:19 AM Prateesh Goyal <g.pratish@gmail.com> wrote:
>
> Adding Hari, Mohammad
>
> On Wed, Dec 11, 2019 at 2:17 PM Dave Taht <dave.taht@gmail.com> wrote:
>>
>> https://arxiv.org/pdf/1905.03429.pdf
>>
>> the principal item of interest is section 3.1.2 where the accelerate
>> and brake concepts and math are described.

What we have now is a string of conflicts of interest over the values
of the ecn bits, in part based
on the characteristics of the underlying link technologies.

The DC folk want a multibit more immediate signal, for which L4S is
kind of targetted, (and SCE also
applies). I haven't seen any data on how well dctcp or SCE -style can
work on wildly RTT varying links as yet, although it's been pitched at
the LTE direction, not at wifi.

The abc concept hasn't been tried in a DC-like environment, and while
it shows some good results for both the LTE and wifi simulations, was
not compared against the fq_codel based solution currently in linux
wifi, nor against the minstrel rate controller.

I have plenty of data on how fq_codel + RFC3168 ecn currently works on
wifi, I like to think it's pretty good, but it's still pretty slow to
respond with just RFC3168 or drop.

this is yet another one of those cases where unified sets of
benchmarks would help.

And then there's, like, the actual deployment on actual devices... I
just did a string of benchmarks, tethered to my new moto 6e phone. You
saturate the download, and nearly ALL other traffic (icmp and udp) in
the upstream direction, gets starved out.

I just did a string of benchmarks on my new LTE

>>
>> --
>> Make Music, Not War
>>
>> Dave Täht
>> CTO, TekLibre, LLC
>> http://www.teklibre.com
>> Tel: 1-831-435-0729

-- 
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ecn-sane] abc congestion control on time varying wireless links
  2019-12-11 19:54   ` Dave Taht
@ 2019-12-11 20:10     ` Dave Taht
  2019-12-11 20:12     ` [Ecn-sane] [Bloat] " Jonathan Morton
  2019-12-11 21:18     ` [Ecn-sane] " David P. Reed
  2 siblings, 0 replies; 9+ messages in thread
From: Dave Taht @ 2019-12-11 20:10 UTC (permalink / raw)
  To: Prateesh Goyal
  Cc: bloat, ECN-Sane, Hari Balakrishnan, Mohammad Alizadeh, Make-Wifi-fast

On Wed, Dec 11, 2019 at 11:54 AM Dave Taht <dave.taht@gmail.com> wrote:
>
> On Wed, Dec 11, 2019 at 11:19 AM Prateesh Goyal <g.pratish@gmail.com> wrote:
> >
> > Adding Hari, Mohammad
> >
> > On Wed, Dec 11, 2019 at 2:17 PM Dave Taht <dave.taht@gmail.com> wrote:
> >>
> >> https://arxiv.org/pdf/1905.03429.pdf
> >>
> >> the principal item of interest is section 3.1.2 where the accelerate
> >> and brake concepts and math are described.
>
> What we have now is a string of conflicts of interest over the values
> of the ecn bits, in part based
> on the characteristics of the underlying link technologies.
>
> The DC folk want a multibit more immediate signal, for which L4S is
> kind of targetted, (and SCE also
> applies). I haven't seen any data on how well dctcp or SCE -style can
> work on wildly RTT varying links as yet, although it's been pitched at
> the LTE direction, not at wifi.
>
> The abc concept hasn't been tried in a DC-like environment, and while
> it shows some good results for both the LTE and wifi simulations, was
> not compared against the fq_codel based solution currently in linux
> wifi, nor against the minstrel rate controller.
>
> I have plenty of data on how fq_codel + RFC3168 ecn currently works on
> wifi, I like to think it's pretty good, but it's still pretty slow to
> respond with just RFC3168 or drop.
>
> this is yet another one of those cases where unified sets of
> benchmarks would help.
>
> And then there's, like, the actual deployment on actual devices... I
> just did a string of benchmarks, tethered to my new moto 6e phone. You
> saturate the download, and nearly ALL other traffic (icmp and udp) in
> the upstream direction, gets starved out.
>
> I just did a string of benchmarks on my new LTE

(I'll post these at some point)

Kan Yan, Toke, and a multitude of others have committed AQL
(Airtime queue limits) for the QCA ath10k 802.11ac chip to
the linux kernel and it should be appearing in mainline
and in openwrt soon if it hasn't already. (It already worked on the
mt76, and I'm hoping we can make it work on the iwl devices, notably
the new ax ones)

https://lore.kernel.org/linux-wireless/20191119060610.76681-5-kyan@google.com/

Kan's data and post about it:

https://drive.google.com/corp/drive/folders/14OIuQEHOUiIoNrVnKprj6rBYFNZ0Coif

The raw trace, parsed data in csv format and plots can be found here:
https://drive.google.com/open?id=1Mg_wHu7elYAdkXz4u--42qGCVE1nrILV

All tests are done with 2 TCP download sessions that oversubscribed
the link bandwidth.

With AQL on, the mean sojourn time about ~20000us, matches the default
codel "target".

With AQL off, the mean sojourn time is less than 4us even the latency
is off the charts, just as we expected that fd_codel with mac80211
alone is not effective for drivers with deep firmware/hardware queues.

Kan followed up with some 10ms vs 20 codel target data

> Apologize for the late reply. Here is the test results with target set to 10ms.
> The trace for the sojourn time:
> https://drive.google.com/open?id=1MEy_wbKKdl22yF17hZaGzpv3uOz6orTi
>
> Flent test for 20 ms target time vs 10 ms target time:
> https://drive.google.com/open?id=1leIWe0-L0XE78eFvlmRJlNmYgbpoH8xZ

At which point a debate kicked off on the make-wifi-list about using
the 10ms target on wifi, particularly with multiple stations transmitting.

https://lists.bufferbloat.net/pipermail/make-wifi-fast/2019-December/002605.html

To me, the arrival of AQL, and the applicability various AQM
technologies to 802.11ac devices is kind of a whole new debate, that
we simply do not have enough data on.


> >>
> >> --
> >> Make Music, Not War
> >>
> >> Dave Täht
> >> CTO, TekLibre, LLC
> >> http://www.teklibre.com
> >> Tel: 1-831-435-0729
>
>
>
> --
> Make Music, Not War
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-435-0729



-- 
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ecn-sane] [Bloat] abc congestion control on time varying wireless links
  2019-12-11 19:54   ` Dave Taht
  2019-12-11 20:10     ` Dave Taht
@ 2019-12-11 20:12     ` Jonathan Morton
  2019-12-12 21:31       ` Dave Taht
  2019-12-11 21:18     ` [Ecn-sane] " David P. Reed
  2 siblings, 1 reply; 9+ messages in thread
From: Jonathan Morton @ 2019-12-11 20:12 UTC (permalink / raw)
  To: Dave Taht
  Cc: Prateesh Goyal, Hari Balakrishnan, ECN-Sane, Mohammad Alizadeh, bloat

> On 11 Dec, 2019, at 9:54 pm, Dave Taht <dave.taht@gmail.com> wrote:
> 
> The DC folk want a multibit more immediate signal, for which L4S is
> kind of targetted, (and SCE also
> applies). I haven't seen any data on how well dctcp or SCE -style can
> work on wildly RTT varying links as yet, although it's been pitched at
> the LTE direction, not at wifi.

It turns out that a Codel marking strategy for SCE, with modified parameters of course, works well for tolerating bursty and aggregating links.  The RED-ramp and step-function strategies do not - and they're equally bad if the same test scenario is applied to DCTCP or TCP Prague.

The difference is not small; switching from RED to Codel improves goodput from 1/8th to 80% of nominal link capacity, when a rough model of wifi characteristics is inserted into our usual Internet-path scenario.

We're currently exploring how best to set the extra set of Codel parameters involved.

 - Jonathan Morton

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ecn-sane] [Bloat] abc congestion control on time varying wireless links
  2019-12-11 20:12     ` [Ecn-sane] [Bloat] " Jonathan Morton
@ 2019-12-12 21:31       ` Dave Taht
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Taht @ 2019-12-12 21:31 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: Prateesh Goyal, Hari Balakrishnan, ECN-Sane, Mohammad Alizadeh, bloat

On Wed, Dec 11, 2019 at 12:12 PM Jonathan Morton <chromatix99@gmail.com> wrote:
>
> > On 11 Dec, 2019, at 9:54 pm, Dave Taht <dave.taht@gmail.com> wrote:
> >
> > The DC folk want a multibit more immediate signal, for which L4S is
> > kind of targetted, (and SCE also
> > applies). I haven't seen any data on how well dctcp or SCE -style can
> > work on wildly RTT varying links as yet, although it's been pitched at
> > the LTE direction, not at wifi.
>
> It turns out that a Codel marking strategy for SCE, with modified parameters of course, works well for tolerating bursty and aggregating links.  The RED-ramp and step-function strategies do not - and they're equally bad if the same test scenario is applied to DCTCP or TCP Prague.

That matches my preliminary observations. thx.

>
> The difference is not small; switching from RED to Codel improves goodput from 1/8th to 80% of nominal link capacity, when a rough model of wifi characteristics is inserted into our usual Internet-path scenario.

While I'm the author of the netem slotting code that does this bursty
wifi emulation, and delighted you are using it, it would better to
additionally use the delay distribution modelling feature added
shortly therafter by members of the stadia team. The distributions of
wifi delay with just slotting did not resemble the real world enough
for my taste after I fiddled with it extensively. There is a very,
very long tail.

(or conversely: making wifi retry algos match a sane model better
would be good - it makes no sense to have 30 dumb retries in the mac
layer! mac retries really dominate my pdv data at these vastly lower
levels  of queuing)

I would love it if google were to publish the wifi distribution tables
they have developed since then for inclusion in iproute2.

>
> We're currently exploring how best to set the extra set of Codel parameters involved.

One of my thoughts after reading about the abc idea is that there is
an underused 4th state, in taking away both ect(1) and ect(0) on a
previously marked ect(0) stream as a sign you could accellerate. This
would only work if you were the only bottleneck router, of course, and
not with tcp as we know it.

I really like the idea of paced chirping in genera, as it could better
fill a wifi aggregate, and perhaps mitigate some of the exponential
overshoot problems in slow start (and this is separate from fiddling
with abc or other abc states)

>  - Jonathan Morton
>

-- 
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ecn-sane] abc congestion control on time varying wireless links
  2019-12-11 19:54   ` Dave Taht
  2019-12-11 20:10     ` Dave Taht
  2019-12-11 20:12     ` [Ecn-sane] [Bloat] " Jonathan Morton
@ 2019-12-11 21:18     ` David P. Reed
  2019-12-11 21:30       ` David P. Reed
  2 siblings, 1 reply; 9+ messages in thread
From: David P. Reed @ 2019-12-11 21:18 UTC (permalink / raw)
  To: Dave Taht
  Cc: Prateesh Goyal, Hari Balakrishnan, ECN-Sane, Mohammad Alizadeh, bloat

I will not be gentle here. THe authors deserve my typical peer-review feedback as an expert in the field of wireless protocols and congestion. (Many of you on the list are as well, I know, and may have different reviews. But I'm very troubled by this paper's claims. It's interesting technically, but flawed seriously, enough that I would send it back for more work before publication. (not that my opinion matters these days)

A separate perspective from me on the paper.

1) There is a problem in the very title wording of the paper. WiFi is not at all a "time varying wireless link" Nor is it obvious that a time varying link is even a good approximate model of WiFi LANs.  What do I mean here?

  a. WiFi is not a link. In its typical deployment (non-peer-to-peer) it is a hub that is multiplexed by many wireless links that share the same spatial channel, but follow different paths.
  b. WiFi's spatial shared wireless channel's temporal behavior is not modeled by a single scalar variable called "speed" or "error rate" that is varying over a range over time.
  c. congestion is typically queueing delay on a shared FIFO queue. In the AP-STA operation described, when delays happen they are not at all characterized by a shared single FIFO queue. In fact each packet travels twice through the air, each time in a highly correlated temporal distribution, and each packet travels through two FIFO cuques, plus a strange CSMA exponential backoff queue. This is NOT congestion in any real sense.

2) the paper doesn't present any data whatever regarding actual observed channel behaviors, or even actual observed effects.
   a. Indoor propagation of OFDM signals is complicated. I've done actual measurements, and continue to carry them out. But many others have as well. The sources of variability over time and the time constants of that variability are not well characterized in the literature at all. My dear friend Ted Rapaport is an expert on *outdoor* microwave and mmwave propagation, and has done lots of measurements there. But not indoor, where such things as rotating fans, moving people, floorand ceiling elements, etc. all affect the propagation of OFDM signals in ways that do vary, but not according to any model that has been characterized sufficiently to build, say, an ns2 simulation.
   b. the indoor behavior at the MAC layer of signals is highly variable due to many effects, not all physical (for example, microwave noise that affects the time waiting for a "clear" channel before a station can transmit. This can vary a lot. Also, in a Multi-User Dwelling or an enterprise office/campus, other WiFi traffic causes delay at the MAC layer that is non trivial. What's the problem here is that this "interference" (not radio interference at all, but MAC layer variability) is not slowly varying in any sense. The idea that this is modelable by a congestion control mechanism of any sort is not clear.
   c. driving all of this is the mix of application traffic in a "local area" (the physical region around the access point, and the upstream network to which the access point connects. Not all of this traffic is anything like a simple distribution. In fact, it's time varying across many time scales. For example, Netflix video is typically TCP with controlled bursts (buffer filling) separated by long relatively quiet periods. These bursts can use up all available airtime. In contrast, web traffic for one "page" often involves many independent HTTP streams (or soon HTTP3/UDP streams being rolled out at scale by Google on all its services) involving 10's or even hundreds of remote distinct sites, where response time is critical (lag under load is unacceptable).

3. the paper alludes to, but doesn't really characterize, the issue of "fairness" very well. Fairness isn't Harrison Bergeron style exact matching of bits delivered among all pairs. Instead, it really amounts to allocation of latency degradation (due to excess queueing) among independent applications sharing the medium. In other words, it is more like "non-starvation", except where the applications themselves may actually back off their load when resources are reduced, to be "friendly".
I am afraid that this pragmatic issue, the real goal of congestion control, is poorly discussed in the paper, yet it is the crucial measure of a good congestion control scheme. Throughput is entirely secondary to avoiding starvation unless the starvation can be proved to be inherent in the load presented.

Now I will say the mechanism presented may well be quite useful, but I think such mechanisms should not just be preseented in the technical literature *as if it were obvious that they are useful* at least in some typical real world situations.

In other words, before launching into solving a problem, one needs to research and characterize the problem being solved. Preferably this research will produce good experimentally valid models.

We saw back in the early 1970's a huge volume of theoretical work from some famous people (Bob Gallegher of MIT is a good example) where packet networks were evaluated under Poisson arrival loads, and asserted to be good. It turns out that there are NO real world networks that have anything like Poisson arrival processes. The only reason Poisson arrival processas are interesting is because they are mathematically trivial to analyze in closed form without simulation.

But work on time-shared operating system schedulers in the 1960's (at MIT, in the Multics project, but also at Berkeley and other places) had already demonstrated that user requests are not at all Poisson. In fact, so far from Poisson that any scheduler that assumed Poisson arrivals was dreadful in practice. Adding epicycles to Poisson arrivals fixed nothing, but produced even richer "closed form" solutions, and a vast literature of research results in departments focused on scheduling theory around the US.

The same has been true of Gallegher and his theory students. Poisson random arrivals infest the literature, and measurement driven, practical research in networking has been despised. 

It's time to focus on the Science of actual real networks, wireless ones in the real world, and simulations validated against real world situations (as scientists do when they have to model the real world).

I'm very, very sad to see this kind of publication, which is not science, but just a mathematical game played based on a hunch about wireless behavior that is not based on measurements or characteristic applicaitons. IN contrast, the reality centered work being done by people like the bloat project, while not so academically abstract, is the state of the art

A proper title would be "a random congestion control mmethod on ann imaginary artificial network that might, if we are lucky, be somewhat like a wifi network, but honestly we never actually looked at one in the wild"

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ecn-sane] abc congestion control on time varying wireless links
  2019-12-11 21:18     ` [Ecn-sane] " David P. Reed
@ 2019-12-11 21:30       ` David P. Reed
  2019-12-18 19:43         ` [Ecn-sane] [Bloat] " Dave Taht
  0 siblings, 1 reply; 9+ messages in thread
From: David P. Reed @ 2019-12-11 21:30 UTC (permalink / raw)
  To: David P. Reed
  Cc: Dave Taht, Prateesh Goyal, ECN-Sane, Hari Balakrishnan,
	Mohammad Alizadeh, bloat

I should explain that my motivation in writing the previous review is twofold:

1) I think the authors are very capable of doing great and valuable work. And they have in the past done so.

2) Intellectual honesty, professional honesty, and rigor are values that seem to have been declining over the years, in favor of a pragmatic sense that anything that can get published is important. (I think it has been reinforced by the idea that counting publications in good journals is the only metric of scientific contribution that counts, even at MIT). To me, this is a VERY important value. I learned that from my mentors, Jerry Saltzer and F.J. Corbato, who were relentless in making sure publications were worthy.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Ecn-sane] [Bloat] abc congestion control on time varying wireless links
  2019-12-11 21:30       ` David P. Reed
@ 2019-12-18 19:43         ` Dave Taht
  0 siblings, 0 replies; 9+ messages in thread
From: Dave Taht @ 2019-12-18 19:43 UTC (permalink / raw)
  To: David P. Reed
  Cc: Hari Balakrishnan, Prateesh Goyal, ECN-Sane, bloat, Mohammad Alizadeh

"David P. Reed" <dpreed@deepplum.com> writes:

> I should explain that my motivation in writing the previous review is twofold:
>
> 1) I think the authors are very capable of doing great and valuable work. And they have in the past done so.
>
> 2) Intellectual honesty, professional honesty, and rigor are values
> that seem to have been declining over the years, in favor of a
> pragmatic sense that anything that can get published is important. (I
> think it has been reinforced by the idea that counting publications in
> good journals is the only metric of scientific contribution that
> counts, even at MIT). To me, this is a VERY important value. I learned
> that from my mentors, Jerry Saltzer and F.J. Corbato, who were
> relentless in making sure publications were worthy.

I think the core-est problem is the major universities cannot get source
licenses to the 5g and wifi stuff at the lowest level. Nor can they get
sufficient hardware IP to make a meaningful contribution anymore.

As a result, the next generation of engineers and scientists have to
make do with interpreting elephant entrails, and building cargo cult
papers for submission to cargo cult journals with cargo cult reviewers
who've not seen real research in the professional lifetime.

I can't help but note that several of the core-est papers in the
bufferbloat effort, couldn't be published - and reviewers still
currently poo-poo packet scheduling ideas (like pacing, fq-ing) in favor
of aqm idea of the month.

I was kind of hopeful, in that Huewei as one of their remediation ideas
for making their 5G stuff more trustable was a promise to make code
available to all. THAT would be breath of fresh air and level the
playing field once again so valid research could resume.

And for all I know, they are doing that... for chinese universities.

About the only thing going on in the computing world that's encouraging
is the rapidly accellerating work on the risc-v.

>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-12-18 19:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-11 19:17 [Ecn-sane] abc congestion control on time varying wireless links Dave Taht
2019-12-11 19:19 ` Prateesh Goyal
2019-12-11 19:54   ` Dave Taht
2019-12-11 20:10     ` Dave Taht
2019-12-11 20:12     ` [Ecn-sane] [Bloat] " Jonathan Morton
2019-12-12 21:31       ` Dave Taht
2019-12-11 21:18     ` [Ecn-sane] " David P. Reed
2019-12-11 21:30       ` David P. Reed
2019-12-18 19:43         ` [Ecn-sane] [Bloat] " Dave Taht

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox