[Ecn-sane] Fwd: my backlogged comments on the ECT(1) interim call

Dave Taht dave.taht at gmail.com
Sat May 16 12:32:07 EDT 2020


On Wed, Apr 29, 2020 at 2:31 AM Bob Briscoe <ietf at bobbriscoe.net> wrote:
>
> Dave,
>
> Please don't tar everything with the same brush. Inline...
>
> On 27/04/2020 20:26, Dave Taht wrote:
> > just because I read this list more often than tsvwg.
> >
> > ---------- Forwarded message ---------
> > From: Dave Taht <dave.taht at gmail.com>
> > Date: Mon, Apr 27, 2020 at 12:24 PM
> > Subject: my backlogged comments on the ECT(1) interim call
> > To: tsvwg IETF list <tsvwg at ietf.org>
> > Cc: bloat <bloat at lists.bufferbloat.net>
> >
> >
> > It looks like the majority of what I say below is not related to the
> > fate of the "bit". The push to take the bit was
> > strong with this one, and me... can't we deploy more of what we
> > already got in places where it matters?
> >
> > ...
> >
> > so: A) PLEA: From 10 years now, of me working on bufferbloat, working
> > on real end-user and wifi traffic and real networks....
> >
> > I would like folk here to stop benchmarking two flows that run for a long time
> > and in one direction only... and thus exclusively in tcp congestion
> > avoidance mode.
>
> [BB] All the results that the L4S team has ever published include short
> flow mixes either with or without long flows.
>      2020: http://folk.uio.no/asadsa/ecn-fbk/results_v2.2/full_heatmap_rrr/
>      2019:
> http://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf#subsection.4.2
>      2019: https://www.files.netdevconf.info/f/febbe8c6a05b4ceab641/?dl=1
>      2015:
> http://bobbriscoe.net/projects/latency/dctth_preprint.pdf#subsection.7.2
>
> I think this implies you have never actually looked at our data, which
> would be highly concerning if true.

I have never had access to your *data*. Just papers that cherry-pick
results that support your arguments. No repeatable experiments, no
open source code; the only thing consistent about them has been...
irreproducible results. Once upon a time I was invited to give a
keynote at sigcomm (
https://conferences.sigcomm.org/sigcomm/2014/doc/slides/137.pdf ),
where I had an opportunity to lay into not just the sad state of
network research today but all of science (they've not invited me
back).

So in researching the state of the art since I last checked in, I did
go and read y'all's more recent stuff. Taking on this one:

http://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf#subsection.4.2

The experimental testbed design is decent. But the actual experiment
laid out in that section tested everything... except the behaviors of
the traffic types I care about most: voip, videoconferencing, and web.
I found the graphs in the appendix unreadable and too difficult to
compare, and I would have preferred comparison plots.

A) Referring to some page or another of the above paper... it came
with "ludicrous constants". For a 40Mbit link, it had:

Buffer: 40,000 pkt, ECN enabled
Pie: Configured to drop at 25% probability # We put in 10% as an
escape valve in the rfc, why 25%? Did it engage?
fq_codel: default constants
dualpi: Target delay: 15 ms, TUpdate: 16 ms, L4S T: 1 ms, WRR Cweight:
10%, α: 0.16, β: 3.2, k: 2, Classic ECN drop: 25%

The source code I have for dualpi has a 1000 packet buffer. The dualpi
example code
(when last I looked at it) had 0 probability of drop. A naive user
would just use that default.

Secondly, your experiment seems to imply y'all think drop will never
happen in the LL queue, even though ping -Q 1 -s 1000 -f is sufficient
to demonstrate otherwise.

OK, so this gets me to...

Most of the cpe and home router hardware I work with doesn't have much
more than 64MB of memory, into which you also have to fit a full
operating system, routing table, utilities and so on. GRO is a thing,
so the peak amount of memory a 40,000 packet buffer might use is
40000 * 1500 * 64 = 3,840,000,000 bytes. ~4GB of memory. Worst case.
For each interface in the system. For a 40Mbit simulation. Despite
decades of work on making OSes reliable, running out of memory in any
given component tends to have bad side effects.
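
Back of the envelope, for anyone who wants to check my arithmetic (the
64-segment GRO aggregation limit and MTU-sized segments here are
assumptions for the worst case, not measurements):

# worst-case memory for a 40,000 packet queue when GRO super-packets
# are in play; skb and metadata overhead is ignored
QUEUE_LIMIT_PKTS = 40_000
SEG_BYTES = 1500            # MTU-sized segments (assumed)
GRO_SEGS_MAX = 64           # typical GRO aggregation limit (assumed)

worst_case = QUEUE_LIMIT_PKTS * SEG_BYTES * GRO_SEGS_MAX
print(f"{worst_case:,} bytes (~{worst_case / 1e9:.2f} GB) per interface")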

OK, had this been a repeatable experiment, I'd have plugged in real
world values, and repeated it. I think on some bug report or another
I suggested y'all switch to byte, rather than packet, limits in the
code because, as you will see, mixed up-and-down traffic on the
rrul_be test tends to either exhaust a short fixed-length packet fifo,
or clog it up, if it's longer. Byte limits (and especially bql) are a
much better approximation to time, and work vastly better with mixed
up/down traffic, and in the presence of GRO.
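
To make the "approximation to time" point concrete, here's a toy
comparison (the 40Mbit rate and the packet sizes are illustrative
assumptions, nothing more): a fixed 1000-packet fifo represents wildly
different amounts of delay depending on what is actually queued, where
a byte limit would pin it down.

# drain time of a 1000-packet fifo at 40 Mbit/s, by packet size
LINK_RATE_BPS = 40e6            # illustrative 40 Mbit/s link
QUEUE_LIMIT_PKTS = 1000

for name, pkt_bytes in [("bare ACKs (~64B)", 64),
                        ("MTU packets (1500B)", 1500),
                        ("GRO super-packets (64 x 1500B)", 64 * 1500)]:
    drain_ms = QUEUE_LIMIT_PKTS * pkt_bytes * 8 / LINK_RATE_BPS * 1000
    print(f"{name:32s} {drain_ms:10.1f} ms of queue")

# ~13 ms vs ~300 ms vs ~19,200 ms for the same packet count: three
# orders of magnitude of delay, which is why byte limits (and bql)
# track time so much better.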

If I have any one central tenet: edge gateways need to transit all
kinds of traffic in both directions, efficiently. And not crash.

OK... so lacking support for byte limits in the code, and not having
4GB of memory to spare... and not being able to plug real world
values into your test framework...

So what happens with 1000 packets?

Well, the SCE team just ran that benchmark. The full results are
published, and repeatable. And dismal, for dualpi, compared to the
state of the art. I'll write more about that later, but the results
are plain as day.

B) "were connected to a modem using 100Mbps Fast Ethernet; the xDSL
line was configured at 48Mbps downstream and 12Mbps up-stream; the
links between network elements consisted of at least 1GigE connections"

So you tested 4:1 down/up asymmetry, but you didn't try an asymmetric
up/down load. The 1Gbit/35Mbit rrul_be test just performed by that
team, as well as the 200/10 test - both shipping values in the field -
demonstrated the problems that induces. Problems so severe that low
rate videoconferencing on such a system, when busy, was impossible.

While I would certainly recommend that ISPs NEVER ship anything with
more than a 10x1 ratio, it happens. More than 10x1 is the
current "standard" in the cable industry. Please start testing with that?
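
To see why bigger ratios hurt under bidirectional load, here's a rough
sketch of the ack clocking arithmetic (MTU-sized downstream segments,
classic delayed acks of one per two segments, and ~78 bytes on the
wire per ack are all assumptions for illustration):

# rough ACK arithmetic for a 1Gbit/35Mbit link under full downstream
# load; every constant here is an assumption for illustration
DOWN_BPS = 1e9              # 1 Gbit/s downstream
UP_BPS = 35e6               # 35 Mbit/s upstream
SEG_WIRE_BYTES = 1538       # 1500B frame + preamble/IFG/FCS overhead
ACK_WIRE_BYTES = 78         # minimal ACK frame + the same overheads
SEGS_PER_ACK = 2            # classic delayed ACKs

acks_per_s = DOWN_BPS / (SEG_WIRE_BYTES * 8) / SEGS_PER_ACK
ack_bps = acks_per_s * ACK_WIRE_BYTES * 8
print(f"ACKs alone: {ack_bps / 1e6:.1f} Mbit/s, "
      f"{100 * ack_bps / UP_BPS:.0f}% of the uplink")

# roughly 25 Mbit/s - about 70% of a 35 Mbit uplink - consumed just
# acking the downstream, before rrul_be adds any upstream data at all.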


>
> Regarding asymmetric links, as you will see in the 2015 and 2019 papers,
> our original tests were conducted over Al-Lu's broadband testbed with
> real ADSL lines, real home routers, etc. When we switched to a Linux
> testbed, we checked we were getting identical results to the testbed
> that used real broadband kit, but I admit we omitted to emulate the
> asymmetric upstream. As I said, we can add asymmetric tests back again,
> and we should.

Thank you. I've also asked that y'all plug in realistic values for
present day buffering, both in the cmts and cablemodems, and use the
rrul_be, rtt_fair_var, and rrul tests as a basic starting point for a
background traffic load.

DSL is *different* and more like fiber, in that it is an isochronous
stream that has an error rate, but no retransmits.

Request/grant systems, such as wifi and cable, operate vastly differently.

Worse, wifi and LTE especially have a tendency to retry a lot, which
leads to very counter-intuitive behaviors that long ago made me
dismiss reno/cubic/dctcp as appropriate, and made BBR-like cc
protocols using a mixture of indicators, especially including rtt,
look like the only way forward for these kinds of systems.

Packet aggregation is a thing.

We need to get MUCH better about dropping packets in the retry portion
of the wireless macs, especially for
voip/videoconferencing/gaming traffic.

There's a paper on that, and work is in progress.

>
> Nonetheless, when testing Accurate ECN feedback specifically we have
> been watching for the reverse path, given AccECN is designed to handle
> ACK thinning, so we have to test that, esp. over WiFi.

In self defense, before anybody uses it any further in testing 'round here:

I would like to note that my netem "slot model", although only a
start towards emulating request/grant systems better, can, when
coupled with *careful*, incremental, repeatable analysis via the
trace support also now in linux netem, be used to improve the
congestion control behavior of transports. See:

https://lore.kernel.org/netdev/20190123200454.260121-3-priyarjha@google.com/#t

The slot model alone does not, emphatically, model wifi correctly for
any but the most limited scenarios. Grants and requests are coupled in
wifi, and are driven by endpoint behavior. It's complete GIGO after
the first exchange if you trust the slot model naively, without
recreating traces for every mod in your transport, and retesting,
retesting, retesting.

The linux commit for netem's slotting model provides a reference for
an overly ideal emulation of a 1-2 station 802.11n network; it was
incorrect and unscalable, and I wish people would stop copy/pasting it
into any future work on the subject. 802.11ac is very different, and
802.11ax is different again. As one example, the limited number of
packets you can fit into an 802.11n txop makes SFQ (which ubnt uses) a
better choice than DRR, but DRR is a better approach for 802.11ac and
later. (IMHO)
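
For a rough sense of the scale difference, using the A-MPDU length
limits (~65KB for 802.11n, ~1MB for 802.11ac) and MTU-sized packets,
and ignoring txop timing and all other overheads:

# how many MTU-sized packets fit in one aggregate, 11n vs 11ac
AMPDU_MAX_BYTES = {"802.11n": 65_535, "802.11ac": 1_048_575}
MTU = 1500

for std, limit in AMPDU_MAX_BYTES.items():
    print(f"{std}: ~{limit // MTU} MTU packets per aggregate")

# ~43 packets per aggregate for 11n vs ~699 for 11ac: with so few
# slots per txop in 11n, SFQ-style mixing of flows makes sense; with
# 11ac-sized aggregates, per-station DRR has room to work.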

However! Most emulations of wifi assume that it's lossy (like a 1%
loss rate), which is also totally wrong, so the slot model was
progress. I don't know enough about lte, but there the retry limits
are left up to the operator, and they are usually set really high.

I've long said there is no such thing as a rate in wireless -
bandwidth/interval is a fiction, because over any given set of
intervals, in request/grant/retry prone systems, bandwidth varies from
0 to a lot on very irregular timescales.
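
Here's a toy illustration of that point. The grant pattern and burst
rate below are invented for illustration; this is not a model of any
real MAC:

import random

# a station only transmits during sporadic "grants"; between grants it
# sends nothing. Grant timing, grant length and burst rate are invented.
random.seed(1)
MS = 1000                           # one second at 1 ms resolution
sent = [0] * MS                     # bytes sent in each millisecond
t = 0
while t < MS:
    t += random.randint(5, 80)      # irregular wait until the next grant
    burst = random.randint(1, 4)    # grant lasts 1-4 ms
    for i in range(t, min(t + burst, MS)):
        sent[i] = 75_000            # ~600 Mbit/s while actually transmitting
    t += burst

for window_ms in (5, 50, 500):
    rates = [sum(sent[i:i + window_ms]) * 8 / (window_ms / 1000) / 1e6
             for i in range(0, MS - window_ms + 1, window_ms)]
    print(f"{window_ms:4d} ms windows: {min(rates):6.1f} .. "
          f"{max(rates):6.1f} Mbit/s 'measured'")

# the "rate" you measure depends entirely on the interval you pick:
# short windows swing between 0 and the burst rate, long windows
# average the bursts away, and neither number is "the bandwidth".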

Eliding the rest of this message.

--
"For a successful technology, reality must take precedence over public
relations, for Mother Nature cannot be fooled" - Richard Feynman

dave at taht.net <Dave Täht> CTO, TekLibre, LLC Tel: 1-831-435-0729

