Hi Dave and list members,
It was difficult to follow the discussion at yesterday's meeting, starting
with who said what.
There were a lot of non-technical comments along the lines of "this solution
is better than that one, in my opinion". "Better" was often used the way one
evaluates the taste of ice cream: white chocolate vs. dark chocolate.
This took up a significant amount of time at the meeting. I haven't learned
much from that kind of discussion, and I do not think it helped us make
much progress.
If people can restate their points on the list, it would help the debate.
Another point that a few raised is that we have to make a decision as fast as possible.
I dismiss that argument entirely. Trading the resilience of the Internet for lower
latency goes entirely against the design principles of the Internet architecture itself.
Risk analysis is something we should keep in mind when deploying any experiment,
and it should be a substantial part of it.
Someone claimed that on-line meeting traffic is elastic. This is not true, and I tried to
clarify this. These applications (WebEx/Zoom) are low-rate and inelastic; a typical
maximum upstream rate is 2 Mbps. These applications often have a stand-alone app
that does not use the browser's WebRTC stack (the stand-alone app typically works better).
A client sends one or two video qualities upstream unless the camera is switched off.
In the presence of losses FEC is used, but the traffic remains inelastic.
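To make the elastic/inelastic distinction concrete, here is a toy sketch of my own (not anything from WebEx or Zoom); the 2 Mbps cap is the assumed typical figure mentioned above:

```python
# Illustrative sketch: an inelastic sender stays at its application cap no
# matter how much spare capacity exists, while an elastic one (e.g. a bulk
# TCP transfer) expands to fill whatever the bottleneck offers.

APP_CAP_MBPS = 2.0  # assumed typical upstream cap of a meeting client

def sending_rate_mbps(available_mbps: float, elastic: bool) -> float:
    if elastic:
        # Elastic traffic grows to consume the available capacity.
        return available_mbps
    # Inelastic traffic never exceeds its codec/app cap, even on a fat pipe,
    # and (thanks to FEC) does not ramp down far below it under moderate loss.
    return min(available_mbps, APP_CAP_MBPS)

for avail in (1.0, 5.0, 50.0):
    print(avail, sending_rate_mbps(avail, elastic=False),
          sending_rate_mbps(avail, elastic=True))
```

The point of the sketch: giving the meeting flow a 50 Mbps pipe does not make it send more than ~2 Mbps, which is why treating it as elastic is wrong.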
Someone claimed (at yesterday's meeting) that fairness is not an issue ("who cares", I heard!).
Well, fairness can constitute a differentiating advantage between two companies that
commercialize on-line meeting products. Unless we at the IETF accept
"law-of-the-jungle" behaviour from Internet application developers, we should be careful
about making such claims.
Any opportunity to cheat that brings a business advantage WILL be used.
/Luca
TL;DR
To Dave: you asked several times what Cisco does about latency reduction in
network equipment. I tend to be very shy about replying to these questions,
as the answer is not vendor-neutral. If the chairs think this is not appropriate for
the list, please say so and I'll reply privately only.
What I write below can be found in Cisco product data sheets and is not a
trade secret. There are very good blog posts explaining the details.
Not surprisingly, Cisco implements the state of the art on this topic,
and it is entirely feasible to do the right thing in both software and hardware.
Cisco implements AFD (one queue plus a flow table), accompanied by a priority queue for
flows that match a certain profile in rate and size. The concept is well known and well
studied in the literature. AFD is safe and can serve a complex traffic mix well when
accompanied by a priority queue. This priority queue should not be confused with a strict
priority queue (e.g. EF in DiffServ). There are subtleties related to the DOCSIS
shared medium that would take too long to describe here.
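For readers who want a feel for the mechanism, here is a toy Python sketch of the published AFD idea plus a non-strict priority queue. This is my own illustration, not Cisco's implementation; the threshold, fair-share value, and measurement interval are made-up placeholders:

```python
import random

# Toy sketch of AFD (Approximate Fair Dropping): one FIFO plus a flow table,
# with a priority queue for flows whose measured volume stays under a profile.
# Heavy flows are dropped probabilistically so their accepted rate converges
# toward the fair share; light flows pass through untouched.

PRIO_PROFILE_BYTES = 100_000   # "low rate/size" profile per interval (assumed)

class AfdSketch:
    def __init__(self, fair_share_bytes):
        self.fair_share = fair_share_bytes   # per-interval fair-share estimate
        self.flow_bytes = {}                 # flow table: bytes seen this interval
        self.prio_queue = []                 # served preferentially, not strictly
        self.fifo = []

    def enqueue(self, flow_id, pkt_bytes):
        seen = self.flow_bytes.get(flow_id, 0) + pkt_bytes
        self.flow_bytes[flow_id] = seen
        if seen <= PRIO_PROFILE_BYTES:
            self.prio_queue.append((flow_id, pkt_bytes))  # light flow: no drops
            return True
        # AFD drop rule: drop with probability 1 - fair_share/arrived, so a
        # heavy flow's accepted traffic approaches the fair share.
        p_drop = max(0.0, 1.0 - self.fair_share / seen)
        if random.random() < p_drop:
            return False                                  # packet dropped
        self.fifo.append((flow_id, pkt_bytes))
        return True

    def end_interval(self):
        self.flow_bytes.clear()   # periodic reset of the measurement window
```

A low-rate meeting flow stays under the profile and rides the priority queue; a bulk flow spills into the FIFO, where AFD trims it to the fair share.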
This is available in Cisco CMTSes for the DOCSIS segment. Bottlenecked traffic
does not negatively impact non-bottlenecked traffic such as an on-line meeting like
the WebEx call we had yesterday. It is safe from a network-neutrality point of view,
and no application gets hurt.
Cisco also implements AFD+prio in some DC switches, such as the Nexus 9k. There
is a blog post by Tom Edsall online that explains pretty well how it works.
This includes mechanisms such as pFabric to approximate SRPT (shortest remaining
processing time) and minimize flow completion time for many DC workloads. The
combination of the two brings FCT minimization AND latency minimization. This is in
silicon and scales at any speed.
For those not familiar with these concepts, please look up the research work of Balaji
Prabhakar and Rong Pan at Stanford.
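As a rough illustration of why SRPT minimizes flow completion time, here is a toy sketch of mine (with made-up flow sizes): the link always serves the flow with the least remaining bytes, which pFabric approximates by carrying the remaining size as a packet priority:

```python
import heapq

# Toy SRPT sketch: always serve the flow with the fewest remaining bytes.
# Flow names and sizes below are illustrative, not from any real workload.

def srpt_completion_order(flows):
    """flows: dict name -> remaining bytes; returns flow completion order."""
    heap = [(remaining, name) for name, remaining in flows.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)   # smallest remaining size wins the link
        order.append(name)
    return order

# Short flows (RPCs, queries) finish ahead of the bulk backup, which is what
# drives mean flow completion time down.
print(srpt_completion_order({"backup": 10_000_000, "rpc": 2_000, "query": 40_000}))
# → ['rpc', 'query', 'backup']
```

Real SRPT preempts as remaining sizes shrink and flows arrive over time; the static ordering above just shows the core scheduling rule.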
Wi-Fi: Cisco does airtime fairness in Aironet, and I think in the Meraki series too.
The concept is similar to what is described above, but there are several queues, one per STA.
Packets are enqueued into the access-category queue at dequeue time by the airtime
packet scheduler.
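As a rough sketch of the airtime-fairness idea (my own illustration, with made-up quantum and PHY rates, not Cisco's scheduler): per-STA queues are served deficit-round-robin style, where a packet's cost is its estimated airtime rather than its size, so a slow station cannot monopolize the air:

```python
from collections import deque

# Toy airtime-fairness sketch: one queue per STA, deficit round-robin where
# the cost charged per packet is estimated airtime (bytes -> microseconds at
# the STA's PHY rate). Quantum and rates are illustrative placeholders.

QUANTUM_US = 1000.0  # airtime credit granted to each STA per round (assumed)

def airtime_drr(stations, rounds):
    """stations: name -> (phy_rate_mbps, deque of packet sizes in bytes)."""
    deficit = {name: 0.0 for name in stations}
    schedule = []
    for _ in range(rounds):
        for name, (rate_mbps, queue) in stations.items():
            if not queue:
                continue
            deficit[name] += QUANTUM_US
            while queue:
                # Mbps = bits per microsecond, so this is microseconds of air.
                airtime_us = queue[0] * 8 / rate_mbps
                if airtime_us > deficit[name]:
                    break                     # out of credit until next round
                deficit[name] -= airtime_us
                schedule.append((name, queue.popleft()))
    return schedule
```

With equal airtime credit, a STA at 100 Mbps drains many more packets per round than one at 10 Mbps, yet both get the same share of the air.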