[Ecn-sane] The state of l4s, bbrv2, sce?

Dave Taht dave.taht at gmail.com
Fri Jul 26 11:05:11 EDT 2019


Changing the title....

I hope to be able to add some features and boxes to the worldwide
flent fleet to gather up some more data. Simple stuff includes trying
to verify more fully, worldwide, what happens when you twiddle the ecn
bits; mildly longer term, looking at what happens when conflicting
interpretations of these bits are in play somewhere on the path; a bit
longer than that, getting an openwrt build up as a middlebox and vm;
and then finally, finally, seeing what happens on a couple kinds of
wifi.
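
For reference, a minimal sketch of what "twiddling the ecn bits" looks
like from userspace. The codepoint values are the RFC 3168 ones; the
helper names (tos_byte, ecn_of, open_probe_socket) are made up for
illustration and are not part of flent. It uses UDP on the assumption
that the kernel manages the ECN bits itself on TCP sockets.

```python
import socket

# ECN codepoints from RFC 3168: the low two bits of the IP TOS byte.
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def tos_byte(dscp: int, ecn: int) -> int:
    """Combine a 6-bit DSCP and a 2-bit ECN codepoint into one TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("dscp must fit in 6 bits")
    if not 0 <= ecn <= 3:
        raise ValueError("ecn must fit in 2 bits")
    return (dscp << 2) | ecn


def ecn_of(tos: int) -> int:
    """Extract the ECN codepoint from a TOS byte."""
    return tos & 0b11


def open_probe_socket(ecn: int, dscp: int = 0) -> socket.socket:
    """Hypothetical helper: an IPv4 UDP socket whose packets carry `ecn`."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(dscp, ecn))
    return s
```

Whether the bits survive end to end is exactly the open question: the
receiver (or a capture on the far side) has to compare what arrives
against what was sent.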

There's now a flent server in mumbai, in particular, which I hope will
give some insight, long term, into the state of networks in india on a
variety of fronts. But none of it's ready, lacking a good release to
freeze on.

1) BBRv2 is now available for public hacking. I had a good readthrough
last night.

The published tree applies cleanly (with a small patch) to net-next.
I've had a chance to read through the code (lots of good changes to
bbr!).

Although neal was careful to say in iccrg the optional ecn mode uses
"dctcp/l4s-style signalling", he did not identify how that was
actually applied at the middleboxes, and the supplied test scripts
(gtests/net/tcp/bbr/nsperf) don't do that. All we know is that it's
set to kick in at 20 packets. Is it fq_codel's ce_threshold? red? pie?
dualpi? Does it revert to drop on overload?
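
To make the question concrete, here is a sketch of one possible reading
of "kicks in at 20 packets": a plain FIFO that CE-marks ECN-capable
packets once the backlog reaches a packet threshold and tail-drops at a
hard limit. This is only an assumption for illustration, not what the
BBRv2 tests are known to use. The class name, the limit of 1000, and the
choice to leave not-ECT packets alone below the limit are all made up.
Note that fq_codel's ce_threshold is expressed as a sojourn time, not a
packet count, so it would not map onto this directly.

```python
from collections import deque

# ECN codepoints from RFC 3168.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


class StepMarkingQueue:
    """Hypothetical FIFO: CE-mark ECN-capable packets above a depth threshold."""

    def __init__(self, mark_threshold=20, limit=1000):
        self.mark_threshold = mark_threshold  # packets; 20 is the figure quoted above
        self.limit = limit                    # hard packet limit, assumed
        self.queue = deque()
        self.marked = 0
        self.dropped = 0

    def enqueue(self, ecn):
        """Queue a packet given its ECN codepoint. Return False if dropped."""
        depth = len(self.queue)
        if depth >= self.limit:
            # Overload: tail-drop regardless of ECN capability.
            self.dropped += 1
            return False
        if depth >= self.mark_threshold and ecn != NOT_ECT:
            if ecn != CE:
                self.marked += 1
            ecn = CE
        # Not-ECT packets get no signal until the hard limit is hit.
        self.queue.append(ecn)
        return True

    def dequeue(self):
        """Return the ECN codepoint of the head packet, or None if empty."""
        return self.queue.popleft() if self.queue else None
```

Each of the candidates above (red, pie, dualpi, a codel ce_threshold)
would replace the `depth >= self.mark_threshold` test with something
different, which is why the answer matters for reproducing the results.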

Is it running on bare metal? 260us is at the bare bottom of what linux
can schedule reliably; vms are much worse.

Couple notes:

BBRv2 doesn't use ect(1) as an identifier.

The chromium release has no support for ecn at all.

Adding back in the stuff I'd first done to rfc3168 bbrv1 looks
straightforward, making it do sce, less so.

2) To clarify something from the l4s team, are the results you've been
presenting for years all from the 3.19 kernel? bsd? microsoft? ns2?
ns3? what?

The code on github is not worth testing against currently? It does
have some needed features like a setsockopt for using up ect(1).

Should I use the issue tracker for that? I have some comments on
dualpi in addition to my outstanding question about pie's default of
drop at 10% mark rate vs dualpi's 0. Notably its limit is set to 1000
packets now (fq_codel defaults to 10,000 and we switched to memory
limits both in it and cake given a modern packet's dynamic range of
64b to 64k). I've observed 10gige can be in the 2-3k packets range...
has dualpi been tested above 1gige yet?
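
A back-of-the-envelope sketch of why a packet-count limit is so
imprecise. It assumes "64b to 64k" means 64-byte packets up to 65536-byte
aggregates, and the function names are illustrative only:

```python
def backlog_bytes(packets: int, packet_size: int) -> int:
    """Bytes held by a queue of `packets` packets of `packet_size` bytes each."""
    return packets * packet_size


def drain_time_ms(nbytes: int, rate_bps: float) -> float:
    """Milliseconds needed to drain `nbytes` at `rate_bps` bits per second."""
    return nbytes * 8 / rate_bps * 1000.0


LIMIT = 1000           # packet limit discussed above
TEN_GIGE = 10e9        # bits per second

smallest = backlog_bytes(LIMIT, 64)         # 64,000 bytes
largest = backlog_bytes(LIMIT, 64 * 1024)   # 65,536,000 bytes
typical = backlog_bytes(LIMIT, 1500)        # 1,500,000 bytes

for label, nbytes in (("64b", smallest), ("1500b", typical), ("64k", largest)):
    print(f"{label:>6}: {nbytes:>11,d} bytes, "
          f"{drain_time_ms(nbytes, TEN_GIGE):.4f} ms at 10gige")
```

The same 1000-packet limit spans a factor of 1024 in bytes, from about
51us to about 52ms of queue at 10gige, which is the argument for limiting
by memory rather than by packet count.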

3) The current patches for sce need to get rebased for net-next. The
sch_cake mods are easy, but the dctcp code did morph a bit since the
sce work forked it, as did the other tcps. I took a stab at forward
porting it to net-next, but I figure that development is hot and heavy
and some patches will land after ietf. I do not mind taking a stab
again at cleaning it up (helps me to understand what's going on), as
how the algos currently (as of, like, yesterday) work is clear to
me... what I'd like to do at least is also add 'em to the out of tree
fq_codel_fast implementation.

Did I miss anything about the current state of things?

My basic testbed is a string of containers on a couple 12 core boxes
on bare metal, and the more advanced one is the openwrt stuff that's
part of my wifi lab. That's presently almost all 4.14 based on arm,
mips, and x86, running both on real hardware and in emulation.

On Fri, Jul 26, 2019 at 6:10 AM Pete Heist <pete at heistp.net> wrote:
>
>
> > On Jul 25, 2019, at 12:14 PM, De Schepper, Koen (Nokia - BE/Antwerp) <koen.de_schepper at nokia-bell-labs.com> wrote:
> >
> > We have the testbed running our reference kernel version 3.19 with the drop patch. Let me know if you want to see the difference in behavior between the “good” DCTCP and the “deteriorated” DCTCP in the latest kernels too. There were several issues introduced which made DCTCP both more aggressive, and currently less aggressive. It calls for better regression tests (for Prague at least) to make sure its behavior is not changed too drastically by new updates. If enough people are interested, we can organize a session in one of the available rooms.
> >
> > Pete, Jonathan,
> >
> > Also for testing further your tests, let me know when you are available.
>
> Regarding testing, we now have a five node setup in our test environment running a mixture of tcp-prague and dualq kernels to cover the scenarios Jon outlined earlier. With what little time we’ve had for it this week, we’ve only done some basic tests, and seem to be seeing behavior similar to what we saw at the hackathon, but we can discuss specific results following IETF 105.
>
> Our intention is to coordinate a public effort to create reproducible test scenarios for L4S using flent. Details to follow post-conference. We do feel it’s important that all of our Linux testing be on modern 5.1+ kernels, as the 3.19 series was end of life as of May 2015 (https://lwn.net/Articles/643934/), so we'll try to keep up to date with any patches you might have for the newer kernels.
>
> Overall, I think we’ve improved the cooperation between the teams this week (from zero to a little bit :), which should hopefully help move both projects along...
> _______________________________________________
> Ecn-sane mailing list
> Ecn-sane at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/ecn-sane



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

