On Fri, Apr 5, 2019 at 3:42 AM Dave Taht <dave.taht@gmail.com> wrote:
I see from the iccrg preso at 7 minutes 55 s in, that there is a test
described as:

20 BBRv2 flows
starting each 100ms, 1G, 1ms
Linux codel with ECN ce_threshold at 242us sojourn time.

Hi, Dave! Thanks for your e-mail.
 
I interpret this as

20 flows, starting 100ms apart
on a 1G link
with a 1ms transit time
and linux codel with ce_threshold 242us

Yes, except the 1ms is end-to-end two-way propagation time.
 
0) This is iperf? There is no crypto?

Each flow is a netperf TCP stream, with no crypto.
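
For concreteness, here is a sketch of how 20 such flows can be started 100ms apart (the host name, test length, and the shell loop itself are illustrative assumptions, not our exact harness):

  for i in $(seq 1 20); do
    netperf -H $RECEIVER -t TCP_STREAM -l 60 &   # one bulk TCP flow, no TLS/crypto
    sleep 0.1                                    # stagger flow starts by 100ms
  done
  wait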
 

1) "sojourn time" not as as setting the codel target to 242us?

I tend to mentally tie the concept of sojourn time to the target
variable, not ce_threshold

Right. I didn't mean setting the codel target to 242us. Where the slide says "Linux codel with ECN ce_threshold at 242us sojourn time" I literally mean a Linux machine with a codel qdisc configured as:

  codel ce_threshold 242us
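
Or, written out as a full command (the interface name is a placeholder, just to make the syntax concrete):

  tc qdisc replace dev eth1 root codel ce_threshold 242us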

This is using the ce_threshold feature added in:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=80ba92fa1a92dea1

... for which the commit message says:

"A DCTCP enabled egress port simply have a queue occupancy threshold
above which ECT packets get CE mark. In codel language this translates to a sojourn time, so that one doesn't have to worry about bytes or bandwidth but delays."

The 242us comes from the serialization delay for 20 packets at 1Gbps.
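
That's just the serialization arithmetic, assuming full-sized 1514-byte Ethernet frames:

  20 packets * 1514 bytes * 8 bits/byte / 1 Gbit/s ≈ 242 us

i.e. the marking threshold corresponds to roughly 20 full-sized packets' worth of queue at the bottleneck rate.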

2) In our current SCE work we have repurposed ce_threshold to do sce
instead (to save on cpu and also to make it possible to fiddle without
making a userspace api change). Should we instead create a separate
sce_threshold option to allow for backward compatible usage?

Yes, you would need to maintain the existing semantics of ce_threshold for backwards compatibility, for users who rely on its current behavior. IMHO your suggestion to use a separate sce_threshold option sounds like the way to go, if adding SCE to qdiscs in Linux.
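
A hypothetical sce_threshold option (it does not exist in mainline tc/codel today; the value below is a placeholder purely to illustrate the backward-compatible shape of such an interface) might then sit alongside ce_threshold:

  # hypothetical syntax only; sce_threshold is not an existing codel parameter
  tc qdisc replace dev eth1 root codel ce_threshold 242us sce_threshold 100us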
 
3) Transit time on your typical 1G link is actually 13us for a big
packet, why 1ms?

The 1ms is the path two-way propagation delay ("min RTT"). We run a range of RTTs in our tests, and the graph happens to be for an RTT of 1ms.
 
is that 1ms from netem?

Yes.
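
A minimal sketch of that piece (the interface name is a placeholder, and applying the whole 1ms in one direction is an assumption rather than our exact split):

  tc qdisc replace dev eth1 root handle 1: netem delay 1ms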
 
4) What is the topology here?

host -> qdisc -> wire -> host?

host -> qdisc -> wire -> router -> host?

Those two won't work with Linux TCP, because putting the qdisc on the sender pulls the qdisc delays inside the TSQ control loop, giving a behavior very different from reality (even CUBIC won't bloat if the network emulation qdiscs are on the sender host). 

What we use for our testing is:

  host -> wire -> qdiscs -> host

Where "qdiscs" includes netem and whatever AQM is in use, if any.
 
5) What was the result with fq_codel instead?

With fq_codel and the same ECN marking threshold (fq_codel ce_threshold 242us), we see slightly smoother fairness properties (not surprising) but with slightly higher latency.
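
For reference, the fq_codel variant of that configuration is along the lines of (interface name again a placeholder):

  tc qdisc replace dev eth1 root fq_codel ce_threshold 242us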

The basic summary:

retransmits: 0
flow throughput: [46.77 .. 51.48] Mbps
RTT samples at various percentiles:
  %    | RTT (ms)
-------+---------
   0   | 1.009
  50   | 1.334
  60   | 1.416
  70   | 1.493
  80   | 1.569
  90   | 1.655
  95   | 1.725
  99   | 1.902
  99.9 | 2.328
 100   | 6.414

Bandwidth share graphs are attached. (Hopefully the graphs will make it through various lists; if not, you can check the bbr-dev group thread.)

best,
neal