On Fri, Apr 5, 2019 at 3:42 AM Dave Taht wrote:

> I see from the iccrg preso at 7 minutes 55 s in, that there is a test
> described as:
>
>   20 BBRv2 flows
>   starting each 100ms, 1G, 1ms
>   Linux codel with ECN ce_threshold at 242us sojourn time.

Hi, Dave! Thanks for your e-mail.

> I interpret this as
>
>   20 flows, starting 100ms apart
>   on a 1G link
>   with a 1ms transit time
>   and linux codel with ce_threshold 242us

Yes, except that the 1ms is the end-to-end two-way propagation time.

> 0) This is iperf? There is no crypto?

Each flow is a netperf TCP stream, with no crypto.

> 1) "sojourn time" not as in setting the codel target to 242us?
>
> I tend to mentally tie the concept of sojourn time to the target
> variable, not ce_threshold

Right. I didn't mean setting the codel target to 242us. Where the slide
says "Linux codel with ECN ce_threshold at 242us sojourn time" I
literally mean a Linux machine with a codel qdisc configured as:

  codel ce_threshold 242us

This uses the ce_threshold feature added in:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=80ba92fa1a92dea1

... for which the commit message says:

  "A DCTCP enabled egress port simply have a queue occupancy threshold
  above which ECT packets get CE mark. In codel language this translates
  to a sojourn time, so that one doesn't have to worry about bytes or
  bandwidth but delays."

The 242us comes from the serialization delay of 20 packets at 1Gbps.

> 2) In our current SCE work we have repurposed ce_threshold to do sce
> instead (to save on cpu and also to make it possible to fiddle without
> making a userspace api change). Should we instead create a separate
> sce_threshold option to allow for backward compatible usage?

Yes, you would need to maintain the semantics of ce_threshold for
backwards compatibility with users who rely on the current semantics.
IMHO your suggestion to use a separate sce_threshold sounds like the way
to go, if adding SCE to qdiscs in Linux.

> 3) Transit time on your typical 1G link is actually 13us for a big
> packet, why 1ms?

The 1ms is the path's two-way propagation delay ("min RTT"). We run a
range of RTTs in our tests, and this graph happens to be for an RTT of
1ms.

> is that 1ms from netem?

Yes.

> 4) What is the topology here?
>
>   host -> qdisc -> wire -> host?
>
>   host -> qdisc -> wire -> router -> host?

Those two won't work with Linux TCP, because putting the qdisc on the
sender pulls the qdisc delays inside the TSQ control loop, giving
behavior very different from reality (even CUBIC won't bloat if the
network emulation qdiscs are on the sender host). What we use for our
testing is:

  host -> wire -> qdiscs -> host

where "qdiscs" includes netem and whatever AQM is in use, if any.

> 5) What was the result with fq_codel instead?

With fq_codel and the same ECN marking threshold (fq_codel ce_threshold
242us), we see slightly smoother fairness properties (not surprising)
but slightly higher latency. The basic summary:

  retransmits: 0

  flow throughput: [46.77 .. 51.48] Mbit/s

  RTT samples at various percentiles:

       %  |  RTT (ms)
    ------+----------
       0  |   1.009
      50  |   1.334
      60  |   1.416
      70  |   1.493
      80  |   1.569
      90  |   1.655
      95  |   1.725
      99  |   1.902
    99.9  |   2.328
     100  |   6.414

Bandwidth share graphs are attached. (Hopefully the graphs will make it
through the various lists; if not, you can check the bbr-dev group
thread.)

best,
neal
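P.S. To make the flow setup concrete: the 20 flows are started roughly
like this (the receiver hostname and the 60-second duration below are
placeholders, not the exact values from our runs):

  # start 20 netperf TCP_STREAM flows, 100ms apart
  for i in $(seq 1 20); do
    netperf -H "$RECEIVER" -t TCP_STREAM -l 60 &
    sleep 0.1
  done
  wait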
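The 242us number can be checked by hand; assuming 1514-byte Ethernet
frames (the frame size is an assumption here), the serialization delay
of 20 packets at 1Gbps is:

  # 20 frames * 1514 bytes * 8 bits / 10^9 bits/sec
  echo "20 * 1514 * 8 / 1000000000" | bc -l   # -> ~0.000242 s, i.e. ~242us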
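And a rough sketch of the qdisc setup on the emulation box in the
"host -> wire -> qdiscs -> host" topology (the interface names are
placeholders, and the exact placement of the delay vs. the AQM may
differ from our actual testbed):

  # egress toward the receiver: the AQM under test
  tc qdisc replace dev eth1 root codel ce_threshold 242us
  # ... or, for the fq_codel variant above:
  #   tc qdisc replace dev eth1 root fq_codel ce_threshold 242us

  # egress toward the sender: netem adds the propagation delay
  # (here the full 1ms two-way delay is applied on the return path;
  #  splitting it 500us/500us across both directions also works)
  tc qdisc replace dev eth0 root netem delay 1ms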