From: Dave Taht <dave.taht@gmail.com>
To: Neal Cardwell <ncardwell@google.com>
Cc: ECN-Sane <ecn-sane@lists.bufferbloat.net>,
BBR Development <bbr-dev@googlegroups.com>,
flent-users <flent-users@flent.org>
Subject: Re: [Ecn-sane] [bbr-dev] duplicating the BBRv2 tests at iccrg in flent?
Date: Fri, 5 Apr 2019 17:51:03 +0200 [thread overview]
Message-ID: <CAA93jw6HUzjq1Rk9OsqXRuWze3tzXfhz1p3promaD_Zx1Xbdbw@mail.gmail.com> (raw)
In-Reply-To: <CADVnQy=DfST=dHFkZg9EeQRL0OH9HOqgRfJ0uWgUu4fBLD9tSA@mail.gmail.com>
Thanks!
On Fri, Apr 5, 2019 at 5:11 PM Neal Cardwell <ncardwell@google.com> wrote:
>
> On Fri, Apr 5, 2019 at 3:42 AM Dave Taht <dave.taht@gmail.com> wrote:
>>
>> I see from the iccrg preso at 7 minutes 55 s in, that there is a test
>> described as:
>>
>> 20 BBRv2 flows
>> starting each 100ms, 1G, 1ms
>> Linux codel with ECN ce_threshold at 242us sojourn time.
>
>
> Hi, Dave! Thanks for your e-mail.
I have added you to ecn-sane's allowed sender filters.
>
>>
>> I interpret this as
>>
>> 20 flows, starting 100ms apart
>> on a 1G link
>> with a 1ms transit time
>> and linux codel with ce_threshold 242us
>
>
> Yes, except the 1ms is end-to-end two-way propagation time.
>
>>
>> 0) This is iperf? There is no crypto?
>
>
> Each flow is a netperf TCP stream, with no crypto.
OK. I do wish netperf had a TLS mode.
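For anyone trying to reproduce this, I assume each flow is launched
with something roughly like the following (host name and test length
are illustrative, not from this thread):

  netperf -H receiver.example.com -t TCP_STREAM -l 30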
>
>>
>>
>> 1) "sojourn time" not as as setting the codel target to 242us?
>>
>> I tend to mentally tie the concept of sojourn time to the target
>> variable, not ce_threshold
>
>
> Right. I didn't mean setting the codel target to 242us. Where the slide says "Linux codel with ECN ce_threshold at 242us sojourn time" I literally mean a Linux machine with a codel qdisc configured as:
>
> codel ce_threshold 242us
>
> This is using the ce_threshold feature added in:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=80ba92fa1a92dea1
>
> ... for which the commit message says:
>
> "A DCTCP enabled egress port simply have a queue occupancy threshold
> above which ECT packets get CE mark. In codel language this translates to a sojourn time, so that one doesn't have to worry about bytes or bandwidth but delays."
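(Spelled out as a full tc command, I take it that configuration would
be something like this; the device name is illustrative:

  tc qdisc replace dev eth0 root codel ce_threshold 242us
)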
I had attempted to discuss deprecating this option back in August on
the codel list:
https://lists.bufferbloat.net/pipermail/codel/2018-August/002367.html
as well as changing a few other core features. I put most of what I
discussed there into https://github.com/dtaht/fq_codel_fast, which I
was using for comparison against the upcoming cake paper, and which is
now also where the first cut of the SCE work resides.
> The 242us comes from the serialization delay for 20 packets at 1Gbps.
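(That arithmetic checks out, assuming full-size 1514-byte ethernet
frames: 20 * 1514 bytes * 8 = 242,240 bits, which takes ~242us to
serialize at 1Gbit/s.)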
I thought it was more because of how hard it is to get an accurate
measurement below about ~500us. In our early work on attempting
virtualization, things like Xen would frequently jitter scheduling by
10-20ms or more. While that situation has gotten much better, I still
tend to prefer "bare metal" when working on this stuff, and often
"weak" bare metal, like the MIPS processors we mostly use in the
CeroWrt project.
Even then I get nervous below 500us unless it's an r/t kernel.
I used irtt a while back to profile this underlying packet +
scheduling jitter, from 2ms down to 10us, on various virtual machine
fabrics (Google Cloud, AWS, Linode), but never got around to
publishing the work. I guess I should go pull those numbers out...
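For reference, the kind of irtt run I mean looks something like this
(interval, duration, and target are illustrative, from memory):

  irtt client -i 10ms -d 60s server.example.com

which reports per-packet delay and jitter statistics in both
directions.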
>
>> 2) In our current SCE work we have repurposed ce_threshold to do SCE
>> instead (to save on CPU and also to make it possible to fiddle without
>> making a userspace api change). Should we instead create a separate
>> sce_threshold option to allow for backward compatible usage?
>
>
> Yes, you would need to maintain the semantics of ce_threshold for backwards compatibility for users who are relying on the current semantics. IMHO your suggestion to use a separate sce_threshold sounds like the way to go, if adding SCE to qdiscs in Linux.
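If so, something like this hypothetical tc syntax is what I have in
mind (a sketch only; the sce_threshold keyword does not exist
anywhere yet, and the numbers are illustrative):

  codel ce_threshold 2.5ms sce_threshold 242us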
>
>>
>> 3) Transit time on your typical 1G link is actually 13us for a big
>> packet, why 1ms?
>
>
> The 1ms is the path two-way propagation delay ("min RTT"). We run a range of RTTs in our tests, and the graph happens to be for an RTT of 1ms.
>
OK.
>>
>> is that 1ms from netem?
>
>
> Yes.
>
>>
>> 4) What is the topology here?
>>
>> host -> qdisc -> wire -> host?
>>
>> host -> qdisc -> wire -> router -> host?
>
>
> Those two won't work with Linux TCP, because putting the qdisc on the sender pulls the qdisc delays inside the TSQ control loop, giving a behavior very different from reality (even CUBIC won't bloat if the network emulation qdiscs are on the sender host).
>
> What we use for our testing is:
>
> host -> wire -> qdiscs -> host
>
> Where "qdiscs" includes netem and whatever AQM is in use, if any.
Normally, the way I do the "qdiscs" is to call that box a "router" :)
and then the qdiscs usually look like this:
eth0 -> netem -> aqm_alg -> eth1
eth0 <- aqm_alg <- netem <- eth1
using ifb for the inbound management.
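Concretely, that looks something like the following on the router
(device names, the split of the path delay, and the AQM parameters
are all illustrative):

  # egress: netem for the emulated delay, with the AQM under test
  # attached beneath it
  tc qdisc replace dev eth1 root handle 1: netem delay 500us
  tc qdisc add dev eth1 parent 1:1 handle 10: codel ce_threshold 242us

  # ingress: redirect eth0's inbound traffic to an ifb device and run
  # the same netem + AQM stack there
  ip link add ifb0 type ifb
  ip link set ifb0 up
  tc qdisc add dev eth0 handle ffff: ingress
  tc filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 \
      action mirred egress redirect dev ifb0
  tc qdisc replace dev ifb0 root handle 1: netem delay 500us
  tc qdisc add dev ifb0 parent 1:1 handle 10: codel ce_threshold 242us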
I didn't get to where I trusted netem to do this right until about a
year ago; up until that point I had always also used a separate
"delay" box.
Was GRO/GSO enabled on the router? host? server?
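(On my own routers I usually check and turn those off with something
like:

  ethtool -k eth0 | egrep 'segmentation-offload|receive-offload'
  ethtool -K eth0 gso off gro off tso off

since GRO/GSO batching on the router changes the effective packet
sizes the AQM sees.)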
>
>>
>> 5) What was the result with fq_codel instead?
>
>
> With fq_codel and the same ECN marking threshold (fq_codel ce_threshold 242us), we see slightly smoother fairness properties (not surprising) but with slightly higher latency.
>
> The basic summary:
>
> retransmits: 0
> flow throughput: [46.77 .. 51.48]
> RTT samples at various percentiles:
>     % | RTT (ms)
> ------+---------
>     0 |   1.009
>    50 |   1.334
>    60 |   1.416
>    70 |   1.493
>    80 |   1.569
>    90 |   1.655
>    95 |   1.725
>    99 |   1.902
>  99.9 |   2.328
>   100 |   6.414
This is lovely. Is there an open source tool you are using to generate
this from the packet capture? From Wireshark? Or is this from sampling
the TCP_INFO parameter of netperf?
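One way I could imagine doing it without a capture is to poll TCP_INFO
via ss on the sender and post-process the samples, e.g. (receiver
address illustrative):

  # sample the smoothed rtt of flows to the receiver every 100ms
  while sleep 0.1; do ss -tin dst 192.0.2.1 | grep -o 'rtt:[0-9.]*'; done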
>
> Bandwidth share graphs are attached. (Hopefully the graphs will make it through various lists; if not, you can check the bbr-dev group thread.)
>
> best,
> neal
>
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740