From: Dave Taht
Date: Fri, 5 Apr 2019 17:51:03 +0200
To: Neal Cardwell
Cc: ECN-Sane, BBR Development, flent-users
Subject: Re: [Ecn-sane] [bbr-dev] duplicating the BBRv2 tests at iccrg in flent?

Thanks!

On Fri, Apr 5, 2019 at 5:11 PM Neal Cardwell wrote:
>
> On Fri, Apr 5, 2019 at 3:42 AM Dave Taht wrote:
>>
>> I see from the iccrg preso at 7 minutes 55 s in, that there is a test
>> described as:
>>
>> 20 BBRv2 flows
>> starting each 100ms, 1G, 1ms
>> Linux codel with ECN ce_threshold at 242us sojourn time.
>
> Hi, Dave! Thanks for your e-mail.

I have added you to ecn-sane's allowed sender filters.

>> I interpret this as
>>
>> 20 flows, starting 100ms apart
>> on a 1G link
>> with a 1ms transit time
>> and linux codel with ce_threshold 242us
>
> Yes, except the 1ms is end-to-end two-way propagation time.
>
>> 0) This is iperf? There is no crypto?
>
> Each flow is a netperf TCP stream, with no crypto.

OK. I do wish netperf had a tls mode.

>> 1) "sojourn time" is not the same as setting the codel target to
>> 242us?
>>
>> I tend to mentally tie the concept of sojourn time to the target
>> variable, not ce_threshold
>
> Right. I didn't mean setting the codel target to 242us. Where the
> slide says "Linux codel with ECN ce_threshold at 242us sojourn time"
> I literally mean a Linux machine with a codel qdisc configured as:
>
> codel ce_threshold 242us
>
> This is using the ce_threshold feature added in:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=80ba92fa1a92dea1
>
> ...
> for which the commit message says:
>
> "A DCTCP enabled egress port simply have a queue occupancy threshold
> above which ECT packets get CE mark. In codel language this translates
> to a sojourn time, so that one doesn't have to worry about bytes or
> bandwidth but delays."

I had attempted to discuss deprecating this option back in August on
the codel list:

https://lists.bufferbloat.net/pipermail/codel/2018-August/002367.html

as well as changing a few other core features. I put most of what I
discussed there into https://github.com/dtaht/fq_codel_fast which I
was using for comparison to the upcoming cake paper, and that is now
where the first cut at the sce work resides also.

> The 242us comes from the serialization delay for 20 packets at 1Gbps.

I thought it was more because of how hard it is to get an accurate
measurement below about ~500us. In our early work on attempting
virtualizations, things like Xen would frequently jitter scheduling by
10-20ms or more. While that situation has gotten much better, I still
tend to prefer "bare metal" when working on this stuff - and often
"weak" bare metal, like the mips processors we mostly use in the
cerowrt project. Even then I get nervous below 500us unless it's an
r/t kernel.

I used irtt to profile this underlying packet + scheduling jitter on
various virtual machine fabrics from 2ms to 10us a while back (google
cloud, aws, linode) but never got around to publishing the work. I
guess I should go pull those numbers out...

>
>> 2) In our current SCE work we have repurposed ce_threshold to do sce
>> instead (to save on cpu and also to make it possible to fiddle
>> without making a userspace api change). Should we instead create a
>> separate sce_threshold option to allow for backward compatible
>> usage?
>
> Yes, you would need to maintain the semantics of ce_threshold for
> backwards compatibility for users who are relying on the current
> semantics.
> IMHO your suggestion to use a separate sce_threshold sounds like the
> way to go, if adding SCE to qdiscs in Linux.
>
>> 3) Transit time on your typical 1G link is actually 13us for a big
>> packet, why 1ms?
>
> The 1ms is the path two-way propagation delay ("min RTT"). We run a
> range of RTTs in our tests, and the graph happens to be for an RTT of
> 1ms.

OK.

>> is that 1ms from netem?
>
> Yes.
>
>> 4) What is the topology here?
>>
>> host -> qdisc -> wire -> host?
>>
>> host -> qdisc -> wire -> router -> host?
>
> Those two won't work with Linux TCP, because putting the qdisc on the
> sender pulls the qdisc delays inside the TSQ control loop, giving a
> behavior very different from reality (even CUBIC won't bloat if the
> network emulation qdiscs are on the sender host).
>
> What we use for our testing is:
>
> host -> wire -> qdiscs -> host
>
> Where "qdiscs" includes netem and whatever AQM is in use, if any.

Normally how I do the "qdiscs" is I call it a "router" :) and then the
qdiscs usually look like this:

eth0 -> netem -> aqm_alg -> eth1
eth0 <- aqm_alg <- netem <- eth1

using ifb for the inbound management. I didn't get to where I trusted
netem to do this right until about a year ago; up until that point I
had always used a separate "delay" box.

Was GRO/GSO enabled on the router? host? server?

>> 5) What was the result with fq_codel instead?
>
> With fq_codel and the same ECN marking threshold (fq_codel
> ce_threshold 242us), we see slightly smoother fairness properties
> (not surprising) but with slightly higher latency.
>
> The basic summary:
>
> retransmits: 0
> flow throughput: [46.77 .. 51.48]
> RTT samples at various percentiles:
>    %  | RTT (ms)
> ------+---------
>    0  |  1.009
>   50  |  1.334
>   60  |  1.416
>   70  |  1.493
>   80  |  1.569
>   90  |  1.655
>   95  |  1.725
>   99  |  1.902
> 99.9  |  2.328
>  100  |  6.414

This is lovely. Is there an open source tool you are using to generate
this from the packet capture? From wireshark?
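(As an aside, for anyone wanting to replicate that middlebox: a sketch
of what the router-side qdisc setup described above might look like.
The interface names, handles, and the 500us one-way netem delay - half
of a 1ms path RTT - are illustrative assumptions on my part, not the
actual configuration used in the tests.)

```shell
# Hypothetical router approximating "eth0 -> netem -> aqm_alg -> eth1".
# Forward path, on eth1 egress: netem adds propagation delay, with a
# codel qdisc CE-marking at a 242us sojourn time attached beneath it.
tc qdisc add dev eth1 root handle 1: netem delay 500us
tc qdisc add dev eth1 parent 1:1 handle 10: codel ce_threshold 242us

# Return path: redirect eth1 ingress through an ifb device so the same
# netem + AQM treatment applies to traffic flowing back toward eth0.
ip link add ifb0 type ifb
ip link set ifb0 up
tc qdisc add dev eth1 handle ffff: ingress
tc filter add dev eth1 parent ffff: matchall \
    action mirred egress redirect dev ifb0
tc qdisc add dev ifb0 root handle 1: netem delay 500us
tc qdisc add dev ifb0 parent 1:1 handle 10: codel ce_threshold 242us
```

(Needs root and a kernel with sch_netem, sch_codel, ifb, and the
matchall classifier; it's a config sketch, not a tested script.)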
Or is this from sampling the TCP_INFO parameter of netperf?

>
> Bandwidth share graphs are attached. (Hopefully the graphs will make
> it through various lists; if not, you can check the bbr-dev group
> thread.)
>
> best,
> neal

--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
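P.S. A quick sanity check on the 242us figure from upthread: it is
consistent with 20 full-size ethernet frames (1514 bytes on the wire,
a figure I am inferring from the arithmetic) at 1Gbps, where each bit
takes exactly 1ns to serialize:

```shell
# 20 frames x 1514 bytes x 8 bits = total bits; at 1 Gbps one bit
# serializes in 1 ns, so this is also the delay in nanoseconds.
echo $((20 * 1514 * 8))
# -> 242240, i.e. ~242us
```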