From: Dave Taht
To: Tom Gundersen
Cc: "cerowrt-devel@lists.bufferbloat.net"
Date: Thu, 4 Dec 2014 10:24:32 -0800
Subject: Re: [Cerowrt-devel] SQM in mainline openwrt, fq_codel considered for fedora default
List-Id: Development issues regarding the cerowrt test router project

On Thu, Dec 4, 2014 at 8:09 AM, Tom Gundersen wrote:
> On Tue, Oct 21, 2014 at 10:59 PM, Dave Taht wrote:
>> On Tue, Oct 21, 2014 at 12:51 PM, Dave Taht wrote:
>>>> http://snapon.lab.bufferbloat.net/~d/beagle_bql/bql_makes_a_difference.png
>>>>
>>>> You can see that BQL makes the most difference in the latency.
>>>
>>> And ALSO that these fixes also improved system throughput enormously.
>>
>> Meant to include that plot.
>>
>> http://snapon.lab.bufferbloat.net/~d/beagle_bql/beaglebonewithbql.png
>>
>> You can disregard the decline in download bandwidth (as we are also
>> sending 5x as many acks and measurement data, which is not counted
>> in that part of the plot)
>>
>>> This is partially due to the improvement in ack clocking you get from
>>> reduced RTTs, partially due to improved cache behavior (shorter
>>> queues), and partially continual improvements elsewhere in the tcp
>>> portions of the stack.
>>>
>>> With more recent kernels...
>>>
>>> I now get full throughput from the beagles in both directions with the
>>> 3.16 kernel, the still out of tree bql patch, and either fq or
>>> fq_codel. I haven't got around to plotting all those results (they
>>> are from kathie's new lab), but they are here:
>>> http://snapon.lab.bufferbloat.net/~d/pollere/
>>
>> The latency spikes are generally due to not having BQL, probably:
>> http://snapon.lab.bufferbloat.net/~d/pollere/beagle/beagle-3.8-nobql-fq-fq_codel.png
>>
>> This is using the new fq scheduler on both sides, with BQL enabled.
>>
>> http://snapon.lab.bufferbloat.net/~d/pollere/beagle/beagle_3.16-fq-fq-no-offloads.png
>>
>> The switch most likely is prioritizing EF marked packets. (as is
>> sch_fq). Most of the buffering is in the switch, not the host, now.
>> (the prior results I showed had no switch in the way)
>>
>>> There is a small buffered tail dropping switch in the way, on these
>>> later data sets. There was some puzzling behavior on the e1000e that I
>>> need to poke into in a more controlled setting.
>>>
>>> As for other tunables on hosts, TCP small queues might be amenable to
>>> some tuning, but that too may well evolve further in kernelspace.
>>
>> So I have now drowned you in data on one architecture and setup. The
>> most thoroughly publicly analyzed devices and drivers are the ar71xx,
>> e1000e, and beaglebone at this point.
>>
>> The use of fq_codel in a qos system (artificially rate limited using
>> htb, hfsc, or tbf) is pretty well proven to be a huge win at this
>> point.
>>
>> At line rates fq and fq_codel still help quite a bit without a BQL
>> enabled driver, BQL is needed for best results. I don't know to what
>> extent the BQL enabled drivers already cover the marketplace, it was
>> generally my assumption that the e1000e counted for a lot...
>>
>> http://www.bufferbloat.net/projects/bloat/wiki/BQL_enabled_drivers
>>
>> And thus with all the positive results so far, more wider distribution
>> of the new stuff
>> on more devices outside the sample set of those on the bloat,
>> cerowrt-devel and codel lists (about 500 people all told),
>>
>> and all ya gotta do is turn it on.
>
> Thanks for the very detailed info Dave.
>
> What I'm taking from this is that at the moment we should not be doing
> anything more than what we already do (the sysctl changes that Michal
> pushed), and that any improvements should be made in the kernel
> drivers (BQL support in particular).

I would like to see more pushback from everyone in the universe on new
ethernet drivers in particular, to always have BQL. It is very useful
at all speeds, and there has been a big push for it in the 10GigE+ space,
but not enough at gigE and below.
So we are always asking for it when a new driver arrives, or new work
on an old driver arrives (recently mvneta, for example).

It would be nice if there was a big push to add BQL to older drivers
(talk to jesper about that).

Recently xmit_more support arrived in the kernel mainline. It needs BQL,
and it's *very good*.

> What we could do in the future, is in case you have kernel features
> that should be enabled by default as they benefit the general user,
> but where you cannot change the kernel defaults due to backwards
> compat, we can (as we did with fq) add these to the recommended sysctl
> files shipped with systemd. This is of course just a gentle nudge for
> people to use features, distros/admins can still override them.

Cool. But I don't mind testing stuff at this level for a year or more
before inflicting it on users. :)

Relative to the kernel options, IPV6_SUBTREES is needed for good
quality source specific routing support. So far as I know that has
defaulted to on in most distros for a long time.

Relative to device naming, we are increasingly living in a world where
IP addresses come and go, and firewalls are having a lot of trouble
dealing with that. Two recent examples:

http://www.bufferbloat.net/issues/438
https://github.com/sbyx/odhcp6c/issues/27

My own take on this in cerowrt was to name devices after their security
level and then their type, which is not an idea that has been taken up
anywhere else. With more work put into it, that would allow for never
having to reload the firewall and lose conntrack state on an address
change or interface add.

It does look like nftables has some simpler means for adding/deleting
interfaces from a security model, but many solutions seem to require
changing a lot of state and introducing brief windows of vulnerability
or network disconnectivity.
In my case I regularly run with two or more interfaces always connected
to the internet. I never need to down one and up the other, or change
default routes. I just unplug in one place, go to another, and plug in
again... and all my sessions stay up.... [1].

Admittedly, now that I use mosh more often (it has ipv6 support in git
head), I am not as bothered by disconnectivity as I used to be.....

Lastly, I would like to see the dhcp-related work in systemd pay
attention to the kinds of issues the ipv6 homenet wg in the ietf and
openwrt (and early distributors of ipv6 native addressing, like
comcast) are dealing with... and have things like hnetd looked at.

>> This gets me to the stopping point that we hit a while back, which was
>> reliably determining if a good clocksource was present in the system.
>> Somehow. clock_getres, perhaps?
>
> I appear to have missed the context on this. In what situation do we
> (userspace) need to care about the existence of a high quality
> clocksource?

There are a (very few) cpus/arches that lack a fast and high quality
timesource: way older x86 boxes lacking hpet, for example. Network
performance on these devices tends to be pretty bad in the first place,
and it is simply unknown at this point how much the per-packet
timestamping in codel would change things.

Nearly all modern chips have a fast, onchip, single cycle timesource, so
this is not a problem for those.

>
> Cheers,
>
> Tom

[1] In fact I just noticed my ethernet was unplugged, and had been for
a while....

d@ganesha:~/git/tinc/src$ ifconfig eth0
eth0      Link encap:Ethernet  HWaddr d8:50:e6:a0:b6:32
          inet addr:172.26.16.2  Bcast:172.26.16.255  Mask:255.255.255.0
          inet6 addr: fe80::da50:e6ff:fea0:b632/64 Scope:Link
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:9508669 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8321719 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:9637122680 (9.6 GB)  TX bytes:5583512622 (5.5 GB)

d@ganesha:~/git/tinc/src$ ifconfig wlan0
wlan0     Link encap:Ethernet  HWaddr 24:0a:64:cc:24:7d
          inet addr:172.26.17.228  Bcast:255.255.255.255  Mask:0.0.0.0
          inet6 addr: fe80::260a:64ff:fecc:247d/64 Scope:Link
          inet6 addr: 2601:9:3600:b397:260a:64ff:fecc:242d/128 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1743729 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1489781 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1077109813 (1.0 GB)  TX bytes:504903872 (504.9 MB)

d@ganesha:~/git/tinc/src$ ip route
default via 172.26.17.224 dev wlan0  proto 42 onlink
69.181.216.0/22 via 172.26.17.224 dev wlan0  proto 42 onlink
169.254.0.0/16 dev eth0  scope link  metric 1000
172.26.16.0/24 dev eth0  proto kernel  scope link  src 172.26.16.2
172.26.16.1 via 172.26.17.227 dev wlan0  proto 42 onlink
172.26.16.3 via 172.26.17.227 dev wlan0  proto 42 onlink
172.26.17.0/24 via 172.26.17.224 dev wlan0  proto 42 onlink
172.26.17.3 via 172.26.17.227 dev wlan0  proto 42 onlink
172.26.17.227 via 172.26.17.227 dev wlan0  proto 42 onlink
192.168.7.2 via 172.26.17.227 dev wlan0  proto 42 onlink
d@ganesha:~/git/tinc/src$

-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks
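
On the clocksource question raised earlier in this message ("clock_getres, perhaps?"), here is a rough userspace sketch in Python of what such a check might look like. It is not something that was proposed or agreed on in the thread. It reads the resolution that clock_getres reports for CLOCK_MONOTONIC and the name of the active kernel clocksource from the usual Linux sysfs file. The 1000 ns cutoff, the "jiffies" entry in the coarse list, and the function names are all assumptions made for illustration. Note that clock_getres reports resolution, not the cost of reading the clock, so a clocksource that is precise but slow to read would still pass this check.

```python
import time
from pathlib import Path

# Usual Linux sysfs location of the active clocksource name.
CLOCKSOURCE_PATH = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)

# Hypothetical thresholds, chosen only for illustration.
MAX_RESOLUTION_NS = 1_000      # treat anything coarser than 1 us as suspect
COARSE_SOURCES = ("jiffies",)  # tick-based fallback clocksource


def monotonic_resolution_ns():
    """Resolution of CLOCK_MONOTONIC in nanoseconds, via clock_getres."""
    return round(time.clock_getres(time.CLOCK_MONOTONIC) * 1e9)


def current_clocksource(path=CLOCKSOURCE_PATH):
    """Name of the active kernel clocksource, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def looks_usable(resolution_ns, clocksource,
                 max_resolution_ns=MAX_RESOLUTION_NS,
                 coarse_sources=COARSE_SOURCES):
    """Heuristic: fine reported resolution and not a known-coarse source."""
    if resolution_ns > max_resolution_ns:
        return False
    if clocksource is not None and clocksource in coarse_sources:
        return False
    return True


if __name__ == "__main__":
    res = monotonic_resolution_ns()
    src = current_clocksource()
    print(f"resolution: {res} ns, clocksource: {src}, "
          f"usable: {looks_usable(res, src)}")
```

An unreadable sysfs file is treated as "unknown" rather than "bad", so the check falls back to the reported resolution alone. Whether that is the right default for deciding on per-packet timestamping is an open question.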