From: Dave Taht
To: Tom Gundersen
Cc: "cerowrt-devel@lists.bufferbloat.net"
Date: Tue, 21 Oct 2014 12:21:04 -0700
Subject: Re: [Cerowrt-devel] SQM in mainline openwrt, fq_codel considered for fedora default
List-Id: Development issues regarding the cerowrt test router project

On Tue, Oct 21, 2014 at 11:06 AM, Tom Gundersen wrote:
> On Tue, Oct 21, 2014 at 7:44 PM, Michal Schmidt wrote:
>> On 10/21/2014 07:24 PM, Tom Gundersen wrote:
>>> I have now subscribed to cerowrt-devel (long overdue), and I would

I am curious whether you or Michal are also openwrt or cerowrt users, or are running things like sch_fq or fq_codel on your desktops and servers. Having native, first-hand experience with this stuff would be a good guideline. There is a lot to like about the new fq scheduler for servers, and maybe for hosts. And "cake" continues to progress.

>>> very much appreciate any comments you guys may have on our networking
>>> work in systemd. In particular, if there are any more tweaks like
>>> making fq_codel the default, which would be the reasonable choice for
>>> 95% of users (most of whom don't know about these things and would
>>> otherwise never touch them), we are very open to suggestions.
>>
>> An idea: Can networkd configure interfaces' txqueuelen?
>> (Though with BQL and codel maybe it's not that important anymore.)

One thing missed by people who calculate BDP is that they usually do the math one way, with the biggest or an average packet size. There are several problems with this:

1) With the advent of TSO and GSO offloads, the packet size on servers can bloat up to 64k each. Multiply this by 1256 packets (txqueuelen plus the typical size of a tx ring) and you can see all the pre-BQL, pre-codel latency in all its glory, particularly at lower rates. There's a paper on this...

2) Most client workloads are ack-dominated, tending towards 66 bytes each, with some larger packets for http get requests, dns and voip. At this level a queue of only 1000 small packets is 3 orders of magnitude smaller and, until some recent work, could be starved by other processing on the system.
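The arithmetic behind those two cases can be sketched quickly. This is just back-of-envelope math (the 100 Mbit link rate is mine, chosen for illustration; 1256 = txqueuelen 1000 plus a typical 256-slot tx ring):

```shell
#!/bin/sh
# Worst-case standing-queue delay = packets * bytes * 8 bits / link rate.
# 64KB TSO superpackets vs 66-byte pure acks, same queue length limits.
queue_delay_ms() {
    pkts=$1; bytes=$2; mbit=$3
    # milliseconds = pkts * bytes * 8 / (mbit * 1000)
    echo $(( pkts * bytes * 8 / (mbit * 1000) ))
}
queue_delay_ms 1256 65536 100   # TSO-sized packets: seconds of buffering
queue_delay_ms 1000 66 100      # ack-dominated: a few milliseconds
```

Same packet counts, three orders of magnitude apart in delay - which is why sizing queues in packets rather than bytes goes so wrong.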
3) txqueuelen only has an effect on certain qdiscs. In the case of pfifo_fast you can and do actually hit that limit, but in the AQMs (*codel, pie, red, ared, sfqred) the limit is just there to keep from running out of resources - it's otherwise really hard to hit in those qdiscs, as they start dropping or marking packets long before the limit is reached.

So... fiddling with txqueuelen or the ring buffer sizes is something of a losing game. A qdisc (like bfifo or *codel) that buffered up acks or big packets with a byte limit, rather than a packet limit, is saner, along with BQL underneath.

> Hm, the way I read the docs, figuring out the "good" values is not
> that straight-forward, and doing this will anyway be obsolete soon, so
> not sure we should be setting anything by default.

I tend to agree. I am generally allergic to TSO/GSO/GRO/LRO offloads at speeds below 100mbit (although, sigh, I found one still-shipping box from alix with a geode in it that benefits slightly from gso, being able to push out 60Mbit rather than 40), and certainly txqueuelen is just plain too big at these speeds - but you'd have to detect the link rate in order to change it to something saner.

These show the difference in pfifo_fast on the current beagle at 100mbit with txqueuelen 1000 and 100, offloads off:
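("Offloads off, txqueuelen 100" corresponds to knobs like the following. The interface name is hypothetical, and the helper only echoes the commands unless DO_IT=1 is set, since running them for real needs root:)

```shell
#!/bin/sh
# Sketch of the test setup: disable TSO/GSO/GRO and shrink txqueuelen.
# Dry-run by default; set DO_IT=1 to actually execute (requires root).
tune_for_latency() {
    iface=$1
    run() { if [ "${DO_IT:-0}" = 1 ]; then "$@"; else echo "$@"; fi; }
    run ethtool -K "$iface" tso off gso off gro off
    run ip link set dev "$iface" txqueuelen 100
}
tune_for_latency eth0   # prints the two commands in dry-run mode
```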
http://snapon.lab.bufferbloat.net/~d/beagle_nobql/pfifo_nobql_tsq3028txqueue1000.svg
http://snapon.lab.bufferbloat.net/~d/beagle_nobql/pfifo_nobql_tsq3028txqueue100.svg

(There are a ton of results on the beagle in this directory, at different speeds and buffering, from before I got around to actually adding BQL to it; even more results in this dir, with data sets easily compared with netperf-wrapper.)

http://snapon.lab.bufferbloat.net/~d/beagle_bql/pfifo_bql_tsq3028txqueue100.svg
http://snapon.lab.bufferbloat.net/~d/beagle_bql/fq_bql_tsq3028.svg
http://snapon.lab.bufferbloat.net/~d/beagle_bql/fq_codel_bql_tsq3028.svg
http://snapon.lab.bufferbloat.net/~d/beagle_bql/bql_makes_a_difference.png

You can see that BQL makes the most difference in the latency.

I keep hoping for saner tuning of these offloads at higher speeds on better hardware, but as of the last kernel version I tested thoroughly, TSO/GSO is still needed on devices with gigE interfaces.

http://snapon.lab.bufferbloat.net/~cero2/nuc-to-puck/results.html

And then there's (sigh) wifi.

> However, we
> probably should make it much simpler to configure. We could add
> support for both ringbuffer and queue length sizes to our link files [0],
> so admins could implement the bufferbloat recommendations by doing
> something like:
>
> ----8<------
> /etc/systemd/network/00-wlan.link
> [Match]
> Type=wlan

As for wifi, there is much now published on all the problems there. A recent summary of what seems to be needed, which I did at ieee (see pp 23- ):

http://snapon.lab.bufferbloat.net/~d/ieee802.11-sept-17-2014/11-14-1265-00-0wng-More-on-Bufferbloat.pdf

There is no ring buffer. Often tuning down txqueuelen is a very good idea, with today's wifi drivers being MASSIVELY overbuffered. Better to apply fq_codel, for now, and work on restructuring that entire subsystem.

> [Link]
> TransmitRingBuffer=4
> TransmitQueueLength=16

Regrettably, many devices do not respond to tuning such as this.
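(Whether a driver has BQL at all is at least visible generically: BQL-capable drivers expose a byte_queue_limits directory per tx queue in sysfs. A sketch - the sysfs layout is the real kernel ABI, but the warning wording is mine, not an actual kernel message:)

```shell
#!/bin/sh
# A driver has BQL iff its tx queues expose byte_queue_limits in sysfs.
has_bql() {
    # $1 = sysfs net directory (normally /sys/class/net), $2 = interface
    [ -d "$1/$2/queues/tx-0/byte_queue_limits" ]
}
# Warn about every BQL-less interface on this machine, if sysfs is mounted.
if [ -d /sys/class/net ]; then
    for dev in /sys/class/net/*; do
        i=${dev##*/}
        has_bql /sys/class/net "$i" ||
            echo "BQL not detected on interface $i, latency may be compromised"
    done
fi
```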
The e1000e, for example, doesn't let you get below 64 entries in the ring buffer, and the ar71xx allows it, but crashes... (thankfully both have BQL)

> ----8<------
>
> (suggestions welcome for the naming of the variables and also for man
> page sections).
>
> These settings would then be applied by udev to any interface as
> it appears (and before libudev notifies applications about its
> existence).
>
> Does something like this make sense?

Regrettably, no. I think printing a warning somewhere, when BQL is not detected on an interface going up, would do more towards getting BQL adopted fully:

"BQL not detected on interface X, latency may be compromised, beg your vendor for BQL support"

> Cheers,
>
> Tom
>
> [0]:
> [1]:

--
Dave Täht
http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks