From: Dave Taht
To: Jonathan Morton
Cc: "cerowrt-devel@lists.bufferbloat.net", bloat
Date: Mon, 1 Sep 2014 11:32:18 -0700
Subject: Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

On Mon, Sep 1, 2014 at 11:06 AM, Jonathan Morton wrote:
>
> On 1 Sep, 2014, at 8:01 pm, Dave Taht wrote:
>
>> On Sun, Aug 31, 2014 at 3:18 AM, Jonathan Morton wrote:
>>>
>>> On 31 Aug, 2014, at 1:30 am, Dave Taht wrote:
>>>
>>>> Could I get you to also try HFSC?
>>>
>>> Once I got a kernel running that included it, and figured out how to make it do what I wanted...
>>>
>>> ...it seems to be indistinguishable from HTB and FQ in terms of CPU load.
>>
>> If you are feeling really inspired, try cbq. :) One thing I sort of like about cbq is that it (I think, unlike htb presently) operates off an estimated size for the next packet (which isn't dynamic, sadly), where the others buffer up an extra packet until it can be delivered.
>
> It's also hilariously opaque to configure, which is probably why nobody uses it - the RED problem again - and the top link when I Googled for best practice on it gushes enthusiastically about Linux 2.2! The idea of manually specifying an "average packet size" in particular feels intuitively wrong to me. Still, I might be able to try it later on.

I felt an EWMA of egress packet sizes would be a better estimator, yes.

> Most class-based shapers are probably more complex to set up for simple needs than they need to be. I have to issue three separate 'tc' invocations for a minimal configuration of each of them, repeating several items of data between them. They scale up reasonably well to complex situations, but such uses are relatively rare.
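For concreteness, the minimal three-invocation recipe you're describing, with HTB shaping to 50mbit and fq_codel as the leaf, looks something like this (the device, handles, and rate here are just illustrative, and untested):

  tc qdisc add dev eth0 root handle 1: htb default 10
  tc class add dev eth0 parent 1: classid 1:10 htb rate 50mbit
  tc qdisc add dev eth0 parent 1:10 fq_codel

Note how "dev eth0" and the 1:/1:10 handles have to be repeated on every line - exactly the redundancy you're complaining about.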
>> In my quest for absolutely minimal latency I'd love to be rid of that
>> last extra non-in-the-fq_codel-qdisc packet... either with a "peek"
>> operation or with a running estimate.
>
> I suspect that something like fq_codel which included its own shaper (with the knobs set sensibly by default) would gain more traction via ease of use - and might even answer your wish.

I agree that a simpler-to-use qdisc would be good. I'd like something that preserves multiple (3-4) service classes (as pfifo_fast and sch_fq do) using DRR, deals with diffserv, and could be invoked with a command line like:

  tc qdisc add dev eth0 cake bandwidth 50mbit diffservmap std

I had started at that (basically pouring cerowrt's "simple.qos" code into C with a simple lookup table for diffserv) many moons ago, but the contents of the yurtlab, that code included, were stolen - and I was (and remain) completely stuck on how to do soft rate limiting more sanely, particularly in asymmetric scenarios. ("cake" stood for "Common Applications Kept Enhanced".)

fq_codel is not a drop-in replacement for pfifo_fast, due to its classless nature. sch_fq comes closer, but it's more server-oriented. QFQ with 4 weighted bands + fq_codel can be made to do the levels-of-service stuff fairly straightforwardly at line rate, but the tc "filter" code tends to get rather long to handle all the diffserv classes...
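To give a flavor of that, here is a sketch - the weights, band layout, and DSCP choices are invented for illustration, and a real diffserv map needs a filter line for nearly every codepoint:

  tc qdisc add dev eth0 root handle 1: qfq
  tc class add dev eth0 parent 1: classid 1:1 qfq weight 1   # background
  tc class add dev eth0 parent 1: classid 1:2 qfq weight 4   # best effort
  tc class add dev eth0 parent 1: classid 1:3 qfq weight 8   # video
  tc class add dev eth0 parent 1: classid 1:4 qfq weight 16  # voice
  for c in 1 2 3 4; do tc qdisc add dev eth0 parent 1:$c fq_codel; done
  # classify on the DSCP bits (upper six bits of the TOS byte, hence mask 0xfc)
  tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip tos 0x20 0xfc flowid 1:1  # CS1
  tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip tos 0xb8 0xfc flowid 1:4  # EF
  # ...and so on for CS2-CS7, AF11 through AF43, etc., then a catch-all:
  tc filter add dev eth0 parent 1: protocol ip prio 99 u32 match u32 0 0 flowid 1:2

Multiply that filter list out over the full DSCP table (and then again for protocol ipv6) and it gets unwieldy fast - which is what a one-line "diffservmap std" would hide.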
So... we keep polishing the SQM system, and I keep tracking how diffserv classification will be done in the future (in IETF groups like rmcat and dart); figuring out how to deal better with aggregating MACs in general is what keeps me awake nights, more than finishing cake... We'll get there, eventually.

>> It would be cool to be able to program the ethernet hardware itself to
>> return completion interrupts at a given transmit rate (so you could
>> program the hardware to be any bandwidth, not just 10/100/1000). Some
>> hardware, so far as I know, supports this with a "pacing" feature.
>
> Is there a summary of hardware features like this anywhere? It'd be nice to see what us GEM and RTL proles are missing out on. :-)

I'd like one. There are certain 3rd-party firmwares, like Octeon's, where it seems possible to add more features to the firmware co-processor in particular.

>>> Actually, I think most of the CPU load is due to overheads in the userspace-kernel interface and the device driver, rather than the qdiscs themselves.
>>
>> You will see it bound by the softirq thread, but what, exactly, inside that is kind of unknown. (I presently lack the time to build profilable kernels on these low-end arches.)
>
> When I eventually got RRUL running (on one of the AMD boxes, so the PowerBook only has to run the server end of netperf), the bandwidth maxed out at about 300Mbps each way, and the softirq was bouncing around 60% CPU. I'm pretty sure most of that is shoving stuff across the PCI bus (even though it's internal to the northbridge), or at least waiting for it to go there. I'm happy to assume that the rest was mostly kernel-userspace interface overhead to the netserver instances.

perf and the older oprofile are our friends here.

> But this doesn't really answer the question of why the WNDR has so much lower a ceiling with shaping than without. The G4 is powerful enough that the overhead of shaping simply disappears next to the overhead of shoving data around. Even when I turn up the shaping knob to a value quite close to the hardware's unshaped capabilities (e.g. 400Mbps one-way), most of the shapers stick to the requested limit like glue, and even the worst offender is within 10%. I estimate that it's using only about 500 clocks per packet *unless* it saturates the PCI bus.
>
> It's possible, however, that we're not really looking at a CPU limitation, but a timer problem. The PowerBook is a "proper" desktop computer with hardware to match (modulo its age). If all the shapers now depend on the high-resolution timer, how high-resolution is the WNDR's timer?

Both good questions, worth further exploration.

> - Jonathan Morton

-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article