[Bloat] beating the drum for BQL

Dave Taht dave.taht at gmail.com
Fri Aug 24 13:30:22 EDT 2018


On Fri, Aug 24, 2018 at 10:13 AM Pete Heist <pete at heistp.net> wrote:
>
>
> On Aug 23, 2018, at 10:26 AM, Pete Heist <pete at heistp.net> wrote:
>
> On Aug 23, 2018, at 2:49 AM, Dave Taht <dave.taht at gmail.com> wrote:
>
> I had a chance to give a talk at broadcom recently, slides here:
>
> http://flent-fremont.bufferbloat.net/~d/broadcom_aug9.pdf
>
>
> Thanks for sharing, this is really useful, raising awareness where it matters. Quite a bit of content... :)
>
> Ubiquiti needs some work getting this into more of their products (EdgeMAX in particular). A good time to lobby for this might be, well a couple months ago, as they’re producing alpha builds for their upcoming 2.0 release with kernel 4.9 and new Cavium/Mediatek/Octeon SDKs. I just asked about the status in the EdgeRouter Beta forum, in case it finds the right eyes before the release:
>
> https://community.ubnt.com/t5/EdgeRouter-Beta/BQL-support/m-p/2466657
>
>
> This started a discussion, and no, so far it looks like there’s no BQL support in the upcoming 2.0 release.
>
> For my own benefit, re-reading the original patch series comment (https://lwn.net/Articles/469652/) makes it sound like BQL is useful even without AQM (original benchmarks were done with straight pfifo_fast). I didn’t realize this, actually. If anything incorrect about BQL was said in this discussion, correct us, please… :)

Yes, BQL is very useful even with pfifo_fast. Without BQL I doubt the
internet would be scaling as it is today in the DC, or on the smaller
hosts and devices that support it. It's in the mvneta, it's in the
ar71xx, with documented results there that I could dig up. (Though
things like TSQ are helping, and they mask the problem on simple tests.)
The experiment I documented in the slides that kicked off this thread,
and the other experiment on the systemd bug, easily show the benefit on
hosts forwarding packets (be they from local applications, coming from
various sources like docker containers, etc). Anyone can show what
goes wrong if you disable BQL nowadays, basically restoring linux-3.3
behavior, with a very simple test:

# Run as root. A very large limit_min effectively disables BQL.
for I in /sys/class/net/your_device/queues/tx*/byte_queue_limits/limit_min
do
     echo 10000000 > "$I"
done

so long as you run enough kinds of flows that don't engage TSQ.
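
Since that loop overwrites whatever limit_min the driver started with, here
is a minimal sketch of a save/raise/restore wrapper around the same sysfs
files. The function names, the state-file format, and the SYSFS_NET override
are my own assumptions for illustration, not anything shipped with the kernel;
the only thing taken from the loop above is the byte_queue_limits/limit_min
path.

```shell
#!/bin/sh
# Sketch only: helper names and state-file format are hypothetical.
# SYSFS_NET can be pointed at a fake tree for dry runs.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# bql_save DEV STATEFILE: record "value path" for every tx queue's limit_min.
bql_save() {
    : > "$2"
    for f in "$SYSFS_NET/$1"/queues/tx-*/byte_queue_limits/limit_min; do
        [ -f "$f" ] || continue          # skip if the glob matched nothing
        printf '%s %s\n' "$(cat "$f")" "$f" >> "$2"
    done
}

# bql_set_min DEV BYTES: write BYTES into limit_min on every tx queue.
bql_set_min() {
    for f in "$SYSFS_NET/$1"/queues/tx-*/byte_queue_limits/limit_min; do
        [ -f "$f" ] || continue
        echo "$2" > "$f"
    done
}

# bql_restore STATEFILE: put back the values recorded by bql_save.
bql_restore() {
    while read -r v f; do
        echo "$v" > "$f"
    done < "$1"
}
```

Usage would look something like this (as root, with your_device replaced by
a real interface name):

    bql_save your_device /tmp/bql.state
    bql_set_min your_device 10000000
    # ... run the test flows ...
    bql_restore /tmp/bql.state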

However, in the edgerouter w/offloads case, all that part of the stack
has been short-circuited into the offload engine. I don't know how
much buffering is in there on the new firmware. I did a few tests
on it in the old days, showing it to be around 10ms at gigE, but even
that memory is kind of vague (the easy test here is to slam two ports
into one). For all I know the new firmware is worse; I'd have to go
back and track this new release to find out. (I do have a few
edgerouters, but they are all in production.)
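
To put the "around 10ms at gigE" figure in bytes, here is a small arithmetic
sketch. The function names are hypothetical, and the 1514-byte frame size is
my assumption for a full-size Ethernet frame without FCS, preamble, or
inter-frame gap; the 10ms number itself is only the vague recollection above,
not a measurement of the new firmware.

```shell
#!/bin/sh
# Sketch only: convert a drain time at a given line rate into buffered bytes.

# buffer_bytes RATE_BPS MILLISECONDS: bytes that take that long to drain.
buffer_bytes() {
    echo $(( $1 * $2 / 8000 ))   # bits/s * ms / 1000 / 8
}

# frames_for BYTES FRAME_SIZE: whole frames that fit in BYTES.
frames_for() {
    echo $(( $1 / $2 ))
}

b=$(buffer_bytes 1000000000 10)
echo "10ms at 1Gbit/s is $b bytes, about $(frames_for "$b" 1514) full-size frames"
```

That works out to 1250000 bytes, or roughly 825 full-size frames sitting in
the offload engine's queue, if the 10ms recollection is right.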

There was also a paper on BQL a few years back that I can dig up....



> Pete
>


-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619

