[Bloat] [Cerowrt-devel] BQL, txqueue lengths and the internet of things

Dave Taht dave.taht at gmail.com
Thu Jun 12 14:46:18 PDT 2014


On Wed, Jun 11, 2014 at 6:05 PM, David P. Reed <dpreed at reed.com> wrote:
> Maybe you can do a quick blog howto?

I am thinking of a writeup, yes. But I am low on time this month. I'd
like to target a publication outside the normal bufferbloat community.
The folk working on beaglebone stuff DO tend to care about latency and
realtime behavior a lot, and are willing to resort to programming the
PRU to do robotic pwm projects and projects like
http://www.nycresistor.com/2013/09/12/octoscroller/ ...

so it would be my hope that this community would "get bufferbloat" at a
whole different level than we do.

In my case I have a longstanding interest in reliably transporting
real-time audio, which usually has latency constraints below 2ms, no
matter what. The BQL patch with sch_fq makes the difference between
success or failure for that. Hmm. Maybe AES.

> I'd bet the same could be done for
> raspberry pi and perhaps my other toy the wandboard which has a gigE adapter

Well, measure first? Many of the low end devices can't achieve gigE
in the first place, rendering the BQL method somewhat moot.
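
To put rough numbers on why the achieved rate matters, here is a
back-of-envelope sketch of the worst-case time to drain a full queue at
line rate. The 1514-byte frame size is an assumption (1500-byte MTU plus
a 14-byte Ethernet header), and the function name is made up for
illustration; the queue lengths are the ones mentioned in the quoted
message below.

```python
def drain_time_ms(packets, bytes_per_packet, link_bits_per_s):
    """Worst-case time (ms) to transmit a full queue at line rate."""
    return packets * bytes_per_packet * 8 / link_bits_per_s * 1000.0


FRAME_BYTES = 1514  # assumed: 1500-byte MTU plus 14-byte Ethernet header

for label, packets, rate in [
    ("txqueuelen 1000 @ 100 Mbit/s", 1000, 100e6),
    ("txqueuelen 1000 @ 1 Gbit/s", 1000, 1e9),
    ("txqueuelen 100 @ 100 Mbit/s", 100, 100e6),
    ("2 packets @ 100 Mbit/s", 2, 100e6),
]:
    print(f"{label}: {drain_time_ms(packets, FRAME_BYTES, rate):.3f} ms")
```

Under these assumptions a full 1000 packet queue is roughly 121 ms deep
at 100mbit and roughly 12 ms deep at gigE, while two full size packets
at 100mbit come to about a quarter of a millisecond. A device that
cannot actually sustain its nominal rate drains more slowly than this.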

If you can provide a pointer to the right driver I can take a look.
(dmesg | grep eth)
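
Besides the driver name, the per-queue BQL counters in sysfs are worth a
look. A minimal sketch, assuming a kernel built with CONFIG_BQL, which
exposes a byte_queue_limits directory under each tx queue; the helper
name read_bql and its sysfs_root parameter are made up for illustration.

```python
from pathlib import Path

# Files the kernel exposes per tx queue when CONFIG_BQL is enabled.
BQL_FIELDS = ("limit", "limit_min", "limit_max", "inflight", "hold_time")


def read_bql(iface, sysfs_root="/sys/class/net"):
    """Return {queue_name: {field: int}} for each tx queue with BQL files."""
    result = {}
    queues = Path(sysfs_root) / iface / "queues"
    for txq in sorted(queues.glob("tx-*")):
        bql_dir = txq / "byte_queue_limits"
        if not bql_dir.is_dir():
            continue  # no BQL directory for this queue
        values = {}
        for field in BQL_FIELDS:
            path = bql_dir / field
            if path.is_file():
                values[field] = int(path.read_text().strip())
        result[txq.name] = values
    return result
```

Calling read_bql("eth0") on the board returns one entry per tx queue.
The directory existing does not by itself show that the driver does BQL
accounting; an inflight value that stays at 0 while the link is loaded
suggests it does not.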

I've long planned on doing a BQL'd driver for the zedboard/zynq and
parallella. I've looked over that code thoroughly; all I need is a
board to test on and spare time.

(in fact for the latter, I'd like to take a stab at writing better
ethernet hardware)

> and Scsi making it a nice iscsi target or nfs server.

We took an initial stab at BQL'ing the usbnet driver about a year back.

There were too many possible error out conditions to get a proper
accounting (at the time, anyway). Fixing this would not only make the
Pi better; hundreds of usb ethernet devices exist, and the vast
majority of lte devices are hooked up via this driver.

The numbers I just got for usb latency on that were for 100+mbit
operation, and the amount of buffering is fixed...

https://plus.google.com/u/0/107942175615993706558/posts/Cpd76KHUbpp

> De bloating the world... One step at a time.

Adding BQL is easy, and most of the work can be done with code
inspection. The huge problem is you absolutely have to be able to test
the device and driver under a variety of circumstances after you patch
it in. (What took me the longest was finding a correct toolchain for
kernel builds, actually, and then finding a bug in dma padding that
I'd missed until, a whole bunch of printks later, enlightenment dawned.) I
estimate that doing the beaglebone BQL patch cost me 30 hrs of time,
or about 6k at my current billing rate. (while the 160,000+ present
users of the beaglebone might get back 20ms under load on a regular
basis, that's 30 hours of my life I'll never have back)
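
For readers who have not seen what the accounting involves, here is a
toy model in Python, not the actual patch. In a real driver the hooks
are the kernel helpers netdev_sent_queue(), netdev_completed_queue()
and netdev_reset_queue(); the class below is hypothetical, and it uses
a fixed limit and a simplified stop threshold, whereas real BQL adapts
its limit dynamically.

```python
class ToyBql:
    """Fixed-limit toy model of byte queue accounting (illustration only)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.inflight = 0      # bytes handed to hardware, not yet completed
        self.stopped = False   # True while the stack should stop feeding us

    def sent(self, nbytes):
        """Packet placed on the tx ring (cf. netdev_sent_queue)."""
        self.inflight += nbytes
        if self.inflight >= self.limit_bytes:
            self.stopped = True

    def completed(self, nbytes):
        """Tx completion reported by hardware (cf. netdev_completed_queue)."""
        if nbytes > self.inflight:
            # Every byte counted in sent() must be completed exactly once.
            raise ValueError("completed more bytes than were sent")
        self.inflight -= nbytes
        if self.inflight < self.limit_bytes:
            self.stopped = False

    def reset(self):
        """Ring torn down or restarted (cf. netdev_reset_queue)."""
        self.inflight = 0
        self.stopped = False


q = ToyBql(limit_bytes=2 * 1514)  # assumed limit: two full size frames
q.sent(1514)
q.sent(1514)
print(q.stopped)   # queue is at its limit, so the stack is held off
q.completed(1514)
print(q.stopped)   # one frame drained, so transmission may resume
```

The code inspection part is mostly making sure every path that puts a
packet on the ring, completes it, or drops it on error keeps the two
byte counts balanced.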

I figure to get the patch accepted in mainline will take another 10 hours,
and for it to flow out to the beaglebone userbase in less than a year
I'd have to convince the maintainers to incorporate it out of tree.

The new rev C beaglebone only ships with 3.8.x as its default kernel,
although modern kernels are readily available via Robert Nelson's work (
https://rcn-ee.net/deb/wheezy-armhf/ ), and building your own is more
or less correctly documented here:
http://eewiki.net/display/linuxonarm/BeagleBone+Black

So...

Were it a new device, and a driver in development, and I'd had data
sheets, and a working compiler, it probably would have been much less.
The right people to do this work are the chipmakers writing drivers
before they ever hit the mainline, or places like linaro.org that are
doing tons of arm work.

It's certainly my hope that, now that there's a demonstrable proof of
concept and benefit, TI will retrofit similar code to the other
devices that it makes and do the needed testing. (But I'm not
holding my breath.)

So BQL-on-everything is something of a project that needs to get
driven somehow... and either way of getting eyeballs and resources on
it, whether by just doing it or by creating publicity around the need
for it so others do it, has a cost.

... and

fixing wifi is going to be a lot harder than fixing usb.

>
> On Jun 11, 2014, Dave Taht <dave.taht at gmail.com> wrote:
>>
>> The bloat problem and solutions are not just limited to fixing
>> routers, but hosts.
>>
>> Nearly every low end board I've seen out there forgoes a gigE ethernet
>> interface in favor of a lower power and cost 100mbit interface.
>>
>> No distro I've seen modifies the default pfifo txqueuelen from the
>> current 1000 packet default down to a more reasonable 100 packet
>> default in that case. And, while many ethernet devices in this
>> category are hooked up via usb (and currently hard to add BQL support
>> to), some are not, and byte queue limit support can be easily added to
>> those.
>>
>> Sadly, byte queue limits (BQL) are only implemented in a bunch of top
>> end ethernet drivers. (about 10, last I looked)
>>
>> I needed a break from big problems, so a couple late nights later, I
>> have a very small patch adding support for BQL to the beaglebone
>> black:
>>
>>
>> http://snapon.lab.bufferbloat.net/~d/0001-Add-BQL-support-to-cpsw-beaglebone-driver.patch
>>
>> And the results were quite pleasing at 100mbit. BQL holds things down
>> to two full size packets in the tx ring and we see an enormous
>> improvement in bidirectional throughput, jitter, and latency.
>>
>> http://snapon.lab.bufferbloat.net/~d/beagle_bql/bql_makes_a_difference.png
>> http://snapon.lab.bufferbloat.net/~d/beagle_bql/beaglebonewins.png
>>
>> The default linux behavior ( pfifo_fast, txqueuelen 1000 ) prior to this
>> patch looked pretty awful:
>>
>>
>> http://snapon.lab.bufferbloat.net/~d/beagle_nobql/pfifo_nobql_tsq3028txqueue1000.svg
>>
>> and went to looking like this:
>>
>>
>> http://snapon.lab.bufferbloat.net/~d/beagle_bql/pfifo_bql_tsq3028txqueue1000.svg
>>
>> And adding the new fq scheduler looked like this:
>>
>> http://snapon.lab.bufferbloat.net/~d/beagle_bql/fq_bql_tsq3028.svg
>>
>> (fq_codel was similar)
>>
>> The fact that we don't achieve full upload throughput on this last
>> test is probably
>> due to having a tail dropping switch in the way, and/or some dma dequeuing
>> cleanup conflicts between the low level transmit and receive queues on
>> this device (they share an interrupt AND use napi which seems
>> puzzling).
>>
>> But any day I can get a 4-10x improvement in latency and throughput is
>> a good day. One IoT device down, thousands to go. It would be nice if
>> the chipmakers were incorporating bql into boxes destined for the
>> internet of things.
>
>
> -- Sent from my Android device with K-@ Mail. Please excuse my brevity.



-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article

