Development issues regarding the cerowrt test router project
* [Cerowrt-devel] BQL, txqueue lengths and the internet of things
@ 2014-06-11 22:49 Dave Taht
  2014-06-12  1:05 ` David P. Reed
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Taht @ 2014-06-11 22:49 UTC (permalink / raw)
  To: bloat, cerowrt-devel

The bloat problem and its solutions are not limited to routers; hosts
need fixing too.

Nearly every low end board I've seen out there forgoes a gigE ethernet
interface in favor of a lower-power, lower-cost 100mbit interface.

No distro I've seen modifies the default pfifo txqueuelen from the
current 1000 packet default down to a more reasonable 100 packet
default in that case. And, while many ethernet devices in this
category are hooked up via usb (and currently hard to add BQL support
to), some are not, and byte queue limit support can be easily added to
those.
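
To put rough numbers on that, here is a minimal sketch of the worst-case
time a full transmit queue takes to drain. It assumes every queued packet
is a full-size 1514-byte ethernet frame (an assumption for illustration,
not a figure taken from the measurements in this thread); the function
name is made up for the example.

```python
# Worst-case drain time of a full transmit queue (illustrative arithmetic only).
FRAME_BYTES = 1514  # assumed full-size frame: 1500-byte MTU + 14-byte ethernet header


def drain_time_ms(packets, link_mbit, frame_bytes=FRAME_BYTES):
    """Milliseconds needed to transmit `packets` full-size frames at `link_mbit`."""
    bits = packets * frame_bytes * 8
    return bits / (link_mbit * 1_000_000) * 1000


for txqueuelen in (1000, 100):
    ms = drain_time_ms(txqueuelen, 100)
    print(f"txqueuelen {txqueuelen}: {ms:.1f} ms to drain at 100mbit")
```

Under these assumptions a full 1000 packet queue takes roughly 120 ms to
drain at 100mbit, versus roughly 12 ms for 100 packets.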

Sadly, byte queue limits (BQL) are only implemented in a handful of
top end ethernet drivers (about 10, last I looked).

I needed a break from big problems, so a couple late nights later, I
have a very small patch adding support for BQL to the beaglebone
black:

http://snapon.lab.bufferbloat.net/~d/0001-Add-BQL-support-to-cpsw-beaglebone-driver.patch

And the results were quite pleasing at 100mbit. BQL holds things down
to two full size packets in the tx ring and we see an enormous
improvement in bidirectional throughput, jitter, and latency.

http://snapon.lab.bufferbloat.net/~d/beagle_bql/bql_makes_a_difference.png
http://snapon.lab.bufferbloat.net/~d/beagle_bql/beaglebonewins.png

The default linux behavior (pfifo_fast, txqueuelen 1000) prior to this
patch looked pretty awful:

http://snapon.lab.bufferbloat.net/~d/beagle_nobql/pfifo_nobql_tsq3028txqueue1000.svg

and went to looking like this:

http://snapon.lab.bufferbloat.net/~d/beagle_bql/pfifo_bql_tsq3028txqueue1000.svg

And adding the new fq scheduler looked like this:

http://snapon.lab.bufferbloat.net/~d/beagle_bql/fq_bql_tsq3028.svg

(fq_codel was similar)

The fact that we don't achieve full upload throughput on this last
test is probably due to having a tail dropping switch in the way,
and/or some dma dequeuing cleanup conflicts between the low level
transmit and receive queues on this device (they share an interrupt
AND use napi which seems puzzling).

But any day I can get a 4-10x improvement in latency and throughput is
a good day. One IoT device down, thousands to go. It would be nice if
the chipmakers were incorporating bql into boxes destined for the
internet of things.

-- 
Dave Täht

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Cerowrt-devel] BQL, txqueue lengths and the internet of things
  2014-06-11 22:49 [Cerowrt-devel] BQL, txqueue lengths and the internet of things Dave Taht
@ 2014-06-12  1:05 ` David P. Reed
  2014-06-12  1:57   ` [Cerowrt-devel] [Bloat] " Jonathan Morton
  2014-06-12 21:46   ` [Cerowrt-devel] " Dave Taht
  0 siblings, 2 replies; 5+ messages in thread
From: David P. Reed @ 2014-06-12  1:05 UTC (permalink / raw)
  To: Dave Taht, bloat, cerowrt-devel


Maybe you can do a quick blog howto?  I'd bet the same could be done for raspberry pi and perhaps my other toy, the wandboard, which has a gigE adapter and SCSI, making it a nice iscsi target or nfs server.

Debloating the world... One step at a time.



* Re: [Cerowrt-devel] [Bloat]  BQL, txqueue lengths and the internet of things
  2014-06-12  1:05 ` David P. Reed
@ 2014-06-12  1:57   ` Jonathan Morton
  2014-06-12 21:46   ` [Cerowrt-devel] " Dave Taht
  1 sibling, 0 replies; 5+ messages in thread
From: Jonathan Morton @ 2014-06-12  1:57 UTC (permalink / raw)
  To: David P. Reed; +Cc: cerowrt-devel, bloat


On 12 Jun, 2014, at 4:05 am, David P. Reed wrote:

> Maybe you can do a quick blog howto?  I'd bet the same could be done for raspberry pi and perhaps my other toy the wandboard which has a gigE adapter and Scsi making it a nice iscsi target or nfs server. 

FYI, the Raspberry Pi's built-in Ethernet is attached via USB.  It's a chip that also includes a USB hub, which is why the cheaper model which drops Ethernet also loses a USB port.

 - Jonathan Morton



* Re: [Cerowrt-devel] BQL, txqueue lengths and the internet of things
  2014-06-12  1:05 ` David P. Reed
  2014-06-12  1:57   ` [Cerowrt-devel] [Bloat] " Jonathan Morton
@ 2014-06-12 21:46   ` Dave Taht
  2014-06-13  3:49     ` Chuck Anderson
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Taht @ 2014-06-12 21:46 UTC (permalink / raw)
  To: David P. Reed; +Cc: cerowrt-devel, bloat

On Wed, Jun 11, 2014 at 6:05 PM, David P. Reed <dpreed@reed.com> wrote:
> Maybe you can do a quick blog howto?

I am thinking of a writeup, yes. But I am low on time this month. I'd
like to target a publication outside the normal bufferbloat community.
The folk working on beaglebone stuff DO tend to care about latency and
realtime behavior a lot, and are willing to resort to programming the
PRU to do robotic pwm projects and projects like
http://www.nycresistor.com/2013/09/12/octoscroller/ ...

so it would be my hope that community would "get bufferbloat" at a
whole different level than we do.

In my case I have a longstanding interest in reliably transporting
real time audio which usually has latency constraints below 2ms, no
matter what. The BQL patch with sch_fq makes the difference between
success or failure for that. Hmm. Maybe AES.

> I'd bet the same could be done for
> raspberry pi and perhaps my other toy the wandboard which has a gigE adapter

Well, measure first? Many of the low end devices can't achieve gigE
in the first place, rendering the BQL method somewhat moot.

If you can provide a pointer to the right driver I can take a look.
(dmesg | grep eth)

I've long planned on doing a BQL'd driver for the zedboard/zynq and
parallella. I've looked over that code thoroughly; all I need is a
board to test and spare time.

(in fact for the latter, I'd like to take a stab at writing better
ethernet hardware)

> and Scsi making it a nice iscsi target or nfs server.

We took an initial stab at BQL'ing the usbnet driver about a year back.

There were too many possible error out conditions to get a proper
accounting (at the time, anyway). Fixing this would not only make the
Pi better; hundreds of usb ethernet devices exist, and the vast
majority of lte devices are hooked up via this driver.

The numbers I just got for usb latency on that were for 100+mbit
operation, and the amount of buffering is fixed...

https://plus.google.com/u/0/107942175615993706558/posts/Cpd76KHUbpp

> De bloating the world... One step at a time.

Adding BQL is easy, and most of the work can be done with code
inspection. The huge problem is you absolutely have to be able to test
the device and driver under a variety of circumstances after you patch
it in. (What took me the longest was finding a correct toolchain for
kernel builds, actually, and then finding a bug in dma padding that
I'd missed until a whole bunch of printks and enlightenment dawned). I
estimate that doing the beaglebone BQL patch cost me 30 hrs of time,
or about 6k at my current billing rate. (while the 160,000+ present
users of the beaglebone might get back 20ms under load on a regular
basis, that's 30 hours of my life I'll never have back)
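
As one small aid for that kind of testing, the kernel exposes per-queue
BQL state under sysfs in /sys/class/net/<iface>/queues/tx-<n>/byte_queue_limits/.
The sketch below just reads those files so the limit and in-flight bytes
can be watched while a load test runs. The function name, the "eth0"
default and the sysfs_root parameter are assumptions made for the
example, and the files are only present on kernels built with BQL
support.

```python
from pathlib import Path

# Files the kernel provides in each byte_queue_limits directory.
BQL_FILES = ("limit", "limit_min", "limit_max", "inflight", "hold_time")


def read_bql(iface="eth0", queue=0, sysfs_root="/sys"):
    """Return the BQL values for one tx queue as a dict; empty if none are found."""
    bql_dir = (Path(sysfs_root) / "class" / "net" / iface
               / "queues" / f"tx-{queue}" / "byte_queue_limits")
    values = {}
    for name in BQL_FILES:
        path = bql_dir / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


if __name__ == "__main__":
    # On a board without the interface or without BQL this prints an empty dict.
    print(read_bql("eth0"))
```

Sampling this once a second during an upload test is a cheap way to see
whether the limit settles at a small value or the driver never reports
any bytes in flight at all.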

I figure to get the patch accepted in mainline will take another 10,
and for it to flow out to the beaglebone userbase in less than a year
I'd have to convince the maintainers to incorporate it out of tree.

The new rev C beaglebone only ships with 3.8.x as its default kernel,
although modern kernels are readily available via Robert Nelson's work (
https://rcn-ee.net/deb/wheezy-armhf/ ), and building your own is more
or less correctly documented here:
http://eewiki.net/display/linuxonarm/BeagleBone+Black

So...

Were it a new device, and a driver in development, and I'd had data
sheets, and a working compiler, it probably would have been much less.
The right people to do this work are the  chipmakers writing drivers
before they ever hit the mainline, or places like linaro.org that are
doing tons of arm work.

It's certainly my hope that, now that there's a demonstrable proof of
concept and a measured benefit, TI will retrofit similar code to the
other devices that it makes and do the needed testing. (But I'm not
holding my breath).

So BQL-on-everything is something of a project that needs to get
driven somehow... and getting eyeballs and resources on it, whether
by just doing it or by creating publicity around the need so that
others do it, has a cost.

... and

fixing wifi is going to be a lot harder than fixing usb.




-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article


* Re: [Cerowrt-devel] BQL, txqueue lengths and the internet of things
  2014-06-12 21:46   ` [Cerowrt-devel] " Dave Taht
@ 2014-06-13  3:49     ` Chuck Anderson
  0 siblings, 0 replies; 5+ messages in thread
From: Chuck Anderson @ 2014-06-13  3:49 UTC (permalink / raw)
  To: cerowrt-devel

On Thu, Jun 12, 2014 at 02:46:18PM -0700, Dave Taht wrote:
> > I'd bet the same could be done for
> > raspberry pi and perhaps my other toy the wandboard which has a gigE adapter
> 
> Well, measure first? Many of the low end devices can't achieve gigE
> in the first place, rendering the BQL method somewhat moot.
> 
> If you can provide a pointer to the right driver I can take a look.
> (dmesg | grep eth)

Mine's a Raspberry Pi model B running OpenELEC 4.0.4, an XBMC distro:

# uname -a
Linux OpenELEC 3.14.5 #1 PREEMPT Wed Jun 4 14:03:32 CEST 2014 armv6l GNU/Linux
# dmesg|grep eth
[    2.652186] smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-bcm2708_usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:70:1c:6e
[   18.188931] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[   18.190252] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[   19.720667] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[   19.756150] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

So it looks to be 100meg: it is plugged into a gig capable switch
but comes up at 100.  Confirmed with the hardware specs
here:

http://elinux.org/RPi_Hardware

