* [Cerowrt-devel] BQL, txqueue lengths and the internet of things
From: Dave Taht @ 2014-06-11 22:49 UTC
To: bloat, cerowrt-devel

The bloat problem and its solutions are not limited to routers; hosts
need fixing too.

Nearly every low-end board I've seen out there forgoes a gigE ethernet
interface in favor of a lower-power, lower-cost 100mbit interface.

No distro I've seen lowers the pfifo txqueuelen from the current
1000-packet default to a more reasonable 100 packets in that case.
And, while many ethernet devices in this category are hooked up via
usb (and are currently hard to add BQL support to), some are not, and
byte queue limit support can easily be added to those.

Sadly, byte queue limits (BQL) are only implemented in a handful of
top-end ethernet drivers (about 10, last I looked).

I needed a break from big problems, so a couple of late nights later I
have a very small patch adding BQL support to the beaglebone black:

http://snapon.lab.bufferbloat.net/~d/0001-Add-BQL-support-to-cpsw-beaglebone-driver.patch

And the results were quite pleasing at 100mbit. BQL holds things down
to two full-size packets in the tx ring, and we see an enormous
improvement in bidirectional throughput, jitter, and latency.

http://snapon.lab.bufferbloat.net/~d/beagle_bql/bql_makes_a_difference.png
http://snapon.lab.bufferbloat.net/~d/beagle_bql/beaglebonewins.png

The default linux behavior (pfifo_fast, txqueuelen 1000) prior to this
patch looked pretty awful:

http://snapon.lab.bufferbloat.net/~d/beagle_nobql/pfifo_nobql_tsq3028txqueue1000.svg

and went to looking like this:

http://snapon.lab.bufferbloat.net/~d/beagle_bql/pfifo_bql_tsq3028txqueue1000.svg

And adding the new fq scheduler looked like this:

http://snapon.lab.bufferbloat.net/~d/beagle_bql/fq_bql_tsq3028.svg

(fq_codel was similar)

The fact that we don't achieve full upload throughput on this last
test is probably due to having a tail-dropping switch in the way,
and/or some dma dequeuing cleanup conflicts between the low-level
transmit and receive queues on this device (they share an interrupt
AND use napi, which seems puzzling).

But any day I can get a 4-10x improvement in latency and throughput is
a good day. One IoT device down, thousands to go. It would be nice if
the chipmakers were incorporating BQL into boxes destined for the
internet of things.

-- 
Dave Täht
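To put the queue sizes from the message above into time units, here is a
minimal sketch that converts a packet count into the worst-case time a
full queue takes to drain. It assumes (this is not stated in the thread)
that every packet is a full-size 1514-byte Ethernet frame and it ignores
preamble, FCS and inter-frame gap; `drain_time_ms` is a name made up for
this illustration. Note that txqueuelen sizes the qdisc while BQL limits
the driver's tx ring, so the numbers are only comparable as orders of
magnitude.

```python
# Worst-case time for a full transmit queue to drain at a given link rate.
# Assumption (not from the thread): every packet is a full-size 1514-byte
# Ethernet frame; preamble, FCS and inter-frame gap are ignored.

FRAME_BYTES = 1514  # 1500-byte MTU + 14-byte Ethernet header


def drain_time_ms(packets: int, link_bps: float,
                  frame_bytes: int = FRAME_BYTES) -> float:
    """Milliseconds needed to transmit `packets` queued frames at `link_bps`."""
    return packets * frame_bytes * 8 / link_bps * 1000.0


if __name__ == "__main__":
    cases = [("txqueuelen 1000", 1000),
             ("txqueuelen 100", 100),
             ("2 packets in ring", 2)]
    rates = [("100Mbit", 100e6), ("1Gbit", 1e9)]
    for label, packets in cases:
        for rate_name, bps in rates:
            ms = drain_time_ms(packets, bps)
            print(f"{label:>18} @ {rate_name:>7}: {ms:8.3f} ms")
```

Under these assumptions a full 1000-packet queue is about 121 ms of
delay at 100mbit, 100 packets is about 12 ms, and two packets is roughly
a quarter of a millisecond.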
* Re: [Cerowrt-devel] BQL, txqueue lengths and the internet of things
From: David P. Reed @ 2014-06-12 1:05 UTC
To: Dave Taht, bloat, cerowrt-devel

Maybe you can do a quick blog howto? I'd bet the same could be done
for the Raspberry Pi, and perhaps for my other toy, the Wandboard,
which has a gigE adapter and SCSI, making it a nice iSCSI target or
NFS server.

Debloating the world... one step at a time.
* Re: [Cerowrt-devel] [Bloat] BQL, txqueue lengths and the internet of things
From: Jonathan Morton @ 2014-06-12 1:57 UTC
To: David P. Reed; +Cc: cerowrt-devel, bloat

On 12 Jun, 2014, at 4:05 am, David P. Reed wrote:

> Maybe you can do a quick blog howto? I'd bet the same could be done
> for raspberry pi and perhaps my other toy the wandboard which has a
> gigE adapter and Scsi making it a nice iscsi target or nfs server.

FYI, the Raspberry Pi's built-in Ethernet is attached via USB. It's a
chip that also includes a USB hub, which is why the cheaper model
which drops Ethernet also loses a USB port.

 - Jonathan Morton
* Re: [Cerowrt-devel] BQL, txqueue lengths and the internet of things
From: Dave Taht @ 2014-06-12 21:46 UTC
To: David P. Reed; +Cc: cerowrt-devel, bloat

On Wed, Jun 11, 2014 at 6:05 PM, David P. Reed <dpreed@reed.com> wrote:
> Maybe you can do a quick blog howto?

I am thinking of a writeup, yes. But I am low on time this month.

I'd like to target a publication outside the normal bufferbloat
community. The folk working on beaglebone stuff DO tend to care about
latency and realtime behavior a lot, and are willing to resort to
programming the PRU to do robotic pwm projects and projects like

http://www.nycresistor.com/2013/09/12/octoscroller/

... so it would be my hope that community would "get bufferbloat" at a
whole different level than we do.

In my case I have a longstanding interest in reliably transporting
real-time audio, which usually has latency constraints below 2ms, no
matter what. The BQL patch with sch_fq makes the difference between
success or failure for that.

Hmm. Maybe AES.

> I'd bet the same could be done for
> raspberry pi and perhaps my other toy the wandboard which has a gigE adapter

Well, measure first? Many of the low-end devices can't achieve gigE in
the first place, rendering the BQL method somewhat moot.

If you can provide a pointer to the right driver I can take a look.
(dmesg | grep eth)

I've long planned on doing a BQL'd driver for the zedboard/zynq and
parallella. I've looked over that code thoroughly; all I need is a
board to test and spare time. (In fact, for the latter, I'd like to
take a stab at writing better ethernet hardware.)

> and Scsi making it a nice iscsi target or nfs server.

We took an initial stab at BQL'ing the usbnet driver about a year
back. There were too many possible error-out conditions to get a
proper accounting (at the time, anyway). Fixing this would not only
make the Pi better; hundreds of usb ethernet devices exist, and the
vast majority of lte devices are hooked up via this driver.

The numbers I just got for usb latency on that were for 100+mbit
operation, and the amount of buffering is fixed...

https://plus.google.com/u/0/107942175615993706558/posts/Cpd76KHUbpp

> De bloating the world... One step at a time.

Adding BQL is easy, and most of the work can be done with code
inspection. The huge problem is that you absolutely have to be able to
test the device and driver under a variety of circumstances after you
patch it in.

(What took me the longest was actually finding a correct toolchain for
kernel builds, and then finding a bug in dma padding that I'd missed
until a whole bunch of printks later, when enlightenment dawned.)

I estimate that doing the beaglebone BQL patch cost me 30 hrs of time,
or about 6k at my current billing rate. (While the 160,000+ present
users of the beaglebone might get back 20ms under load on a regular
basis, that's 30 hours of my life I'll never have back.)

I figure getting the patch accepted in mainline will take another 10,
and for it to flow out to the beaglebone userbase in less than a year
I'd have to convince the maintainers to incorporate it out of tree.

The new rev C beaglebone only ships with 3.8.x as its default kernel,
although modern kernels are readily available via Robert Nelson's work
( https://rcn-ee.net/deb/wheezy-armhf/ ), and building your own is
more or less correctly documented here:

http://eewiki.net/display/linuxonarm/BeagleBone+Black

So... were it a new device, with a driver in development, and had I
had data sheets and a working compiler, it probably would have cost
much less. The right people to do this work are the chipmakers writing
drivers before they ever hit the mainline, or places like linaro.org
that are doing tons of arm work.
It's certainly my hope that, now that there's a demonstrable proof of
concept and of the benefit, TI will retrofit similar code to the other
devices that it makes and do the needed testing. (But I'm not holding
my breath.)

So BQL-on-everything is something of a project that needs to get
driven somehow... and whether getting eyeballs and resources on it is
better done by just doing it, or by creating publicity around the need
for it so others do it, either way has a cost.

... and fixing wifi is going to be a lot harder than fixing usb.

-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article
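The reply above stresses that a BQL patch has to be tested on the real
device after it goes in. One way to watch what BQL is doing is to read
the per-queue counters that kernels built with BQL expose under sysfs,
in /sys/class/net/<dev>/queues/tx-<n>/byte_queue_limits/ (files such as
limit, limit_min, limit_max, inflight and hold_time). The sketch below
is only an illustration: `read_bql` is a hypothetical helper, the set of
files present varies by kernel version, and a driver without BQL
support will simply show counters that never move.

```python
from pathlib import Path


def read_bql(dev: str, sysfs_root: str = "/sys/class/net") -> dict:
    """Return {tx_queue: {file_name: int}} from each queue's byte_queue_limits.

    Returns an empty dict when the device or the BQL directories are absent.
    """
    queues = Path(sysfs_root) / dev / "queues"
    if not queues.is_dir():
        return {}
    result = {}
    for bql_dir in sorted(queues.glob("tx-*/byte_queue_limits")):
        values = {}
        for entry in sorted(bql_dir.iterdir()):
            try:
                values[entry.name] = int(entry.read_text().strip())
            except (OSError, ValueError):
                # Skip unreadable or non-numeric files rather than guess.
                continue
        result[bql_dir.parent.name] = values
    return result


if __name__ == "__main__":
    # "eth0" is just an example interface name.
    for queue, values in read_bql("eth0").items():
        print(queue, values)
```

Polling `inflight` and `limit` while a load test runs gives a rough
view of how many bytes the driver is holding in its tx ring at any
moment.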
* Re: [Cerowrt-devel] BQL, txqueue lengths and the internet of things
From: Chuck Anderson @ 2014-06-13 3:49 UTC
To: cerowrt-devel

On Thu, Jun 12, 2014 at 02:46:18PM -0700, Dave Taht wrote:
> > I'd bet the same could be done for
> > raspberry pi and perhaps my other toy the wandboard which has a gigE adapter
>
> Well, measure first? Many of the low end devices can't achieve gigE
> in the first place, rendering the BQL method somewhat moot.
>
> If you can provide a pointer to the right driver I can take a look.
> (dmesg | grep eth)

Mine's a Raspberry Pi model B running OpenELEC 4.0.4, an XBMC distro:

# uname -a
Linux OpenELEC 3.14.5 #1 PREEMPT Wed Jun 4 14:03:32 CEST 2014 armv6l GNU/Linux
# dmesg|grep eth
[ 2.652186] smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-bcm2708_usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:70:1c:6e
[ 18.188931] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[ 18.190252] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 19.720667] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[ 19.756150] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready

So it looks to be 100meg: it is plugged into a gig-capable switch but
comes up at 100. Confirmed with the hardware specs here:

http://elinux.org/RPi_Hardware
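The `dmesg | grep eth` output above already answers the three questions
asked earlier in the thread: which driver, which bus, and what link
speed. As a rough sketch, the same facts can be pulled out with two
regular expressions. The patterns are fitted only to the smsc95xx lines
pasted above; other drivers log in different formats, and treating a
bus string that starts with "usb-" as "attached via USB" is an
assumption made for this example. `summarize` is a made-up helper name.

```python
import re

# Patterns fitted to the smsc95xx log lines quoted in this thread only.
REGISTER_RE = re.compile(
    r"\]\s+(?P<driver>\S+)\s+\S+\s+(?P<iface>\w+): "
    r"register '[^']+' at (?P<bus>[^,\s]+),"
)
LINK_RE = re.compile(
    r"\s(?P<iface>\w+): link up, (?P<speed>\d+)Mbps, (?P<duplex>\w+)-duplex"
)

SAMPLE = """\
[ 2.652186] smsc95xx 1-1.1:1.0 eth0: register 'smsc95xx' at usb-bcm2708_usb-1.1, smsc95xx USB 2.0 Ethernet, b8:27:eb:70:1c:6e
[ 18.188931] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
[ 18.190252] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 19.720667] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[ 19.756150] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
"""


def summarize(dmesg_text: str) -> dict:
    """Map interface name -> driver, USB attachment guess, speed and duplex."""
    info = {}
    for line in dmesg_text.splitlines():
        match = REGISTER_RE.search(line)
        if match:
            entry = info.setdefault(match["iface"], {})
            entry["driver"] = match["driver"]
            # Assumption: a bus path beginning with "usb-" means USB-attached.
            entry["usb"] = match["bus"].startswith("usb-")
            continue
        match = LINK_RE.search(line)
        if match:
            entry = info.setdefault(match["iface"], {})
            entry["speed_mbps"] = int(match["speed"])
            entry["duplex"] = match["duplex"]
    return info


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the sample it reports eth0 as driven by smsc95xx, USB-attached, and
linked at 100 Mbps full duplex, which matches the reading given in the
message.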