* [Bloat] Fwd: performance testing on the WRT1200AC
[not found] ` <CAA93jw5o74-sRKZagJWQBYBbfDcO9h0X+ZHehQCZ17hPVJoodA@mail.gmail.com>
@ 2015-06-14 17:39 ` Dave Taht
2015-06-14 19:43 ` Mikael Abrahamsson
0 siblings, 1 reply; 7+ messages in thread
From: Dave Taht @ 2015-06-14 17:39 UTC (permalink / raw)
To: bloat
A wider audience for the issues in new consumer hardware seems desirable.
Forwarding with permission.
---------- Forwarded message ----------
From: Dave Taht <dave.taht@gmail.com>
Date: Sun, Jun 14, 2015 at 8:41 AM
Subject: Re: performance testing on the WRT1200AC
To: Mikael Abrahamsson <swmike@swm.pp.se>, Aaron Wood <woody77@gmail.com>
Dear Mikael:
netperf-wrapper has been renamed to flent. :) Quite a bit of new stuff
is dropping into it; one of my favorite tests is the new qdisc_stats
test (which I run at the same time as another test). It hasn't been
tested on a multi-queue interface yet (and doesn't work with openwrt's
sh implementation, dang it). But do a pull anyway. :)
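For reference, a typical flent run looks something like this (a sketch
per current flent usage; substitute your own netperf server and title):

  flent rrul -p all_scaled -l 60 -H netperf.example.com \
        -t wrt1200ac-baseline -o wrt1200ac-baseline.png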
On Sun, Jun 14, 2015 at 8:18 AM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
>
> Hi,
>
> I want to do some more demanding testing of the WRT1200AC. Currently it's
> running a few-days-old openwrt CC. It comes with the qdisc settings below. I
> will be testing it using the following setup:
>
> linux-switch-wrt1200ac-linux
>
> All links above are gigabit ethernet links.
>
> My plan is to for instance run netperf-wrapper with a few different tests.
>
> Would it strain the WRT1200AC if I configured it to shape to 900 megabit/s
> bidirectionally? I guess in order to actually achieve a little bit of
My original tests with the 1900AC showed htb (with sqm + offloads)
peaking out at about 550/650mbit on the rrul test. (I can't remember
if nat was on or off, but I think off.)
But that was months ago. I have a huge hope that cake will do better
on this platform, and recently (yesterday) I think I got it to the
point where we could push it to openwrt to be built regularly.
Aaron, cc'd, has done quite a bit of work with the 1900, and I think
he started running into trouble at 200mbit.
> buffering, I'm going to have to run below wirespeed? Because I can't get
> more than 1 gigabit/s of traffic to the wrt1200ac because of the above layout,
> so doing bidirectional shaping to 900 on eth0 (WAN port) would at least give
> it a bit more to do and also give it a chance to induce some buffering?
Ain't it a bitch? A thought would be to also exercise the wifi a bit
to drive it past gigE overall. So have two clients running flent tests
simultaneously, one on wifi, one on ethernet, and there you go,
driving it into overload.
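Concretely, that could look like this (a sketch; run against your own
netperf server, one flent per client):

  # on the ethernet-attached client:
  flent rrul -l 300 -H netperf.example.com -t eth-load
  # at the same time, on the wifi-attached client:
  flent rrul -l 300 -H netperf.example.com -t wifi-load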
> Do you have some other ideas for testing? I am mostly interested in making
> sure the CPU is fast enough to do AQM at gig speeds...
Well, there are other issues.
A) The mvneta ethernet driver in the 1900 did not support BQL when
last I looked, supplying insufficient backpressure to the upper
layers.
B) The multiqueued hardware applies a bit of fq for you automagically.
BUT, even if BQL were in place, BQL's buffering is additive per
hardware queue, so the total buffering tends to add up across queues.
What I saw was nearly no drops in the qdisc. I don't think I even saw
maxpacket grow (a sure sign you are backlogging in the qdisc). I ended
up disabling the hardware mq multiqueue[1] stuff entirely with "tc qdisc
add dev eth0 root fq_codel" (sketch below), and even then, see A) - but
I did finally see maxpacket grow...
C) I realized, to my horror, that they had very aggressively implemented
GRO for everything, giving us 64k "packets" to deal with coming in
from the gigE ethernet... which interacted rather badly with the
10Mbit outgoing interface I had at the time.
That explained why nearly all the QoS systems as deployed in this
generation of routers were doing so badly...
It led to a change in codel's control law (upstream in linux, not
sure about openwrt), and ultimately to frantic activity in cake to
peel apart superpackets like that.
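For the record, the recipe in B) boils down to this, with the -s stats
being where drops and maxpacket show up:

  tc qdisc replace dev eth0 root fq_codel
  tc -s qdisc show dev eth0   # watch drops and maxpacket under load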
I applaud further testing, and I would love it if you could verify
that the GRO problem remains and that it's hard to get sufficient
backpressure (latencies should grow a lot) when driven with wifi +
ethernet.
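One quick way to check whether GRO (or the other offloads) is the
culprit is to toggle them and re-test - assuming the driver honors
the standard ethtool knobs:

  ethtool -k eth0                           # show current offload state
  ethtool -K eth0 gro off gso off tso off   # disable, then re-run the test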
On simple single-threaded up or down tests I was able to get full gigE
throughput out of the 1900's wan interface, but disabling offloads was
quite damaging, as was mixed traffic like rrul_50 up, which makes GRO
far less effective.
I wish I had time to go and add BQL. I requested it of the driver's
author; no response.
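On a driver with BQL wired up, you can watch the per-queue limit adapt
(and cap it) via sysfs - a sketch, assuming a BQL-enabled kernel; on
mvneta the numbers simply never move, since the driver lacks the hooks:

  cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit
  echo 3000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max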
>
> root@OpenWrt:/tmp# tc qdisc
tc -s qdisc show # -s is more revealing
> qdisc mq 0: dev eth0 root
> qdisc fq_codel 0: dev eth0 parent :1 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth0 parent :2 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth0 parent :3 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth0 parent :4 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth0 parent :5 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth0 parent :6 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth0 parent :7 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth0 parent :8 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc mq 0: dev eth1 root
> qdisc fq_codel 0: dev eth1 parent :1 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth1 parent :2 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth1 parent :3 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth1 parent :4 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth1 parent :5 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth1 parent :6 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth1 parent :7 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev eth1 parent :8 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc mq 0: dev wlan0 root
> qdisc fq_codel 0: dev wlan0 parent :1 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev wlan0 parent :2 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev wlan0 parent :3 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
> qdisc fq_codel 0: dev wlan0 parent :4 limit 1024p flows 1024 quantum 300
> target 5.0ms interval 100.0ms ecn
>
>
> --
> Mikael Abrahamsson email: swmike@swm.pp.se
--
Dave Täht
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast
* Re: [Bloat] Fwd: performance testing on the WRT1200AC
2015-06-14 17:39 ` [Bloat] Fwd: performance testing on the WRT1200AC Dave Taht
@ 2015-06-14 19:43 ` Mikael Abrahamsson
2015-06-14 20:16 ` Dave Taht
0 siblings, 1 reply; 7+ messages in thread
From: Mikael Abrahamsson @ 2015-06-14 19:43 UTC (permalink / raw)
To: Dave Taht; +Cc: bloat
Hi,
Some background:
The WRT1900ACv1 (which has been shipping for 6 months or so) is based on
the Marvell Armada XP, which uses a packet processor. There is no support
for this in the generic Linux kernel, which means performance is a lot
lower with the generic kernel than with the "special" kernel, which has
patches and which you compile with the Marvell SDK to get support for
the packet processor. With the generic kernel you get CPU-only
forwarding, which is around 300-500 megabit/s of TCP.
Now, with the WRT1200AC and WRT1900ACv2, which were released in the last
few weeks and are just now becoming more widely available, they've
changed to the Marvell Armada 385, which is the beefiest generic
packet-forwarding CPU I have ever heard of or encountered in a "home
gateway" kind of package. I have a WRT1200AC for testing that I received
this week, and so far I have been able to verify that it does 940
megabit/s of TCP (iperf) with the generic kernel shipped with OpenWRT CC
and the default qdisc below. It seems to do this using approximately 25%
CPU.
So what I would like to do now is try to push it a little bit harder, so
if someone could give me an example of a more punishing qdisc setup and
a test to run through it, that would be very interesting.
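Something along these lines, perhaps - a sketch of htb shaping to 900
megabit/s with fq_codel underneath; the exact parameters are what I'd
like advice on:

  tc qdisc replace dev eth0 root handle 1: htb default 1
  tc class add dev eth0 parent 1: classid 1:1 htb rate 900mbit
  tc qdisc add dev eth0 parent 1:1 fq_codel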
But so far, the Armada 385 chipset (and I hope we'll see more devices
based on it) seems to be a perfect platform for bufferbloat testing and
development. Yes, it's a lot pricier than the WNDR3800 that, for
instance, CeroWrt uses, but on the other hand it seems to have 10x the
performance of that box, and everything seems to work right out of the
box without any special patches.
On Sun, 14 Jun 2015, Dave Taht wrote:
> [Dave's message quoted in full; trimmed - see above]
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: [Bloat] Fwd: performance testing on the WRT1200AC
2015-06-14 19:43 ` Mikael Abrahamsson
@ 2015-06-14 20:16 ` Dave Taht
2015-06-23 12:54 ` Bill Ver Steeg (versteb)
0 siblings, 1 reply; 7+ messages in thread
From: Dave Taht @ 2015-06-14 20:16 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: bloat
On Sun, Jun 14, 2015 at 12:43 PM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> [Mikael's background on the WRT1900ACv1 and WRT1200AC quoted in full;
> trimmed - see his message above]
The price point is just fine ($150). The chipset looks excellent.
However, this time around, I would like to have explicit support (both
financial and physical) from somewhere(s), including the SoC maker and
vendor, to go push the state of the art forward again.
Do you have any contacts at marvell?
With CeroWrt, we shook up the industry. We should all be proud of that!
But... shipping an OS for only one box - one that not only pushes the
state of the art forward, but is reliable, secure, and stable enough
for day-to-day use - is *hard*.
- and it came at a terrible cost to me and several others here.
Karmically, I'm pretty happy, tho.
I could be 10% behind someone else to do another cerowrt-ish thing on
their own time and budget.
If I could hand off most of the things that cost me the most hair
(chasing down bugs, doing builds, qa-ing builds, flashing routers,
updating testbeds, running the build cluster, sysadmining the site,
coming up with tests, giving talks), in favor of staying more high
level and doing more analytics and research... I'd be up for "CeroWrt
v3... the Next Generation... Reference Router Distro". I am sad we
have spent so much time making software rate limiting work better,
when the core algorithms (pie, codel) were designed for variable-rate
environments, and millions of (poorly implemented) wifi-stack-based
devices ship every day - and are still not quite enough for what is
really needed.
So...
Right now the meandering course is to just contribute incremental
improvements back to the linux and openwrt mainlines, continue
standardization efforts and develop better tests, towards a better
network end goal...
...with the far more broadly scoped make-wifi-fast project concept now
making its rounds through various possible funding agencies.
Limited but broad goals, on what budget I have, let me sleep better:
Example: I hope that we get cake building soon for all of openwrt and
then have y'all test it on this platform, and all the others. Being
able to profile it would be nice too. I am pretty sure we can take
better advantage of multicore. Jonathon is funded to work on cake for
a while longer; my work to get it into openwrt, and the testing I've
done, is not.
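Once the module and the tc patches are built, trying it should be a
one-liner - syntax per the current cake tree, so treat this as a sketch:

  tc qdisc replace dev eth0 root cake bandwidth 900mbit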
It would be nice to find someone to sink a week into adding BQL to the
mvneta driver, and profiling it.
I would like to see BQL behave better on hardware multiqueue.
I'd like someone to poke deeply into the wifi chipset this box uses to
see if we can apply ideas from make-wifi-fast to it.
It would be great to have source access to the blobs.
I am always glad we have such a diverse range of people here,
scratching the beat-the-bufferbloat itch, and making a difference
wherever they can, with what abilities and spare time they have.
I would like to see what happened for OCP happen for embedded edge
devices, also.
http://www.businessinsider.com/facebook-open-compute-project-history-2015-6
From the bottom, looking up, that seems hard.
> [earlier thread quoted in full; trimmed]
--
Dave Täht
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast
* Re: [Bloat] Fwd: performance testing on the WRT1200AC
2015-06-14 20:16 ` Dave Taht
@ 2015-06-23 12:54 ` Bill Ver Steeg (versteb)
2015-06-23 13:54 ` Dave Taht
0 siblings, 1 reply; 7+ messages in thread
From: Bill Ver Steeg (versteb) @ 2015-06-23 12:54 UTC (permalink / raw)
To: Dave Taht, Mikael Abrahamsson; +Cc: bloat
Regarding getting AQM into the fast path on edge router silicon: if there are any chipset vendors monitoring this list, drop me a note and I can help get you up to speed on the various algorithms. For several of our lower-end products, we are dependent on the drivers / fast path provided by the various merchant silicon vendors. I have reached out through our normal channels, but have not been able to close the loop yet.
Bill VerSteeg
(wearing my Cisco product engineering hat, as opposed to the other hats I occasionally wear)
-----Original Message-----
From: bloat-bounces@lists.bufferbloat.net [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Dave Taht
Sent: Sunday, June 14, 2015 9:16 PM
To: Mikael Abrahamsson
Cc: bloat
Subject: Re: [Bloat] Fwd: performance testing on the WRT1200AC
[quoted message trimmed - see Dave's reply above]
* Re: [Bloat] Fwd: performance testing on the WRT1200AC
2015-06-23 12:54 ` Bill Ver Steeg (versteb)
@ 2015-06-23 13:54 ` Dave Taht
2015-06-23 14:03 ` Bill Ver Steeg (versteb)
0 siblings, 1 reply; 7+ messages in thread
From: Dave Taht @ 2015-06-23 13:54 UTC (permalink / raw)
To: Bill Ver Steeg (versteb); +Cc: bloat
On Tue, Jun 23, 2015 at 5:54 AM, Bill Ver Steeg (versteb)
<versteb@cisco.com> wrote:
> Regarding getting AQM into the fast path on edge router silicon: if there are any chipset vendors monitoring this list, drop me a note and I can help get you up to speed on the various algorithms. [...]
There is some work going into SDN- and FPGA-capable implementations of
this stuff at meshsr as part of the onenetswitch30 project, which will
in the end apply to netfpga also, and ultimately be available as VHDL
or verilog for silicon for SoCs.
You can contact me or meshsr's folk (huc at ieee.org) for more
details. The board booted linux 3.13 recently, and ubuntu 14.04. With
sufficient funding we'll be done with the ethernet and switch
redesign by December.
I note I would rather like cisco and the other ethernet card makers
to add BQL support to their Linux drivers, as these fq and aqm
algorithms are lightweight enough to implement in software, rather
than hardware, so long as there is sufficient backpressure from the
hardware.
Current BQL support remains limited, and it's typically only 7 lines
of new code, plus some hairy testing for edge cases.
http://www.bufferbloat.net/projects/bloat/wiki/BQL_enabled_drivers
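For the curious, those 7 lines are essentially calls to
netdev_tx_sent_queue()/netdev_tx_completed_queue() in a driver's tx and
tx-completion paths. A quick, rough way to list the in-tree drivers that
have them:

  grep -rl netdev_tx_completed_queue drivers/net/ethernet/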
>
>
> Bill VerSteeg
> (wearing my Cisco product engineering hat, as opposed to the other hats I occasionally wear)
>
>
> -----Original Message-----
> From: bloat-bounces@lists.bufferbloat.net [mailto:bloat-bounces@lists.bufferbloat.net] On Behalf Of Dave Taht
> Sent: Sunday, June 14, 2015 9:16 PM
> To: Mikael Abrahamsson
> Cc: bloat
> Subject: Re: [Bloat] Fwd: performance testing on the WRT1200AC
>
> On Sun, Jun 14, 2015 at 12:43 PM, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
>>
>> Hi,
>>
>> Some background:
>>
>> The WRT1900ACv1 (which has been shipping for 6 months or so) is based
>> on Marvell Armada XP, which uses a packet processor. There is no
>> support for this in the generic Linux kernel, which means performance
>> is a lot lower with the generic kernel compared to the "special"
>> kernel which has patches and where you use the Marvell SDK to compile
>> it to support the packet processor. With the generic kernel, you get
>> CPU only based forwarding which is around 300-500 megabit/s of TCP.
>>
>> Now, with WRT1200AC and WRT1900ACv2 which was released in the last few
>> weeks or so and just now becoming more widely available, they've
>> changed to Marvell Armada 385 which is the beefiest packet forwarding
>> generic CPU I have ever heard of or encountered in a "home gateway"
>> kind of package. I have an WRT1200AC for testing I received this week,
>> and so far I have been able to verify that it does 940 megabit/s of
>> TCP (iperf) with the generic kernel shipped with OpenWRT CC with the
>> below default qdisc. It seems to do this using approximately 25% CPU.
>>
>> So what I would like to do now is try to push it a little bit harder,
>> so if someone could give me an example of a more punishing qdisc setup
>> and test to run through it, that would be very interesting.
>>
>> But so far, the Armada 385 chipset (and I hope we'll see more devices
>> based on it) seems to be a perfect platform for bufferbload testing
>> and development. Yes, it's a lot pricier than the WNDR3800 that for
>> instance CeroWRT uses, but on the other hand, it seems to have 10x the
>> performance of that box, and everything seems to work right out of the
>> box without any special patches.
>
> The pricepoint is just fine (150 dollars). The chipset looks excellent.
>
> However, this time around, I would like to have explicit support (both financial and physical) from somewhere(s), including the SoC maker and vendor, to go push the state of the art forward again.
>
> Do you have any contacts at marvell?
>
> With CeroWrt, we shook up the industry. We should all be proud of that!
>
> But... shipping an OS for only one box - that not only pushes the state of the art forward, but is reliable, secure, and stable enough for day to day use is *hard*.
>
> - and it came at a terrible cost to me and several others here.
> Karmically, I'm pretty happy, tho.
>
> I could be 10% behind someone else to do another cerowrt-ish thing on their own time and budget.
>
> If I could hand off most of the things that cost me the most hair (chasing down bugs, doing builds, qa-ing builds, flashing routers, updating testbeds, running the build cluster, sysadmining the site, coming up with tests, giving talks), in favor of staying more high level, and doing more analytics and research... I'd be up for "CeroWrt v3... the Next Generation... Reference Router Distro". I am sad we have spent so much time making software rate limiting work better, when the core algorithms (pie, codel) were designed for variable rate environments, and millions (poorly implemented) wifi stack based devices ship every day - and are still not quite enough for what is really needed.
>
> So...
>
> Right now the meandering course is to just contribute incremental improvements back to the linux and openwrt mainlines, continue standardization efforts and develop better tests, towards a better network end goal...
>
> ...with the far more broadly scoped make-wifi-fast project concept now making it's rounds through various possible funding agencies.
>
> Limited, but broad goals on what budget I have let me sleep better:
>
> Example: I hope that we get cake building soon for all of openwrt and then having y'all test it on this platform, and all the others. Being able to profile it would be nice too. I am pretty sure we can take better advantage of multicore. Jonathon is funded for cake for a while longer. My work to get it into openwrt, and the testing I've done, is not.
>
> It would be nice to find someone to sink a week into adding BQL to the mvneta driver, and profiling it.
>
> I would like to see BQL behave better on hardware multiqueue.
>
> I'd like someone to poke deeply into the wifi chipset this box uses to see if we can apply ideas from make-wifi-fast to it.
>
> It would be great to have source access to the blobs.
>
> I am always glad we have such a diverse range of people here, scratching the beat-the-bufferbloat itch, and making a difference wherever they can, with what abilities and spare time they have.
>
> I would like to see what happened for OCP happen for embedded edge devices, also.
>
> http://www.businessinsider.com/facebook-open-compute-project-history-2015-6
>
> From the bottom, looking up, that seems hard.
>
>> On Sun, 14 Jun 2015, Dave Taht wrote:
>>
>>> a wider audience for the issues in new consumer hardware seems desirable.
>>>
>>> forwarding with permission.
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: Dave Taht <dave.taht@gmail.com>
>>> Date: Sun, Jun 14, 2015 at 8:41 AM
>>> Subject: Re: performance testing on the WRT1200AC
>>> To: Mikael Abrahamsson <swmike@swm.pp.se>, Aaron Wood
>>> <woody77@gmail.com>
>>>
>>>
>>> Dear Mikael:
>>>
>>> netperf-wrapper has been renamed to flent. :) Quite a bit of new
>>> stuff is dropping into it, one of my favorite tests is the new
>>> qdisc_stats test (which I run at the same time as another test). It
>>> hasn't been tested on a multi-queue interface (and doesn't work with
>>> openwrt's sh implementation dang it). But do a pull anyway. :)
>>>
>>> On Sun, Jun 14, 2015 at 8:18 AM, Mikael Abrahamsson
>>> <swmike@swm.pp.se>
>>> wrote:
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I want to do some more demanding testing of the WRT1200AC. Currently
>>>> it's running a few days old openwrt CC. It comes with the below qdisc setting.
>>>> I
>>>> will be testing it using the following setup:
>>>>
>>>> linux-switch-wrt1200ac-linux
>>>>
>>>> All links above are gigabit ethernet links.
>>>>
>>>> My plan is to for instance run netperf-wrapper with a few different
>>>> tests.
>>>>
>>>> Would it strain the WRT1200AC if I configured it to shape to 900
>>>> megabit/s bidirectionallty? I guess in order to actually achieve a
>>>> little bit of
>>>
>>>
>>> My original tests with the 1900AC showed htb peaking out with sqm +
>>> offloads at about 550/650mbit on the rrul test. (I can't remember if
>>> nat was on or off, but I think off)
>>>
>>> but that was months ago. I have a huge hope that cake will do better
>>> on this platform and recently (yesterday) I think got that to the
>>> point where we could push it to openwrt to be built regularly.
>>>
>>> Aaron, cc'd, has done quite a bit of work with the 1900, and I think
>>> he started running into trouble at 200mbit.
>>>
>>>> buffering, I'm going to have to run below wirespeed? Because I can't
>>>> get more than 1 gigabit/s of traffic to the wrt1200ac because of
>>>> above layout, so doing bidirectional shaping to 900 on eth0 (WAN
>>>> PORT) would at least give it a bit more to do and also give a chance
>>>> to induce some buffering?
>>>
>>>
>>> Ain't it a bitch? A thought would be to also exercise the wifi a bit
>>> to drive it past gigE overall. So have two clients running flent
>>> tests simultaneously, one on wifi, one on ethernet, and there you go,
>>> driving it into overload.
>>>
>>>> Do you have some other ideas for testing? I am mostly interested in
>>>> making sure the CPU is fast enough to do AQM at gig speeds...
>>>
>>>
>>> Well, there are other issues.
>>>
>>> A) The mvneta ethernet driver in the 1900 did not support BQL when
>>> last I looked, supplying insufficient backpressure to the upper
>>> layers.
>>>
>>> B) The multiqueued hardware applies a bit of fq for you automagically,
>>> BUT, even if BQL was in place, BQL's buffering is additive per
>>> hardware queue, so it tends to
>>>
>>> what I saw was nearly no drops in the qdisc. I don't think I even saw
>>> maxpacket grow (a sure sign you are backlogging in the qdisc) I ended
>>> up disabling the hardware mq multiqueue[1] stuff entirely by "tc qdisc
>>> add dev eth0 root fq_codel", and even then, see A) - but I did finally
>>> see maxpacket grow...
>>>
>>> C) to realize to my horror that they had very aggressively implemented
>>> GRO for everything, giving us 64k "packets" to deal with coming in
>>> from the gigE ethernet... which interacted rather badly with the
>>> 10Mbit outgoing interface I had at the time.
>>>
>>> and that explained why nearly all the QoS systems as deployed in this
>>> generation of router were doing so badly...
>>>
>>> which led to a change in codel's control law (upstream in linux, not
>>> sure in openwrt), and ultimately frantic activity in cake to do
>>> peeling apart of superpackets like that.
>>>
>>> I applaud further testing, and I would love it if you could verify
>>> that the GRO problem remains and that it's hard to get sufficient
>>> backpressure (and latencies should grow a lot) when driven with wifi+
>>> ethernet
>>>
>>> On simple single threaded up or down tests I was able to get full gigE
>>> throughput out of the 1900's wan interface, but disabling offloads was
>>> quite damaging, as was mixed traffic like rrul_50 up, which makes GRO
>>> far less effective.
>>>
>>> I wish I had time to go and add BQL. I requested it of the author, no
>>> response.
>>>
>>>>
>>>> root@OpenWrt:/tmp# tc qdisc
>>>
>>>
>>> tc -s qdisc show # -s is more revealing
>>>
>>>> qdisc mq 0: dev eth0 root
>>>> qdisc fq_codel 0: dev eth0 parent :1 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth0 parent :2 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth0 parent :3 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth0 parent :4 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth0 parent :5 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth0 parent :6 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth0 parent :7 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth0 parent :8 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc mq 0: dev eth1 root
>>>> qdisc fq_codel 0: dev eth1 parent :1 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth1 parent :2 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth1 parent :3 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth1 parent :4 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth1 parent :5 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth1 parent :6 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth1 parent :7 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev eth1 parent :8 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc mq 0: dev wlan0 root
>>>> qdisc fq_codel 0: dev wlan0 parent :1 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev wlan0 parent :2 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev wlan0 parent :3 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>> qdisc fq_codel 0: dev wlan0 parent :4 limit 1024p flows 1024 quantum 300
>>>> target 5.0ms interval 100.0ms ecn
>>>>
>>>>
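
(For the record, that override plus a quick way to watch the fq_codel
stats afterwards - "replace" is safer than "add" when a root qdisc is
already installed:

  tc qdisc replace dev eth0 root fq_codel                        # one fq_codel instead of mq
  tc -s qdisc show dev eth0 | egrep 'maxpacket|dropped|backlog'

A growing maxpacket and nonzero drops mean the qdisc, not the driver
rings, is finally holding the queue.)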
>>>> --
>>>> Mikael Abrahamsson email: swmike@swm.pp.se
>>>
>>>
>>>
>>>
>>> --
>>> Dave Täht
>>> What will it take to vastly improve wifi for everyone?
>>> https://plus.google.com/u/0/explore/makewififast
>>
>>
>> --
>> Mikael Abrahamsson email: swmike@swm.pp.se
>
>
>
> --
> Dave Täht
> What will it take to vastly improve wifi for everyone?
> https://plus.google.com/u/0/explore/makewififast
--
Dave Täht
worldwide bufferbloat report:
http://www.dslreports.com/speedtest/results/bufferbloat
And:
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast
* Re: [Bloat] Fwd: performance testing on the WRT1200AC
2015-06-23 13:54 ` Dave Taht
@ 2015-06-23 14:03 ` Bill Ver Steeg (versteb)
2015-06-23 14:35 ` Dave Taht
0 siblings, 1 reply; 7+ messages in thread
From: Bill Ver Steeg (versteb) @ 2015-06-23 14:03 UTC (permalink / raw)
To: Dave Taht; +Cc: bloat
Dave-
I was actually thinking of Broadcom/Intel/Entropic and the like. I suspect that these folks handle the packet processing logic (in the fast path) on a large number of commercially deployed cable/DSL modems - particularly the commodity devices found in most SP deployments today.
Bvs
-----Original Message-----
From: Dave Taht [mailto:dave.taht@gmail.com]
Sent: Tuesday, June 23, 2015 9:54 AM
To: Bill Ver Steeg (versteb)
Cc: Mikael Abrahamsson; bloat
Subject: Re: [Bloat] Fwd: performance testing on the WRT1200AC
On Tue, Jun 23, 2015 at 5:54 AM, Bill Ver Steeg (versteb) <versteb@cisco.com> wrote:
> Regarding getting AQM into fast-path on edge router silicon - if there are any chipset vendors monitoring this list, drop me a note and I can help get you up to speed on the various algorithms. For several of our lower end products, we are dependent on the drivers / fast path provided by the various merchant silicon vendors... I have reached out through our normal channels, but have not been able to close the loop yet.
There is some work going into SDN- and FPGA-capable implementations of this stuff at meshsr as part of the onenetswitch30 project, which will in the end apply to netfpga also, and ultimately be available as VHDL or Verilog for silicon SoCs.
You can contact me or meshsr's folk (huc at ieee.org) for more details. The board booted linux 3.13 recently, and ubuntu 14.04. With sufficient funding we'll be done with the ethernet and switch re-design by December.
I note I would rather like cisco and other ethernet card driver makers to gain BQL support in Linux, as these fq and aqm algorithms are lightweight enough to implement in software rather than hardware, so long as there is sufficient backpressure from the hardware.
Current BQL support remains limited, and it's typically only 7 lines of new code, plus some hairy testing for edge cases.
http://www.bufferbloat.net/projects/bloat/wiki/BQL_enabled_drivers
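The sysfs nodes are present whenever the kernel is built with BQL;
whether a given driver actually feeds them shows up as a nonzero
inflight under load. Interface name illustrative:

  cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/inflight  # stays 0 if the driver never reports completions to BQL
  cat /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit     # current self-tuned byte limit

So a quick saturating transfer tells you whether a driver belongs on
that list.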
>
> [quoted thread history trimmed; the messages appear in full earlier in this archive]
--
Dave Täht
worldwide bufferbloat report:
http://www.dslreports.com/speedtest/results/bufferbloat
And:
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast
* Re: [Bloat] Fwd: performance testing on the WRT1200AC
2015-06-23 14:03 ` Bill Ver Steeg (versteb)
@ 2015-06-23 14:35 ` Dave Taht
0 siblings, 0 replies; 7+ messages in thread
From: Dave Taht @ 2015-06-23 14:35 UTC (permalink / raw)
To: Bill Ver Steeg (versteb); +Cc: bloat
On Tue, Jun 23, 2015 at 7:03 AM, Bill Ver Steeg (versteb)
<versteb@cisco.com> wrote:
> Dave-
>
> I was actually thinking of Broadcom/Intel/Entropic and the like.
I would certainly love to know if anyone at those companies was paying
attention.
> I suspect that these folks handle the packet processing logic (in the fast path) on a large number of commercially deployed cable/DSL modems - particularly the commodity devices found in most SP deployments today.
Well, in my mind the biggest problems that need to get solved in
silicon are on the other side of the edge - the CMTSes, DSLAMs,
BRASes, and eNodeBs.
We are almost (but not quite) solving the outbound ethernet problems
on CPE in pure software. Getting fq-codel + BQL on the ac1900 would
take care of a huge swath of the present and projected edge, for
example, at speeds up to a gigabit; speeds a fifth of that are already
handled well by gear now well below the 50 dollar range.
http://community.ubnt.com/t5/EdgeMAX/FQ-Codel-bandwidth-limit-on-EdgeRouter-PRO/td-p/1264300
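
For concreteness, the pure-software recipe under discussion is roughly
what sqm-scripts sets up; a minimal hand-rolled egress sketch, with
rate and device purely illustrative:

  tc qdisc replace dev eth0 root handle 1: htb default 10
  tc class add dev eth0 parent 1: classid 1:10 htb rate 900mbit   # shape below line rate to own the queue
  tc qdisc add dev eth0 parent 1:10 fq_codel                      # AQM + flow queueing under the shaper

Shaping slightly below line rate moves the standing queue out of the
driver and into fq_codel's control, which is exactly why it costs CPU.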
Sure, having better hardware, ethernet cards (tx rings must die!
offloads must be mitigated!), and switches on the low end would be
great, and I hope work is happening there too - but I mostly figured
we'd see early innovation in chips like the Octeon and Tilera, in high
end stuff like what we are seeing Mellanox do... and in FPGAs (as
meshsr is doing), and only gradually see it enter the cheap SoC market.
There are still things going on (like ECN) that are unsettled, and I
would like to see more uptake (like a product-wide rollout at Apple)
before burning the handling of that kind of stuff into silicon.
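
(The endpoint side of that is a one-line sysctl, and fq_codel can be
told to mark instead of drop - the usual knobs, nothing exotic, and
the OpenWrt defaults shown earlier already have ecn on:

  sysctl -w net.ipv4.tcp_ecn=1                 # negotiate ECN on outgoing connections, not just accept it
  tc qdisc change dev eth0 root fq_codel ecn   # where fq_codel is the root qdisc

ECT-marked flows then see CE marks instead of drops.)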
and then there's.... sigh... wifi... where all the bloat moves once
you crack 35mbit.
>
> Bvs
>
> [quoted thread history trimmed; the messages appear in full earlier in this archive]
--
Dave Täht
worldwide bufferbloat report:
http://www.dslreports.com/speedtest/results/bufferbloat
And:
What will it take to vastly improve wifi for everyone?
https://plus.google.com/u/0/explore/makewififast
end of thread, other threads:[~2015-06-23 14:35 UTC | newest]
Thread overview: 7+ messages
[not found] <alpine.DEB.2.02.1506141711300.9487@uplift.swm.pp.se>
[not found] ` <CAA93jw5o74-sRKZagJWQBYBbfDcO9h0X+ZHehQCZ17hPVJoodA@mail.gmail.com>
2015-06-14 17:39 ` [Bloat] Fwd: performance testing on the WRT1200AC Dave Taht
2015-06-14 19:43 ` Mikael Abrahamsson
2015-06-14 20:16 ` Dave Taht
2015-06-23 12:54 ` Bill Ver Steeg (versteb)
2015-06-23 13:54 ` Dave Taht
2015-06-23 14:03 ` Bill Ver Steeg (versteb)
2015-06-23 14:35 ` Dave Taht