[Cerowrt-devel] the agile thread, post-sugarland thoughts, etc

Fri Sep 21 11:26:42 EDT 2012

Getting to where we have a good suite of tests to run regularly,
through dubious devices, against servers that are reasonably local, is
creeping up in priority.

I've been dumping the local tests in my deBloat repo on github, but
they are sadly ad hoc and could use a unified framework.

Another problem we're inducing in the short run: codel accepts all
packets up to its packet limit and only later starts dropping them, so
common tests such as Netalyzr and speedtest (which also use udp
flooding) register a high buffer count (which is true) and consider it
a problem (which it is not). We need to move the world towards a set
of tests that exercise TCP, not udp - and/or test with multiple
streams, much like the bloat.sh script does in the netperf
distribution.
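
A minimal sketch of what such a multi-stream TCP test could look like
(this is not bloat.sh, just an illustration): a few parallel TCP_STREAM
netperf runs plus a ping alongside them to watch latency under load.
SERVER, STREAMS, DUR and the DRY_RUN switch are placeholders I made up;
with DRY_RUN=1 (the default here) it only prints the commands it would
run.

```shell
#!/bin/sh
# Sketch: load a link with several TCP streams and probe latency meanwhile.
# All four settings are illustrative placeholders, not project defaults.
SERVER=${SERVER:-netperf.example.org}   # hypothetical netserver host
STREAMS=${STREAMS:-4}                   # number of parallel TCP streams
DUR=${DUR:-60}                          # test length in seconds
DRY_RUN=${DRY_RUN:-1}                   # 1 = print commands, do not run them

# Print the command in dry-run mode, otherwise start it in the background.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@" &
    fi
}

tcp_load_test() {
    i=0
    while [ "$i" -lt "$STREAMS" ]; do
        run netperf -l "$DUR" -4 -H "$SERVER" -t TCP_STREAM
        i=$((i + 1))
    done
    # One ping per second for roughly the duration of the bulk streams.
    run ping -c "$DUR" "$SERVER"
    # In a real run, wait for the background jobs to finish.
    [ "$DRY_RUN" = 1 ] || wait
}

tcp_load_test
```

Set DRY_RUN=0 and point SERVER at a box running netserver to actually
generate the load; the ping round-trip times during the run are the
number to watch, not the buffer count.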

As for de-crashifying cero...

One of my main tests for abusing wifi has been to fully exercise the
hw queues with netperf.

SERVER=wherever # a box on the other side that can run 12 netserver sessions
DUR=60

# Unquoted list, so the loop runs once per test type (3 x 4 = 12 sessions)
for tests in TCP_MAERTS TCP_STREAM UDP_STREAM
do
netperf -l $DUR -4 -Y CS1,CS1 -H $SERVER -t $tests & # wifi background queue
netperf -l $DUR -4 -Y EF,EF -H $SERVER -t $tests & # wifi VO queue
netperf -l $DUR -4 -Y CS5,CS5 -H $SERVER -t $tests & # wifi VI queue
netperf -l $DUR -4 -H $SERVER -t $tests & # BE queue
done
wait # let all the background netperf runs finish

Back in aug/june this test blew up the ath9k driver... (and you can't
run 12 netservers on the wndr3700v2, not enough ram, period)

I've been able to crash many a non-cerowrt access point with 'em, and
(for example) the linux iwl driver on my laptop doesn't like them
either.

Do have fun with that test on various bits of hardware outside of the
cerowrt and openwrt universe. (openwrt deals with it with a default
txqueuelen of 30, which is sub-optimal) I recently crashed a couple
APs in the real world with that one, notably the Amtrak AP and a
couple in hotels.... You can also get situations where VO and VI
starve BE and/or BK (on some drivers). I'd like to get to where we're
capturing mac addresses.

This test makes it painfully obvious that Linux's default 4 queues of
1000 packets per SSID is way too much on most gear available today.
Getting to where there was 1 queue of 1000 packets delivering stuff to
the 4 queues would be better, but best would be something that was
per-station aware, rate aware, and below the SSID layer closer to the
hardware queues.

Doing that right is a difficult re-layering and refactoring problem.
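
To put rough numbers on the default queue sizes, here is a
back-of-the-envelope sketch using the figures from the message quoted
below (4 queues per SSID, 1000 packets each, roughly 2 Mbytes per full
queue, so about 2 KB per packet). The function names and the 16000 KB
of spare RAM are made up for illustration, not measured on any device.

```shell
#!/bin/sh
# Rough worst-case memory pinned by full wifi transmit queues.
# Assumed figures (taken from this thread, not measured):
QUEUES_PER_SSID=4        # BK, BE, VI, VO
PACKETS_PER_QUEUE=1000   # default queue length
KB_PER_PACKET=2          # ~2 Mbytes per 1000-packet queue

# queue_mem_kb SSIDS -> worst-case queue memory in KB
queue_mem_kb() {
    echo $(( $1 * QUEUES_PER_SSID * PACKETS_PER_QUEUE * KB_PER_PACKET ))
}

# will_overflow SPARE_KB SSIDS -> succeeds if full queues exceed spare RAM
will_overflow() {
    [ "$(queue_mem_kb "$2")" -gt "$1" ]
}

# Hypothetical box with 16000 KB of spare RAM.
SPARE_KB=16000
for ssids in 1 2 3 4; do
    if will_overflow "$SPARE_KB" "$ssids"; then
        echo "$ssids SSIDs: $(queue_mem_kb "$ssids") KB of queue, boom"
    else
        echo "$ssids SSIDs: $(queue_mem_kb "$ssids") KB of queue, fits"
    fi
done
```

Under these assumptions every extra SSID can pin another 8 Mbytes or
so, which is why a single shared queue (or queueing below the SSID
layer) looks so attractive on 32MB devices.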

On Thu, Sep 20, 2012 at 10:04 AM, Sebastian Moeller <moeller0 at gmx.de> wrote:
> Hi Dave,
>
> sugarland really took stability under load to a new level. My typical UDP flooding experiments failed to take the router down even though I opened the flood gates for a full hour (against qos; will repeat against simple-qos once time permits); not even a single report in dmesg on the router. Nice work. Thanks for all the hard work. (If time allows I will try to run a few more stability tests and will report noteworthy results back, if any should show up)
>
> best
>         sebastian
>
> On Sep 19, 2012, at 09:49 , Dave Taht wrote:
>
>> I am enjoying the thread on agile over here: http://esr.ibiblio.org/?p=4564
>>
>> Trying to formalize some stuff that I do instinctively into language
>> more folk grok would be good.
>>
>> One of the better links to come from it was this one:
>>
>> http://blogs.valvesoftware.com/abrash/valve-how-i-got-here-what-its-like-and-what-im-doing-2/
>>
>> This is something like what we've done with the bufferbloat effort -
>> find something worthwhile, start a project to do it. However steam has
>> a revenue model that we thus far lack. It does help to be making
>> something lots of people want, and I suppose the hard problem is
>> making people aware we have something they want.
>>
>> Speaking of that, the 3.6-rc6 kernel I was working on which has most
>> of the cerowrt stuff in it, but for x86 and ubuntu is here:
>> http://snapon.lab.bufferbloat.net/~cero1/deb/
>>
>> and (in trying to lick the memory problems) I've been doing some
>> builds for the 32MB ram nanostation M5 and picostation 2HP, based on
>> the current cerowrt patch sets. With a single SSID I haven't been able
>> to crash the 2HP yet with a variety of traffic. It's easy to calculate
>> however how to crash nearly any access point with extra SSIDs
>>
>> if ((Total spare ram - (4 wireless queues, 1000 packets = 2Mbytes
>> roughly for each = 8Mbytes) * SSIDS) < 0)
>>          boom()
>>
>> This would be improvable with a multi hw queue fq_codel as each
>> hardware queue could share an overall fq_codel queue (factor of 4
>> decrease), however, it seems to make more sense to have the queueing
>> in the mac layer below the SSID abstractions.
>>
>> What's currently in cerowrt is eric dumazet's suggestions to reduce
>> packet allocations under load. The above math was worse before - no
>> matter the packet size, it seemed as though 2k and 4k allocations
>> would be exhausted.
>>
>> ...
>>
>> After I recover from the sprint required to get "sugarland" out the
>> door, I'd like to work on ways to do scrum and sprint-like things
>> (google hangouts?) to spread the knowledge and work around, and to
>> parallelize the effort more.
>> So much work remains. Truly addressing the wireless problem hasn't
>> even started.
>>
>> I have to admit that after doing something like 30 official releases
>> of cerowrt over the last 18 months, I'd really like to hand over the
>> reins to someone else. Worse is
>> after the openwrt unfreeze, new kernels will start to appear, and
>> while working with Linux 3.6 and later would be helpful, I'd rather
>> have stability for a while to work on higher layers of the stack, and
>> get analytical. Doing both "stable" maintenance and trying to move
>> forward on new kernels is a problem...
>>
>> Next up for me is working on qos-scripts, analytical models and tests,
>> and updating my test deployment to
>> this generation of code if all goes well. I just dumped a ton of raw
>> data into the deBloat repo, too. Also have a few patches for the linux
>> and openwrt mainlines to polish...
>>
>> On other fronts, I'm still working the basic funding angles and trying
>> to fix things with amazon. I was encouraged enough by your (thus far
>> failed) attempts at financial help to sink the time I did into
>> sugarland (sugar helped too, I think she needs a job title). If it
>> wasn't for the outpouring of your support, I'd have given up. Thx. I
>> sure hope sugarland is better than -10.
>>
>> There has been an upswing in corporate interest in the last few weeks,
>> I may have some news on that shortly.
>>
>> I had planned originally to get to barcelona for the wireless summit
>> and the linux conference. I may still make the second (issue is in
>> doubt, though). Is anyone besides jg going to this?
>>
>> http://www.wirelesssummit.org/
>>
>> It's near the home of guifi.net which is one of the larger wireless
>> networks I've ever heard of.
>>
>> --
>> Dave Täht
>> http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-26 is out
>> with fq_codel!"
>> _______________________________________________
>> Cerowrt-devel mailing list
>> Cerowrt-devel at lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
>

-- 
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"