some (less than funny) interactions with networking on openwrt (and other systems)

Dave Taht dave.taht at gmail.com
Thu Jul 7 17:54:15 EDT 2011


I had hoped, at some point, to get to where various qos scripts were
pluggable and fairly easy to write.

It would be awesome to have a 'contest' for the best shaper or
something like that, some day.

I've also done a great deal of testing of various underlying network
subsystems over the last 6 months, and haven't taken the time, until
now, to even try to get it all down in one place.

Over the last few months I started writing two shapers, Diffserv and
"Ame" (Ants, Mice, and Elephants), but got bogged down by bugs in the
stack, speed problems, and the rest of my workload. Fragments of both
are in my 'Diffserv' repository on github, which also contains some
useful tests and tools for analyzing what's going on by looking at the
data dynamically.

That said, the background knowledge I've acquired should be useful to
anyone looking at network performance on Linux. The problems on other
OSes are probably similar, though different in the details.

0) Bufferbloat is the biggest problem everywhere on the path.

1) *Everything* running Linux uses a qdisc (queueing discipline) by
default. Ethernet devices use pfifo_fast, which is a 3-band, strictly
prioritized FIFO.

Wireless uses mq, which is supposed to sort traffic into the 802.11e
queues, provided that driver support exists (it doesn't in all cases)
and that people bother to classify their traffic at all (they don't).
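To make the point about classification concrete, here is a minimal Python sketch of how a TOS byte might end up in one of the four 802.11e (WMM) access categories. It assumes the common Linux default of using the top three bits of the TOS byte (the old IP precedence field) as the 802.1d user priority; the priority-to-category table is the standard WMM one, and the function name is hypothetical.

```python
# Standard 802.1d user priority -> WMM access category mapping.
UP_TO_AC = {0: "BE", 1: "BK", 2: "BK", 3: "BE", 4: "VI", 5: "VI", 6: "VO", 7: "VO"}


def tos_to_access_category(tos: int) -> str:
    """Map an IPv4 TOS byte to a WMM access category.

    Assumption: the top three bits of the TOS byte are used directly as
    the 802.1d user priority.
    """
    user_priority = (tos & 0xFF) >> 5
    return UP_TO_AC[user_priority]


if __name__ == "__main__":
    for tos in (0x00, 0x20, 0xA0, 0xE0):
        print(hex(tos), tos_to_access_category(tos))
```

Unmarked traffic (TOS 0) all lands in BE, so the other three queues sit idle unless something actually sets the TOS/DSCP bits.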

pfifo_fast has a problem in that a *single* high-performance stream
can take over the connection, especially when certain forms of TCP
offloading are in use. It also isn't 'fair' in many respects, and it
can misbehave when devices of wildly different speeds are involved.

We fixed some bugs with ECN in pfifo_fast in the capetown release (or
rather, the kernel devs did, and I backported the fixes from mainline).
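A toy Python model of pfifo_fast's strict three-band priority follows. The priomap is the default one documented for pfifo_fast/prio in tc(8); the single shared packet limit is an assumption standing in for txqueuelen, and offloading effects are not modelled at all. It only shows why a burst from one bulk flow sits in front of every other packet that lands in the same band. The class name is hypothetical.

```python
from collections import deque

# Default pfifo_fast priomap: Linux packet priority (0-15) -> band (0 is served first).
PRIOMAP = [1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]


class PfifoFastModel:
    """Hypothetical toy model of a 3-band strict-priority FIFO."""

    def __init__(self, limit: int = 1000):
        self.bands = [deque(), deque(), deque()]
        self.limit = limit  # assumed: one packet limit shared by all bands

    def __len__(self) -> int:
        return sum(len(band) for band in self.bands)

    def enqueue(self, packet, priority: int) -> bool:
        if len(self) >= self.limit:
            return False  # tail drop once the queue is full
        self.bands[PRIOMAP[priority & 0xF]].append(packet)
        return True

    def dequeue(self):
        # Strict priority: a lower-numbered band is always drained first.
        for band in self.bands:
            if band:
                return band.popleft()
        return None


if __name__ == "__main__":
    q = PfifoFastModel()
    for i in range(10):
        q.enqueue(("bulk", i), priority=0)  # best effort -> band 1
    q.enqueue(("ssh", 0), priority=0)       # also band 1, stuck behind the burst
    q.enqueue(("dns", 0), priority=6)       # TC_PRIO_INTERACTIVE -> band 0
    print([q.dequeue() for _ in range(len(q))])
```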

1) wshaper does the wrong thing with the TOS bits: it doesn't mask out
the ECN bits in its classification step.
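The TOS/ECN problem fits in a few lines of Python. The low two bits of the TOS byte are the ECN field (RFC 3168), so a filter that compares the whole byte, as a u32 `match ip tos 0x10 0xff` rule does, stops matching once a sender sets ECT. The 0x10 "minimize delay" value and the function names are illustrative assumptions, not code taken from wshaper itself.

```python
# In the IPv4 TOS byte, the top six bits are the DSCP and the low two bits
# are the ECN field (RFC 3168).
ECN_BITS = 0x03
MINIMIZE_DELAY = 0x10  # legacy TOS "minimize delay" value (illustrative)


def u32_tos_match(tos: int, value: int, mask: int) -> bool:
    """Mimic a u32 'match ip tos VALUE MASK' test: (tos & mask) == value."""
    return (tos & mask) == value


def classify(tos: int, mask: int) -> str:
    return "interactive" if u32_tos_match(tos, MINIMIZE_DELAY, mask) else "bulk"


if __name__ == "__main__":
    marked = MINIMIZE_DELAY | 0x02  # same TOS, but the sender set ECT(0)
    print("mask 0xff:", classify(marked, 0xFF))             # -> bulk
    print("mask 0xfc:", classify(marked, 0xFF & ~ECN_BITS))  # -> interactive
```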

2) qos-scripts (already mentioned) has an additional problem: it always
ends up with 127 or so packets of queue in the SFQ qdisc, when a
'righter' number would be closer to 8-32. It also enables ECN by
default, but ECN is often turned off on the router itself, so there's
no way to see it in action if you are generating traffic from the
router.
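Some back-of-the-envelope arithmetic on why 127 packets is too deep at low bandwidth. This Python sketch assumes full 1500-byte packets and a queue that has filled completely, so treat the numbers as a worst case; the link rates are example assumptions.

```python
def queue_drain_ms(packets: int, rate_bps: int, packet_bytes: int = 1500) -> float:
    """Time to drain a full queue of packet_bytes-sized packets at rate_bps, in ms."""
    return packets * packet_bytes * 8 * 1000 / rate_bps


if __name__ == "__main__":
    for rate in (80_000, 1_000_000, 10_000_000):  # assumed example link rates, bits/s
        for depth in (127, 32, 8):
            print(f"{rate / 1e6:>5} Mbit/s, {depth:>3} packets: "
                  f"{queue_drain_ms(depth, rate):8.1f} ms")
```

At an assumed 1 Mbit/s uplink, 127 packets is about 1.5 s of queue, versus 384 ms at 32 packets and 96 ms at 8. At the roughly 80 kbps Srikanth reports in the quoted message, a full 127-packet queue would take about 19 seconds to drain.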

Both sbaure and I have written some small 'can ECN work on this
connection?' scripts.
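Before blaming a path for not doing ECN, it is worth checking whether the local Linux TCP stack will even ask for it. A minimal Python sketch, assuming a Linux host with /proc mounted; the meanings are those documented for net.ipv4.tcp_ecn. It only checks the local negotiation policy, not whether routers on the path mangle the bits, and it is not either of the scripts mentioned above.

```python
from typing import Optional

TCP_ECN_PATH = "/proc/sys/net/ipv4/tcp_ecn"

# Meanings as documented for net.ipv4.tcp_ecn.
TCP_ECN_MODES = {
    0: "disabled: never requests or accepts ECN",
    1: "enabled: requests ECN on outgoing connections and accepts it on incoming ones",
    2: "passive: accepts ECN when the peer requests it, but never requests it",
}


def describe_tcp_ecn(value: int) -> str:
    return TCP_ECN_MODES.get(value, f"unrecognized value {value}")


def read_tcp_ecn(path: str = TCP_ECN_PATH) -> Optional[int]:
    """Return the current tcp_ecn setting, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_tcp_ecn()
    if value is None:
        print("tcp_ecn not readable (not Linux, or /proc unavailable)")
    else:
        print(f"tcp_ecn={value}: {describe_tcp_ecn(value)}")
```

Traffic generated *from* the router only requests ECN if the router's own setting is 1; with 2, it uses ECN only when the far end asks for it.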

3) SFB has not yet been played with sufficiently by many people.

4) SFQ, in an esfq-like mode, holds promise. It had some bugs in the
2.6.37 kernel release that are fixed in the upcoming 2.6.39 release.
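Roughly what "esfq-like" means: plain SFQ hashes each packet's flow (addresses, ports, protocol) into one of many buckets and serves the buckets round robin, while esfq let you hash on other keys, such as just the destination host, so fairness is per host rather than per connection. The Python sketch below uses zlib.crc32 as a stand-in for the kernel's hash and serves one packet per bucket per round (real SFQ accounts in bytes); the packet fields and function names are hypothetical.

```python
import zlib
from collections import deque


def flow_key(pkt: dict, mode: str) -> tuple:
    """Fields the bucket hash is computed over."""
    if mode == "flow":  # classic SFQ-style: per connection
        return (pkt["src"], pkt["dst"], pkt["proto"], pkt["sport"], pkt["dport"])
    if mode == "dst":   # esfq-style: per destination host
        return (pkt["dst"],)
    if mode == "src":   # esfq-style: per source host
        return (pkt["src"],)
    raise ValueError(f"unknown hash mode: {mode}")


def bucket_of(pkt: dict, mode: str, nbuckets: int = 1024, perturb: int = 0) -> int:
    # crc32 stands in for the kernel's hash; 'perturb' mimics SFQ's periodic rehash.
    return zlib.crc32(repr(flow_key(pkt, mode)).encode(), perturb) % nbuckets


def sfq_order(packets: list, mode: str, nbuckets: int = 1024) -> list:
    """Return packets in the order a one-packet-per-bucket round robin sends them."""
    queues = {}
    ring = deque()  # buckets with backlog, in round-robin order
    for pkt in packets:
        b = bucket_of(pkt, mode, nbuckets)
        if b not in queues:
            queues[b] = deque()
            ring.append(b)
        queues[b].append(pkt)
    out = []
    while ring:
        b = ring.popleft()
        out.append(queues[b].popleft())
        if queues[b]:
            ring.append(b)  # still backlogged: back of the line
    return out


if __name__ == "__main__":
    many = [dict(src="10.0.0.2", dst="192.0.2.1", proto=6, sport=40000 + i, dport=80)
            for i in range(4)]
    print(len({bucket_of(p, "flow") for p in many}), "bucket(s) when hashing per flow")
    print(len({bucket_of(p, "dst") for p in many}), "bucket(s) when hashing per destination")
```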

5) DRR is the new hotness, but I haven't played with it. It's supposed
to make RED-like setups simpler.
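For reference, the core of deficit round robin (Shreedhar and Varghese's algorithm) is short. Each backlogged queue earns a byte quantum per round and may send packets while its deficit covers them, so backlogged flows get roughly equal bytes per round regardless of packet size. Linux's drr qdisc is classful and needs filters to assign traffic to classes; this Python sketch only shows the scheduling loop, and all names are hypothetical.

```python
from collections import deque


class DeficitRoundRobin:
    """Hypothetical sketch of DRR scheduling over per-flow FIFOs (sizes in bytes)."""

    def __init__(self, quantum: int):
        self.quantum = quantum
        self.queues = {}       # flow id -> deque of packet sizes
        self.deficit = {}      # flow id -> unused byte credit
        self.active = deque()  # backlogged flows, in round-robin order

    def enqueue(self, flow, size: int) -> None:
        q = self.queues.setdefault(flow, deque())
        if not q:              # flow becomes backlogged: join the round
            self.active.append(flow)
            self.deficit[flow] = 0
        q.append(size)

    def drain(self) -> list:
        """Dequeue everything, returning (flow, size) pairs in send order."""
        sent = []
        while self.active:
            flow = self.active.popleft()
            self.deficit[flow] += self.quantum
            q = self.queues[flow]
            while q and q[0] <= self.deficit[flow]:
                size = q.popleft()
                self.deficit[flow] -= size
                sent.append((flow, size))
            if q:
                self.active.append(flow)  # still backlogged: next round
            else:
                self.deficit[flow] = 0    # idle flows do not bank credit
        return sent


if __name__ == "__main__":
    drr = DeficitRoundRobin(quantum=1500)
    for _ in range(3):
        drr.enqueue("A", 1500)  # bulk flow, full-size packets
        drr.enqueue("B", 500)   # flow with small packets
    print(drr.drain())
```

In the first round flow B sends three 500-byte packets for A's one 1500-byte packet, so the two get equal bytes rather than equal packets.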

6) VERY IMPORTANT! When running tests directly from a capetown router,
you are using TCP 'westwood', which, while ideal for a wireless device,
does not match the real world *at all*; the real world mostly runs
bic, cubic, or reno-like algorithms. I've made cubic, etc., pluggable
in the next release.

Either way, it is important to note which TCP congestion control
algorithm you are using and in which direction(s) the flows are going.
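One way to record the congestion control algorithm alongside test results, sketched in Python for a Linux host: read the system default from /proc/sys/net/ipv4/tcp_congestion_control, or ask a socket directly with the TCP_CONGESTION option, which Python's socket module exposes on Linux. It is the *sender's* algorithm that governs each flow, so an upload from the router and a download to it are controlled by different stacks. The helper names are hypothetical.

```python
import socket
from typing import Optional


def system_default_cc() -> Optional[str]:
    """System-wide default TCP congestion control, or None if unavailable."""
    try:
        with open("/proc/sys/net/ipv4/tcp_congestion_control") as f:
            return f.read().strip()
    except OSError:
        return None


def socket_cc(sock: socket.socket) -> Optional[str]:
    """Congestion control a given TCP socket will use, or None if unavailable."""
    opt = getattr(socket, "TCP_CONGESTION", None)  # Linux-only constant
    if opt is None:
        return None
    try:
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    return raw.split(b"\x00", 1)[0].decode()  # name is NUL-padded


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("default:", system_default_cc(), "| this socket:", socket_cc(s))
```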

7) Nobody is doing internal 'shaping' at all on local Ethernet or
wireless devices (i.e., not toward the internet, but toward internal
wireless or wired clients); everyone sticks with the aforementioned
defaults, pfifo_fast and mq.

SFQ or SFB would be more appropriate to run on Ethernet devices. SFQ
should probably not be run on wireless devices, as its notion of
'fair' conflicts with wireless's need to bunch packets together
somewhat.

8) Out of the box, openwrt usually enables 'syn-flood protection',
which caps SYN attempts at 25/sec with a burst of 50. This induces
really unusual behavior: if you try to fire off, say, 1000 connections
simultaneously (I have a test for this in the Diffserv repo, which
does this against the top 1000 web sites), you drop nearly all of
them, and retry... and retry... and retry. Similarly, the google
interactive a,b,c,d,e test does weird things, as you can easily have
multiple connections initiated (and completed!) in less than a second.

The synflood limit is a very primitive (and effective!) means of
limiting the effect of TCP 'mice', and it interacts badly with all
shaping systems and with the end-user experience.

I have tried both raising the syn-flood defaults and disabling the
limit entirely. Now, hitting the top 1000 web sites all at once merely
renders the router nearly catatonic for several seconds, which I
haven't looked into yet (running out of NAT ports? something else?).
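To see why 1000 simultaneous connections fare so badly under the syn-flood limit, here is a Python sketch of a token-bucket limiter with the rate and burst quoted in item 8 (25/s, burst 50). It models the limit roughly as an iptables `limit` rule behaves, assuming the bucket starts full, and it ignores SYN retransmission timers; the class name is hypothetical.

```python
class TokenBucket:
    """Hypothetical token bucket: 'rate' tokens/s refill, at most 'burst' stored."""

    def __init__(self, rate: float, burst: float):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # assumed: the bucket starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Spend one token for a SYN arriving at time 'now' (seconds)."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the SYN is dropped


if __name__ == "__main__":
    bucket = TokenBucket(rate=25, burst=50)
    admitted = sum(bucket.allow(0.0) for _ in range(1000))
    print(f"{admitted} of 1000 SYNs admitted at t=0")
```

Only the 50-packet burst gets through at first. Even if the remaining 950 retried at perfectly spaced intervals, the 25/s refill means (1000 - 50) / 25 = 38 seconds before the last one is admitted.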

On Thu, Jul 7, 2011 at 3:17 PM, Dave Taht <dave.taht at gmail.com> wrote:
> 'qos' on with what parameters? It sounds like you are simulating capetown?
>
> excellent! get captures!
>
> For the benefit of the list, I should say this:
>
> Some form of AQM (qos) is probably necessary for today's internet.
>
> The qos system in capetown sucked; we've figured that out. Various
> other forms of qos systems out there suck to varying degrees, but I'd
> argue that, as most of them were designed in the 90s or earlier
> against vastly different traffic, they suck equally or worse.
>
> My guess is that 'capetown qos' stomped on FIN or FIN/ACK, but it
> could be anything: ECN bits stripped or blocked by a router in between
> (quite likely), the TOS field mangled, a bug in the router, a bug in
> the code...
>
> We kind of need a name for 'capetown's qos', rather than just saying 'qos'.
>
> The qos script to which sri is referring creates a set of shapers
> based on hfsc, sfq, and red to match its parameters. It is the
> standard 'qos-scripts' package distributed in openwrt.
>
> We've already noted a major knee in the curve at low bandwidths...
>
> I note wondershaper is also in the build as a package (wshaper), btw.
> It's just a script (wshaper.htb). DO give that a shot. It has a bug
> with tos, but I've been using a 4-band version of it (which I'll
> also toss into the build) for years. Bugs in the *default* 'pfifo_fast'
> shaper can become a problem, too.
>
> There are a dozen other qos/shaping/policing systems out there.
>
> Incidentally, I was working with the netperf author on 2.5.0, which I
> hope we can run out of xinetd in the next build.
>
> anyway, thx for looking at it AND making captures.
>
> On Thu, Jul 7, 2011 at 1:56 PM, Srikanth Sundaresan <srikanth at gatech.edu> wrote:
>> netperf acts funny with the qos setting on. Downloads work OK, but uploads
>> don't seem to, under certain conditions.
>>
>> For example, from my apt, where qos was enabled (I was getting about
>> 80kbps), the netperf session would get stuck in TIME_WAIT on the server
>> side, which would lead the client to give an error. It doesn't always
>> happen though: with our measurement server it always happened, but with
>> my desktop as the server (it's on a different subnet), it never had a
>> problem. The router in Nick's apt, where qos is turned off, works well.
>>
>> Same with the io router. The uploads work perfectly from Thebe, which
>> doesn't have qos on, but from io, they don't work.
>>
>> It's not just a problem with qos, because if it were, it would also
>> fail against my server. It's probably a problem in some router setting
>> somewhere that's accentuated by qos.
>>
>> A couple of tcpdumps are here: http://galapagos.gtnoise.net/tmp/
>> These are from my home router to galapagos (my server, against which it
>> was successful) and porter-square (the measurement server, where it
>> failed).
>>
>>
>> - Srikanth
>> _______________________________________________
>> bismark-devel mailing list
>> bismark-devel at projectbismark.net
>> http://lists.noise.gatech.edu/listinfo/bismark-devel
>>
>
>
>
> --
> Dave Täht
> SKYPE: davetaht
> US Tel: 1-239-829-5608
> http://the-edge.blogspot.com
>





