[Bloat] how to fix modem buffer bloat?

Tue Aug 27 14:11:19 EDT 2013

On Tue, Aug 27, 2013 at 6:31 AM, Naeem Khademi <naeem.khademi at gmail.com> wrote:

> I would like to hear a bit more elaboration on why the use of
> fq_codel on the wlanX interface is "premature". From what I have grasped
> so far, I can think of A) frame aggregation and TXOPs in 802.11n, and B)
> anything on the downlink path that coexists with uplink traffic on
> 802.11g/n. Any thoughts on other major issues?
>
>
I tried to hit most of the major problems at a high level in the latter
half of the MIT talk:

http://www.youtube.com/watch?v=Wksh2DPHCDI (slides in the comments)

The following wiki page also outlines the work in progress (WIP) on some
Atheros hardware. (I've also been evaluating a few other chipsets with open
drivers. The mwl8k looks kind of promising, actually, except that it has its
own rate control algorithm in the firmware. The iwl, not so much.)

http://www.bufferbloat.net/projects/cerowrt/wiki/Fq_Codel_on_Wireless

To elaborate more:

1) Most wifi devices, firmware, and drivers are painfully overbuffered to
start with. Furthermore, they have built-in settings for retries, go to
great lengths to avoid re-ordering flows, and drop packets only under the
greatest duress. Some hardware and firmware is highly "intelligent" and
moves rate control, drop strategies, and aggregation into invisible areas.
Most of the control loops use counters rather than time-based techniques,
so they degrade significantly at lower transmit rates. Some devices are
connected via odd buses, like USB or SPI, which introduce their own
latency and buffer problems. Multicast and management frames are
transmitted at the lowest possible rate, and management frames are largely
hidden from view above the device driver. Most things use FIFO queues,
except in dealing with retries. The 802.11e multi-queue concept is glommed
onto the 802.11g codebase, and the aggregation concept in 802.11n is glommed
on top of all that. 802.11ac introduces new headaches (and potential, like
multi-user MIMO!). The problems differ significantly by device, driver, and
firmware. And bugs are legion.
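To put rough numbers on the counters-versus-time problem: a queue capped at a
fixed number of packets takes longer to drain the slower the link runs. A
minimal sketch, assuming full-size 1500-byte packets and an illustrative
100-packet cap, and ignoring MAC overhead, retries, and aggregation:

```python
def drain_time_ms(packets: int, rate_mbps: float, packet_bytes: int = 1500) -> float:
    """Time (ms) to drain a FIFO holding `packets` frames of `packet_bytes` at `rate_mbps`.

    Ignores per-frame MAC/PHY overhead, retries and aggregation, so real
    wifi numbers will be worse than this.
    """
    bits = packets * packet_bytes * 8
    return bits / (rate_mbps * 1e6) * 1e3


# The same (illustrative) 100-packet cap at three transmit rates, in Mbit/s.
for rate in (54.0, 6.0, 1.0):
    print(f"{rate:4.0f} Mbit/s -> {drain_time_ms(100, rate):7.1f} ms")
```

A cap that means about 22 ms of queue at 54 Mbit/s means over a second at
1 Mbit/s, while a time-based target such as codel's 5 ms stays 5 ms at any rate.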

On many devices, it's nearly impossible to layer something like BQL on top
to simply measure completion time, which matters because in most OSes it's
hard to operate at sub-10ms intervals without some buffering somewhere. I
remember the reaction of Tom Herbert (the author of BQL) when briefed on
this. He said something like: "I'm glad I work on 10GigE stuff. It's way easier."

fq_codel presently layers on top of all that! On a client (sta) nowadays,
STILL, despite all that, fq_codel is a win in the general case (see the
various attempts on Android, or try it yourself on a Linux client with the
debloat script)...

But what I observe is still high latency (all those buffers and retries),
and rather bursty drop rates under load: by the time fq_codel (at the layer
where it currently sits) gets a clue that something is wrong on the path,
there are hundreds of packets in trouble below it.
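Part of why fq_codel reacts late from where it sits: CoDel only sees the
sojourn time of packets in its own queue, and even then it waits a full
interval of above-target delay before its first drop, then spaces later drops
at interval/sqrt(count). A minimal sketch of that drop schedule with the usual
defaults (5 ms target, 100 ms interval), assuming the queue delay stays above
target the whole time so the dropping state is never exited:

```python
import math

INTERVAL_MS = 100.0  # CoDel's default interval; delay is assumed to stay above the 5 ms target


def codel_drop_times(first_above_ms: float, n_drops: int,
                     interval_ms: float = INTERVAL_MS) -> list[float]:
    """Times (ms) of CoDel's first `n_drops` drops, assuming the queue delay
    rises above target at `first_above_ms` and never falls back below it."""
    t = first_above_ms + interval_ms  # first drop only after a full interval above target
    times = []
    count = 0
    while len(times) < n_drops:
        times.append(t)
        count += 1
        t += interval_ms / math.sqrt(count)  # control law: gaps shrink as 1/sqrt(count)
    return times


print([round(t, 1) for t in codel_drop_times(0.0, 5)])
```

Five drops take close to 380 ms to arrive; meanwhile the driver and firmware
buffers below keep filling, which fits the late, bursty drops described above.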

On an AP we introduce a new problem: by re-ordering packets by flow, we
increase the probability that we will not aggregate the next packet into
the FIFO controlling the TXOP. (This would be fixable if we had insight into
which sta we were delivering to and/or switched to a MAC hash or a sta hash;
this is part of the WIP.) 802.11n aggregation rapidly scales down to 1
packet (i.e. 802.11g behavior) as you scale up to multiple active clients on
an AP. (Although this sounds bad, it isn't all that different from what
happens with normal FIFOs, as packet flows are randomized anyway. We expect
*dramatic* improvements in AP aggregation performance once the per-sta
queues work is complete.)
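A toy model of the AP aggregation problem, assuming an aggregate can only
bundle consecutive head-of-queue packets bound for the same station (real
drivers are more involved). With one shared FIFO and traffic to several
stations interleaved, as flow-fair queueing tends to produce, nearly every
aggregate has size 1; with hypothetical per-station queues the same packets
fill aggregates. The function names and the 12-packet limit are illustrative
assumptions:

```python
from collections import Counter


def txops_shared_fifo(dests: list[str], max_agg: int) -> int:
    """TXOPs needed when aggregates are cut from one shared FIFO: only a run of
    consecutive packets for the same station (up to max_agg) can share a TXOP."""
    txops = 0
    i = 0
    while i < len(dests):
        j = i + 1
        while j < len(dests) and dests[j] == dests[i] and j - i < max_agg:
            j += 1
        txops += 1
        i = j
    return txops


def txops_per_station(dests: list[str], max_agg: int) -> int:
    """TXOPs needed if packets are first sorted into per-station queues."""
    return sum(-(-n // max_agg) for n in Counter(dests).values())  # ceil(n / max_agg)


# 24 packets round-robined across three stations, 12-packet aggregate limit.
interleaved = ["sta_a", "sta_b", "sta_c"] * 8
print(txops_shared_fifo(interleaved, 12), txops_per_station(interleaved, 12))  # 24 3
```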

This is 6 hops (2 over p2p wifi) into a pretty active real-world deployment:

http://huchra.bufferbloat.net/cgi-bin/smokeping.cgi?target=IPv6.yurtlab_tunnel

The core features of this setup that differ from the public code are that
802.11e is entirely disabled (diffserv is squashed) and that aggregation is
set to very low values (128 is the default for the ath9k chipset; I use 12
packets for BE). I find the performance of the testbed to be generally
remarkably good: both gamers and Netflix users report good results.

I strongly encourage those running wifi networks to run tools like
smokeping and rrul to observe just how bad their networks are. I have tons
of data collected from hotels and conference centers, and it's generally all
pretty bad (with the sole exception of several of the IETF's networks).

So I am encouraged that we are on the right track. Still... tons of work
remains.

BTW: One of the things that fell out of the IPv6 tunnel test above was the
realization that the fq_codel flow hash didn't peer into 6in4-encapsulated
IPv6 packets, and degraded to codel rather than fq_codel behavior in that
case. IPIP, 6in4, and 802.1ad handling in the flow_dissector is fixed in
Linux 3.11 (GRE was already handled) and has been backported into cerowrt
3.10.x, but it's not in the testbed.
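To illustrate what went wrong: in 6in4, every inner IPv6 flow rides in IPv4
packets with the same outer addresses and protocol number 41, so a flow key
built only from the outer header puts the whole tunnel into one queue. The
sketch below is hypothetical and much simpler than the kernel's
flow_dissector (IPv4 outer header only, no IPv6 extension headers); the
helper names and addresses are made up for illustration:

```python
import struct

IPPROTO_TCP, IPPROTO_UDP = 6, 17
IPPROTO_IPV6 = 41  # 6in4: an IPv6 packet carried directly inside IPv4


def flow_key(pkt: bytes, peek_6in4: bool) -> bytes:
    """Simplified flow key for an IPv4 packet: addresses, protocol, and ports.
    With peek_6in4=True, 6in4 packets are keyed on the inner IPv6 header instead."""
    ihl = (pkt[0] & 0x0F) * 4  # outer IPv4 header length in bytes
    proto = pkt[9]
    if peek_6in4 and proto == IPPROTO_IPV6:
        inner = pkt[ihl:]
        next_hdr = inner[6]                    # IPv6 Next Header (extension headers ignored)
        key = inner[8:40] + bytes([next_hdr])  # inner source + destination addresses
        if next_hdr in (IPPROTO_TCP, IPPROTO_UDP):
            key += inner[40:44]                # inner source and destination ports
        return key
    key = pkt[12:20] + bytes([proto])          # outer source + destination addresses
    if proto in (IPPROTO_TCP, IPPROTO_UDP):
        key += pkt[ihl:ihl + 4]
    return key


def make_6in4(sport: int, dport: int) -> bytes:
    """Build a minimal 6in4 packet between fixed documentation addresses."""
    inner_src = bytes.fromhex("20010db8" + "00" * 11 + "01")
    inner_dst = bytes.fromhex("20010db8" + "00" * 11 + "02")
    payload = struct.pack("!HH", sport, dport) + bytes(16)  # start of a TCP header
    ipv6 = struct.pack("!IHBB16s16s", 6 << 28, len(payload), IPPROTO_TCP, 64,
                       inner_src, inner_dst) + payload
    ipv4 = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(ipv6), 0, 0, 64,
                       IPPROTO_IPV6, 0, bytes([192, 0, 2, 1]), bytes([198, 51, 100, 1]))
    return ipv4 + ipv6


tunnel = [make_6in4(40000 + i, 443) for i in range(4)]  # four inner TCP flows
print(len({flow_key(p, False) for p in tunnel}),  # 1: the whole tunnel is one "flow"
      len({flow_key(p, True) for p in tunnel}))   # 4: one key per inner flow
```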

> Excluding 802.11n-specific issues, what else could be problematic for
> fq_codel in an 802.11g scenario with predominantly downlink traffic
> and minstrel RA?
>

The biggest thing I strongly encourage experimenters to do is to test in
bad conditions: at distances greater than 10 meters from the AP, through
walls, with multiple devices, and with other APs present. Then simplify, if
you can. It's nearly impossible to do isolated testing of 2.4 GHz nowadays,
in particular...

I searched the world for a spot where there were no interfering signals and
landed where I am now to do it. I wish I were making more progress, or that
where I am were a tropical island with table service...

Anyway:

In a pure 802.11g scenario we are governed by TXOPs (in turn governed by
interference, the number of stations, and 802.11e scheduling). Still, codel
depends on the idea that you can actually transmit stuff in under 5ms. As
you add wifi stations or problems, you can get well above that figure. One
idea under consideration is to make the codel target a function of the
number of active devices; another is to schedule more by backlogged TXOPs;
a third is to throw out EDCA and/or 802.11e as we know it (see McGregor's
presentation at the IETF) to fold more stuff into a TXOP (oops, that's an n,
not a g problem; 802.11e has got to gain better admission control...)
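As a sketch of the first idea only: one hypothetical way to make the codel
target depend on the number of active stations is to assume each backlogged
station gets roughly one TXOP per round, so a packet may wait about n_active
TXOPs before it goes out, and never let the target drop below the usual 5 ms.
The formula and the txop_ms parameter are assumptions for illustration, not
the actual proposal:

```python
def scaled_target_ms(n_active: int, txop_ms: float, base_target_ms: float = 5.0) -> float:
    """Hypothetical codel target that grows with the number of active stations.

    Assumes each backlogged station gets about one TXOP of `txop_ms` per round,
    so a packet can wait roughly n_active TXOPs; never goes below base_target_ms.
    """
    if n_active < 1:
        return base_target_ms
    return max(base_target_ms, n_active * txop_ms)


for n in (1, 4, 15):
    print(n, scaled_target_ms(n, txop_ms=2.0))
```

With 15 stations (the dense mesh case) and an assumed 2 ms per TXOP, the
target becomes 30 ms; codel's interval would presumably have to grow along
with it, which this sketch ignores.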

Much research remains! I hope more people have fun with it. Making wireless
continue to work even halfway decently in our increasingly crowded spectrum
is the Next Big Problem. I think the core ideas of fq_codel are good, but I
certainly don't think we're even close to finding optimal solutions.
Reducing power is certainly another good idea, as are various distributed
clocking schemes, etc.

Of late, I've been trying to gain extra insight into the dense mesh problem
(15+ devices) using a combination of one-way delay measurements and a
tightly controlled testbed called the yurtlab, and I've also been trying to
leverage the battlemesh network.

Work proceeds slowly, as my funding and time for it are both infinitesimal.
See

https://plus.google.com/u/0/s/%23yurtlab

for updates.

Does all that help?

> Regards,
> Naeem
>
> On Mon, Aug 26, 2013 at 9:28 PM, Dave Taht <dave.taht at gmail.com> wrote:
> > The advantage of cerowrt is that it runs about 3-4 months ahead of openwrt
> > on improvements to the bloat problem, and fixing bugs.
> >
> > The disadvantage is that it runs about 3-4 months ahead of openwrt on
> > having new bugs.
> >
> > Example: We just finished (with the aid of multiple parties) finally fixing
> > a problem in HTB's ATM DSL compensation that has existed for a year (and
> > probably several years before that), and I think the final set of fixes
> > will land in Linux 3.10.10 or .11 soon.
> >
> > Right now it's very possible to merely layer two components of cero on top
> > of openwrt to get most of the benefit of the current work (the aqm-scripts
> > and gui, and if you are daring, a couple of patches to codel and fq_codel).
> >
> > Sadly, I wouldn't recommend the current dev builds of cero for day-to-day
> > use at this point, although I hope to get to a new stable release by the
> > end of September. There are a ton of outstanding bugs left to fix.
> >
> > While openwrt runs fq_codel by default on all interfaces, it's mildly
> > premature to be doing so on the wifi front. Work is in progress. However,
> > in the general case, at the moment the principal use for fq_codel in a home
> > router is on the gateway to the internet: the fq_codel QoS system in
> > openwrt and dd-wrt works extremely well (with the exception of native
> > ipv6). I believe the package in cerowrt is better in most respects (notably
> > on ipv6), but limited in others. Gargoyle is using a prior effort (improved
> > sfq + an automatic rate measurement system called ACC). There are other
> > options, like using small atom boxes, ipfire, and several commercial
> > products....
> >
> > The stable (February) release of cero is pretty usable, but lacks the
> > modernized aqm scripts, the htb fix, a bunch of ipv6 fixes, etc., etc.
> >
> > I wish I could give firm advice, but we're kind of in the middle of a ton
> > of stuff right now; all I can do is encourage you to leap in, fix things
> > for yourself, and help out where you can.
> >
> >
> >
> > On Mon, Aug 26, 2013 at 11:53 AM, Collin Anderson <cmawebsite at gmail.com>
> > wrote:
> >>
> >> Hi All,
> >>
> >> > Any recommendations for solving the bufferbloat on my Comcast SMC cable
> >> > modem?
> >>
> >> Looking at it more, a workaround is probably all I can hope for at
> >> this point. I first started keeping a ping session open back in 2008
> >> to debug the internet, and I see bufferbloat almost every day at home
> >> and at work. Anything to avoid the symptoms sounds great.
> >>
> >> I want something reliable that needs minimal configuration. I'm thinking
> >> about buying a WNDR3800 and installing CeroWRT, or is there better
> >> recommended hardware?
> >>
> >> Also, isn't fq_codel "on by default" [1] in OpenWRT? If so, what's the
> >> advantage of CeroWRT?
> >>
> >> Thanks,
> >> Collin
> >>
> >> [1] http://www.ietf.org/proceedings/87/slides/slides-87-aqm-6.pdf
> >> _______________________________________________
> >> Bloat mailing list
> >> Bloat at lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/bloat
> >
> >
> >
> >
> > --
> > Dave Täht
> >
> > Fixing bufferbloat with cerowrt:
> > http://www.teklibre.com/cerowrt/subscribe.html
> >
> >
>
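On the HTB ATM DSL compensation mentioned in the quoted message: ATM carries
each packet in 53-byte cells holding 48 bytes of payload, and the packet plus
its encapsulation overhead is padded up to a whole number of cells, so bytes
on the wire are not proportional to packet size. A minimal sketch of that
arithmetic; the 40-byte per-packet overhead is only an assumed example, since
the real value depends on the line's encapsulation:

```python
import math

ATM_CELL_BYTES = 53     # one ATM cell on the wire
ATM_PAYLOAD_BYTES = 48  # payload carried by each cell


def atm_wire_bytes(ip_len: int, overhead: int) -> int:
    """Bytes on an ATM-based DSL line for one IP packet.

    `overhead` is the per-packet encapsulation overhead, including the 8-byte
    AAL5 trailer; the right value depends on the line and is assumed here.
    """
    cells = math.ceil((ip_len + overhead) / ATM_PAYLOAD_BYTES)
    return cells * ATM_CELL_BYTES


for size in (64, 1500):
    wire = atm_wire_bytes(size, overhead=40)
    print(size, wire, f"{wire / size:.2f}x")
```

A shaper that counts only IP bytes underestimates the real line load (badly
for small packets), so it sends faster than the line can drain and the queue
builds up in the modem again.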

-- 
Dave Täht

Fixing bufferbloat with cerowrt:
http://www.teklibre.com/cerowrt/subscribe.html