[Bloat] Announcing CeroWrt RC6 (beta) test.

Thu Aug 18 16:53:31 EDT 2011

Here's a first cut at my latest, written after our lunch yesterday. It will be at
http://digital-rag.com/article.php/InterludeToLinuxcon-BufferbloatBetaTest but is not yet out of draft state. I need a sniff test on some of my assumptions and simplifications if you have time.

----------------------------
Interlude to Linuxcon - Bufferbloat Fix Entering Beta Test Phase
Thursday, August 18 2011 @ 09:01 AM PDT
Contributed by: Richard Pitt

I'm not attending Linuxcon for various reasons - but I'm certainly aware
it is on - and watching the proceedings via live stream as and when I
can.
One of the attendees, Dave Täht, invited me to lunch yesterday and, as
we'd never met in meat space, I accepted. We met about 11:30 just
outside the Hyatt and I drove us to a little Greek place near Thurlow
and Davie. 
Dave was suffering mildly from lack of sleep and other maladies,
including having lost his glasses "some time in the past 2 weeks." He
set up a recorder to document our conversation and assured me I'll get
a copy some time in the not too distant future, so I'll detail things
then. In the meantime I wanted to give a bit of background to our
meeting and some of the things Dave and others are doing that will
affect all of us in the internet community.

I've written previously about Bufferbloat and its effects on your
perceived response in using the internet for large file transfers and at
the same time doing "interactive" stuff like viewing web sites. The
large file packets stuff the buffer and get in the way of the little
stuff trying to get through at the same time.
Dave is one of the founders of the technical movement to get this
ubiquitous problem fixed all around the net, with their base at
www.bufferbloat.net. His latest work has been on "Cerowrt" (pronounced:
Sero-wurt) which is in his words "a proof of concept best-practices
implementation of de-bloated kernel and drivers in a wireless home
router" and uses the Netgear N600 WNDR3700v2 hardware platform. He had
brought a couple of these with him to the conference and gave me one of
them (thanks Dave!) - I'll be installing it in place of the Dlink I
currently use to give the non-secure parts of the house their net
access, and will be doing testing and evaluation on it.
The fact is, I don't much use this router myself as my Linux workstation
is directly connected to my cable modem (via a dumb switch with minimal
buffering) and I've been playing with the settings to deal with
bufferbloat directly on it. I have a couple of wireless-connected
laptops that do use it, but I don't use them much. Having been using
interactive services for something well in excess of 30 years, it was my
own curiosity about the degrading interactive response I was getting
recently that drew me to participate in the bufferbloat community in the
first place.
On the other hand, many of the systems I administer have wireless
portions and are all about streaming - with the Hancock Wildlife
Foundation's many wildlife cameras being the prime examples.
The problem of bufferbloat is not simple. It encompasses hardware, both
at the chip level and in aggregate (core routers, WiFi and modems),
operating systems, and server as well as user-level applications, and is
different in impact and potential solution in different areas of the
internet's hierarchy.
Without being exhaustive or for that matter all that technical, I'll try
to give you an idea of what I'm talking about.

Hardware

The choice of this single hardware platform (WNDR3700) as the "test
case" is predicated on the fact that simply fixing the chip-level
hardware problems in any particular platform is going to be a
major headache.
There are a lot of different manufacturers, base designs and versions of
things like Ethernet transceivers and WiFi radio chips and chipsets, and
each has its own set of basic problems and opportunities for
driver-based fixes. The objective eventually is to move the solution out
to the design of new chips that better allow programmatic and situational
control of the on-chip buffers each has in some fashion put in place.
The prime problem is that in most cases, once a packet has been handed
to the hardware interface chip (from the kernel's buffer pool) there is
little that can be done to either remove it (and not the others in the
queue) or change its priority or order; and many of these chips today
have fairly large buffers, both in absolute memory size and in numbers
of packets.
If you have the technical savvy and/or the stomach for wading through
some fairly intense low-level code, you might like to take a look at the
discussion over bug#216 in the current code base - and this involves
only the one (ath9k) chip on one hardware platform. Multiply this by the
number of different chips "out there" and you get an idea of the scale
of the real problem.
In the WiFi bufferbloat arena there are in fact two different and
mutually competing "physical" layers that are in the packet path: 802.3
(aka wired Ethernet) and 802.11 (aka WiFi), and each has its own modus
operandi. The contentious side of the WiFi layer is the desire to stuff
as many IP packets into a wireless "frame" as possible to minimize the
overhead percentage of the radio channel. There is a fixed (and larger
than Ethernet's fixed packet overhead) time necessary to send a frame
regardless of actual payload size - and the maximum payload is many
times the maximum size of an IP packet. The base idea is to boost total
throughput of the radio because that is what each of the manufacturers
tout as their particular sales key; latency of small packets can become
huge because of this.
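
The throughput-versus-latency trade-off can be sketched with a toy calculation. The overhead and per-packet airtime figures below are placeholders I made up purely for illustration; they are not taken from any 802.11 specification, and the function names are hypothetical.

```python
# Toy model of WiFi frame aggregation. The overhead and per-packet
# times are made-up placeholders, not figures from any 802.11 standard.

def frame_airtime_us(packets, overhead_us=100.0, per_packet_us=25.0):
    """Airtime of one frame: fixed overhead plus time for each packet."""
    return overhead_us + packets * per_packet_us

def channel_efficiency(packets, overhead_us=100.0, per_packet_us=25.0):
    """Fraction of the frame's airtime that carries payload."""
    payload_us = packets * per_packet_us
    return payload_us / frame_airtime_us(packets, overhead_us, per_packet_us)

if __name__ == "__main__":
    for n in (1, 4, 16, 32):
        print(f"{n:>2} packets: airtime {frame_airtime_us(n):6.0f} us, "
              f"efficiency {channel_efficiency(n):.0%}")
```

With these placeholder numbers a lone packet uses the channel at 20% efficiency, while a 32-packet frame reaches about 89% but occupies the air for 900 microseconds instead of 125. A small packet stuck inside or behind that frame pays the full cost.
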
Due to interference from other WiFi clients and access points, there are
good chances that any particular frame will have to be re-transmitted
because it did not get properly received (and acknowledged) but once the
frame is built and handed to the radio chip there is almost no way to
stop it from "infinitely retrying" which will hold up the whole set of
IP packets and yet not give any way of feeding back to the originating
system that they are "really still in transit, don't send more... just
wait" - so again, the TCP retransmit mechanism can end up cutting in and
re-sending the same payload again (and again...) which just compounds
the problem (and as you'll see below, ends up backing up in buffers in
other links in the chain.)
In the "simple" world of wired (and fiber) Ethernet, the story is much
the same except that there is far less likelihood of interference or
intermittent failure - the links are either working or they're not; at
least with most of today's hardware. Where the problems come in is when
something downstream (toward you, the end viewer/user) of the particular
equipment slows or stops, and does not provide specific congestion
notification back to the originating system, which continues to send new
or re-transmit old packets. Again, packets fill the buffers and
eventually nothing can get through, including packets returning to the
sending system that might otherwise have signaled congestion and slowed
the link's packet flow.
This whole discussion points up why some sort of active queue management
that can reach into the hardware's (chip level) buffers is necessary.
The problem is, it simply isn't possible today.
A simplistic (and not terribly realistic) example would be a timer that
limits the lifetime of a TCP/IP packet (one that has entered the WiFi
system but not yet left) to about 2 seconds, which would go a long way
towards removing duplicate packets from the stream. Instead, we see
frames that might take tens of seconds tying up the link with payloads
that have become redundant.
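
To make the lifetime-timer idea concrete, here is a minimal software sketch of a queue that throws away anything older than a deadline. It is purely hypothetical: the class name, the 2-second default and the injectable clock are my own choices, and today's radio chips expose nothing like it, which is the whole problem.

```python
import collections
import time

class LifetimeQueue:
    """FIFO queue that discards packets older than max_age seconds.

    Hypothetical sketch only: real WiFi hardware queues do not offer
    this kind of control.
    """

    def __init__(self, max_age=2.0, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock  # injectable so the queue can be tested
        self._items = collections.deque()  # (enqueue_time, packet)

    def enqueue(self, packet):
        self._items.append((self.clock(), packet))

    def dequeue(self):
        """Return the oldest packet still within its lifetime, or None."""
        now = self.clock()
        while self._items:
            queued_at, packet = self._items.popleft()
            if now - queued_at <= self.max_age:
                return packet
            # The packet has outlived its usefulness: drop it and
            # look at the next one.
        return None
```

A packet that has waited longer than max_age is simply never sent, on the assumption that the sender has retransmitted it by then anyway.
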

Operating System

The Linux kernel and generic TCP/IP drivers currently provide a small
number of "knobs" that can be adjusted to affect buffering and queue
management in gross fashion. By default, txqueuelen (the length of the
transmit queue on a basic Ethernet interface) is today set to 1000 -
that is, 1000 packets waiting to be sent out the link - which in itself
is fairly large for most typical connections today. Worse, there are
places on the internet where, when asked "how do I change...
txqueuelen", the answer perpetuates the "bigger on fast links is
better" concept that resulted in the default of 1000. (It was 100, but
as typical Ethernet interfaces moved from 10 to 100 to 1000base, it was
increased without much real reason, it seems.) We now know that concept
is at best simplistic and at worst just plain wrong, at least with the
currently available queue management schemes, if one is turned on at
all.
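
To put a 1000-packet queue in perspective, here is a rough worst-case calculation. It assumes every queued packet is a full 1500 bytes and the queue is completely full; the link rates and the function name are illustrative choices of mine, not measurements.

```python
# Rough worst-case time for a full transmit queue to drain.
# Assumptions (illustrative only): every queued packet is 1500 bytes
# and the link runs at its nominal rate with no other overhead.

def drain_time_seconds(queue_len_packets, link_bits_per_second,
                       packet_bytes=1500):
    """Seconds needed to transmit everything already in the queue."""
    queued_bits = queue_len_packets * packet_bytes * 8
    return queued_bits / link_bits_per_second

if __name__ == "__main__":
    for label, rate in [("1 Mbit/s", 1_000_000),
                        ("10 Mbit/s", 10_000_000),
                        ("100 Mbit/s", 100_000_000),
                        ("1 Gbit/s", 1_000_000_000)]:
        print(f"{label:>10}: {drain_time_seconds(1000, rate):.3f} s")
```

On these assumptions a full queue of 1000 packets takes 12 seconds to drain at 1 Mbit/s but only 12 milliseconds at 1 Gbit/s, so the same setting is harmless on a fast link and disastrous on a slow one.
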
There is also a lot of discussion about the various AQM (active queue
management) algorithms. The consensus is that the current crop dealt
with the problems of a few years ago fairly well (again, if turned on at
all) but does not cope with today's completely different mix of
activities (streaming protocols and complex web pages with mouse-click
interactions, compared to the largely text-oriented and keystroke-based
net of 10+ years ago).
The problem many network engineers have with the older AQM setups is
that they really only work well "if tuned correctly", and the tuning for
the circumstances of 5 years ago is radically different from today's -
and today's problems at 5PM are different from those at midnight - so
manual tuning is not going to be practical.
This will result (has resulted) in some new algorithms and some
hard-pushed selling on why they should be turned on by default and are
not "dangerous" as some have tagged currently available schemes in the
past. The problem as noted is simply to get them installed and turned on
in the face of the past reasons for not using AQM at all.

Applications

At the bare application level the objective is to need as little
recognition of the underlying network as possible. That said, the fact
is that "application" in some people's terms (major web applications
dealing with millions of users being the prime category) includes tuning
of network parameters to give better response - and this is where some
things are likely to have to change. One such is in the area of "fast
start" where an application will stuff a fairly large number of packets
onto the network without first waiting for acknowledgement that the
system at the other end can take them that fast. 
In an era of minimal buffering between sender and receiver, these
initial packets might in fact disappear (packet loss) because the
receiver was on a link that simply could not receive them fast enough.
Servers typically only sent the first 3 or 4 packets in this fashion, so
on average, the losses were minimal as most receiving systems could and
did deal with this number in their own receive buffers.
Today, some sending systems pre-send 12-15 packets or more. If the link
to the receiving system is truly slow (or otherwise congested already),
there is a good chance that one or more of the intermediary devices has
sufficient buffering that these packets do not get lost, but instead sit
in the backbone routers, cable modem or consumer router waiting for the
PC to take them and acknowledge them. In some cases, especially if the
receiving system is also getting other packets from some other place
(watching a video stream while surfing the web for instance) the delay
in getting all of these fast-start packets can be on the order of 2 or
more seconds (see section 2.1 of RFC2988 - they discuss 2.5 seconds as
"acceptable"), and at that point one of TCP's fall-backs kicks in - and
the server might re-send some or all of the packets causing further
congestion.
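
A back-of-envelope sketch shows how an initial burst can end up waiting longer than the sender's retransmit timer. It ignores propagation delay and the return path of acknowledgements, and the link rate, queue depth and function names are all assumptions of mine, chosen only for illustration.

```python
# Sketch: how long an initial burst waits behind traffic already queued
# on a slow last-hop link. Ignores propagation delay and the return
# path of acknowledgements; all figures are illustrative assumptions.

def burst_delivery_seconds(burst_packets, queued_ahead_packets,
                           link_bits_per_second, packet_bytes=1500):
    """Time until the last packet of the burst has crossed the link."""
    total_bits = (burst_packets + queued_ahead_packets) * packet_bytes * 8
    return total_bits / link_bits_per_second

def exceeds_timeout(burst_packets, queued_ahead_packets,
                    link_bits_per_second, timeout_seconds):
    """True if delivery takes longer than the sender's retransmit timer."""
    return burst_delivery_seconds(
        burst_packets, queued_ahead_packets, link_bits_per_second
    ) > timeout_seconds

if __name__ == "__main__":
    link = 1_000_000  # hypothetical 1 Mbit/s last hop
    print(burst_delivery_seconds(4, 0, link))
    print(burst_delivery_seconds(15, 200, link))
    print(exceeds_timeout(15, 200, link, timeout_seconds=2.5))
```

With these made-up figures, a 4-packet burst on an idle 1 Mbit/s link is delivered in about 0.05 seconds, while a 15-packet burst sitting behind 200 already-queued packets takes about 2.6 seconds, which is past the 2.5 second figure mentioned above.
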
IMHO, at least in instances such as this, applications (and their
designers) need to take buffering and congestion from other sources into
consideration instead of just making the assumption that their
application is the only one that matters.
There are both technical and educational problems to be dealt with.

Network Hierarchy

The infrastructure from server to your desktop is far more complex today
than even a few years ago, let alone when TCP/IP was first deployed.
In addition to this increase in complexity, a whole new generation of
network engineers and designers has grown up in times of "plenty" (of
network bandwidth and memory for buffering) and has become used to
effectively zero packet loss; they now consider this (zero packet loss)
to be a goal instead of just an artifact of less congested times and
otherwise properly configured systems. They also
didn't grow up in an era of keystroke-oriented interactivity where
latencies were obvious if they got too long. 
Along the way, tools that controlled buffering that had been adequate in
times of fairly low network bandwidth have proven less so in faster
environments. Instead of pushing for better tools, the queue management
in many cases has just been turned off (see previous section on
operating systems for reasons). 
For different reasons, the alternative to packet loss for congestion
control, ECN (Explicit Congestion Notification), has been turned off in
many links and is therefore all but useless even when one particular
portion of the chain from server to client does turn it on. The problem
is, there was a time when it broke some types of network use. That has
since been fixed, but the damage was done.
Again, there are technical and educational problems to deal with along
the network hierarchy: getting ECN turned on, recognizing that there is
such a thing as too much buffering, and getting more recent types of AQM
demanded of manufacturers and turned on by default.
There probably isn't any way of actually limiting the amount of
buffering various manufacturers will put into their equipment since it
seems to be one of the bullet points they differentiate products on -
but at least getting it properly managed will go a long way towards
resolving the problem at those various steps.
So, as you see, there are myriad aspects to this bufferbloat problem
and, as of today's announcement of CeroWrt Release Candidate 5, only one
hardware platform has been worked upon with any degree of
success. There's still lots to do and huge numbers of people and
companies to contact and convince to climb on board.
If you can help or at least want to learn, visit the Bufferbloat site
and read some of the writings of the members to learn what is going on
and where you might fit in.
----------------------------
comments?

richard

On Thu, 2011-08-18 at 13:16 -0700, Dave Taht wrote:
> I note that rc6 is not ready yet, but rc5 is REALLY promising, and
> more people testing that now would make rc6 all the better.
> 
> -- 
> Dave Täht
> SKYPE: davetaht
> US Tel: 1-239-829-5608
> http://the-edge.blogspot.com 

-- 
Richard C. Pitt                 Pacific Data Capture
rcpitt at pacdat.net               604-644-9265
http://digital-rag.com          www.pacdat.net
PGP Fingerprint: FCEF 167D 151B 64C4 3333  57F0 4F18 AF98 9F59 DD73