From: richard
To: Dave Taht
Cc: bloat@lists.bufferbloat.net
Date: Thu, 18 Aug 2011 13:53:31 -0700
Subject: Re: [Bloat] Announcing CeroWrt RC6 (beta) test.

Here's a first cut at the latest after our lunch yesterday - it will be http://digital-rag.com/article.php/InterludeToLinuxcon-BufferbloatBetaTest but is not yet out of draft state. I need a sniff test on some of my assumptions and simplifications if you have time.

----------------------------

Interlude to Linuxcon - Bufferbloat Fix Entering Beta Test Phase
Thursday, August 18 2011 @ 09:01 AM PDT
Contributed by: Richard Pitt

I'm not attending Linuxcon for various reasons - but I'm certainly aware it is on - and I'm watching the proceedings via live stream as and when I can. One of the attendees, Dave Täht, invited me to lunch yesterday and, as we'd never met in meat space, I accepted. We met about 11:30 just outside the Hyatt and I drove us to a little Greek place near Thurlow and Davie.

Dave was suffering mildly from lack of sleep and other maladies, including having lost his glasses "some time in the past 2 weeks." He set up a recorder to document our conversations and, he assured me, I'll get a copy some time in the not too distant future, so I'll detail things then. In the meantime I wanted to give a bit of background to our meeting and to some of the things Dave and others are doing that will affect all of us in the internet community.

I've written previously about bufferbloat and its effects on the response you perceive when you use the internet for large file transfers while at the same time doing "interactive" things like viewing web sites. The large-file packets stuff the buffer and get in the way of the little stuff trying to get through at the same time.
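To put a rough number on that "getting in the way," here is a back-of-the-envelope sketch in Python. The 1 Mbit/s upstream rate and the 256-packet buffer are made-up but plausible figures for a residential link, not measurements from mine:

    # Rough illustration: how long a small interactive packet waits behind
    # a transmit buffer that a bulk transfer has already filled.
    LINK_RATE_BPS  = 1000000   # assumed 1 Mbit/s upstream link
    PACKET_BYTES   = 1500      # full-size bulk-transfer packets
    QUEUED_PACKETS = 256       # assumed depth of a "bloated" buffer

    # Serialization time of everything already ahead in the queue
    wait_seconds = QUEUED_PACKETS * PACKET_BYTES * 8.0 / LINK_RATE_BPS
    print("small packet waits roughly %.1f seconds" % wait_seconds)   # ~3.1 s

Three-plus seconds before a tiny DNS lookup or a mouse click even leaves the house is exactly the sluggishness people blame on "the internet being slow."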
Dave is one of the founders of the technical movement to get this ubiquitous problem fixed all around the net, with its base at www.bufferbloat.net. His latest work has been on "CeroWrt" (pronounced: Sero-wurt), which is, in his words, "a proof of concept best-practices implementation of de-bloated kernel and drivers in a wireless home router" and uses the Netgear N600 WNDR3700v2 hardware platform. He had brought a couple of these with him to the conference and gave me one of them (thanks Dave!) - I'll be installing it in place of the D-Link I currently use to give the non-secure parts of the house their net access, and will be doing testing and evaluation on it.

The fact is, I don't much use this router myself, as my Linux workstation is directly connected to my cable modem (via a dumb switch with minimal buffering) and I've been playing with the settings to deal with bufferbloat directly on it. I have a couple of wireless-connected laptops that do use it, but I don't use them much. Having used interactive services for something well in excess of 30 years, I was drawn to participate in the bufferbloat community in the first place by my own curiosity about the degrading interactive response I was getting recently. On the other hand, many of the systems I administer include wireless segments and are all about streaming - with the Hancock Wildlife Foundation's many wildlife cameras being the prime examples.

The problem of bufferbloat is not simple. It encompasses hardware at the chip level and in aggregate such as core routers, WiFi gear and modems, operating systems, and server- as well as user-level applications, and it differs in impact and potential solution in different areas of the internet's hierarchy. Without being exhaustive or, for that matter, all that technical, I'll try to give you an idea of what I'm talking about.

Hardware

The choice of this single hardware platform (the WNDR3700) as the "test case" is predicated on the fact that simply fixing the chip-level, hardware-level problems in any particular platform is going to be a major headache. There are a lot of different manufacturers, base designs and versions of things like Ethernet transceivers and WiFi radio chips and chipsets, and each has its own set of basic problems and opportunities for driver-based fixes. The objective eventually is to move the solution out to the design of new chips that better allow programmatic and situational control of the on-chip buffers each has, in some fashion, put in place.

The prime problem is that in most cases, once a packet has been handed to the hardware interface chip (from the kernel's buffer pool), there is little that can be done to either remove it (and not the others in the queue) or change its priority or order; and many of these chips today have fairly large buffers, both in absolute memory size and in numbers of packets. If you have the technical savvy and/or the stomach for wading through some fairly intense low-level code, you might like to take a look at the discussion over bug #216 in the current code base - and this involves only the one (ath9k) chip on one hardware platform. Multiply this by the number of different chips "out there" and you get an idea of the scale of the real problem.
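To make the "once it's handed to the chip" point concrete, here is a toy model in Python. It models no real driver or chip - just a plain drop-tail, first-in-first-out queue, with the link rate and packet mix as assumptions:

    import collections

    # Toy model of a drop-tail hardware transmit queue: strictly
    # first-in-first-out, with no way to re-order or expire entries
    # once they have been handed over.
    LINK_RATE_BPS = 1000000                # assumed 1 Mbit/s for illustration

    queue = collections.deque()
    for _ in range(100):
        queue.append(("bulk", 1500))       # 100 full-size bulk packets first
    queue.append(("interactive", 64))      # then one small interactive packet

    elapsed = 0.0
    while queue:
        kind, size = queue.popleft()       # the only operation available
        elapsed += size * 8.0 / LINK_RATE_BPS
        if kind == "interactive":
            print("interactive packet finally sent after %.2f s" % elapsed)

The small packet cannot be promoted, and the bulk packets ahead of it cannot be dropped; all the software above the chip can do is wait.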
In the WiFi bufferbloat arena there are in fact two different and mutually competing "physical" layers in the packet path: 802.3 (aka wired Ethernet) and 802.11 (aka WiFi), and each has its own modus operandi.

The contentious side of the WiFi layer is the desire to stuff as many IP packets into a wireless "frame" as possible, to minimize the overhead percentage of the radio channel. There is a fixed time necessary to send a frame regardless of actual payload size (a larger fixed overhead than Ethernet's per-packet overhead), and the maximum payload is many times the maximum size of an IP packet. The base idea is to boost total throughput of the radio, because that is what each of the manufacturers touts as its particular sales key; the latency of small packets can become huge because of this.

Due to interference from other WiFi clients and access points, there is a good chance that any particular frame will have to be re-transmitted because it was not properly received (and acknowledged). But once the frame is built and handed to the radio chip, there is almost no way to stop it from "infinitely retrying." That holds up the whole set of IP packets in the frame, yet gives no way of feeding back to the originating system that they are "really still in transit, don't send more... just wait" - so again, the TCP retransmit mechanism can end up cutting in and re-sending the same payload again (and again...), which just compounds the problem (and, as you'll see below, ends up backing up in buffers in other links in the chain).

In the "simple" world of wired (and fiber) Ethernet, the story is much the same, except that there is far less likelihood of interference or intermittent failure - the links are either working or they're not, at least with most of today's hardware. Where the problems come in is when something downstream (toward you, the end viewer/user) of the particular equipment slows or stops, and does not provide specific congestion notification back to the originating system, which continues to send new or re-transmit old packets. Again, packets fill the buffers and eventually nothing can get through, including packets returning to the sending system that might otherwise have signaled congestion and slowed the link's packet flow.

This whole discussion points up why some sort of active queue management that can reach into the hardware's (chip-level) buffers is necessary. The problem is, it simply isn't possible today. A simplistic (and not terribly realistic) example would be a timer that limits the lifetime of a TCP/IP packet that has entered the WiFi system but not yet left it to about 2 seconds, which would go a long way towards removing duplicate packets from the stream. Instead, we see frames that might take tens of seconds tying up the link with payloads that have become redundant.
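The lifetime-timer idea is only a thought experiment at this point - as noted, nothing can do this in today's hardware - but sketched in Python it amounts to nothing more than an age check at dequeue time. The 2-second limit is the figure from the paragraph above; the rest is my own assumption:

    import collections
    import time

    MAX_AGE_SECONDS = 2.0          # the ~2 second lifetime suggested above

    queue = collections.deque()    # entries are (enqueue_time, packet)

    def enqueue(packet):
        queue.append((time.time(), packet))

    def dequeue():
        # Drop anything that has sat in the queue longer than its useful
        # lifetime; by now the sender has almost certainly retransmitted it,
        # so transmitting the stale copy only wastes airtime.
        while queue:
            queued_at, packet = queue.popleft()
            if time.time() - queued_at <= MAX_AGE_SECONDS:
                return packet
        return None

Even this crude rule would keep a stalled radio from spending tens of seconds sending payloads the other end has already given up on.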
Operating System

The Linux kernel and generic TCP/IP drivers currently provide a small number of "knobs" that can be adjusted to affect buffering and queue management in a gross fashion. By default, txqueuelen (the transmit queue on a basic Ethernet interface) is today set to 1000 - 1000 packets waiting to be sent out the link - which in itself is fairly large for most typical connections today. Worse, there are places on the internet where, when asked "how do I change txqueuelen?", the information perpetuates the "bigger on fast links is better" thinking that produced the default of 1000 in the first place (it was 100, but as typical Ethernet interfaces moved from 10 to 100 to 1000base it was increased, without much real reason it seems) - thinking that we now know is at best simplistic and at worst just plain wrong, at least with the currently available queue management schemes, if one is turned on at all.

There is also a lot of discussion about the various AQM (active queue management) algorithms. The consensus is that the current crop dealt with the problems of a few years ago fairly well (again, if turned on at all), but that today's completely different mix of activities (streaming protocols and complex web pages with mouse-click interaction, compared to the largely text-oriented, keystroke-based net of 10+ years ago) calls for something new. The problem many network engineers have with the older AQM setups is that they really only work well "if tuned correctly," and the tuning for the circumstances of five years ago is radically different from today's - and today's problems at 5 PM are different from those at midnight - so manual tuning is not going to be practical. This will result (has resulted) in some new algorithms, and in some hard selling on why they should be turned on by default and are not "dangerous," as some have tagged currently available schemes in the past. The problem, as noted, is simply to get them installed and turned on in the face of the past reasons for not using AQM at all.

Applications

At the bare application level, the objective is for applications to need as little awareness of the underlying network as possible. That said, the fact is that "application" in some people's terms (major web applications dealing with millions of users being the prime category) includes tuning of network parameters to give better response - and this is where some things are likely to have to change.

One such area is "fast start," where an application will stuff a fairly large number of packets onto the network without first waiting for acknowledgement that the system at the other end can take them that fast. In an era of minimal buffering between sender and receiver, these initial packets might simply disappear (packet loss) because the receiver was on a link that could not take them in fast enough. Servers typically only sent the first 3 or 4 packets in this fashion, so on average the losses were minimal, as most receiving systems could and did deal with this number in their own receive buffers. Today, some sending systems pre-send 12-15 packets or more. If the link to the receiving system is truly slow (or otherwise congested already), there is a good chance that one or more of the intermediary devices has sufficient buffering that these packets do not get lost, but instead sit in the backbone routers, cable modem or consumer router waiting for the PC to take them and acknowledge them. In some cases, especially if the receiving system is also getting other packets from somewhere else (watching a video stream while surfing the web, for instance), the delay in getting all of these fast-start packets can be on the order of 2 or more seconds (see section 2.1 of RFC 2988 - they discuss 2.5 seconds as "acceptable"), and at that point one of TCP's fall-backs kicks in - and the server might re-send some or all of the packets, causing further congestion.
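Putting rough numbers on that scenario - the link rate, the share taken by the video stream, and the amount already queued in the modem are all assumptions for illustration, not measurements:

    # Rough arithmetic for the "fast start" scenario above.
    LINK_RATE_BPS   = 2000000    # assumed 2 Mbit/s downstream link
    VIDEO_SHARE     = 0.75       # assumed fraction already used by streaming
    BURST_PACKETS   = 15         # "fast start" burst from the server
    PACKET_BYTES    = 1500
    ALREADY_QUEUED  = 200        # assumed packets already sitting in the modem

    spare_bps  = LINK_RATE_BPS * (1.0 - VIDEO_SHARE)
    bits_ahead = (ALREADY_QUEUED + BURST_PACKETS) * PACKET_BYTES * 8
    delay = bits_ahead / spare_bps
    print("last fast-start packet arrives after roughly %.1f s" % delay)  # ~5 s

Well past the 2.5 seconds mentioned above - so the server's retransmit timer can fire and pile yet more copies of the same data into buffers that are already full.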
IMHO, at least in instances such as this, the applications (and their designers) need to take buffering and congestion from other sources into consideration, instead of just assuming that their application is the only one that matters. There are both technical and educational problems to be dealt with.

Network Hierarchy

The infrastructure from server to your desktop is far more complex today than even a few years ago, let alone when TCP/IP was first deployed. In addition to this increase in complexity, a whole new generation of network engineers and designers has grown up in times of "plenty" (of network bandwidth and memory for buffering), used to effectively zero packet loss; and now they consider this (zero packet loss) to be a goal instead of just an artifact of less congested times and otherwise properly configured systems. They also didn't grow up in an era of keystroke-oriented interactivity, where latencies were obvious if they got too long.

Along the way, tools that controlled buffering adequately in times of fairly low network bandwidth have proven less adequate in faster environments. Instead of pushing for better tools, the queue management in many cases has just been turned off (see the previous section on operating systems for reasons).

For different reasons, the alternative to packet loss for congestion control, ECN (Explicit Congestion Notification), has been turned off on many links, and it is therefore all but useless when one particular portion of the chain from server to client does turn it on. The problem is, there was a time when it broke some types of network use. That has since been fixed, but the damage was done.

Again, there are technical and educational problems to deal with along the network hierarchy: getting ECN turned on, getting it recognized that there is such a thing as too much buffering, and getting more recent types of AQM demanded of manufacturers and turned on by default. There probably isn't any way of actually limiting the amount of buffering various manufacturers will put into their equipment, since it seems to be one of the bullet points they differentiate products on - but at least getting it properly managed will go a long way towards resolving the problem at those various steps.

So, as you see, there is a myriad of aspects to this bufferbloat problem, and as of today's announcement of CeroWrt Release Candidate 5, only one single hardware platform has been worked on with any degree of success. There's still lots to do, and huge numbers of people and companies to contact and convince to climb on board. If you can help, or at least want to learn, visit the Bufferbloat site and read some of the writings of the members to learn what is going on and where you might fit in.

----------------------------

comments?

richard

On Thu, 2011-08-18 at 13:16 -0700, Dave Taht wrote:
> I note that rc6 is not ready yet, but rc5 is REALLY promising, and
> more people testing that now would make rc6 all the better.
>
> --
> Dave Täht
> SKYPE: davetaht
> US Tel: 1-239-829-5608
> http://the-edge.blogspot.com
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

--
Richard C. Pitt                 Pacific Data Capture
rcpitt@pacdat.net               604-644-9265
http://digital-rag.com          www.pacdat.net
PGP Fingerprint: FCEF 167D 151B 64C4 3333 57F0 4F18 AF98 9F59 DD73