[Bloat] philosophical question

Jonathan Morton chromatix99 at gmail.com
Mon May 30 11:57:02 EDT 2011


If most of your clients are mobile, you should use a TCP congestion control algorithm such as Westwood+, which is designed for the task: it tries to distinguish between congestion and random packet losses. It is also much less aggressive at filling buffers than the default CUBIC.
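
A minimal sketch of how a Linux server process might opt into Westwood+, assuming the tcp_westwood module is loaded so that the kernel lists the algorithm under the name "westwood". The helper names (pick_congestion_control, read_available, apply_to_socket) and the preference order are illustrative assumptions, not anything from this thread.

```python
import socket

# Standard Linux procfs location listing the algorithms the kernel can use.
AVAILABLE_PATH = "/proc/sys/net/ipv4/tcp_available_congestion_control"


def pick_congestion_control(available, preferred=("westwood", "cubic", "reno")):
    """Return the first preferred algorithm named in the kernel's list, or None."""
    names = available.split()
    for name in preferred:
        if name in names:
            return name
    return None


def read_available(path=AVAILABLE_PATH):
    """Read the kernel's space-separated algorithm list; empty string if unreadable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


def apply_to_socket(sock, name):
    """Select the algorithm for one socket (Linux only, Python 3.6+)."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, name.encode())


if __name__ == "__main__":
    choice = pick_congestion_control(read_available())
    print("would use:", choice)
```

Setting it per socket keeps the change local to the service that talks to mobile clients; the system-wide alternative is the net.ipv4.tcp_congestion_control sysctl.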

Your main bottleneck, even at 2Gbps, is at the uplink to the ISP. That is where you need an AQM-capable router. You have no control over what happens further into the Internet except by turning on ECN. IMHO that is reasonably safe already and more people should do it, but you would be quite justified in running trials and listening for trouble.
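
For the trial suggested above, it helps to know what a given Linux host is currently doing. The sketch below reads the net.ipv4.tcp_ecn sysctl and describes its value, following the meanings given in the kernel's ip-sysctl documentation. The function names are illustrative assumptions.

```python
# Meaning of net.ipv4.tcp_ecn values per the Linux ip-sysctl documentation.
TCP_ECN_MODES = {
    0: "disabled: ECN is neither requested nor accepted",
    1: "enabled: ECN is requested on outgoing connections and accepted on incoming ones",
    2: "passive: ECN is accepted when the peer requests it, but never requested",
}


def describe_tcp_ecn(value):
    """Translate a tcp_ecn value (int or raw file contents) into a description."""
    return TCP_ECN_MODES.get(int(value), "unknown value")


def read_tcp_ecn(path="/proc/sys/net/ipv4/tcp_ecn"):
    """Return the current setting as an int, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_tcp_ecn()
    if current is None:
        print("tcp_ecn not readable on this host")
    else:
        print(current, "->", describe_tcp_ecn(current))
```

A server that only accepts connections already negotiates ECN with willing clients in mode 2; mode 1 matters for hosts that initiate connections.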

What ECN probably needs is a statement from several major players (Red Hat, Canonical, Linus, Apple, Microsoft) that they will unilaterally turn on ECN by default in releases and updates after some flag day. It has, after all, been specified in an RFC and implemented for ages, so any remaining broken networks that actually block ECN packets really have no excuse. Stripping ECN is a slightly less serious problem which will be easier to address afterwards.
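
To make the blocking versus stripping distinction concrete, here is a small sketch of the ECN field in the IP header as defined in RFC 3168: the two low-order bits of the former TOS byte. It models only the IP-level codepoints; the TCP header flags used to negotiate ECN are not covered, and the function names are illustrative assumptions.

```python
# ECN codepoints from RFC 3168 (two least significant bits of the TOS byte).
ECN_CODEPOINTS = {
    0b00: "Not-ECT",  # sender is not ECN-capable
    0b01: "ECT(1)",   # ECN-capable transport
    0b10: "ECT(0)",   # ECN-capable transport
    0b11: "CE",       # congestion experienced, set by an AQM router
}


def ecn_codepoint(tos_byte):
    """Name the ECN codepoint carried in a TOS / traffic-class byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]


def strip_ecn(tos_byte):
    """What a stripping middlebox effectively does: clear ECN, keep DSCP."""
    return tos_byte & 0xFC


if __name__ == "__main__":
    marked = 0xBB  # DSCP bits 101110 with the CE codepoint
    print(ecn_codepoint(marked), "->", ecn_codepoint(strip_ecn(marked)))
```

A stripped packet still arrives, so the connection works but loses its congestion signal; a blocked packet never arrives at all, which is why blocking is the more serious failure.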

If your internal bottleneck is a single dumb switch which supports PAUSE, you shouldn't have much trouble and a basic AQM such as SFQ on your servers may be sufficient. 
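
As a rough illustration of the SFQ suggestion, the sketch below builds the tc command that would install SFQ as the root qdisc on a server interface. The interface name "eth0" and the perturb interval of 10 seconds are assumptions chosen for the example, and the function name is hypothetical; the script only prints the command rather than running it.

```python
def sfq_command(dev, perturb=10):
    """Build the argv for installing SFQ as the root qdisc on one interface.

    perturb is the interval, in seconds, at which SFQ rehashes flows.
    """
    return ["tc", "qdisc", "replace", "dev", dev, "root",
            "sfq", "perturb", str(perturb)]


if __name__ == "__main__":
    # Print only; actually applying it requires root, e.g. via subprocess.run.
    print(" ".join(sfq_command("eth0")))
```

Using "replace" rather than "add" means the command can be rerun without failing when a root qdisc is already present.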

The key to knowledge is not to rely on others to teach you it. 

On 30 May 2011, at 18:29, "George B." <georgeb at gmail.com> wrote:

> On Mon, May 30, 2011 at 5:25 AM, Dave Taht <dave.taht at gmail.com> wrote:
>> 
>> 
>> On Sun, May 29, 2011 at 10:24 PM, George B. <georgeb at gmail.com> wrote:
>>> 
>>> Ok, say I have a network with no oversubscription in my net.
>> 
>> I'd love to see one of those. Can I get on it?
> 
> Well, we currently have the potential for some microburst oversub
> inside the data center but not too much of it.  I can take a 48-port
> GigE switch and have 40G of uplink but the switches aren't fully
> populated yet.  Bottlenecks are currently where we might have 25 front
> end servers talking on GigE to a backend server with 20G.  So some
> potential for internal microburst oversub but that's beyond the scope
> of this discussion.
> 
>>> 
>>> I have
>>> 10G to the internet but am only using about 2G of that.  This is the
>>> server side of a network talking to millions of clients.  The clients
>>> in this case are on "lossy" wireless networks where packet loss is not
>>> an indication of congestion so much as it is an indication that the
>>> client moved 15 feet behind a pole and had poor network connectivity
>>> for a few minutes.
>>> 
>> Or is using multicast.
> 
> Multicast is a fact of life with which one is going to have to learn
> to live.  Better to somehow get the gear handling it in a better
> fashion, in my opinion.
> 
>>> The idea being that in today's internet, packet loss is not a good
>>> indication of congestion.  Often it just means that the radio signal
>>> has been briefly interrupted.  What I need is something that can tell
>>> the difference between real congestion and radio loss.  ECN seems to
>>> be the way forward in that respect.
>>> 
>> Yes. When it works. Which is rarely.
> 
> I have enabled ECN (been following various bufferbloat discussions for
> a while) on a couple of machines and also my own machine (my own in
> order to see where it might cause any problems browsing) without any
> problems so far.  "Back in the day" when ECN first came out on Linux,
> it was enabled by default and caused all sorts of issues with sites
> that simply drop packets with either/any of the ECN bits set.  So far
> there haven't been any issues that I have run into with ECN set on my
> Windows laptop.  Once I am convinced that setting those bits
> isn't going to cause problems, I will roll that out in a more general
> fashion. But if networks upstream from us clear those bits anyway, I'm
> not convinced what difference it will make.
> 
> There is also one fairly small subnet in the overall network where I
> have enabled "random-detect ecn" with a policy map on a potentially
> oversubscribed link.  But that is the only router in the network that
> even supports ECN.  I have sent an inquiry to the manufacturer of the
> rest of the gear about supporting ECN with their WRED implementation
> but haven't heard anything from them on the subject.
> 
>>> But assuming my network, as a server of content, is not
>>> oversubscribed, what would you suggest as the best qdisc for such a
>>> traffic profile? In other words, I am looking at this from the server
>>> aspect rather than from the client aspect.
>>> 
>> 
>> Ah, ok. This was discussed in this loooong thread:
>> 
>> https://lists.bufferbloat.net/pipermail/bloat/2011-March/000272.html
>> 
>> Some form of fair queuing distributes the load to the ultimate end nodes
>> better.
> 
> Ok, as we are using Linux (mostly) for the servers talking to the
> clients, it shouldn't be much of an issue to put into place.  Thanks
> for the pointer to the thread and I will watch as things develop and
> see how things go.
> 
>> As for which packet scheduler to choose for that? Don't know, I'm just
>> trying to get to where we can actually test stuff on the edge gateways at
>> this point.
> 
> Yeah, what I am most interested in are things like smart
> phones/laptops/tablets and not necessarily on WiFi but also on 3G/4G
> networks. Those things are pulling a lot of traffic these days and the
> network can be lossy at times.  From my own analysis of traffic
> captures, it is fairly easy to see when a device that is "on the move"
> changes cell towers.  You get a burst of resends and often some out of
> order packets and then things settle down for a while.  This isn't so
> big of a deal if you have only a few mobile clients but sites that
> cater to mobile content might have millions of such clients connected
> at any given time with many of them in a state where they have
> marginal connectivity or are in the process of moving between towers.
> So the TCP notion that "packet loss == congestion" doesn't apply in
> those networks.  With those, packet loss is just packet loss and
> shouldn't be treated as congestion.  This is why I think it is so
> important to get ECN working across the Internet.  But even with ECN
> capable end points, if the routers in the middle are not capable of
> using ECN to signal congestion and simply drop packets, there is
> always a question of why the packet was lost.
> 
> We need to hammer on our vendors a bit and get them properly
> supporting ECN to signal congestion on ECN aware flows.
> 
>> Dave Täht
> 
> 
> Thanks, all!
> 
> 
> 
> g
> _______________________________________________
> Bloat mailing list
> Bloat at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
