[Codel] [PATCH 1/2] codel: Controlled Delay AQM

Jim Gettys jg at freedesktop.org
Mon May 7 21:15:12 EDT 2012


On 05/07/2012 04:03 PM, Eric Dumazet wrote:
> On Mon, 2012-05-07 at 15:36 -0400, Jim Gettys wrote:
>
>> I think it is safe for it to behave the rest of the way Linux ECN
>> support does right now: it only gets used if the peer requests it.
>>
>> Not clear to me there needs to be/should be any option at all: the last
>> conversation I had with Steve Bauer was that something north of 20% of
>> conversations were ECN capable. Is there one for the other instances of
>> ECN support in Linux?  If so, it should be keyed by the same variable,
>> and not be a one-off for codel.
>>
> SFB, one of the latest qdiscs added to Linux, has ECN support enabled.
>
> There is no option to disable it, because I felt it was safe. Maybe I
> was a fool, but problem is I am not sure SFB is even used.

Client initiated ECN may still be problematic.  It is *fine* if a system
responds to ECN that is sent to it.  We know that is fine, as Linux has
had the server side of ECN enabled for quite a few years, and those
systems now are > 20% of servers on the internet.
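
For reference, here is a rough sketch of how that client/server distinction
maps onto the Linux setting. It assumes the net.ipv4.tcp_ecn sysctl as
described in the kernel's ip-sysctl documentation (0 = ECN off, 1 = also
request ECN on outgoing connections, 2 = only use ECN when the peer asks
for it, which is the default). The helper names are made up for
illustration.

```python
from pathlib import Path

# net.ipv4.tcp_ecn values -> (requests ECN as client, accepts ECN as server).
# Meanings taken from the kernel's ip-sysctl documentation.
TCP_ECN_MODES = {
    0: (False, False),  # ECN disabled entirely
    1: (True, True),    # client-initiated ECN plus server side
    2: (False, True),   # server side only: respond if the peer requests it
}


def describe_tcp_ecn(value):
    """Return (requests_ecn, accepts_ecn) for a tcp_ecn sysctl value."""
    if value not in TCP_ECN_MODES:
        raise ValueError(f"unexpected tcp_ecn value: {value!r}")
    return TCP_ECN_MODES[value]


def read_tcp_ecn(path="/proc/sys/net/ipv4/tcp_ecn"):
    """Read the current setting, or return None if the file is absent
    (for example on a non-Linux system)."""
    sysctl = Path(path)
    if not sysctl.exists():
        return None
    return int(sysctl.read_text().strip())


if __name__ == "__main__":
    current = read_tcp_ecn()
    if current is None:
        print("tcp_ecn sysctl not available on this system")
    else:
        requests, accepts = describe_tcp_ecn(current)
        print(f"tcp_ecn={current}: requests ECN={requests}, accepts ECN={accepts}")
```

The "safe" case described above corresponds to the value 2; the case that
may still be problematic is the value 1.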

The issue is that there are both networks and middleboxes that are
misconfigured/broken.  In some cases, they black-hole a connection and
your data just doesn't go through; in others, ECN is ignored; in others
the ECN bits are cleared or mangled; and, most worryingly, there are
some devices on the order of a decade old that just crash if they see an
ECN marked packet.  We do not know how common the last of these problems is.
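
To make the "cleared or mangled" cases concrete, here is a minimal sketch
that compares the ECN field of a packet as sent with the same packet as
received. It assumes the RFC 3168 layout (the ECN field is the low two bits
of the IP TOS / traffic class byte) and the RFC 3168 codepoints; the
function name and the category labels are my own, not from any tool.
Black-holing and crashing devices cannot be seen this way, since nothing
arrives to compare.

```python
# ECN codepoints from RFC 3168 (low two bits of the TOS / traffic class byte).
NOT_ECT = 0b00  # sender is not ECN-capable
ECT_1 = 0b01    # ECN-capable transport, codepoint ECT(1)
ECT_0 = 0b10    # ECN-capable transport, codepoint ECT(0)
CE = 0b11       # congestion experienced

ECN_MASK = 0b11


def classify_ecn_change(sent_tos, received_tos):
    """Classify what happened to the ECN field between sender and receiver.

    Only the low two bits are compared, so DSCP rewrites are ignored.
    """
    sent = sent_tos & ECN_MASK
    received = received_tos & ECN_MASK

    if received == sent:
        return "intact"
    if sent in (ECT_0, ECT_1) and received == CE:
        # Legitimate behavior: a queue on the path signalled congestion.
        return "congestion-marked"
    if received == NOT_ECT:
        # The path zeroed an ECN field that the sender had set.
        return "cleared"
    # Any other transition (e.g. ECT(0) -> ECT(1), or bits appearing on a
    # Not-ECT packet) is not something a well-behaved path should do.
    return "mangled"


if __name__ == "__main__":
    print(classify_ecn_change(ECT_0, CE))
    print(classify_ecn_change(ECT_0, NOT_ECT))
    print(classify_ecn_change(ECT_0, ECT_1))
```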

Anecdotal evidence: OpenWrt tried turning on the client side of ECN
several years ago, and had to back off due to too many bug reports. 
CeroWrt is small enough (and time has passed), so I am all for Dave
turning on ECN for CeroWrt and we can get a feeling for how common the
problem still is.

Steve Bauer has been studying ECN for the last couple of years (and, at
times, getting various networks fixed, beginning with MIT's own network
and, later, some of the biggest commercial and academic networks).

There are several papers on what Steve (et al.) have found, but I don't
have the references handy.

But it's crashing the end user boxes that has been most problematic, and
we don't know how common that is.  There *used* to be a database of such
broken hardware, but it has bit-rotted.

So let's check with Steve Bauer on this year's results.
>
>> If you wanted to test ECN separately from drop with codel, then you'd
>> just request ECN in the conversation (by default, OS's don't normally
>> request ECN today, as the remaining brokenness gets sorted out).
> Since ECN is not mentioned in Codel paper, this means no simulation was
> done to study the possible effects.
>
> So it's probably better to leave ECN as an option. We can change the
> default later.
Yup.  Once we know if the current state of the net is good enough.

What we don't want is to get a pile of codel bug reports that are really
ECN related problems, of which we know there are quite a few.
                            - Jim





