[Cerowrt-devel] SQM in mainline openwrt, fq_codel considered for fedora default

Dave Taht dave.taht at gmail.com
Tue Oct 21 15:21:04 EDT 2014

On Tue, Oct 21, 2014 at 11:06 AM, Tom Gundersen <teg at jklm.no> wrote:
> On Tue, Oct 21, 2014 at 7:44 PM, Michal Schmidt <mschmidt at redhat.com> wrote:
>> On 10/21/2014 07:24 PM, Tom Gundersen wrote:
>>> I have now subscribed to cerowrt-devel (long overdue), and I would

I am curious if you or Michal are also openwrt or cerowrt users? Or
are you running things like sch_fq or fq_codel on your desktops and
servers?

Having native, first-hand experience with this stuff would be a good
guide. There is a lot to like about the new fq scheduler for servers
and maybe for hosts.

And "cake" continues to progress.

>>> very much appreciate any comments you guys may have on our networking
>>> work in systemd. In particular, if there are any more tweaks like
>>> making fq_codel the default, which would be the reasonable choice for
>>> 95% of users (most of whom don't know about these things and would
>>> otherwise never touch them), we are very open to suggestions.
>> An idea: Can networkd configure interfaces' txqueuelen?
>> (Though with BQL and codel maybe it's not that important anymore.)

One thing that is missed by people who calculate BDP is that they
usually do the math only one way, with the biggest packet size or an
average packet size. There are several problems with this:

1) With the advent of TSO and GSO offloads, packets on servers can
bloat up to 64k each. Multiply this by 1256 packets (txqueuelen plus
the typical size of a tx ring) and you can see all the pre-BQL,
pre-codel latency in all its glory, particularly at lower rates.

There's a paper on this...

2) Most client workloads are ack-dominated, tending towards 66 bytes
per packet, with some larger packets for http get requests, dns and voip.
At this level a queue of 1000 small packets holds three orders of magnitude
fewer bytes, and until some recent work, could be starved by other
processing on the system.

3) txqueuelen only has an effect on certain qdiscs. In the case of
pfifo_fast you can and do actually hit that limit, but in the AQMs
(*codel, pie, red, ared, sfqred) the limit is just there to keep from
running out of resources - it is otherwise really hard to hit, as
those qdiscs start dropping or marking packets long before the limit
is reached.
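To put rough numbers on points 1 and 2, here is a back-of-envelope
sketch using the figures above (1256 packets, 64k superpackets,
66-byte acks, a 100Mbit drain rate - all illustrative, not measured):

```shell
# Worst-case queue delay, using the figures from the text (a sketch;
# rates and sizes are illustrative, not measured on real hardware).
PKTS=1256                    # txqueuelen 1000 + a typical 256-entry tx ring
SUPERPKT=$((64 * 1024))      # worst-case TSO/GSO superpacket, in bytes
RATE=100000000               # drain rate: 100 Mbit/s

# Case 1: the queue fills with 64k superpackets
echo "superpacket delay: $((PKTS * SUPERPKT * 8 * 1000 / RATE)) ms"   # 6585 ms

# Case 2: the same txqueuelen (1000) fills with 66-byte acks
echo "ack delay: $((1000 * 66 * 8 * 1000 / RATE)) ms"                 # 5 ms
```

Over three orders of magnitude apart for the "same" queue length,
which is the point: a packet count tells you almost nothing about delay.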

So... fiddling with txqueuelen or the ring buffer sizes is something
of a losing game. A qdisc (like bfifo or *codel) that buffers up acks
or big packets against a byte limit, rather than a packet limit, is
saner, along with BQL underneath.
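As a sketch of what that looks like with tc (the interface name eth0
and the 100kb figure are illustrative; both commands need root):

```shell
# bfifo is the byte-counted fifo: its limit is in bytes, not packets,
# so a burst of 64k superpackets and a burst of acks hit the same cap.
tc qdisc replace dev eth0 root bfifo limit 100kb

# Better still, fq_codel manages the queue by sojourn time per flow,
# making the hard limit almost irrelevant in practice.
tc qdisc replace dev eth0 root fq_codel
```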

> Hm, the way I read the docs, figuring out the "good" values is not
> that straight-forward, and doing this will anyway be obsolete soon, so
> not sure we should be setting anything by default.

Tend to agree.

I am generally allergic to TSO/GSO/GRO/LRO offloads at speeds below
100mbit (although, sigh, I found one still-shipping box from alix
with a geode in it that benefits slightly from gso, being able to push
out 60Mbit rather than 40),

and certainly txqueuelen is just plain too big at these speeds, but
you'd have to detect the link rate in order to change it to something
sensible.
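One way such detection might be scripted (a sketch only; assumes an
eth0 interface whose driver reports its speed via sysfs, and needs root):

```shell
IFACE=eth0
# /sys/class/net/<iface>/speed reports the link rate in Mbit/s (-1 if unknown)
speed=$(cat "/sys/class/net/$IFACE/speed" 2>/dev/null || echo -1)
if [ "$speed" -gt 0 ] && [ "$speed" -le 100 ]; then
    ethtool -K "$IFACE" tso off gso off gro off   # drop the offloads at low rates
    ip link set "$IFACE" txqueuelen 100           # and shrink the software queue
fi
```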

These show the difference in pfifo_fast on the current beagle at
100mbit with txqueuelen 1000 and 100, offloads off.


(there are a ton of results on the beagle in this directory, at
different speeds and levels of buffering, from before I got around to
actually adding BQL to it; even more results are in this dir, with
data sets easily compared via netperf-wrapper)



You can see that BQL makes the most difference in the latency.

I keep hoping for saner tuning of these offloads at higher speeds on
better hardware, but as of the last kernel version I tested
thoroughly, TSO/GSO still appears to be needed on devices with gigE
interfaces.


And then there's (sigh) wifi.

> However, we
> probably should make it much simpler to configure. We could add
> support for both ringbuffer and queue-length sizes to our link files [0],
> so admins could implement the bufferbloat recommendations by doing
> something like:
> ----8<------
> /etc/systemd/network/00-wlan.link
> [Match]
> Type=wlan

As for wifi, there is much now published on all the problems there.

I did a recent summary at the IEEE of what seems to be needed (see pp 23- )


There is no ring buffer. Often tuning down txqueuelen is a very good
idea, as today's wifi drivers are MASSIVELY overbuffered. Better to
apply fq_codel for now, and work on restructuring that entire subsystem.
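That stopgap can be sketched from the shell (wlan0 and the queue
length of 50 are my own illustrative choices; needs root):

```shell
# Shrink the oversized software queue feeding the wifi driver...
ip link set wlan0 txqueuelen 50
# ...and put fq_codel on top so whatever queueing remains is managed.
tc qdisc replace dev wlan0 root fq_codel
```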

> [Link]
> TransmitRingBuffer=4
> TransmitQueueLength=16

Regrettably, many devices do not respond to tuning such as this. The
e1000e, for example, doesn't let you go below 64 entries in the ring
buffer, and the ar71xx allows it, but crashes...

(thankfully both have BQL)
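For reference, ring sizes are adjusted with ethtool; a sketch (assumes
eth0, needs root) - though as noted above, drivers may clamp or
mishandle small values:

```shell
# Inspect the current and maximum rx/tx ring sizes for the device
ethtool -g eth0
# Attempt to shrink the tx ring; e.g. the e1000e won't go below 64 entries
ethtool -G eth0 tx 64
```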

> ----8<------
> (suggestions welcome for the naming of the variables and also for man
> page sections).
> These settings would then be applied by udev to any udev interface as
> it appears (and before libudev notifies applications about its
> existence).
> Does something like this make sense?

Regrettably, no. I think printing a warning somewhere, when BQL is not
detected on an interface coming up, would do more toward getting BQL
adopted fully:

"BQL not detected on interface X, latency may be compromised, beg your
vendor for BQL support" -
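A check along those lines can be sketched against the sysfs directory
that BQL-capable drivers expose (the helper name check_bql and the
interface name eth0 are my own illustration, not an existing tool):

```shell
# check_bql IFACE: drivers with BQL expose a byte_queue_limits directory
# per tx queue in sysfs; its absence means the driver lacks BQL support.
check_bql() {
    if [ -d "/sys/class/net/$1/queues/tx-0/byte_queue_limits" ]; then
        echo "BQL present on $1"
    else
        echo "BQL not detected on interface $1, latency may be compromised"
    fi
}

check_bql eth0   # output depends on the driver behind eth0
```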

> Cheers,
> Tom
> [0]: <http://www.freedesktop.org/software/systemd/man/systemd.link.html>
> [1]: <http://www.bufferbloat.net/projects/bloat/wiki/Linux_Tips>

Dave Täht


More information about the Cerowrt-devel mailing list