[Bloat] [Cerowrt-devel] cerowrt 3.3.8-17: nice latency improvements, some issues with bind

Fri Aug 17 14:05:27 EDT 2012

I'm widening the distribution of this email a little bit in light of
the benchmark results (somewhat too far) below, and some of the other
issues raised.

On Fri, Aug 17, 2012 at 1:52 AM, Török Edwin
<edwin+ml-cerowrt at etorok.net> wrote:
> On 08/13/2012 09:08 AM, Dave Taht wrote:
>> I'm too tired to write up a full set of release notes, but I've been
>> testing it all day,
>> and it looks better than -10 and certainly better than -11, but I won't know
>> until some more folk sit down and test it, so here it is.
>>
>> http://huchra.bufferbloat.net/~cero1/3.3/3.3.8-17/
>>
>> fresh merge with openwrt, fix to a bind CVE, fixes for 6in4 and quagga
>> routing problems,
>> and a few tweaks to fq_codel setup that might make voip better.
>>
>> Go forth and break things!
>
> Hi,
>
> This is the first cerowrt that I tried on my router (was using Openwrt before), and I'm quite happy
> with the latency improvements on WiFi (see below).
>
> However I've encountered some issues with bind. After powering on the router this morning DNS wasn't working,
> and logread showed a lot of errors from bind about a broken trust chain on every domain name.
> Any idea what could've caused this behaviour?

This is http://www.bufferbloat.net/issues/113 (relevant bugs have also
been filed in the dnssec and ntp databases).

It's a long-standing circular problem: ntp needs dns to look up its
servers, while dnssec needs the time to be accurate within an hour
before it will validate anything. I have tried multiple ways to fix
it, and the workaround in place doesn't always succeed (and now has a
bug parsing ntpdc output, too).

It's been my hope that ntp would evolve to do more of the right thing
or that bind would, or that the issues with the dnsval patches would
get resolved, but that involves getting someone to step up to address
them, and I have been too focused on the bufferbloat issue personally
to fight this one of late.

It's a PITA. The workarounds are:

0) in case of failure -

rndc validation disable
/etc/init.d/ntp restart

(this is basically what the workaround attempts to do. The patch to
ntp is supposed to disable dnssec validation but doesn't work under
some scenarios)

1) Disable dnssec entirely in bind

turn off validation in the conf file

2) Use dnsmasq instead of bind (no dnssec there, either). How to do that is documented here:

https://plus.google.com/101384639386588513837/posts/Cgvfn8m9XuC

3) Other workarounds and patches gladly accepted. (this sort of work
can be done on a conventional x86 box). The simplest thought I have is
to hammer validation off and get initial time via something other than
ntp - some web service. It would be better if ntp did the work
directly.
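
Workaround 0 and the idea in 3 could be combined roughly as below. This
is only a sketch under assumptions: the function names, the MIN_EPOCH
cutoff and the use of an HTTP Date header as a coarse time source are
illustrative choices, not what cerowrt ships. Only the rndc and init.d
commands come from the text above, and fetching the header is left out.

```shell
#!/bin/sh
# Sketch only: helper names and MIN_EPOCH are hypothetical, not part of cerowrt.

# Any clock earlier than this is assumed to belong to a fresh boot without a
# battery-backed clock. 1345161600 is 2012-08-17 00:00:00 UTC.
MIN_EPOCH=1345161600

# clock_is_plausible NOW [MIN]: succeeds if epoch NOW is not before MIN.
clock_is_plausible() {
    [ "$1" -ge "${2:-$MIN_EPOCH}" ]
}

# http_date_to_iso "Date: Fri, 17 Aug 2012 18:05:27 GMT"
# prints "2012-08-17 18:05:27"; prints nothing for any other header line.
http_date_to_iso() {
    printf '%s\n' "$1" | tr -d '\r' | awk '
        BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
                for (i = 1; i <= 12; i++) mon[m[i]] = i }
        tolower($1) == "date:" && ($4 in mon) {
            printf "%04d-%02d-%02d %s\n", $5, mon[$4], $3, $6
        }'
}

# recover_dns: never called automatically, since it changes system state.
recover_dns() {
    clock_is_plausible "$(date +%s)" && return 0
    rndc validation disable      # from workaround 0
    /etc/init.d/ntp restart      # from workaround 0
}
```

The converted string is in a form that could be handed to
date -u -s "..." to get the clock within the hour that dnssec needs,
after which ntp can take over.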

I note that regardless, if your ISP provided DNS forwarder can be
trusted, it's a good idea to point bind's forwarders.conf to that, so
as to get best DNS performance out of bind. Automating "is my local
ISP's DNS trustable" is something also on the very long outstanding
"todo" list....

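A forwarders entry might look something like the fragment below. This
is a sketch, assuming forwarders.conf is included from inside the
options block of bind's main config; 192.0.2.53 is a documentation
placeholder, not a real resolver, and should be replaced with the
address your ISP hands out.

```
// forwarders.conf (sketch; 192.0.2.53 is a placeholder address)
forwarders {
    192.0.2.53;
};
```
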
>
> Note that my internet connection is through PPPoE, so when bind starts on boot it might not have IPv4 network connectivity yet.
> There's also a tiny delay between IPv4 and IPv6 connectivity, because IPv6 prefix is obtained using dhcp6c after PPPoE has connected.

Hmm. This makes the ongoing issues with getting accurate time on boot
even more severe.

A battery backed up clock, or gps provided time, would be good, too.
Using GPS provided time is one of the solutions under consideration
for the edge to edge measurement project.

>

> Another minor issue is that p910nd and luci-app-p910nd were not available via opkg install, but I found them on openwrt.org, so that works now.

I don't know what they are but I can enable them in the next build.

> DHCPv6-PD had to be configured manually of course, same as with openwrt, the difference is that I only get IPv6 on wired interfaces now,
> and not on wireless.
> That seems to be by design because the interfaces are not bridged anymore and I get only a /64 from my ISP (slan_len 0), so can't really create
> more sub-networks from it.

Multiple providers seem to think that a single /64 "is all you
need", despite the prevalence of guest and other sorts of secondary
networks on ipv4. This is a HUGE problem on the current native ipv6
deployments.

Note that it's not exactly fair to blame the providers; most of the
home CPE gear they are dealing with can barely handle ipv6 in the
first place, being based on ancient kernels and specifications. That
gear is improving, all too slowly, with things like openwrt/cerowrt in
the lead.... (apple seems to be doing fairly well, too)

Having only a single /64 delegated makes ipv6 unusable IMHO.

I (or rather juliusz) solved the single /64-only problem years ago by
switching to using babel and ahcp, which pushes out ipv6 /128 ips.
This method has the added benefit of making switching between multiple
wired and wireless APs utterly transparent, even for long held TCP
connections.

I run my own networks this way whenever possible, as it's *really
nice* to be able to unplug and not lose 20 ssh connections, and plug
back in, to get bandwidth, and have babel figure out the right way to
go automagically.

However, fixing both the APs and the hosts (via adding ahcp and babel)
is kind of fixing a global infrastructure issue that is hard to get
the rest of the world to agree to, and things like network manager
don't think this way, either... But I'm glad to see progress being
made in homenet towards having a flooding prefix distribution protocol
based on something like ahcp. This will cut down on NAT usage in ipv4
and lead to a more flexible network in the future. And I'm sure more
and more native deployments will delegate /60s or better in the
future.

Using dhcpv6 it is also possible to do allocations of /80s but this
breaks the 95% of all devices that only can do SLAAC.

It is best to get at least a /60 delegation from the ISP.
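
The arithmetic behind that is simple: SLAAC expects a /64 per link, so
a delegation of length N yields 2^(64-N) usable subnets. A small sketch
(count_64s is a made-up helper name, purely for illustration):

```shell
#!/bin/sh
# count_64s LEN: number of /64 subnets inside a delegated prefix of length LEN.
# Prints 0 when the delegation is longer than /64, since no /64 fits in it.
count_64s() {
    if [ "$1" -gt 64 ]; then
        echo 0
    else
        echo $(( 1 << (64 - $1) ))
    fi
}

for len in 48 56 60 64 80; do
    echo "/$len -> $(count_64s "$len") x /64"
done
```

A single /64 gives exactly one SLAAC-capable network, a /60 gives 16,
and the /48s from 6to4 and 6in4 give 65536.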

My way of coping with the half-arsed single /64 delegation ipv6 native
deployments I've dealt with thus far has been 6to4 and 6in4, which do
/48s. And kvetching, loudly, in every direction. And trying to make
dhcpv6 work better, as well as ahcp, and many other aspects of ipv6,
such as classification.

> Onto the good news, here are some measurements (ping time / bandwidth from my laptop connected through WiFi to my desktop connected through GbitE):
>
> no fq_codel on laptop, openwrt, wlan0 5Ghz: 0.859/174.859/923.768/198.308 ms; 120 - 140Mbps
> w/ fq_codel on laptop, openwrt, wlan0 5Ghz: 1.693/ 26.727/ 54.936/ 11.746 ms; 120 - 140Mbps
> no fq_codel on laptop, cerowrt, wlan0 5Ghz: 2.310/ 15.183/140.495/ 30.337 ms; 75 - 85 Mbps
> w/ fq_codel on laptop, cerowrt, wlan0 5Ghz: 1.464/  1.981/  2.223/  0.221 ms; 75 - 85 Mbps
>
> The latency improvement is awesome, and I don't really mind the sacrificed bandwidth to accomplish it.

A man after my own heart.

Thx! The industry as a whole has been focused on "bandwidth at any
cost, including massive latency", which leads to things like the ~1
second delays you observed on your fq_codel-less test (and far worse
has been observed in the field). We're focused on improving latency,
because as Stuart Cheshire says: "once you have latency, you can't get
it back".

We hope that once some other concepts prove out, we can keep the low
latency and add even more bandwidth back.

http://www.bufferbloat.net/projects/cerowrt/wiki/Fq_Codel_on_Wireless

In day-to-day use the lowered latency and jitter in cero currently can
be really "felt" particularly in applications like skype and google
hangouts, and web pages (under load) feel much faster, as DNS lookups
happen really fast...

and (as another example), things like youtube far more rarely "stall out".

It's kind of hard to measure "feel", though. I wish we had better
benchmarks to show what we're accomplishing.

> Is the bandwidth drop intended though? When enabling fq_codel just on my laptop I didn't notice any bandwidth drop at all.

The core non-fq_codel change on cerowrt vs openwrt and/or your laptop
is that the aggregation buffer size at the driver level has been
severely reduced in cerowrt, from its default of 128 buffers to 3.
This means that the maximum aggregate size has been cut to 3 packets
from ~42, but more importantly, the total outstanding buffers not
managed by codel have been cut to 3, rather than 128....
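
To get a rough feel for why the unmanaged buffer count matters, here is
a back-of-the-envelope sketch of how long a full driver queue takes to
drain. The assumptions are mine: one 1500 byte packet per buffer, 80
Mbit/s as roughly the throughput measured on cerowrt, and 6 Mbit/s
standing in for a poor link. It ignores aggregation, retries and MAC
overhead, and drain_ms is a hypothetical helper name.

```shell
#!/bin/sh
# drain_ms PACKETS MBITS [BYTES]: milliseconds needed to transmit PACKETS
# packets of BYTES bytes each at MBITS Mbit/s. BYTES defaults to 1500.
drain_ms() {
    awk -v p="$1" -v r="$2" -v b="${3:-1500}" \
        'BEGIN { printf "%.2f\n", p * b * 8 / (r * 1000) }'
}

for rate in 80 6; do
    echo "$rate Mbit/s: 128 buffers $(drain_ms 128 "$rate") ms, 3 buffers $(drain_ms 3 "$rate") ms"
done
```

Under those assumptions 128 buffers hold about 19 ms of data at 80
Mbit/s and 256 ms at 6 Mbit/s, versus 0.45 ms and 6 ms for 3 buffers.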

The fact that this costs so little bandwidth (40%) in exchange for
reducing latency and jitter by 25x (or 400x compared to no fq_codel at
all) suggests that in the long run, once we come up with fixes to the
mac80211 layer, we will be able to achieve better utilization,
latency, AND jitter overall than the current hw deployed everywhere.
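
For reference, those figures can be rederived from the quoted results.
The sketch below assumes the comparison is on maximum ping time and
takes the midpoints of the quoted throughput ranges; ratio and pct_drop
are just illustrative helper names.

```shell
#!/bin/sh
# ratio A B: A divided by B, rounded to the nearest integer.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", a / b }'
}

# pct_drop BEFORE AFTER: percentage lost going from BEFORE to AFTER.
pct_drop() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", (a - b) * 100 / a }'
}

# Maximum ping times (ms) from the quoted results.
echo "openwrt + fq_codel vs cerowrt + fq_codel: $(ratio 54.936 2.223)x"
echo "openwrt, no fq_codel vs cerowrt + fq_codel: $(ratio 923.768 2.223)x"
# Midpoints of the quoted throughput ranges (120-140 and 75-85 Mbps).
echo "throughput drop: $(pct_drop 130 80)%"
```

This prints 25x, 416x and 38%, in line with the rounded numbers above.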

If you'd like to have more bandwidth back, you can jiggle the qlen_*
variables in the debloat script up, but remember that tcp's reaction
time is quadratic as to the amount of buffering. I'd be interested in
you repeating your benchmark after doing that. The difference between
3 buffers and 8 is pretty dramatic...

Personally I'd be happy if we could hold wifi jitter below 30ms, and
typical latency below 10ms, in most (home) scenarios. I think that is
eminently doable, and a reasonable compromise between cero's all-out
assault on latency and the marketing need for more bandwidth. fq_codel
all by itself gets close (the fair queuing part helps a lot).

'Course I'd love it if low latency became the subject of all-out
marketing wars between home gateway makers, and between ISPs, with
even 1/100 of the technical resources thrown at the problem that have
been expended on raw bandwidth.

Possible themes:

"Hetgear": Frag your friends, faster!
"Binksys": Torrents and TV? no problem.
"Chuckalo": making DNS zippy!

"Boogle fiber: now with 2ms cross town latency!"
"Merizon: Coast to coast in under 60ms!"
"Nomcast: Making webex work better"

But we're not living in that alternate reality (yet), although I think
we're beginning to see some light at the end of the tunnel.

That said, there are infrastructural problems regarding the overuse of
aggregation everywhere, in gpon, in cable modems and CMTSes, and in
other backbone technologies, in addition to the queue management
problem. It's going to be a hard slog to get back to where the
distance between your couch and the internet is consistently less than
between here and the moon.

But worth it, in terms of the global productivity gain, and lowered
annoyance levels, worldwide.

...

> AFAICT my router is the only radio on 5Ghz and it is configured the same way as openwrt was (HT40+).
>
> Note: I use WPA2-PSK, and I disabled the other two SSIDs on the 5Ghz.
>
> Best regards,
> --Edwin
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel

-- 
Dave Täht
http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
with fq_codel!"