[Cerowrt-devel] cerowrt_stability?

Sat Jun 7 13:55:05 EDT 2014

On Sat, Jun 7, 2014 at 5:38 AM, Török Edwin <edwin+ml-cerowrt at etorok.net> wrote:
> On 06/06/2014 05:46 PM, Dave Taht wrote:
>> 1) How many are encountering bug 442 regularly?
>>
>> Getting it to occur is hard for me. I've only seen it once in the last
>> several weeks; jg can have it happen within two days. It mostly
>> seems to occur in conditions of poor signal strength, near as I can tell.
>>
>> 2) Aside from that, how are things?
>
> I am quite happy with 3.10.40-5, although I might try -6 soon as apparently that has a new dnsmasq.
> I don't use wireless that often these days anymore, so I can't say about that bug.
> I didn't have troubles with DNSSEC with the default config.
> IPv6 works reliably too, in fact it is too reliable :) It happened once that DHCP/IPv4 was broken, but IPv6 still worked.
>
> There are just 2 strange things that weren't reproducible [1], sorry that I can't give you more than anecdotal evidence:
> 1.  I booted the router, the interface went up, I read my email, tried to search on startpage.com, but it was down (or so I thought),
> so I searched Google, but then none of the links in Google worked ... ah, there is no IPv4 address ...
>   Running DHCP didn't give me anything, and manually setting an IP address on eth0 didn't help either as I wasn't able to ping / ssh the router on IPv4
> (and apparently ssh doesn't work on IPv6, might be my fault though)
>   I was able to open the web interface on IPv6, tried restarting dnsmasq but that didn't fix things, so I just rebooted the router and then it worked.

One of the problems is that OpenWrt is moving away from using dnsmasq
as a DHCP server, in favor of its tightly integrated odhcpd server.
That is the opposite of the direction I'd prefer: I'd like addressing
and naming to be more closely tied together than they are (and
dnsmasq's DHCP and DHCPv6 implementations are more mature). But the
dnsmasq code base is large and complex enough to be intimidating, the
functionality needed is tied to ubus, and DHCP is a fairly simple
protocol to implement, so there we are.

So I have run into cases where odhcpd AND dnsmasq both get enabled for
one reason or another, and bad things happen. Currently the new hnetd
code relies on odhcpd, not dnsmasq, for DHCP service.

There is also an open bug in dnsmasq (a race condition) that can make
it run away and give you the result you saw. There's a fix for it in
2.72beta2.

>
> 2. At some point my internet speed and latency became very bad. I don't know if this was due to the fault of my ISP or not, but a reboot fixed it.
>
> A ping looked like this (over an ethernet connection):
> PING www.google.com (173.194.44.52) 56(84) bytes of data.
> 64 bytes from muc03s08-in-f20.1e100.net (173.194.44.52): icmp_seq=1 ttl=55 time=2467 ms
> 64 bytes from muc03s08-in-f20.1e100.net (173.194.44.52): icmp_seq=2 ttl=55 time=2502 ms
> 64 bytes from muc03s08-in-f20.1e100.net (173.194.44.52): icmp_seq=3 ttl=55 time=2349 ms
>

I have seen this in a specific situation: a ping flood or, in my case,
a version of TCP I'd accidentally implemented that behaved more like
TCP Relentless. I ended up pouring packets into the router at 1GigE,
when it can only forward 300Mbit at best, and since simple.qos was
enabled, only 20Mbit was egressing.

So although HTB and fq_codel kept working, the CPU was overloaded and
the packet buffers stayed full, leading to really huge lag in the range
you mention, especially under simple.qos (which deprioritizes ping).
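As a rough back-of-the-envelope check (a sketch, not a measurement), queueing delay is backlog divided by drain rate. The numbers below assume full-size 1500-byte packets and treat the whole ~2.4 s RTT as standing queue in the 20Mbit egress shaper, ignoring the prioritization simple.qos applies. The 10240-packet figure is the Linux fq_codel default limit, used here as an assumption; CeroWrt's scripts may configure a different one.

```python
# Rough queueing-delay arithmetic for a shaped egress link.
# Assumptions (illustrative, not measured): full-size 1500-byte packets
# and a 20 Mbit/s egress shaper, as in the simple.qos case above.

PACKET_BYTES = 1500          # assumed MTU-sized packets
EGRESS_BPS = 20_000_000      # 20 Mbit/s shaped egress


def queue_delay_s(backlog_packets: int, rate_bps: int = EGRESS_BPS,
                  packet_bytes: int = PACKET_BYTES) -> float:
    """Seconds needed to drain a backlog at the given line rate."""
    return backlog_packets * packet_bytes * 8 / rate_bps


def backlog_for_delay(delay_s: float, rate_bps: int = EGRESS_BPS,
                      packet_bytes: int = PACKET_BYTES) -> float:
    """Backlog (in packets) that would produce the given standing delay."""
    return delay_s * rate_bps / (packet_bytes * 8)


if __name__ == "__main__":
    # ~2.4 s of delay at 20 Mbit/s corresponds to this many queued packets:
    print(f"{backlog_for_delay(2.4):.0f} packets")   # 4000 packets
    # A full 10240-packet queue (assumed default limit) at 20 Mbit/s:
    print(f"{queue_delay_s(10240):.2f} s")            # 6.14 s
```

In other words, a few thousand full-size packets sitting in front of a 20Mbit shaper are enough to produce seconds of lag.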

I've thought about improving or removing fq_codel's overload search of
the flow space for this reason. When the packet limit is exceeded,
fq_codel does a complete search of the flow space to find the biggest
flow (scanning 4k of data each time) and drops packets from that flow.
Simply dropping at the tail in that case would be cheaper and nearly
as effective. Ramping up codel's drop rate faster in that sort of
situation might help too. Still, to me the root of the problem was that
we can take GigE in but only put 300Mbit out, so I don't know if either
change would have done any good in that case.
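A simplified Python model of the overload path described above might look like the following. It is not the kernel code. It assumes 1024 flow queues (the fq_codel default `flows` setting) with a 32-bit byte-backlog counter each, which is where a 1024 * 4 = 4096-byte scan comes from. The class and method names are hypothetical. Passing `tail_drop=True` swaps the linear search for an O(1) refusal of the arriving packet.

```python
# Simplified model (NOT the kernel implementation) of fq_codel's overload path.
# Assumptions: 1024 flow queues, each with a 32-bit byte-backlog counter,
# so a full scan touches 1024 * 4 = 4096 bytes. Names are hypothetical.
from collections import deque

NUM_FLOWS = 1024


class FlowQueues:
    def __init__(self, num_flows: int = NUM_FLOWS, limit: int = 10240):
        self.queues = [deque() for _ in range(num_flows)]  # packet lengths per flow
        self.backlogs = [0] * num_flows   # bytes queued per flow
        self.qlen = 0                     # total packets queued
        self.limit = limit                # packet limit for the whole qdisc

    def _fattest_flow(self) -> int:
        # O(num_flows) linear scan: the "complete search of the flow space".
        return max(range(len(self.backlogs)), key=self.backlogs.__getitem__)

    def _drop_from(self, idx: int) -> None:
        # Drop the packet at the head of the chosen flow.
        pkt_len = self.queues[idx].popleft()
        self.backlogs[idx] -= pkt_len
        self.qlen -= 1

    def enqueue(self, flow: int, pkt_len: int, tail_drop: bool = False) -> bool:
        """Queue a packet; return False only if tail drop refused it."""
        if tail_drop and self.qlen >= self.limit:
            return False                  # O(1): just refuse the new arrival
        self.queues[flow].append(pkt_len)
        self.backlogs[flow] += pkt_len
        self.qlen += 1
        if self.qlen > self.limit:
            # Over the limit: punish whichever flow has the biggest backlog.
            self._drop_from(self._fattest_flow())
        return True
```

The trade-off is that tail drop may discard a packet from a thin flow rather than from the flow causing the overload, whereas the scan always hits the biggest backlog.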

> The good news is that I had a VoIP call running at the time, and I could still understand everyone, in fact I only noticed
> something was wrong when I finished the call and people told me I was very lagged with my replies.
> I think that's a success for cerowrt's SQM, that voip was still able to work even on a very lagged and slow line :)
>
> P.S. I use a variation of the attached script to configure my router, which is based on the script from the wiki.
>
> Best regards,
> --Edwin

-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article