[Ecn-sane] [Babel-users] reducing delays in wifi mcast queues

Dave Taht dave.taht at gmail.com
Tue Sep 18 20:32:22 EDT 2018


On Tue, Sep 18, 2018 at 5:04 PM Juliusz Chroboczek <jch at irif.fr> wrote:
>
> > Recently I tried to deploy a few babel 1.8.2 nodes with the latest
> > openwrt, which I had to back out rapidly because I was dropping so many
> > babel packets under contention.
>
> That's interesting.  Could I please see a log?

I will be more rigorous while upgrading to 1.8.3 tomorrow. Not sure
what sort of log you would like. Would

echo dump | nc :1 32123

every 4 sec suit?

The other log I was creating was of ip route show every 10 sec,
while collecting the usual flent stats of course.
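
A minimal sketch of that kind of periodic logging, so that every sample
carries a timestamp and can be lined up against the flent run afterwards.
The helper names (snapshot, collect) and the output format are
hypothetical, not an existing tool:

```python
import subprocess
import time


def snapshot(cmd, clock=time.time):
    """Run cmd once and return its stdout lines, each prefixed with a timestamp."""
    stamp = clock()
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return [f"{stamp:.3f} {line}" for line in result.stdout.splitlines()]


def collect(cmd, interval_s, count, emit=print, sleep=time.sleep):
    """Take `count` snapshots of cmd, `interval_s` seconds apart, passing each line to emit."""
    for i in range(count):
        for line in snapshot(cmd):
            emit(line)
        if i + 1 < count:
            sleep(interval_s)
```

For the route log described above, something like
collect(["ip", "route", "show"], interval_s=10, count=30) would give
about five minutes of samples.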

tcpdump?

The most effective thing I've done to show "evolution" has been to
take a movie of babelweb...

> > A patch to universally enable babel ecn in net.c "solves" this problem,
>
> Interesting.  AFAIK, ECN is only considered by AQM queues, so this implies
> there's a queue in the way that's dropping Babel packets.

There's fq_codel on every queue, which does FQ, and codel assumes
everything is at least moderately TCP friendly (and/or reasonably
responsive to ecn marks).
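
For reference, marking a UDP socket as ECN-capable comes down to setting
the low two bits of the TOS / traffic class byte. The sketch below is not
the net.c patch itself; it assumes ECT(0) as the codepoint and CS6 as the
DSCP, and the function names are made up for illustration:

```python
import socket

ECT0 = 0b10        # ECN-Capable Transport codepoint ECT(0), per RFC 3168
ECN_MASK = 0b11    # ECN occupies the low two bits of the TOS byte
DSCP_CS6 = 48      # class selector 6 (assumed here as the routing-protocol DSCP)


def tos_byte(dscp, ecn):
    """Combine a 6-bit DSCP and a 2-bit ECN codepoint into one TOS byte."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn <= ECN_MASK:
        raise ValueError("ECN must fit in 2 bits")
    return (dscp << 2) | ecn


def mark_ecn_capable(sock, dscp=DSCP_CS6):
    """Ask the kernel to send this socket's packets with the given DSCP and ECT(0)."""
    value = tos_byte(dscp, ECT0)
    if sock.family == socket.AF_INET6:
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_TCLASS, value)
    else:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, value)
    return value
```

With ECT set, an ECN-enabled fq_codel marks the packet CE instead of
dropping it, which is why the packets stop disappearing. Reacting to those
marks is a separate matter that this sketch does not address.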

My easy test (other than a field deployment) is to try and pump,
say, 100 flent-driven TCP flows through an otherwise reliable 100Mbit
link for a few minutes. Routes get lost, hellos get lost, eventually
the link gets cut off from the net entirely, even if it's the only
link.

I've been planning on repeating that test formally since early August;
your 1.8.3 announcement caught me at a good time.

> Perhaps this
> queue could be convinced to treat Babel packets specially without having
> to hack around it using ECN?

So this goes to a deep philosophical question also. I would not mind
if there was a setsockopt like the existing TCP_NOTSENT_LOWAT for udp
to provide some backpressure.

Routing is a special case - for Babel, and OSPF, adding ecn is an
option. For ISIS not so.

> Or perhaps, if we know which queue that is,
> we could modify Babel's packet scheduling to be more AQM friendly?

How would you describe babel's packet scheduling now?

CS6-marked traffic on wifi tends to end up in the VO or VI queues.
fq_codel by itself on ethernet doesn't pay attention to diffserv.
cake has support for diffserv markings and reserves up to 25% of the
bandwidth for higher priority flows.  It's harder to get it to do bad
things unless you attack it with 100 CS4 marked tcp flows...
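
To make the wifi part concrete, here is a small sketch of how a TOS byte
maps to an 802.11 access category. It assumes the default Linux behaviour
with no QoS map configured (top three bits of the byte become the 802.1d
user priority); individual drivers and APs may differ, and the function
name is made up:

```python
# 802.1d user priority (0-7) -> 802.11 WMM access category
UP_TO_AC = ("BE", "BK", "BK", "BE", "VI", "VI", "VO", "VO")


def access_category(tos):
    """Return the WMM access category for a TOS / traffic class byte."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("TOS must be one byte")
    user_priority = tos >> 5   # top three bits of the byte
    return UP_TO_AC[user_priority]
```

Under that assumption CS6 (TOS 0xC0) lands in VO and CS4 (TOS 0x80) lands
in VI; the ECN bits do not affect the result.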

As for being AQM friendly, a better way to put it would be being
TCP-friendly, I guess. Never put in more than you can expect to get out.
The fq_codel algorithm in the linux mac80211 stack currently defaults to
20ms as a target local delay. So that means dumping packets in there at
a rate no more than 20ms each (short term burst of 100ms), relative to
whatever bandwidth can be achieved vs the other flows.
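
One reading of that rule, sketched as a token-bucket style pacer: at most
one packet per 20ms on average, with up to 100ms of unused credit allowed
to accumulate for a short burst. This is only an illustration of the idea
(it ignores the "relative to achievable bandwidth" part), not how babeld
schedules packets; the class and parameter names are hypothetical:

```python
class Pacer:
    """Allow one send per interval_ms on average, with a bounded burst."""

    def __init__(self, interval_ms=20, burst_ms=100, now_ms=0):
        self.interval_ms = interval_ms
        self.burst_ms = burst_ms
        self.credit_ms = burst_ms      # start with a full burst allowance
        self.last_ms = now_ms

    def try_send(self, now_ms):
        """Return True if a packet may be sent at time now_ms (milliseconds)."""
        elapsed = max(0, now_ms - self.last_ms)
        self.last_ms = max(self.last_ms, now_ms)
        # Idle time earns credit, but never more than one burst's worth.
        self.credit_ms = min(self.burst_ms, self.credit_ms + elapsed)
        if self.credit_ms >= self.interval_ms:
            self.credit_ms -= self.interval_ms
            return True
        return False
```

With the defaults, five packets can go out back to back after an idle
period; after that the sender is held to one packet every 20ms.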

Randomizing the order in which routes are sent out might help, as might
repeating critical routes (like hellos with default gateways in them).
I don't know what else. Perhaps we need to revisit the mcast queue
driver on this round of the mac80211 work. It's just really
observable now...
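
A rough sketch of the first two ideas, randomizing the send order and
repeating the critical entries, so that a burst of drops is less likely to
always hit the same routes. The route representation and the function name
are hypothetical and this is not babeld's actual update logic:

```python
import random


def schedule_updates(routes, critical, repeats=2, rng=random):
    """Return a shuffled send order with critical routes included `repeats` times."""
    critical = set(critical)
    order = []
    for route in routes:
        copies = repeats if route in critical else 1
        order.extend([route] * copies)
    rng.shuffle(order)   # rng may be the random module or a random.Random instance
    return order
```

Passing a seeded random.Random as rng makes the order reproducible for
testing.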

BTW: The OSX version of fq_codel (which has been on by default for
wifi for a version or two), uses different targets for the VO queue.
Not clear how it does mcast.

daves-Air-3:~ d$ netstat -I en0 -qq
en0:
     [ sched:  FQ_CODEL  qlength:    0/128 ]
     [ pkts:          0  bytes:          0  dropped pkts:     50 bytes:   6129 ]
=====================================================
     [ pri: VO (1) srv_cl: 0x400180 quantum: 600 drr_max: 8 ]
     [ queued pkts: 0 bytes: 0 ]
     [ dequeued pkts: 2652 bytes: 272144 ]
     [ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
     [ flow control: 0 feedback: 0 stalls: 0 failed: 0 ]
     [ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
     [ flows total: 0 new: 0 old: 0 ]
     [ throttle on: 0 off: 0 drop: 0 ]
=====================================================
     [ pri: VI (2) srv_cl: 0x380100 quantum: 3000 drr_max: 6 ]
     [ queued pkts: 0 bytes: 0 ]
     [ dequeued pkts: 0 bytes: 0 ]
     [ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
     [ flow control: 0 feedback: 0 stalls: 0 failed: 0 ]
     [ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
     [ flows total: 0 new: 0 old: 0 ]
     [ throttle on: 0 off: 0 drop: 0 ]
=====================================================
     [ pri: BE (7) srv_cl: 0x0 quantum: 1500 drr_max: 4 ]
     [ queued pkts: 0 bytes: 0 ]
     [ dequeued pkts: 147577 bytes: 42979533 ]
     [ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
     [ flow control: 0 feedback: 0 stalls: 0 failed: 0 ]
     [ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
     [ flows total: 0 new: 0 old: 0 ]
     [ throttle on: 0 off: 0 drop: 0 ]
=====================================================
     [ pri: BK (8) srv_cl: 0x100080 quantum: 1500 drr_max: 2 ]
     [ queued pkts: 0 bytes: 0 ]
     [ dequeued pkts: 1312 bytes: 249257 ]
     [ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
     [ flow control: 0 feedback: 0 stalls: 0 failed: 0 ]
     [ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
     [ flows total: 0 new: 0 old: 0 ]
     [ throttle on: 0 off: 0 drop: 0 ]



>
> -- Juliusz



-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619

