* [Make-wifi-fast] reducing delays in wifi mcast queues
From: Dave Taht @ 2018-07-19 4:15 UTC
To: babel-users, Make-Wifi-fast

As one of the zillion things I've never got around to fixing anywhere, rate limiting mcast and dropping packets either in mac80211 or in babel would be good. Babel could possibly be sane about it by using its timestamping facilities.

(I ran across this babelweb pic from my old c.h.i.p + rtod experience, which cracked 16sec of delay in the mcast queue:
http://www.taht.net/~d/gentleremindertogetridoftheinfinitemcastqueueinlinux80211.png )

--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
* Re: [Make-wifi-fast] [Babel-users] reducing delays in wifi mcast queues
From: Juliusz Chroboczek @ 2018-09-18 23:19 UTC
To: Dave Taht; +Cc: babel-users, Make-Wifi-fast

> (I ran across this babelweb pic from my old c.h.i.p + rtod experience,
> which cracked 16sec of delay in the mcast queue:

Bufferbloat in the driver queues?
* Re: [Make-wifi-fast] [Babel-users] reducing delays in wifi mcast queues
From: Dave Taht @ 2018-09-18 23:31 UTC
To: Juliusz Chroboczek; +Cc: babel-users, Make-Wifi-fast

Welcome back!

On Tue, Sep 18, 2018 at 4:19 PM Juliusz Chroboczek <jch@irif.fr> wrote:
>
> > (I ran across this babelweb pic from my old c.h.i.p + rtod experience,
> > which cracked 16sec of delay in the mcast queue:
>
> Bufferbloat in the driver queues?

In that case yes, the driver had an infinitely long mcast queue. Once the number of routes cracked a certain point, updates across the 1mbit "bus" hit RFC970.

One answer of course is to fix the driver (which I never got around to); another is to think about some sort of delay-based rate control, and to ensure hello packets get out at reasonable intervals.

...

Now that bufferbloat is fixed in several wifi cards (ath9k, ath10k, mt76, and soon iwl), and we don't have infinitely long queues, new problems are rearing their heads.

Recently I tried to deploy a few babel 1.8.2 nodes with the latest openwrt, which I had to back out rapidly because I was dropping so many babel packets under contention.

A patch to universally enable babel ecn in net.c "solves" this problem: even with no defined response, my network - particularly the congested gw in the middle - got a *ton* more reliable.

But this opens whole cans of worms as to what the right approach is to respond to CE marks (mine is to treat it like a loss - or like half a loss - but to never expire the route merely because it is congested) - and doesn't solve the infinite queue problem either.

Honestly I'd hoped to have unicast to deploy rather than continue to fiddle with ecn.

https://www.bufferbloat.net/projects/ecn-sane/wiki/

--
Dave Täht
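For readers wondering what "enabling ECN" on a routing daemon's UDP socket amounts to, here is a minimal Python sketch. It is not the net.c patch mentioned above, which is not shown in this thread. The helper names `tos_byte` and `mark_ect0` are hypothetical, and the CS6 default is an assumption based on the CS6 marking discussed elsewhere in the thread. The idea is only that the two low bits of the IP TOS / IPv6 traffic-class byte carry the ECN codepoint, and that an ECT(0) mark lets an ECN-aware AQM mark the packet CE instead of dropping it.

```python
import socket

# ECN codepoints (RFC 3168): the two least significant bits of the TOS byte.
ECN_NOT_ECT = 0b00
ECN_ECT1 = 0b01
ECN_ECT0 = 0b10
ECN_CE = 0b11

# DSCP for CS6 (assumed here as the marking for routing traffic).
DSCP_CS6 = 48


def tos_byte(dscp: int, ecn: int) -> int:
    """Combine a 6-bit DSCP and a 2-bit ECN codepoint into one TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    if not 0 <= ecn <= 3:
        raise ValueError("ECN codepoint must fit in 2 bits")
    return (dscp << 2) | ecn


def mark_ect0(sock: socket.socket, dscp: int = DSCP_CS6) -> int:
    """Ask the kernel to send this socket's packets as ECT(0) (hypothetical helper)."""
    value = tos_byte(dscp, ECN_ECT0)
    if sock.family == socket.AF_INET6:
        sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_TCLASS, value)
    else:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, value)
    return value
```

Marking packets as ECN-capable is only half of the story: the receiver still has to notice CE marks and the sender has to respond to them, which is exactly the open question raised above.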
* Re: [Make-wifi-fast] [Babel-users] reducing delays in wifi mcast queues
From: Juliusz Chroboczek @ 2018-09-19 0:04 UTC
To: Dave Taht; +Cc: babel-users, Make-Wifi-fast

> Recently I tried to deploy a few babel 1.8.2 nodes with the latest
> openwrt, which I had to back out rapidly because I was dropping so many
> babel packets under contention.

That's interesting. Could I please see a log?

> A patch to universally enable babel ecn in net.c "solves" this problem,

Interesting. AFAIK, ECN is only considered by AQM queues, so this implies there's a queue in the way that's dropping Babel packets. Perhaps this queue could be convinced to treat Babel packets specially without having to hack around it using ECN? Or perhaps, if we know which queue that is, we could modify Babel's packet scheduling to be more AQM friendly?

-- Juliusz
* Re: [Make-wifi-fast] [Babel-users] reducing delays in wifi mcast queues
From: Jonathan Morton @ 2018-09-19 0:07 UTC
To: Juliusz Chroboczek; +Cc: Dave Taht, Make-Wifi-fast, babel-users

> On 19 Sep, 2018, at 3:04 am, Juliusz Chroboczek <jch@irif.fr> wrote:
>
>> Recently I tried to deploy a few babel 1.8.2 nodes with the latest
>> openwrt, which I had to back out rapidly because I was dropping so many
>> babel packets under contention.
>
> That's interesting. Could I please see a log?
>
>> A patch to universally enable babel ecn in net.c "solves" this problem,
>
> Interesting. AFAIK, ECN is only considered by AQM queues, so this implies
> there's a queue in the way that's dropping Babel packets. Perhaps this
> queue could be convinced to treat Babel packets specially without having
> to hack around it using ECN? Or perhaps, if we know which queue that is,
> we could modify Babel's packet scheduling to be more AQM friendly?

I assume it's the make-wifi-fast logic, which has fq_codel baked into the Linux wifi stack. That would react to congestion and ECN as described.

- Jonathan Morton
* Re: [Make-wifi-fast] [Babel-users] reducing delays in wifi mcast queues
From: Dave Taht @ 2018-09-19 0:32 UTC
To: Juliusz Chroboczek; +Cc: babel-users, Make-Wifi-fast, ecn-sane

On Tue, Sep 18, 2018 at 5:04 PM Juliusz Chroboczek <jch@irif.fr> wrote:
>
> > Recently I tried to deploy a few babel 1.8.2 nodes with the latest
> > openwrt, which I had to back out rapidly because I was dropping so many
> > babel packets under contention.
>
> That's interesting. Could I please see a log?

I will be more rigorous while upgrading to 1.8.3 tomorrow. Not sure what sort of log you would like. Would

    echo dump | nc ::1 32123

every 4 sec suit? The other log I was creating was of

    ip route show

every 10 sec, while collecting the usual flent stats of course. tcpdump?

The most effective thing I've done to show "evolution" has been to take a movie of babelweb...

> > A patch to universally enable babel ecn in net.c "solves" this problem,
>
> Interesting. AFAIK, ECN is only considered by AQM queues, so this implies
> there's a queue in the way that's dropping Babel packets.

There's fq_codel on every queue, which does FQ, and codel assumes everything is at least moderately TCP friendly (and/or reasonably responsive to ecn marks).

My easy test (other than a field deployment) is to try to pump, say, 100 flent-driven TCP flows through an otherwise reliable 100Mbit link for a few minutes. Routes get lost, hellos get lost, and eventually the link gets cut off from the net entirely, even if it's the only link.

I've been planning on repeating that test formally since early August; your 1.8.3 announcement caught me at a good time.

> Perhaps this
> queue could be convinced to treat Babel packets specially without having
> to hack around it using ECN?

So this goes to a deep philosophical question also.

I would not mind if there was a setsockopt like the existing TCP_NOTSENT_LOWAT for udp to provide some backpressure.

Routing is a special case - for Babel and OSPF, adding ecn is an option. For IS-IS, not so.

> Or perhaps, if we know which queue that is,
> we could modify Babel's packet scheduling to be more AQM friendly?

How would you describe babel's packet scheduling now?

CS6 on wifi stuff tends to end up in the VO or VI queues.

fq_codel by itself on ethernet doesn't pay attention to diffserv.

cake has support for diffserv markings and reserves up to 25% of the bandwidth for higher priority flows. It's harder to get it to do bad things unless you attack it with 100 CS4 marked tcp flows...

As for being AQM friendly, a better way to put it would be being TCP-friendly, I guess. Never put in more than you can expect to get out. The fq_codel algorithm in the linux mac80211 stack currently defaults to 20ms as a target local delay.

So dumping packets in there at a rate no more than 20ms each (short term burst of 100ms) - relative to whatever bandwidth can be achieved vs the other flows.

Randomizing the order in which routes are sent out might help, as might repeating critical routes (like hellos with default gateways in them). I don't know what else.

Perhaps we need to revisit the mcast queue driver on this round of the mac80211 work. It's just really observable now...

BTW: The OSX version of fq_codel (which has been on by default for wifi for a version or two) uses different targets for the VO queue. Not clear how it does mcast.

daves-Air-3:~ d$ netstat -I en0 -qq
en0:
     [ sched: FQ_CODEL qlength: 0/128 ]
     [ pkts: 0 bytes: 0 dropped pkts: 50 bytes: 6129 ]
=====================================================
     [ pri: VO (1) srv_cl: 0x400180 quantum: 600 drr_max: 8 ]
     [ queued pkts: 0 bytes: 0 ]
     [ dequeued pkts: 2652 bytes: 272144 ]
     [ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
     [ flow control: 0 feedback: 0 stalls: 0 failed: 0 ]
     [ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
     [ flows total: 0 new: 0 old: 0 ]
     [ throttle on: 0 off: 0 drop: 0 ]
=====================================================
     [ pri: VI (2) srv_cl: 0x380100 quantum: 3000 drr_max: 6 ]
     [ queued pkts: 0 bytes: 0 ]
     [ dequeued pkts: 0 bytes: 0 ]
     [ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
     [ flow control: 0 feedback: 0 stalls: 0 failed: 0 ]
     [ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
     [ flows total: 0 new: 0 old: 0 ]
     [ throttle on: 0 off: 0 drop: 0 ]
=====================================================
     [ pri: BE (7) srv_cl: 0x0 quantum: 1500 drr_max: 4 ]
     [ queued pkts: 0 bytes: 0 ]
     [ dequeued pkts: 147577 bytes: 42979533 ]
     [ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
     [ flow control: 0 feedback: 0 stalls: 0 failed: 0 ]
     [ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
     [ flows total: 0 new: 0 old: 0 ]
     [ throttle on: 0 off: 0 drop: 0 ]
=====================================================
     [ pri: BK (8) srv_cl: 0x100080 quantum: 1500 drr_max: 2 ]
     [ queued pkts: 0 bytes: 0 ]
     [ dequeued pkts: 1312 bytes: 249257 ]
     [ budget: 0 target qdelay: 10.00 msec update interval:100.00 msec ]
     [ flow control: 0 feedback: 0 stalls: 0 failed: 0 ]
     [ drop overflow: 0 early: 0 memfail: 0 duprexmt:0 ]
     [ flows total: 0 new: 0 old: 0 ]
     [ throttle on: 0 off: 0 drop: 0 ]

>
> -- Juliusz

--
Dave Täht
* Re: [Make-wifi-fast] [Babel-users] reducing delays in wifi mcast queues
From: Juliusz Chroboczek @ 2018-09-19 0:43 UTC
To: Dave Taht; +Cc: babel-users, Make-Wifi-fast, ecn-sane

>> Interesting. AFAIK, ECN is only considered by AQM queues, so this implies
>> there's a queue in the way that's dropping Babel packets.

> There's fq_codel on every queue, which does FQ, and codel assumes
> everything is at least moderately TCP friendly (and/or reasonably
> responsive to ecn marks)

Jonathan seems to agree with you.

Were your tests run with more than 60 installed routes or so?

>> Or perhaps, if we know which queue that is,
>> we could modify Babel's packet scheduling to be more AQM friendly?

> How would you describe babel's packet scheduling now?

The main flaw is that it sends periodic updates as a burst of back-to-back full-size packets. That could trigger Codel if you had more than 60 routes or so.

> So dumping packets in there at a rate no more than 20ms each (short term
> burst of 100ms) - relative to whatever bandwidth can be achieved vs the
> other flows.

Right. Guilty as charged.

-- Juliusz
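A rough back-of-the-envelope sketch of why a back-to-back burst can run into the 20ms target mentioned in the thread. The 1500-byte packet size and the link rates are assumptions chosen for illustration, and the function names are hypothetical. Codel only acts when the queueing delay stays above its target for a whole interval, so this is an indication of when a burst becomes visible to the AQM, not a prediction of drops.

```python
def serialization_delay_ms(packet_bytes: int, link_bps: int) -> float:
    """Time in milliseconds to put one packet on the wire at the given rate."""
    return packet_bytes * 8 * 1000 / link_bps


def burst_queue_delay_ms(n_packets: int, packet_bytes: int, link_bps: int) -> float:
    """Delay until the last packet of a back-to-back burst has left an empty queue."""
    return n_packets * serialization_delay_ms(packet_bytes, link_bps)


def max_burst_within_target(target_ms: int, packet_bytes: int, link_bps: int) -> int:
    """Largest burst (in packets) that drains within target_ms, using integer math."""
    return (target_ms * link_bps) // (packet_bytes * 8 * 1000)


if __name__ == "__main__":
    # At 1 Mbit/s, a single 1500-byte packet already takes 12 ms to send.
    print(serialization_delay_ms(1500, 1_000_000))
    # A 5-packet burst at that rate keeps the queue busy for 60 ms.
    print(burst_queue_delay_ms(5, 1500, 1_000_000))
    # Packets that fit inside a 20 ms target at 1 Mbit/s and at 100 Mbit/s.
    print(max_burst_within_target(20, 1500, 1_000_000))
    print(max_burst_within_target(20, 1500, 100_000_000))
```

Under these assumptions only one full-size packet fits inside 20ms at 1 Mbit/s, while 166 fit at 100 Mbit/s, which is why the problem shows up first on slow multicast rates and under contention from other flows.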
* Re: [Make-wifi-fast] [Babel-users] reducing delays in wifi mcast queues
From: Dave Taht @ 2018-09-19 0:53 UTC
To: Juliusz Chroboczek; +Cc: babel-users, Make-Wifi-fast, ecn-sane

On Tue, Sep 18, 2018 at 5:43 PM Juliusz Chroboczek <jch@irif.fr> wrote:
>
> >> Interesting. AFAIK, ECN is only considered by AQM queues, so this implies
> >> there's a queue in the way that's dropping Babel packets.
>
> > There's fq_codel on every queue, which does FQ, and codel assumes
> > everything is at least moderately TCP friendly (and/or reasonably
> > responsive to ecn marks)
>
> Jonathan seems to agree with you.
>
> Were your tests run with more than 60 installed routes or so?

Yes. I can get an exact count of ipv4 and ipv6 routes tomorrow. It has been well over 240 in various stages of this net's evolution, but is probably below 100 as I write, because that main gw is reverted to static routes right now.

> >> Or perhaps, if we know which queue that is,
> >> we could modify Babel's packet scheduling to be more AQM friendly?
>
> > How would you describe babel's packet scheduling now?
>
> The main flaw is that it sends periodic updates as a burst of back-to-back
> full-size packets. That could trigger Codel if you had more than 60
> routes or so.
>
> > So dumping packets in there at a rate no more than 20ms each (short term
> > burst of 100ms) - relative to whatever bandwidth can be achieved vs the
> > other flows.
>
> Right. Guilty as charged.

Well, packet pacing is now a feature of the tcp stack but not the udp one, but it is good to know something can be done in addition to ecn....

>
> -- Juliusz

--
Dave Täht
* Re: [Make-wifi-fast] [Babel-users] reducing delays in wifi mcast queues
From: Jonathan Morton @ 2018-09-19 6:05 UTC
To: Juliusz Chroboczek; +Cc: Dave Taht, Make-Wifi-fast, ecn-sane, babel-users

> On 19 Sep, 2018, at 3:43 am, Juliusz Chroboczek <jch@irif.fr> wrote:
>
>>> Or perhaps, if we know which queue that is,
>>> we could modify Babel's packet scheduling to be more AQM friendly?
>
>> How would you describe babel's packet scheduling now?
>
> The main flaw is that it sends periodic updates as a burst of back-to-back
> full-size packets. That could trigger Codel if you had more than 60
> routes or so.
>
>> So dumping packets in there at a rate no more than 20ms each (short term
>> burst of 100ms) - relative to whatever bandwidth can be achieved vs the
>> other flows.
>
> Right. Guilty as charged.

The general principle you want is pacing, rather than bursting. If you have a defined refresh interval, spread your routing updates evenly across that interval. Or pick a rate to send updates at, and use ECN feedback to adjust that rate.

- Jonathan Morton
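The two suggestions above can be sketched in a few lines of Python. This is an illustration only, not how babeld schedules packets. The function names, the 16-second interval in the usage lines, and the additive-increase / multiplicative-decrease rule are all assumptions: the thread says only "use ECN feedback to adjust that rate" and does not specify a control law.

```python
from typing import List


def paced_offsets(n_packets: int, interval_s: float) -> List[float]:
    """Send times (seconds from the start of the interval), evenly spaced.

    Instead of sending n_packets back to back at time 0, each packet gets
    its own slot of length interval_s / n_packets.
    """
    if n_packets <= 0:
        return []
    gap = interval_s / n_packets
    return [i * gap for i in range(n_packets)]


def adjust_rate(rate_pps: float, ce_seen: bool,
                min_rate: float = 1.0, max_rate: float = 100.0,
                decrease: float = 0.5, increase: float = 1.0) -> float:
    """One step of an assumed AIMD rule driven by ECN feedback.

    If a CE mark was reported since the last step, cut the sending rate
    multiplicatively; otherwise probe upward additively. The result is
    clamped to [min_rate, max_rate].
    """
    if ce_seen:
        return max(min_rate, rate_pps * decrease)
    return min(max_rate, rate_pps + increase)


if __name__ == "__main__":
    # Four update packets spread over a hypothetical 16 s refresh interval.
    print(paced_offsets(4, 16.0))
    # Rate after a CE mark, then after a quiet period.
    rate = adjust_rate(10.0, ce_seen=True)
    print(rate)
    print(adjust_rate(rate, ce_seen=False))
```

Spreading needs no feedback at all and removes the burst by construction; the rate-based variant additionally needs the receiver to report CE marks back to the sender, which the thread leaves open.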