* [Bloat] sigcomm wifi @ 2014-08-19 16:45 Dave Taht 2014-08-20 7:12 ` Eggert, Lars 2014-08-20 8:30 ` Steinar H. Gunderson 0 siblings, 2 replies; 56+ messages in thread From: Dave Taht @ 2014-08-19 16:45 UTC (permalink / raw) To: bloat I figured y'all would be bemused by the wifi performance in the sigcomm main conference room this morning... http://snapon.lab.bufferbloat.net/~d/sigcomm_tuesday.png -- Dave Täht ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-19 16:45 [Bloat] sigcomm wifi Dave Taht @ 2014-08-20 7:12 ` Eggert, Lars 2014-08-20 14:01 ` Dave Taht 2014-08-20 22:05 ` Jim Gettys 2014-08-20 8:30 ` Steinar H. Gunderson 1 sibling, 2 replies; 56+ messages in thread From: Eggert, Lars @ 2014-08-20 7:12 UTC (permalink / raw) To: Dave Taht; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 364 bytes --] On 2014-8-19, at 18:45, Dave Taht <dave.taht@gmail.com> wrote: > I figured y'all would be bemused by the wifi performance in the sigcomm > main conference room this morning... > > http://snapon.lab.bufferbloat.net/~d/sigcomm_tuesday.png There is a reason we budgeted a 1G uplink for SIGCOMM Helsinki and made sure we had sufficient AP coverage... Lars [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 273 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-20 7:12 ` Eggert, Lars @ 2014-08-20 14:01 ` Dave Taht 2014-08-20 22:05 ` Jim Gettys 1 sibling, 0 replies; 56+ messages in thread From: Dave Taht @ 2014-08-20 14:01 UTC (permalink / raw) To: Eggert, Lars; +Cc: bloat On Wed, Aug 20, 2014 at 2:12 AM, Eggert, Lars <lars@netapp.com> wrote: > On 2014-8-19, at 18:45, Dave Taht <dave.taht@gmail.com> wrote: >> I figured y'all would be bemused by the wifi performance in the sigcomm >> main conference room this morning... >> >> http://snapon.lab.bufferbloat.net/~d/sigcomm_tuesday.png > > There is a reason we budgeted a 1G uplink for SIGCOMM Helsinki and made sure we had sufficient AP coverage... > > Lars My kvetch is mostly that here I am at a con that presents "revolutionary" congestion control algorithms every year, and I am watching preso after preso that talks about SDN using "commodity switches" and analyses that look at all this "big data"... ...and the entire conference is crippled from doing anything on the internet due to bad wifi... and nobody considered solving that problem "interesting"; they merely complained loudly to everyone in earshot about it. Near as I can tell there are no papers on wifi here. At one point there was 85% packet loss and the only thing that even halfway worked was mosh. -- Dave Täht NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-20 7:12 ` Eggert, Lars 2014-08-20 14:01 ` Dave Taht @ 2014-08-20 22:05 ` Jim Gettys 2014-08-21 6:52 ` Eggert, Lars ` (4 more replies) 1 sibling, 5 replies; 56+ messages in thread From: Jim Gettys @ 2014-08-20 22:05 UTC (permalink / raw) To: Eggert, Lars; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 1921 bytes --] On Wed, Aug 20, 2014 at 3:12 AM, Eggert, Lars <lars@netapp.com> wrote: > On 2014-8-19, at 18:45, Dave Taht <dave.taht@gmail.com> wrote: > > I figured y'all would be bemused by the wifi performance in the sigcomm > > main conference room this morning... > > > > http://snapon.lab.bufferbloat.net/~d/sigcomm_tuesday.png > > There is a reason we budgeted a 1G uplink for SIGCOMM Helsinki and made > sure we had sufficient AP coverage... > And what kinds of AP's? All the 1G guarantees you is that your bottleneck is in the wifi hop, and they can suffer as badly as anything else (particularly consumer home routers). The reason why 802.11 works ok at IETF and NANOG is that: o) they use Cisco enterprise AP's, which are not badly over buffered. I don't have data on which enterprise AP's are overbuffered. o) they do a good job of placing the AP's, given a lot of experience o) they turn on RED in the router, which, since there is a lot of aggregated traffic, can actually help rather than hurt, and keep TCP decently policed. o) they play some interesting diffserv marking tricks to prioritize some traffic, getting part of the effect the fq_codel gives you in its "new flow" behavior by manual configuration. Fq_codel does much better without having to mess around like this. Would be nice if they (the folks who run the IETF network) wrote a BCP on the topic; I urged them some IETF's ago, but if others asked, it would help. 
If you try to use consumer home routers running factory firmware and hack it yourself, you will likely lose no matter what your backhaul is (though you might do ok using current CeroWrt/OpenWrt if you know what you are doing). -- Jim > Lars > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > > [-- Attachment #2: Type: text/html, Size: 3690 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
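[Editorial illustration of Jim's fq_codel remark above: the "new flow" behavior he credits can be shown with a toy DRR++-style scheduler. This is a simplified sketch of the idea — sparse flows are served ahead of backlogged bulk flows — not the actual Linux fq_codel implementation; the class name and quantum value are illustrative assumptions.]

```python
from collections import deque

class ToyFQ:
    """Toy sketch of fq_codel-style 'new flow' priority (DRR++-like).

    A flow that arrives with no backlog joins new_flows and is served
    before backlogged (old) flows; a flow that exhausts its byte quantum
    is demoted to old_flows. Not the real fq_codel implementation."""

    def __init__(self, quantum=1514):
        self.queues = {}          # flow_id -> deque of packet sizes
        self.credits = {}         # flow_id -> deficit counter (bytes)
        self.new_flows = deque()  # flows seen while idle: served first
        self.old_flows = deque()  # backlogged bulk flows
        self.quantum = quantum

    def enqueue(self, flow_id, size):
        if flow_id not in self.queues or not self.queues[flow_id]:
            # flow had no backlog: treat as sparse/new, give fresh credit
            if flow_id not in self.new_flows and flow_id not in self.old_flows:
                self.new_flows.append(flow_id)
                self.credits[flow_id] = self.quantum
        self.queues.setdefault(flow_id, deque()).append(size)

    def dequeue(self):
        while self.new_flows or self.old_flows:
            lst = self.new_flows if self.new_flows else self.old_flows
            fid = lst[0]
            q = self.queues.get(fid, deque())
            if not q:
                lst.popleft()   # flow drained; forget it
                continue
            if self.credits[fid] <= 0:
                # quantum spent: demote to old_flows with fresh credit
                lst.popleft()
                self.credits[fid] += self.quantum
                self.old_flows.append(fid)
                continue
            size = q.popleft()
            self.credits[fid] -= size
            return fid, size
        return None
```

For example, with ten 1514-byte packets already queued for a bulk flow and one 100-byte DNS packet arriving afterwards, the DNS packet goes out after a single bulk packet rather than behind all ten — the effect the diffserv tricks approximate by hand.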
* Re: [Bloat] sigcomm wifi 2014-08-20 22:05 ` Jim Gettys @ 2014-08-21 6:52 ` Eggert, Lars 2014-08-21 7:11 ` Michael Welzl 2014-08-21 6:56 ` David Lang ` (3 subsequent siblings) 4 siblings, 1 reply; 56+ messages in thread From: Eggert, Lars @ 2014-08-21 6:52 UTC (permalink / raw) To: Jim Gettys; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 551 bytes --] On 2014-8-21, at 0:05, Jim Gettys <jg@freedesktop.org> wrote: > And what kinds of AP's? All the 1G guarantees you is that your bottleneck is in the wifi hop, and they can suffer as badly as anything else (particularly consumer home routers). > > The reason why 802.11 works ok at IETF and NANOG is that: > o) they use Cisco enterprise AP's, which are not badly over buffered. IIRC, Finlandia Hall used those, too (they had just gotten an upgraded installation a few months before the conference, by people who understand WLAN.) Lars [-- Attachment #2: Message signed with OpenPGP using GPGMail --] [-- Type: application/pgp-signature, Size: 273 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-21 6:52 ` Eggert, Lars @ 2014-08-21 7:11 ` Michael Welzl 2014-08-21 8:30 ` David Lang 0 siblings, 1 reply; 56+ messages in thread From: Michael Welzl @ 2014-08-21 7:11 UTC (permalink / raw) To: Jim Gettys, bloat, Lars Eggert On 21. aug. 2014, at 08:52, Eggert, Lars wrote: > On 2014-8-21, at 0:05, Jim Gettys <jg@freedesktop.org> wrote: >> And what kinds of AP's? All the 1G guarantees you is that your bottleneck is in the wifi hop, and they can suffer as badly as anything else (particularly consumer home routers). >> >> The reason why 802.11 works ok at IETF and NANOG is that: >> o) they use Cisco enterprise AP's, which are not badly over buffered. I'd like to better understand this particular bloat problem: 100s of senders try to send at the same time. They can't all do that, so their cards retry a fixed number of times (10 or something, I don't remember, probably configurable), for which they need to have a buffer. Say, the buffer is too big. Say, we make it smaller. Then an 802.11 sender trying to get its time slot in a crowded network will have to drop a packet, requiring the TCP sender to retransmit the packet instead. The TCP sender will think it's congestion (not entirely wrong) and reduce its window (not entirely wrong either). How appropriate TCP's cwnd reduction is probably depends on how "true" the notion of congestion is ... i.e. if I can buffer only one packet and just don't get to send it, or it gets a CRC error ("collides" in the air), then that can be seen as a pure matter of luck. Then I provoke a sender reaction that's like the old story of TCP mis-interpreting random losses as a sign of congestion. I think in most practical systems this old story is now a myth because wireless equipment will try to buffer data for a relatively long time instead of exhibiting sporadic random drops to upper layers. That is, in principle, a good thing - but buffering too much has of course all the problems that we know. 
Not an easy trade-off at all I think. I have two questions: 1) is my characterization roughly correct? 2) have people investigated the downsides (negative effect on TCP) of buffering *too little* in wireless equipment? (I suspect so?) Finding where "too little" begins could give us a better idea of what the ideal buffer length should really be. Cheers, Michael ^ permalink raw reply [flat|nested] 56+ messages in thread
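[Editorial illustration of the scenario Michael describes above — 100s of senders contending, with loss as a "pure matter of luck": a toy Monte Carlo model of one contention round. It assumes a single fixed contention window and ignores exponential backoff, capture effects, and rate adaptation, so it only shows the qualitative trend; all parameter values are illustrative.]

```python
import random

def collision_prob(n_stations, cw=16, trials=10000, seed=1):
    """Per-attempt collision probability in a toy slotted contention model.

    Each station draws a backoff slot uniformly from [0, cw); the earliest
    slot wins the medium, and a tie at the earliest slot is a collision.
    A crude sketch of one 802.11 DCF contention round -- no exponential
    backoff, no capture effect, no rate control."""
    rng = random.Random(seed)
    collisions = 0
    for _ in range(trials):
        slots = [rng.randrange(cw) for _ in range(n_stations)]
        if slots.count(min(slots)) > 1:
            collisions += 1
    return collisions / trials
```

With cw=16, two stations collide on roughly 6% of attempts, while a hundred stations collide on nearly every attempt — the conference-room regime where whether any one frame gets through really is mostly luck.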
* Re: [Bloat] sigcomm wifi 2014-08-21 7:11 ` Michael Welzl @ 2014-08-21 8:30 ` David Lang 2014-08-22 23:07 ` Michael Welzl 0 siblings, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-21 8:30 UTC (permalink / raw) To: Michael Welzl; +Cc: bloat [-- Attachment #1: Type: TEXT/PLAIN, Size: 3579 bytes --] On Thu, 21 Aug 2014, Michael Welzl wrote: > On 21. aug. 2014, at 08:52, Eggert, Lars wrote: > >> On 2014-8-21, at 0:05, Jim Gettys <jg@freedesktop.org> wrote: >>> And what kinds of AP's? All the 1G guarantees you is that your bottleneck is in the wifi hop, and they can suffer as badly as anything else (particularly consumer home routers). >>> >>> The reason why 802.11 works ok at IETF and NANOG is that: >>> o) they use Cisco enterprise AP's, which are not badly over buffered. > > I'd like to better understand this particular bloat problem: > > 100s of senders try to send at the same time. They can't all do that, so their > cards retry a fixed number of times (10 or something, I don't remember, > probably configurable), for which they need to have a buffer. > > Say, the buffer is too big. Say, we make it smaller. Then an 802.11 sender > trying to get its time slot in a crowded network will have to drop a packet, > requiring the TCP sender to retransmit the packet instead. The TCP sender will > think it's congestion (not entirely wrong) and reduce its window (not entirely > wrong either). How appropriate TCP's cwnd reduction is probably depends on how > "true" the notion of congestion is ... i.e. if I can buffer only one packet > and just don't get to send it, or it gets a CRC error ("collides" in the air), > then that can be seen as a pure matter of luck. Then I provoke a sender > reaction that's like the old story of TCP mis-interpreting random losses as a > sign of congestion. 
I think in most practical systems this old story is now a > myth because wireless equipment will try to buffer data for a relatively long > time instead of exhibiting sporadic random drops to upper layers. That is, in > principle, a good thing - but buffering too much has of course all the > problems that we know. Not an easy trade-off at all I think. in this case the loss is a direct sign of congestion. remember that TCP was developed back in the days of 10base2 networks where everyone on the network was sharing a wire and it was very possible for multiple senders to start transmitting on the wire at the same time, just like with radio. A large part of the problem with high-density wifi is that it just wasn't designed for that sort of environment, and there are a lot of things that it does that work great for low-density, weak signal environments, but just make the problem worse for high-density environments: batching packets together; slowing down the transmit speed if you aren't getting through; retries of packets that the OS has given up on (including ones where the user has closed the app that sent them). Ideally we want the wifi layer to be just like the wired layer, buffer only what's needed to get it on the air without 'dead air' (where the driver is waiting for the OS to give it more data); at that point, we can do the retries from the OS as appropriate. > I have two questions: 1) is my characterization roughly correct? 2) have > people investigated the downsides (negative effect on TCP) of buffering *too > little* in wireless equipment? (I suspect so?) Finding where "too little" > begins could give us a better idea of what the ideal buffer length should > really be. too little buffering will reduce the throughput as a result of unused airtime. But at the low data rates involved, the system would have to be extremely busy for that unused airtime to add up to a significant amount of time if even one packet at a time is buffered. 
You are also conflating the effect of the driver/hardware buffering with it doing retries. David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
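[Editorial illustration of David's "buffer only what's needed to avoid dead air" point: the buffer a driver needs is roughly the link rate times the OS refill latency, which at low wifi rates comes to about one packet. The rates and latency below are illustrative assumptions, not measured driver numbers.]

```python
import math

def min_buffer_packets(link_mbps, pkt_bytes, refill_latency_ms):
    """Packets the driver must hold to avoid dead air while it waits
    refill_latency_ms for the OS to hand it more data.
    Back-of-the-envelope only; real drivers also batch for aggregation."""
    pkts_per_ms = link_mbps * 1e6 / 8 / pkt_bytes / 1000
    return math.ceil(pkts_per_ms * refill_latency_ms)
```

At 6 Mbps with 1500-byte packets and a 1 ms refill gap this is a single packet; even at 54 Mbps it is only about five — consistent with David's point that deep driver queues are not needed to keep the air busy.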
* Re: [Bloat] sigcomm wifi 2014-08-21 8:30 ` David Lang @ 2014-08-22 23:07 ` Michael Welzl 2014-08-22 23:50 ` David Lang 0 siblings, 1 reply; 56+ messages in thread From: Michael Welzl @ 2014-08-22 23:07 UTC (permalink / raw) To: David Lang; +Cc: bloat On 21. aug. 2014, at 10:30, David Lang <david@lang.hm> wrote: > On Thu, 21 Aug 2014, Michael Welzl wrote: > >> On 21. aug. 2014, at 08:52, Eggert, Lars wrote: >> >>> On 2014-8-21, at 0:05, Jim Gettys <jg@freedesktop.org> wrote: >>>> And what kinds of AP's? All the 1G guarantees you is that your bottleneck is in the wifi hop, and they can suffer as badly as anything else (particularly consumer home routers). >>>> The reason why 802.11 works ok at IETF and NANOG is that: >>>> o) they use Cisco enterprise AP's, which are not badly over buffered. >> >> I'd like to better understand this particular bloat problem: >> >> 100s of senders try to send at the same time. They can't all do that, so their cards retry a fixed number of times (10 or something, I don't remember, probably configurable), for which they need to have a buffer. >> >> Say, the buffer is too big. Say, we make it smaller. Then an 802.11 sender trying to get its time slot in a crowded network will have to drop a packet, requiring the TCP sender to retransmit the packet instead. The TCP sender will think it's congestion (not entirely wrong) and reduce its window (not entirely wrong either). How appropriate TCP's cwnd reduction is probably depends on how "true" the notion of congestion is ... i.e. if I can buffer only one packet and just don't get to send it, or it gets a CRC error ("collides" in the air), then that can be seen as a pure matter of luck. Then I provoke a sender reaction that's like the old story of TCP mis-interpreting random losses as a sign of congestion. 
I think in most practical systems this old story is now a myth because wireless equipment will try to buffer data for a relatively long time instead of exhibiting sporadic random drops to upper layers. That is, in principle, a good thing - but buffering too much has of course all the problems that we know. Not an easy trade-off at all I think. > > in this case the loss is a direct sign of congestion. "this case" - I talk about different buffer lengths. E.g., take the minimal buffer that would just function, and set retransmissions to 0. Then, a packet loss is a pretty random matter - just because you and I contended, doesn't mean that the net is truly "overloaded" ? So my point is that the buffer creates a continuum from "random loss" to "actual congestion" - we want loss to mean "actual congestion", but how large should it be to meaningfully convey that? > remember that TCP was developed back in the days of 10base2 networks where everyone on the network was sharing a wire and it was very possible for multiple senders to start transmitting on the wire at the same time, just like with radio. cable or wireless: is one such occurrence "congestion"? i.e. is halving the cwnd really the right response to that sort of "congestion"? (contention, really) > A large part of the problem with high-density wifi is that it just wasn't designed for that sort of environment, and there are a lot of things that it does that work great for low-density, weak signal environments, but just make the problem worse for high-density environements > > batching packets together > slowing down the transmit speed if you aren't getting through well... this *should* only happen when there's an actual physical signal quality degradation, not just collisions. at least minstrel does quite a good job at ensuring that, most of the time. 
> retries of packets that the OS has given up on (including the user has closed the app that sent them) > > Ideally we want the wifi layer to be just like the wired layer, buffer only what's needed to get it on the air without 'dead air' (where the driver is waiting for the OS to give it more data), at that point, we can do the retries from the OS as appropriate. > >> I have two questions: 1) is my characterization roughly correct? 2) have people investigated the downsides (negative effect on TCP) of buffering *too little* in wireless equipment? (I suspect so?) Finding where "too little" begins could give us a better idea of what the ideal buffer length should really be. > > too little buffering will reduce the throughput as a result of unused airtime. so that's a function of, at least: 1) incoming traffic rate; 2) no. retries * ( f(MAC behavior; number of other senders trying) ). > But at the low data rates involved, the system would have to be extremely busy to be a significant amount of time if even one packet at a time is buffered. > You are also conflating the effect of the driver/hardware buffering with it doing retries. because of the "function" i wrote above: the more you retry, the more you need to buffer when traffic continuously arrives because you're stuck trying to send a frame again. what am I getting wrong? this seems to be just the conversation I was hoping to have ( so thanks!) - I'd like to figure out if there's a fault in my logic. Cheers, Michael ^ permalink raw reply [flat|nested] 56+ messages in thread
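[Editorial illustration of the "function" Michael sketches above — buffer demand growing with incoming traffic rate and retries: while the card is stuck retrying a head-of-line frame, arriving packets pile up behind it. The per-attempt timing is a made-up figure, not a real 802.11 backoff value.]

```python
def backlog_during_retries(arrival_pps, retries, per_attempt_ms):
    """Packets that accumulate behind a head-of-line frame while the card
    retries it `retries` times at `per_attempt_ms` per attempt (backoff
    plus airtime). Toy model: buffer demand ~ arrival rate x retry time."""
    stall_ms = retries * per_attempt_ms
    return arrival_pps * stall_ms / 1000.0
```

For instance, 1000 packets/s arriving during a 10-retry stall at 2 ms per attempt needs room for about 20 packets — and every extra retry adds proportionally more.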
* Re: [Bloat] sigcomm wifi 2014-08-22 23:07 ` Michael Welzl @ 2014-08-22 23:50 ` David Lang 2014-08-23 19:26 ` Michael Welzl 0 siblings, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-22 23:50 UTC (permalink / raw) To: Michael Welzl; +Cc: bloat [-- Attachment #1: Type: TEXT/PLAIN, Size: 7254 bytes --] On Sat, 23 Aug 2014, Michael Welzl wrote: > On 21. aug. 2014, at 10:30, David Lang <david@lang.hm> wrote: > >> On Thu, 21 Aug 2014, Michael Welzl wrote: >> >>> On 21. aug. 2014, at 08:52, Eggert, Lars wrote: >>> >>>> On 2014-8-21, at 0:05, Jim Gettys <jg@freedesktop.org> wrote: >>>>> And what kinds of AP's? All the 1G guarantees you is that your bottleneck is in the wifi hop, and they can suffer as badly as anything else (particularly consumer home routers). >>>>> The reason why 802.11 works ok at IETF and NANOG is that: >>>>> o) they use Cisco enterprise AP's, which are not badly over buffered. >>> >>> I'd like to better understand this particular bloat problem: >>> >>> 100s of senders try to send at the same time. They can't all do that, so their cards retry a fixed number of times (10 or something, I don't remember, probably configurable), for which they need to have a buffer. >>> >>> Say, the buffer is too big. Say, we make it smaller. Then an 802.11 sender trying to get its time slot in a crowded network will have to drop a packet, requiring the TCP sender to retransmit the packet instead. The TCP sender will think it's congestion (not entirely wrong) and reduce its window (not entirely wrong either). How appropriate TCP's cwnd reduction is probably depends on how "true" the notion of congestion is ... i.e. if I can buffer only one packet and just don't get to send it, or it gets a CRC error ("collides" in the air), then that can be seen as a pure matter of luck. Then I provoke a sender reaction that's like the old story of TCP mis-interpreting random losses as a sign of congestion. 
I think in most practical systems this old story is now a myth because wireless equipment will try to buffer data for a relatively long time instead of exhibiting sporadic random drops to upper layers. That is, in principle, a good thing - but buffering too much has of course all the problems that we know. Not an easy trade-off at all I think. >> >> in this case the loss is a direct sign of congestion. > > "this case" - I talk about different buffer lengths. E.g., take the minimal > buffer that would just function, and set retransmissions to 0. Then, a packet > loss is a pretty random matter - just because you and I contended, doesn't > mean that the net is truly "overloaded" ? So my point is that the buffer > creates a continuum from "random loss" to "actual congestion" - we want loss > to mean "actual congestion", but how large should it be to meaningfully convey > that? > > >> remember that TCP was developed back in the days of 10base2 networks where >> everyone on the network was sharing a wire and it was very possible for >> multiple senders to start transmitting on the wire at the same time, just >> like with radio. > > cable or wireless: is one such occurrence "congestion"? > i.e. is halving the cwnd really the right response to that sort of "congestion"? (contention, really) possibly not, but in practice it may be 'good enough' but to make it work well, you probably want to play games with how much you back off, and how quickly you retry if you don't get a response. The fact that the radio link can have its own ack for the packet can actually be an improvement over doing it at the TCP level, as it only needs to ack/retry for that hop, and if that hop was good, there's far less of a need to retry if the server is just slow. 
so if we try and do the retries in the OS stack, it will need to know the difference between "failed to get out the first hop due to collision" and "got out the first hop, waiting for the server across the globe to respond" with different timeouts/retries for them. >> A large part of the problem with high-density wifi is that it just wasn't >> designed for that sort of environment, and there are a lot of things that it >> does that work great for low-density, weak signal environments, but just make >> the problem worse for high-density environments >> >> batching packets together >> slowing down the transmit speed if you aren't getting through > > well... this *should* only happen when there's an actual physical signal > quality degradation, not just collisions. at least minstrel does quite a good > job at ensuring that, most of the time. "should" :-) but can the firmware really tell the difference between quality degradation due to interference and collisions with other transmitters? >> retries of packets that the OS has given up on (including the user has closed >> the app that sent them) >> >> Ideally we want the wifi layer to be just like the wired layer, buffer only >> what's needed to get it on the air without 'dead air' (where the driver is >> waiting for the OS to give it more data), at that point, we can do the >> retries from the OS as appropriate. >> >>> I have two questions: 1) is my characterization roughly correct? 2) have >>> people investigated the downsides (negative effect on TCP) of buffering *too >>> little* in wireless equipment? (I suspect so?) Finding where "too little" >>> begins could give us a better idea of what the ideal buffer length should >>> really be. >> >> too little buffering will reduce the throughput as a result of unused >> airtime. > > so that's a function of, at least: 1) incoming traffic rate; 2) no. retries * > ( f(MAC behavior; number of other senders trying) ). incoming to the AP you mean? 
It also matters if you are worrying about aggregate throughput of a lot of users, or per-connection throughput for a single user. From a sender's point of view, if it takes 100 time units to send a packet, and 1-5 time units to queue the next packet for transmission, you lose a few percent of your possible airtime and there's very little concern. but if it takes 10 time units to send the packet and 1-5 time units to queue the next packet, you have just lost a lot of potential bandwidth. But from the point of view of the aggregate, these gaps just give someone else a chance to transmit and have very little effect on the amount of traffic arriving at the AP. I was viewing things from the point of view of the app on the laptop. > >> But at the low data rates involved, the system would have to be extremely >> busy to be a significant amount of time if even one packet at a time is >> buffered. > > > >> You are also conflating the effect of the driver/hardware buffering with it >> doing retries. > > because of the "function" i wrote above: the more you retry, the more you need > to buffer when traffic continuously arrives because you're stuck trying to > send a frame again. huh, I'm missing something here; retrying sends would require you to buffer more when sending. If people are retrying when they really don't need to, that cuts down on the available airtime. But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much. David Lang > what am I getting wrong? this seems to be just the conversation I was hoping > to have ( so thanks!) - I'd like to figure out if there's a fault in my > logic. > > Cheers, > Michael > > ^ permalink raw reply [flat|nested] 56+ messages in thread
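[Editorial illustration of David's 100-vs-10 time-unit example above — it is the standard utilization ratio, and a one-liner makes the arithmetic explicit:]

```python
def airtime_utilization(tx_units, gap_units):
    """Fraction of a sender's airtime spent actually transmitting when
    each frame of tx_units is followed by a gap_units refill gap."""
    return tx_units / (tx_units + gap_units)
```

A 100-unit frame with a 5-unit gap still uses about 95% of the sender's airtime, while a 10-unit frame with the same gap drops to about 67% — which is why short fast frames make per-sender refill latency matter so much more.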
* Re: [Bloat] sigcomm wifi 2014-08-22 23:50 ` David Lang @ 2014-08-23 19:26 ` Michael Welzl 2014-08-23 23:29 ` Jonathan Morton 2014-08-24 1:09 ` David Lang 0 siblings, 2 replies; 56+ messages in thread From: Michael Welzl @ 2014-08-23 19:26 UTC (permalink / raw) To: David Lang; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 9220 bytes --] [removing Lars and Jim from direct cc, don't want to spam them - I don't know if they're sooo interested in this thread?] On 23. aug. 2014, at 01:50, David Lang <david@lang.hm> wrote: > On Sat, 23 Aug 2014, Michael Welzl wrote: > >> On 21. aug. 2014, at 10:30, David Lang <david@lang.hm> wrote: >> >>> On Thu, 21 Aug 2014, Michael Welzl wrote: >>> >>>> On 21. aug. 2014, at 08:52, Eggert, Lars wrote: >>>> >>>>> On 2014-8-21, at 0:05, Jim Gettys <jg@freedesktop.org> wrote: >>>>>> And what kinds of AP's? All the 1G guarantees you is that your bottleneck is in the wifi hop, and they can suffer as badly as anything else (particularly consumer home routers). >>>>>> The reason why 802.11 works ok at IETF and NANOG is that: >>>>>> o) they use Cisco enterprise AP's, which are not badly over buffered. >>>> >>>> I'd like to better understand this particular bloat problem: >>>> >>>> 100s of senders try to send at the same time. They can't all do that, so their cards retry a fixed number of times (10 or something, I don't remember, probably configurable), for which they need to have a buffer. >>>> >>>> Say, the buffer is too big. Say, we make it smaller. Then an 802.11 sender trying to get its time slot in a crowded network will have to drop a packet, requiring the TCP sender to retransmit the packet instead. The TCP sender will think it's congestion (not entirely wrong) and reduce its window (not entirely wrong either). How appropriate TCP's cwnd reduction is probably depends on how "true" the notion of congestion is ... i.e. 
if I can buffer only one packet and just don't get to send it, or it gets a CRC error ("collides" in the air), then that can be seen as a pure matter of luck. Then I provoke a sender reaction that's like the old story of TCP mis-interpreting random losses as a sign of congestion. I think in most practical systems this old story is now a myth because wireless equipment will try to buffer data for a relatively long time instead of exhibiting sporadic random drops to upper layers. That is, in principle, a good thing - but buffering too much has of course all the problems that we know. Not an easy trade-off at all I think. >>> >>> in this case the loss is a direct sign of congestion. >> >> "this case" - I talk about different buffer lengths. E.g., take the minimal buffer that would just function, and set retransmissions to 0. Then, a packet loss is a pretty random matter - just because you and I contended, doesn't mean that the net is truly "overloaded" ? So my point is that the buffer creates a continuum from "random loss" to "actual congestion" - we want loss to mean "actual congestion", but how large should it be to meaningfully convey that? >> >> >>> remember that TCP was developed back in the days of 10base2 networks where everyone on the network was sharing a wire and it was very possible for multiple senders to start transmitting on the wire at the same time, just like with radio. >> >> cable or wireless: is one such occurrence "congestion"? >> i.e. is halving the cwnd really the right response to that sort of "congestion"? (contention, really) > > possibly not, but in practice it may be 'good enough' > > but to make it work well, you probably want to play games with how much you back off, and how quickly you retry if you don't get a response. 
> > The fact that the radio link can have it's own ack for the packet can actually be an improvement over doing it at the TCP level as it only need to ack/retry for that hop, and if that hop was good, there's far less of a need to retry if the server is just slow. Yep... I remember a neat paper from colleagues at Trento University that piggybacked TCP's ACKs on link layer ACKs, thereby avoiding the collisions between TCP's ACKs and other data packets - really nice. Not sure if it wasn't just simulations, though. > so if we try and do the retries in the OS stack, it will need to know the difference between "failed to get out the first hop due to collision" and "got out the first hop, waiting for the server across the globe to respond" with different timeouts/retries for them. > >>> A large part of the problem with high-density wifi is that it just wasn't designed for that sort of environment, and there are a lot of things that it does that work great for low-density, weak signal environments, but just make the problem worse for high-density environements >>> >>> batching packets together >>> slowing down the transmit speed if you aren't getting through >> >> well... this *should* only happen when there's an actual physical signal quality degradation, not just collisions. at least minstrel does quite a good job at ensuring that, most of the time. > > "should" :-) > > but can the firmware really tell the difference between quality degredation due to interference and collisions with other transmitters? Well, with heuristics it can, sort of. As a simple example from one older mechanism, consider: multiple consecutive losses are *less* likely from random collisions than from link noise. That sort of thing. 
Minstrel worked best in our tests, using tables of rates that worked well / didn't work well in the past: http://heim.ifi.uio.no/michawe/research/publications/wowmom2012.pdf >>> retries of packets that the OS has given up on (including the user has closed the app that sent them) >>> >>> Ideally we want the wifi layer to be just like the wired layer, buffer only what's needed to get it on the air without 'dead air' (where the driver is waiting for the OS to give it more data), at that point, we can do the retries from the OS as appropriate. >>> >>>> I have two questions: 1) is my characterization roughly correct? 2) have people investigated the downsides (negative effect on TCP) of buffering *too little* in wireless equipment? (I suspect so?) Finding where "too little" begins could give us a better idea of what the ideal buffer length should really be. >>> >>> too little buffering will reduce the throughput as a result of unused airtime. >> >> so that's a function of, at least: 1) incoming traffic rate; 2) no. retries * ( f(MAC behavior; number of other senders trying) ). > > incoming to the AP you mean? incoming to whoever is sending and would be retrying - mostly the AP, yes. > It also matters if you are worrying about aggregate throughput of a lot of users, or per-connection throughput for a single user. > > From a sender's point of view, if it takes 100 time units to send a packet, and 1-5 time units to queue the next packet for transmission, you lose a few percent of your possible airtime and there's very little concern. > > but if it takes 10 time units to send the packet and 1-5 time units to queue the next packet, you have just lost a lot of potential bandwidth. > > But from the point of view of the aggregate, these gaps just give someone else a chance to transmit and have very little effect on the amount of traffic arriving at the AP. > > I was viewing things from the point of view of the app on the laptop. Yes... 
I agree, and that's the more common + more reasonable way to think about it. I tend to think upstream, which of course is far less common, but maybe even more problematic. Actually I suspect the following: things get seriously bad when a lot of senders are sending upstream together; this isn't really happening much in practice - BUT when we have a very very large number of hosts connected in a conference style situation, all the HTTP GETs and SMTP messages and whatnot *do* create lots of collisions, a situation that isn't really too common (and maybe not envisioned / parametrized for), and that's why things often get so bad. (At least one of the reasons.) >>> But at the low data rates involved, the system would have to be extremely busy for this to be a significant amount of time if even one packet at a time is buffered. >> >> >> >>> You are also conflating the effect of the driver/hardware buffering with it doing retries. >> >> because of the "function" i wrote above: the more you retry, the more you need to buffer when traffic continuously arrives because you're stuck trying to send a frame again. > > huh, I'm missing something here, retrying sends would require you to buffer more when sending. aren't you saying the same thing as I am? If not, sorry - I might have expressed it confusingly somehow > If people are retrying when they really don't need to, that cuts down on the available airtime. Yes > But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much. Yes; but the complexity that I was pointing at (but maybe it's a simple parameter, more like a 0 or 1 situation in practice?) lies in the word "continual". How long do you try before you decide that the sending TCP should really think it *is* congestion? 
To really optimize the behavior, that would have to depend on the RTT, which you can't easily know. Cheers, Michael ^ permalink raw reply [flat|nested] 56+ messages in thread
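The question of how long the link layer should keep retrying before letting TCP see the loss can be framed as a time budget, which is roughly how the Minstrel work frames it (they explicitly discuss TCP retransmit timers). A minimal sketch - the callable and the budget value are invented for illustration, not from any driver:

```python
import time

def send_with_retry_budget(send_frame, budget_s=0.025):
    """Retry a link-layer transmission until a wall-clock budget
    expires, rather than until a fixed retry count is exhausted.
    `send_frame` is a hypothetical callable that returns True when
    the frame was acknowledged. Keeping the budget well below TCP's
    retransmit timeout means TCP only sees a loss (and backs off)
    once the link has genuinely given up."""
    deadline = time.monotonic() + budget_s
    attempts = 0
    while time.monotonic() < deadline:
        attempts += 1
        if send_frame():
            return True, attempts
    return False, attempts  # give up; the loss becomes visible to TCP
```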
* Re: [Bloat] sigcomm wifi 2014-08-23 19:26 ` Michael Welzl @ 2014-08-23 23:29 ` Jonathan Morton 2014-08-23 23:40 ` Steinar H. Gunderson ` (2 more replies) 2014-08-24 1:09 ` David Lang 1 sibling, 3 replies; 56+ messages in thread From: Jonathan Morton @ 2014-08-23 23:29 UTC (permalink / raw) To: Michael Welzl; +Cc: bloat I've done some reading on how wifi actually works, and what mechanisms the latest variants use to improve performance. It might be helpful to summarise my understanding here - biased towards the newer variants, since they are by now widely deployed. First a note on the variants themselves: 802.11 without suffix is obsolete and no longer in use. 802.11a was the original 5GHz band version, giving 54Mbps in 20MHz channels. 802.11b was the first "affordable" version, using 2.4GHz and giving 11Mbps in 20MHz channels. 802.11g brought the 802.11a modulation schemes and (theoretical) performance to the 2.4GHz band. 802.11n is dual-band, but only optionally. Aggregation, 40MHz channels, single-target MIMO. 802.11ac is 5GHz only. More aggregation, 80 & 160MHz channels, multi-target MIMO. Rationalised options, dropping many 'n' features that are more trouble than they're worth. Coexists nicely with older 20MHz-channel equipment, and nearby APs with overlapping spectrum. My general impression is that 802.11ac makes a serious effort to improve matters in heavily-congested, many-clients scenarios, which was where earlier variants had the most trouble. If you're planning to set up or go to a major conference, the best easy thing you can do is get 'ac' equipment all round - if nothing else, it's guaranteed to support the 5GHz band. Of course, we're not just considering the easy solutions. Now for some technical details: The wireless spectrum is fundamentally a shared-access medium. 
It also has the complication of being noisy and having various path-loss mechanisms, and of the "hidden node" problem where one client might not be able to hear another client's transmission, even though both are in range of the AP. Thus wifi uses a CSMA/CA algorithm as follows: 1) Listen for competing carrier. If heard, backoff and retry later. (Listening is continuous, and detected preambles are used to infer the time-length of packets when the data modulation is unreadable.) 2) Perform an RTS/CTS handshake. If CTS doesn't arrive, backoff and retry later. 3) Transmit, and await acknowledgement. If no ack, backoff and retry later, possibly using different modulation. This can be compared to Ethernet's CSMA/CD algorithm: 1) Listen for competing carrier. If heard, backoff and retry later. 2) Transmit, listening for collision with a competing transmission. If collision, backoff and retry later. In both cases, the backoff is random and exponentially increasing, to reduce the chance of repeated collisions. The 2.4GHz band is chock-full of noise sources, from legacy 802.11b/g equipment to cordless phones, Bluetooth, and even microwave ovens - which generate the best part of a kilowatt of RF energy, but somehow manage to contain the vast majority of it within the cavity. It's also a relatively narrow band, with only three completely separate 20MHz channels available in most of the world (four in Japan). This isn't a massive concern for home use, but consumers still notice the effects surprisingly often. Perhaps they live in an apartment block with lots of devices and APs crowded together in an unmanaged mess. Perhaps they have a large home to themselves, but a bunch of noisy equipment reduces the effective range and reliability of their network. It's not uncommon to hear about networks that drop out whenever the phone rings, thanks to an old cordless phone. The 5GHz band is much less crowded. 
There are several channels which are shared with weather radar, so wifi equipment can't use those unless they are capable of detecting the radar transmissions, but even without those there are far more 20MHz channels available. There's also much less legacy equipment using it - even 802.11a is relatively uncommon (and is fairly benign in behaviour). The downside is that 5GHz doesn't propagate as far, or as easily through walls. Wider bandwidth channels can be used to shorten the time taken for each transmission. However, this effect is not linear, because the RTS/CTS handshake and preamble are fixed overheads (since they must be transmitted at a low speed to ensure that all clients can hear them), taking the same length of time regardless of any other enhancements. This implies that in seriously geographically-congested scenarios, 20MHz channels (and lots of APs to use them all) are still the most efficient. MIMO can still be used to beneficial effect in these situations. Multi-target MIMO allows an AP to transmit to several clients simultaneously, without requiring the client to support MIMO themselves. This requires the AP's antennas and radios to be dynamically reconfigured for beamforming - giving each client a clear version of its own signal and a null for the other signals - which is a tricky procedure. APs that do implement this well are highly valuable in congested situations. Single-target MIMO allows higher bandwidth between one client at a time and the AP. Both the AP and the client must support MIMO for this to work. There are physical constraints which limit the ability for handheld devices to support MIMO. In general, this form of MIMO improves throughput in the home, but is not very useful in congested situations. High individual throughput is not what's needed in a crowded arena; rather, reliable if slow individual throughput, reasonable latency, and high aggregate throughput. 
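The fixed-overhead point about wider channels can be made concrete with a little arithmetic (the frame size, rates, and overhead below are illustrative numbers, not measurements):

```python
def effective_throughput(payload_bits, data_rate_bps, fixed_overhead_s):
    """Goodput of one transmission when each exchange pays a fixed
    time cost (RTS/CTS handshake and preamble, sent at a low legacy
    rate) that does not shrink as the data rate rises."""
    airtime = fixed_overhead_s + payload_bits / data_rate_bps
    return payload_bits / airtime

# Doubling the data rate (e.g. by doubling channel width) does not
# double goodput, because the handshake/preamble time stays fixed.
slow = effective_throughput(12_000, 54e6, 100e-6)   # ~37 Mbps
fast = effective_throughput(12_000, 108e6, 100e-6)  # ~57 Mbps
```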
Choosing the most effective radio bandwidth and modulation is a difficult problem. The Minstrel algorithm seems to be an effective solution for general traffic. Some manual constraints may be appropriate in some circumstances, such as reducing the maximum radio bandwidth (trading throughput of one AP against coexistence with other APs) and increasing the modulation rate of management broadcasts (reducing per-packet overhead). Packet aggregation allows several IP packets to be combined into a single wireless transmission. This avoids performing the CSMA/CA steps repeatedly, which is a considerable overhead. There are several types of packet aggregation - the type adopted by 802.11ac allows individual IP packets within a transmission to be link-layer acknowledged separately, so that a minor corruption doesn't require retransmission of the entire aggregate. By contrast, 802.11n also supported a version which did require that, in exchange for slightly lower overhead. Implicit in the packet-aggregation system is the problem of collecting packets to aggregate. Each transmission is between the AP and one client, so the packets aggregated by the AP all have to be for the same client. (The client can assume that all packets go to the AP.) A fair-queueing algorithm could have the effect of forming per-client queues, so several suitable packets could easily be located in such a queue. In a straight FIFO queue, however, packets for the same client are likely to be separated in the queue and thus difficult to find. It is therefore *obviously* in the AP's interest to implement a fair-queueing algorithm based on client MAC address, even if it does nothing else to manage congestion. NB: if a single aggregate could be intended to be heard by more than one client, then the complexity of multi-target beamforming MIMO would not be necessary. This is how I infer the strict one-to-one nature of data transmissions, as distinct from management broadcasts. 
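The per-client queueing argument can be sketched in a few lines (a hypothetical structure for illustration, not any driver's actual implementation):

```python
from collections import defaultdict, deque

class PerClientQueues:
    """Queue packets per destination MAC so that an aggregate for one
    client can be assembled directly, instead of scanning a single
    FIFO for packets that happen to share a destination."""

    def __init__(self, max_aggregate=8):
        self.queues = defaultdict(deque)
        self.max_aggregate = max_aggregate

    def enqueue(self, dst_mac, packet):
        self.queues[dst_mac].append(packet)

    def next_aggregate(self, dst_mac):
        """Pull up to max_aggregate waiting packets for one client."""
        q = self.queues[dst_mac]
        batch = []
        while q and len(batch) < self.max_aggregate:
            batch.append(q.popleft())
        return batch
```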
On 23 Aug, 2014, at 10:26 pm, Michael Welzl wrote: >>> because of the "function" i wrote above: the more you retry, the more you need to buffer when traffic continuously arrives because you're stuck trying to send a frame again. >> >> huh, I'm missing something here, retrying sends would require you to buffer more when sending. > > aren't you saying the same thing as I am? If not, sorry - I might have expressed it confusingly somehow There should be enough buffering to allow effective aggregation, but as little as possible on top of that. I don't know how much aggregation can be done, but I assume that there is a limit, and that it's not especially high in terms of full-length packets. After all, tying up the channel for long periods of time is unfair to other clients - a typical latency/throughput tradeoff. Equally clearly, in a heavily congested scenario the AP benefits from having a lot of buffer divided among a large number of clients, but each client should have only a small buffer. >> If people are retrying when they really don't need to, that cuts down on the available airtime. > > Yes Given that TCP retries on loss, and UDP protocols are generally loss-tolerant to a degree, there should therefore be a limit on how hard the link-layer stuff tries to get each individual packet through. Minstrel appears to be designed around a time limit for that sort of thing, which seems sane - and they explicitly talk about TCP retransmit timers in that context. With that said, link-layer retries are a valid mechanism to minimise unnecessarily lost packets. It's also not new - bus/hub Ethernet does this on collision detection. What Ethernet doesn't have is the link-layer ack, so there's an additional set of reasons why a backoff-and-retry might happen in wifi. Modern wifi variants use packet aggregation to improve efficiency. 
This only works when there are multiple packets to send at a time from one place to a specific other place - which is more likely when the link is congested. In the event of a retry, it makes sense to aggregate newly buffered packets with the original ones, to reduce the number of negotiation and retry cycles. >> But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much. > > Yes; but the complexity that I was pointing at (but maybe it's a simple parameter, more like a 0 or 1 situation in practice?) lies in the word "continual". How long do you try before you decide that the sending TCP should really think it *is* congestion? To really optimize the behavior, that would have to depend on the RTT, which you can't easily know. There are TCP congestion algorithms which explicitly address this (eg. Westwood+), by reacting only a little to individual drops, but reacting more rapidly if drops occur frequently. In principle they should also react quickly to ECN, because that is never triggered by random noise loss alone. - Jonathan Morton ^ permalink raw reply [flat|nested] 56+ messages in thread
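The distinction in that last paragraph - react gently to an isolated drop, firmly to frequent drops, and always to ECN - can be sketched as a toy policy. This is an illustration of the idea only, not the actual Westwood+ algorithm; the threshold and scaling factors are invented:

```python
def cwnd_after_event(cwnd, event, recent_loss_rate):
    """Toy congestion response for lossy radio links: shrink a little
    on an isolated loss (likely noise), halve when losses are frequent
    (likely real congestion), and always halve on ECN, since ECN marks
    are never caused by random noise loss."""
    if event == "ecn":
        return cwnd / 2
    if event == "loss":
        if recent_loss_rate > 0.05:  # invented threshold: losses are frequent
            return cwnd / 2
        return cwnd * 0.9            # isolated loss: back off only a little
    return cwnd                      # no congestion signal
```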
* Re: [Bloat] sigcomm wifi 2014-08-23 23:29 ` Jonathan Morton @ 2014-08-23 23:40 ` Steinar H. Gunderson 2014-08-23 23:49 ` Jonathan Morton 2014-08-24 1:33 ` David Lang 2014-08-25 7:35 ` Michael Welzl 2 siblings, 1 reply; 56+ messages in thread From: Steinar H. Gunderson @ 2014-08-23 23:40 UTC (permalink / raw) To: bloat On Sun, Aug 24, 2014 at 02:29:50AM +0300, Jonathan Morton wrote: > Multi-target MIMO allows an AP to transmit to several clients > simultaneously, without requiring the client to support MIMO themselves. > This requires the AP's antennas and radios to be dynamically reconfigured > for beamforming - giving each client a clear version of its own signal and > a null for the other signals - which is a tricky procedure. APs that do > implement this well are highly valuable in congested situations. FWIW; I don't think you're right about the nulls. Beamforming has some gain, and there are some “darker spots”, but they're not what the algorithm is aiming for (it aims to maximize the signal at the client, not to minimize it at all other clients), and it's not -inf dB, more like -10 dB. See in particular http://apenwarr.ca/log/?m=201408#01 and play around with the applet. /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-23 23:40 ` Steinar H. Gunderson @ 2014-08-23 23:49 ` Jonathan Morton 0 siblings, 0 replies; 56+ messages in thread From: Jonathan Morton @ 2014-08-23 23:49 UTC (permalink / raw) To: Steinar H. Gunderson; +Cc: bloat On 24 Aug, 2014, at 2:40 am, Steinar H. Gunderson wrote: > On Sun, Aug 24, 2014 at 02:29:50AM +0300, Jonathan Morton wrote: >> Multi-target MIMO allows an AP to transmit to several clients >> simultaneously, without requiring the client to support MIMO themselves. >> This requires the AP's antennas and radios to be dynamically reconfigured >> for beamforming - giving each client a clear version of its own signal and >> a null for the other signals - which is a tricky procedure. APs that do >> implement this well are highly valuable in congested situations. > > FWIW; I don't think you're right about the nulls. Beamforming has some gain, > and there are some “darker spots”, but they're not what the algorithm is > aiming for (it aims to maximize the signal at the client, not to minimize it > at all other clients), and it's not -inf dB, more like -10 dB. That's true of plain beamforming, whose goal is to increase SNR to a particular client. That's been around since 802.11g days. But that's not what I'm talking about here. I'm talking specifically about 802.11ac multi-target MIMO, which *does* attempt to null out the other clients to which it is simultaneously transmitting, purely so that it *can* transmit to them simultaneously. But that's only two or three other clients to form nulls for, not the hundred others who are not involved in that particular timeslot. - Jonathan Morton ^ permalink raw reply [flat|nested] 56+ messages in thread
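The nulling described above is, in its simplest textbook form, zero-forcing precoding: with a channel matrix H (rows = clients, columns = AP antennas), transmitting through the precoder W = H^-1 makes the effective channel H·W the identity, so each client hears only its own stream. A toy two-client sketch of that linear algebra (real systems use more antennas than streams and regularized inverses, so this shows only the idea):

```python
def mat2_inv(H):
    """Invert a 2x2 complex matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [[d / det, -b / det],
            [-c / det, a / det]]

def mat2_mul(A, B):
    """Multiply two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Example channel: each client hears a mix of both AP antennas.
H = [[1 + 0j, 0.5j],
     [0.3 + 0j, 1 + 0j]]
W = mat2_inv(H)     # zero-forcing precoder
E = mat2_mul(H, W)  # effective channel: identity, i.e. no crosstalk
```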
* Re: [Bloat] sigcomm wifi 2014-08-23 23:29 ` Jonathan Morton 2014-08-23 23:40 ` Steinar H. Gunderson @ 2014-08-24 1:33 ` David Lang 2014-08-24 2:29 ` Jonathan Morton 2014-08-25 7:35 ` Michael Welzl 2 siblings, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-24 1:33 UTC (permalink / raw) To: Jonathan Morton; +Cc: bloat On Sun, 24 Aug 2014, Jonathan Morton wrote: > I've done some reading on how wifi actually works, and what mechanisms the latest variants use to improve performance. It might be helpful to summarise my understanding here - biased towards the newer variants, since they are by now widely deployed. > > First a note on the variants themselves: > > 802.11 without suffix is obsolete and no longer in use. > 802.11a was the original 5GHz band version, giving 54Mbps in 20MHz channels. > 802.11b was the first "affordable" version, using 2.4GHz and giving 11Mbps in 20MHz channels. > 802.11g brought the 802.11a modulation schemes and (theoretical) performance to the 2.4GHz band. > 802.11n is dual-band, but only optionally. Aggregation, 40MHz channels, single-target MIMO. > 802.11ac is 5GHz only. More aggregation, 80 & 160MHz channels, multi-target MIMO. Rationalised options, dropping many 'n' features that are more trouble than they're worth. Coexists nicely with older 20MHz-channel equipment, and nearby APs with overlapping spectrum. > My general impression is that 802.11ac makes a serious effort to improve > matters in heavily-congested, many-clients scenarios, which was where earlier > variants had the most trouble. If you're planning to set up or go to a major > conference, the best easy thing you can do is get 'ac' equipment all round - > if nothing else, it's guaranteed to support the 5GHz band. Of course, we're > not just considering the easy solutions. If ac had reasonable drivers available I would agree, but when you are limited to factory firmware, it's not good. 
> Now for some technical details: > > The wireless spectrum is fundamentally a shared-access medium. It also has > the complication of being noisy and having various path-loss mechanisms, and > of the "hidden node" problem where one client might not be able to hear > another client's transmission, even though both are in range of the AP. > > Thus wifi uses a CSMA/CA algorithm as follows: > > 1) Listen for competing carrier. If heard, backoff and retry later. > (Listening is continuous, and detected preambles are used to infer the > time-length of packets when the data modulation is unreadable.) > 2) Perform an RTS/CTS handshake. If CTS doesn't arrive, backoff and retry later. > 3) Transmit, and await acknowledgement. If no ack, backoff and retry later, > possibly using different modulation. > > This can be compared to Ethernet's CSMA/CD algorithm: > > 1) Listen for competing carrier. If heard, backoff and retry later. > 2) Transmit, listening for collision with a competing transmission. If > collision, backoff and retry later. > > In both cases, the backoff is random and exponentially increasing, to reduce > the chance of repeated collisions. > > The 2.4GHz band is chock-full of noise sources, from legacy 802.11b/g > equipment to cordless phones, Bluetooth, and even microwave ovens - which > generate the best part of a kilowatt of RF energy, but somehow manage to > contain the vast majority of it within the cavity. It's also a relatively > narrow band, with only three completely separate 20MHz channels available in > most of the world (four in Japan). > > This isn't a massive concern for home use, but consumers still notice the > effects surprisingly often. Perhaps they live in an apartment block with lots > of devices and APs crowded together in an unmanaged mess. Perhaps they have a > large home to themselves, but a bunch of noisy equipment reduces the effective > range and reliability of their network. 
It's not uncommon to hear about > networks that drop out whenever the phone rings, thanks to an old cordless > phone. > > The 5GHz band is much less crowded. There are several channels which are > shared with weather radar, so wifi equipment can't use those unless they are > capable of detecting the radar transmissions, but even without those there are > far more 20MHz channels available. There's also much less legacy equipment > using it - even 802.11a is relatively uncommon (and is fairly benign in > behaviour). The downside is that 5GHz doesn't propagate as far, or as easily > through walls. > > Wider bandwidth channels can be used to shorten the time taken for each > transmission. However, this effect is not linear, because the RTS/CTS > handshake and preamble are fixed overheads (since they must be transmitted at > a low speed to ensure that all clients can hear them), taking the same length > of time regardless of any other enhancements. This implies that in seriously > geographically-congested scenarios, 20MHz channels (and lots of APs to use > them all) are still the most efficient. MIMO can still be used to beneficial > effect in these situations. Another good reason for sticking to 20MHz channels is that it gives you more channels available, so you can deploy more APs without them interfering with each other's footprints. This can significantly reduce the distance between the mobile user and the closest AP. > Multi-target MIMO allows an AP to transmit to several clients simultaneously, > without requiring the client to support MIMO themselves. This requires the > AP's antennas and radios to be dynamically reconfigured for beamforming - > giving each client a clear version of its own signal and a null for the other > signals - which is a tricky procedure. APs that do implement this well are > highly valuable in congested situations. how many different targets can such APs handle? if it's only a small number, I'm not sure it helps much. 
Also, is this a transmit-only feature? or can it help decipher multiple mobile devices transmitting at the same time? > Single-target MIMO allows higher bandwidth between one client at a time and > the AP. Both the AP and the client must support MIMO for this to work. > There are physical constraints which limit the ability for handheld devices to > support MIMO. In general, this form of MIMO improves throughput in the home, > but is not very useful in congested situations. High individual throughput is > not what's needed in a crowded arena; rather, reliable if slow individual > throughput, reasonable latency, and high aggregate throughput. well, if the higher bandwidth to an individual user ended up reducing the airtime that user takes up, it could help. but I suspect that the devices that do this couldn't keep track of a few dozen endpoints. > Choosing the most effective radio bandwidth and modulation is a difficult > problem. The Minstrel algorithm seems to be an effective solution for general > traffic. Some manual constraints may be appropriate in some circumstances, > such as reducing the maximum radio bandwidth (trading throughput of one AP > against coexistence with other APs) and increasing the modulation rate of > management broadcasts (reducing per-packet overhead). agreed. > Packet aggregation allows several IP packets to be combined into a single > wireless transmission. This avoids performing the CSMA/CA steps repeatedly, > which is a considerable overhead. There are several types of packet > aggregation - the type adopted by 802.11ac allows individual IP packets within > a transmission to be link-layer acknowledged separately, so that a minor > corruption doesn't require retransmission of the entire aggregate. By contrast, > 802.11n also supported a version which did require that, in exchange for a > slightly lower overhead. 
There are other overheads that are saved with this: since the TCP packet is encapsulated in the wireless transmission, things like link-layer encryption and other encapsulation overhead also benefit from this aggregation. But with the n style 'all or nothing' mode, the fact that the transmission takes longer, and is therefore more likely to get clobbered, is a much more significant problem. This needs to be tweakable. In low-congestion, high-throughput situations, you want to do a lot of aggregation; in high-congestion situations, you want to limit this. note, "low-congestion, high throughput" doesn't have to mean a small number of stations. It could be a significant number of mobile devices that are all watching streaming video from the AP. (The AP could be transmitting nearly continuously, but the mobile devices transmit only in response, so there would be very little contention.) > Implicit in the packet-aggregation system is the problem of collecting packets > to aggregate. Each transmission is between the AP and one client, so the > packets aggregated by the AP all have to be for the same client. (The client > can assume that all packets go to the AP.) A fair-queueing algorithm could > have the effect of forming per-client queues, so several suitable packets > could easily be located in such a queue. In a straight FIFO queue, however, > packets for the same client are likely to be separated in the queue and thus > difficult to find. It is therefore *obviously* in the AP's interest to > implement a fair-queueing algorithm based on client MAC address, even if it > does nothing else to manage congestion. > > NB: if a single aggregate could be intended to be heard by more than one > client, then the complexity of multi-target beamforming MIMO would not be > necessary. This is how I infer the strict one-to-one nature of data > transmissions, as distinct from management broadcasts. 
yes, multicast has a lot of potential benefits, but it's never lived up to its promises in the real world. In effect, everything is unicast: even if you have a lot of people watching the same video, they are all at slightly different points, needing slightly different packets retransmitted, etc. In a radio environment this is even more so. One station may be hearing something perfectly while another is unable to hear the same packet due to a hidden node transmission. > On 23 Aug, 2014, at 10:26 pm, Michael Welzl wrote: > >>>> because of the "function" i wrote above: the more you retry, the more you >>>> need to buffer when traffic continuously arrives because you're stuck >>>> trying to send a frame again. >>> >>> huh, I'm missing something here, retrying sends would require you to buffer >>> more when sending. >> >> aren't you saying the same thing as I am? If not, sorry - I might have >> expressed it confusingly somehow > > There should be enough buffering to allow effective aggregation, but as little > as possible on top of that. I don't know how much aggregation can be done, > but I assume that there is a limit, and that it's not especially high in terms > of full-length packets. After all, tying up the channel for long periods of > time is unfair to other clients - a typical latency/throughput tradeoff. Aggregation is not necessarily worth pursuing. > Equally clearly, in a heavily congested scenario the AP benefits from having a > lot of buffer divided among a large number of clients, but each client should > have only a small buffer. the key thing is how long the data sits in the buffer. If it sits too long, it doesn't matter that it's the only packet for this client, it still is too much buffering. >>> If people are retrying when they really don't need to, that cuts down on the available airtime. 
>> >> Yes > > Given that TCP retries on loss, and UDP protocols are generally loss-tolerant > to a degree, there should therefore be a limit on how hard the link-layer > stuff tries to get each individual packet through. Minstrel appears to be > designed around a time limit for that sort of thing, which seems sane - and > they explicitly talk about TCP retransmit timers in that context. > > With that said, link-layer retries are a valid mechanism to minimise > unnecessarily lost packets. It's also not new - bus/hub Ethernet does this on > collision detection. What Ethernet doesn't have is the link-layer ack, so > there's an additional set of reasons why a backoff-and-retry might happen in > wifi. > > Modern wifi variants use packet aggregation to improve efficiency. This only > works when there are multiple packets to send at a time from one place to a > specific other place - which is more likely when the link is congested. In > the event of a retry, it makes sense to aggregate newly buffered packets with > the original ones, to reduce the number of negotiation and retry cycles. up to a point. It could easily be that the right thing to do is NOT to aggregate the new packets because it will make it far more likely that they will all fail (ac mitigates this in theory, but until there is really driver support, the practice is questionable) >>> But if you have continual transmissions taking place, so you have a hard >>> time getting a chance to send your traffic, then you really do have >>> congestion and should be dropping packets to let the sender know that it >>> shouldn't try to generate as much. >> >> Yes; but the complexity that I was pointing at (but maybe it's a simple >> parameter, more like a 0 or 1 situation in practice?) lies in the word >> "continual". How long do you try before you decide that the sending TCP >> should really think it *is* congestion? To really optimize the behavior, >> that would have to depend on the RTT, which you can't easily know. 
> > There are TCP congestion algorithms which explicitly address this (eg. > Westwood+), by reacting only a little to individual drops, but reacting more > rapidly if drops occur frequently. In principle they should also react > quickly to ECN, because that is never triggered by random noise loss alone. correct. David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
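David's point earlier in the thread that aggregation "needs to be tweakable" could be expressed as a simple policy that scales the aggregate size down as contention rises. This is a hypothetical policy for illustration, not taken from any driver, and the frame counts are invented:

```python
def aggregation_limit(busy_fraction, max_frames=32, min_frames=1):
    """Allow long aggregates when the channel is mostly idle, and
    shrink toward single-frame transmissions as the observed busy
    fraction rises, so that each transmission holds the channel only
    briefly. `busy_fraction` is the measured share of airtime in use,
    between 0.0 and 1.0."""
    span = max_frames - min_frames
    return max(min_frames, max_frames - int(busy_fraction * span))
```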
* Re: [Bloat] sigcomm wifi 2014-08-24 1:33 ` David Lang @ 2014-08-24 2:29 ` Jonathan Morton 2014-08-24 5:12 ` David Lang 0 siblings, 1 reply; 56+ messages in thread From: Jonathan Morton @ 2014-08-24 2:29 UTC (permalink / raw) To: David Lang; +Cc: bloat On 24 Aug, 2014, at 4:33 am, David Lang wrote: > On Sun, 24 Aug 2014, Jonathan Morton wrote: > >> My general impression is that 802.11ac makes a serious effort to improve matters in heavily-congested, many-clients scenarios, which was where earlier variants had the most trouble. If you're planning to set up or go to a major conference, the best easy thing you can do is get 'ac' equipment all round - if nothing else, it's guaranteed to support the 5GHz band. Of course, we're not just considering the easy solutions. > > If ac had reasonable drivers available I would agree, but when you are limited to factory firmware, it's not good. Hmm. What are the current limitations, compared to 'n' equipment? >> Wider bandwidth channels can be used to shorten the time taken for each transmission. However, this effect is not linear, because the RTS/CTS handshake and preamble are fixed overheads (since they must be transmitted at a low speed to ensure that all clients can hear them), taking the same length of time regardless of any other enhancements. This implies that in seriously geographically-congested scenarios, 20MHz channels (and lots of APs to use them all) are still the most efficient. MIMO can still be used to beneficial effect in these situations. > > Another good reason for sticking to 20MHz channels is that it gives you more channels available, so you can deploy more APs without them interfering with each other's footprints. This can significantly reduce the distance between the mobile user and the closest AP. No, that is *the* good reason. If you don't have a lot of APs, you might as well use 40MHz or 80MHz channels to increase the throughput per AP. 
>> Multi-target MIMO allows an AP to transmit to several clients simultaneously, without requiring the client to support MIMO themselves. This requires the AP's antennas and radios to be dynamically reconfigured for beamforming - giving each client a clear version of its own signal and a null for the other signals - which is a tricky procedure. APs that do implement this well are highly valuable in congested situations. > > how many different targets can such APs handle? if it's only a small number, I'm not sure it helps much. The diagram I saw on Cisco's website demonstrated the process for three clients, so I assume that's their present target. I think four targets is plausible at the high end, once implementations mature, though the standard permits 8-way in theory. The RF hardware requirements are similar to 'n'-style single-target MIMO. > Also, is this a transmit-only feature? or can it help decipher multiple mobile devices transmitting at the same time? I think it *could* be used for receive as well. The AP could hear several RTS packets, configure itself for multiple receive, then send CTS in multiple, in order to signal that. The trick is with hearing the multiple RTSes, I think. >> Single-target MIMO allows higher bandwidth between one client at a time and the AP. Both the AP and the client must support MIMO for this to work. There are physical constraints which limit the ability for handheld devices to support MIMO. In general, this form of MIMO improves throughput in the home, but is not very useful in congested situations. High individual throughput is not what's needed in a crowded arena; rather, reliable if slow individual throughput, reasonable latency, and high aggregate throughput. > > well, if the higher bandwidth to an individual user ended up reducing the airtime that user takes up, it could help. but I suspect that the devices that do this couldn't keep track of a few dozen endpoints. 
I think multi-target MIMO is more useful than single-target MIMO for the congested case. It certainly helps that the client doesn't need to explicitly support MIMO for it to work. > This needs to be tweakable. In low-congestion, high throughput situations, you want to do a lot of aggregation, in high-congestion situations, you want to limit this. Yes, that makes sense. The higher the latency, beyond some threshold, the more likely that spurious retransmits (TCP or UDP) will occur, making the congestion worse and crippling goodput. So latency trumps throughput in the congested case. However, I think there is a sliding scale on this. With the modern modulation schemes (and especially with wide channels), the handshake and preamble really are a lot of overhead. If you have the chance to triple your throughput for a 20% increase in channel occupation, you need a *really* good reason not to take it. >> There should be enough buffering to allow effective aggregation, but as little as possible on top of that. I don't know how much aggregation can be done, but I assume that there is a limit, and that it's not especially high in terms of full-length packets. After all, tying up the channel for long periods of time is unfair to other clients - a typical latency/throughput tradeoff. > > Aggregation is not necessarily worth pursuing. > >> Equally clearly, in a heavily congested scenario the AP benefits from having a lot of buffer divided among a large number of clients, but each client should have only a small buffer. > > the key thing is how long the data sits in the buffer. If it sits too long, it doesn't matter that it's the only packet for this client, it still is too much buffering. Even if it's a Fair Queue, so *every* client has only a single packet waiting? >> Modern wifi variants use packet aggregation to improve efficiency. 
This only works when there are multiple packets to send at a time from one place to a specific other place - which is more likely when the link is congested. In the event of a retry, it makes sense to aggregate newly buffered packets with the original ones, to reduce the number of negotiation and retry cycles. > > up to a point. It could easily be that the right thing to do is NOT to aggregate the new packets because it will make it far more likely that they will all fail (ac mitigates this in theory, but until there is really driver support, the practice is questionable) From what I read, I got the impression that 'ac' *forbids* the use of the fragile aggregation schemes. Are the drivers really so awful that they are noncompliant? and from Steinar... >> Yep... I remember a neat paper from colleagues at Trento University that piggybacked TCP's ACKs on link layer ACKs, thereby avoiding the collisions between TCP's ACKs and other data packets - really nice. Not sure if it wasn't just simulations, though. > > that's a neat hack, but I don't see it working, except when one end of the wireless link is also the endpoint of the TCP connection (and then only for acks from that device) > > so in a typical wifi environment, it would be one less transmission from the laptop, no change to the AP. > > But even with that, doesn't TCP try to piggyback the ack on the next packet of data anyway? so unless it's a purely one-way dataflow, this still wouldn't help. Once established, a HTTP session looks exactly like that. I also see no reason in theory why a TCP ack couldn't be piggybacked on the *next* available link ack, which would relax the latency requirements considerably. If that were implemented and deployed successfully, it would mean that the majority of RTS/CTS handshakes initiated by clients would be to send DNS queries, TCP handshakes and HTTP request headers, all of which are actually important. It would, I think, typically reduce contention by a large margin. 
- Jonathan Morton ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-24 2:29 ` Jonathan Morton @ 2014-08-24 5:12 ` David Lang 2014-08-24 6:26 ` Jonathan Morton 0 siblings, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-24 5:12 UTC (permalink / raw) To: Jonathan Morton; +Cc: bloat Mainlinglist On Sun, 24 Aug 2014, Jonathan Morton wrote: > On 24 Aug, 2014, at 4:33 am, David Lang wrote: > >> On Sun, 24 Aug 2014, Jonathan Morton wrote: >> >>> My general impression is that 802.11ac makes a serious effort to improve matters in heavily-congested, many-clients scenarios, which was where earlier variants had the most trouble. If you're planning to set up or go to a major conference, the best easy thing you can do is get 'ac' equipment all round - if nothing else, it's guaranteed to support the 5GHz band. Of course, we're not just considering the easy solutions. >> >> If ac had reasonable drivers available I would agree, but when you are >> limited to factory firmware, it's not good. > > Hmm. What are the current limitations, compared to 'n' equipment? simply that when you ask the OpenWRT developers about ac equipment, they respond that they're working on it, but right now the best you have is binary drivers that only work with specific kernel versions provided by the manufacturers. Any source level drivers they have are "junk" and should be avoided. The inability to create a custom system build is crippling for doing things like high-density networking. >>> Single-target MIMO allows higher bandwidth between one client at a time and >>> the AP. Both the AP and the client must support MIMO for this to work. >>> There are physical constraints which limit the ability for handheld devices >>> to support MIMO. In general, this form of MIMO improves throughput in the >>> home, but is not very useful in congested situations. High individual >>> throughput is not what's needed in a crowded arena; rather, reliable if slow >>> individual throughput, reasonable latency, and high aggregate throughput. 
>> >> well, if the higher bandwidth to an individual user ended up reducing the >> airtime that user takes up, it could help. but I suspect that the devices >> that do this couldn't keep track of a few dozen endpoints. > > I think multi-target MIMO is more useful than single-target MIMO for the > congested case. It certainly helps that the client doesn't need to explicitly > support MIMO for it to work. better yes, but at what price difference? :-) If the APs cost $1000 each instead of $100 each, you are better off with more of the cheaper APs. If they are $200 instead of $100, it may help more, but it's all a matter of how much traffic is going in each direction. >> This needs to be tweakable. In low-congestion, high throughput situations, >> you want to do a lot of aggregation, in high-congestion situations, you want >> to limit this. > > Yes, that makes sense. The higher the latency, beyond some threshold, the > more likely that spurious retransmits (TCP or UDP) will occur, making the > congestion worse and crippling goodput. So latency trumps throughput in the > congested case. it's not latency, it's how long the radio transmission takes. sending all pending packets for that destination in one long transmission will minimize the latency for that destination, and provide the best overall throughput for the system, in a quiet RF environment. but if there is an n% chance every ms of someone else transmitting, then you really want to keep your transmissions as small as possible. > However, I think there is a sliding scale on this. With the modern modulation > schemes (and especially with wide channels), the handshake and preamble really > are a lot of overhead. If you have the chance to triple your throughput for a > 20% increase in channel occupation, you need a *really* good reason not to > take it. 
if you can send 300% as much data in 120% the time, then the overhead of sending a single packet is huge (you spend _far_ more airtime on the overhead than the packet itself). now, this may be true for small packets, which is why this should be configurable, and configurable in terms of data size, not packet count. by the way, the same effect happens on wired ethernet networks, see Jumbo Frames and the advantages of using them. the advantages are probably not 300% data in 120% time, but more like 300% data in 270% time, and at that point, the fact that you are 2.7x as likely to lose the packet to another transmission very quickly makes it the wrong thing to do. >> >>> Equally clearly, in a heavily congested scenario the AP benefits from having >>> a lot of buffer divided among a large number of clients, but each client >>> should have only a small buffer. >> >> the key thing is how long the data sits in the buffer. If it sits too long, >> it doesn't matter that it's the only packet for this client, it still is too >> much buffering. > > Even if it's a Fair Queue, so *every* client has only a single packet waiting? yep, if there are too many clients, even one packet per client can end up with the overall delay being excessive. you won't hit this with 20 clients, but you sure would with 1000 clients. I don't know where the crossover point would be, but I know that you do want a limit on the total buffer size. >>> Modern wifi variants use packet aggregation to improve efficiency. This >>> only works when there are multiple packets to send at a time from one place >>> to a specific other place - which is more likely when the link is congested. >>> In the event of a retry, it makes sense to aggregate newly buffered packets >>> with the original ones, to reduce the number of negotiation and retry >>> cycles. >> >> up to a point. 
It could easily be that the right thing to do is NOT to >> aggregate the new packets because it will make it far more likely that they >> will all fail (ac mitigates this in theory, but until there is really driver >> support, the practice is questionable) > > From what I read, I got the impression that 'ac' *forbids* the use of the > fragile aggregation schemes. Are the drivers really so awful that they are > noncompliant? without having looked at any driver, I can tell you the answer is a strong YES :-) besides, the other end of the connection may not be ac, it may only be n, and it would do what it wants. >>> Yep... I remember a neat paper from colleagues at Trento University that >>> piggybacked TCP's ACKs on link layer ACKs, thereby avoiding the collisions >>> between TCP's ACKs and other data packets - really nice. Not sure if it >>> wasn't just simulations, though. >> >> that's a neat hack, but I don't see it working, except when one end of the >> wireless link is also the endpoint of the TCP connection (and then only for >> acks from that device) >> >> so in a typical wifi environment, it would be one less transmission from the >> laptop, no change to the AP. >> >> But even with that, doesn't TCP try to piggyback the ack on the next packet >> of data anyway? so unless it's a purely one-way dataflow, this still wouldn't >> help. > > Once established, a HTTP session looks exactly like that. I also see no > reason in theory why a TCP ack couldn't be piggybacked on the *next* available > link ack, which would relax the latency requirements considerably. I don't understand (or we are talking past each other again) laptop -- ap -- 50 hops -- server packets from the server to the laptop could have an ack piggybacked by the driver on the wifi link ack, but for packets the other direction, the ap can't possibly know that the server will ever respond, so it can't reply with a TCP level ack when it does the link level ack. 
If the ack packets are already combined by the laptop and server with the next data packet, stand-alone ack packets should be pretty rare. to be clear, what I'm thinking of is TCP offload type of operation on the laptop, something along the lines that when the driver receives a TCP packet destined for the laptop, the driver will consider the ack sent (and update the OS state accordingly), meanwhile at the AP, if the next hop is the final destination, then when it gets the link-level ack back from the wifi hop it could then generate an ack back to the server, without the need for the TCP ack packet to ever go over the air. > If that were implemented and deployed successfully, it would mean that the > majority of RTS/CTS handshakes initiated by clients would be to send DNS > queries, TCP handshakes and HTTP request headers, all of which are actually > important. It would, I think, typically reduce contention by a large margin. only if stand-alone ack packets are really a significant portion of the network traffic. David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-24 5:12 ` David Lang @ 2014-08-24 6:26 ` Jonathan Morton 2014-08-24 8:24 ` David Lang 0 siblings, 1 reply; 56+ messages in thread From: Jonathan Morton @ 2014-08-24 6:26 UTC (permalink / raw) To: David Lang; +Cc: bloat Mainlinglist On 24 Aug, 2014, at 8:12 am, David Lang wrote: > On Sun, 24 Aug 2014, Jonathan Morton wrote: >> I think multi-target MIMO is more useful than single-target MIMO for the congested case. It certainly helps that the client doesn't need to explicitly support MIMO for it to work. > > better yes, but at what price difference? :-) > > If the APs cost $1000 each instead of $100 each, you are better off with more of the cheaper APs. ...until you run out of channels to run them on. Then, if you still need more capacity, multi-target MIMO is probably still worth it. Hopefully, it won't be as much as a tenfold price difference. >> However, I think there is a sliding scale on this. With the modern modulation schemes (and especially with wide channels), the handshake and preamble really are a lot of overhead. If you have the chance to triple your throughput for a 20% increase in channel occupation, you need a *really* good reason not to take it. > > if you can send 300% as much data in 120% the time, then the overhead of sending a single packet is huge (you _far_ spend more airtime on the overhead than the packet itself) > > now, this may be true for small packets, which is why this should be configurable, and configurable in terms of data size, not packet count. > > by the way, the same effect happens on wired ethernet networks, see Jumbo Frames and the advantages of using them. > > the advantages are probably not 300% data in 120% time, but more like 300% data in 270% time, and at that point, the fact that you are 2.7x as likely to loose the packet to another transmission very quickly make it the wrong thing to do. The conditions are probably different in each direction. 
The AP is more likely to be sending large packets (DNS response, HTTP payload) while the client is more likely to send small packets (DNS request, TCP SYN, HTTP GET). The AP is also likely to want to aggregate a TCP SYN/ACK with another packet. So yes, intelligence of some sort is needed. And I should probably look up just how big the handshake and preamble are in relative terms - but I do already know that under ideal conditions, recent wifi variants still get a remarkably small percentage of their theoretical data rate as actual throughput - and that's with big packets and aggregation. >>> But even with that, doesn't TCP try to piggyback the ack on the next packet of data anyway? so unless it's a purely one-way dataflow, this still wouldn't help. >> >> Once established, a HTTP session looks exactly like that. I also see no reason in theory why a TCP ack couldn't be piggybacked on the *next* available link ack, which would relax the latency requirements considerably. > > I don't understand (or we are talking past each other again) > > laptop -- ap -- 50 hops -- server > > packets from the server to the laptop could have an ack piggybacked by the driver on the wifi link ack, but for packets the other direction, the ap can't possibly know that the server will ever respond, so it can't reply with a TCP level ack when it does the link level ack. Which is fine, because the bulk of the traffic will be from the AP to the client. Unless you're running servers wirelessly, which seems dumb, or you've got a bunch of journalists uploading copy and photos, which seems like a more reasonable use case. But what I meant is that the TCP ack doesn't need to be piggybacked on the link-level ack for the same packet - it can go on a later one. Think VJ compression in PPP - there's a small lookup table which can be used to fill in some of the data. 
>> If that were implemented and deployed successfully, it would mean that the majority of RTS/CTS handshakes initiated by clients would be to send DNS queries, TCP handshakes and HTTP request headers, all of which are actually important. It would, I think, typically reduce contention by a large margin. > > only if stand-alone ack packets are really a significant portion of the network traffic. I think they are significant, in terms of the number of uncoordinated contentions for the channel. Remember, the AP occupies a privileged position in the network. It transmits the bulk of the data, and the bulk of the number of individual packets. It knows when it's already busy itself, so the backoff algorithm never kicks in for the noise it makes itself. It can be a model citizen of the wireless spectrum. By contrast, clients send much less on an individual basis, but they have to negotiate with the AP *and* every other client for airtime to do so. Every TCP ack disrupts the AP's flow of traffic. If the AP aggregates three HTTP payload packets into a single transmission, then it must expect to receive a TCP ack coming the other way - in other words, to be interrupted - for every such aggregate packet it has sent. The less often clients have to contend for the channel, the more time the AP can spend distributing its self-coordinated, useful traffic. Let's suppose a typical HTTP payload is 45kB (including TCP/IP wrapping). That can be transmitted in 10 triples of 1500B packets. There would also be a DNS request and response, a TCP handshake (SYN, SYN/ACK), a HTTP request (ACK/GET), and a TCP close (FIN/ACK, ACK), which I'll assume can't be aggregated with other traffic, associated with the transaction. So the AP must transmit 13 times to complete this small request. As things currently stand, the client must *also* transmit - 14 times. 
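That counting can be sketched as a quick tally. Everything here is taken from the assumptions stated above (a 45 kB payload sent as 10 aggregates of three 1500 B packets, with the DNS, handshake, request and close packets not aggregated):

```python
# Tally channel contentions for one small HTTP transaction, using the
# assumed figures above: 45 kB payload = 10 aggregates of 3 x 1500 B.
payload_aggregates = 45_000 // (3 * 1500)   # 10 payload transmissions
ap_tx = payload_aggregates + 3              # + DNS response, SYN/ACK, FIN/ACK
client_tx = 3 + payload_aggregates + 1      # DNS query, SYN, ACK/GET,
                                            # 10 TCP acks, final ACK
contentions = ap_tx + client_tx             # every transmission contends
pure_acks = payload_aggregates              # acks that could piggyback
print(f"{contentions} contentions, {100 * pure_acks // contentions}% pure acks")
```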
The wireless channel is therefore contended for 27 times, of which 10 (37%) are pure TCP acks that could piggyback on a subsequent link-layer ack. I'd say 37% is significant, wouldn't you? - Jonathan Morton ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-24 6:26 ` Jonathan Morton @ 2014-08-24 8:24 ` David Lang 2014-08-24 9:20 ` Jonathan Morton ` (2 more replies) 0 siblings, 3 replies; 56+ messages in thread From: David Lang @ 2014-08-24 8:24 UTC (permalink / raw) To: Jonathan Morton; +Cc: bloat Mainlinglist On Sun, 24 Aug 2014, Jonathan Morton wrote: > On 24 Aug, 2014, at 8:12 am, David Lang wrote: > >> On Sun, 24 Aug 2014, Jonathan Morton wrote: > >>> I think multi-target MIMO is more useful than single-target MIMO for the >>> congested case. It certainly helps that the client doesn't need to >>> explicitly support MIMO for it to work. >> >> better yes, but at what price difference? :-) >> >> If the APs cost $1000 each instead of $100 each, you are better off with more >> of the cheaper APs. > > ...until you run out of channels to run them on. Then, if you still need more > capacity, multi-target MIMO is probably still worth it. keep in mind that MIMO only increases your capacity in one direction. also with 5GHz, you have quite a few channels available. By turning the power down (especially if you can tell the clients to do so as well), you can pack a LOT of APs into an area. Yes, you will eventually run out of channels, but now you are talking extreme density, not just high density. > Hopefully, it won't be as much as a tenfold price difference. Until we have open drivers for the commodity APs to let them be fully controlled, you are comparing sub $100 home APs to the high end centralized AP systems from Cisco and similar, a tenfold price difference is actually pretty close to accurate. ($700+ per AP, plus the central system to run them, then software licenses, maintenance contracts, etc...) It's only a couple years ago that a building I would have set up for a couple thousand first had a $50K proprietary system purchased for it, which was then replaced by an even more expensive system. >>> However, I think there is a sliding scale on this. 
With the modern >>> modulation schemes (and especially with wide channels), the handshake and >>> preamble really are a lot of overhead. If you have the chance to triple >>> your throughput for a 20% increase in channel occupation, you need a >>> *really* good reason not to take it. >> >> if you can send 300% as much data in 120% the time, then the overhead of >> sending a single packet is huge (you spend _far_ more airtime on the overhead >> than the packet itself) >> >> now, this may be true for small packets, which is why this should be >> configurable, and configurable in terms of data size, not packet count. >> >> by the way, the same effect happens on wired ethernet networks, see Jumbo >> Frames and the advantages of using them. >> >> the advantages are probably not 300% data in 120% time, but more like 300% >> data in 270% time, and at that point, the fact that you are 2.7x as likely to >> lose the packet to another transmission very quickly makes it the wrong thing >> to do. > > The conditions are probably different in each direction. The AP is more > likely to be sending large packets (DNS response, HTTP payload) while the > client is more likely to send small packets (DNS request, TCP SYN, HTTP GET). > The AP is also likely to want to aggregate a TCP SYN/ACK with another packet. If your use case is web browsing or streaming video yes. If it's gaming or other interactive use, much less so. > So yes, intelligence of some sort is needed. And I should probably look up > just how big the handshake and preamble are in relative terms - but I do > already know that under ideal conditions, recent wifi variants still get a > remarkably small percentage of their theoretical data rate as actual > throughput - and that's with big packets and aggregation. That is very true and not something I'm disagreeing with. ac type aggregation where individual packets are acked so that only the ones that get clobbered need to be re-sent makes this a lot less painful. 
but even there, if the second and tenth packets get clobbered, is it smart enough to only resend those two? or will it resend 2-10? >>>> But even with that, doesn't TCP try to piggyback the ack on the next packet >>>> of data anyway? so unless it's a purely one-way dataflow, this still >>>> wouldn't help. >>> >>> Once established, a HTTP session looks exactly like that. I also see no >>> reason in theory why a TCP ack couldn't be piggybacked on the *next* >>> available link ack, which would relax the latency requirements considerably. >> >> I don't understand (or we are talking past each other again) >> >> laptop -- ap -- 50 hops -- server >> >> packets from the server to the laptop could have an ack piggybacked by the >> driver on the wifi link ack, but for packets the other direction, the ap >> can't possibly know that the server will ever respond, so it can't reply with >> a TCP level ack when it does the link level ack. > > Which is fine, because the bulk of the traffic will be from the AP to the > client. Unless you're running servers wirelessly, which seems dumb, or you've > got a bunch of journalists uploading copy and photos, which seems like a more > reasonable use case. > > But what I meant is that the TCP ack doesn't need to be piggybacked on the > link-level ack for the same packet - it can go on a later one. Think VJ > compression in PPP - there's a small lookup table which can be used to fill in > some of the data. > >>> If that were implemented and deployed successfully, it would mean that the >>> majority of RTS/CTS handshakes initiated by clients would be to send DNS >>> queries, TCP handshakes and HTTP request headers, all of which are actually >>> important. It would, I think, typically reduce contention by a large >>> margin. >> >> only if stand-alone ack packets are really a significant portion of the >> network traffic. > > I think they are significant, in terms of the number of uncoordinated > contentions for the channel. 
> > Remember, the AP occupies a privileged position in the network. It transmits > the bulk of the data, and the bulk of the number of individual packets. It > knows when it's already busy itself, so the backoff algorithm never kicks in > for the noise it makes itself. not for the noise it makes itself, but the noise from all the other APs and the large mass of clients talking to those other APs is a problem > It can be a model citizen of the wireless spectrum. agreed, it's also the one part that we (as network admins) have a hope of controlling > By contrast, clients send much less on an individual basis, but they have to > negotiate with the AP *and* every other client for airtime to do so. Every > TCP ack disrupts the AP's flow of traffic. If the AP aggregates three HTTP > payload packets into a single transmission, then it must expect to receive a > TCP ack coming the other way - in other words, to be interrupted - for every > such aggregate packet it has sent. The less often clients have to contend for > the channel, the more time the AP can spend distributing its self-coordinated, > useful traffic. > > Let's suppose a typical HTTP payload is 45kB (including TCP/IP wrapping). > That can be transmitted in 10 triples of 1500B packets. There would also be a > DNS request and response, a TCP handshake (SYN, SYN/ACK), a HTTP request > (ACK/GET), and a TCP close (FIN/ACK, ACK), which I'll assume can't be > aggregated with other traffic, associated with the transaction. > > So the AP must transmit 13 times to complete this small request. As things > currently stand, the client must *also* transmit - 14 times. The wireless > channel is therefore contended for 27 times, of which 10 (37%) are pure TCP > acks that could piggyback on a subsequent link-layer ack. not all transmissions are equal. 
a transmission of 3x1500 byte packets takes a LOT longer than that of a single 64 byte ack packet (and remember that ack packets can be aggregated as well, so it's not one tcp ack per aggregate sent, but it is one RF ack per aggregate sent) so the overhead is a lot less than 37%, but it is a lot larger than simple packet size would indicate, because the encapsulation per transmission is a fixed size (and no, I don't know how large it is) David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-24 8:24 ` David Lang @ 2014-08-24 9:20 ` Jonathan Morton 2014-08-25 7:25 ` Michael Welzl 2014-08-30 7:20 ` Jonathan Morton 2 siblings, 0 replies; 56+ messages in thread From: Jonathan Morton @ 2014-08-24 9:20 UTC (permalink / raw) To: David Lang; +Cc: bloat Mainlinglist On 24 Aug, 2014, at 11:24 am, David Lang wrote: > but it is a lot larger than simple packet size would indicate, because the encapsulation per transmission is a fixed size (and no, I don't know how large it is) I found a reference to the preamble/header format of 802.11a/g. Under ideal conditions with default settings, the preamble+header of the PPDU is transmitted at 6Mbps and takes about 50µs; the PSDU adds a further 18µs overhead; a 4500-byte payload would then take 667µs to transmit at 54Mbps. (Except that 'g' doesn't support a payload that big.) So even with a fairly large aggregated payload, and ignoring RTS/CTS, the overhead is several percent. A 64-byte packet takes only 10µs to send at that speed, so that 68µs overhead looks really nasty even at lower speeds. And even that assumes that there aren't any 'b' devices associated, which would cause the whole network to fall back to 96µs 'b' preambles. That document doesn't even mention RTS/CTS/ACK. I found another diagram which points out that each of these takes 30µs to send, including 10µs periods of silence. So that's 90+68+10 = 168µs to send a 64-byte packet, and 90+68+222 = 380µs to send a 1500-byte packet - only twice as long. http://ict.siit.tu.ac.th/~sgordon/its413y12s2/unprotected/ITS413Y12S2H22-DCF-RTS-Three-Stations-Hidden.png The practical upshot - for 802.11a/g, a TCP ack costs roughly half as much channel time as a full data packet. An ack every three packets, with no aggregation on the data packets, would consume one-seventh (14%) of the channel time. 
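A quick sanity check of those figures. The 50 µs preamble+header, 18 µs PSDU overhead and 30 µs RTS/CTS/ACK timings are the approximations quoted above, not exact values from the standard; the exact arithmetic lands on about 13% rather than the rougher one-seventh estimate:

```python
RATE_MBPS = 54               # 802.11a/g top modulation rate
NEGOTIATION_US = 3 * 30      # RTS + CTS + ACK, each 30 us incl. silence
OVERHEAD_US = 50 + 18        # PPDU preamble/header + PSDU overhead

def airtime_us(payload_bytes):
    """Approximate channel time to send one payload, with negotiation."""
    return NEGOTIATION_US + OVERHEAD_US + payload_bytes * 8 / RATE_MBPS

ack_us = airtime_us(64)      # a bare TCP ack: roughly 170 us
data_us = airtime_us(1500)   # a full-size data packet: roughly 380 us
share = ack_us / (ack_us + 3 * data_us)   # one ack per three data packets
print(f"ack {ack_us:.0f} us, data {data_us:.0f} us, ack share {share:.0%}")
```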
It's not as high as my previous estimate, but it's still significant - and it backs up my hunch that simply negotiating for the channel is a big enough overhead to talk about separately. - Jonathan Morton ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-24 8:24 ` David Lang 2014-08-24 9:20 ` Jonathan Morton @ 2014-08-25 7:25 ` Michael Welzl 2014-08-30 7:20 ` Jonathan Morton 2 siblings, 0 replies; 56+ messages in thread From: Michael Welzl @ 2014-08-25 7:25 UTC (permalink / raw) To: David Lang; +Cc: bloat Mainlinglist >> The conditions are probably different in each direction. The AP is more likely to be sending large packets (DNS response, HTTP payload) while the client is more likely to send small packets (DNS request, TCP SYN, HTTP GET). The AP is also likely to want to aggregate a TCP SYN/ACK with another packet. > > If your use case is web browsing or streaming video yes. If it's gaming or other interactive use, much less so. There's worse. Every time I'm on a public wifi network and send an email with an attached file (say, a paper I forward to someone), I wonder what that does to all the others on the network... these things happen. Dropbox syncs in the background. Maybe some people try to use Skype with video even when the link conditions are bad. Do such behaviors have a DoS'ish effect? They really shouldn't... Cheers, Michael ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-24 8:24 ` David Lang 2014-08-24 9:20 ` Jonathan Morton 2014-08-25 7:25 ` Michael Welzl @ 2014-08-30 7:20 ` Jonathan Morton 2014-08-31 20:46 ` Simon Barber 2 siblings, 1 reply; 56+ messages in thread From: Jonathan Morton @ 2014-08-30 7:20 UTC (permalink / raw) To: David Lang; +Cc: bloat Mainlinglist On 24 Aug, 2014, at 11:24 am, David Lang wrote: >> The conditions are probably different in each direction. The AP is more likely to be sending large packets (DNS response, HTTP payload) while the client is more likely to send small packets (DNS request, TCP SYN, HTTP GET). The AP is also likely to want to aggregate a TCP SYN/ACK with another packet. > > If your use case is web browsing or streaming video yes. If it's gaming or other interactive use, much less so. That's fair enough. But the conditions in both directions are *still* different, to the point where I am wary of attempting to simulate multiple wireless clients using a single piece of hardware. The big problem is that clients have the sheer weight of numbers behind them when negotiating for the channel, and are therefore quite capable of starving the AP if there are enough of them. This results in congestion collapse, as the clients aggressively demand updates on where the responses to their requests have got to, while the poor AP can't get a packet in edgewise to answer them. It doesn't matter, for that purpose, whether the packets are bigger in one direction than the other - the per-transmission overhead in modern wifi is big enough to swamp that effect. For the sake of amusement, I'm going to call this the "airport problem". Imagine a harassed airline desk clerk, besieged by hundreds of irate passengers who have just been sat on the tarmac for three hours. I don't think this is a new problem with wireless networks, either - it should happen on bus Ethernet, too. 
That's probably a large factor behind the comprehensive shift away from bus and hub Ethernet to switched Ethernet on most corporate LANs, which have a habit of acquiring large numbers of clients. Fortunately, modern wifi also comes with a mechanism that could, theoretically, be used to combat this problem. An AP with a lot to send could ignore clients' RTS, and respond with an RTS of its own instead of a CTS. This would allow it to get its greater volume of packets, data and/or TCP ACKs through, satisfying the requests and hopefully pacifying the crowd. But I have no idea at present whether that technique is actually in use. - Jonathan Morton ^ permalink raw reply [flat|nested] 56+ messages in thread
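Jonathan's "weight of numbers" point is easy to see in a toy model. The sketch below (illustrative only; the slotted-contention model and all numbers are invented for this note, and real 802.11 DCF is considerably more complex) has the AP and N clients each draw a random backoff, with the unique minimum winning the slot: the AP's share of successful transmissions falls toward 1/(N+1), no matter how much downstream traffic it has queued.

```python
import random

def ap_airtime_share(n_clients, slots=100_000, cw=16, seed=1):
    """Toy slotted-contention model: the AP (index 0) and each client
    draw a random backoff in [0, cw); a unique minimum wins the slot,
    ties are collisions.  Returns the AP's share of successful slots."""
    rng = random.Random(seed)
    ap_wins = wins = 0
    for _ in range(slots):
        draws = [rng.randrange(cw) for _ in range(n_clients + 1)]
        lo = min(draws)
        if draws.count(lo) > 1:
            continue                      # collision: the slot is wasted
        wins += 1
        if draws.index(lo) == 0:          # index 0 is the AP
            ap_wins += 1
    return ap_wins / wins

for n in (1, 10, 50):
    print(f"{n:2d} clients -> AP wins {ap_airtime_share(n):.1%} of successful slots")
```

With 50 clients the AP gets roughly 2% of the channel, which is the "harassed desk clerk" in numbers: the AP carries most of the bytes but holds only one contention ticket.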
* Re: [Bloat] sigcomm wifi 2014-08-30 7:20 ` Jonathan Morton @ 2014-08-31 20:46 ` Simon Barber 0 siblings, 0 replies; 56+ messages in thread From: Simon Barber @ 2014-08-31 20:46 UTC (permalink / raw) To: Jonathan Morton, David Lang; +Cc: bloat Mainlinglist [-- Attachment #1: Type: text/plain, Size: 2947 bytes --] Modern APs use more aggressive channel access parameters than clients. They can also control the parameters the clients use. One major issue is that to remove bloat in a wireless environment and keep access fair and delays low you really want to integrate the AQM and the packet scheduling, while tracking airtime usage. I very much doubt any equipment is doing this. Simon On August 30, 2014 12:20:48 AM PDT, Jonathan Morton <chromatix99@gmail.com> wrote: > >On 24 Aug, 2014, at 11:24 am, David Lang wrote: > >>> The conditions are probably different in each direction. The AP is >more likely to be sending large packets (DNS response, HTTP payload) >while the client is more likely to send small packets (DNS request, TCP >SYN, HTTP GET). The AP is also likely to want to aggregate a TCP >SYN/ACK with another packet. >> >> If your use case is web browsing or streaming video yes. If it's >gaming or other interactive use, much less so. > >That's fair enough. But the conditions in both directions are *still* >different, to the point where I am wary of attempting to simulate >multiple wireless clients using a single piece of hardware. > >The big problem is that clients have the sheer weight of numbers behind >them when negotiating for the channel, and are therefore quite capable >of starving the AP if there are enough of them. This results in >congestion collapse, as the clients aggressively demand updates on >where the responses to their requests have got to, while the poor AP >can't get a packet in edgewise to answer them. 
It doesn't matter, for >that purpose, whether the packets are bigger in one direction than the >other - the per-transmission overhead in modern wifi is big enough to >swamp that effect. > >For the sake of amusement, I'm going to call this the "airport >problem". Imagine a harassed airline desk clerk, besieged by hundreds >of irate passengers who have just been sat on the tarmac for three >hours. > >I don't think this is a new problem with wireless networks, either - it >should happen on bus Ethernet, too. That's probably a large factor >behind the comprehensive shift away from bus and hub Ethernet to >switched Ethernet on most corporate LANs, which have a habit of >acquiring large numbers of clients. > >Fortunately, modern wifi also comes with a mechanism that could, >theoretically, be used to combat this problem. An AP with a lot to >send could ignore clients' RTS, and respond with an RTS of its own >instead of a CTS. This would allow it to get its greater volume of >packets, data and/or TCP ACKs through, satisfying the requests and >hopefully pacifying the crowd. But I have no idea at present whether >that technique is actually in use. > > - Jonathan Morton > >_______________________________________________ >Bloat mailing list >Bloat@lists.bufferbloat.net >https://lists.bufferbloat.net/listinfo/bloat -- Sent from my Android device with K-9 Mail. Please excuse my brevity. [-- Attachment #2: Type: text/html, Size: 3561 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-23 23:29 ` Jonathan Morton 2014-08-23 23:40 ` Steinar H. Gunderson 2014-08-24 1:33 ` David Lang @ 2014-08-25 7:35 ` Michael Welzl 2 siblings, 0 replies; 56+ messages in thread From: Michael Welzl @ 2014-08-25 7:35 UTC (permalink / raw) To: Jonathan Morton; +Cc: bloat Mainlinglist Hi, Thank you *very* much for this email - I found it very educational! As you may guess, I knew about many but definitely not all things here - in particular, while I knew about aggregation, I never really thought of its buffering requirements and the obvious benefit of doing FQ or something similar in its favor (provided that this queue is exposed to the aggregation algorithm!). BTW on your side point on fairness of aggregation, I also haven't looked it up but might be willing to bet money on the fact that it's not just unfair but includes some mechanisms to back off. Let me try to get back to the matter at hand, though: you state that, except for aggregation, there really shouldn't be much more buffering on devices, and especially clients. I don't believe this. I see your argument about Westwood+ and the like, but in reality, you don't always know which TCP algorithm is in use. Being able to buffer *for the sake of link layer retransmissions* has at least two potential benefits: 1) retransmits occur on timescales that can be MUCH shorter than TCP's RTT - though, how much of a benefit this really is really depends on the RTT 2) successful retransmits can hide losses from TCP and if these were just occasional collisions but not really a form of overload, then this is doing the right thing - how much of a benefit this really is depends on how overloaded the wireless network truly is, and on the TCP algorithm in use WLAN equipment comes with standardized mechanisms embedded, which must be safe to work everywhere; cross-layering style optimizations (e.g. depending on the TCP variant) really don't have a place in that standard. 
However, in a more research-oriented playground-style world, we could try to dynamically optimize this buffer, because some of the factors can be known: - a flow's RTT can be measured (this requires keeping track of flows, which probably is doable in access equipment) - the TCP congestion control algorithm can be known in the host when we're talking about the buffer associated with uploads, in clients All in all, this sounds like interesting and useful research to me :-) *takes a mental note* Cheers, Michael On 24. aug. 2014, at 01:29, Jonathan Morton wrote: > I've done some reading on how wifi actually works, and what mechanisms the latest variants use to improve performance. It might be helpful to summarise my understanding here - biased towards the newer variants, since they are by now widely deployed. > > First a note on the variants themselves: > > 802.11 without suffix is obsolete and no longer in use. > 802.11a was the original 5GHz band version, giving 54Mbps in 20MHz channels. > 802.11b was the first "affordable" version, using 2.4GHz and giving 11Mbps in 20MHz channels. > 802.11g brought the 802.11a modulation schemes and (theoretical) performance to the 2.4GHz band. > 802.11n is dual-band, but optionally. Aggregation, 40MHz channels, single-target MIMO. > 802.11ac is 5GHz only. More aggregation, 80 & 160MHz channels, multi-target MIMO. Rationalised options, dropping many 'n' features that are more trouble than they're worth. Coexists nicely with older 20MHz-channel equipment, and nearby APs with overlapping spectrum. > > My general impression is that 802.11ac makes a serious effort to improve matters in heavily-congested, many-clients scenarios, which was where earlier variants had the most trouble. If you're planning to set up or go to a major conference, the best easy thing you can do is get 'ac' equipment all round - if nothing else, it's guaranteed to support the 5GHz band. Of course, we're not just considering the easy solutions. 
> > Now for some technical details: > > The wireless spectrum is fundamentally a shared-access medium. It also has the complication of being noisy and having various path-loss mechanisms, and of the "hidden node" problem where one client might not be able to hear another client's transmission, even though both are in range of the AP. > > Thus wifi uses a CSMA/CA algorithm as follows: > > 1) Listen for competing carrier. If heard, backoff and retry later. (Listening is continuous, and detected preambles are used to infer the time-length of packets when the data modulation is unreadable.) > 2) Perform an RTS/CTS handshake. If CTS doesn't arrive, backoff and retry later. > 3) Transmit, and await acknowledgement. If no ack, backoff and retry later, possibly using different modulation. > > This can be compared to Ethernet's CSMA/CD algorithm: > > 1) Listen for competing carrier. If heard, backoff and retry later. > 2) Transmit, listening for collision with a competing transmission. If collision, backoff and retry later. > > In both cases, the backoff is random and exponentially increasing, to reduce the chance of repeated collisions. > > The 2.4GHz band is chock-full of noise sources, from legacy 802.11b/g equipment to cordless phones, Bluetooth, and even microwave ovens - which generate the best part of a kilowatt of RF energy, but somehow manage to contain the vast majority of it within the cavity. It's also a relatively narrow band, with only three completely separate 20MHz channels available in most of the world (four in Japan). > > This isn't a massive concern for home use, but consumers still notice the effects surprisingly often. Perhaps they live in an apartment block with lots of devices and APs crowded together in an unmanaged mess. Perhaps they have a large home to themselves, but a bunch of noisy equipment reduces the effective range and reliability of their network. 
It's not uncommon to hear about networks that drop out whenever the phone rings, thanks to an old cordless phone. > > The 5GHz band is much less crowded. There are several channels which are shared with weather radar, so wifi equipment can't use those unless they are capable of detecting the radar transmissions, but even without those there are far more 20MHz channels available. There's also much less legacy equipment using it - even 802.11a is relatively uncommon (and is fairly benign in behaviour). The downside is that 5GHz doesn't propagate as far, or as easily through walls. > > Wider bandwidth channels can be used to shorten the time taken for each transmission. However, this effect is not linear, because the RTS/CTS handshake and preamble are fixed overheads (since they must be transmitted at a low speed to ensure that all clients can hear them), taking the same length of time regardless of any other enhancements. This implies that in seriously geographically-congested scenarios, 20MHz channels (and lots of APs to use them all) are still the most efficient. MIMO can still be used to beneficial effect in these situations. > > Multi-target MIMO allows an AP to transmit to several clients simultaneously, without requiring the client to support MIMO themselves. This requires the AP's antennas and radios to be dynamically reconfigured for beamforming - giving each client a clear version of its own signal and a null for the other signals - which is a tricky procedure. APs that do implement this well are highly valuable in congested situations. > > Single-target MIMO allows higher bandwidth between one client at a time and the AP. Both the AP and the client must support MIMO for this to work. There are physical constraints which limit the ability for handheld devices to support MIMO. In general, this form of MIMO improves throughput in the home, but is not very useful in congested situations. 
High individual throughput is not what's needed in a crowded arena; rather, reliable if slow individual throughput, reasonable latency, and high aggregate throughput. > > Choosing the most effective radio bandwidth and modulation is a difficult problem. The Minstrel algorithm seems to be an effective solution for general traffic. Some manual constraints may be appropriate in some circumstances, such as reducing the maximum radio bandwidth (trading throughput of one AP against coexistence with other APs) and increasing the modulation rate of management broadcasts (reducing per-packet overhead). > > Packet aggregation allows several IP packets to be combined into a single wireless transmission. This avoids performing the CSMA/CA steps repeatedly, which is a considerable overhead. There are several types of packet aggregation - the type adopted by 802.11ac allows individual IP packets within a transmission to be link-layer acknowledged separately, so that a minor corruption doesn't require retransmission of the entire aggregate. By contrast, 802.11n also supported a version which did require that, despite a slightly lower overhead. > > Implicit in the packet-aggregation system is the problem of collecting packets to aggregate. Each transmission is between the AP and one client, so the packets aggregated by the AP all have to be for the same client. (The client can assume that all packets go to the AP.) A fair-queueing algorithm could have the effect of forming per-client queues, so several suitable packets could easily be located in such a queue. In a straight FIFO queue, however, packets for the same client are likely to be separated in the queue and thus difficult to find. It is therefore *obviously* in the AP's interest to implement a fair-queueing algorithm based on client MAC address, even if it does nothing else to manage congestion. 
> > NB: if a single aggregate could be intended to be heard by more than one client, then the complexity of multi-target beamforming MIMO would not be necessary. This is how I infer the strict one-to-one nature of data transmissions, as distinct from management broadcasts. > > On 23 Aug, 2014, at 10:26 pm, Michael Welzl wrote: > >>>> because of the "function" i wrote above: the more you retry, the more you need to buffer when traffic continuously arrives because you're stuck trying to send a frame again. >>> >>> huh, I'm missing something here, retrying sends would require you to buffer more when sending. >> >> aren't you saying the same thing as I? Sorry otherwise, I might have expressed it confusingly somehow > > There should be enough buffering to allow effective aggregation, but as little as possible on top of that. I don't know how much aggregation can be done, but I assume that there is a limit, and that it's not especially high in terms of full-length packets. After all, tying up the channel for long periods of time is unfair to other clients - a typical latency/throughput tradeoff. > > Equally clearly, in a heavily congested scenario the AP benefits from having a lot of buffer divided among a large number of clients, but each client should have only a small buffer. > >>> If people are retrying when they really don't need to, that cuts down on the available airtime. >> >> Yes > > Given that TCP retries on loss, and UDP protocols are generally loss-tolerant to a degree, there should therefore be a limit on how hard the link-layer stuff tries to get each individual packet through. Minstrel appears to be designed around a time limit for that sort of thing, which seems sane - and they explicitly talk about TCP retransmit timers in that context. > > With that said, link-layer retries are a valid mechanism to minimise unnecessarily lost packets. It's also not new - bus/hub Ethernet does this on collision detection. 
What Ethernet doesn't have is the link-layer ack, so there's an additional set of reasons why a backoff-and-retry might happen in wifi. > > Modern wifi variants use packet aggregation to improve efficiency. This only works when there are multiple packets to send at a time from one place to a specific other place - which is more likely when the link is congested. In the event of a retry, it makes sense to aggregate newly buffered packets with the original ones, to reduce the number of negotiation and retry cycles. > >>> But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much. >> >> Yes; but the complexity that I was pointing at (but maybe it's a simple parameter, more like a 0 or 1 situation in practice?) lies in the word "continual". How long do you try before you decide that the sending TCP should really think it *is* congestion? To really optimize the behavior, that would have to depend on the RTT, which you can't easily know. > > There are TCP congestion algorithms which explicitly address this (eg. Westwood+), by reacting only a little to individual drops, but reacting more rapidly if drops occur frequently. In principle they should also react quickly to ECN, because that is never triggered by random noise loss alone. > > - Jonathan Morton > ^ permalink raw reply [flat|nested] 56+ messages in thread
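Jonathan's point that per-client fair queueing makes aggregation candidates trivial to find can be sketched in a few lines. Everything here is invented for illustration (the class, the 8-subframe limit; real A-MPDU limits are expressed in bytes and microseconds, not packet counts), but it shows the mechanism: packets hashed into per-destination-MAC queues sit together, whereas a single FIFO would interleave clients.

```python
from collections import OrderedDict, deque

AMPDU_LIMIT = 8   # max sub-frames per aggregate; illustrative number only

class PerClientAggregator:
    """Per-destination-MAC queues: all packets for one client sit
    together, so an aggregate is just the head of one queue.
    Hypothetical sketch, not a real driver interface."""

    def __init__(self):
        self.queues = OrderedDict()             # mac -> deque of packets

    def enqueue(self, mac, pkt):
        self.queues.setdefault(mac, deque()).append(pkt)

    def next_aggregate(self):
        """Round-robin over clients; return (mac, burst) where burst is
        up to AMPDU_LIMIT packets sent as one transmission opportunity."""
        if not self.queues:
            return None, []
        mac, q = next(iter(self.queues.items()))
        burst = [q.popleft() for _ in range(min(AMPDU_LIMIT, len(q)))]
        del self.queues[mac]                    # rotate: move client to the back
        if q:
            self.queues[mac] = q
        return mac, burst
```

The round-robin rotation gives the MAC-address fair queueing Jonathan describes as a side effect: each channel access serves one client's burst, and backlogged clients take turns.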
* Re: [Bloat] sigcomm wifi 2014-08-23 19:26 ` Michael Welzl 2014-08-23 23:29 ` Jonathan Morton @ 2014-08-24 1:09 ` David Lang 2014-08-25 8:01 ` Michael Welzl 1 sibling, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-24 1:09 UTC (permalink / raw) To: Michael Welzl; +Cc: bloat [-- Attachment #1: Type: TEXT/PLAIN, Size: 11660 bytes --] On Sat, 23 Aug 2014, Michael Welzl wrote: > [removing Lars and Jim from direct cc, don't want to spam them - I don't know if they're sooo interested in this thread?] > > > On 23. aug. 2014, at 01:50, David Lang <david@lang.hm> wrote: > >> On Sat, 23 Aug 2014, Michael Welzl wrote: >> >>> On 21. aug. 2014, at 10:30, David Lang <david@lang.hm> wrote: >>> >>>> On Thu, 21 Aug 2014, Michael Welzl wrote: >>>> >>>>> On 21. aug. 2014, at 08:52, Eggert, Lars wrote: >>>>> >>>>>> On 2014-8-21, at 0:05, Jim Gettys <jg@freedesktop.org> wrote: >>>>>>> And what kinds of AP's? All the 1G guarantees you is that your bottleneck is in the wifi hop, and they can suffer as badly as anything else (particularly consumer home routers). >>>>>>> The reason why 802.11 works ok at IETF and NANOG is that: >>>>>>> o) they use Cisco enterprise AP's, which are not badly over buffered. >>>>> >>>>> I'd like to better understand this particular bloat problem: >>>>> >>>>> 100s of senders try to send at the same time. They can't all do that, so their cards retry a fixed number of times (10 or something, I don't remember, probably configurable), for which they need to have a buffer. >>>>> >>>>> Say, the buffer is too big. Say, we make it smaller. Then an 802.11 sender trying to get its time slot in a crowded network will have to drop a packet, requiring the TCP sender to retransmit the packet instead. The TCP sender will think it's congestion (not entirely wrong) and reduce its window (not entirely wrong either). How appropriate TCP's cwnd reduction is probably depends on how "true" the notion of congestion is ... i.e. 
if I can buffer only one packet and just don't get to send it, or it gets a CRC error ("collides" in the air), then that can be seen as a pure matter of luck. Then I provoke a sender reaction that's like the old story of TCP mis-interpreting random losses as a sign of congestion. I think in most practical systems this old story is now a myth because wireless equipment will try to buffer data for a relatively long time instead of exhibiting sporadic random drops to upper layers. That is, in principle, a good thing - but buffering too much has of course all the problems that we know... Not an easy trade-off at all I think. >>>> >>>> in this case the loss is a direct sign of congestion. >>> >>> "this case" - I talk about different buffer lengths. E.g., take the minimal buffer that would just function, and set retransmissions to 0. Then, a packet loss is a pretty random matter - just because you and I contended, doesn't mean that the net is truly "overloaded" ? So my point is that the buffer creates a continuum from "random loss" to "actual congestion" - we want loss to mean "actual congestion", but how large should it be to meaningfully convey that? >>> >>> >>>> remember that TCP was developed back in the days of 10base2 networks where everyone on the network was sharing a wire and it was very possible for multiple senders to start transmitting on the wire at the same time, just like with radio. >>> >>> cable or wireless: is one such occurrence "congestion"? >>> i.e. is halving the cwnd really the right response to that sort of "congestion"? (contention, really) >> >> possibly not, but in practice it may be 'good enough' >> >> but to make it work well, you probably want to play games with how much you back off, and how quickly you retry if you don't get a response. 
>> The fact that the radio link can have its own ack for the packet can actually be an improvement over doing it at the TCP level as it only needs to ack/retry for that hop, and if that hop was good, there's far less of a need to retry if the server is just slow. > Yep... I remember a neat paper from colleagues at Trento University that piggybacked TCP's ACKs on link layer ACKs, thereby avoiding the collisions between TCP's ACKs and other data packets - really nice. Not sure if it wasn't just simulations, though. that's a neat hack, but I don't see it working, except when one end of the wireless link is also the endpoint of the TCP connection (and then only for acks from that device) so in a typical wifi environment, it would be one less transmission from the laptop, no change to the AP. But even with that, doesn't TCP try to piggyback the ack on the next packet of data anyway? so unless it's a purely one-way dataflow, this still wouldn't help. >> so if we try and do the retries in the OS stack, it will need to know the difference between "failed to get out the first hop due to collision" and "got out the first hop, waiting for the server across the globe to respond" with different timeouts/retries for them. >> >>>> A large part of the problem with high-density wifi is that it just wasn't designed for that sort of environment, and there are a lot of things that it does that work great for low-density, weak signal environments, but just make the problem worse for high-density environments >>>> >>>> batching packets together >>>> slowing down the transmit speed if you aren't getting through >>> >>> well... this *should* only happen when there's an actual physical signal quality degradation, not just collisions. at least minstrel does quite a good job at ensuring that, most of the time. >> >> "should" :-) >> >> but can the firmware really tell the difference between quality degradation due to interference and collisions with other transmitters? 
> > Well, with heuristics it can, sort of. As a simple example from one older mechanism, consider: multiple consecutive losses are *less* likely from random collisions than from link noise. That sort of thing. Minstrel worked best in our tests, using tables of rates that worked well / didn't work well in the past: > http://heim.ifi.uio.no/michawe/research/publications/wowmom2012.pdf the question is if this is deployed in any commodity OS stacks. If not, it could only help on the AP, and we are better off just locking the speeds there. > >>>> retries of packets that the OS has given up on (including the user has closed the app that sent them) >>>> >>>> Ideally we want the wifi layer to be just like the wired layer, buffer only what's needed to get it on the air without 'dead air' (where the driver is waiting for the OS to give it more data), at that point, we can do the retries from the OS as appropriate. >>>> >>>>> I have two questions: 1) is my characterization roughly correct? 2) have people investigated the downsides (negative effect on TCP) of buffering *too little* in wireless equipment? (I suspect so?) Finding where "too little" begins could give us a better idea of what the ideal buffer length should really be. >>>> >>>> too little buffering will reduce the throughput as a result of unused airtime. >>> >>> so that's a function of, at least: 1) incoming traffic rate; 2) no. retries * ( f(MAC behavior; number of other senders trying) ). >> >> incoming to the AP you mean? > > incoming to whoever is sending and would be retrying - mostly the AP, yes. terminology issue here: a receiver is never going to be retrying, it has nothing to retry. It's the sender that keeps track of what it's sent and retries if it doesn't get an ack. > >> It also matters if you are worrying about aggregate throughput of a lot of users, or per-connection throughput for a single user. 
>> >> From a sender's point of view, if it takes 100 time units to send a packet, and 1-5 time units to queue the next packet for transmission, you lose a few percent of your possible airtime and there's very little concern. >> >> but if it takes 10 time units to send the packet and 1-5 time units to queue the next packet, you have just lost a lot of potential bandwidth. >> >> But from the point of view of the aggregate, these gaps just give someone else a chance to transmit and have very little effect on the amount of traffic arriving at the AP. >> >> I was viewing things from the point of view of the app on the laptop. > Yes... I agree, and that's the more common + more reasonable way to think > about it. I tend to think upstream, which of course is far less common, but > maybe even more problematic. Actually I suspect the following: things get > seriously bad when a lot of senders are sending upstream together; this isn't > really happening much in practice - BUT when we have a very very large number > of hosts connected in a conference style situation, all the HTTP GETs and SMTP > messages and whatnot *do* create lots of collisions, a situation that isn't > really too common (and maybe not envisioned / parametrized for), and that's > why things often get so bad. (At least one of the reasons.) the thing is that in the high-density environment, there's not that much the AP can do, most of the problem is related to the mobile endpoints and what they decide to do. >>>> But at the low data rates involved, the system would have to be extremely busy to be a significant amount of time if even one packet at a time is buffered. >>> >>> >>> >>>> You are also conflating the effect of the driver/hardware buffering with it doing retries. >>> >>> because of the "function" i wrote above: the more you retry, the more you need to buffer when traffic continuously arrives because you're stuck trying to send a frame again. 
>> huh, I'm missing something here, retrying sends would require you to buffer more when sending. > aren't you saying the same thing as I? Sorry otherwise, I might have expressed it confusingly somehow as I said above, the machine receiving packets doesn't need to buffer them, because it has no need to re-send them. It's the machine sending packets that needs to keep track of what's been sent in case it needs to re-send it. But this cache of recently sent packets is separate from a queue of packets waiting to be sent. the size of the buffer used to track what's been sent isn't a problem. the bufferbloat problem is around the size of the queue for packets waiting to be sent. >> If people are retrying when they really don't need to, that cuts down on the available airtime. > Yes > >> But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much. > Yes; but the complexity that I was pointing at (but maybe it's a simple > parameter, more like a 0 or 1 situation in practice?) lies in the word > "continual". How long do you try before you decide that the sending TCP should > really think it *is* congestion? To really optimize the behavior, that would > have to depend on the RTT, which you can't easily know. Again, I think you are mixing two different issues here. 1. waiting for a pause in everyone else's transmissions so that you can transmit without _knowing_ that you are going to clobber someone Even this can get tricky: is that station you are hearing faintly trying to transmit to an AP near you so you should be quiet? or is it transmitting to a station enough further away from you so you can go ahead and transmit your packet to your AP without interfering with it? 2. 
your transmission getting clobbered so the packet doesn't get through, where you need to wait 'long enough' to decide that it's not going to be acknowledged and try again. This is a case where a local proxy server can actually make a big difference to you. The connections between your mobile devices and the local proxy server have a short RTT and so all timeouts can be nice and short, and then the proxy deals with the long RTT connections out to the Internet. David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
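The trade-off Michael and David keep circling — link-layer retries should hide one-off collisions from TCP, yet persistent contention should still surface as a drop — can be written as a retry loop with a time budget, which is roughly the shape Minstrel's retry-chain time limit takes. The sketch below is illustrative only: the function, the 25 ms budget, and the slot size are invented here, not taken from 802.11 or any driver.

```python
import random

def send_with_retry(try_send, budget_ms=25.0, slot_ms=0.5, max_cw=64):
    """Retry with binary exponential backoff, but stop once a fixed
    time budget is spent: quick retries hide brief collisions from
    TCP, while sustained contention still produces a drop that TCP
    can legitimately treat as congestion.  Illustrative numbers."""
    elapsed, cw = 0.0, 2
    while elapsed <= budget_ms:
        if try_send():
            return True, elapsed          # delivered; TCP never sees a loss
        elapsed += random.randrange(cw) * slot_ms + slot_ms  # backoff + failed attempt
        cw = min(cw * 2, max_cw)          # binary exponential backoff
    return False, elapsed                 # budget spent: drop -> congestion signal
```

Choosing `budget_ms` is exactly the problem Michael raises: ideally it would track the flow's RTT so the drop reaches TCP before its retransmit timer fires, but the link layer can't easily know the RTT, so a fixed budget is the practical compromise.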
* Re: [Bloat] sigcomm wifi 2014-08-24 1:09 ` David Lang @ 2014-08-25 8:01 ` Michael Welzl 2014-08-25 8:19 ` Sebastian Moeller 2014-08-31 22:35 ` David Lang 0 siblings, 2 replies; 56+ messages in thread From: Michael Welzl @ 2014-08-25 8:01 UTC (permalink / raw) To: David Lang; +Cc: bloat >> Yep... I remember a neat paper from colleagues at Trento University that piggybacked TCP's ACKs on link layer ACKs, thereby avoiding the collisions between TCP's ACKs and other data packets - really nice. Not sure if it wasn't just simulations, though. > > that's a neat hack, but I don't see it working, except when one end of the wireless link is also the endpoint of the TCP connection (and then only for acks from that device) > > so in a typical wifi environment, it would be one less transmission from the laptop, no change to the AP. > > But even with that, doesn't TCP try to piggyback the ack on the next packet of data anyway? so unless it's a purely one-way dataflow, this still wouldn't help. Yes, but of course many dataflows are indeed one-way - HTTP typically sends a get/put and not much else. >>> but can the firmware really tell the difference between quality degradation due to interference and collisions with other transmitters? >> >> Well, with heuristics it can, sort of. As a simple example from one older mechanism, consider: multiple consecutive losses are *less* likely from random collisions than from link noise. That sort of thing. Minstrel worked best in our tests, using tables of rates that worked well / didn't work well in the past: >> http://heim.ifi.uio.no/michawe/research/publications/wowmom2012.pdf > > the question is if this is deployed in any commodity OS stacks. If not, it could only help on the AP, and we are better off just locking the speeds there. I thought it's widely deployed but I really don't know. I'm sure others on this list do? What I do know is that it was (is) a part of madwifi. 
>>>>> retries of packets that the OS has given up on (including the user has closed the app that sent them) >>>>> >>>>> Ideally we want the wifi layer to be just like the wired layer, buffer only what's needed to get it on the air without 'dead air' (where the driver is waiting for the OS to give it more data), at that point, we can do the retries from the OS as appropriate. >>>>> >>>>>> I have two questions: 1) is my characterization roughly correct? 2) have people investigated the downsides (negative effect on TCP) of buffering *too little* in wireless equipment? (I suspect so?) Finding where "too little" begins could give us a better idea of what the ideal buffer length should really be. >>>>> >>>>> too little buffering will reduce the throughput as a result of unused airtime. >>>> >>>> so that's a function of, at least: 1) incoming traffic rate; 2) no. retries * ( f(MAC behavior; number of other senders trying) ). >>> >>> incoming to the AP you mean? >> >> incoming to whoever is sending and would be retrying - mostly the AP, yes. > > terminology issue here > > a receiver is never going to be retrying, it has nothing to retry. It's the sender that keeps track of what it's sent and retries if it doesn't get an ack. Sorry, I must have expressed myself very unclearly. I am of course talking about a sender. With "incoming" I meant incoming to the buffer of the device that's sending, from the upstream (or up-stack) sender. >>> It also matters if you are worrying about aggregate throughput of a lot of users, or per-connection throughput for a single user. >>> >>> From a sender's point of view, if it takes 100 time units to send a packet, and 1-5 time units to queue the next packet for transmission, you lose a few percent of your possible airtime and there's very little concern. >>> >>> but if it takes 10 time units to send the packet and 1-5 time units to queue the next packet, you have just lost a lot of potential bandwidth. 
>>> >>> But from the point of view of the aggregate, these gaps just give someone else a chance to transmit and have very little effect on the amount of traffic arriving at the AP. >>> >>> I was viewing things from the point of view of the app on the laptop. >> >> Yes... I agree, and that's the more common + more reasonable way to think about it. I tend to think upstream, which of course is far less common, but maybe even more problematic. Actually I suspect the following: things get seriously bad when a lot of senders are sending upstream together; this isn't really happening much in practice - BUT when we have a very very large number of hosts connected in a conference style situation, all the HTTP GETs and SMTP messages and whatnot *do* create lots of collisions, a situation that isn't really too common (and maybe not envisioned / parametrized for), and that's why things often get so bad. (At least one of the reasons.) > > the thing is that in the high-density environment, there's not that much the AP can do, most of the problem is related to the mobile endpoints and what they decide to do. True! (though, as you say, limiting allowed physical rates on the AP probably helps) >>>>> But at the low data rates involved, the system would have to be extremely busy to be a significant amount of time if even one packet at a time is buffered. >>>> >>>> >>>> >>>>> You are also conflating the effect of the driver/hardware buffering with it doing retries. >>>> >>>> because of the "function" I wrote above: the more you retry, the more you need to buffer when traffic continuously arrives because you're stuck trying to send a frame again. >>> >>> huh, I'm missing something here, retrying sends would require you to buffer more when sending. >> >> aren't you saying the same thing as I am? Sorry if I expressed it confusingly somehow > > as I said above, the machine receiving packets doesn't need to buffer them, because it has no need to re-send them. 
It's the machine sending packets that needs to keep track of what's been sent in case it needs to re-send it. Sure, that was a plain misunderstanding. > But this cache of recently sent packets is separate from a queue of packets waiting to be sent. > > the size of the buffer used to track what's been sent isn't a problem. the bufferbloat problem is around the size of the queue for packets waiting to be sent. This confuses me. Why do you even need a cache of recently sent packets? Anyway, what I am talking about *is* the size of the queue for packets waiting to be sent - and not only due to aggregation but also link layer retransmits. Per device, at the link layer, packets (frames, really) are sent in sequence AFAIK, and so any frame that has been sent but not yet acknowledged, and then has to be resent, holds up all other frames to that same destination. >>> If people are retrying when they really don't need to, that cuts down on the available airtime. >> >> Yes >> >> >>> But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much. >> >> Yes; but the complexity that I was pointing at (but maybe it's a simple parameter, more like a 0 or 1 situation in practice?) lies in the word "continual". How long do you try before you decide that the sending TCP should really think it *is* congestion? To really optimize the behavior, that would have to depend on the RTT, which you can't easily know. > > Again, I think you are mixing two different issues here. No, I think you misunderstand me - > 1. waiting for a pause in everyone else's transmissions so that you can transmit without _knowing_ that you are going to clobber someone > > Even this can get tricky, is that station you are hearing faintly trying to transmit to an AP near you so you should be quiet? 
or is it transmitting to a station enough further away from you so you can go ahead and transmit your packet to your AP without interfering with it? You mean the normal CSMA/CA procedure ( + RTS/CTS)? Sure, that's tricky in itself, but I wasn't talking about that. > 2. your transmission getting clobbered so the packet doesn't get through, where you need to wait 'long enough' to decide that it's not going to be acknowledged and try again. I was always only talking about that second bit. I'm sure I wasn't clear enough in writing and I'm sorry for that. > This is a case where a local proxy server can actually make a big difference to you. The connections between your mobile devices and the local proxy server have a short RTT and so all timeouts can be nice and short, and then the proxy deals with the long RTT connections out to the Internet. Adding a proxy to these considerations only complicates them: it's a hard enough trade-off when we just ask ourselves: how large should a buffer for the sake of link layer retransmissions be? (which is closely related to the question: how often should a link layer try to retransmit before giving up?) That's what my emails were about. I suspect that we don't have a good answer to even these questions, and I suspect that we'd be better off having something dynamic than fixed default values. Cheers, Michael ^ permalink raw reply [flat|nested] 56+ messages in thread
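The consecutive-loss heuristic mentioned earlier in this message (runs of losses suggest link noise; isolated losses suggest random collisions) can be sketched in a few lines. This is a toy illustration only, not code from Minstrel or any real firmware, and the run-length threshold of 3 is an invented parameter:

```python
def classify_losses(ack_history, noise_run_threshold=3):
    """Toy loss-cause classifier.  Random collisions tend to hit isolated
    frames, while link noise (fading, interference) tends to produce runs
    of consecutive losses.  ack_history is one boolean per frame:
    True = frame was ACKed, False = frame was lost."""
    longest_run = run = 0
    for acked in ack_history:
        run = 0 if acked else run + 1        # extend or reset the loss run
        longest_run = max(longest_run, run)
    return "noise" if longest_run >= noise_run_threshold else "collision"
```

A real rate controller would feed a statistic like this into its per-rate success tables rather than emit a single label, but the underlying signal is the same.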
* Re: [Bloat] sigcomm wifi 2014-08-25 8:01 ` Michael Welzl @ 2014-08-25 8:19 ` Sebastian Moeller 2014-08-25 8:33 ` Michael Welzl 2014-08-31 22:37 ` David Lang 2014-08-31 22:35 ` David Lang 1 sibling, 2 replies; 56+ messages in thread From: Sebastian Moeller @ 2014-08-25 8:19 UTC (permalink / raw) To: Michael Welzl; +Cc: bloat Hi Michael, On Aug 25, 2014, at 10:01 , Michael Welzl <michawe@ifi.uio.no> wrote: [...] > >> This is a case where a local proxy server can actually make a big difference to you. The connections between your mobile devices and the local proxy server have a short RTT and so all timeouts can be nice and short, and then the proxy deals with the long RTT connections out to the Internet. > > Adding a proxy to these considerations only complicates them: it's a hard enough trade-off when we just ask ourselves: how large should a buffer for the sake of link layer retransmissions be? (which is closely related to the question: how often should a link layer try to retransmit before giving up?) That's what my emails were about. I suspect that we don't have a good answer to even these questions, and I suspect that we'd be better off having something dynamic than fixed default values. What about framing the retransmissions not in number but rather in time? For example the maximum of either the time to transmit a few (say 3?) packets at the current data rate (or maybe one rate lower than current to allow for deteriorating signal quality) or 20ms (pulled out of thin air, would need some research). The first should make sure we actually retransmit to overcome glitches, and the second should make sure that RTT does not increase too dramatically. 
This basically assumes that for reasonable interactive traffic we only have a given RTT budget and should make sure not to overspend ;) Best Regards Sebastian > > Cheers, > Michael > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-25 8:19 ` Sebastian Moeller @ 2014-08-25 8:33 ` Michael Welzl 2014-08-25 9:18 ` Alex Burr 2014-08-31 22:37 ` David Lang 1 sibling, 1 reply; 56+ messages in thread From: Michael Welzl @ 2014-08-25 8:33 UTC (permalink / raw) To: Sebastian Moeller; +Cc: bloat On 25. aug. 2014, at 10:19, Sebastian Moeller wrote: > Hi Michael, > > On Aug 25, 2014, at 10:01 , Michael Welzl <michawe@ifi.uio.no> wrote: > [...] >> >>> This is a case where a local proxy server can actually make a big difference to you. The connections between your mobile devices and the local proxy server have a short RTT and so all timeouts can be nice and short, and then the proxy deals with the long RTT connections out to the Internet. >> >> Adding a proxy to these considerations only complicates them: it's a hard enough trade-off when we just ask ourselves: how large should a buffer for the sake of link layer retransmissions be? (which is closely related to the question: how often should a link layer try to retransmit before giving up?) That's what my emails were about. I suspect that we don't have a good answer to even these questions, and I suspect that we'd be better off having something dynamic than fixed default values. > > What about framing the retransmissions not in number but rather in time? For example the maximum of either the time to transmit a few (say 3?) packets at the current data rate (or maybe one rate lower than current to allow for deteriorating signal quality) or 20ms (pulled out of thin air, would need some research). The first should make sure we actually retransmit to overcome glitches, and the second should make sure that RTT does not increase too dramatically. This basically assumes that for reasonable interactive traffic we only have a given RTT budget and should make sure not to overspend ;) That would be VERY good I think!!!! 
As for the actual recommendations, I think we'll also need a lower bound on the retry count, so that a single random collision that has nothing to do with any real overload doesn't cause a TCP reaction. THAT number could be 3, for example. Then, we typically have a default value of retransmissions in today's equipment - I think that number is usually something on the order of 10. So yes, that limit could be replaced with something like "min(10 retransmissions, 20ms)". I actually wonder what magnitude we can really reach in a practical system - e.g., is 20ms way beyond realistic in a super crowded network? Worth analyzing, I guess. Cheers, Michael ^ permalink raw reply [flat|nested] 56+ messages in thread
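The "min(10 retransmissions, 20ms)" rule discussed above could look roughly like this. This is a sketch under stated assumptions: the function name and the transmit() callback are hypothetical stand-ins for the driver's send-and-wait-for-ACK path, and a real driver would do this in firmware or interrupt context, not a Python loop:

```python
import time

def send_with_budget(transmit, frame, max_retries=10, budget_s=0.020):
    """Retry a link-layer transmission until ACKed, giving up after
    max_retries attempts OR budget_s seconds, whichever comes first.
    transmit(frame) returns True if the frame was ACKed.
    Returns the number of attempts used, or None if we gave up
    (at which point the loss becomes visible to TCP)."""
    deadline = time.monotonic() + budget_s
    for attempt in range(1, max_retries + 1):
        if transmit(frame):
            return attempt
        if time.monotonic() >= deadline:
            break  # time budget exhausted: stop holding up the queue
    return None
```

The first successful attempt returns immediately, so the budget only bounds the worst case; the retry-count floor keeps a single unlucky collision from ever surfacing as a loss.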
* Re: [Bloat] sigcomm wifi 2014-08-25 8:33 ` Michael Welzl @ 2014-08-25 9:18 ` Alex Burr 0 siblings, 0 replies; 56+ messages in thread From: Alex Burr @ 2014-08-25 9:18 UTC (permalink / raw) To: Michael Welzl, Sebastian Moeller; +Cc: bloat > On Monday, August 25, 2014 9:34 AM, Michael Welzl <michawe@ifi.uio.no> wrote: > > > On 25. aug. 2014, at 10:19, Sebastian Moeller wrote: > >> What about framing the retransmissions not in number but rather in > time? For example the maximum of either time to transmit a few (say 3?) packets > at the current data rate (or maybe one rate lower than current to allow > deteriorating signal quality) or 20ms (pulled out of thin air, would need some > research). The first should make sure we actually retransmit to overcome > glitches, and the second should make sure that RTT does not increase too > dramatically. This basically assumes that for reasonable interactive traffic we > only have a given RTT budget and should make sure not to overspend ;) > > That would be VERY good I think!!!! I agree. A third reason for link-layer retransmission is to prevent glitches in video/audio streams which aren't run over TCP and won't retransmit themselves. (This was basically the reason link-layer retransmission got added to DSL, because telcos were keen that their bundled TV didn't have glitches - in this case, due to noise, rather than collisions). This is also very consistent with the idea of only retransmitting within a short period. Alex ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-25 8:19 ` Sebastian Moeller 2014-08-25 8:33 ` Michael Welzl @ 2014-08-31 22:37 ` David Lang 2014-08-31 23:09 ` Simon Barber 1 sibling, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-31 22:37 UTC (permalink / raw) To: Sebastian Moeller; +Cc: bloat On Mon, 25 Aug 2014, Sebastian Moeller wrote: > Hi Michael, > > On Aug 25, 2014, at 10:01 , Michael Welzl <michawe@ifi.uio.no> wrote: > [...] >> >>> This is a case where a local proxy server can actually make a big difference >>> to you. The connections between your mobile devices and the local proxy >>> server have a short RTT and so all timeouts can be nice and short, and then >>> the proxy deals with the long RTT connections out to the Internet. >> >> Adding a proxy to these considerations only complicates them: it's a hard >> enough trade-off when we just ask ourselves: how large should a buffer for >> the sake of link layer retransmissions be? (which is closely related to the >> question: how often should a link layer try to retransmit before giving up?) >> That's what my emails were about. I suspect that we don't have a good answer >> to even these questions, and I suspect that we'd be better off having something >> dynamic than fixed default values. > > What about framing the retransmissions not in number but rather in time? > For example the maximum of either time to transmit a few (say 3?) packets at > the current data rate (or maybe one rate lower than current to allow > deteriorating signal quality) or 20ms (pulled out of thin air, would need some > research). The first should make sure we actually retransmit to overcome > glitches, and the second should make sure that RTT does not increase too > dramatically. 
This basically assumes that for reasonable interactive traffic > we only have a given RTT budget and should make sure not to overspend ;) Yep, just like BQL helped a lot on the wired side, because it's a good stand-in for the time involved, we need to get the same concept through the wifi stack and drivers. David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-31 22:37 ` David Lang @ 2014-08-31 23:09 ` Simon Barber 2014-09-01 0:25 ` David Lang 0 siblings, 1 reply; 56+ messages in thread From: Simon Barber @ 2014-08-31 23:09 UTC (permalink / raw) To: David Lang, Sebastian Moeller; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 2829 bytes --] The right concept for WiFi would be TQL, time queue limit. This is needed because in wifi there can be several orders of magnitude difference in the speed (transmission rate) that different packets are sent at. The purpose of the hardware queue is to cover for the interrupt latency, ie the time delay required to keep the queue filled. Thus accounting for the time it will take to transmit the packets is important. For wired interfaces with a fixed speed byte counting works fine. Byte counting does not work in a wireless environment where the same number of bytes can take 2 or 3 different orders of magnitude of time to transmit. Accounting for the expected minimum transmission time is critical. Simon On August 31, 2014 3:37:07 PM PDT, David Lang <david@lang.hm> wrote: >On Mon, 25 Aug 2014, Sebastian Moeller wrote: > >> Hi Michael, >> >> On Aug 25, 2014, at 10:01 , Michael Welzl <michawe@ifi.uio.no> wrote: >> [...] >>> >>>> This is a case where a local proxy server can actually make a big >difference >>>> to you. The connections between your mobile devices and the local >proxy >>>> server have a short RTT and so all timeouts can be nice and short, >and then >>>> the proxy deals with the long RTT connections out to the Internet. >>> >>> Adding a proxy to these considerations only complicates them: it's a >hard >>> enough trade-off when we just ask ourselves: how large should a >buffer for >>> the sake of link layer retransmissions be? (which is closely >related to the >>> question: how often should a link layer try to retransmit before >giving up?) >>> That's what my emails were about. 
I suspect that we don't have a >good answer >>> to even these questions, and I suspect that we'd better off having >something >>> dynamic than fixed default values. >> >> What about framing the retransmissions not in number but rather in >time? >> For example the maximum of either time to transmit a few (say 3?) >packet at >> the current data rate (or maybe one rate lower than current to allow >> setoriating signal quality) or 20ms (pulled out of thin air, would >need some >> research). The first should make sure we actually retransmit to >overcome >> glitches, and the second should make sure that RTT does not increase >to >> dramatically. This basically assumes that for reasonable interactive >traffic >> we only have a given RTT budget and should make sure not to overspend >;) > >Yep, just like BQL helped a lot on the wired side, because it's a good >stand-in >for the time involved, we need to get the same concept through the wifi >stack >and drivers. > >David Lang >_______________________________________________ >Bloat mailing list >Bloat@lists.bufferbloat.net >https://lists.bufferbloat.net/listinfo/bloat -- Sent from my Android device with K-9 Mail. Please excuse my brevity. [-- Attachment #2: Type: text/html, Size: 3603 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-31 23:09 ` Simon Barber @ 2014-09-01 0:25 ` David Lang 2014-09-01 2:14 ` Simon Barber 0 siblings, 1 reply; 56+ messages in thread From: David Lang @ 2014-09-01 0:25 UTC (permalink / raw) To: Simon Barber; +Cc: bloat On Sun, 31 Aug 2014, Simon Barber wrote: > The right concept for WiFi would be TQL, time queue limit. This is needed > because in wifi there can be several orders of magnitude difference in the > speed (transmission rate) that different packets are sent at. The purpose of > the hardware queue is to cover for the interrupt latency, ie the time delay > required to keep the queue filled. Thus accounting for the time it will take > to transmit the packets is important. For wired interfaces with a fixed speed > byte counting works fine. Byte counting does not work in a wireless > environment where the same number of bytes can take 2 or 3 different orders of > magnitude of time to transmit. Accounting for the expected minimum > transmission time is critical. given that conditions can change while data is in the queue, I think the right answer is to size the queue based on the fastest possible transmission time (otherwise you will be running the risk of emptying the queue). Yes, it will lead to over buffering when the speed drops, but that would still be an improvement over the current situation. David Lang > Simon > > On August 31, 2014 3:37:07 PM PDT, David Lang <david@lang.hm> wrote: >> On Mon, 25 Aug 2014, Sebastian Moeller wrote: >> >>> Hi Michael, >>> >>> On Aug 25, 2014, at 10:01 , Michael Welzl <michawe@ifi.uio.no> wrote: >>> [...] >>>> >>>>> This is a case where a local proxy server can actually make a big >> difference >>>>> to you. The connections between your mobile devices and the local >> proxy >>>>> server have a short RTT and so all timeouts can be nice and short, >> and then >>>>> the proxy deals with the long RTT connections out to the Internet. 
>>>> >>>> Adding a proxy to these considerations only complicates them: it's a >> hard >>>> enough trade-off when we just ask ourselves: how large should a >> buffer for >>>> the sake of link layer retransmissions be? (which is closely >> related to the >>>> question: how often should a link layer try to retransmit before >> giving up?) >>>> That's what my emails were about. I suspect that we don't have a >> good answer >>>> to even these questions, and I suspect that we'd better off having >> something >>>> dynamic than fixed default values. >>> >>> What about framing the retransmissions not in number but rather in >> time? >>> For example the maximum of either time to transmit a few (say 3?) >> packet at >>> the current data rate (or maybe one rate lower than current to allow >>> setoriating signal quality) or 20ms (pulled out of thin air, would >> need some >>> research). The first should make sure we actually retransmit to >> overcome >>> glitches, and the second should make sure that RTT does not increase >> to >>> dramatically. This basically assumes that for reasonable interactive >> traffic >>> we only have a given RTT budget and should make sure not to overspend >> ;) >> >> Yep, just like BQL helped a lot on the wired side, because it's a good >> stand-in >> for the time involved, we need to get the same concept through the wifi >> stack >> and drivers. >> >> David Lang >> _______________________________________________ >> Bloat mailing list >> Bloat@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/bloat > > ^ permalink raw reply [flat|nested] 56+ messages in thread
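The TQL idea from the exchange above, with the queue limit expressed in airtime and each packet costed at the fastest supported rate as David suggests (so the queue can never run dry, at the price of over-buffering when the rate drops), might be sketched like this. The class name, the 2 ms limit, and the 54 Mb/s rate are illustrative assumptions, and real per-packet airtime would also include preamble, contention, and aggregation overheads that this sketch ignores:

```python
class TimeQueueLimit:
    """Sketch of a 'TQL' (time queue limit), the airtime analogue of BQL:
    admit packets by estimated transmission time rather than byte count.
    Each packet is costed at the FASTEST rate the station supports, so
    the queue cannot run dry if the rate later rises back, at the price
    of some over-buffering while the rate is low."""

    def __init__(self, limit_s=0.002, fastest_rate_bps=54_000_000):
        self.limit_s = limit_s                    # airtime allowed in queue
        self.fastest_rate_bps = fastest_rate_bps  # best-case PHY rate
        self.queued_s = 0.0                       # airtime currently queued
        self.queue = []

    def airtime(self, nbytes):
        """Best-case (fastest-rate) serialization time for nbytes."""
        return nbytes * 8 / self.fastest_rate_bps

    def try_enqueue(self, packet):
        cost = self.airtime(len(packet))
        # Always admit at least one packet, even an oversized one,
        # so the link never deadlocks on an empty queue.
        if self.queued_s + cost > self.limit_s and self.queue:
            return False  # over the time limit: push back to the stack
        self.queue.append(packet)
        self.queued_s += cost
        return True

    def dequeue(self):
        pkt = self.queue.pop(0)
        self.queued_s -= self.airtime(len(pkt))
        return pkt
```

With a 1 ms limit at 54 Mb/s, a 1500-byte packet costs about 222 µs of best-case airtime, so roughly four packets fit before the queue pushes back.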
* Re: [Bloat] sigcomm wifi 2014-09-01 0:25 ` David Lang @ 2014-09-01 2:14 ` Simon Barber 0 siblings, 0 replies; 56+ messages in thread From: Simon Barber @ 2014-09-01 2:14 UTC (permalink / raw) To: David Lang; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 4005 bytes --] Indeed - minimum transmission time is the right metric. For wireless cards where transmission speed is determined by the host this should work very well, queue will be tightly controlled. Wireless cards that put a lot of intelligence in the card, ie packet transmission speed selection, are harder to handle well. In these cards to get good results you really need to move all the queueing code into the card, which is not going to happen. You're basically screwed and can't do a good job. Simon On August 31, 2014 5:25:05 PM PDT, David Lang <david@lang.hm> wrote: >On Sun, 31 Aug 2014, Simon Barber wrote: > >> The right concept for WiFi would be TQL, time queue limit. This is >needed >> because in wifi there can be several orders of magnitude difference >in the >> speed (transmission rate) that different packets are sent at. The >purpose of >> the hardware queue is to cover for the interrupt latency, ie the time >delay >> required to keep the queue filled. Thus accounting for the time it >will take >> to transmit the packets is important. For wired interfaces with a >fixed speed >> byte counting works fine. Byte counting does not work in a wireless >> environment where the same number of bytes can take 2 or 3 different >orders of >> magnitude of time to transmit. Accounting for the expected minimum >> transmission time is critical. > >given that conditions can change while data is in the queue, I think >the right >answer is to size the queue based on the fastest possible transmission >time >(otherwise you will be running the risk of emptying the queue). Yes, it >will >lead to over buffering when the speed drops, but that would still be an > >improvement over the current situation. 
> >David Lang > >> Simon >> >> On August 31, 2014 3:37:07 PM PDT, David Lang <david@lang.hm> wrote: >>> On Mon, 25 Aug 2014, Sebastian Moeller wrote: >>> >>>> Hi Michael, >>>> >>>> On Aug 25, 2014, at 10:01 , Michael Welzl <michawe@ifi.uio.no> >wrote: >>>> [...] >>>>> >>>>>> This is a case where a local proxy server can actually make a big >>> difference >>>>>> to you. The connections between your mobile devices and the local >>> proxy >>>>>> server have a short RTT and so all timeouts can be nice and >short, >>> and then >>>>>> the proxy deals with the long RTT connections out to the >Internet. >>>>> >>>>> Adding a proxy to these considerations only complicates them: it's >a >>> hard >>>>> enough trade-off when we just ask ourselves: how large should a >>> buffer for >>>>> the sake of link layer retransmissions be? (which is closely >>> related to the >>>>> question: how often should a link layer try to retransmit before >>> giving up?) >>>>> That's what my emails were about. I suspect that we don't have a >>> good answer >>>>> to even these questions, and I suspect that we'd better off having >>> something >>>>> dynamic than fixed default values. >>>> >>>> What about framing the retransmissions not in number but rather in >>> time? >>>> For example the maximum of either time to transmit a few (say 3?) >>> packet at >>>> the current data rate (or maybe one rate lower than current to >allow >>>> setoriating signal quality) or 20ms (pulled out of thin air, would >>> need some >>>> research). The first should make sure we actually retransmit to >>> overcome >>>> glitches, and the second should make sure that RTT does not >increase >>> to >>>> dramatically. 
This basically assumes that for reasonable >interactive >>> traffic >>>> we only have a given RTT budget and should make sure not to >overspend >>> ;) >>> >>> Yep, just like BQL helped a lot on the wired side, because it's a >good >>> stand-in >>> for the time involved, we need to get the same concept through the >wifi >>> stack >>> and drivers. >>> >>> David Lang >>> _______________________________________________ >>> Bloat mailing list >>> Bloat@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/bloat >> >> -- Sent from my Android device with K-9 Mail. Please excuse my brevity. [-- Attachment #2: Type: text/html, Size: 8762 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-25 8:01 ` Michael Welzl 2014-08-25 8:19 ` Sebastian Moeller @ 2014-08-31 22:35 ` David Lang 1 sibling, 0 replies; 56+ messages in thread From: David Lang @ 2014-08-31 22:35 UTC (permalink / raw) To: Michael Welzl; +Cc: bloat On Mon, 25 Aug 2014, Michael Welzl wrote: >> But this cache of recently sent packets is separate from a queue of packets waiting to be sent. >> >> the size of the buffer used to track what's been sent isn't a problem. the >> bufferbloat problem is around the size of the queue for packets waiting to be >> sent. > > This confuses me. Why do you even need a cache of recently sent packets? so that you can re-send them. If you are the originator of the packet, you need to have this. If you are relaying packets over some media that has link-level acks, you need this for all packets you relay as well > Anyway, what I am talking about *is* the size of the queue for packets waiting > to be sent - and not only due to aggregation but also link layer retransmits. > Per device, at the link layer, packets (frames, really) are sent in sequence > AFAIK, and so any frame that has been sent but not yet acknowledged and then > has to be resent if it isn't acknowledged holds up all other frames to that > same destination. > > > >>>> If people are retrying when they really don't need to, that cuts down on the available airtime. >>> >>> Yes >>> >>> >>>> But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much. >>> >>> Yes; but the complexity that I was pointing at (but maybe it's a simple parameter, more like a 0 or 1 situation in practice?) lies in the word "continual". How long do you try before you decide that the sending TCP should really think it *is* congestion? 
To really optimize the behavior, that would have to depend on the RTT, which you can't easily know. >> >> Again, I think you are mixing two different issues here. > > No, I think you misunderstand me - > > >> 1. waiting for a pause in everyone else's transmissions so that you can >> transmit without _knowing_ that you are going to clobber someone >> >> Even this can get tricky, is that station you are hearing faintly trying to >> transmit to an AP near you so you should be quiet? or is it transmitting to a >> station enough further away from you so you can go ahead and transmit your >> packet to your AP without interfering with it? > > You mean the normal CSMA/CA procedure ( + RTS/CTS)? Sure that's tricky in > itself but I wasn't talking about that. > > >> 2. your transmission getting clobbered so the packet doesn't get through, >> where you need to wait 'long enough' to decide that it's not going to be >> acknowledged and try again. > > I was always only talking about that second bit. I'm sure I wasn't clear > enough in writing and I'm sorry for that. > > >> This is a case where a local proxy server can actually make a big difference >> to you. The connections between your mobile devices and the local proxy >> server have a short RTT and so all timeouts can be nice and short, and then >> the proxy deals with the long RTT connections out to the Internet. > > Adding a proxy to these considerations only complicates them: it's a hard > enough trade-off when we just ask ourselves: how large should a buffer for the > sake of link layer retransmissions be? (which is closely related to the > question: how often should a link layer try to retransmit before giving up?) > That's what my emails were about. I suspect that we don't have a good answer > to even these questions, and I suspect that we'd be better off having something > dynamic than fixed default values. 
A proxy should simplify this, because the total connection RTT now is client <-> AP <-> proxy, not client <-> AP <-> Internet <-> server. if you can _know_ that the endpoint of the connection is a local proxy that's only 10ms away, you don't need to have timeouts long enough to handle a multi-second connection to a server on the other side of the world with you both using satellite links David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-20 22:05 ` Jim Gettys 2014-08-21 6:52 ` Eggert, Lars @ 2014-08-21 6:56 ` David Lang 2014-08-21 7:04 ` David Lang ` (2 subsequent siblings) 4 siblings, 0 replies; 56+ messages in thread From: David Lang @ 2014-08-21 6:56 UTC (permalink / raw) To: Jim Gettys; +Cc: bloat [-- Attachment #1: Type: TEXT/Plain, Size: 1864 bytes --] On Wed, 20 Aug 2014, Jim Gettys wrote: > On Wed, Aug 20, 2014 at 3:12 AM, Eggert, Lars <lars@netapp.com> wrote: > >> On 2014-8-19, at 18:45, Dave Taht <dave.taht@gmail.com> wrote: >>> I figured y'all would be bemused by the wifi performance in the sigcomm >>> main conference room this morning... >>> >>> http://snapon.lab.bufferbloat.net/~d/sigcomm_tuesday.png >> >> There is a reason we budgeted a 1G uplink for SIGCOMM Helsinki and made >> sure we had sufficient AP coverage... >> > > And what kinds of AP's? All the 1G guarantees you is that your bottleneck > is in the wifi hop, and they can suffer as badly as anything else > (particularly consumer home routers). > > The reason why 802.11 works ok at IETF and NANOG is that: > o) they use Cisco enterprise AP's, which are not badly over buffered. I > don't have data on which enterprise AP's are overbuffered. > o) they do a good job of placing the AP's, given a lot of experience > o) they turn on RED in the router, which, since there is a lot of > aggregated traffic, can actually help rather than hurt, and keep TCP > decently policed. > o) they play some interesting diffserv marking tricks to prioritize some > traffic, getting part of the effect the fq_codel gives you in its "new > flow" behavior by manual configuration. Fq_codel does much better without > having to mess around like this. > > Would be nice if they (the folks who run the IETF network) wrote a BCP on > the topic; I urged them some IETF's ago, but if others asked, it would help. 
> > If you try to use consumer home routers running factory firmware and hack > it yourself, you will likely lose no matter what your backhaul is (though > you might do ok using current CeroWrt/OpenWrt if you know what you are > doing. Yep, bad AP setup, coverage and configuration can cripple you. How many people in the room? David Lang [-- Attachment #2: Type: TEXT/PLAIN, Size: 140 bytes --] _______________________________________________ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-20 22:05 ` Jim Gettys 2014-08-21 6:52 ` Eggert, Lars 2014-08-21 6:56 ` David Lang @ 2014-08-21 7:04 ` David Lang 2014-08-21 9:46 ` Jesper Dangaard Brouer 2014-08-21 8:58 ` Steinar H. Gunderson 2014-08-21 9:23 ` Mikael Abrahamsson 4 siblings, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-21 7:04 UTC (permalink / raw) To: Jim Gettys; +Cc: bloat [-- Attachment #1: Type: TEXT/Plain, Size: 2192 bytes --] On Wed, 20 Aug 2014, Jim Gettys wrote: > On Wed, Aug 20, 2014 at 3:12 AM, Eggert, Lars <lars@netapp.com> wrote: > >> On 2014-8-19, at 18:45, Dave Taht <dave.taht@gmail.com> wrote: >>> I figured y'all would be bemused by the wifi performance in the sigcomm >>> main conference room this morning... >>> >>> http://snapon.lab.bufferbloat.net/~d/sigcomm_tuesday.png >> >> There is a reason we budgeted a 1G uplink for SIGCOMM Helsinki and made >> sure we had sufficient AP coverage... >> > > And what kinds of AP's? All the 1G guarantees you is that your bottleneck > is in the wifi hop, and they can suffer as badly as anything else > (particularly consumer home routers). > > The reason why 802.11 works ok at IETF and NANOG is that: > o) they use Cisco enterprise AP's, which are not badly over buffered. I > don't have data on which enterprise AP's are overbuffered. > o) they do a good job of placing the AP's, given a lot of experience > o) they turn on RED in the router, which, since there is a lot of > aggregated traffic, can actually help rather than hurt, and keep TCP > decently policed. > o) they play some interesting diffserv marking tricks to prioritize some > traffic, getting part of the effect the fq_codel gives you in its "new > flow" behavior by manual configuration. Fq_codel does much better without > having to mess around like this. > > Would be nice if they (the folks who run the IETF network) wrote a BCP on > the topic; I urged them some IETF's ago, but if others asked, it would help. 
> > If you try to use consumer home routers running factory firmware and hack > it yourself, you will likely lose no matter what your backhaul is (though > you might do ok using current CeroWrt/OpenWrt if you know what you are > doing). here's a paper I did a couple years ago on the network we built for Scale '11 https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david_wireless this year we had pretty much the same network layout with 2500 people (our most crowded room holds ~450, but there are many rooms next to each other down the hall). we did do some DNS blacklisting to cut down a bit on the bandwidth requirements. David Lang [-- Attachment #2: Type: TEXT/PLAIN, Size: 140 bytes --] _______________________________________________ Bloat mailing list Bloat@lists.bufferbloat.net https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-21 7:04 ` David Lang @ 2014-08-21 9:46 ` Jesper Dangaard Brouer 2014-08-21 19:49 ` David Lang 0 siblings, 1 reply; 56+ messages in thread From: Jesper Dangaard Brouer @ 2014-08-21 9:46 UTC (permalink / raw) To: David Lang; +Cc: bloat On Thu, 21 Aug 2014 00:04:54 -0700 (PDT) David Lang <david@lang.hm> wrote: > here's a paper I did a couple years ago on the network we build for Scale '11 > > https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david_wireless Thank you, David! Just finished watching this, very excellent presentation. Good with some information sharing based on practical wifi deployments. I also find this/your presentation interesting: http://talks.lang.hm/talks/topics/Wireless/Cascadia_2012/Wireless.pdf Where you go more into the channels. Thanks for all the good work :-) -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-21 9:46 ` Jesper Dangaard Brouer @ 2014-08-21 19:49 ` David Lang 2014-08-21 19:57 ` Steinar H. Gunderson 0 siblings, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-21 19:49 UTC (permalink / raw) To: Jesper Dangaard Brouer; +Cc: bloat On Thu, 21 Aug 2014, Jesper Dangaard Brouer wrote: > On Thu, 21 Aug 2014 00:04:54 -0700 (PDT) David Lang <david@lang.hm> wrote: > >> here's a paper I did a couple years ago on the network we built for Scale '11 >> >> https://www.usenix.org/conference/lisa12/technical-sessions/presentation/lang_david_wireless > > Thank you, David! Just finished watching this, very excellent > presentation. Good with some information sharing based on practical > wifi deployments. > > I also find this/your presentation interesting: > http://talks.lang.hm/talks/topics/Wireless/Cascadia_2012/Wireless.pdf > Where you go more into the channels. > > Thanks for all the good work :-) that's the difference between 20 min of time and an hour of time (there should also be a video on my lang.hm site, but it's a slow uplink) One trick we pulled this year at Scale that I think made a big difference is that we labeled the 2.4 GHz network "Scale-slow" and the 5GHz network "Scale"; this seems to have pushed a LOT more people to using the less congested 5GHz band. David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-21 19:49 ` David Lang @ 2014-08-21 19:57 ` Steinar H. Gunderson 2014-08-22 17:07 ` Jan Ceuleers 0 siblings, 1 reply; 56+ messages in thread From: Steinar H. Gunderson @ 2014-08-21 19:57 UTC (permalink / raw) To: bloat On Thu, Aug 21, 2014 at 12:49:02PM -0700, David Lang wrote: > One trick we pulled this year at Scale that I think made a big > difference is that we labeled the 2.4 GHz network "Scale-slow" and > the 5GHz network "Scale"; this seems to have pushed a LOT more > people to using the less congested 5GHz band. I'll stop preaching the Cisco gospel soon (well, e.g. Aruba would probably also do just the same thing :-) ), but in a WLC-based solution, there's a setting to just refuse the first few associations on 2.4 GHz if it detects the client is 5 GHz capable. It's a hack, but it pushes people over to 5 GHz quite effectively. /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-21 19:57 ` Steinar H. Gunderson @ 2014-08-22 17:07 ` Jan Ceuleers 2014-08-22 18:27 ` Steinar H. Gunderson 0 siblings, 1 reply; 56+ messages in thread From: Jan Ceuleers @ 2014-08-22 17:07 UTC (permalink / raw) To: bloat On 08/21/2014 09:57 PM, Steinar H. Gunderson wrote: > I'll stop preaching the Cisco gospel soon (well, e.g. Aruba would probably > also do just the same thing :-) ), but in a WLC-based solution, there's a > setting to just refuse the first few associations on 2.4 GHz if it detects > the client is 5 GHz capable. It's a hack, but it pushes people over to 5 GHz > quite effectively. How does that work in the light of STA MAC address randomisation during the scan stage? As I understand it, such STAs only use their "real" MAC address for actually connecting to the AP, but hide their identity in probe requests. So a dual-band AP can't rely on a STA's MAC address being the same in both bands. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-22 17:07 ` Jan Ceuleers @ 2014-08-22 18:27 ` Steinar H. Gunderson 0 siblings, 0 replies; 56+ messages in thread From: Steinar H. Gunderson @ 2014-08-22 18:27 UTC (permalink / raw) To: bloat On Fri, Aug 22, 2014 at 07:07:00PM +0200, Jan Ceuleers wrote: > How does that work in the light of STA MAC address randomisation during > the scan stage? As I understand it, such STAs only use their "real" MAC > address for actually connecting to the AP, but hide their identity in > probe requests. So a dual-band AP can't rely on a STA's MAC address > being the same in both bands. No idea. On a guess, maybe they only randomize every so often? Or maybe they don't get band-selected. /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-20 22:05 ` Jim Gettys ` (2 preceding siblings ...) 2014-08-21 7:04 ` David Lang @ 2014-08-21 8:58 ` Steinar H. Gunderson 2014-08-22 23:34 ` David Lang 2014-08-21 9:23 ` Mikael Abrahamsson 4 siblings, 1 reply; 56+ messages in thread From: Steinar H. Gunderson @ 2014-08-21 8:58 UTC (permalink / raw) To: bloat On Wed, Aug 20, 2014 at 06:05:57PM -0400, Jim Gettys wrote: > The reason why 802.11 works ok at IETF and NANOG is that: > o) they use Cisco enterprise AP's, which are not badly over buffered. I > don't have data on which enterprise AP's are overbuffered. Note that there's a lot more to this kind of solution than “not badly overbuffered”. In particular, you have automated systems for channel assignment, for biasing people onto 5 GHz (which has 10x the number of nonoverlapping channels) and for forcing people to be load-balanced between the different APs. All of this helps in high-density. A lot of what's problematic in crowded areas is actually control traffic, not data traffic, especially since it is sent on the lowest basic rate. (So, well, one thing you do is to set 11Mbit or whatever as the lowest basic rate instead of 1Mbit...) /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 56+ messages in thread
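To put rough numbers on why the lowest basic rate matters so much: a frame's on-air time scales inversely with its PHY rate, so moving management traffic from 1 Mbit to 11 Mbit frees roughly 90% of the airtime it was burning. A back-of-the-envelope sketch; the frame size is illustrative, and PLCP preamble and interframe spacing are ignored:

```python
def airtime_us(frame_bytes: int, rate_mbps: float) -> float:
    """Rough on-air time of a frame body in microseconds at a given
    PHY rate; ignores PLCP preamble and interframe spacing."""
    return frame_bytes * 8 / rate_mbps

beacon = 250  # an illustrative beacon/management frame size, in bytes
print(airtime_us(beacon, 1))   # 2000.0 us at the 1 Mbit lowest basic rate
print(airtime_us(beacon, 11))  # ~181.8 us after raising it to 11 Mbit
```

With dozens of APs each beaconing ten times a second, that difference alone is a measurable slice of the channel, before counting probe responses and other broadcast traffic sent at the same rate.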
* Re: [Bloat] sigcomm wifi 2014-08-21 8:58 ` Steinar H. Gunderson @ 2014-08-22 23:34 ` David Lang 2014-08-22 23:41 ` Steinar H. Gunderson 0 siblings, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-22 23:34 UTC (permalink / raw) To: Steinar H. Gunderson; +Cc: bloat [-- Attachment #1: Type: TEXT/PLAIN, Size: 1650 bytes --] On Thu, 21 Aug 2014, Steinar H. Gunderson wrote: > On Wed, Aug 20, 2014 at 06:05:57PM -0400, Jim Gettys wrote: >> The reason why 802.11 works ok at IETF and NANOG is that: >> o) they use Cisco enterprise AP's, which are not badly over buffered. I >> don't have data on which enterprise AP's are overbuffered. > > Note that there's a lot more to this kind of solution than “not badly > overbuffered”. In particular, you have automated systems for channel > assignment, for biasing people onto 5 GHz (which has 10x the number of > nonoverlapping channels) and for forcing people to be load-balanced between > the different APs. All of this helps in high-density. is there actually anything for this in the 802.11 protocol? or is this just the controller noticing that this MAC address has shown up on 5GHz in the past so it opts to not respond to requests to associate with a 2.4GHz AP? Social Engineering works well for this without a need for technical tricks (Scale-slow on 2.4, Scale on 5) > A lot of what's problematic in crowded areas is actually control traffic, > not data traffic, especially since it is sent on the lowest basic rate. > (So, well, one thing you do is to set 11Mbit or whatever as the lowest basic > rate instead of 1Mbit...) Yep, if the rate of control traffic could be set to be faster it would help a lot. When you transmit slower, it greatly magnifies the chances of something else coming up on the air and clobbering you so your lengthy broadcast is worthless. But I thought that Dave Taht had been experimenting with this in cerowrt and found it didn't actually work well in the real world. 
David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-22 23:34 ` David Lang @ 2014-08-22 23:41 ` Steinar H. Gunderson 2014-08-22 23:52 ` David Lang 0 siblings, 1 reply; 56+ messages in thread From: Steinar H. Gunderson @ 2014-08-22 23:41 UTC (permalink / raw) To: David Lang; +Cc: bloat On Fri, Aug 22, 2014 at 04:34:09PM -0700, David Lang wrote: >> Note that there's a lot more to this kind of solution than “not badly >> overbuffered”. In particular, you have automated systems for channel >> assignment, for biasing people onto 5 GHz (which has 10x the number of >> nonoverlapping channels) and for forcing people to be load-balanced between >> the different APs. All of this helps in high-density. > is there actually anything for this in the 802.11 protocol? or is > this just the controller noticing that this MAC address has shown up > on 5GHz in the past so it opts to not respond to requests to > associate with a 2.4GHz AP? The latter. > Social Engineering works well for this without a need for technical > tricks (Scale-slow on 2.4, Scale on 5) Having two separate ESSIDs is a pain, though; it means that when you go into the outskirts of your coverage zones, you need to manually switch to 2.4. > Yep, if the rate of control traffic could be set to be faster it > would help a lot. When you transmit slower, it greatly magnifies the > chances of something else coming up on the air and clobbering you so > your lengthy broadcast is worthless. > > But I thought that Dave Taht had been experimenting with this in > cerowrt and found it didn't actually work well in the real world. I've tested this in the real world. It works really well. (It's also pretty much standard practice in high-density Wi-Fi, from what I've been told.) /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-22 23:41 ` Steinar H. Gunderson @ 2014-08-22 23:52 ` David Lang 2014-08-22 23:56 ` Steinar H. Gunderson 0 siblings, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-22 23:52 UTC (permalink / raw) To: Steinar H. Gunderson; +Cc: bloat [-- Attachment #1: Type: TEXT/PLAIN, Size: 1853 bytes --] On Sat, 23 Aug 2014, Steinar H. Gunderson wrote: > On Fri, Aug 22, 2014 at 04:34:09PM -0700, David Lang wrote: >>> Note that there's a lot more to this kind of solution than “not badly >>> overbuffered”. In particular, you have automated systems for channel >>> assignment, for biasing people onto 5 GHz (which has 10x the number of >>> nonoverlapping channels) and for forcing people to be load-balanced between >>> the different APs. All of this helps in high-density. >> is there actually anything for this in the 802.11 protocol? or is >> this just the controller noticing that this MAC address has shown up >> on 5GHz in the past so it opts to not respond to requests to >> associate with a 2.4GHz AP? > > The latter. > >> Social Engineering works well for this without a need for technical >> tricks (Scale-slow on 2.4, Scale on 5) > > Having two separate ESSIDs is a pain, though; it means that when you go into > the outskirts of your coverage zones, you need to manually switch to 2.4. that depends on how many APs you put in place. you really should be trying to cover the area with 5GHz, and since you have so many more channels available, you can afford to use more power on 5GHz and give each AP a bigger footprint. >> Yep, if the rate of control traffic could be set to be faster it >> would help a lot. When you transmit slower, it greatly magnifies the >> chances of something else coming up on the air and clobbering you so >> your lengthy broadcast is worthless. >> >> But I thought that Dave Taht had been experimenting with this in >> cerowrt and found it didn't actually work well in the real world. > > I've tested this in the real world. 
It works really well. (It's also pretty much standard practice in high-density Wi-Fi, from what I've been told.) Ok, then I'm back to "how can I do this on OpenWRT"? :-) David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-22 23:52 ` David Lang @ 2014-08-22 23:56 ` Steinar H. Gunderson 2014-08-23 0:03 ` Steinar H. Gunderson 0 siblings, 1 reply; 56+ messages in thread From: Steinar H. Gunderson @ 2014-08-22 23:56 UTC (permalink / raw) To: David Lang; +Cc: bloat On Fri, Aug 22, 2014 at 04:52:55PM -0700, David Lang wrote: > that depends on how many APs you put in place. you really should be > trying to cover the area with 5GHz, and since you have so many more > channels available, you can afford to use more power on 5GHz and > give each AP a bigger footprint. Sure. But there are situations where this isn't possible for a variety of reasons, in particular when your area starts getting fuzzy (i.e., for a university campus, how far outdoors you want to support). >> I've tested this in the real world. It works really well. (It's also pretty >> much standard practice in high-density Wi-Fi, from what I've been told.) > Ok, then I'm back to "how can I do this on OpenWRT"? :-) No idea there, sorry. /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-22 23:56 ` Steinar H. Gunderson @ 2014-08-23 0:03 ` Steinar H. Gunderson 0 siblings, 0 replies; 56+ messages in thread From: Steinar H. Gunderson @ 2014-08-23 0:03 UTC (permalink / raw) To: David Lang; +Cc: bloat On Sat, Aug 23, 2014 at 01:56:30AM +0200, Steinar H. Gunderson wrote: >>> I've tested this in the real world. It works really well. (It's also pretty >>> much standard practice in high-density Wi-Fi, from what I've been told.) >> Ok, then I'm back to "how can I do this on OpenWRT"? :-) > No idea there, sorry. Oh, if it helps you at all, it seems to be based on 802.11h, and http://wiki.openwrt.org/doc/uci/wireless seems to indicate it's supported... for some drivers only. Maybe. The information element number seems to be 0x20 (WLAN_EID_PWR_CONSTRAINT), and I can see it (undecoded) in iwlist: IE: Unknown: 200103 Which means length=1, turn down 3 dB from regulatory limits (which is inferred from country, set in some other IE). /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 56+ messages in thread
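Decoding that IE by hand is straightforward: in the hex string 200103, 0x20 is the element ID (Power Constraint), the next byte is the payload length, and the payload is the dB reduction below the regulatory limit. A small sketch, assuming the raw hex as printed by iwlist:

```python
def parse_power_constraint(ie_hex: str) -> int:
    """Decode an 802.11h Power Constraint information element from raw
    hex (element ID 0x20, length 1, payload = dB below regulatory limit)."""
    raw = bytes.fromhex(ie_hex)
    eid, length, value = raw[0], raw[1], raw[2]
    if eid != 0x20 or length != 1:
        raise ValueError("not a Power Constraint IE")
    return value

print(parse_power_constraint("200103"))  # 3, i.e. turn down 3 dB
```

This matches the iwlist output quoted above: a 3 dB constraint applied against the country-derived regulatory maximum.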
* Re: [Bloat] sigcomm wifi 2014-08-20 22:05 ` Jim Gettys ` (3 preceding siblings ...) 2014-08-21 8:58 ` Steinar H. Gunderson @ 2014-08-21 9:23 ` Mikael Abrahamsson 2014-08-21 9:30 ` Steinar H. Gunderson 4 siblings, 1 reply; 56+ messages in thread From: Mikael Abrahamsson @ 2014-08-21 9:23 UTC (permalink / raw) To: Jim Gettys; +Cc: bloat [-- Attachment #1: Type: TEXT/PLAIN, Size: 1707 bytes --] On Wed, 20 Aug 2014, Jim Gettys wrote: > On Wed, Aug 20, 2014 at 3:12 AM, Eggert, Lars <lars@netapp.com> wrote: > >> On 2014-8-19, at 18:45, Dave Taht <dave.taht@gmail.com> wrote: >>> I figured y'all would be bemused by the wifi performance in the sigcomm >>> main conference room this morning... >>> >>> http://snapon.lab.bufferbloat.net/~d/sigcomm_tuesday.png >> >> There is a reason we budgeted a 1G uplink for SIGCOMM Helsinki and made >> sure we had sufficient AP coverage... >> > > And what kinds of AP's? All the 1G guarantees you is that your bottleneck > is in the wifi hop, and they can suffer as badly as anything else > (particularly consumer home routers). > > The reason why 802.11 works ok at IETF and NANOG is that: > o) they use Cisco enterprise AP's, which are not badly over buffered. I > don't have data on which enterprise AP's are overbuffered. > o) they do a good job of placing the AP's, given a lot of experience > o) they turn on RED in the router, which, since there is a lot of > aggregated traffic, can actually help rather than hurt, and keep TCP > decently policed. > o) they play some interesting diffserv marking tricks to prioritize some > traffic, getting part of the effect the fq_codel gives you in its "new > flow" behavior by manual configuration. Fq_codel does much better without > having to mess around like this. I also remember a problem that was solved by turning down the transmit power of the APs, as they were causing problems due to too much interference between them. 
Sometimes the solutions aren't all intuitive, and +1 on the experience of running these kinds of networks being important. -- Mikael Abrahamsson email: swmike@swm.pp.se ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-21 9:23 ` Mikael Abrahamsson @ 2014-08-21 9:30 ` Steinar H. Gunderson 2014-08-22 23:30 ` David Lang 0 siblings, 1 reply; 56+ messages in thread From: Steinar H. Gunderson @ 2014-08-21 9:30 UTC (permalink / raw) To: bloat On Thu, Aug 21, 2014 at 11:23:01AM +0200, Mikael Abrahamsson wrote: > I also remember a problem that was solved by turning down the > transmit power of the APs, as they were causing problems due to too > much interference between them. Sometimes the solutions aren't all > intuitive, and +1 on the experience of running these kinds of > networks being important. This is very common on 2.4 GHz (and will happen by itself in a controller-based setup). Anything that reduces the cell size is good (assuming you have enough APs for all the resulting cells); if you go to big stadiums etc., you will also see lots of directional antennas to reduce the cell size further. Note that the AP also can ask _clients_ to reduce their power level accordingly. /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-21 9:30 ` Steinar H. Gunderson @ 2014-08-22 23:30 ` David Lang 2014-08-22 23:40 ` Steinar H. Gunderson 0 siblings, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-22 23:30 UTC (permalink / raw) To: Steinar H. Gunderson; +Cc: bloat On Thu, 21 Aug 2014, Steinar H. Gunderson wrote: > On Thu, Aug 21, 2014 at 11:23:01AM +0200, Mikael Abrahamsson wrote: >> I also remember a problem that was solved by turning down the >> transmit power of the APs, as they were causing problems due to too >> much interference between them. Sometimes the solutions aren't all >> intuitive, and +1 on the experience of running these kinds of >> networks being important. > > This is very common on 2.4 GHz (and will happen by itself in a > controller-based setup). Anything that reduces the cell size is good > (assuming you have enough APs for all the resulting cells); if you go to big > stadiums etc., you will also see lots of directional antennas to reduce the > cell size further. > > Note that the AP also can ask _clients_ to reduce their power level > accordingly. can you give a pointer to this? I'd like to check if openwrt does this and how to configure it (and if not, what would it take to get it added) David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-22 23:30 ` David Lang @ 2014-08-22 23:40 ` Steinar H. Gunderson 0 siblings, 0 replies; 56+ messages in thread From: Steinar H. Gunderson @ 2014-08-22 23:40 UTC (permalink / raw) To: David Lang; +Cc: bloat On Fri, Aug 22, 2014 at 04:30:29PM -0700, David Lang wrote: > can you give a pointer to this? I'd like to check if openwrt does > this and how to configure it (and if not, what would it take to get > it added) Define “pointer”. I don't have access to the 802.11 standards, but you can clearly see e.g. Linux doing this when the AP asks: [70982.370680] wlan0: Limiting TX power to 21 (24 - 3) dBm as advertised by 00:23:eb:c1:0f:60 /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-19 16:45 [Bloat] sigcomm wifi Dave Taht 2014-08-20 7:12 ` Eggert, Lars @ 2014-08-20 8:30 ` Steinar H. Gunderson 2014-08-21 6:58 ` David Lang 1 sibling, 1 reply; 56+ messages in thread From: Steinar H. Gunderson @ 2014-08-20 8:30 UTC (permalink / raw) To: bloat On Tue, Aug 19, 2014 at 11:45:21AM -0500, Dave Taht wrote: > I figured y'all would be bemused by the wifi performance in the sigcomm > main conference room this morning... > > http://snapon.lab.bufferbloat.net/~d/sigcomm_tuesday.png High-density 802.11 is hard. Doesn't have to be bloat, though. =) /* Steinar */ -- Homepage: http://www.sesse.net/ ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-20 8:30 ` Steinar H. Gunderson @ 2014-08-21 6:58 ` David Lang 0 siblings, 0 replies; 56+ messages in thread From: David Lang @ 2014-08-21 6:58 UTC (permalink / raw) To: Steinar H. Gunderson; +Cc: bloat On Wed, 20 Aug 2014, Steinar H. Gunderson wrote: > On Tue, Aug 19, 2014 at 11:45:21AM -0500, Dave Taht wrote: >> I figured y'all would be bemused by the wifi performance in the sigcomm >> main conference room this morning... >> >> http://snapon.lab.bufferbloat.net/~d/sigcomm_tuesday.png > > High-density 802.11 is hard. Doesn't have to be bloat, though. =) yep, hard but not impossible (even with WNDR3800 routers); you have to work hard to get the RF side as clean as it can get. David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi @ 2014-08-24 3:49 Hal Murray 2014-08-24 3:52 ` Jonathan Morton 2014-08-24 5:14 ` David Lang 0 siblings, 2 replies; 56+ messages in thread From: Hal Murray @ 2014-08-24 3:49 UTC (permalink / raw) To: bloat; +Cc: Hal Murray >> Yep... I remember a neat paper from colleagues at Trento University that >> piggybacked TCP's ACKs on link layer ACKs, thereby avoiding the collisions >> between TCP's ACKs and other data packets - really nice. Not sure if it >> wasn't just simulations, though. > that's a neat hack, but I don't see it working, except when one end of the > wireless link is also the endpoint of the TCP connection (and then only for > acks from that device) That could be generalized to piggybacking any handy small packet onto the link layer ACK. Of course, then you have to send back a link layer ACK for the extra info. Does that converge? -- These are my opinions. I hate spam. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-24 3:49 Hal Murray @ 2014-08-24 3:52 ` Jonathan Morton 2014-08-24 5:14 ` David Lang 1 sibling, 0 replies; 56+ messages in thread From: Jonathan Morton @ 2014-08-24 3:52 UTC (permalink / raw) To: Hal Murray; +Cc: bloat On 24 Aug, 2014, at 6:49 am, Hal Murray wrote: >>> Yep... I remember a neat paper from colleagues at Trento University that >>> piggybacked TCP's ACKs on link layer ACKs, thereby avoiding the collisions >>> between TCP's ACKs and other data packets - really nice. Not sure if it >>> wasn't just simulations, though. > >> that's a neat hack, but I don't see it working, except when one end of the >> wireless link is also the endpoint of the TCP connection (and then only for >> acks from that device) > > That could be generalized to piggybacking any handy small packet onto the > link layer ACK. > > Of course, then you have to send back a link layer ACK for the extra info. > Does that converge? No, you don't. If the link-layer ack (plus payload) didn't get through, the other end will (usually) retransmit the frame anyway. So you don't get recursive acks. :-) - Jonathan Morton ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-24 3:49 Hal Murray 2014-08-24 3:52 ` Jonathan Morton @ 2014-08-24 5:14 ` David Lang 2014-08-25 7:43 ` Michael Welzl 1 sibling, 1 reply; 56+ messages in thread From: David Lang @ 2014-08-24 5:14 UTC (permalink / raw) To: Hal Murray; +Cc: bloat On Sat, 23 Aug 2014, Hal Murray wrote: >>> Yep... I remember a neat paper from colleagues at Trento University that >>> piggybacked TCP's ACKs on link layer ACKs, thereby avoiding the collisions >>> between TCP's ACKs and other data packets - really nice. Not sure if it >>> wasn't just simulations, though. > >> that's a neat hack, but I don't see it working, except when one end of the >> wireless link is also the endpoint of the TCP connection (and then only for >> acks from that device) > > That could be generalized to piggybacking any handy small packet onto the > link layer ACK. > > Of course, then you have to send back a link layer ACK for the extra info. > Does that converge? if you aren't talking between the two endpoints of the wireless connection, probably :-) but fairness would be an issue for something like this. what constitutes a 'small amount of data' to try and piggyback, and what happens if you are talking between endpoints: are the two allowed to monopolize the airtime? but backing up a step, finding airtime for the ack is just as hard as finding airtime for the next transmission. David Lang ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Bloat] sigcomm wifi 2014-08-24 5:14 ` David Lang @ 2014-08-25 7:43 ` Michael Welzl 0 siblings, 0 replies; 56+ messages in thread From: Michael Welzl @ 2014-08-25 7:43 UTC (permalink / raw) To: David Lang; +Cc: Hal Murray, bloat On 24. aug. 2014, at 07:14, David Lang wrote: > On Sat, 23 Aug 2014, Hal Murray wrote: > >>>> Yep... I remember a neat paper from colleagues at Trento University that >>>> piggybacked TCP's ACKs on link layer ACKs, thereby avoiding the collisions >>>> between TCP's ACKs and other data packets - really nice. Not sure if it >>>> wasn't just simulations, though. >> >>> that's a neat hack, but I don't see it working, except when one end of the >>> wireless link is also the endpoint of the TCP connection (and then only for >>> acks from that device) >> >> That could be generalized to piggybacking any handy small packet onto the >> link layer ACK. >> >> Of course, then you have to send back a link layer ACK for the extra info. >> Does that converge? > > if you aren't talking between the two endpoints of the wireless connection, probably :-) > > but fairness would be an issue for something like this. what constitutes a 'small amount of data' to try and piggyback, and what happens if you are talking between endpoints, are the two allowed to monopolize the airtime? I agree - there'd have to be a size limit placed on what you really do piggyback on link layer ACKs. TCP ACK size can vary, depending on SACK... > but backing up a step, finding airtime for the ack is just as hard as finding airtime for the next transmission. I think not, don't link layer ACKs get to use a smaller CW? Or is this just me remembering it wrongly? Cheers, Michael ^ permalink raw reply [flat|nested] 56+ messages in thread