[Bloat] sigcomm wifi

Sat Aug 23 21:33:13 EDT 2014

On Sun, 24 Aug 2014, Jonathan Morton wrote:

> I've done some reading on how wifi actually works, and what mechanisms the latest variants use to improve performance.  It might be helpful to summarise my understanding here - biased towards the newer variants, since they are by now widely deployed.
>
> First a note on the variants themselves:
>
> 802.11 without suffix is obsolete and no longer in use.
> 802.11a was the original 5GHz band version, giving 54Mbps in 20MHz channels.
> 802.11b was the first "affordable" version, using 2.4GHz and giving 11Mbps in 20MHz channels.
> 802.11g brought the 802.11a modulation schemes and (theoretical) performance to the 2.4GHz band.
> 802.11n is optionally dual-band.  Aggregation, 40MHz channels, single-target MIMO.
> 802.11ac is 5GHz only.  More aggregation, 80 & 160MHz channels, multi-target MIMO.  Rationalised options, dropping many 'n' features that are more trouble than they're worth.  Coexists nicely with older 20MHz-channel equipment, and nearby APs with overlapping spectrum.

> My general impression is that 802.11ac makes a serious effort to improve 
> matters in heavily-congested, many-clients scenarios, which was where earlier 
> variants had the most trouble.  If you're planning to set up or go to a major 
> conference, the best easy thing you can do is get 'ac' equipment all round - 
> if nothing else, it's guaranteed to support the 5GHz band.  Of course, we're 
> not just considering the easy solutions.

If ac had reasonable drivers available I would agree, but when you are limited 
to factory firmware, it's not good.

> Now for some technical details:
>
> The wireless spectrum is fundamentally a shared-access medium.  It also has 
> the complication of being noisy and having various path-loss mechanisms, and 
> of the "hidden node" problem where one client might not be able to hear 
> another client's transmission, even though both are in range of the AP.
>
> Thus wifi uses a CSMA/CA algorithm as follows:
>
> 1) Listen for competing carrier.  If heard, backoff and retry later. 
> (Listening is continuous, and detected preambles are used to infer the 
> time-length of packets when the data modulation is unreadable.)
> 2) Perform an RTS/CTS handshake.  If CTS doesn't arrive, backoff and retry later.
> 3) Transmit, and await acknowledgement.  If no ack, backoff and retry later, 
> possibly using different modulation.
>
> This can be compared to Ethernet's CSMA/CD algorithm:
>
> 1) Listen for competing carrier.  If heard, backoff and retry later.
> 2) Transmit, listening for collision with a competing transmission.  If 
> collision, backoff and retry later.
>
> In both cases, the backoff is random and exponentially increasing, to reduce 
> the chance of repeated collisions.
>
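
To make the backoff step concrete, here is a minimal sketch of random,
exponentially increasing backoff. The function names and the window sizes
(15 slots minimum, 1023 maximum) are illustrative assumptions, not taken from
the discussion, and real hardware does this below the driver.

```python
import random

def contention_window(attempt, cw_min=15, cw_max=1023):
    """Largest backoff (in slots) allowed on a given attempt (0 = first try)."""
    # The window roughly doubles after each failed attempt, up to a cap.
    # cw_min and cw_max are assumed example values.
    return min((cw_min + 1) * (2 ** attempt) - 1, cw_max)

def backoff_slots(attempt, rng=random):
    """Pick a uniformly random backoff within the current window."""
    return rng.randint(0, contention_window(attempt))
```

Because every station draws its own random slot count, two stations that
collided once are unlikely to pick the same slot again, and the growing window
spreads them further apart if they do.
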
> The 2.4GHz band is chock-full of noise sources, from legacy 802.11b/g 
> equipment to cordless phones, Bluetooth, and even microwave ovens - which 
> generate the best part of a kilowatt of RF energy, but somehow manage to 
> contain the vast majority of it within the cavity.  It's also a relatively 
> narrow band, with only three completely separate 20MHz channels available in 
> most of the world (four in Japan).
>
> This isn't a massive concern for home use, but consumers still notice the 
> effects surprisingly often.  Perhaps they live in an apartment block with lots 
> of devices and APs crowded together in an unmanaged mess.  Perhaps they have a 
> large home to themselves, but a bunch of noisy equipment reduces the effective 
> range and reliability of their network.  It's not uncommon to hear about 
> networks that drop out whenever the phone rings, thanks to an old cordless 
> phone.
>
> The 5GHz band is much less crowded.  There are several channels which are 
> shared with weather radar, so wifi equipment can't use those unless they are 
> capable of detecting the radar transmissions, but even without those there are 
> far more 20MHz channels available.  There's also much less legacy equipment 
> using it - even 802.11a is relatively uncommon (and is fairly benign in 
> behaviour).  The downside is that 5GHz doesn't propagate as far, or as easily 
> through walls.
>
> Wider bandwidth channels can be used to shorten the time taken for each 
> transmission.  However, this effect is not linear, because the RTS/CTS 
> handshake and preamble are fixed overheads (since they must be transmitted at 
> a low speed to ensure that all clients can hear them), taking the same length 
> of time regardless of any other enhancements.  This implies that in seriously 
> geographically-congested scenarios, 20MHz channels (and lots of APs to use 
> them all) are still the most efficient.  MIMO can still be used to beneficial 
> effect in these situations.

Another good reason for sticking to 20MHz channels is that it gives you more 
channels available, so you can deploy more APs without them interfering with 
each other's footprints. This can significantly reduce the distance between the 
mobile user and the closest AP.
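
A small worked example of why wider channels don't help linearly. The numbers
are made up for illustration (a 100us fixed overhead for handshake and
preamble, 60Mbps on a 20MHz channel, 120Mbps on 40MHz); only the shape of the
result matters.

```python
def airtime_us(payload_bits, rate_mbps, fixed_overhead_us):
    """Airtime of one transmission: fixed overhead plus payload time."""
    # 1 Mbit/s is 1 bit per microsecond, so bits / Mbps gives microseconds.
    return fixed_overhead_us + payload_bits / rate_mbps

PAYLOAD_BITS = 1500 * 8   # one full-size packet
OVERHEAD_US = 100         # assumed fixed cost, same at any channel width

t20 = airtime_us(PAYLOAD_BITS, 60, OVERHEAD_US)    # assumed 20MHz data rate
t40 = airtime_us(PAYLOAD_BITS, 120, OVERHEAD_US)   # assumed 40MHz data rate
speedup = t20 / t40
```

With these assumed figures the data rate doubles but the transmission is only
1.5 times shorter, while the 40MHz channel occupies twice the spectrum.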

> Multi-target MIMO allows an AP to transmit to several clients simultaneously, 
> without requiring the client to support MIMO themselves.  This requires the 
> AP's antennas and radios to be dynamically reconfigured for beamforming - 
> giving each client a clear version of its own signal and a null for the other 
> signals - which is a tricky procedure.  APs that do implement this well are 
> highly valuable in congested situations.

how many different targets can such APs handle? if it's only a small number, I'm 
not sure it helps much.

Also, is this a transmit-only feature? or can it help decipher multiple mobile 
devices transmitting at the same time?

> Single-target MIMO allows higher bandwidth between one client at a time and 
> the AP.  Both the AP and the client must support MIMO for this to work. 
> There are physical constraints which limit the ability for handheld devices to 
> support MIMO.  In general, this form of MIMO improves throughput in the home, 
> but is not very useful in congested situations.  High individual throughput is 
> not what's needed in a crowded arena; rather, reliable if slow individual 
> throughput, reasonable latency, and high aggregate throughput.

well, if the higher bandwidth to an individual user ended up reducing the 
airtime that user takes up, it could help. but I suspect that the devices that 
do this couldn't keep track of a few dozen endpoints.

> Choosing the most effective radio bandwidth and modulation is a difficult 
> problem.  The Minstrel algorithm seems to be an effective solution for general 
> traffic.  Some manual constraints may be appropriate in some circumstances, 
> such as reducing the maximum radio bandwidth (trading throughput of one AP 
> against coexistence with other APs) and increasing the modulation rate of 
> management broadcasts (reducing per-packet overhead).

agreed.
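
A rough sketch of the idea behind throughput-driven rate selection. This is
not Minstrel's actual code, and the function names and smoothing weight are
assumptions; it only shows the general approach of tracking a smoothed
delivery probability per rate and preferring the rate with the best expected
throughput.

```python
def update_success_prob(old_prob, attempts, successes, weight=0.25):
    """Blend the latest delivery ratio into a running average (EWMA)."""
    if attempts == 0:
        return old_prob          # nothing sent at this rate, keep the estimate
    sample = successes / attempts
    return (1 - weight) * old_prob + weight * sample

def best_rate(success_prob_by_rate):
    """Pick the rate (Mbps) maximising rate * delivery probability."""
    return max(success_prob_by_rate,
               key=lambda rate: rate * success_prob_by_rate[rate])
```

For example, best_rate({6: 0.99, 24: 0.9, 54: 0.3}) picks 24: the fastest rate
loses too many frames to be worth it, and the slowest wastes airtime.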

> Packet aggregation allows several IP packets to be combined into a single 
> wireless transmission.  This avoids performing the CSMA/CA steps repeatedly, 
> which is a considerable overhead.  There are several types of packet 
> aggregation - the type adopted by 802.11ac allows individual IP packets within 
> a transmission to be link-layer acknowledged separately, so that a minor 
> corruption doesn't require retransmission of the entire aggregate.  By contrast, 
> 802.11n also supported a version which did require that, despite a slightly 
> lower overhead.

There are other overheads that are saved with this. Since the TCP packet is 
encapsulated in the wireless transmission, things like link-layer encryption and 
other encapsulation overhead also benefit from this aggregation.

But with the n style 'all or nothing' mode, the fact that the transmission takes 
longer, and is therefore more likely to get clobbered, is a much more significant 
problem.

This needs to be tweakable. In low-congestion, high throughput situations, you 
want to do a lot of aggregation; in high-congestion situations, you want to 
limit this.

(note, "low-congestion, high throughput" doesn't have to mean a small number of 
stations. It could be a significant number of mobile devices that are all 
watching streaming video from the AP. The AP could be transmitting nearly 
continuously, but the mobile devices transmit only in response, so there would 
be very little contention.)
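
A back-of-the-envelope model of why the aggregation limit matters. It assumes
each packet in an aggregate is corrupted independently with probability p,
which is a simplification (real losses are bursty), and the function names are
made up for this sketch.

```python
def delivered_all_or_nothing(n, p):
    """Expected packets delivered per transmission when one bad packet
    forces the whole n-packet aggregate to be resent."""
    return n * (1 - p) ** n

def delivered_selective_ack(n, p):
    """Expected packets delivered per transmission when each packet in the
    aggregate is acknowledged separately."""
    return n * (1 - p)
```

With p = 0.1, an all-or-nothing aggregate of 8 packets delivers about 3.4
packets per attempt on average, while one of 32 packets delivers only about
1.1, so bigger is worse. With separate acknowledgements the 32-packet
aggregate delivers about 28.8. That is the case for limiting aggregate size
when losses are common and the hardware can't ack per packet.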

> Implicit in the packet-aggregation system is the problem of collecting packets 
> to aggregate.  Each transmission is between the AP and one client, so the 
> packets aggregated by the AP all have to be for the same client.  (The client 
> can assume that all packets go to the AP.)  A fair-queueing algorithm could 
> have the effect of forming per-client queues, so several suitable packets 
> could easily be located in such a queue.  In a straight FIFO queue, however, 
> packets for the same client are likely to be separated in the queue and thus 
> difficult to find.  It is therefore *obviously* in the AP's interest to 
> implement a fair-queueing algorithm based on client MAC address, even if it 
> does nothing else to manage congestion.
>
> NB: if a single aggregate could be intended to be heard by more than one 
> client, then the complexity of multi-target beamforming MIMO would not be 
> necessary.  This is how I infer the strict one-to-one nature of data 
> transmissions, as distinct from management broadcasts.
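
A minimal sketch of per-client queueing as described above: packets are filed
by destination MAC address, clients are served round-robin, and each turn
pulls up to a fixed number of packets for one aggregate. The class name and
the max_aggregate limit are assumptions for illustration, not any driver's
real interface.

```python
from collections import OrderedDict, deque

class PerClientQueues:
    def __init__(self, max_aggregate=4):
        self.queues = OrderedDict()      # MAC address -> deque of packets
        self.max_aggregate = max_aggregate

    def enqueue(self, mac, packet):
        self.queues.setdefault(mac, deque()).append(packet)

    def next_aggregate(self):
        """Return (mac, packets) for the next client in rotation, or None."""
        if not self.queues:
            return None
        mac, queue = self.queues.popitem(last=False)   # head of the rotation
        count = min(self.max_aggregate, len(queue))
        batch = [queue.popleft() for _ in range(count)]
        if queue:
            self.queues[mac] = queue     # leftovers go to the back of the line
        return mac, batch
```

Packets for one client are always adjacent in its own queue, so building an
aggregate is a simple pop rather than a search through a shared FIFO.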

yes, multicast has a lot of potential benefits, but it's never lived up to its 
promises in the real world. In effect, everything is unicast, even if you have a 
lot of people watching the same video, they are all at slightly different 
points, needing slightly different packets retransmitted, etc.

In a radio environment this is even more so. one station may be hearing 
something perfectly while another is unable to hear the same packet due to a 
hidden node transmission.

> On 23 Aug, 2014, at 10:26 pm, Michael Welzl wrote:
>
>>>> because of the "function" i wrote above: the more you retry, the more you 
>>>> need to buffer when traffic continuously arrives because you're stuck 
>>>> trying to send a frame again.
>>>
>>> huh, I'm missing something here, retrying sends would require you to buffer 
>>> more when sending.
>>
>> aren't you saying the same thing as I?  Sorry else, I might have 
>> expressed it confusingly somehow
>
> There should be enough buffering to allow effective aggregation, but as little 
> as possible on top of that.  I don't know how much aggregation can be done, 
> but I assume that there is a limit, and that it's not especially high in terms 
> of full-length packets.  After all, tying up the channel for long periods of 
> time is unfair to other clients - a typical latency/throughput tradeoff.

Aggregation is not necessarily worth pursuing.

> Equally clearly, in a heavily congested scenario the AP benefits from having a 
> lot of buffer divided among a large number of clients, but each client should 
> have only a small buffer.

the key thing is how long the data sits in the buffer. If it sits too long, it 
doesn't matter that it's the only packet for this client, it still is too much 
buffering.
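
One crude way to express "how long the data sits in the buffer" in code is to
timestamp packets on entry and discard any that have waited too long when they
reach the head of the queue. This is a sketch only: the class name and
millisecond threshold are assumptions, and proper AQM algorithms are
considerably more careful about when to drop.

```python
from collections import deque

class TimeBoundedQueue:
    def __init__(self, max_sojourn_ms):
        self.max_sojourn_ms = max_sojourn_ms
        self.items = deque()             # (enqueue time in ms, packet)

    def enqueue(self, packet, now_ms):
        self.items.append((now_ms, packet))

    def dequeue(self, now_ms):
        """Return the next packet that is still fresh, dropping stale ones."""
        while self.items:
            enqueued_ms, packet = self.items.popleft()
            if now_ms - enqueued_ms <= self.max_sojourn_ms:
                return packet
            # Packet sat too long: drop it silently and look at the next one.
        return None
```

The limit is on waiting time, not on packet count, so a single packet that
has sat for too long is dropped just the same as one in a long queue.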

>>> If people are retrying when they really don't need to, that cuts down on the available airtime.
>>
>> Yes
>
> Given that TCP retries on loss, and UDP protocols are generally loss-tolerant 
> to a degree, there should therefore be a limit on how hard the link-layer 
> stuff tries to get each individual packet through.  Minstrel appears to be 
> designed around a time limit for that sort of thing, which seems sane - and 
> they explicitly talk about TCP retransmit timers in that context.
>
> With that said, link-layer retries are a valid mechanism to minimise 
> unnecessarily lost packets.  It's also not new - bus/hub Ethernet does this on 
> collision detection.  What Ethernet doesn't have is the link-layer ack, so 
> there's an additional set of reasons why a backoff-and-retry might happen in 
> wifi.
>
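
A minimal sketch of bounding link-layer retries by time rather than by count,
in the spirit of the time limit mentioned above. It is not how Minstrel or any
driver implements it; try_send, airtime_ms and budget_ms are hypothetical names
for a send callback, the cost of one attempt, and the total time allowed.

```python
def send_with_retry_budget(try_send, airtime_ms, budget_ms):
    """Retry until success or until one more attempt would exceed the budget.

    Returns (delivered, attempts).
    """
    spent_ms = 0
    attempts = 0
    while spent_ms + airtime_ms <= budget_ms:
        attempts += 1
        spent_ms += airtime_ms
        if try_send():
            return True, attempts
    # Give up and let the higher layer (e.g. TCP) deal with the loss.
    return False, attempts
```

A slow modulation costs more airtime per attempt, so it gets fewer retries
within the same budget than a fast one.
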
> Modern wifi variants use packet aggregation to improve efficiency.  This only 
> works when there are multiple packets to send at a time from one place to a 
> specific other place - which is more likely when the link is congested.  In 
> the event of a retry, it makes sense to aggregate newly buffered packets with 
> the original ones, to reduce the number of negotiation and retry cycles.

up to a point. It could easily be that the right thing to do is NOT to aggregate 
the new packets because it will make it far more likely that they will all fail 
(ac mitigates this in theory, but until there is real driver support, the 
practice is questionable).

>>> But if you have continual transmissions taking place, so you have a hard 
>>> time getting a chance to send your traffic, then you really do have 
>>> congestion and should be dropping packets to let the sender know that it 
>>> shouldn't try to generate as much.
>>
>> Yes; but the complexity that I was pointing at (but maybe it's a simple 
>> parameter, more like a 0 or 1 situation in practice?) lies in the word 
>> "continual". How long do you try before you decide that the sending TCP 
>> should really think it *is* congestion?  To really optimize the behavior, 
>> that would have to depend on the RTT, which you can't easily know.
>
> There are TCP congestion algorithms which explicitly address this (eg. 
> Westwood+), by reacting only a little to individual drops, but reacting more 
> rapidly if drops occur frequently.  In principle they should also react 
> quickly to ECN, because that is never triggered by random noise loss alone.

correct.

David Lang