[Bloat] sigcomm wifi

Sat Aug 23 19:29:50 EDT 2014

I've done some reading on how wifi actually works, and what mechanisms the latest variants use to improve performance.  It might be helpful to summarise my understanding here - biased towards the newer variants, since they are by now widely deployed.

First a note on the variants themselves:

802.11 without suffix is obsolete and no longer in use.
802.11a was the original 5GHz band version, giving 54Mbps in 20MHz channels.
802.11b was the first "affordable" version, using 2.4GHz and giving 11Mbps in 20MHz channels.
802.11g brought the 802.11a modulation schemes and (theoretical) performance to the 2.4GHz band.
802.11n can use either band, though 5GHz support is optional.  It adds aggregation, 40MHz channels and single-target MIMO.
802.11ac is 5GHz only.  More aggregation, 80 & 160MHz channels, multi-target MIMO.  Rationalised options, dropping many 'n' features that are more trouble than they're worth.  Coexists nicely with older 20MHz-channel equipment, and nearby APs with overlapping spectrum.

My general impression is that 802.11ac makes a serious effort to improve matters in heavily-congested, many-clients scenarios, which was where earlier variants had the most trouble.  If you're planning to set up or go to a major conference, the best easy thing you can do is get 'ac' equipment all round - if nothing else, it's guaranteed to support the 5GHz band.  Of course, we're not just considering the easy solutions.

Now for some technical details:

The wireless spectrum is fundamentally a shared-access medium.  It also has the complication of being noisy and having various path-loss mechanisms, and of the "hidden node" problem where one client might not be able to hear another client's transmission, even though both are in range of the AP.

Thus wifi uses a CSMA/CA algorithm as follows:

1) Listen for competing carrier.  If heard, back off and retry later.  (Listening is continuous, and detected preambles are used to infer the time-length of packets when the data modulation is unreadable.)
2) Perform an RTS/CTS handshake.  If CTS doesn't arrive, back off and retry later.
3) Transmit, and await acknowledgement.  If no ack, back off and retry later, possibly using different modulation.

This can be compared to Ethernet's CSMA/CD algorithm:

1) Listen for competing carrier.  If heard, back off and retry later.
2) Transmit, listening for collision with a competing transmission.  If collision, back off and retry later.

In both cases, the backoff is random and exponentially increasing, to reduce the chance of repeated collisions.
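
To make the "random and exponentially increasing" part concrete, here is a minimal sketch of a binary exponential backoff.  The function names are my own, and the window limits of 15 and 1023 slots are assumed illustrative values rather than something to rely on for any particular wifi variant:

```python
import random

def contention_window(attempt, cw_min=15, cw_max=1023):
    """Size of the contention window, in slots, for a given retry attempt.

    attempt 0 is the first try.  The window roughly doubles after each
    failure (15, 31, 63, ...) until it hits the ceiling cw_max.
    cw_min and cw_max are assumed illustrative defaults.
    """
    return min((cw_min + 1) * (2 ** attempt) - 1, cw_max)

def backoff_slots(attempt, rng=random):
    """Pick a random wait, in slots, uniformly from [0, contention window]."""
    return rng.randint(0, contention_window(attempt))
```

Two stations that collided on the same attempt draw independently from the same range, so the wider the window gets, the less likely they are to pick the same slot again.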

The 2.4GHz band is chock-full of noise sources, from legacy 802.11b/g equipment to cordless phones, Bluetooth, and even microwave ovens - which generate the best part of a kilowatt of RF energy, but somehow manage to contain the vast majority of it within the cavity.  It's also a relatively narrow band, with only three completely separate 20MHz channels available in most of the world (four in Japan).

This isn't a massive concern for home use, but consumers still notice the effects surprisingly often.  Perhaps they live in an apartment block with lots of devices and APs crowded together in an unmanaged mess.  Perhaps they have a large home to themselves, but a bunch of noisy equipment reduces the effective range and reliability of their network.  It's not uncommon to hear about networks that drop out whenever the phone rings, thanks to an old cordless phone.

The 5GHz band is much less crowded.  There are several channels which are shared with weather radar, so wifi equipment can't use those unless it is capable of detecting the radar transmissions, but even without those there are far more 20MHz channels available.  There's also much less legacy equipment using it - even 802.11a is relatively uncommon (and is fairly benign in behaviour).  The downside is that 5GHz doesn't propagate as far, or as easily through walls.

Wider bandwidth channels can be used to shorten the time taken for each transmission.  However, this effect is not linear, because the RTS/CTS handshake and preamble are fixed overheads (since they must be transmitted at a low speed to ensure that all clients can hear them), taking the same length of time regardless of any other enhancements.  This implies that in seriously geographically-congested scenarios, 20MHz channels (and lots of APs to use them all) are still the most efficient.  MIMO can still be used to beneficial effect in these situations.
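
A back-of-envelope sketch of why the gain isn't linear.  The 100us fixed overhead and the 50/100Mbps data rates below are made-up numbers chosen only to show the shape of the effect, not measurements of any real equipment:

```python
def airtime_us(payload_bytes, rate_mbps, fixed_overhead_us):
    """Total time on air for one transmission, in microseconds.

    1 Mbit/s is one bit per microsecond, so bits / Mbps gives microseconds.
    fixed_overhead_us stands in for the handshake and preamble, which
    do not shrink when the data rate goes up.
    """
    return fixed_overhead_us + payload_bytes * 8 / rate_mbps

def efficiency(payload_bytes, rate_mbps, fixed_overhead_us):
    """Fraction of the airtime that actually carries payload."""
    payload_us = payload_bytes * 8 / rate_mbps
    return payload_us / (fixed_overhead_us + payload_us)

if __name__ == "__main__":
    for rate in (50, 100):  # hypothetical data rates in Mbps
        print(rate, airtime_us(1500, rate, 100), efficiency(1500, rate, 100))
```

With these assumed numbers, doubling the data rate cuts the airtime of a 1500-byte packet from 340us to 220us, a speed-up of about 1.5x rather than 2x, and the share of airtime spent on payload drops from about 71% to about 55%.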

Multi-target MIMO allows an AP to transmit to several clients simultaneously, without requiring the clients to support MIMO themselves.  This requires the AP's antennas and radios to be dynamically reconfigured for beamforming - giving each client a clear version of its own signal and a null for the other signals - which is a tricky procedure.  APs that do implement this well are highly valuable in congested situations.

Single-target MIMO allows higher bandwidth between one client at a time and the AP.  Both the AP and the client must support MIMO for this to work.  There are physical constraints which limit the ability of handheld devices to support MIMO.  In general, this form of MIMO improves throughput in the home, but is not very useful in congested situations.  High individual throughput is not what's needed in a crowded arena; rather, reliable if slow individual throughput, reasonable latency, and high aggregate throughput.

Choosing the most effective radio bandwidth and modulation is a difficult problem.  The Minstrel algorithm seems to be an effective solution for general traffic.  Some manual constraints may be appropriate in some circumstances, such as reducing the maximum radio bandwidth (trading throughput of one AP against coexistence with other APs) and increasing the modulation rate of management broadcasts (reducing per-packet overhead).

Packet aggregation allows several IP packets to be combined into a single wireless transmission.  This avoids performing the CSMA/CA steps repeatedly, which is a considerable overhead.  There are several types of packet aggregation - the type adopted by 802.11ac allows individual IP packets within a transmission to be link-layer acknowledged separately, so that a minor corruption doesn't require retransmission of the entire aggregate.  By contrast, 802.11n also supported a version which did require that, in exchange for a slightly lower overhead.
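
The difference between the two aggregation styles can be sketched like this.  It is only a toy model of the acknowledgement behaviour described above, with a hypothetical function name and packets reduced to plain ids:

```python
def packets_to_resend(aggregate, acked, per_packet_ack=True):
    """Return the packets that must be sent again after one transmission.

    aggregate: list of packet ids sent together in one transmission.
    acked:     set of packet ids the receiver got intact.
    """
    missing = [p for p in aggregate if p not in acked]
    if per_packet_ack:
        # Each packet is acknowledged separately: resend only the losses.
        return missing
    # One ack covers the whole aggregate: any loss means resending it all.
    return list(aggregate) if missing else []
```

For example, if packets 1 to 4 are sent together and only packet 3 is corrupted, the per-packet style resends just packet 3, while the all-or-nothing style resends all four.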

Implicit in the packet-aggregation system is the problem of collecting packets to aggregate.  Each transmission is between the AP and one client, so the packets aggregated by the AP all have to be for the same client.  (The client can assume that all packets go to the AP.)  A fair-queueing algorithm could have the effect of forming per-client queues, so several suitable packets could easily be located in such a queue.  In a straight FIFO queue, however, packets for the same client are likely to be separated in the queue and thus difficult to find.  It is therefore *obviously* in the AP's interest to implement a fair-queueing algorithm based on client MAC address, even if it does nothing else to manage congestion.
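
Here is a minimal sketch of what I mean by per-client queues on the AP side.  The class name, the `max_aggregate` limit and the plain round-robin service order are all assumptions made for illustration; a real fair-queueing implementation would be more careful about byte counts and airtime:

```python
from collections import OrderedDict, deque

class PerClientQueues:
    """Toy per-destination queues, served round-robin, one aggregate at a time."""

    def __init__(self, max_aggregate=4):
        self.queues = OrderedDict()   # destination MAC -> deque of packets
        self.max_aggregate = max_aggregate  # hypothetical aggregation limit

    def enqueue(self, dest_mac, packet):
        # Packets for the same client always land in the same queue.
        self.queues.setdefault(dest_mac, deque()).append(packet)

    def next_aggregate(self):
        """Return (dest_mac, packets) for the next client in rotation, or None."""
        if not self.queues:
            return None
        dest_mac, q = self.queues.popitem(last=False)
        batch = [q.popleft() for _ in range(min(self.max_aggregate, len(q)))]
        if q:
            # Leftover packets: this client goes to the back of the rotation.
            self.queues[dest_mac] = q
        return dest_mac, batch
```

Even when packets for two clients arrive interleaved, each call to `next_aggregate()` finds a ready-made batch for a single client without searching through a FIFO.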

NB: if a single aggregate could be intended to be heard by more than one client, then the complexity of multi-target beamforming MIMO would not be necessary.  This is how I infer the strict one-to-one nature of data transmissions, as distinct from management broadcasts.

On 23 Aug, 2014, at 10:26 pm, Michael Welzl wrote:

>>> because of the "function" i wrote above: the more you retry, the more you need to buffer when traffic continuously arrives because you're stuck trying to send a frame again.
>> 
>> huh, I'm missing something here, retrying sends would require you to buffer more when sending.
> 
> aren't you saying the same thing as I ?  Sorry else, I might have expressed it confusingly somehow

There should be enough buffering to allow effective aggregation, but as little as possible on top of that.  I don't know how much aggregation can be done, but I assume that there is a limit, and that it's not especially high in terms of full-length packets.  After all, tying up the channel for long periods of time is unfair to other clients - a typical latency/throughput tradeoff.

Equally clearly, in a heavily congested scenario the AP benefits from having a lot of buffer divided among a large number of clients, but each client should have only a small buffer.

>> If people are retrying when they really don't need to, that cuts down on the available airtime.
> 
> Yes

Given that TCP retries on loss, and UDP protocols are generally loss-tolerant to a degree, there should therefore be a limit on how hard the link-layer stuff tries to get each individual packet through.  Minstrel appears to be designed around a time limit for that sort of thing, which seems sane - and they explicitly talk about TCP retransmit timers in that context.
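
A time-limited retry loop might look roughly like the sketch below.  This is not Minstrel's code; the function name and parameters are hypothetical, and time is simply counted per attempt rather than measured:

```python
def send_with_retry_budget(try_send, attempt_cost_ms, budget_ms):
    """Call try_send() until it succeeds or the time budget is used up.

    try_send:        callable returning True if the packet was acked.
    attempt_cost_ms: assumed airtime cost of one attempt.
    budget_ms:       total time we are willing to spend on this packet.
    Returns (delivered, attempts).
    """
    elapsed = 0.0
    attempts = 0
    # Only start an attempt if it can finish within the budget.
    while elapsed + attempt_cost_ms <= budget_ms:
        attempts += 1
        elapsed += attempt_cost_ms
        if try_send():
            return True, attempts
    # Give up and drop: the transport layer can deal with the loss.
    return False, attempts
```

The point of bounding by time rather than by attempt count is that slow modulations use up the budget faster, so a packet never sits at the head of the queue for longer than the chosen limit.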

With that said, link-layer retries are a valid mechanism to minimise unnecessarily lost packets.  They're also not new - bus/hub Ethernet retries when it detects a collision.  What Ethernet doesn't have is the link-layer ack, so there's an additional set of reasons why a backoff-and-retry might happen in wifi.

Modern wifi variants use packet aggregation to improve efficiency.  This only works when there are multiple packets to send at a time from one place to a specific other place - which is more likely when the link is congested.  In the event of a retry, it makes sense to aggregate newly buffered packets with the original ones, to reduce the number of negotiation and retry cycles.
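
As a sketch of that last point, assuming some upper limit on aggregate size (the `max_aggregate` parameter and the function name are hypothetical):

```python
def build_retry_aggregate(unacked, newly_buffered, max_aggregate):
    """Combine packets awaiting retry with newly arrived ones for the same client.

    Unacked packets go first so that ordering is preserved.
    Returns (packets to send now, packets left in the buffer).
    """
    combined = list(unacked) + list(newly_buffered)
    return combined[:max_aggregate], combined[max_aggregate:]
```

So if packets 3 and 4 failed and packets 5, 6 and 7 arrived in the meantime, a limit of four would send 3, 4, 5 and 6 in one go and leave 7 buffered.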

>> But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much.
> 
> Yes; but the complexity that I was pointing at (but maybe it's a simple parameter, more like a 0 or 1 situation in practice?) lies in the word "continual". How long do you try before you decide that the sending TCP should really think it *is* congestion?  To really optimize the behavior, that would have to depend on the RTT, which you can't easily know.

There are TCP congestion algorithms which explicitly address this (e.g. Westwood+), by reacting only a little to individual drops, but reacting more rapidly if drops occur frequently.  In principle they should also react quickly to ECN, because that is never triggered by random noise loss alone.

 - Jonathan Morton