General list for discussing Bufferbloat
* [Bloat] A hard ceiling for the size of a drop-tail queue
@ 2011-03-11  1:15 Jonathan Morton
  2011-03-11 22:21 ` Jonathan Morton
  0 siblings, 1 reply; 2+ messages in thread
From: Jonathan Morton @ 2011-03-11  1:15 UTC (permalink / raw)
  To: bloat

There are two mechanisms, in TCP and IP respectively, which suggest theoretical limits to the acceptable size of FIFO queues (in routers, modems and other places):

1) The "Initial RTO" value in TCP is 3 seconds, or exceptionally 2.5 seconds.  This suggests a hard maximum of 2 seconds' buffering to avoid needless retransmits during TCP connection setup.

2) The initial TTL of IP packets is typically 64 hops, and TCP assumes that a packet will not live in the Internet for longer than 120 seconds.  Dividing one by the other (120 s / 64 hops ≈ 1.9 s) again suggests a buffering limit of about 2 seconds per hop.  Admittedly, such buffering is only likely to occur at the bottleneck link, but RTTs of 30 seconds have been observed in the wild, and that is already uncomfortably close to 120.

The next question is, of course: how many bytes is 2 seconds of buffering?  That depends on the connection type.  For a full-duplex Ethernet or optical link, 2 seconds is a lot of data, so such a limit is almost meaningless.  For an analogue modem or a similar "thin" and stable link, the limit is easy to calculate - yet many professional-grade devices ship with buffers considerably larger than it.
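As a rough sketch, the byte budget is just rate times delay.  The link rates below are illustrative examples, not figures from this thread:

```python
# Sketch: sizing a 2-second drop-tail ceiling for a fixed-rate link.
# buffer_bytes() is a hypothetical helper, not anything from a real stack.

def buffer_bytes(link_bps: float, delay_s: float = 2.0) -> int:
    """Bytes of queue that correspond to delay_s seconds at link_bps."""
    return int(link_bps / 8 * delay_s)

# A 56 kbps analogue modem: the 2-second ceiling is small and easy to state.
print(buffer_bytes(56_000))         # 14000 bytes (~9 full Ethernet frames)

# On gigabit Ethernet the same ceiling is so large it is almost meaningless.
print(buffer_bytes(1_000_000_000))  # 250000000 bytes (250 MB)
```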

For a shared-media connection, such as bus-type Ethernet or 802.11, such a naive calculation is very dangerous, because the throughput of the connection cannot be reliably predicted, especially under congested conditions.  In fact, wireless protocols have several positive-feedback mechanisms which virtually guarantee that once severe congestion occurs (as in a typical large conference room), the network will collapse (i.e. goodput will approach zero) and will remain in the collapsed state until the load is removed completely.  Most of these mechanisms are explicitly absent from classic Ethernet, which continues to function usefully (if minimally) in a congested state and recovers as soon as the load is reduced.

NB: I am writing this from a relatively high-level understanding of how 802.11 works.  I may therefore miss some details.

1) A radio transceiver cannot detect a collision once it has begun to transmit - it goes deaf.  So if two transmissions overlap, it is likely that both colliding packets will be lost.  Furthermore, a transceiver cannot reliably detect another transmission even before starting to transmit, because of the hidden-node problem: another station may be audible to the base station but not to the would-be transmitter.  This latter limitation is not present in Ethernet.

2) There is a distinctly finite number of channels available to transmit on (especially in the usual 2.4 GHz band), and in 802.11 (especially newer versions) these channels overlap to some degree.  The more radios are transmitting simultaneously on one band, the greater the noise level seen by other radios sharing the same band, even if they are operating outside each other's normal ranges.  Obviously, overlapping transmissions due to collisions will increase the background noise level, which will in turn lead to a greater incidence of collisions.

3) Increased noise also causes transceivers to step down in speed.  This reduces the capacity of the link, increases the time needed to transmit a packet, and increases the noise energy generated per collision.  These also increase the probability of a given packet colliding.  This in turn causes further step-down in speed.

4) Once the goodput has deteriorated to the extent that some RTTs are over 3 seconds - at which point the experience is already qualitatively slow - TCP will retransmit the setup packets, further increasing network load.  DNS has a similar retry mechanism, so applications which have not yet reached the TCP setup phase are similarly affected.  At this point it is very likely that every device in the airspace is constantly attempting to transmit something, yet is deafened to the correct opportunities to transmit without colliding because of the noise from everyone else also getting it wrong.

In the early stages of the above sequence, advanced MIMO base stations can mitigate the problem slightly by being able to receive packets from more than one transmitter at once, provided they are in substantially different directions (which is the most likely deafening scenario).  In the later stages this does not help because most collisions will involve multiple colliders, some of which will be in similar directions and thus indistinguishable to the MIMO transceiver.

So how many bytes is a 2-second queue on a wireless network?  It can be no more than the *minimum* link speed (in bytes/sec) divided by the *maximum* number of nodes.  This allows a 100% overhead for framing and congestion avoidance.  For 802.11 the minimum link speed is 1 Mbps ≈ 100 KB/s, so with 100 nodes (not a very large conference) the buffer should be no more than 1 KB - which is less than one full packet.  And since there are only about 3 non-overlapping channels in the 2.4 GHz band, then even if you spread those 100 nodes over 3 base stations, you only get 3 KB = 2 packets of buffer space.  (And this includes the base station!)
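The per-node budget can be sketched in a few lines.  Note that 1 Mbps is exactly 125 KB/s; the post rounds this down to 100 KB/s, so the exact figure comes out slightly higher than the 1 KB quoted above.  `per_node_buffer` is a hypothetical helper:

```python
# Sketch of the per-node buffer budget: take the *minimum* link rate and
# budget 1 second per node out of the 2-second ceiling (the other second
# is the 100% allowance for framing and contention overhead).

def per_node_buffer(min_link_bps: float, max_nodes: int) -> float:
    bytes_per_sec = min_link_bps / 8   # 125,000 B/s at the 1 Mbps fallback rate
    return bytes_per_sec / max_nodes   # one second's share per node

# 802.11 fallback rate of 1 Mbps shared by 100 nodes:
print(per_node_buffer(1_000_000, 100))  # 1250.0 bytes - under one 1500-byte frame
```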

So I'm surprised that these conference networks ever function at all.

 - Jonathan



* Re: [Bloat] A hard ceiling for the size of a drop-tail queue
  2011-03-11  1:15 [Bloat] A hard ceiling for the size of a drop-tail queue Jonathan Morton
@ 2011-03-11 22:21 ` Jonathan Morton
  0 siblings, 0 replies; 2+ messages in thread
From: Jonathan Morton @ 2011-03-11 22:21 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: bloat


On 11 Mar, 2011, at 3:15 am, Jonathan Morton wrote:

> So how many bytes is a 2-second queue on a wireless network?  It can be no more than the *minimum* link speed (in bytes/sec) divided by the *maximum* number of nodes.  This allows a 100% overhead for framing and congestion avoidance.  For 802.11 the minimum link speed is 1Mbps = 100KB/s, so with 100 nodes (not a very large conference) the buffer should be no more than 1KB - which is less than one full packet.  And since there are only about 3 non-overlapping channels in the 2.4GHz band, then even if you spread those 100 nodes over 3 base stations, you only get 3KB = 2 packets of buffer space.  (And this includes the base station!)
> 
> So I'm surprised that these conference networks ever function at all.

Coming back to this topic for a moment, I have a radical solution to the "wifi congestion collapse" problem:

If a packet has been in the send queue for more than 1 second, drop it - regardless of whether it has been sent or not.

Why?  Because if the application and the user are still interested in getting the traffic through, they will retry.  If not, dropping stale packets will stop these retries (which will happen anyway) from clogging up the network.  The first TCP-SYN retry comes after 3 seconds, so dropping the first attempt after 1 second gives back two-thirds of the airtime and prevents the queue from growing.
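A minimal sketch of that rule (illustrative Python, not driver code - the class and names are hypothetical): age is checked at dequeue time, so a packet is discarded even if a transmit opportunity never arrived while it was fresh.

```python
# Head-drop of stale packets: anything older than 1 second in the send
# queue is discarded rather than transmitted.
import collections
import time

MAX_AGE = 1.0  # seconds

class StaleDropQueue:
    def __init__(self):
        self._q = collections.deque()  # entries are (enqueue_time, packet)

    def enqueue(self, packet, now=None):
        self._q.append((time.monotonic() if now is None else now, packet))

    def dequeue(self, now=None):
        """Return the next packet still young enough to be worth sending."""
        now = time.monotonic() if now is None else now
        while self._q:
            enq_time, packet = self._q.popleft()
            if now - enq_time <= MAX_AGE:
                return packet
            # Stale: drop it - the sender will retry if it still cares.
        return None

q = StaleDropQueue()
q.enqueue("syn", now=0.0)
q.enqueue("data", now=0.5)
print(q.dequeue(now=1.5))  # "data" - the 1.5-second-old SYN is dropped
```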

Dave Täht gave me something to listen to, and one of the things mentioned was that frame aggregation aims for 4 ms of airtime at a time.  That leaves room for about 250 aggregate-frames per second, which to me puts the sustainable number of nodes at about 100, as I suspected.  But remember that at the minimum rate of 1 Mbps, 4 ms is only 500 bytes - enough for a TCP SYN or a DNS query, but only one-third of a full Ethernet frame.
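The arithmetic behind those two figures, checked explicitly (variable names are mine, values from the post):

```python
# 4 ms of airtime per aggregate => number of aggregates per second.
slot_s = 0.004                # 4 ms per aggregate-frame
slots_per_sec = 1 / slot_s
print(slots_per_sec)          # 250.0 aggregate-frames per second

# At the 1 Mbps minimum rate, how much data fits in one 4 ms slot?
min_rate_bps = 1_000_000      # 802.11 minimum rate
bytes_per_slot = min_rate_bps * slot_s / 8
print(bytes_per_slot)         # 500.0 bytes - one-third of a 1500-byte frame
```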

 - Jonathan

