Re: [Bloat] What is a good burst? -- AQM evaluation guidelines

From: Bob Briscoe <bob.briscoe@bt.com>
To: Naeem Khademi <naeem.khademi@gmail.com>
Cc: bloat Mainlinglist <bloat@lists.bufferbloat.net>
Subject: Re: [Bloat] What is a good burst? -- AQM evaluation guidelines
Date: Sun, 15 Dec 2013 20:56:21 +0000
Message-ID: <201312152056.rBFKuLFq021776@bagheera.jungle.bt.co.uk>
In-Reply-To: <655C07320163294895BBADA28372AF5D14C5DF@FR712WXCHMBA15.zeu.alcatel-lucent.com>

Naeem,

You don't need to go through a BDP calculation if calibrating queue 
length in time units - you can take the burst size directly as the 
RTT (otherwise you multiply RTT by link bandwidth to get BDP, then 
just divide again by link bandwidth to get burst size in time).
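
The cancellation above can be checked numerically. This is a minimal
sketch, assuming an illustrative 10 Mb/s link and a 100 ms RTT (neither
figure comes from this thread); the function names are hypothetical.

```python
def bdp_bits(rtt_s: float, link_bps: float) -> float:
    """Bandwidth-delay product in bits: RTT multiplied by link rate."""
    return rtt_s * link_bps


def burst_time_s(burst_bits: float, link_bps: float) -> float:
    """Time the link needs to drain a burst of the given size."""
    return burst_bits / link_bps


rtt_s = 0.100      # assumed RTT: 100 ms
link_bps = 10e6    # assumed link rate: 10 Mb/s

burst_bits = bdp_bits(rtt_s, link_bps)
drain_s = burst_time_s(burst_bits, link_bps)

# The link rate cancels, so the burst size in time units is just the RTT.
print(f"BDP = {burst_bits:.0f} bits, drain time = {drain_s * 1e3:.0f} ms")
```

Changing link_bps changes the BDP in bits but leaves the drain time
equal to the RTT.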

It's important to use an RTT at the high end of the expected range; 
otherwise TCP flows with significantly higher RTT can get very poor 
performance.

Having to pick a compromise RTT value is not ideal, because for the 
public Internet you will typically need to assume a worst-case RTT of 
transcontinental proportions (c. 200ms), which is why it is 
recommended to set 'interval' to 100ms in 'codel' and 'max_burst' to 
100ms in PIE.
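
To make the role of such a setting concrete, here is a simplified
sketch of an interval-based persistence check: a signal is allowed only
once queue delay has stayed above a target for a whole interval. It is
not the CoDel or PIE algorithm itself. The class and method names are
hypothetical, and the 5 ms target is CoDel's usual default rather than
a value from this message.

```python
from typing import Optional


class PersistenceDetector:
    """Allow a signal only after delay has exceeded target for a full interval."""

    def __init__(self, target_ms: int = 5, interval_ms: int = 100) -> None:
        self.target_ms = target_ms        # assumed: CoDel's usual default target
        self.interval_ms = interval_ms    # the 100 ms setting discussed above
        self.first_above_ms: Optional[int] = None

    def should_signal(self, now_ms: int, sojourn_ms: int) -> bool:
        if sojourn_ms < self.target_ms:
            # Queue delay is back under target: the burst was "good", forget it.
            self.first_above_ms = None
            return False
        if self.first_above_ms is None:
            # First sample above target: start timing the burst.
            self.first_above_ms = now_ms
            return False
        # Signal only once the excursion has lasted a whole interval.
        return now_ms - self.first_above_ms >= self.interval_ms


# A queue that jumps from idle to a standing 20 ms delay at t = 0.
detector = PersistenceDetector()
first_signal_ms = None
for now_ms in range(300):
    if detector.should_signal(now_ms, sojourn_ms=20):
        first_signal_ms = now_ms
        break
print(first_signal_ms)
```

In this model the first signal is sent 100 ms after the queue starts to
build, whatever the RTT of the flows causing it.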

However, most flows nowadays terminate at a CDN with RTT ~20ms. So, 
having configured the AQM to allow for 100ms RTT, every time the 
queue fills from idle, it will be delaying any loss signals for about 
5 CDN-RTTs.

It's an even tougher compromise if your AQM is within your host, 
which has to support the full range of RTTs from 200ms 
transcontinental to <2ms across your LAN (whether campus, enterprise 
or home - e.g. with your media server). Then, you have to configure 
the AQM to absorb 100ms bursts, so it will delay signals to LAN flows 
for ~50 of their RTTs. In this time, a LAN TCP flow lasting multiple 
round trips will have pushed the queue into tail-drop, long before 
the AQM has responded.
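
The RTT ratios above, and why a LAN flow can reach tail-drop first, can
be illustrated with a rough model. This is a sketch under stated
assumptions, not a measurement: the 1 Gb/s link, 1500-byte packets,
1000-packet buffer and initial window of 10 packets are all assumed,
and slow start is idealised as doubling the window once per RTT.

```python
def rtts_delayed(burst_allowance_ms: float, flow_rtt_ms: float) -> float:
    """Number of flow RTTs for which the AQM withholds its first signal."""
    return burst_allowance_ms / flow_rtt_ms


def slow_start_rounds_to_overflow(bdp_pkts: float, buffer_pkts: float,
                                  initial_cwnd_pkts: float = 10) -> int:
    """Round trips of idealised slow start until cwnd exceeds BDP plus buffer."""
    cwnd = initial_cwnd_pkts
    rounds = 0
    while cwnd <= bdp_pkts + buffer_pkts:
        cwnd *= 2          # idealised slow start: double every RTT
        rounds += 1
    return rounds


print(rtts_delayed(100, 20))   # CDN flow: 100 ms allowance, 20 ms RTT
print(rtts_delayed(100, 2))    # LAN flow: 100 ms allowance, 2 ms RTT

# Assumed LAN: 1 Gb/s, 2 ms RTT, 1500-byte packets, 1000-packet buffer.
lan_bdp_pkts = 1e9 * 0.002 / (8 * 1500)
rounds = slow_start_rounds_to_overflow(lan_bdp_pkts, buffer_pkts=1000)
print(rounds)
```

Under these assumptions the window outgrows the pipe plus the buffer
after 7 round trips, well inside the ~50 round trips for which the AQM
stays silent.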

The notion of a 'good burst' is only necessary if using drop as the 
signal though. For ECN-capable packets, we have been experimenting 
with shifting absorption of bursts from the network to L4 in the host 
(which knows its own RTT). We don't wait at all for a burst to 
persist before sending ECN signals, and then smooth out RTT-length 
bursts of ECN signals in the congestion avoidance algorithm in the 
transport. Also, during slow-start the transport doesn't need to 
smooth out the bursts at all, so it gets the signal within 1 RTT and 
it can respond immediately.
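
One way to picture smoothing in the transport is sketched below. It
assumes a DCTCP-style moving average of the per-RTT marked fraction;
the message does not say which smoothing function is used, so the gain
of 1/16, the window arithmetic and all names here are assumptions. The
response in slow start is modelled simply as leaving slow start on the
first mark.

```python
class SmoothingSender:
    """Toy sender that smooths ECN marks itself instead of relying on the AQM."""

    def __init__(self, cwnd_pkts: float = 10.0, gain: float = 1 / 16) -> None:
        self.cwnd = cwnd_pkts
        self.gain = gain            # assumed EWMA gain, as in DCTCP
        self.alpha = 0.0            # smoothed fraction of marked packets
        self.in_slow_start = True

    def on_rtt_end(self, acked_pkts: int, marked_pkts: int) -> None:
        frac = marked_pkts / acked_pkts if acked_pkts else 0.0
        if self.in_slow_start:
            if marked_pkts > 0:
                # No smoothing in slow start: react to the very first mark.
                self.in_slow_start = False
            else:
                self.cwnd *= 2
            return
        # Congestion avoidance: smooth marks over roughly 1/gain round trips.
        self.alpha = (1 - self.gain) * self.alpha + self.gain * frac
        if marked_pkts > 0:
            self.cwnd = max(1.0, self.cwnd * (1 - self.alpha / 2))
        else:
            self.cwnd += 1


sender = SmoothingSender()
sender.on_rtt_end(acked_pkts=10, marked_pkts=0)    # still doubling
sender.on_rtt_end(acked_pkts=20, marked_pkts=5)    # first mark ends slow start
sender.on_rtt_end(acked_pkts=20, marked_pkts=20)   # one fully marked RTT
print(sender.in_slow_start, sender.alpha, sender.cwnd)
```

In this toy model a single RTT-long burst of marks in congestion
avoidance trims the window by only about 3%, while the first mark seen
in slow start takes effect within the same round trip.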

See
<http://www.ietf.org/proceedings/88/slides/slides-88-tsvwg-20.pdf>
and the thread that has just been discussing this:
"[aqm] Text for aqm-recommendation on independent ECN config"

Bob

At 15:16 15/12/2013, Scharf, Michael (Michael) wrote:
>There are ongoing discussions in the IETF TCPM working group on 
>sender-side pacing:
>
>http://www.ietf.org/mail-archive/web/tcpm/current/msg08167.html
>
>http://www.ietf.org/proceedings/88/slides/slides-88-tcpm-9.pdf
>
>Insight or contributions from further implementers would be highly welcome.
>
>Michael
>
>
>________________________________________
>From: bloat-bounces@lists.bufferbloat.net 
>[bloat-bounces@lists.bufferbloat.net] on behalf of 
>Jonathan Morton [chromatix99@gmail.com]
>Sent: Sunday, 15 December 2013 13:26
>To: Naeem Khademi
>Cc: bloat Mainlinglist
>Subject: Re: [Bloat] What is a good burst? -- AQM evaluation guidelines
>
>On 15 Dec, 2013, at 7:35 am, Naeem Khademi wrote:
>
> > the question remains: "what is a good burst (size) that AQMs should allow?"
>
>The ideal size of a TCP congestion window - which limits the size of 
>a burst on a TCP flow - is equal to the natural bandwidth-delay 
>product for the flow.  That involves the available bandwidth and the 
>natural RTT delay - i.e. without added queueing delay.
>
>Codel operates on this basis, making an assumption about typical RTT 
>delays, and permitting queue residency to temporarily rise to that 
>value without initiating marking operations.  A larger burst would 
>be evidence of a congestion window that is too large, or an overall 
>sending rate that exceeds the bandwidth at the link the codel queue 
>controls.  A persistent queue is always taken as evidence of the latter.
>
>In a datacentre or on a LAN, natural RTT delays are much shorter 
>(microseconds) than on the general Internet (milliseconds) - 
>conversely, available bandwidth is typically much higher (Gbps vs. 
>Mbps).  The two factors approximately cancel out, so the 
>bandwidth-delay product remains roughly the same in typical cases - 
>although, of course, atypical cases such as satellite links (seconds 
>of latency) and major backbones (extreme aggregate bandwidth and 
>Internet-scale delays) also exist.  However, RTT is more consistent 
>between installations than bandwidth is (factor of ten difference in 
>typical range of ADSL link speeds, factor of a hundred in WiFi), so 
>Codel uses a time basis rather than a byte-count basis for 
>regulation, and is by default tuned for typical overland Internet latencies.
>
>Fq_codel, as with other FQ-type qdiscs, tends to improve pacing when 
>multiple flows are present, by interleaving packets from different 
>queued bursts.  Pacing is the general absence of bursts, and can be 
>implemented at source by a TCP sender that spreads packets within a 
>congestion window through an interval of time corresponding to the 
>measured RTT.  AFAIK, very few TCP implementations actually do this, 
>probably due to a desire to avoid interrupt overheads (the CPU would 
>have to be woken up by a timer for each packet).  It strikes me as 
>feasible for NIC hardware to take on some of this burden.
>
>  - Jonathan Morton
>
>_______________________________________________
>Bloat mailing list
>Bloat@lists.bufferbloat.net
>https://lists.bufferbloat.net/listinfo/bloat

________________________________________________________________
Bob Briscoe,                                                  BT

Thread overview: 16+ messages
2013-12-15  5:35 Naeem Khademi
2013-12-15 12:26 ` Jonathan Morton
2013-12-15 15:16   ` Scharf, Michael (Michael)
     [not found]     ` <655C07320163294895BBADA28372AF5D14C5DF@FR712WXCHMBA15.zeu.alcatel-lucent.com>
2013-12-15 20:56       ` Bob Briscoe [this message]
2013-12-15 18:56 ` [Bloat] [aqm] " Curtis Villamizar
2014-01-02  6:31   ` Fred Baker (fred)
2014-01-03 18:17     ` [Bloat] [e2e] " dpreed
2014-01-30 19:27       ` [Bloat] [aqm] [e2e] " Dave Taht
2013-12-15 21:42 ` [Bloat] [aqm] " Fred Baker (fred)
2013-12-15 22:57   ` [Bloat] [e2e] " Bob Briscoe
2013-12-16  7:34     ` Fred Baker (fred)
2013-12-16 13:47       ` Naeem Khademi
2013-12-16 14:05         ` Naeem Khademi
2013-12-16 17:30           ` Fred Baker (fred)
2013-12-16 14:28         ` Jonathan Morton
2013-12-16 14:50           ` Steinar H. Gunderson
