From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lb0-x235.google.com (mail-lb0-x235.google.com [IPv6:2a00:1450:4010:c04::235]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by huchra.bufferbloat.net (Postfix) with ESMTPS id 276E921FBBA for ; Mon, 20 Jul 2015 12:35:08 -0700 (PDT) Received: by lblf12 with SMTP id f12so101554356lbl.2 for ; Mon, 20 Jul 2015 12:35:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=content-type:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=pyrBPJdSO+VGLN1l+OHx4NnpLYaCye91Tb0d8MCBBYY=; b=ZiYzRO9pCJLgHdrGaZIP4oCsfnF/FWxhT4URJCUJZB/+KE3iuOTkyvUid7qYi7tbFT 1HJos2SmQ3L9U3sNPmswTbmXTarPwwYAigTtIAGen/mfWP483Rw5E0R6nclxtdxR9VA8 yDPsCn1sPmkezVYj1ny5/ybMBNTrRL5s1zRgGkoKKwjt+enNq5tpuTTalK9PyV5cUL0M 3+8XRgcDHGUoSkkFSqIrVARp5bm+fnEraLdjLPgWnrZyz8B47DLlQiyQatxDdW+py44e fDkGHRYWDhXCcamFhgSSDKB5lTYuWuliAWTOwyueutsVwchgH0nHGkEPEUa8MaOFiHPq RcIg== X-Received: by 10.152.178.229 with SMTP id db5mr15830450lac.55.1437420906321; Mon, 20 Jul 2015 12:35:06 -0700 (PDT) Received: from bass.home.chromatix.fi (188-67-145-250.bb.dnainternet.fi. [188.67.145.250]) by smtp.gmail.com with ESMTPSA id pl1sm3062999lbc.30.2015.07.20.12.35.03 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 20 Jul 2015 12:35:05 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2102\)) From: Jonathan Morton In-Reply-To: <55AD2695.8050605@kit.edu> Date: Mon, 20 Jul 2015 22:34:59 +0300 Content-Transfer-Encoding: quoted-printable Message-Id: <07D540E8-1184-4DBD-B372-40C55A485C40@gmail.com> References: <55AD2695.8050605@kit.edu> To: Roland Bless , Anil.Agarwal@viasat.com X-Mailer: Apple Mail (2.2102) Cc: codel@lists.bufferbloat.net, aqm@ietf.org Subject: Re: [Codel] [aqm] Codel's count variable and re-entering dropping state at small time intervals X-BeenThere: codel@lists.bufferbloat.net X-Mailman-Version: 2.1.13 Precedence: list List-Id: CoDel AQM discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 20 Jul 2015 19:35:37 -0000 I can=E2=80=99t speak for Kathy, but as Cake=E2=80=99s author I have = some insight into the changes I=E2=80=99ve made to the associated = version of Codel. > Ronald Bless: > 1. after exiting dropping state, count is usually reset, unless "the > next drop state is entered too close to the previous one=E2=80=9D. In fact, in most cases (ie. where the true RTT is near to or less than = the one assumed by the interval parameter, and the flow is persistent) = the drop state will be re-entered sufficiently soon to prevent the = reset. The backoff behaviour for that case is thus more significant in = practice. The standard backoff behaviour is to reduce count by just 2. This is = obviously wrong in the case where count has already reached a = significant value; with a high drop frequency, count will climb a lot = during a single RTT. But if the elevated drop frequency is necessary = and sufficient to control the flow, we don=E2=80=99t want it to grow = indefinitely. Cake=E2=80=99s version halves the count, reducing the drop frequency by = a factor of sqrt(2) after a pause. This multiplicative backoff prevents = count from growing indefinitely, and allows it to reduce gradually to a = lower value if that becomes appropriate due to reduced network load. Anil's suggestion of continuously reducing count during the non-dropping = state is an interesting alternative which might also make sense. It has = the pleasing property of relating the backoff rate to the length of time = spent outside the dropping state. > 2. is 'count' supposed to be reset or saturated on overflow and what > should be its maximum value (it makes a difference whether you are = using > 16-, 32-, or 64-bit counter)? It=E2=80=99s clear to me that a saturating increment is necessary to = deal with unresponsive flows adequately. Otherwise, the drop frequency = will wrap around to a low value periodically when it needs to be = consistently high. My test case for this involves 50 upstream TCP flows on a narrow link = (say 10Mbit) and a single downstream TCP flow on a wide link (say = 100Mbit). In this configuration, the acks for the downstream flow = constitute an unresponsive (in fact, anti-responsive) upstream flow of = greater throughput demand than the individual upstream TCP flows. = Anti-responsive means that dropping packets actually increases the = network load, because reducing the ack density permits higher throughput = on the downlink, which in turn generates additional acks on the uplink. Since Cake uses very robust flow isolation, simply delivering all the = acks in order, using a fair-share of the bandwidth, would severely limit = the throughput of the downstream flow. However, the standard Codel = behaviour resulted in wildly oscillating throughput on the downstream = flow. Straightforward calculations suggested that the count variable = was probably wrapping. At first, I only put in the modified backoff behaviour mentioned above. = Since the downstream flow was often reaching full throughput, I thought = that occasionally the drop rate was managing to empty the ack queue, = making the backoff behaviour significant. This assumption turned out to = be correct, but the backoff modification was insufficient to completely = correct the behaviour. Only when I changed count to use a saturating = increment instead of a wrapping one did the downstream flow achieve full = throughput consistently. It=E2=80=99s possible that if count was a wider variable (32 instead of = 16 bits) that the backoff modification would have been sufficient to = prevent wrapping in this case. However, using saturating arithmetic is = inexpensive on modern CPUs (many have dedicated instructions which could = be used by highly optimised implementations) and is the most robust = solution. Note that in the absence of robust flow isolation, eg. with plain Codel = or PIE, the upstream TCP flows will back off to make room for all the = acks. This does not require a particularly high drop rate, but it = reduces the upstream goodput significantly. Cake instead achieves high = goodput in both directions in this especially challenging case. > Anil Agarwal: > I have attached the updated document. There are indeed a lot of suggestions here, which I haven=E2=80=99t yet = fully assimilated. I=E2=80=99ll cherry-pick a few that stood out: > When RTT is small compared to the default Interval, CoDel is slow in = reacting and controlling queue size/delay during TCP slow start. > For example, if Interval =3D 100 ms and RTT =3D 20 ms, CoDel will not = drop or mark a packet for the first 100 ms of a new TCP connection; if = link rate =3D 10 Mbps, initial window size =3D 14600 bytes, then during = this period, the TCP cwnd will grow to 104 kbytes; queuing delay will be = 63 ms. Cake=E2=80=99s modified Codel does vary its sensitivity according to how = quickly the queue is filling. With the default parameters, it may = trigger as quickly as 35ms after a large burst arrives, or as slowly as = 200ms if the queue is growing very slowly. It could be improved further in theory, by giving Codel visibility of = the length and drain rate of the queue (and thus the ability to predict = future delay) rather than only observing the exit delay. However, in = the general case the drain rate is variable, so this may be difficult to = make robust. Ultimately, triggering instantly when the queue begins to fill is = optimal to deal with slow-start, which will double its cwnd further = while the congestion signal is in transit, before responding with a = halving. However, an instant trigger is suboptimal for the = congestion-avoidance phase of TCP, which is why Codel always delays its = response. Flow isolation is a good (if imperfect) countermeasure for = the suboptimal slow-start behaviour this causes. My =E2=80=9CExplicit Load Regulation=E2=80=9D proposal, which hasn=E2=80=99= t yet been widely circulated, aims to get closer to optimal behaviour in = both situations. It is an extension to ECN behaviour, giving the option = of more moderate signals than ECN permits, while being easier to deploy = incrementally than DCTCP is. Hopefully I=E2=80=99ll be able to write it = up properly later. > The draft rfc states - =E2=80=9CFor example, there is about an 86% = probability that no more than two of the 100 VoIP sessions will be = involved in any given collision=E2=80=9D > Based on analytical equations for hash collision probabilities (I = derived them independently some time ago), the probability of no = collision =3D 90.78%=E2=80=A6 Cake=E2=80=99s set-associative hash function reduces the probability of = flow collisions further, to negligible values for the above number of = flows. > Alternatively, codel_dequeue() can be re-structured as two routines=E2=80= =A6 Certainly a refactoring would be beneficial for readability. I = haven=E2=80=99t got around to doing it yet. - Jonathan Morton