From: Dave Taht
Date: Sat, 15 Jun 2019 09:57:33 -0700
To: ECN-Sane
Subject: [Ecn-sane] I think a defense of fq_x and co-design of new transports might be good

It would be a good paper to write. This is a draft of points I'd like to cover, not an attempt at a more formal email; I just needed to get this much out of my system, on the ecn-sane list.

# about fq_x

The fq_x qdiscs (presently fq_codel, fq_pie, sch_cake) have pretty much the same fq algorithm. It has one new characteristic compared to all the prior FQ ones - truly sparse flows see no queue at all; otherwise the observed queue size is f, where f = the number of queue building flows. If you have 3 full size packets queued, you have 3f.

No transport currently takes advantage of this fairly tiny difference between "no queue" and "f queue". We also use bytes, rather than packets, in our calculations, as that translates to time.

I'm perpetually throwing around a statistic like "95% of all flows never get out of slow start", that most are sender limited, and so on, and thus (especially if paced) get 0 delay all the time in fq_x, or "0 first packet + pf" for the burst of packets. This is an essential, fine difference in measurement that can be tracked receiver side, and it is unique to fq_x.

... whereas with a single queue, with AQM on, all it takes is one greedy flow to induce L latency on all flows, which in the case of pie/codel is > 16/5ms - with plenty of jitter until things settle down.
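A rough way to see why the gap between "no queue" and "f queue" matters is to turn bytes into time. The sketch below is only an illustration, not anything taken from fq_codel or cake: the function names, the 10 Mbit/s link and the 1500-byte packets are all assumptions picked for the example, and it reads "3 full size packets queued" as three packets per queue building flow. It also treats a sparse flow as seeing exactly zero queue, which ignores any packet already on the wire.

```python
# Idealized arithmetic only; all names and numbers here are illustrative assumptions.

def packet_time_ms(nbytes: int, link_bps: float) -> float:
    """Time to serialize nbytes onto a link of link_bps bits per second, in ms."""
    return nbytes * 8 / link_bps * 1000.0


def fq_delay_ms(f: int, pkts_per_flow: int, pkt_bytes: int,
                link_bps: float, sparse: bool) -> float:
    """Approximate queue delay seen by one flow behind an fq_x-style scheduler.

    f is the number of queue building flows. A truly sparse flow is assumed
    to see no queue at all; a queue building flow waits for roughly
    pkts_per_flow rounds of f packets each.
    """
    if sparse:
        return 0.0
    return pkts_per_flow * f * packet_time_ms(pkt_bytes, link_bps)


def fifo_delay_ms(backlog_bytes: int, link_bps: float) -> float:
    """In a single queue, every arriving packet waits behind the whole backlog."""
    return packet_time_ms(backlog_bytes, link_bps)


if __name__ == "__main__":
    link = 10e6  # assumed 10 Mbit/s bottleneck
    mtu = 1500   # assumed full size packet, in bytes
    for f in (1, 4, 16):
        # columns: f, delay with 1 packet per flow, with 3 per flow, sparse flow
        print(f,
              fq_delay_ms(f, 1, mtu, link, sparse=False),
              fq_delay_ms(f, 3, mtu, link, sparse=False),
              fq_delay_ms(f, 3, mtu, link, sparse=True))
```

At these assumed numbers a full size packet takes about 1.2 ms, so four queue building flows with one packet each give roughly 4.8 ms, while a sparse flow in the same model still sees 0. In the single queue case, `fifo_delay_ms` applies to every flow alike, which is the difference a co-designed transport could try to measure.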
(I wish there was a way to express in a variable that it has a bounded range of some sort: ~16ms isn't good, and neither is >16ms or 16+ms.)

dualpi retains that >16ms characteristic for normal flows, and a claimed 1ms for dualpi, which is... IMHO simply impossible in a wide range of circumstances, but I'd just as soon try to focus on improving FQ_x and co-designed transports in a more ideal world for a while, on this thread.

For purposes of exposition, let's assume that fq_x is the dominant AQM algorithm in the world, the only one with a proven, oft enabled, and *deterministic* RFC3168 CE response on overload, where a loss is assumed equivalent to a mark.

In terms of co-designing a transport for it, a transport can then assume that a CE mark is coming from FQ_x. Knowing that, there are new curves that can be followed in various phases of the evolution of a flow.

Abstractly:

* 0 delay - we have capacity to spare, grow the window
* "some delay" - we have a queue of "f", and thus a thinner setpoint observable.
* mild jitter between a recent arrival and the rest of the burst (the sparse flow optimization)

# Benefits of FQ_x

FQ_x is robust against abuse. A single flow cannot overwhelm it. Some level of service is guaranteed for the vast majority of flows (excepting collisions) in the number of flows configured.

FQ_x is also robust against different treatments of drop (bbr without ecn) and CE (l4s).

FQ_x allows delay based and hybrid delay based transports (like BBR) to "just work", without any ecn support at all. The additional support in "x" pushes queue lengths for drop based algorithms back to where the most common TCPs can shift back into classic slow start and congestion avoidance modes, instead of being bound (as they are often today) by rwnd, etc.

FQ_x is (add more)

# Some observations regarding a CE mark

Packet loss is a weak signal of a variety of events.
A CE mark is currently a strong signal that you are in FQ_x - the odds are good that this will be the event that kicks the transport out of slow start. Now, knowing you got a CE mark gives you a chance to optimize, knowing that your queue length is not that of a fifo, but relative to "f". In BBR's case in particular, resetting the bandwidth and pacing rate to the lowest recently observed (in the last 100 ms) "RTT - a little" is better than the classic RFC3168 response of halving.

One thing that bugs me about RTT based measurements is when the return path is inflated - in FQ_x it's a decent assumption that both sides of the path have FQ, so the ack return path is far less inflated, but in pie/dualpi/codel it certainly can be, for a variety of reasons. This is why the rrul test exists. Ack thinning does help also. The amount of potential jitter in the return path is enormous, and it is one benchmark I've not yet seen from anyone on that side.

Moving sideways: I happen to like (in terms of determinism) an even stronger signal than RFC3168, "loss and mark", where a combination of loss and marks is even more meaningful than either, and thus the sender should back off even harder (or the receiver should pretend it got CE in two different RTTs). When we have queue sizes elsewhere measured in seconds, and a colossal bufferbloat mess in general, anything that moves a link below capacity would be great. The deterministic "loss and mark" feature was in cake until a year or two back, but I never got around much to mucking with a transport's interpretation of it.

# The SCE concept in addition to that

With or without SCE, just that much, just that normal CE signal, is enough to evolve a transport towards more sensitive delay based signaling. It could be added to cubic, for example...

Anyway...
We have two public implementations of SCE under test - the cake one uses a ramp, the fq_codel_fast one just uses a setpoint where we have a consistently measurable queue (1ms), and that setpoint is different for wifi (1-2 TXOPs).

SCE (presently) kicks in almost immediately upon building a queue. Often, immediately! with IW10 at low bandwidths (without initial spreading, pacing or chirping). There is also the bulkiness of draining the oft-large rx ring and the effects of NAPI interrupt mitigation to deal with - which is usually around 1ms. Thus it is an extremely strong signal both that there is a queue, and that fq_x is present.

SCE requires support at the receiver - not the sender - in order to work at all. The receiver can decide what to do with it. My own first experimental preference was to kick tcp out of slow start on receipt of any SCE mark, but afterwards, in congestion avoidance, to treat it as a much more gradual signal, or even ignore it entirely. I'm grumpy enough about IW10 to still consider that, but as the current sch_fq code does indeed pace the next burst, perhaps ignoring SCE on the first few packets of a connection is useful to consider, also.

There is plenty of work on all the congestion avoidance mode stuff (reusing nonce sum, accecn, etc), but the key point (for me) was signalling and thinking hard about the fact that fq_x was present and that f governed the behavior of the queues. Knowing this, growth and signalling patterns such as ELR, dctcp, etc, can change.

# Benefits of SCE

* Plenty of stuff to write here that has been written elsewhere
* Backward compatible
* gradual upgrade
* easy change to fq_x
* SCE re-enables the possibility of low priority congestion control for background tcp flows

-- 
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740