From: Dave Taht
Date: Sat, 23 Jan 2021 15:19:27 -0800
To: Stuart Cheshire
Cc: bloat
Subject: Re: [Bloat] UniFi Dream Machine Pro

On Fri, Jan 22, 2021 at 11:43 AM Stuart Cheshire wrote:
>
> On 20 Jan 2021, at 07:55, Dave Taht wrote:
>
> > This review, highly recommending this router on the high end
> >
> > https://www.increasebroadbandspeed.co.uk/best-router-2020
> >
> > also states that the sqm implementation has been dumbed down
> > significantly and can only shape 800Mbit inbound. Long ago we did a
> > backport of cake to the other ubnt routers mentioned in the review;
> > has anyone tackled this one?

It's nice to see the "godfather" of our effort back here. I still
re-read http://www.stuartcheshire.org/rants/latency.html periodically,
at the risk, perhaps, of over-lecturing a wider audience.

> According to the UniFi Dream Machine Pro data sheet, it has a 1.7 GHz
> quad-core ARM Cortex-A57 processor and achieves the following
> throughput numbers (downlink direction):
>
> 8.0 Gb/s with Deep Packet Inspection

I'm always very dubious of these kinds of numbers against anything but
single large, bulk flows. And if the fast path is not entirely
offloaded, performance goes to hell.

> 3.5 Gb/s with DPI + Intrusion Detection
> 0.8 Gb/s with IPsec VPN

Especially here. I should also note that the rapidly deploying
WireGuard VPN outperforms IPsec in just about every way... in software.
> Is implementing CoDel queueing really 10x more burden than running
> "Ubiquiti's proprietary Deep Packet Inspection (DPI) engine"? Is
> CoDel 4x more burden than Ubiquiti's IDS (Intrusion Detection System)
> and IPS (Intrusion Prevention System)?

These questions, given that the actual fq_codel overhead is nearly
immeasurable and the code complexity much less than either of those,
are the makings of a very good rant targeted at a hw offload maker. :)

Hashing is generally "free", and in hw, selecting a different queue can
be done with a single indirection. Cake has a lot of ideas that would
benefit from actual hw offloads; a 4- or 8-way associative cache is a
common IP hw block...

> Is CoDel really the same per-packet cost as doing full IPsec VPN
> decryption on every packet?

No.

> I realize the IPsec VPN decryption probably has some assist from
> crypto-specific ARM instructions or hardware, but even so, crypto
> operations are generally considered relatively expensive. If this
> device can do 800 Mb/s throughput doing IPsec VPN decryption for
> every packet, it feels like it ought to be able to do a lot better
> than that just doing CoDel queueing calculations for every packet.

Yep. The only even semi-costly codel function is an invsqrt, which can
be implemented in 3k gates or so in hw. In software, the Newton
approximation is nearly immeasurable, and accurate enough. (We went to
great lengths to make it more accurate in cake, to no observable
effect.)

Codel is not O(1). But a nice thing about fq is that you can be
codeling the queues in parallel. Or, if you are acting on a single
queue at a time, you can short-circuit the overload section of codel to
give up and deliver a packet when you cannot meet the deadline. Or...
using a very small fifo in front of the wire (say 3k bytes at a gbit),
the odds are extremely good (millions to one? I worked it out once with
various assumptions) that no matter how many packets you need to drop
at once, you can still run at line rate at a reasonable clock. BQL
manages this short fifo in linux, but there it tends to be much larger,
inflated by TSO offloads.

You really don't need to drop or mark a lot of packets to achieve good
congestion control at high rates. But you know that. :)

Most "hw" offloads are actually offloads to a specialized cpu, so
whether codel is O(1) or not isn't much of a problem there.

> Is this just a software polish issue, that could be remedied by doing
> some performance optimization on the CoDel code?

I don't know how to make it faster; the linux version is about as
optimized as we know how, and a P4 implementation exists. As everyone
points out later in this thread, it's the software *shaper* (on inbound
especially) that is the real burden. The token bucket has been
offloaded to hw before; the QCA offloaded version has both the token
bucket and fq_codel in there.

Hw shaping outbound is also vastly cheaper with a programmable
completion interrupt: tell 1Gbit hardware to interrupt at half the
rate, and bang, it's a 500Mbit shaper. (This is implemented in several
intel ethernet cards.)

Inbound shaping in sw is another one of the "it's the latency, stupid"
things. It's not so much the clock rate as how fast the cpu can
reschedule the thread, a number that doesn't scale much with clock, but
with cache and pipeline depth. One reason why I adore the mill cpu
design is that it can context switch in 5 clocks, where x86 takes
1000...

> It's also possible that the information in the review might simply be
> wrong -- it's hard to measure throughput numbers in excess of 1 Gb/s
> unless you have both a client and a server connected faster than that
> in order to run the test. In other words, gigabit Ethernet is out, so
> both client and server would have to be connected via the 10 Gb/s
> SFP+ ports (of which the UDM-PRO has just two -- one in the upstream
> direction, and one in the downstream direction).
> Speaking for myself personally, I don't have any devices with 10 Gb/s
> capability, and my Internet connection isn't above 1 Gb/s either, so
> as long as it can get reasonably close to 1 Gb/s that's more than I
> need (or could use) right now.

As most 1Gbit ISP links are still quite overbuffered (over 120ms of
induced latency was what I'd measured on comcast, 60ms on sonic fiber,
both a few years back), vs a total induced latency of *0-5ms* with sqm
at 800Mbit, it generally seems to me that inbound shaping to something
close to a gbit is a win for videoconferencing, gaming, vr, jacktrip,
and other latency-sensitive traffic. On a 35Mbit upload, fq_codel or
cake are *loafing*.

If we were to get around to doing a backport of cake to this device,
I'd probably go with htb+fq_codel on the download and cake on the
upload, where the ack-filtering and per-host/per-flow fq of cake would
be ideal. (This, btw, is what I do presently.) Ack-filtering at these
asymmetries is a pretty big win for retaining a high download speed
with competing upload traffic:

https://blog.cerowrt.org/post/ack_filtering/

You cannot do anything even close to a steady gbit down with competing
uplink traffic on the cable modems I've tested to date.

> Stuart Cheshire

-- 
"For a successful technology, reality must take precedence over public
relations, for Mother Nature cannot be fooled" - Richard Feynman

dave@taht.net CTO, TekLibre, LLC Tel: 1-831-435-0729
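For the curious, the "nearly immeasurable" invsqrt cost is easy to see
in miniature. This is my own float-based sketch, not the kernel's
fixed-point code: codel schedules the count-th drop interval/sqrt(count)
after the previous one, and instead of computing a real square root it
caches an estimate of 1/sqrt(count) and refines it with one Newton step
each time count changes.

```python
INTERVAL_MS = 100.0  # codel's default interval

def newton_invsqrt_step(x, count):
    """One Newton-Raphson refinement of x toward 1/sqrt(count):
    x' = x * (3 - count * x^2) / 2."""
    return x * (3.0 - count * x * x) / 2.0

def next_drop_spacing(count, cached_x):
    """Spacing (ms) until the count-th drop, plus the refined estimate
    to cache for next time. A handful of multiplies, no sqrt -- which
    is why this shows up nowhere in a profile."""
    x = newton_invsqrt_step(cached_x, count)
    return INTERVAL_MS * x, x

# Drops get closer together roughly as 1/sqrt(count); the single-step
# estimate lags a little at small counts and tracks tightly at large ones.
x = 1.0
for count in range(1, 6):
    spacing, x = next_drop_spacing(count, x)
```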
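For anyone wanting to replicate the htb+fq_codel-down / cake-up setup
on a generic linux router, a minimal sketch follows. Device names and
rates are placeholders for your own link; the sqm-scripts package
handles many more details (dscp washing, overhead compensation, etc.):

```shell
# Egress: cake with ack-filtering on an asymmetric 35Mbit upload.
tc qdisc replace dev eth0 root cake bandwidth 35mbit ack-filter

# Ingress: redirect inbound traffic through an ifb device, then shape
# it below the ISP rate with htb + fq_codel so the queue builds here,
# not in the ISP's overbuffered gear.
ip link add ifb0 type ifb
ip link set ifb0 up
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol all matchall \
    action mirred egress redirect dev ifb0
tc qdisc add dev ifb0 root handle 1: htb default 10
tc class add dev ifb0 parent 1: classid 1:10 htb rate 800mbit
tc qdisc add dev ifb0 parent 1:10 fq_codel
```

These commands need root and a kernel with the ifb, cake, and fq_codel
modules; shaping a hair below the measured link rate is what keeps the
induced latency in the 0-5ms range mentioned above.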