From: Toke Høiland-Jørgensen
To: Robert Chacon
Cc: bloat@lists.bufferbloat.net
Date: Fri, 15 Jan 2021 13:30:28 +0100
Subject: Re: [Bloat] Thanks to developers / htb+fq_codel ISP shaper

Robert Chacon writes:

>> Cool! What kind of performance are you seeing? The README mentions being
>> limited by the BPF hash table size, but can you actually shape 2000
>> customers on one machine? On what kind of hardware and at what rate(s)?
>
> On our production network our peak throughput is 1.5Gbps from 200 clients,
> and it works very well.
> We use a simple consumer-class AMD 2700X CPU in production because
> utilization of the shaper VM is ~15% at 1.5Gbps load.
> Customers get reliably capped within ±2Mbps of their allocated
> htb/fq_codel bandwidth, which is very helpful for controlling network
> congestion.
>
> Here are some graphs from RRUL performed on our test bench hypervisor:
> https://raw.githubusercontent.com/rchac/LibreQoS/main/docs/fq_codel_1000_subs_4G.png
> In that example, bandwidth for the "subscriber" client VM was set to 4Gbps.
> 1000 IPv4 IPs and 1000 IPv6 IPs were in the filter hash table of LibreQoS.
> The test bench server has an AMD 3900X running Ubuntu in Proxmox. 4Gbps
> utilizes 10% of the VM's 12 cores. Paravirtualized VirtIO network drivers
> are used and most offloading types are enabled.
> In our setup, VM networking multiqueue isn't enabled (it kept disrupting
> traffic flow), so 6Gbps is probably the most it can achieve like this. Our
> qdiscs in this VM may be limited to one core because of that.

I suspect the issue you had with multiqueue is that it requires per-CPU
partitioning on a per-customer basis to work well. This is possible to do
with XDP, as Jesper demonstrates here:

https://github.com/netoptimizer/xdp-cpumap-tc

With this it should be possible to scale the hardware queues across
multiple CPUs properly, and you should be able to go to much higher rates
by just throwing more CPU cores at it. At least on bare metal; not sure if
the VM virt-drivers have the needed support yet...

-Toke
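
A rough sketch of that per-CPU layout, for illustration only (this is not
LibreQoS or xdp-cpumap-tc code; the interface name, core count, rates,
class IDs and customer names are made up, and the script only prints the
tc commands instead of running them): an mq root so each hardware queue
gets its own child qdisc, an independent HTB tree per core, and one HTB
class with an fq_codel leaf per customer.

#!/usr/bin/env python3
# Illustrative sketch only -- not LibreQoS or xdp-cpumap-tc code.
# Layout: mq root (one child per hardware queue), an independent HTB
# tree per CPU core, and one HTB class + fq_codel leaf per customer.
# The script only prints the tc commands; a real tool would run them
# as root, e.g. via subprocess.run().

IFACE = "eth1"      # egress interface towards subscribers (made up)
NUM_CORES = 4       # CPUs / hardware queues to spread the load over

def tc(*args):
    # Dry run: print the command instead of executing it.
    print("tc " + " ".join(args))

def setup_per_cpu_htb(customers):
    """customers: list of (name, rate_mbit) tuples."""
    # mq root: one child per hardware queue, so the per-core HTB trees
    # below it don't all contend on a single root qdisc lock.
    tc("qdisc", "replace", "dev", IFACE, "root", "handle", "7fff:", "mq")

    # An independent HTB tree under each mq child (one per core).
    for core in range(NUM_CORES):
        major = format(core + 1, "x")
        tc("qdisc", "add", "dev", IFACE, "parent", f"7fff:{major}",
           "handle", f"{major}:", "htb", "default", "2")

    # Round-robin customers over the per-core trees; each customer gets
    # an HTB class capped at their plan rate with an fq_codel leaf.
    for i, (name, rate_mbit) in enumerate(customers):
        major = format(i % NUM_CORES + 1, "x")
        minor = format(i + 3, "x")   # start at 3, leaving room for a default
        print(f"# {name} -> class {major}:{minor} on core {i % NUM_CORES}")
        tc("class", "add", "dev", IFACE, "parent", f"{major}:",
           "classid", f"{major}:{minor}", "htb",
           "rate", f"{rate_mbit}mbit", "ceil", f"{rate_mbit}mbit")
        tc("qdisc", "add", "dev", IFACE, "parent", f"{major}:{minor}",
           "fq_codel")
    # xdp-cpumap-tc's role (not shown here) is to steer each customer's
    # packets to the CPU whose HTB tree holds that customer's class, so
    # the fast path never takes a cross-CPU lock.

if __name__ == "__main__":
    setup_per_cpu_htb([("cust-a", 100), ("cust-b", 50), ("cust-c", 25)])

Running it just prints the generated tc command list, which can be
inspected before anything is applied to a real interface.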