From: Jesper Dangaard Brouer
To: Dave Taht
Cc: Joel Wirāmu Pauling, bloat@lists.bufferbloat.net,
 "cerowrt-devel@lists.bufferbloat.net", brouer@redhat.com,
 Tariq Toukan, David Ahern, Christina Jacob, "netdev@vger.kernel.org"
Date: Mon, 4 Dec 2017 11:56:51 +0100
Message-ID: <20171204110923.3a213986@redhat.com>
In-Reply-To: <87bmjff7l6.fsf_-_@nemesis.taht.net>
Subject: [Cerowrt-devel] Linux network is damn fast, need more use XDP
 (Was: [Bloat] DC behaviors today)

On Sun, 03 Dec 2017 20:19:33 -0800 Dave Taht wrote:

> Changing the topic, adding bloat.

Adding netdev, and also adjusting the topic into a rant about how the
Linux kernel network stack is actually damn fast, and if you need
something faster then XDP can solve your needs...

> Joel Wirāmu Pauling writes:
>
> > Just from a Telco/Industry perspective slant.
> >
> > Everything in DC has moved to SFP28 interfaces at 25Gbit as the server
> > port of interconnect. Everything TOR wise is now QSFP28 - 100Gbit.
> > Mellanox X5 cards are the current hotness, and their offload
> > enhancements (ASAP2 - which is sorta like DPDK on steroids) allows for
> > OVS flow rules programming into the card. We have a lot of customers
> > chomping at the bit for that feature (disclaimer I work for Nuage
> > Networks, and we are working on enhanced OVS to do just that) for NFV
> > workloads.
>
> What Jesper's been working on for ages has been to try and get linux's
> PPS up for small packets, which last I heard was hovering at about
> 4Gbits.

I hope you made a typo here, Dave. The normal Linux kernel is
definitely way beyond 4Gbit/s; you must have misunderstood something.
Maybe you meant 40Gbit/s? (Which is also too low.)

Scaling up to more CPUs and TCP streams, Tariq[1] and I have shown
that the Linux kernel network stack scales to 94Gbit/s (line rate
minus overhead). But when the driver's page recycler fails, we hit
bottlenecks in the page allocator that cause negative scaling, down
to around 43Gbit/s.

[1] http://lkml.kernel.org/r/cef85936-10b2-5d76-9f97-cb03b418fd94@mellanox.com
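To put those numbers in perspective: the 94Gbit/s figure (and the
packet rates I get to below) fall straight out of the Ethernet framing
overhead. A back-of-the-envelope sketch, assuming a 1500 byte MTU, TCP
with timestamps, and the standard preamble/IFG/FCS per frame (just
illustrative arithmetic, not a benchmark):

#include <stdio.h>

int main(void)
{
        const double mtu      = 1500.0;
        const double overhead = 14 + 4 + 8 + 12; /* eth hdr + FCS + preamble + IFG */
        const double wire     = mtu + overhead;  /* 1538 bytes on the wire */
        const double tcp_data = mtu - 20 - 32;   /* minus IP hdr and TCP hdr w/ timestamps */

        /* TCP goodput at 100Gbit/s line rate: ~94 Gbit/s */
        printf("100G TCP goodput: %.1f Gbit/s\n", 100.0 * tcp_data / wire);

        /* Packets per second with MTU-size frames */
        printf("10G: %.0f pps\n", 10e9 / (wire * 8)); /* ~812 Kpps  */
        printf("25G: %.0f pps\n", 25e9 / (wire * 8)); /* ~2.03 Mpps */
        return 0;
}

Which lands at roughly 94Gbit/s of goodput at 100G line rate, ~812Kpps
at 10G, and ~2Mpps at 25G, i.e. the numbers used in this mail.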
Linux has for a _long_ time been able to do 10Gbit/s TCP streams
easily, on a SINGLE CPU. This is mostly thanks to TSO/GRO aggregating
packets, but over the last couple of years the network stack has been
optimized (using UDP workloads), and as a result we can now do 10G
without TSO/GRO on a single CPU. This is "only" 812Kpps with MTU-size
frames.

It is important to NOTICE that I'm mostly talking about SINGLE-CPU
performance. But the Linux kernel scales very well to more CPUs, and
you can scale this up, although we are starting to hit scalability
issues in MM-land[1].

I've also demonstrated that the netdev community has optimized the
kernel's per-CPU processing power to around 2Mpps. What does this
really mean? Well, with MTU-size packets 812Kpps was 10Gbit/s, thus
25Gbit/s should be around 2Mpps... That implies Linux can do 25Gbit/s
on a single CPU without GRO (MTU-size frames). Do you need more, I
ask?

> The route table lookup also really expensive on the main cpu.

Well, it used to be very expensive. Vincent Bernat wrote some
excellent blog posts[2][3] on the recent improvements over kernel
versions, and gave due credit to the people involved.

[2] https://vincent.bernat.im/en/blog/2017-performance-progression-ipv4-route-lookup-linux
[3] https://vincent.bernat.im/en/blog/2017-performance-progression-ipv6-route-lookup-linux

He measured a route lookup cost of around 25 to 35 nanoseconds. My own
recent measurements put the cost of fib_table_lookup() at 36.9 ns.

> Does this stuff offload the route table lookup also?

In case you have not heard, the netdev community has been working on
something called XDP (eXpress Data Path). This is a new layer in the
network stack that basically operates at the same "layer"/level as
DPDK. Thus, surprise, we get the same performance numbers as DPDK.
E.g. I can do 13.4 Mpps forwarding with ixgbe on a single CPU
(14.6 Mpps with more CPUs).

We can actually use XDP for (software) offloading of the Linux routing
table. There are two methods we are experimenting with:

(1) Externally monitor route changes from userspace and update BPF
maps to reflect this. That approach is already accepted
upstream[4][5]. I'm measuring 9,513,746 pps per CPU with it (see the
stripped-down sketch at the very end of this mail).

(2) Add a BPF helper that simply calls fib_table_lookup() from the XDP
hook. These are still experimental patches (credit to David Ahern),
and I've measured 9,350,160 pps with this approach on a single CPU.
Using more CPUs we hit 14.6Mpps (only 3 CPUs were used in that test).

[4] https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_router_ipv4_user.c
[5] https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_router_ipv4_kern.c

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
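PS: To give a feel for what approach (1) boils down to on the XDP
side, here is a stripped-down, hypothetical sketch (NOT the upstream
sample in [4]/[5]): a BPF LPM-trie map, kept in sync by a userspace
daemon that monitors route changes, maps destination prefix to egress
ifindex, and the XDP program looks up the destination and redirects.
It is written against current libbpf conventions, and a real forwarder
must also rewrite the MAC addresses and decrement the TTL.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* LPM-trie key: prefix length + IPv4 destination (network byte order) */
struct lpm_key {
        __u32 prefixlen;
        __u32 daddr;
};

/* Populated/updated from userspace on route changes (hypothetical) */
struct {
        __uint(type, BPF_MAP_TYPE_LPM_TRIE);
        __uint(max_entries, 1024);
        __type(key, struct lpm_key);
        __type(value, __u32);              /* egress ifindex */
        __uint(map_flags, BPF_F_NO_PREALLOC);
} route_map SEC(".maps");

SEC("xdp")
int xdp_route_sketch(struct xdp_md *ctx)
{
        void *data_end = (void *)(long)ctx->data_end;
        void *data     = (void *)(long)ctx->data;
        struct ethhdr *eth = data;
        struct iphdr *iph;
        struct lpm_key key;
        __u32 *ifindex;

        if ((void *)(eth + 1) > data_end)
                return XDP_DROP;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
                return XDP_PASS;       /* let the normal stack handle it */

        iph = (void *)(eth + 1);
        if ((void *)(iph + 1) > data_end)
                return XDP_DROP;

        key.prefixlen = 32;
        key.daddr     = iph->daddr;

        ifindex = bpf_map_lookup_elem(&route_map, &key);
        if (!ifindex)
                return XDP_PASS;       /* no route in the map: fall back */

        /* NB: MAC rewrite and TTL decrement omitted in this sketch */
        return bpf_redirect(*ifindex, 0);
}

char _license[] SEC("license") = "GPL";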