Development issues regarding the cerowrt test router project
From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Dave Taht <dave@taht.net>
Cc: "Joel Wirāmu Pauling" <joel@aenertia.net>,
	bloat@lists.bufferbloat.net,
	"cerowrt-devel@lists.bufferbloat.net"
	<cerowrt-devel@lists.bufferbloat.net>,
	brouer@redhat.com, "Tariq Toukan" <tariqt@mellanox.com>,
	"David Ahern" <dsa@cumulusnetworks.com>,
	"Christina Jacob" <christina.jacob.koikara@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: [Cerowrt-devel] Linux network is damn fast, need more use XDP (Was: [Bloat] DC behaviors today)
Date: Mon, 4 Dec 2017 11:56:51 +0100
Message-ID: <20171204110923.3a213986@redhat.com>
In-Reply-To: <87bmjff7l6.fsf_-_@nemesis.taht.net>


On Sun, 03 Dec 2017 20:19:33 -0800 Dave Taht <dave@taht.net> wrote:

> Changing the topic, adding bloat.

Adding netdev, and also adjusting the topic to be a rant that the Linux
kernel network stack is actually damn fast, and if you need something
faster then XDP can solve your needs...

> Joel Wirāmu Pauling <joel@aenertia.net> writes:
> 
> > Just from a Telco/Industry perspective slant.
> >
> > Everything in DC has moved to SFP28 interfaces at 25Gbit as the server
> > port of interconnect. Everything TOR wise is now QSFP28 - 100Gbit.
> > Mellanox X5 cards are the current hotness, and their offload
> > enhancements (ASAP2 - which is sorta like DPDK on steroids) allows for
> > OVS flow rules programming into the card. We have a lot of customers
> > chomping at the bit for that feature (disclaimer I work for Nuage
> > Networks, and we are working on enhanced OVS to do just that) for NFV
> > workloads.  
> 
> What Jesper's been working on for ages has been to try and get linux's
> PPS up for small packets, which last I heard was hovering at about
> 4Gbits.

I hope you made a typo here, Dave. The normal Linux kernel is definitely
way beyond 4Gbit/s; you must have misunderstood something. Maybe you
meant 40Gbit/s? (which is also too low)

Scaling up to more CPUs and TCP streams, Tariq[1] and I have shown that
the Linux kernel network stack scales to 94Gbit/s (linerate minus
overhead).  But when the driver's page-recycler fails, we hit
bottlenecks in the page-allocator that cause negative scaling to around
43Gbit/s.

[1] http://lkml.kernel.org/r/cef85936-10b2-5d76-9f97-cb03b418fd94@mellanox.com

Linux has for a _long_ time been doing 10Gbit/s TCP-stream easily, on
a SINGLE CPU.  This is mostly thanks to TSO/GRO aggregating packets,
but over the last couple of years the network stack has been optimized
(with UDP workloads), and as a result we can do 10G without TSO/GRO on
a single CPU.  This is "only" 812Kpps with MTU size frames.

It is important to NOTICE that I'm mostly talking about SINGLE-CPU
performance.  But the Linux kernel scales very well to more CPUs, and
you can scale this up, although we are starting to hit scalability
issues in MM-land[1].

I've also demonstrated that the netdev-community has optimized the
kernel's per-CPU processing power to around 2Mpps.  What does this
really mean... well, with MTU size packets 812Kpps was 10Gbit/s, thus
25Gbit/s should be around 2Mpps.  That implies Linux can do 25Gbit/s on
a single CPU without GRO (MTU size frames).  Do you need more, I ask?
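
As a rough cross-check of the 812Kpps and 2Mpps figures, here is a small
sketch of the linerate arithmetic.  It assumes standard Ethernet
per-frame overhead on the wire (14 byte header, 4 byte FCS, 8 byte
preamble and 12 byte inter-frame gap) around a 1500 byte packet; the
function and variable names are only illustrative.

```python
# Standard Ethernet per-frame overhead on the wire, in bytes:
# 14 (header) + 4 (FCS) + 8 (preamble/SFD) + 12 (inter-frame gap) = 38
ETH_OVERHEAD_BYTES = 14 + 4 + 8 + 12


def linerate_pps(link_bps, l3_bytes):
    """Maximum frames per second for a link speed and L3 packet size."""
    wire_bits = (l3_bytes + ETH_OVERHEAD_BYTES) * 8
    return link_bps / wire_bits


pps_10g = linerate_pps(10e9, 1500)  # roughly 812 Kpps
pps_25g = linerate_pps(25e9, 1500)  # roughly 2.03 Mpps

print(f"10Gbit/s, 1500 byte packets: {pps_10g:,.0f} pps")
print(f"25Gbit/s, 1500 byte packets: {pps_25g:,.0f} pps")
```

Under these assumptions 25Gbit/s with MTU size frames lands just above
2Mpps, which matches the per-CPU figure quoted above.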

 
> The route table lookup also really expensive on the main cpu.

Well, it used to be very expensive. Vincent Bernat wrote some excellent
blogposts[2][3] on the recent improvements over kernel versions, and
gave due credit to the people involved.

[2] https://vincent.bernat.im/en/blog/2017-performance-progression-ipv4-route-lookup-linux
[3] https://vincent.bernat.im/en/blog/2017-performance-progression-ipv6-route-lookup-linux

He measured a cost of around 25 to 35 ns per route lookup.  My own
recent measurements put the cost of fib_table_lookup at 36.9 ns.
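
To put the lookup cost in perspective, here is a back-of-the-envelope
sketch comparing it with the time available per packet.  It assumes the
roughly 2Mpps per-CPU rate mentioned earlier as the reference point and
treats the lookup as a simple additive cost, which is a simplification;
the names are illustrative only.

```python
def ns_per_packet(pps):
    """Time available per packet, in nanoseconds, at a given packet rate."""
    return 1e9 / pps


# Assumed reference rate: about 2 Mpps on a single CPU.
budget_ns = ns_per_packet(2e6)

# Measured cost of fib_table_lookup quoted above.
fib_lookup_ns = 36.9

lookup_share = fib_lookup_ns / budget_ns

print(f"per-packet budget at 2 Mpps: {budget_ns:.0f} ns")
print(f"route lookup share of budget: {lookup_share:.1%}")
```

With these numbers the route lookup takes well under a tenth of the
per-packet time budget.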

> Does this stuff offload the route table lookup also?

If you have not heard, the netdev-community has worked on something
called XDP (eXpress Data Path).  This is a new layer in the network
stack that basically operates at the same "layer"/level as DPDK.
Thus, surprise, we get the same performance numbers as DPDK. E.g. I can
do 13.4 Mpps forwarding with ixgbe on a single CPU (more CPUs=14.6Mpps).

We can actually use XDP for (software) offloading the Linux routing
table.  There are two methods we are experimenting with:

(1) externally monitor route changes from userspace and update BPF-maps
to reflect this. That approach is already accepted upstream[4][5].  I'm
measuring 9,513,746 pps per CPU with that approach.

(2) add a bpf helper to simply call fib_table_lookup() from the XDP hook.
These are still experimental patches (credit to David Ahern), and I've
measured 9,350,160 pps with this approach on a single CPU.  Using more
CPUs we hit 14.6Mpps (only used 3 CPUs in that test).
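
For comparison, the pps figures above can be converted into time spent
per packet and into a fraction of 10Gbit/s linerate.  This sketch
assumes the tests used minimum-size Ethernet frames (64 bytes plus 20
bytes of preamble and inter-frame gap on the wire) on a 10Gbit/s link;
that frame size is an assumption, not something stated above, and the
labels are only illustrative.

```python
# Assumption: minimum-size frames, 64 bytes + 8 preamble + 12 inter-frame gap.
MIN_FRAME_WIRE_BYTES = 64 + 8 + 12
LINERATE_10G_PPS = 10e9 / (MIN_FRAME_WIRE_BYTES * 8)  # about 14.88 Mpps

# Packet rates quoted in the text above.
measured_pps = {
    "(1) BPF-map route offload, 1 CPU": 9_513_746,
    "(2) fib_table_lookup helper, 1 CPU": 9_350_160,
    "(2) fib_table_lookup helper, 3 CPUs": 14_600_000,
}

results = {}
for label, pps in measured_pps.items():
    ns_per_pkt = 1e9 / pps              # aggregate time per packet
    fraction = pps / LINERATE_10G_PPS   # share of assumed 10G linerate
    results[label] = (ns_per_pkt, fraction)
    print(f"{label}: {ns_per_pkt:.1f} ns/packet, {fraction:.0%} of linerate")
```

Under that assumption the two single-CPU approaches differ by only a
couple of nanoseconds per packet, and the 3 CPU result is close to
linerate.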


[4] https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_router_ipv4_user.c
[5] https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_router_ipv4_kern.c

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
