CoDel AQM discussions
From: Frantisek Borsik <frantisek.borsik@gmail.com>
To: stephen@networkplumber.org
Cc: BeckW--- via Bloat <bloat@lists.bufferbloat.net>,
	BeckW@telekom.de, dpreed@deepplum.com, tom@herbertland.com,
	cake@lists.bufferbloat.net, codel@lists.bufferbloat.net,
	rpm@lists.bufferbloat.net
Subject: [Codel] Re: [Bloat] Re: [Cake] Re: XDP2 is here - from one and only Tom Herbert (almost to the date, 10 years after XDP was released)
Date: Mon, 15 Sep 2025 20:07:54 +0200	[thread overview]
Message-ID: <CAJUtOOjbzFEgouvbzNX3owzT3gp_PFonXWQvTvgAAFC1V09P7A@mail.gmail.com> (raw)
In-Reply-To: <20250915081637.2cd0d07c@hermes.local>

"There were a few NIC's that offloaded eBPF but they never really went
mainstream."

And even then, they did only 40 Gbps, like https://netronome.com, and
didn't even support full eBPF...

"They only support a pretty small subset of eBPF (in particular they don't
support the LPM map type, which was our biggest performance pain point),
and have a pretty cool user replaceable firmware system. They also don't
have the higher speeds - above 40 Gbps - where the offloading would be most
useful."
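For context on why that LPM map type hurt: it is just longest-prefix match over a route table, which is cheap in a software trie but needs TCAM-like resources to offload. A plain-C toy of the lookup semantics (a sketch, not the kernel's BPF_MAP_TYPE_LPM_TRIE implementation; the names are illustrative):

```c
#include <stdint.h>

/* A route entry: an IPv4 prefix, its length, and a next-hop id. */
struct route {
    uint32_t prefix;   /* host byte order, for this sketch */
    uint8_t  len;      /* prefix length, 0..32 */
    int      nexthop;
};

/* Linear longest-prefix match: return the nexthop of the longest
 * matching prefix, or -1 if nothing matches. BPF_MAP_TYPE_LPM_TRIE
 * does the same lookup with a trie in software; fixed-function NIC
 * pipelines typically need a TCAM to do it at line rate. */
int lpm_lookup(const struct route *tbl, int n, uint32_t addr)
{
    int best = -1, best_len = -1;
    for (int i = 0; i < n; i++) {
        uint32_t mask = tbl[i].len ? ~0u << (32 - tbl[i].len) : 0;
        if ((addr & mask) == (tbl[i].prefix & mask) &&
            tbl[i].len > best_len) {
            best = tbl[i].nexthop;
            best_len = tbl[i].len;
        }
    }
    return best;
}
```

Looking up 10.1.2.3 against {10.0.0.0/8, 10.1.0.0/16, default} picks the /16 entry: match length, not table order, decides, which is exactly what a small offload pipeline struggles to express.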

Btw, Tom will be on FLOSS Weekly tomorrow (Tuesday) at 12:20 EDT / 11:20 CDT
/ 10:20 MDT / 9:20 PDT:

https://www.youtube.com/live/OBW5twvmHOI


All the best,

Frank

Frantisek (Frank) Borsik


*In loving memory of Dave Täht: *1965-2025

https://libreqos.io/2025/04/01/in-loving-memory-of-dave/


https://www.linkedin.com/in/frantisekborsik

Signal, Telegram, WhatsApp: +421919416714

iMessage, mobile: +420775230885

Skype: casioa5302ca

frantisek.borsik@gmail.com


On Mon, Sep 15, 2025 at 5:16 PM Stephen Hemminger <
stephen@networkplumber.org> wrote:

> On Mon, 15 Sep 2025 08:39:48 +0000
> BeckW--- via Bloat <bloat@lists.bufferbloat.net> wrote:
>
> > Programming networking hardware is a bit like programming 8-bit
> > computers in the 1980s: the hardware is often too limited and varied to
> > support useful abstractions. This is also true for CPU-based networking
> > once you get into the >10 Gbps realm, where caching and pipelining
> > architectures become relevant. Writing a network protocol compiler that
> > produces efficient code for different NICs and different CPUs is a
> > daunting task. And unlike with 8-bit computers, there are no simple
> > metrics ('you need at least 32 KB RAM to run this code' vs 'this NIC
> > supports 4k queues with PIE, CoDel', 'this CPU has 20 MB of Intel
> > SmartCache').
>
> The Linux kernel still lacks an easy way to set up many features in smart
> NICs. DPDK has rte_flow, which allows direct access to hardware flow
> processing. But DPDK lacks any reasonable form of shaper control.
>
> > eBPF is very close to what was described in this 1995 exokernel paper
> > (https://pdos.csail.mit.edu/6.828/2008/readings/engler95exokernel.pdf).
> > The idea of the exokernel was to have easily loadable, verified code in
> > the kernel -- e.g. the security-critical task of assigning a packet to a
> > session of a user -- and leave the rest of the protocol -- e.g. TCP
> > retransmissions -- to the user space. AFAIK few people use eBPF like
> > this, but it should be possible.
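The security-critical demux step that paragraph describes is tiny: in eBPF terms, little more than a flow-key lookup. A plain-C sketch of just that classification step (a toy under assumed field layouts, not a real XDP program or the exokernel's code):

```c
#include <stdint.h>

/* The flow key the "verified" in-kernel demux code would extract
 * from each packet's headers. */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

/* Session table: in eBPF this would be a hash map populated from
 * userspace; a fixed array stands in for it here. */
struct session {
    struct flow_key key;
    int owner;  /* owning user process / address space */
};

/* Return the owning session's index, or -1 to fall back to the
 * default kernel path. This is the whole security-critical step:
 * everything else (e.g. TCP retransmission) can then run in the
 * owner's address space, as the exokernel paper proposes. */
int demux(const struct session *tbl, int n, const struct flow_key *k)
{
    for (int i = 0; i < n; i++)
        if (tbl[i].key.saddr == k->saddr && tbl[i].key.daddr == k->daddr &&
            tbl[i].key.sport == k->sport && tbl[i].key.dport == k->dport)
            return i;
    return -1;
}
```

The point is how little has to be trusted: a verifier only needs to check this lookup, not an entire protocol stack.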
> >
> > eBPF manages the abstraction part well, but sacrifices a lot of
> > performance -- e.g. it lacks the aggressive batching that vpp / fd.io
> > does. With DPDK, you often find out that your NIC's hardware or driver
> > doesn't support the function that you hoped to use, and you end up
> > optimizing for particular hardware. Even if driver and hardware support
> > a functionality, it may very well be that hardware resources are too
> > limited for your particular use case. The abstraction is there, but your
> > code is still hardware specific.
>
> There were a few NIC's that offloaded eBPF but they never really went
> mainstream.
>



> > -----Original Message-----
> > From: David P. Reed <dpreed@deepplum.com>
> > Sent: Saturday, 13 September 2025 22:33
> > To: Tom Herbert <tom@herbertland.com>
> > Cc: Frantisek Borsik <frantisek.borsik@gmail.com>; Cake List <
> cake@lists.bufferbloat.net>; codel@lists.bufferbloat.net; bloat <
> bloat@lists.bufferbloat.net>; Jeremy Austin via Rpm <
> rpm@lists.bufferbloat.net>
> > Subject: [Bloat] Re: [Cake] Re: XDP2 is here - from one and only Tom
> Herbert (almost to the date, 10 years after XDP was released)
> >
> >
> > Tom -
> >
> > An architecture-independent network framework independent of the OS
> kernel's peculiarities seems within reach (though a fair bit of work), and
> I think it would be a GOOD THING indeed. IMHO the Linux networking stack in
> the kernel is a horrific mess, and it doesn't have to be.
> >
> > The reason it doesn't have to be is that there should be no reason it
> cannot run in ring3/userland, just like DPDK. And it should be built using
> "real-time" userland programming techniques. (avoiding the generic linux
> scheduler). The ONLY reason for involving the scheduler would be because
> there aren't enough cores. Linux was designed to be a uniprocessor Unix,
> and that just is no longer true at all. With hyperthreading, too, one need
> never abandon a processor's context in userspace to run some "userland"
> application.
> >
> > This would rip a huge amount of kernel code out of the kernel (at least
> 50%, and probably more). The security issues of all those 3rd-party network
> drivers would go away.
> >
> > And the performance would be much higher for networking.  (running in
> ring 3, especially if you don't do system calls, is no performance penalty,
> and interprocessor communications using shared memory is much lower latency
> than Linux IPC or mutexes).
> >
> > I like the idea of a compilation-based network stack, at a slightly
> higher level than C. eBPF is NOT what I have in mind - it's an interpreter
> with high overhead. The language should support high-performance
> co-routining - shared memory, ideally. I don't think GC is a good thing.
> Rust might be a good starting point because its memory management is safe.
> > To me, some of what the base of DPDK is like is good stuff. However, it
> isn't architecturally neutral.
> >
> > To me, the network stack should not be entangled with interrupt handling
> at all. "Polling" is far more performant under load. The only use for
> interrupts is when the network stack is completely idle. That would be, in
> userland, a "wait for interrupt" call (not a poll) -- ideally, on recent
> Intel machines, a userspace version of MONITOR/MWAIT.
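The poll-first model sketched above usually comes down to spinning on a shared-memory ring instead of taking an interrupt per packet. A minimal single-producer/single-consumer ring in plain C (a toy: a real deployment would pin producer and consumer to separate cores and use C11 atomics with acquire/release ordering):

```c
#include <stdint.h>

#define RING_SZ 8  /* power of two, so masking replaces modulo */

/* SPSC ring: head is written only by the producer, tail only by the
 * consumer, so neither side ever takes a lock or enters the kernel. */
struct ring {
    uint32_t head, tail;
    int slots[RING_SZ];
};

/* Producer side: returns -1 if the ring is full. */
int ring_push(struct ring *r, int v)
{
    if (r->head - r->tail == RING_SZ)
        return -1;
    r->slots[r->head & (RING_SZ - 1)] = v;
    r->head++;  /* release-store in the real, atomic version */
    return 0;
}

/* Consumer side: the poll. Returns 0 and fills *v if data is ready,
 * -1 if the ring is empty. The consumer spins on this under load;
 * only after a long idle period would it arm an interrupt (or a
 * userspace MONITOR/MWAIT) and block. */
int ring_poll(struct ring *r, int *v)
{
    if (r->tail == r->head)
        return -1;
    *v = r->slots[r->tail & (RING_SZ - 1)];
    r->tail++;
    return 0;
}
```

This is the same shape as a NIC descriptor ring or a DPDK rte_ring: the hot path is two array accesses and an index bump, which is where the latency win over interrupt-driven IPC comes from.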
> >
> > Now I know that Linus and his crew are really NOT gonna like this. Linus
> is still thinking like MINIX, a uniprocessor time-sharing system with rich
> OS functions in the kernel and doing "file" reads and writes to communicate
> with the kernel state. But it is a much more modern way to think of
> real-time IO in a modern operating system. (Windows and macOS are also
> Unix-like, uniprocessor monolithic kernel designs).
> >
> > So, if XDP2 got away from the Linux kernel, it could be great.
> > BTW, io_uring, etc. are half-measures. They address getting away from
> interrupts toward polling, but they still make the mistake of keeping huge
> drivers in the kernel.
>
> DPDK already supports use of XDP as a way to do userspace networking.
> It is a good generic way to get packets in/out, but the dedicated
> userspace drivers allow for more access to hardware. The XDP abstraction
> gets in the way of little things like programming VLANs, etc.
>
> The tradeoff is that userspace networking works great for infrastructure
> (routers, switches, firewalls, etc.), but userspace networking for network
> stacks to applications is hard to do, and loses the isolation that the
> kernel provides.
>
> > > I think it is interesting as a concept. A project I am advising has
> > > been using DPDK very effectively to get rid of the huge path and
> > > locking delays in the current Linux network stack. XDP2 could be
> > > supported in a ring3 (user) address space, achieving a similar result.
> > Hi David,
> > The idea is you could write the code in XDP2 and it would be compiled to
> DPDK or eBPF, and the compiler would handle the optimizations.
> > >
> > > But I don't think XDP2 is going that direction - so it may be stuck
> > > into the mess of kernel space networking. Adding eBPF only has made
> > > this more of a mess, by the way (and adding a new "compiler" that
> > > needs to be verified as safe for the kernel).
> > Think of XDP2 as the generalization of XDP to go beyond just the kernel.
> The idea is that the user writes their datapath code once and compiles it
> to run on whatever targets they have -- DPDK, P4, other programmable
> hardware, and yes, XDP/eBPF. It's really not limited to kernel networking.
> > As for the name XDP2, when we created XDP, eXpress DataPath, my vision
> was that it would be implementation agnostic. eBPF was the first
> instantiation for practicality, but now ten years later I think we can
> realize the initial vision.
> > Tom
>
>
> At this point, different network architectures get focused at different
> use cases.
> The days of the one-size-fits-all networking of BSD Unix are dead.
>


Thread overview: 15+ messages
2025-09-09 10:32 [Codel] " Frantisek Borsik
2025-09-09 20:25 ` [Codel] Re: [Cake] " David P. Reed
2025-09-09 21:02   ` Frantisek Borsik
2025-09-09 21:36     ` [Codel] Re: [Cake] " Tom Herbert
2025-09-10  8:54       ` [Codel] Re: [Bloat] " BeckW
2025-09-10 13:59         ` Tom Herbert
2025-09-10 14:06           ` Tom Herbert
2025-09-13 18:33       ` [Codel] " David P. Reed
2025-09-13 20:58         ` Tom Herbert
2025-09-14 18:00           ` David P. Reed
2025-09-14 18:38             ` Tom Herbert
2025-09-15  8:39         ` [Codel] Re: [Bloat] " BeckW
2025-09-15 15:16           ` Stephen Hemminger
2025-09-15 18:07             ` Frantisek Borsik [this message]
     [not found]       ` <FR2PPFEFD18174CA00474D0DC8DBDA3EE00DC0EA@FR2PPFEFD18174C.DEUP281.PROD.OUTLOOK.COM>
     [not found]         ` <FR2PPFEFD18174CA00474D0DC8DBDA3EE00DC0EA@FR2PPFEFD18174C.DEUP281.PROD.OUTLOOK.COM>
2025-09-13 18:35           ` David P. Reed
