From: Frantisek Borsik <frantisek.borsik@gmail.com>
To: Tom Herbert <tom@herbertland.com>
Cc: stephen@networkplumber.org,
BeckW--- via Bloat <bloat@lists.bufferbloat.net>,
BeckW@telekom.de, dpreed@deepplum.com,
cake@lists.bufferbloat.net, codel@lists.bufferbloat.net,
rpm@lists.bufferbloat.net
Subject: [Bloat] Re: [Cake] Re: XDP2 is here - from one and only Tom Herbert (almost to the date, 10 years after XDP was released)
Date: Tue, 16 Sep 2025 00:26:19 +0200 [thread overview]
Message-ID: <CAJUtOOh81gJBcrA8C=C9AHQhuRRDy5KaoLptKZ9_y4+SxW2T_w@mail.gmail.com> (raw)
In-Reply-To: <CALx6S35SqVqnPf+AgvkSW4f+3Kv4kh7HqzATMJRBP7rgLUZOXw@mail.gmail.com>
Fresh from Tom's oven:
https://medium.com/@tom_84912/programming-a-parser-in-xdp2-is-as-easy-as-pie-8f26c8b3e704
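For readers who haven't clicked through yet: the post is about writing protocol parsers declaratively. As a point of comparison, here is a toy, hand-rolled Ethernet/IPv4 parser in plain C. It is emphatically not the XDP2 API -- every name below is invented for illustration -- but it shows the kind of bounds-checking boilerplate a declarative parser generator takes off your hands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN       14       /* Ethernet header length */
#define ETHERTYPE_IPV4 0x0800

/* Hypothetical flow-key struct, purely for illustration. */
struct flow_keys {
    uint32_t saddr, daddr;
    uint8_t  proto;
};

/* Returns 0 on success, -1 if the packet is too short or not IPv4.
 * Note how much of the code is length and version checking -- this is
 * exactly what a generated parser handles for you. */
static int parse_ipv4(const uint8_t *pkt, size_t len, struct flow_keys *keys)
{
    if (len < ETH_HLEN + 20)               /* Ethernet + minimal IPv4 header */
        return -1;
    uint16_t ethertype = (uint16_t)pkt[12] << 8 | pkt[13];
    if (ethertype != ETHERTYPE_IPV4)
        return -1;
    const uint8_t *ip = pkt + ETH_HLEN;
    if ((ip[0] >> 4) != 4)                 /* IP version nibble must be 4 */
        return -1;
    keys->proto = ip[9];                   /* protocol field */
    memcpy(&keys->saddr, ip + 12, 4);      /* source address */
    memcpy(&keys->daddr, ip + 16, 4);      /* destination address */
    return 0;
}

/* Canned test frame: EtherType IPv4, version/IHL 0x45, protocol 6 (TCP). */
static uint8_t example_pkt[34] = { [12] = 0x08, [13] = 0x00,
                                   [14] = 0x45, [23] = 6 };
```

A declarative parser spec replaces all of the above with a table of header types and transitions, which is what makes it retargetable to hardware.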
All the best,
Frank
Frantisek (Frank) Borsik
*In loving memory of Dave Täht: 1965-2025*
https://libreqos.io/2025/04/01/in-loving-memory-of-dave/
https://www.linkedin.com/in/frantisekborsik
Signal, Telegram, WhatsApp: +421919416714
iMessage, mobile: +420775230885
Skype: casioa5302ca
frantisek.borsik@gmail.com
On Mon, Sep 15, 2025 at 8:35 PM Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Sep 15, 2025 at 11:07 AM Frantisek Borsik
> <frantisek.borsik@gmail.com> wrote:
> >
> >
> > "There were a few NICs that offloaded eBPF, but they never really went
> mainstream."
> >
> > And even then, they were doing only 40 Gbps, like https://netronome.com
> and didn't even support full eBPF...
> >
> > "They only support a pretty small subset of eBPF (in particular they
> don't support the LPM map type, which was our biggest performance pain
> point), and have a pretty cool user replaceable firmware system. They also
> don't have the higher speeds - above 40 Gbps - where the offloading would
> be most useful."
>
> Yeah, the attempts at offloading eBPF were doomed to fail. It's a
> restricted model, lacks parallelism, doesn't support inline
> accelerators, and requires the eBPF VM, all of which make it a non-starter. DPDK
> would fail as well. The kernel/host environment and hardware
> environments are quite different. If we try to force the hardware to
> look like the host to make eBPF or DPDK portable then we'll lose the
> performance advantages of running in the hardware. We need a model
> that allows the software to adapt to HW, not the other way around (of
> course, in a perfect world we'd do software/hardware codesign from the
> get-go).
>
> >
> > Btw, Tom will be at FLOSS Weekly tomorrow (Tuesday), 12:20 EDT / 11:20
> CDT / 10:20 MDT / 9:20 PDT
>
> Can't wait!
>
> >
> > https://www.youtube.com/live/OBW5twvmHOI
> >
> >
> > All the best,
> >
> > Frank
> >
> > Frantisek (Frank) Borsik
> >
> >
> > In loving memory of Dave Täht: 1965-2025
> >
> > https://libreqos.io/2025/04/01/in-loving-memory-of-dave/
> >
> >
> > https://www.linkedin.com/in/frantisekborsik
> >
> > Signal, Telegram, WhatsApp: +421919416714
> >
> > iMessage, mobile: +420775230885
> >
> > Skype: casioa5302ca
> >
> > frantisek.borsik@gmail.com
> >
> >
> >
> > On Mon, Sep 15, 2025 at 5:16 PM Stephen Hemminger <
> stephen@networkplumber.org> wrote:
> >>
> >> On Mon, 15 Sep 2025 08:39:48 +0000
> >> BeckW--- via Bloat <bloat@lists.bufferbloat.net> wrote:
> >>
> >> > Programming networking hardware is a bit like programming 8-bit
> computers in the 1980s: the hardware is often too limited and varied to
> support useful abstractions. This is also true for CPU-based networking
> once you get into the >10 Gbps realm, when caching and pipelining
> architectures become relevant. Writing a network protocol compiler that
> produces efficient code for different NICs and different CPUs is a daunting
> task. And unlike with 8-bit computers, there are no simple metrics ('you
> need at least 32 KB of RAM to run this code' vs 'this NIC supports 4k queues
> with PIE, Codel', 'this CPU has 20 MB of Intel SmartCache').
> >>
> >> The Linux kernel still lacks an easy way to set up many features in
> SmartNICs. DPDK has rte_flow, which allows direct
> >> access to hardware flow processing. But DPDK lacks any reasonable form
> of shaper control.
> >>
> >> > eBPF is very close to what was described in this 1995 exokernel
> paper (
> https://pdos.csail.mit.edu/6.828/2008/readings/engler95exokernel.pdf).
> The idea of the exokernel was to have easily loadable, verified code in the
> kernel -- e.g. the security-critical task of assigning a packet to a session
> of a user -- and leave the rest of the protocol -- e.g. TCP retransmissions
> -- to user space. AFAIK few people use eBPF like this, but it should be
> possible.
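The exokernel split described above fits in a few lines of C: the only "kernel" duty is a verified demux from a packet's 5-tuple to the owning session, while retransmission and the rest of the protocol stay in userspace. The table layout and names below are hypothetical, purely to make the idea concrete.

```c
#include <stdint.h>

#define NSESSIONS 64

/* Hypothetical session table: owner 0 means the slot is free. */
struct session { uint32_t saddr, daddr; uint16_t sport, dport; int owner; };
static struct session table[NSESSIONS];

/* Userspace registers a claim on a 5-tuple (protocol elided for brevity). */
static void session_bind(uint32_t saddr, uint32_t daddr,
                         uint16_t sport, uint16_t dport, int owner)
{
    for (int i = 0; i < NSESSIONS; i++)
        if (!table[i].owner) {
            table[i] = (struct session){ saddr, daddr, sport, dport, owner };
            return;
        }
}

/* The entire "kernel" job in the exokernel model: map a packet to an
 * owner, or -1 if nobody claims it. Everything else -- retransmits,
 * congestion control, reassembly -- happens in the owner's userspace. */
static int demux(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
{
    for (int i = 0; i < NSESSIONS; i++) {
        struct session *s = &table[i];
        if (s->owner && s->saddr == saddr && s->daddr == daddr &&
            s->sport == sport && s->dport == dport)
            return s->owner;   /* deliver the raw packet to this user */
    }
    return -1;                 /* unclaimed: drop or default path */
}
```

Because this demux function is tiny and loop-bounded, it is exactly the kind of code a verifier (or eBPF) can check, which is the exokernel's point.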
> >> >
> >> > eBPF manages the abstraction part well, but sacrifices a lot of
> performance -- e.g. it lacks the aggressive batching that vpp / fd.io does. With
> DPDK, you often find out that your NIC's hardware or driver doesn't
> support the function that you hoped to use, and you end up optimizing for a
> particular piece of hardware. Even if driver and hardware support a functionality,
> it may very well be that hardware resources are too limited for your
> particular use case. The abstraction is there, but your code is still
> hardware specific.
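The batching point is easy to see with a toy cost model: pay the fixed per-invocation cost (dispatch, cold I-cache) once per vector of packets instead of once per packet, which is the core of vpp/fd.io's vector processing. The numbers below are invented; only the shape of the comparison matters.

```c
/* Toy cost model, not a benchmark: FIXED is the per-call overhead
 * (function dispatch, cache warm-up), PER_PKT the real per-packet work. */
#define BATCH 256
enum { FIXED = 100, PER_PKT = 10 };

/* One call per packet: FIXED is paid npkts times. */
static long cost_per_packet_path(int npkts)
{
    return (long)npkts * (FIXED + PER_PKT);
}

/* Vector processing: FIXED is paid once per batch of up to BATCH packets. */
static long cost_batched_path(int npkts)
{
    long cost = 0;
    for (int done = 0; done < npkts; done += BATCH) {
        int n = npkts - done < BATCH ? npkts - done : BATCH;
        cost += FIXED + (long)n * PER_PKT;
    }
    return cost;
}
```

For 1024 packets the per-packet path costs 1024 x 110 = 112640 units, while the batched path costs 4 x 100 + 1024 x 10 = 10640 -- the fixed overhead has been amortized more than 250-fold.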
> >>
> >> There were a few NICs that offloaded eBPF, but they never really went
> mainstream.
> >>
> >
> >
> >>
> >> > -----Original Message-----
> >> > From: David P. Reed <dpreed@deepplum.com>
> >> > Sent: Saturday, 13 September 2025 22:33
> >> > To: Tom Herbert <tom@herbertland.com>
> >> > Cc: Frantisek Borsik <frantisek.borsik@gmail.com>; Cake List <
> cake@lists.bufferbloat.net>; codel@lists.bufferbloat.net; bloat <
> bloat@lists.bufferbloat.net>; Jeremy Austin via Rpm <
> rpm@lists.bufferbloat.net>
> >> > Subject: [Bloat] Re: [Cake] Re: XDP2 is here - from one and only Tom
> Herbert (almost to the date, 10 years after XDP was released)
> >> >
> >> >
> >> > Tom -
> >> >
> >> > An architecture-independent network framework, free of the OS
> kernel's peculiarities, seems within reach (though a fair bit of work), and
> I think it would be a GOOD THING indeed. IMHO the Linux networking stack in
> the kernel is a horrific mess, and it doesn't have to be.
> >> >
> >> > The reason it doesn't have to be is that there should be no reason it
> cannot run in ring3/userland, just like DPDK. And it should be built using
> "real-time" userland programming techniques. (avoiding the generic linux
> scheduler). The ONLY reason for involving the scheduler would be because
> there aren't enough cores. Linux was designed to be a uniprocessor Unix,
> and that just is no longer true at all. With hyperthreading, too, one need
> never abandon a processor's context in userspace to run some "userland"
> application.
> >> >
> >> > This would rip a huge amount of kernel code out of the kernel. (at
> least 50%, and probably more). The security issues of all those 3rd-party
> network drivers would go away.
> >> >
> >> > And the performance would be much higher for networking. (Running in
> ring 3, especially if you don't do system calls, incurs no performance penalty,
> and interprocessor communication using shared memory has much lower latency
> than Linux IPC or mutexes.)
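The shared-memory path alluded to here is usually a single-producer/single-consumer ring: no syscalls, no mutexes on the fast path, just two cursors. Below is a simplified C11 sketch (memory-ordering subtleties such as acquire/release pairing are glossed over), not production code.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SZ 1024            /* must be a power of two for the mask trick */

/* One producer core writes head, one consumer core writes tail;
 * neither ever takes a lock or enters the kernel. */
struct ring {
    _Atomic uint32_t head, tail;
    void *slot[RING_SZ];
};

static int ring_push(struct ring *r, void *pkt)
{
    uint32_t h = atomic_load(&r->head);
    if (h - atomic_load(&r->tail) == RING_SZ)
        return -1;                       /* ring full */
    r->slot[h & (RING_SZ - 1)] = pkt;
    atomic_store(&r->head, h + 1);       /* publish to the consumer */
    return 0;
}

static void *ring_pop(struct ring *r)
{
    uint32_t t = atomic_load(&r->tail);
    if (t == atomic_load(&r->head))
        return 0;                        /* ring empty */
    void *pkt = r->slot[t & (RING_SZ - 1)];
    atomic_store(&r->tail, t + 1);       /* free the slot for the producer */
    return pkt;
}
```

This is essentially the structure DPDK's rings and io_uring's submission/completion queues are built on; the latency win over mutex-based IPC comes from never blocking and never trapping into the kernel.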
> >> >
> >> > I like the idea of a compilation based network stack, at a slightly
> higher level than C. eBPF is NOT what I have in mind - it's an interpreter
> with high overhead. The language should support high-performance
> co-routining - shared memory, ideally. I don't think GC is a good thing.
> Rust might be a good starting point because its memory management is safe.
> >> > To me, some of what the base of DPDK is like is good stuff. However,
> it isn't architecturally neutral.
> >> >
> >> > To me, the network stack should not be entangled with interrupt
> handling at all. "polling" is far more performant under load. The only use
> for interrupts is when the network stack is completely idle. That would be,
> in userland, a "wait for interrupt" call (not a poll). Ideally, on recent
> Intel machines, a userspace version of MONITOR/MWAIT.
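The poll-first, block-only-when-idle discipline described above can be sketched in C: spin on the queue for a bounded budget, and only fall back to a blocking wait (standing in here for an interrupt or MWAIT) once the queue is genuinely empty. All names and numbers below are illustrative.

```c
#define SPIN_BUDGET 1000

static volatile int queue_depth;  /* stand-in for a NIC RX ring's fill level */
static int blocked_waits;         /* how many times we fell back to blocking */

/* Placeholder for MWAIT / interrupt-driven wakeup: in this toy model it
 * just records the fallback and makes one packet appear. */
static void wait_for_work(void)
{
    blocked_waits++;
    queue_depth = 1;
}

/* Drain `iterations` packets, busy-polling while loaded and blocking
 * only when idle. Returns the number of packets processed. */
static int rx_loop(int iterations)
{
    int drained = 0;
    for (int i = 0; i < iterations; i++) {
        int spins = 0;
        while (queue_depth == 0 && ++spins < SPIN_BUDGET)
            ;                        /* busy poll: no syscall, no interrupt */
        if (queue_depth == 0)
            wait_for_work();         /* truly idle: stop burning CPU */
        queue_depth--;               /* "process" one packet */
        drained++;
    }
    return drained;
}
```

Under load the loop never blocks at all; interrupts (or MWAIT) are purely an idle-state power optimization, which is the point Reed is making.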
> >> >
> >> > Now I know that Linus and his crew are really NOT gonna like this.
> Linus is still thinking like MINIX, a uniprocessor time-sharing system with
> rich OS functions in the kernel and doing "file" reads and writes to
> communicate with the kernel state. But it is a much more modern way to
> think of real-time IO in a modern operating system. (Windows and macOS are
> also Unix-like, uniprocessor monolithic kernel designs).
> >> >
> >> > So, if XDP2 got away from the Linux kernel, it could be great.
> >> > BTW, io_uring, etc. are half-measures. They address getting away from
> interrupts toward polling, but they still make the mistake of keeping huge
> drivers in the kernel.
> >>
> >> DPDK already supports use of XDP as a way to do userspace networking.
> >> It is a good generic way to get packets in/out, but the dedicated
> userspace drivers allow
> >> for more access to hardware. The XDP abstraction gets in the way of
> little things like programming
> VLANs, etc.
> >>
> >> The tradeoff is that userspace networking works great for infrastructure
> (routers, switches, firewalls, etc.),
> >> but userspace networking for network stacks to applications is hard to
> do, and loses the isolation
> >> that the kernel provides.
> >>
> >> > > I think it is interesting as a concept. A project I am advising
> has been using DPDK very effectively to get rid of the huge path and
> locking delays in the current Linux network stack. XDP2 could be
> supported in a ring3 (user) address space, achieving a similar result.
> >> > Hi David,
> >> > The idea is you could write the code in XDP2 and it would be compiled
> to DPDK or eBPF and the compiler would handle the optimizations.
> >> > >
> >> > >
> >> > >
> >> > > But I don't think XDP2 is going that direction - so it may be
> stuck into the mess of kernel-space networking. Adding eBPF has only made
> this more of a mess, by the way (and adding a new "compiler" that needs
> to be verified as safe for the kernel).
> >> > Think of XDP2 as the generalization of XDP to go beyond just the
> kernel. The idea is that the user writes their datapath code once and they
> compile it to run on whatever targets they have -- DPDK, P4, other
> programmable hardware, and yes XDP/eBPF. It's really not limited to kernel
> networking.
> >> > As for the name XDP2: when we created XDP, the eXpress Data Path, my
> vision was that it would be implementation agnostic. eBPF was the first
> instantiation for practicality, but now ten years later I think we can
> realize the initial vision.
> >> > Tom
> >>
> >>
> >> At this point, different network architectures get focused at different
> use cases.
> >> The days of the one-size-fits-all networking of BSD Unix are over.
>
2025-09-09 10:32 [Bloat] " Frantisek Borsik
2025-09-09 20:25 ` [Bloat] Re: [Cake] " David P. Reed
2025-09-09 21:02 ` Frantisek Borsik
2025-09-09 21:36 ` [Bloat] Re: [Cake] " Tom Herbert
2025-09-10 8:54 ` BeckW
2025-09-10 13:59 ` Tom Herbert
2025-09-10 14:06 ` Tom Herbert
2025-09-13 20:33 ` David P. Reed
2025-09-13 20:58 ` Tom Herbert
2025-09-14 18:00 ` David P. Reed
2025-09-14 18:18 ` David Collier-Brown
2025-09-14 18:38 ` Tom Herbert
2025-09-15 8:39 ` BeckW
2025-09-15 15:16 ` Stephen Hemminger
2025-09-15 18:07 ` Frantisek Borsik
2025-09-15 18:35 ` Tom Herbert
2025-09-15 22:26 ` Frantisek Borsik [this message]
2025-09-15 23:16 ` David P. Reed
2025-09-16 0:05 ` Tom Herbert
[not found] ` <FR2PPFEFD18174CA00474D0DC8DBDA3EE00DC0EA@FR2PPFEFD18174C.DEUP281.PROD.OUTLOOK.COM>
  [not found]   ` <FR2PPFEFD18174CA00474D0DC8DBDA3EE00DC0EA@FR2PPFEFD18174C.DEUP281.PROD.OUTLOOK.COM>
2025-09-13 20:35 ` David P. Reed
2025-09-15 23:16 ` [Bloat] Re: [Rpm] Re: [Cake] " Robert McMahon
List information: https://lists.bufferbloat.net/postorius/lists/bloat.lists.bufferbloat.net/