From: "David P. Reed" <dpreed@deepplum.com>
To: "Tom Herbert" <tom@herbertland.com>
Cc: "Frantisek Borsik" <frantisek.borsik@gmail.com>,
"Cake List" <cake@lists.bufferbloat.net>,
codel@lists.bufferbloat.net,
"bloat" <bloat@lists.bufferbloat.net>,
"Jeremy Austin via Rpm" <rpm@lists.bufferbloat.net>
Subject: [Bloat] Re: [Cake] Re: XDP2 is here - from one and only Tom Herbert (almost to the date, 10 years after XDP was released)
Date: Sat, 13 Sep 2025 16:33:11 -0400 (EDT)
Message-ID: <1757795591.523513612@apps.rackspace.com>
In-Reply-To: <CALx6S37wDSc=d27twPBm_NHFSWYQUL74oMSyBVwY=z9uNEvJ_g@mail.gmail.com>
Tom -
An architecture-independent network framework independent of the OS kernel's peculiarities seems within reach (though a fair bit of work), and I think it would be a GOOD THING indeed. IMHO the Linux networking stack in the kernel is a horrific mess, and it doesn't have to be.
The reason it doesn't have to be is that there is no reason it cannot run in ring 3/userland, just like DPDK. And it should be built using "real-time" userland programming techniques (avoiding the generic Linux scheduler). The ONLY reason for involving the scheduler would be that there aren't enough cores. Linux was designed to be a uniprocessor Unix, and the uniprocessor assumption just no longer holds at all. With hyperthreading, too, one need never abandon a processor's context in userspace to run some "userland" application.
This would rip a huge amount of code out of the kernel (at least 50%, and probably more). The security issues of all those 3rd party network drivers would go away.
And networking performance would be much higher. (Running in ring 3, especially if you don't make system calls, carries no performance penalty, and interprocessor communication through shared memory has much lower latency than Linux IPC or mutexes.)
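To make the shared-memory point concrete, here is a minimal sketch of a single-producer/single-consumer ring that passes 64-bit packet descriptors between two cores using only atomic loads and stores, with no mutex and no system call on the fast path. The names `SpscRing`, `try_push`, `try_pop` and `transfer_sum` are invented for illustration, and two threads in one process stand in for two pinned cores; this is not code from DPDK or XDP2.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical single-producer/single-consumer ring of 64-bit descriptors.
pub struct SpscRing {
    slots: Vec<AtomicU64>,
    mask: usize,
    head: AtomicUsize, // next slot to read; written only by the consumer
    tail: AtomicUsize, // next slot to write; written only by the producer
}

impl SpscRing {
    /// `capacity` must be a power of two so indices can be masked.
    pub fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        SpscRing {
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            mask: capacity - 1,
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side. Returns false when the ring is full.
    pub fn try_push(&self, value: u64) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == self.slots.len() {
            return false;
        }
        self.slots[tail & self.mask].store(value, Ordering::Relaxed);
        // Release publishes the slot contents before the new tail is visible.
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side. Returns None when the ring is empty.
    pub fn try_pop(&self) -> Option<u64> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        let value = self.slots[head & self.mask].load(Ordering::Relaxed);
        // Release hands the slot back to the producer only after it was read.
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}

/// Moves the descriptors 1..=n through the ring and returns their sum.
pub fn transfer_sum(n: u64) -> u64 {
    let ring = Arc::new(SpscRing::new(64));
    let producer_ring = Arc::clone(&ring);
    let producer = thread::spawn(move || {
        for i in 1..=n {
            while !producer_ring.try_push(i) {
                std::hint::spin_loop(); // ring full: poll, do not block
            }
        }
    });
    let mut sum = 0u64;
    let mut received = 0u64;
    while received < n {
        match ring.try_pop() {
            Some(v) => {
                sum += v;
                received += 1;
            }
            None => std::hint::spin_loop(), // ring empty: poll again
        }
    }
    producer.join().unwrap();
    sum
}
```

Calling `transfer_sum(1000)` pushes a thousand descriptors across the ring. In a real stack the slots would hold buffer handles in memory shared with the NIC queue or with another process, which this sketch does not attempt.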
I like the idea of a compilation-based network stack, at a slightly higher level than C. eBPF is NOT what I have in mind - it's an interpreter with high overhead. The language should support high-performance co-routining - shared memory, ideally. I don't think GC is a good thing. Rust might be a good starting point because its memory management is safe.
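As a rough illustration of what memory-safe datapath code without GC can look like, the sketch below parses an Ethernet II frame carrying IPv4 using only bounds-checked slice access, so a truncated packet yields an error rather than an out-of-bounds read. The types `Ipv4Summary` and `ParseError` and the function `parse_ipv4` are hypothetical names chosen for this example; nothing here is taken from XDP2.

```rust
/// Fields pulled from an Ethernet II + IPv4 header (illustrative subset).
#[derive(Debug, PartialEq, Eq)]
pub struct Ipv4Summary {
    pub src: [u8; 4],
    pub dst: [u8; 4],
    pub protocol: u8,
    pub header_len: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    Truncated,
    NotIpv4,
    BadHeaderLength,
}

const ETH_HEADER_LEN: usize = 14;
const ETHERTYPE_IPV4: u16 = 0x0800;

/// Parses an untagged Ethernet II frame that carries an IPv4 packet.
/// No allocation, no unsafe code, no dependency on any OS facility.
pub fn parse_ipv4(frame: &[u8]) -> Result<Ipv4Summary, ParseError> {
    // `get` returns None instead of reading past the end of the buffer.
    let eth = frame.get(..ETH_HEADER_LEN).ok_or(ParseError::Truncated)?;
    let ethertype = u16::from_be_bytes([eth[12], eth[13]]);
    if ethertype != ETHERTYPE_IPV4 {
        return Err(ParseError::NotIpv4);
    }

    let ip = &frame[ETH_HEADER_LEN..];
    let first = *ip.first().ok_or(ParseError::Truncated)?;
    if first >> 4 != 4 {
        return Err(ParseError::NotIpv4);
    }
    // IHL is counted in 32-bit words; the minimum legal header is 20 bytes.
    let header_len = usize::from(first & 0x0f) * 4;
    if header_len < 20 {
        return Err(ParseError::BadHeaderLength);
    }
    let hdr = ip.get(..header_len).ok_or(ParseError::Truncated)?;

    Ok(Ipv4Summary {
        src: [hdr[12], hdr[13], hdr[14], hdr[15]],
        dst: [hdr[16], hdr[17], hdr[18], hdr[19]],
        protocol: hdr[9],
        header_len,
    })
}
```

Because the function takes a plain byte slice, the same source could in principle be called from a userland poll loop or any other environment that hands it a buffer.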
To me, some of what the base of DPDK is like is good stuff. However, it isn't architecturally neutral.
To me, the network stack should not be entangled with interrupt handling at all. "Polling" is far more performant under load. The only use for interrupts is when the network stack is completely idle. That would be, in userland, a "wait for interrupt" call (not a poll); ideally, on recent Intel machines, a userspace version of MONITOR/MWAIT.
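The poll-under-load, wait-when-idle pattern can be sketched as follows. This is only an analogy built from the Rust standard library: `thread::park` and `unpark` stand in for the "wait for interrupt" call, an atomic counter stands in for a NIC receive queue, and `RxQueue`, `run_consumer`, `demo` and the spin budget of 1,000 are all assumptions made up for the example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a NIC receive queue: a count of waiting packets.
pub struct RxQueue {
    pending: AtomicU64,
}

impl RxQueue {
    pub fn new() -> Self {
        RxQueue { pending: AtomicU64::new(0) }
    }
    /// "Device" side: make `n` more packets available.
    pub fn deliver(&self, n: u64) {
        self.pending.fetch_add(n, Ordering::Release);
    }
    /// Stack side: take everything currently available (may be zero).
    pub fn poll(&self) -> u64 {
        self.pending.swap(0, Ordering::Acquire)
    }
}

pub struct Stats {
    pub packets: u64,
    pub parks: u64,
}

/// Polls while traffic is flowing; sleeps only after `spin_budget`
/// consecutive empty polls, i.e. when the stack looks idle.
pub fn run_consumer(rx: &RxQueue, expected: u64, spin_budget: u32) -> Stats {
    let mut stats = Stats { packets: 0, parks: 0 };
    let mut empty_polls = 0u32;
    while stats.packets < expected {
        let batch = rx.poll();
        if batch > 0 {
            stats.packets += batch;
            empty_polls = 0;
        } else if empty_polls < spin_budget {
            empty_polls += 1;
            std::hint::spin_loop();
        } else {
            // Idle: block until the producer wakes us. An unpark that
            // arrives before this call makes park return immediately,
            // so a wakeup is never lost.
            stats.parks += 1;
            thread::park();
            empty_polls = 0;
        }
    }
    stats
}

/// Delivers `bursts` bursts of `burst_size` packets with idle gaps between.
pub fn demo(bursts: u64, burst_size: u64) -> Stats {
    let rx = Arc::new(RxQueue::new());
    let consumer_rx = Arc::clone(&rx);
    let expected = bursts * burst_size;
    let consumer = thread::spawn(move || run_consumer(&consumer_rx, expected, 1_000));
    for _ in 0..bursts {
        rx.deliver(burst_size);
        consumer.thread().unpark(); // plays the role of the interrupt
        thread::sleep(Duration::from_millis(2)); // idle gap between bursts
    }
    consumer.join().unwrap()
}
```

With `demo(5, 100)` the consumer receives all 500 packets; how often it parks depends on timing, which is the point: the wakeup path is exercised only when the queue has been empty for a while.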
Now I know that Linus and his crew are really NOT gonna like this. Linus is still thinking like MINIX, a uniprocessor time-sharing system with rich OS functions in the kernel and doing "file" reads and writes to communicate with the kernel state. But what I am describing is a much more modern way to think of real-time IO in a modern operating system. (Windows and macOS are also Unix-like, uniprocessor monolithic kernel designs.)
So, if XDP2 got away from the Linux kernel, it could be great.
BTW, io_uring, etc. are half-measures. They address getting away from interrupts toward polling, but they still make the mistake of keeping huge drivers in the kernel.
On Tuesday, September 9, 2025 17:36, "Tom Herbert" <tom@herbertland.com> said:
On Tue, Sep 9, 2025, 5:03 PM Frantisek Borsik <frantisek.borsik@gmail.com> wrote:
Thanks a lot, David.
I have asked Tom if he wants to join us and he should be here to chat with
us now.
All the best,
Frank
Frantisek (Frank) Borsik
*In loving memory of Dave Täht: *1965-2025
https://libreqos.io/2025/04/01/in-loving-memory-of-dave/
https://www.linkedin.com/in/frantisekborsik
Signal, Telegram, WhatsApp: +421919416714
iMessage, mobile: +420775230885
Skype: casioa5302ca
frantisek.borsik@gmail.com
On Tue, Sep 9, 2025 at 10:25 PM David P. Reed <dpreed@deepplum.com> wrote:
> Hi Frank -
>
>
>
> I think it is interesting as a concept. A project I am advising has been
> using DPDK very effectively to get rid of the huge path and locking delays
> in the current Linux network stack. XDP2 could be supported in a ring3
> (user) address space, achieving a similar result.
Hi David,
The idea is you could write the code in XDP2 and it would be compiled to DPDK or eBPF and the compiler would handle the optimizations.
>
>
>
> But I don't think XDP2 is going that direction - so it may be stuck in
> the mess of kernel space networking. Adding eBPF only has made this more of
> a mess, by the way (and adding a new "compiler" that needs to be verified
> as safe for the kernel).
Think of XDP2 as the generalization of XDP to go beyond just the kernel. The idea is that users write their datapath code once and compile it to run on whatever targets they have: DPDK, P4, other programmable hardware, and yes, XDP/eBPF. It's really not limited to kernel networking.
As for the name XDP2, when we created XDP, eXpress DataPath, my vision was that it would be implementation agnostic. eBPF was the first instantiation for practicality, but now ten years later I think we can realize the initial vision.
Tom
>
> I will be watching how this evolves.
>
>
>
> David
>
>
>
> On Tuesday, September 9, 2025 06:32, "Frantisek Borsik" <frantisek.borsik@gmail.com> said:
>
> > Hello to all,
> >
> > Looks interesting:
> >
> > https://medium.com/@tom_84912/xdp2-this-changes-everything-at-least-for-ai-ml-infrastructure-850c1ba82771
> >
> >
> > All the best,
> >
> > Frank
> >
> > Frantisek (Frank) Borsik
> >
> >
> > *In loving memory of Dave Täht: *1965-2025
> >
> > https://libreqos.io/2025/04/01/in-loving-memory-of-dave/
> >
> >
> > https://www.linkedin.com/in/frantisekborsik
> >
> > Signal, Telegram, WhatsApp: +421919416714
> >
> > iMessage, mobile: +420775230885
> >
> > Skype: casioa5302ca
> >
> > frantisek.borsik@gmail.com
> > _______________________________________________
> > Cake mailing list -- cake@lists.bufferbloat.net
> > To unsubscribe send an email to cake-leave@lists.bufferbloat.net
> >
>