From: Frantisek Borsik <frantisek.borsik@gmail.com>
To: Tom Herbert <tom@herbertland.com>
Cc: stephen@networkplumber.org,
BeckW--- via Bloat <bloat@lists.bufferbloat.net>,
BeckW@telekom.de, dpreed@deepplum.com,
cake@lists.bufferbloat.net, codel@lists.bufferbloat.net,
rpm@lists.bufferbloat.net
Subject: [Bloat] Re: [Cake] Re: XDP2 is here - from one and only Tom Herbert (almost to the date, 10 years after XDP was released)
Date: Tue, 16 Sep 2025 00:26:19 +0200 [thread overview]
Message-ID: <CAJUtOOh81gJBcrA8C=C9AHQhuRRDy5KaoLptKZ9_y4+SxW2T_w@mail.gmail.com> (raw)
In-Reply-To: <CALx6S35SqVqnPf+AgvkSW4f+3Kv4kh7HqzATMJRBP7rgLUZOXw@mail.gmail.com>
Fresh from Tom's oven:
https://medium.com/@tom_84912/programming-a-parser-in-xdp2-is-as-easy-as-pie-8f26c8b3e704
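For readers who haven't clicked through yet: the post is about writing protocol parsers declaratively. As a point of comparison, here is a toy, hand-rolled Ethernet/IPv4 parser in plain C. It is emphatically not the XDP2 API -- every name below is invented for illustration -- but it shows the kind of bounds-checking boilerplate a declarative parser generator takes off your hands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HLEN       14       /* Ethernet header length */
#define ETHERTYPE_IPV4 0x0800

/* Hypothetical flow-key struct, purely for illustration. */
struct flow_keys {
    uint32_t saddr, daddr;
    uint8_t  proto;
};

/* Returns 0 on success, -1 if the packet is too short or not IPv4.
 * Note how much of the code is length and version checking -- this is
 * exactly what a generated parser handles for you. */
static int parse_ipv4(const uint8_t *pkt, size_t len, struct flow_keys *keys)
{
    if (len < ETH_HLEN + 20)               /* Ethernet + minimal IPv4 header */
        return -1;
    uint16_t ethertype = (uint16_t)pkt[12] << 8 | pkt[13];
    if (ethertype != ETHERTYPE_IPV4)
        return -1;
    const uint8_t *ip = pkt + ETH_HLEN;
    if ((ip[0] >> 4) != 4)                 /* IP version nibble must be 4 */
        return -1;
    keys->proto = ip[9];                   /* protocol field */
    memcpy(&keys->saddr, ip + 12, 4);      /* source address */
    memcpy(&keys->daddr, ip + 16, 4);      /* destination address */
    return 0;
}

/* Canned test frame: EtherType IPv4, version/IHL 0x45, protocol 6 (TCP). */
static uint8_t example_pkt[34] = { [12] = 0x08, [13] = 0x00,
                                   [14] = 0x45, [23] = 6 };
```

A declarative parser spec replaces all of the above with a table of header types and transitions, which is what makes it retargetable to hardware.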
All the best,
Frank
Frantisek (Frank) Borsik
*In loving memory of Dave Täht: 1965-2025*
https://libreqos.io/2025/04/01/in-loving-memory-of-dave/
https://www.linkedin.com/in/frantisekborsik
Signal, Telegram, WhatsApp: +421919416714
iMessage, mobile: +420775230885
Skype: casioa5302ca
frantisek.borsik@gmail.com
On Mon, Sep 15, 2025 at 8:35 PM Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Sep 15, 2025 at 11:07 AM Frantisek Borsik
> <frantisek.borsik@gmail.com> wrote:
> >
> >
> > "There were a few NICs that offloaded eBPF, but they never really went
> mainstream."
> >
> > And even then, they were doing only 40 Gbps, like https://netronome.com
> and didn't even support full eBPF...
> >
> > "They only support a pretty small subset of eBPF (in particular they
> don't support the LPM map type, which was our biggest performance pain
> point), and have a pretty cool user replaceable firmware system. They also
> don't have the higher speeds - above 40 Gbps - where the offloading would
> be most useful."
>
> Yeah, the attempts at offloading eBPF were doomed to fail. It's a
> restricted model, lacks parallelism, doesn't support inline
> accelerators, and requires the eBPF VM, all of which make it a non-starter. DPDK
> would fail as well. The kernel/host environment and hardware
> environments are quite different. If we try to force the hardware to
> look like the host to make eBPF or DPDK portable then we'll lose the
> performance advantages of running in the hardware. We need a model
> that allows the software to adapt to HW, not the other way around (of
> course, in a perfect world we'd do software/hardware codesign from the
> get-go).
>
> >
> > Btw, Tom will be at FLOSS Weekly tomorrow (Tuesday), 12:20 EDT / 11:20
> CDT / 10:20 MDT / 9:20 PDT
>
> Can't wait!
>
> >
> > https://www.youtube.com/live/OBW5twvmHOI
> >
> >
> > All the best,
> >
> > Frank
> >
> > Frantisek (Frank) Borsik
> >
> >
> > In loving memory of Dave Täht: 1965-2025
> >
> > https://libreqos.io/2025/04/01/in-loving-memory-of-dave/
> >
> >
> > https://www.linkedin.com/in/frantisekborsik
> >
> > Signal, Telegram, WhatsApp: +421919416714
> >
> > iMessage, mobile: +420775230885
> >
> > Skype: casioa5302ca
> >
> > frantisek.borsik@gmail.com
> >
> >
> >
> > On Mon, Sep 15, 2025 at 5:16 PM Stephen Hemminger <
> stephen@networkplumber.org> wrote:
> >>
> >> On Mon, 15 Sep 2025 08:39:48 +0000
> >> BeckW--- via Bloat <bloat@lists.bufferbloat.net> wrote:
> >>
> >> > Programming networking hardware is a bit like programming 8-bit
> computers in the 1980s: the hardware is often too limited and varied to
> support useful abstractions. This is also true for CPU-based networking
> once you get into the >10 Gbps realm, when caching and pipelining
> architectures become relevant. Writing a network protocol compiler that
> produces efficient code for different NICs and different CPUs is a daunting
> task. And unlike with 8-bit computers, there are no simple metrics ('you
> need at least 32 KB of RAM to run this code' vs 'this NIC supports 4k queues
> with PIE, Codel', 'this CPU has 20 MB of Intel SmartCache').
> >>
> >> The Linux kernel still lacks an easy way to set up many features in
> SmartNICs. DPDK has rte_flow, which allows direct
> >> access to hardware flow processing. But DPDK lacks any reasonable form
> of shaper control.
> >>
> >> > eBPF is very close to what was described in this 1995 exokernel
> paper (
> https://pdos.csail.mit.edu/6.828/2008/readings/engler95exokernel.pdf).
> The idea of the exokernel was to have easily loadable, verified code in the
> kernel -- e.g. the security-critical task of assigning a packet to a session
> of a user -- and leave the rest of the protocol -- e.g. TCP retransmissions
> -- to user space. AFAIK few people use eBPF like this, but it should be
> possible.
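The exokernel split described above fits in a few lines of C: the only "kernel" duty is a verified demux from a packet's 5-tuple to the owning session, while retransmission and the rest of the protocol stay in userspace. The table layout and names below are hypothetical, purely to make the idea concrete.

```c
#include <stdint.h>

#define NSESSIONS 64

/* Hypothetical session table: owner 0 means the slot is free. */
struct session { uint32_t saddr, daddr; uint16_t sport, dport; int owner; };
static struct session table[NSESSIONS];

/* Userspace registers a claim on a 5-tuple (protocol elided for brevity). */
static void session_bind(uint32_t saddr, uint32_t daddr,
                         uint16_t sport, uint16_t dport, int owner)
{
    for (int i = 0; i < NSESSIONS; i++)
        if (!table[i].owner) {
            table[i] = (struct session){ saddr, daddr, sport, dport, owner };
            return;
        }
}

/* The entire "kernel" job in the exokernel model: map a packet to an
 * owner, or -1 if nobody claims it. Everything else -- retransmits,
 * congestion control, reassembly -- happens in the owner's userspace. */
static int demux(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport)
{
    for (int i = 0; i < NSESSIONS; i++) {
        struct session *s = &table[i];
        if (s->owner && s->saddr == saddr && s->daddr == daddr &&
            s->sport == sport && s->dport == dport)
            return s->owner;   /* deliver the raw packet to this user */
    }
    return -1;                 /* unclaimed: drop or default path */
}
```

Because this demux function is tiny and loop-bounded, it is exactly the kind of code a verifier (or eBPF) can check, which is the exokernel's point.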
> >> >
> >> > eBPF manages the abstraction part well, but sacrifices a lot of
> performance -- e.g. it lacks the aggressive batching that vpp / fd.io does. With
> DPDK, you often find out that your NIC's hardware or driver doesn't
> support the function that you hoped to use, and you end up optimizing for a
> particular piece of hardware. Even if driver and hardware support a functionality,
> it may very well be that hardware resources are too limited for your
> particular use case. The abstraction is there, but your code is still
> hardware specific.
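The batching point is easy to see with a toy cost model: pay the fixed per-invocation cost (dispatch, cold I-cache) once per vector of packets instead of once per packet, which is the core of vpp/fd.io's vector processing. The numbers below are invented; only the shape of the comparison matters.

```c
/* Toy cost model, not a benchmark: FIXED is the per-call overhead
 * (function dispatch, cache warm-up), PER_PKT the real per-packet work. */
#define BATCH 256
enum { FIXED = 100, PER_PKT = 10 };

/* One call per packet: FIXED is paid npkts times. */
static long cost_per_packet_path(int npkts)
{
    return (long)npkts * (FIXED + PER_PKT);
}

/* Vector processing: FIXED is paid once per batch of up to BATCH packets. */
static long cost_batched_path(int npkts)
{
    long cost = 0;
    for (int done = 0; done < npkts; done += BATCH) {
        int n = npkts - done < BATCH ? npkts - done : BATCH;
        cost += FIXED + (long)n * PER_PKT;
    }
    return cost;
}
```

For 1024 packets the per-packet path costs 1024 x 110 = 112640 units, while the batched path costs 4 x 100 + 1024 x 10 = 10640 -- the fixed overhead has been amortized more than 250-fold.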
> >>
> >> There were a few NICs that offloaded eBPF, but they never really went
> mainstream.
> >>
> >
> >
> >>
> >> > -----Original Message-----
> >> > From: David P. Reed <dpreed@deepplum.com>
> >> > Sent: Saturday, 13 September 2025 22:33
> >> > To: Tom Herbert <tom@herbertland.com>
> >> > Cc: Frantisek Borsik <frantisek.borsik@gmail.com>; Cake List <
> cake@lists.bufferbloat.net>; codel@lists.bufferbloat.net; bloat <
> bloat@lists.bufferbloat.net>; Jeremy Austin via Rpm <
> rpm@lists.bufferbloat.net>
> >> > Subject: [Bloat] Re: [Cake] Re: XDP2 is here - from one and only Tom
> Herbert (almost to the date, 10 years after XDP was released)
> >> >
> >> >
> >> > Tom -
> >> >
> >> > An architecture-independent network framework, free of the OS
> kernel's peculiarities, seems within reach (though a fair bit of work), and
> I think it would be a GOOD THING indeed. IMHO the Linux networking stack in
> the kernel is a horrific mess, and it doesn't have to be.
> >> >
> >> > The reason it doesn't have to be is that there should be no reason it
> cannot run in ring3/userland, just like DPDK. And it should be built using
> "real-time" userland programming techniques. (avoiding the generic linux
> scheduler). The ONLY reason for involving the scheduler would be because
> there aren't enough cores. Linux was designed to be a uniprocessor Unix,
> and that just is no longer true at all. With hyperthreading, too, one need
> never abandon a processor's context in userspace to run some "userland"
> application.
> >> >
> >> > This would rip a huge amount of kernel code out of the kernel. (at
> least 50%, and probably more). The security issues of all those 3rd-party
> network drivers would go away.
> >> >
> >> > And the performance would be much higher for networking. (Running in
> ring 3, especially if you don't do system calls, incurs no performance penalty,
> and interprocessor communication using shared memory has much lower latency
> than Linux IPC or mutexes.)
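The shared-memory path alluded to here is usually a single-producer/single-consumer ring: no syscalls, no mutexes on the fast path, just two cursors. Below is a simplified C11 sketch (memory-ordering subtleties such as acquire/release pairing are glossed over), not production code.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SZ 1024            /* must be a power of two for the mask trick */

/* One producer core writes head, one consumer core writes tail;
 * neither ever takes a lock or enters the kernel. */
struct ring {
    _Atomic uint32_t head, tail;
    void *slot[RING_SZ];
};

static int ring_push(struct ring *r, void *pkt)
{
    uint32_t h = atomic_load(&r->head);
    if (h - atomic_load(&r->tail) == RING_SZ)
        return -1;                       /* ring full */
    r->slot[h & (RING_SZ - 1)] = pkt;
    atomic_store(&r->head, h + 1);       /* publish to the consumer */
    return 0;
}

static void *ring_pop(struct ring *r)
{
    uint32_t t = atomic_load(&r->tail);
    if (t == atomic_load(&r->head))
        return 0;                        /* ring empty */
    void *pkt = r->slot[t & (RING_SZ - 1)];
    atomic_store(&r->tail, t + 1);       /* free the slot for the producer */
    return pkt;
}
```

This is essentially the structure DPDK's rings and io_uring's submission/completion queues are built on; the latency win over mutex-based IPC comes from never blocking and never trapping into the kernel.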
> >> >
> >> > I like the idea of a compilation based network stack, at a slightly
> higher level than C. eBPF is NOT what I have in mind - it's an interpreter
> with high overhead. The language should support high-performance
> co-routining - shared memory, ideally. I don't think GC is a good thing.
> Rust might be a good starting point because its memory management is safe.
> >> > To me, some of what the base of DPDK is like is good stuff. However,
> it isn't architecturally neutral.
> >> >
> >> > To me, the network stack should not be entangled with interrupt
> handling at all. "polling" is far more performant under load. The only use
> for interrupts is when the network stack is completely idle. That would be,
> in userland, a "wait for interrupt" call (not a poll). Ideally, on recent
> Intel machines, a userspace version of MONITOR/MWAIT.
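The poll-first, block-only-when-idle discipline described above can be sketched in C: spin on the queue for a bounded budget, and only fall back to a blocking wait (standing in here for an interrupt or MWAIT) once the queue is genuinely empty. All names and numbers below are illustrative.

```c
#define SPIN_BUDGET 1000

static volatile int queue_depth;  /* stand-in for a NIC RX ring's fill level */
static int blocked_waits;         /* how many times we fell back to blocking */

/* Placeholder for MWAIT / interrupt-driven wakeup: in this toy model it
 * just records the fallback and makes one packet appear. */
static void wait_for_work(void)
{
    blocked_waits++;
    queue_depth = 1;
}

/* Drain `iterations` packets, busy-polling while loaded and blocking
 * only when idle. Returns the number of packets processed. */
static int rx_loop(int iterations)
{
    int drained = 0;
    for (int i = 0; i < iterations; i++) {
        int spins = 0;
        while (queue_depth == 0 && ++spins < SPIN_BUDGET)
            ;                        /* busy poll: no syscall, no interrupt */
        if (queue_depth == 0)
            wait_for_work();         /* truly idle: stop burning CPU */
        queue_depth--;               /* "process" one packet */
        drained++;
    }
    return drained;
}
```

Under load the loop never blocks at all; interrupts (or MWAIT) are purely an idle-state power optimization, which is the point Reed is making.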
> >> >
> >> > Now I know that Linus and his crew are really NOT gonna like this.
> Linus is still thinking like MINIX, a uniprocessor time-sharing system with
> rich OS functions in the kernel and doing "file" reads and writes to
> communicate with the kernel state. But it is a much more modern way to
> think of real-time IO in a modern operating system. (Windows and macOS are
> also Unix-like, uniprocessor monolithic kernel designs).
> >> >
> >> > So, if XDP2 got away from the Linux kernel, it could be great.
> >> > BTW, io_uring, etc. are half-measures. They address getting away from
> interrupts toward polling, but they still make the mistake of keeping huge
> drivers in the kernel.
> >>
> >> DPDK already supports use of XDP as a way to do userspace networking.
> >> It is a good generic way to get packets in/out, but the dedicated
> userspace drivers allow
> >> for more access to hardware. The XDP abstraction gets in the way of
> little things like programming
> VLANs, etc.
> >>
> >> The tradeoff is that userspace networking works great for infrastructure
> (routers, switches, firewalls, etc.),
> >> but userspace networking for network stacks to applications is hard to
> do, and loses the isolation
> >> that the kernel provides.
> >>
> >> > > I think it is interesting as a concept. A project I am advising
> has been using DPDK very effectively to get rid of the huge path and
> locking delays in the current Linux network stack. XDP2 could be
> supported in a ring3 (user) address space, achieving a similar result.
> >> > Hi David,
> >> > The idea is you could write the code in XDP2 and it would be compiled
> to DPDK or eBPF and the compiler would handle the optimizations.
> >> > >
> >> > >
> >> > >
> >> > > But I don't think XDP2 is going that direction - so it may be
> stuck into the mess of kernel-space networking. Adding eBPF has only made
> this more of a mess, by the way (and adding a new "compiler" that needs
> to be verified as safe for the kernel).
> >> > Think of XDP2 as the generalization of XDP to go beyond just the
> kernel. The idea is that the user writes their datapath code once and they
> compile it to run on whatever targets they have -- DPDK, P4, other
> programmable hardware, and yes XDP/eBPF. It's really not limited to kernel
> networking.
> >> > As for the name XDP2: when we created XDP, the eXpress Data Path, my
> vision was that it would be implementation agnostic. eBPF was the first
> instantiation for practicality, but now ten years later I think we can
> realize the initial vision.
> >> > Tom
> >>
> >>
> >> At this point, different network architectures get focused at different
> use cases.
> >> The days of the one-size-fits-all networking of BSD Unix are over.
>
2025-09-09 10:32 [Bloat] " Frantisek Borsik
2025-09-09 20:25 ` [Bloat] Re: [Cake] " David P. Reed
2025-09-09 21:02 ` Frantisek Borsik
2025-09-09 21:36 ` [Bloat] Re: [Cake] " Tom Herbert
2025-09-10 8:54 ` BeckW
2025-09-10 13:59 ` Tom Herbert
2025-09-10 14:06 ` Tom Herbert
2025-09-13 20:33 ` David P. Reed
2025-09-13 20:58 ` Tom Herbert
2025-09-14 18:00 ` David P. Reed
2025-09-14 18:18 ` David Collier-Brown
2025-09-14 18:38 ` Tom Herbert
2025-09-15 8:39 ` BeckW
2025-09-15 15:16 ` Stephen Hemminger
2025-09-15 18:07 ` Frantisek Borsik
2025-09-15 18:35 ` Tom Herbert
2025-09-15 22:26 ` Frantisek Borsik [this message]
2025-09-15 23:16 ` David P. Reed
2025-09-16 0:05 ` Tom Herbert
[not found] ` <FR2PPFEFD18174CA00474D0DC8DBDA3EE00DC0EA@FR2PPFEFD18174C.DEUP281.PROD.OUTLOOK.COM>
  [not found]   ` <FR2PPFEFD18174CA00474D0DC8DBDA3EE00DC0EA@FR2PPFEFD18174C.DEUP281.PROD.OUTLOOK.COM>
2025-09-13 20:35 ` David P. Reed
2025-09-15 23:16 ` [Bloat] Re: [Rpm] Re: [Cake] " Robert McMahon
List information: https://lists.bufferbloat.net/postorius/lists/bloat.lists.bufferbloat.net/