[Cake] ebpf policing?

Dave Taht dave.taht at gmail.com
Wed Aug 8 11:13:07 EDT 2018

On Wed, Aug 8, 2018 at 6:04 AM Jonathan Morton <chromatix99 at gmail.com> wrote:
> > On 7 Aug, 2018, at 3:12 am, Dave Taht <dave.taht at gmail.com> wrote:
> >
> >> Writing a modern policer in ebpf is feasible. it's got nsec
> >> timestamping, counters, threads, lots of potential parallelism. Loops
> >> that have to be carefully bound. The worst possible config api. A nice
> >> statistics export system.
> >>
> >> Classic policers use token buckets, but anyone up for codel,
> >> time_per_byte, framing awareness and deficits, and a little aqm?
> >
> > :crickets: :)
> >
> > I started taking a stab at writing a straight tc filter for this,
> > trying to learn enough about how the tc filter subsystem works
> > today - like, can you scribble on a packet? Can you keep local state?
> I actually thought about it myself, but while it's undoubtedly *possible* to write a policer in eBPF, I don't see an overwhelmingly good reason to actually do so.  It still makes the most sense to do it in C, for maximum performance, until the semantics are proved useful enough to bake in hardware.

For relative ease of coding, too: straight C is way easier than eBPF C.
I merely proved to myself that you could translate it to eBPF if
needed. And it is the rx side of Linux that struggles to reach even
1/10th the pps of the tx side, so pushing stuff like this into an
offload engine might be worthwhile long term. ISPs also have to deal
with packet floods and DDoS attacks...
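
To make the "time_per_byte, framing awareness" idea from the top of the
thread concrete, here is a minimal sketch in plain userspace C. It is
not the code I was experimenting with and not act_police; the struct,
the function names and the burst handling are all made up for
illustration. Instead of refilling a token bucket, it keeps one
timestamp ("when may the next packet go") and charges each packet its
length plus a per-packet framing overhead, in nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policer state; names are illustrative only. */
struct tpb_policer {
    uint64_t ns_per_byte; /* time cost of one byte at the configured rate */
    uint64_t burst_ns;    /* how far the schedule may lag behind "now" */
    uint64_t next_ns;     /* earliest time at which the next packet conforms */
};

/* Integer division keeps the sketch simple; it is only usable for rates
 * up to 1e9 bytes/s. A real implementation would use fixed point. */
static void tpb_init(struct tpb_policer *p, uint64_t rate_bytes_per_sec,
                     uint64_t burst_bytes)
{
    assert(rate_bytes_per_sec > 0 && rate_bytes_per_sec <= 1000000000ULL);
    p->ns_per_byte = 1000000000ULL / rate_bytes_per_sec;
    p->burst_ns = burst_bytes * p->ns_per_byte;
    p->next_ns = 0;
}

/* Returns true if the packet conforms (pass), false if it should be
 * dropped. "overhead" is the assumed per-packet framing cost in bytes. */
static bool tpb_conforms(struct tpb_policer *p, uint64_t now_ns,
                         uint32_t len, uint32_t overhead)
{
    uint64_t cost = ((uint64_t)len + overhead) * p->ns_per_byte;
    uint64_t earliest = now_ns > p->burst_ns ? now_ns - p->burst_ns : 0;
    uint64_t t = p->next_ns;

    /* Cap idle credit: the schedule may lag "now" by at most burst_ns. */
    if (t < earliest)
        t = earliest;
    /* Arrived ahead of schedule: drop and leave the state untouched. */
    if (t > now_ns)
        return false;
    p->next_ns = t + cost;
    return true;
}
```

With a rate of 1,000,000 bytes/s and a 3000 byte burst, three
back-to-back 1500 byte packets pass and the fourth is dropped until the
clock catches up. Deficits and any AQM on top are left out.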

> There does still seem to be an awful lot of boilerplate in a netfilter action, though.  Makes it harder to tease out what is actually going on.

Lord gawd this got complicated in the last decade.

> > What's this rcu thing do?
> Read-Copy-Update is a scheme for efficient, lock-free concurrent access, where most of the accesses are reads while changes are comparatively rare.  Basically there's a pointer to the real data, and the pointer can be switched atomically after a complete new set of data is constructed, then the old data can be deleted once all the read-only references to it are released.  It's a natural fit for configuration data, but would be a bad choice for realtime statistics - better to use atomic_add et al for that.

Well, it is used by the distributed bstats code to collect
incrementing counters and later merge them after the RCU
period. So for stats, OK.

For state, not OK.
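
The pointer-swap scheme Jonathan describes can be sketched in userspace
with C11 atomics. This is only an analogue: it does not use the kernel
RCU API (there the read and publish steps would be rcu_dereference()
and rcu_assign_pointer()), and it has no grace period, so the caller
gets the old pointer back and must know by some other means that no
reader still holds it. All names here are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical read-mostly configuration. */
struct policer_cfg {
    uint64_t rate_bps;
    uint32_t overhead;
};

static _Atomic(struct policer_cfg *) active_cfg; /* NULL until first publish */
static atomic_uint_fast64_t dropped_packets;     /* hot counter, no RCU */

/* Reader: one atomic load, then plain reads of an immutable snapshot. */
static uint64_t cfg_read_rate(void)
{
    struct policer_cfg *c =
        atomic_load_explicit(&active_cfg, memory_order_acquire);
    return c ? c->rate_bps : 0;
}

/* Writer: build a complete new copy, then switch the pointer atomically.
 * The previous copy is handed back via old_out; freeing it is only safe
 * once no reader can still be using it (not modelled here). */
static int cfg_publish(uint64_t rate_bps, uint32_t overhead,
                       struct policer_cfg **old_out)
{
    struct policer_cfg *fresh = malloc(sizeof(*fresh));
    if (!fresh)
        return -1;
    fresh->rate_bps = rate_bps;
    fresh->overhead = overhead;
    *old_out = atomic_exchange_explicit(&active_cfg, fresh,
                                        memory_order_acq_rel);
    return 0;
}

/* Statistics: a plain atomic add, as suggested in the quoted text. */
static void count_drop(void)
{
    atomic_fetch_add_explicit(&dropped_packets, 1, memory_order_relaxed);
}
```

The split is the point: configuration changes rarely and is replaced
wholesale, while per-packet state and counters change on every packet
and need a different mechanism.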

> It strikes me that the filter environment may differ from the qdisc environment in one crucial matter: concurrency.  A qdisc is always called with a lock held, so concurrency with respect to itself is not a factor, but maximum throughput is limited.  If that is *not* true of a filter action, then greater throughput should be feasible but the programming techniques required will be more subtle.  Does anyone know for certain which it is?

Many older filters do take a lock per packet, notably act_police. The
modernized act_skbedit doesn't.
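
As an illustration of the "more subtle" techniques needed without a
per-packet lock, here is a rough userspace sketch of a policer whose
only mutable state is a single timestamp updated with compare-and-swap.
It is an assumption about how one could do it, not how act_police or
act_skbedit work, and the names are invented. Note the retry loop is
unbounded, which plain C allows but the eBPF verifier would not.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock-free policer: config is read-only on the fast path,
 * next_ns is the only field written per packet. */
struct lf_policer {
    uint64_t ns_per_byte;     /* time cost of one byte */
    uint64_t burst_ns;        /* allowed lag of the schedule behind "now" */
    _Atomic uint64_t next_ns; /* earliest conforming time for next packet */
};

/* Returns true if the packet conforms, false if it should be dropped.
 * Safe to call concurrently from several threads on the same policer. */
static bool lf_conforms(struct lf_policer *p, uint64_t now_ns, uint32_t len)
{
    uint64_t cost = (uint64_t)len * p->ns_per_byte;
    uint64_t earliest = now_ns > p->burst_ns ? now_ns - p->burst_ns : 0;
    uint64_t seen = atomic_load_explicit(&p->next_ns, memory_order_relaxed);

    for (;;) {
        uint64_t t = seen < earliest ? earliest : seen;

        if (t > now_ns)
            return false; /* ahead of schedule, no state change */
        /* Claim the slot only if nobody else advanced next_ns meanwhile.
         * On failure, "seen" is reloaded with the current value. */
        if (atomic_compare_exchange_weak_explicit(&p->next_ns, &seen,
                                                  t + cost,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return true;
    }
}
```

Under contention each CPU may retry a few times, but no CPU ever waits
on another holding a lock. Anything needing more than one word of
state per packet would not fit this pattern so easily.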

>  - Jonathan Morton


Dave Täht
CEO, TekLibre, LLC
Tel: 1-669-226-2619
