* [Cake] passing args to bpf programs
@ 2018-08-01 16:22 Dave Taht
2018-08-01 16:35 ` Stephen Hemminger
2018-08-01 16:36 ` Dave Taht
0 siblings, 2 replies; 10+ messages in thread
From: Dave Taht @ 2018-08-01 16:22 UTC
To: Cake List
This really isn't the right list for this, but I wanted to build on
the ack_filter bpf code I had to create impairments, like dropping
acks every X packets, or randomly, or when a specific pattern is seen
(like timestamps or sack). This is sort of the reverse complement to
getting the cake ack-filter right, now that I know everything that can
go wrong...
I see I can return TC_ACT_SHOT, so I can drop packets.
But what I can't quite figure out is how to pass args to a tc eBPF
program. Do I have to pass those via a file descriptor? A map
generated elsewhere? What? I sure as heck don't want to compile one
program per option...
Simplest args would be:
  max 16 - drop every 16th ack packet
  random 24 - drop randomly between 0 and 24
  match only certain flags
followed by more gnarly ones like:
  miscalculate if I have a payload or not
  drop sack
  mangle timestamps
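For the pattern matching mentioned above (acks carrying SACK or timestamp options), a bounded scan of the TCP option block might look roughly like this. It is a plain C userspace model, not tc eBPF code. The names `tcp_has_option`, `OPT_*` and `MAX_OPT_STEPS` are hypothetical, and the fixed iteration cap is an assumption about what an eBPF version would need, since loops there must have a provable bound or be unrolled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* TCP option kinds (standard values); the macro names are illustrative. */
#define OPT_EOL        0
#define OPT_NOP        1
#define OPT_SACK       5
#define OPT_TIMESTAMP  8
#define MAX_OPT_STEPS  40   /* the option area is at most 40 bytes */

/* Return true if the option block contains an option of the given kind. */
static bool tcp_has_option(const uint8_t *opts, size_t len, uint8_t kind)
{
    size_t i = 0;

    /* Fixed upper bound on iterations instead of an open-ended while loop. */
    for (int step = 0; step < MAX_OPT_STEPS && i < len; step++) {
        uint8_t k = opts[i];

        if (k == OPT_EOL)
            return false;            /* end of option list */
        if (k == OPT_NOP) {
            i++;                     /* single-byte padding */
            continue;
        }
        if (i + 1 >= len)
            return false;            /* truncated: no length byte */

        uint8_t optlen = opts[i + 1];
        if (optlen < 2 || i + optlen > len)
            return false;            /* malformed length */
        if (k == kind)
            return true;
        i += optlen;
    }
    return false;
}
```

A drop rule such as "drop sack" would then reduce to calling `tcp_has_option(opts, len, OPT_SACK)` on a packet already identified as an ack.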
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
* Re: [Cake] passing args to bpf programs
From: Stephen Hemminger @ 2018-08-01 16:35 UTC
To: Dave Taht; +Cc: Cake List
On Wed, 1 Aug 2018 09:22:41 -0700
Dave Taht <dave.taht@gmail.com> wrote:
> this really isn't the right list for this... but I wanted to build on
> the ack_filter bpf code I had, to create impairments, like dropping
> acks every X packets, or randomly, or when a specific pattern is seen
> (like timestamps or sack). This was sort of the reverse complement to
> getting the cake ack-filter right, now that I know everything that can
> go wrong...
>
> I see I can return ACT_SHOT, so I can drop packets.
>
> But what I can't quite figure out is how to pass args to an tc ebpf
> program. Do I have to pass those via a file descriptor? A map
> generated elsewhere? what? Sure as heck don't want to compile one
> program per opt....
>
> Simplest args would be:
>
> max 16 - drop every 16th ack packet
> random 24 - drop randomly between 0 24
> match only certain flags
>
> followed by more gnarly ones like:
>
> miscalculate if I have a payload or not
> drop sack
> mangle timestamps
>
With Xnetem, I ended up creating a map of config options.
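A map of config options, as described above, could be modelled roughly as follows. This is a plain C userspace stand-in and not Xnetem's actual code: `struct impair_cfg`, `cfg_lookup`, `cfg_update` and `should_drop_ack` are hypothetical names, and only the "drop every Nth ack" option is covered. In a real tc eBPF program the array would presumably be a BPF map (for example `BPF_MAP_TYPE_ARRAY`), read in the program with `bpf_map_lookup_elem()` and written from userspace, so one compiled program serves every option set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical option block; field names are illustrative only. */
struct impair_cfg {
    uint32_t drop_every;   /* "max 16": drop every Nth ack, 0 = disabled */
};

#define CFG_SLOTS 1
static struct impair_cfg cfg_map[CFG_SLOTS];   /* stand-in for a BPF array map */
static uint32_t ack_counter;                   /* acks seen since last drop */

/* Like a map lookup: a pointer to the slot, or NULL if the key is invalid. */
static struct impair_cfg *cfg_lookup(uint32_t key)
{
    return key < CFG_SLOTS ? &cfg_map[key] : NULL;
}

/* Like a userspace map update: 0 on success, -1 on a bad key. */
static int cfg_update(uint32_t key, struct impair_cfg cfg)
{
    if (key >= CFG_SLOTS)
        return -1;
    cfg_map[key] = cfg;
    return 0;
}

/* Called once per ack; true means "drop this one". */
static bool should_drop_ack(void)
{
    struct impair_cfg *cfg = cfg_lookup(0);

    if (!cfg || cfg->drop_every == 0)
        return false;              /* no config loaded, or option disabled */
    if (++ack_counter < cfg->drop_every)
        return false;
    ack_counter = 0;               /* Nth ack reached: drop and restart */
    return true;
}
```

Changing the behaviour at run time is then a matter of calling `cfg_update()` (or its map equivalent) with new values, with no recompilation.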
* Re: [Cake] passing args to bpf programs
From: Dave Taht @ 2018-08-01 16:36 UTC
To: Cake List
A somewhat related goal would be to apply the codel algorithm via bpf.
We'd take advantage of hardware multiqueue for the fq part, ensure a
good timestamp always exists on all ingress ports, and check it on
egress. The one major loop in codel could be unrolled to a fixed number
of iterations (giving up after that), and we're done there.
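The "fixed unroll and give up" idea can be sketched as a drop loop with a hard iteration cap. This is a simplified plain C model under loud assumptions: it omits the sojourn-time check that lets real CoDel leave the dropping state, it uses a coarse integer square root for the control law, and `struct codel_lite`, `MAX_DROPS` and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_DROPS 4   /* hypothetical cap on loop iterations per dequeue */

struct codel_lite {
    uint64_t drop_next;  /* time of the next scheduled drop, ns */
    uint64_t interval;   /* CoDel interval, ns */
    uint32_t count;      /* drops since entering the dropping state */
    bool     dropping;   /* true while in the dropping state */
};

/* Integer square root in exactly 16 iterations (no data-dependent loop). */
static uint32_t isqrt32(uint32_t n)
{
    uint32_t res = 0, bit = 1u << 30;

    for (int i = 0; i < 16; i++) {
        if (n >= res + bit) {
            n -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return res;
}

/* Number of packets to drop at this dequeue, never more than MAX_DROPS. */
static unsigned codel_lite_drops(struct codel_lite *s, uint64_t now)
{
    unsigned dropped = 0;

    for (int i = 0; i < MAX_DROPS; i++) {
        if (!s->dropping || now < s->drop_next)
            break;                       /* nothing (more) due yet */
        s->count++;
        dropped++;
        uint32_t root = isqrt32(s->count);
        if (root == 0)
            root = 1;                    /* guard against count wrapping to 0 */
        /* Control law: next drop after interval / sqrt(count). */
        s->drop_next += s->interval / root;
    }
    /* If the cap was hit, drop_next may still lie in the past: we give up
     * for this dequeue and catch up on later ones. */
    return dropped;
}
```

The trade-off is that after a long pause the state catches up over several dequeues instead of in one pass.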
* Re: [Cake] passing args to bpf programs
From: Jonathan Morton @ 2018-08-01 16:42 UTC
To: Dave Taht; +Cc: Cake List
> On 1 Aug, 2018, at 7:36 pm, Dave Taht <dave.taht@gmail.com> wrote:
>
> The one major loop in codel we could unroll to be a fixed unroll (and
> just give up), and we're done there.
The COBALT version only has a loop in the recovery phase, and that mainly to handle long pauses immediately following heavy congestion. The idle and marking phases do not loop.
- Jonathan Morton
* Re: [Cake] passing args to bpf programs
From: Dave Taht @ 2018-08-01 16:54 UTC
To: Jonathan Morton; +Cc: Cake List
The other thing I noticed while fiddling with bql and unshaped cake is
that bql, too, had gained the ability to limit rates at mbit
granularity when I wasn't looking. I am not sure if additional
hardware support is required, but:
https://patchwork.ozlabs.org/patch/449002/
On Wed, Aug 1, 2018 at 9:42 AM Jonathan Morton <chromatix99@gmail.com> wrote:
>
> > On 1 Aug, 2018, at 7:36 pm, Dave Taht <dave.taht@gmail.com> wrote:
> >
> > The one major loop in codel we could unroll to be a fixed unroll (and
> > just give up), and we're done there.
>
> The COBALT version only has a loop in the recovery phase, and that mainly to handle long pauses immediately following heavy congestion. The idle and marking phases do not loop.
>
> - Jonathan Morton
>
* Re: [Cake] passing args to bpf programs
From: Dave Taht @ 2018-08-01 17:25 UTC
To: Jonathan Morton; +Cc: Cake List
I wonder if ebpf has opcode space for an invsqrt?
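eBPF has no floating point, so an inverse square root would have to be done in integer arithmetic. One option, similar in spirit to the Newton step the Linux CoDel code applies to a cached reciprocal square root, is a fixed-point Newton-Raphson update. The sketch below is plain C with hypothetical names (`invsqrt_step`, `invsqrt_track`), and it assumes the estimate is refined once each time `count` grows by one, starting from 1.0 at `count` = 1.

```c
#include <stdint.h>

/* One Newton-Raphson step toward 1/sqrt(count).
 * inv is Q0.32 fixed point: the real value is inv / 2^32.
 * Computes x * (3 - count * x^2) / 2 using only integer operations. */
static uint32_t invsqrt_step(uint32_t inv, uint32_t count)
{
    uint32_t inv2 = (uint32_t)(((uint64_t)inv * inv) >> 32);  /* x^2 */
    uint64_t val = (3ULL << 32) - (uint64_t)count * inv2;     /* 3 - count*x^2 */

    val >>= 2;                            /* avoid overflow in the multiply */
    val = (val * inv) >> (32 - 2 + 1);    /* times x, divided by 2 */
    return (uint32_t)val;
}

/* Replay count = 2..count with one step per increment, the way a
 * stateful caller would keep its estimate up to date. */
static uint32_t invsqrt_track(uint32_t count)
{
    uint32_t inv = 0xFFFFFFFFu;           /* ~1.0, exact for count == 1 */

    for (uint32_t c = 2; c <= count; c++)
        inv = invsqrt_step(inv, c);
    return inv;
}
```

A single step per increment is only approximate (a few percent off at small counts), and the step is only safe when the previous estimate is not too large for the new `count`, which is why the sketch advances `count` one at a time.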
* [Cake] codel in ebpf?
From: Dave Taht @ 2018-08-01 19:20 UTC
To: Jonathan Morton; +Cc: Cake List
On Wed, Aug 1, 2018 at 10:25 AM Dave Taht <dave.taht@gmail.com> wrote:
>
> I wonder if ebpf has opcode space for an invsqrt?
bpf_ktime_get_ns() exists...
One thing I don't know is whether bpf can read/write the
skb->tstamp field. The plan would be to rigorously write it (if not
supplied by hw) on all ingress ports and check it on all egress ports.
That said, every time I've tried to do something in ebpf I hit a
limitation I'd not thunk of yet. For example, where can you attach the
egress filter?
My thought would be to use bfifo -> bpf -> bql, but from what little I
understand, it's bpf -> bfifo -> bql.
* Re: [Cake] codel in ebpf?
From: Toke Høiland-Jørgensen @ 2018-08-02 20:04 UTC
To: Dave Taht, Jonathan Morton; +Cc: Cake List
Dave Taht <dave.taht@gmail.com> writes:
> On Wed, Aug 1, 2018 at 10:25 AM Dave Taht <dave.taht@gmail.com> wrote:
>>
>> I wonder if ebpf has opcode space for an invsqrt?
>
> bpf_ktime_get_ns() exists...
>
> one thing that I don't know if bpf can do is read/write the
> skb->tstamp field. The plan would be to rigorously write it (if not
> supplied by hw) on all ingress ports and check it on all egress ports.
An XDP eBPF program (run at earliest possible ingress) has access to a
buffer of arbitrary data that is attached to the skb and that can be
read from later eBPF programs. So it doesn't need to muck with
skb->tstamp for this.
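The metadata idea can be illustrated with a small plain C model: an ingress step stores a timestamp in the space just in front of the packet bytes, and a later step reads it back to get the sojourn time. This is only a userspace sketch. `struct pkt`, `HEADROOM`, `stamp_ingress` and `sojourn_ns` are hypothetical names; in XDP the area would be reserved with `bpf_xdp_adjust_meta()` and the clock would be `bpf_ktime_get_ns()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEADROOM 32   /* hypothetical space in front of the packet */

struct pkt {
    uint8_t buf[HEADROOM + 64];
    size_t  data_meta;   /* start of metadata (== data when there is none) */
    size_t  data;        /* start of the packet bytes */
};

/* Ingress side: reserve 8 bytes of metadata and store the timestamp. */
static int stamp_ingress(struct pkt *p, uint64_t now_ns)
{
    if (p->data < sizeof(now_ns))
        return -1;                          /* no headroom left */
    p->data_meta = p->data - sizeof(now_ns);
    memcpy(&p->buf[p->data_meta], &now_ns, sizeof(now_ns));
    return 0;
}

/* Egress side: time spent since ingress, or -1 if never stamped. */
static int64_t sojourn_ns(const struct pkt *p, uint64_t now_ns)
{
    uint64_t stamp;

    if (p->data - p->data_meta != sizeof(stamp))
        return -1;                          /* metadata missing or wrong size */
    memcpy(&stamp, &p->buf[p->data_meta], sizeof(stamp));
    return (int64_t)(now_ns - stamp);
}
```

The returned sojourn time is what a codel-style check would compare against its target.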
> That said, every time I've tried to do something in ebpf I hit a
> limitation I'd not thunk of yet.
Yeah, the whole XDP/eBPF system is somewhat of a work in progress ;)
> For example, where can you attach the egress filter?
>
> My thought would be to use a bfifo > bpf -> bql, but from what little
> I understand, it's bpf -> bfifo -> bql
Yeah, it is. Don't think there's a way to run an eBPF program after the
qdisc...
-Toke
* Re: [Cake] codel in ebpf?
From: Dave Taht @ 2018-08-03 0:22 UTC
To: Toke Høiland-Jørgensen; +Cc: Jonathan Morton, Cake List
On Thu, Aug 2, 2018 at 1:04 PM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
> Dave Taht <dave.taht@gmail.com> writes:
>
> > On Wed, Aug 1, 2018 at 10:25 AM Dave Taht <dave.taht@gmail.com> wrote:
> >>
> >> I wonder if ebpf has opcode space for an invsqrt?
> >
> > bpf_ktime_get_ns() exists...
> >
> > one thing that I don't know if bpf can do is read/write the
> > skb->tstamp field. The plan would be to rigorously write it (if not
> > supplied by hw) on all ingress ports and check it on all egress ports.
>
> An XDP eBPF program (run at earliest possible ingress) has access to a
> buffer of arbitrary data that is attached to the skb and that can be
> read from later eBPF programs. So it doesn't need to muck with
> skb->tstamp for this.
It does? All I see is maps. If you clone or otherwise split the
packet, what happens?
> > That said, every time I've tried to do something in ebpf I hit a
> > limitation I'd not thunk of yet.
>
> Yeah, the whole XDP/eBPF system is somewhat of a work in progress ;)
>
> > For example, where can you attach the egress filter?
> >
> > My thought would be to use a bfifo > bpf -> bql, but from what little
> > I understand, it's bpf -> bfifo -> bql
>
> Yeah, it is. Don't think there's a way to run an eBPF program after the
> qdisc...
Well, it seems possible to move this typical part of an ebpf-specific qdisc
from ingress to egress. Gawd knows what that would break, but, essentially...
    filter = rcu_dereference_bh(q->filter_list);
retry:
    skb = dequeue();
    ...
    if (filter) {
        *qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
        result = tcf_classify(skb, filter, &res, false);
        switch (result) {
        case TC_ACT_SHOT:
            /* clean up stuff */
            goto retry;
        }
        ...
> -Toke
* Re: [Cake] codel in ebpf?
From: Toke Høiland-Jørgensen @ 2018-08-03 10:19 UTC
To: Dave Taht; +Cc: Jonathan Morton, Cake List
Dave Taht <dave.taht@gmail.com> writes:
> On Thu, Aug 2, 2018 at 1:04 PM Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>
>> Dave Taht <dave.taht@gmail.com> writes:
>>
>> > On Wed, Aug 1, 2018 at 10:25 AM Dave Taht <dave.taht@gmail.com> wrote:
>> >>
>> >> I wonder if ebpf has opcode space for an invsqrt?
>> >
>> > bpf_ktime_get_ns() exists...
>> >
>> > one thing that I don't know if bpf can do is read/write the
>> > skb->tstamp field. The plan would be to rigorously write it (if not
>> > supplied by hw) on all ingress ports and check it on all egress ports.
>>
>> An XDP eBPF program (run at earliest possible ingress) has access to a
>> buffer of arbitrary data that is attached to the skb and that can be
>> read from later eBPF programs. So it doesn't need to muck with
>> skb->tstamp for this.
>
> It does? All I see is maps.
See bpf_xdp_adjust_meta() - some description here:
http://cilium.readthedocs.io/en/latest/bpf/#program-types
> If you clone or otherwise split the packet, what happens?
Hmm, good question; I *think* the metadata is tied to the skb, so I
guess it should also be copied?
>> Yeah, it is. Don't think there's a way to run an eBPF program after the
>> qdisc...
>
> Well, it seems possible to move this typical part of an ebpf specific qdisc
> from ingress to egress. Gawd knows what that would break, but,
> essentially....
Well, for one thing, that would break the TC classification. And you'd
have to implement a new qdisc to do it; at which point, you might as
well implement the queueing there? But maybe a new generic eBPF hook
could be added to the qdisc dequeue code? If there's a compelling use
case, I've heard that the kernel people can be quite responsive to
requests for new features :)
-Toke
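What a filter hook on the dequeue path would do can be modelled in a few lines of plain C. This is not kernel code and no such hook is claimed to exist: `struct fifo`, `filter_fn`, `dequeue_filtered`, `drop_odd` and `MAX_RETRY` are hypothetical, packets are reduced to non-negative ints, and only the numeric verdict values mirror `TC_ACT_OK` and `TC_ACT_SHOT`.

```c
#include <stddef.h>

/* Verdicts with the same numeric values as TC_ACT_OK and TC_ACT_SHOT. */
enum verdict { ACT_OK = 0, ACT_SHOT = 2 };

/* Toy FIFO: "packets" are plain non-negative ints. */
struct fifo {
    const int *pkts;
    size_t     head;
    size_t     len;
};

typedef enum verdict (*filter_fn)(int pkt);

#define MAX_RETRY 8   /* hypothetical bound on consecutive drops */

/* Return the next packet the filter accepts, or -1 if the queue is
 * empty or the retry bound is exhausted. */
static int dequeue_filtered(struct fifo *q, filter_fn filter)
{
    for (int tries = 0; tries < MAX_RETRY && q->head < q->len; tries++) {
        int pkt = q->pkts[q->head++];

        if (!filter || filter(pkt) != ACT_SHOT)
            return pkt;
        /* Shot: a real qdisc would free the skb and fix up its
         * statistics here before trying the next packet. */
    }
    return -1;
}

/* Example filter: shoot every odd-numbered packet. */
static enum verdict drop_odd(int pkt)
{
    return (pkt & 1) ? ACT_SHOT : ACT_OK;
}
```

The retry bound plays the same role as the `goto retry` in the quoted fragment, except that it cannot spin indefinitely when the filter rejects a long run of packets.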