* [Cake] issue with Cake and bpf filter
@ 2018-08-12 8:17 Pete Heist
2018-08-12 10:23 ` Pete Heist
0 siblings, 1 reply; 21+ messages in thread
From: Pete Heist @ 2018-08-12 8:17 UTC (permalink / raw)
To: Cake List
[-- Attachment #1: Type: text/plain, Size: 678 bytes --]
I’m seeing a panic (or infinite loop, not sure) when using Cake (not fq_codel or sfq) with a simple bpf filter that just sets skb->tc_classid to a fixed value. I haven’t investigated further yet, but since I’m away from the computer more these days I thought it better to post what I have so far in case someone wants to look at it.
The attached contains the code and setup script I’m using and a README.txt with instructions on how to reproduce it and some more notes. Be prepared that it’s something I’m doing wrong. :) However, I also probably shouldn’t be able to cause this to happen this easily either, regardless of where the actual issue is...
Pete
[-- Attachment #2: simple-bpf.tgz --]
[-- Type: application/octet-stream, Size: 1797 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-12 8:17 [Cake] issue with Cake and bpf filter Pete Heist
@ 2018-08-12 10:23 ` Pete Heist
2018-08-12 13:41 ` Jonathan Morton
0 siblings, 1 reply; 21+ messages in thread
From: Pete Heist @ 2018-08-12 10:23 UTC (permalink / raw)
To: Cake List
[-- Attachment #1: Type: text/plain, Size: 329 bytes --]
One more thing to add to this, when working with another bpf filter which is relatively similar to this simple one (although has some more innocuous looking code like map lookups and read-only operations on the skb) sometimes the attached is suddenly and repeatedly sent to both syslog and kern.log until the disk fills up...
[-- Attachment #2: cake_loop.txt --]
[-- Type: text/plain, Size: 5775 bytes --]
Aug 12 09:57:25 ubuntu kernel: [ 2408.152975] WARNING: CPU: 3 PID: 2304 at /home/a/src/sch_cake/sch_cake.c:2094 cake_dequeue+0x791/0xc70 [sch_cake]
Aug 12 09:57:25 ubuntu kernel: [ 2408.152975] Modules linked in: sch_cake(OE) nf_conntrack cls_bpf algif_hash af_alg binfmt_misc vmw_balloon btusb btrtl btbcm btintel bluetooth uvcvideo joydev input_leds videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media ecdh_generic intel_rapl_perf snd_ens1371 serio_raw snd_ac97_codec gameport snd_rawmidi snd_seq_device ac97_bus snd_pcm snd_timer snd soundcore shpchp mac_hid sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vmw_vsock_vmci_transport vsock vmw_vmci ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
Aug 12 09:57:25 ubuntu kernel: [ 2408.152997] aes_x86_64 crypto_simd glue_helper cryptd vmwgfx ttm psmouse drm_kms_helper syscopyarea sysfillrect sysimgblt e1000 fb_sys_fops ahci mptspi drm mptscsih libahci mptbase scsi_transport_spi i2c_piix4 pata_acpi
Aug 12 09:57:25 ubuntu kernel: [ 2408.153004] CPU: 3 PID: 2304 Comm: sshd Tainted: G W OE 4.15.0-30-generic #32-Ubuntu
Aug 12 09:57:25 ubuntu kernel: [ 2408.153004] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
Aug 12 09:57:25 ubuntu kernel: [ 2408.153006] RIP: 0010:cake_dequeue+0x791/0xc70 [sch_cake]
Aug 12 09:57:25 ubuntu kernel: [ 2408.153006] RSP: 0018:ffffadf9c11df8a0 EFLAGS: 00010282
Aug 12 09:57:25 ubuntu kernel: [ 2408.153007] RAX: 000000000000fffd RBX: ffff94c0747331b0 RCX: 000000000000fffd
Aug 12 09:57:25 ubuntu kernel: [ 2408.153008] RDX: 0000000000000000 RSI: 000000000000fffd RDI: ffff94c1796df9b0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153008] RBP: ffffadf9c11df8f8 R08: 0000000054bc4e6d R09: 0000000000000000
Aug 12 09:57:25 ubuntu kernel: [ 2408.153009] R10: 0000000000000000 R11: 0000000000000001 R12: 00000227839cc5b1
Aug 12 09:57:25 ubuntu kernel: [ 2408.153009] R13: ffff94c0747331c0 R14: ffff94c1718b0000 R15: 0000000000000000
Aug 12 09:57:25 ubuntu kernel: [ 2408.153010] FS: 00007faa37446900(0000) GS:ffff94c1796c0000(0000) knlGS:0000000000000000
Aug 12 09:57:25 ubuntu kernel: [ 2408.153011] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 12 09:57:25 ubuntu kernel: [ 2408.153012] CR2: 00007fff85e71058 CR3: 00000000b1436002 CR4: 00000000000606e0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153015] Call Trace:
Aug 12 09:57:25 ubuntu kernel: [ 2408.153016] __qdisc_run+0x61/0x320
Aug 12 09:57:25 ubuntu kernel: [ 2408.153018] __dev_queue_xmit+0x2ab/0x7d0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153019] dev_queue_xmit+0x10/0x20
Aug 12 09:57:25 ubuntu kernel: [ 2408.153021] ? dev_queue_xmit+0x10/0x20
Aug 12 09:57:25 ubuntu kernel: [ 2408.153022] ip_finish_output2+0x263/0x3c0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153024] ip_finish_output+0x198/0x260
Aug 12 09:57:25 ubuntu kernel: [ 2408.153025] ? ip_finish_output+0x198/0x260
Aug 12 09:57:25 ubuntu kernel: [ 2408.153027] ip_output+0x70/0xe0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153028] ip_local_out+0x3b/0x50
Aug 12 09:57:25 ubuntu kernel: [ 2408.153030] ip_queue_xmit+0x160/0x3e0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153031] ? memcg_kmem_charge+0x7e/0xe0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153033] tcp_transmit_skb+0x56a/0xa70
Aug 12 09:57:25 ubuntu kernel: [ 2408.153034] tcp_write_xmit+0x1c7/0xf40
Aug 12 09:57:25 ubuntu kernel: [ 2408.153036] ? __alloc_skb+0x9b/0x1e0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153037] __tcp_push_pending_frames+0x35/0xd0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153038] tcp_push+0xe4/0x110
Aug 12 09:57:25 ubuntu kernel: [ 2408.153039] tcp_sendmsg_locked+0xb6e/0xe70
Aug 12 09:57:25 ubuntu kernel: [ 2408.153041] tcp_sendmsg+0x2c/0x50
Aug 12 09:57:25 ubuntu kernel: [ 2408.153042] inet_sendmsg+0x2e/0xb0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153043] sock_sendmsg+0x3e/0x50
Aug 12 09:57:25 ubuntu kernel: [ 2408.153045] sock_write_iter+0x8c/0xf0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153046] new_sync_write+0xe7/0x140
Aug 12 09:57:25 ubuntu kernel: [ 2408.153048] __vfs_write+0x29/0x40
Aug 12 09:57:25 ubuntu kernel: [ 2408.153050] vfs_write+0xb1/0x1a0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153051] SyS_write+0x55/0xc0
Aug 12 09:57:25 ubuntu kernel: [ 2408.153052] do_syscall_64+0x73/0x130
Aug 12 09:57:25 ubuntu kernel: [ 2408.153054] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Aug 12 09:57:25 ubuntu kernel: [ 2408.153054] RIP: 0033:0x7faa352a9154
Aug 12 09:57:25 ubuntu kernel: [ 2408.153055] RSP: 002b:00007fff85e75188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Aug 12 09:57:25 ubuntu kernel: [ 2408.153056] RAX: ffffffffffffffda RBX: 000000000000006c RCX: 00007faa352a9154
Aug 12 09:57:25 ubuntu kernel: [ 2408.153056] RDX: 000000000000006c RSI: 000055a21c87a290 RDI: 0000000000000003
Aug 12 09:57:25 ubuntu kernel: [ 2408.153057] RBP: 000055a21c87d560 R08: 0000000000000000 R09: 0000000000000100
Aug 12 09:57:25 ubuntu kernel: [ 2408.153057] R10: 00007fff85e75100 R11: 0000000000000246 R12: 0000000000000000
Aug 12 09:57:25 ubuntu kernel: [ 2408.153058] R13: 000055a21c866ad0 R14: 0000000000000003 R15: 00007fff85e7520f
Aug 12 09:57:25 ubuntu kernel: [ 2408.153059] Code: ff 80 fa 06 75 1b 48 63 55 a8 0f b7 c8 48 8d 14 52 0f b7 b4 93 0a 68 01 00 66 39 c6 0f 43 ce 89 c8 66 3d 00 04 0f 86 70 fb ff ff <0f> 0b 41 8b 7d 10 85 ff 0f 8f 6e fb ff ff 0f b7 c0 44 0f b7 bb
Aug 12 09:57:25 ubuntu kernel: [ 2408.153079] ---[ end trace 9274605666353433 ]---
[-- Attachment #3: Type: text/plain, Size: 787 bytes --]
> On Aug 12, 2018, at 10:17 AM, Pete Heist <pete@heistp.net> wrote:
>
> I’m seeing a panic (or infinite loop, not sure) when using Cake (not fq_codel or sfq) with a simple bpf filter that just sets skb->tc_classid to a fixed value. I haven’t investigated further yet, but since I’m away from the computer more these days I thought it better to post what I have so far in case someone wants to look at it.
>
> The attached contains the code and setup script I’m using and a README.txt with instructions on how to reproduce it and some more notes. Be prepared that it’s something I’m doing wrong. :) However, I also probably shouldn’t be able to cause this to happen this easily either, regardless of where the actual issue is...
>
> Pete
> <simple-bpf.tgz>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-12 10:23 ` Pete Heist
@ 2018-08-12 13:41 ` Jonathan Morton
2018-08-12 19:42 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Morton @ 2018-08-12 13:41 UTC (permalink / raw)
To: Pete Heist; +Cc: Cake List
> On 12 Aug, 2018, at 1:23 pm, Pete Heist <pete@heistp.net> wrote:
>
> One more thing to add to this, when working with another bpf filter which is relatively similar to this simple one (although has some more innocuous looking code like map lookups and read-only operations on the skb) sometimes the attached is suddenly and repeatedly sent to both syslog and kern.log until the disk fills up...
> Aug 12 09:57:25 ubuntu kernel: [ 2408.152975] WARNING: CPU: 3 PID: 2304 at /home/a/src/sch_cake/sch_cake.c:2094 cake_dequeue+0x791/0xc70 [sch_cake]
> WARN_ON(host_load > CAKE_QUEUES);
Yeah, something is going seriously wrong here. It shouldn't be possible for that warning to trigger; if it is, then Cake's internal data structures are being corrupted somehow.
But Cake doesn't directly read tc_classid from an skb. So what could it be influencing?
- Jonathan Morton
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-12 13:41 ` Jonathan Morton
@ 2018-08-12 19:42 ` Toke Høiland-Jørgensen
2018-08-12 21:34 ` Jonathan Morton
0 siblings, 1 reply; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-08-12 19:42 UTC (permalink / raw)
To: Jonathan Morton, Pete Heist; +Cc: Cake List
Jonathan Morton <chromatix99@gmail.com> writes:
>> On 12 Aug, 2018, at 1:23 pm, Pete Heist <pete@heistp.net> wrote:
>>
>> One more thing to add to this, when working with another bpf filter which is relatively similar to this simple one (although has some more innocuous looking code like map lookups and read-only operations on the skb) sometimes the attached is suddenly and repeatedly sent to both syslog and kern.log until the disk fills up...
>
>> Aug 12 09:57:25 ubuntu kernel: [ 2408.152975] WARNING: CPU: 3 PID: 2304 at /home/a/src/sch_cake/sch_cake.c:2094 cake_dequeue+0x791/0xc70 [sch_cake]
>
>> WARN_ON(host_load > CAKE_QUEUES);
>
> Yeah, something is going seriously wrong here. It shouldn't be
> possible for that warning to trigger; if it is, then Cake's internal
> data structures are being corrupted somehow.
>
> But Cake doesn't directly read tc_classid from an skb. So what could
> it be influencing?
Yes it does; setting tc_classid is one way to set the flow class that is
read by the call to tcf_classify(). I guess the problem is that
cake_hash() is bypassed, so the *host_refcnt variables are not
incremented.
There's been another report of the same issue on github; haven't had
time to look into it yet, but I guess this is the reason...
I guess there are two options here; either always run the hashing
algorithm to select the host hashes, or make sure the srchost/dsthost
fields are set to 0 when tc classification is used. I think the latter
is the better approach; but I'm not sure if it's enough to just set the
fields to 0? There seem to be unconditional decrements of the refcnts in
multiple places?
-Toke
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-12 19:42 ` Toke Høiland-Jørgensen
@ 2018-08-12 21:34 ` Jonathan Morton
2018-08-13 0:21 ` Jonathan Morton
0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Morton @ 2018-08-12 21:34 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Pete Heist, Cake List
> On 12 Aug, 2018, at 10:42 pm, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
> Yes it does; setting tc_classid is one way to set the flow class that is
> read by the call to tcf_classify(). I guess the problem is that
> cake_hash() is bypassed, so the *host_refcnt variables are not
> incremented.
>
> There's been another report of the same issue on github; haven't had
> time to look into it yet, but I guess this is the reason...
I'll see what I can do tonight, now that I understand the problem.
- Jonathan Morton
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-12 21:34 ` Jonathan Morton
@ 2018-08-13 0:21 ` Jonathan Morton
2018-08-13 5:44 ` Jonathan Morton
2018-08-13 11:01 ` Toke Høiland-Jørgensen
0 siblings, 2 replies; 21+ messages in thread
From: Jonathan Morton @ 2018-08-13 0:21 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Pete Heist, Cake List
> On 13 Aug, 2018, at 12:34 am, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 12 Aug, 2018, at 10:42 pm, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>
>> Yes it does; setting tc_classid is one way to set the flow class that is
>> read by the call to tcf_classify(). I guess the problem is that
>> cake_hash() is bypassed, so the *host_refcnt variables are not
>> incremented.
>>
>> There's been another report of the same issue on github; haven't had
>> time to look into it yet, but I guess this is the reason...
>
> I'll see what I can do tonight, now that I understand the problem.
Oh, this is a mess. Ultimately the problem stems from having previously factored out choosing tin and flow to two distinct functions, but now there's an external mechanism for overriding both at once, which can only hook into one or the other.
The easiest fix is to just remove the broken support for setting the flow ID via a filter, but leave in the support for setting the tin. I think that's the most useful "simple" fix. It's perhaps worth noting that this was the first thing I *removed* when reworking fq_codel into the first version of Cake, because I couldn't see a valid use for it.
The next simplest fix is to ignore the flow ID override unless we're in "flows" mode. We can then make valid assumptions about what should go into the host tables.
The *right* fix, if we want to maximise functionality, would be to pass the result struct by reference into cake_hash(), where it can override the *host* IDs (not the flow ID). Users can then choose between using the override as a flow ID (by setting "hosts" mode instead of "flows"), or retaining the default host-isolation semantics with a revised definition of "host".
I feel compelled to point out that baggage like this is probably at least some of why Cake is no longer faster than HTB+fq_codel, as it used to be.
- Jonathan Morton
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-13 0:21 ` Jonathan Morton
@ 2018-08-13 5:44 ` Jonathan Morton
2018-08-13 11:01 ` Toke Høiland-Jørgensen
1 sibling, 0 replies; 21+ messages in thread
From: Jonathan Morton @ 2018-08-13 5:44 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Pete Heist, Cake List
> The easiest fix is to just remove the broken support for setting the flow ID via a filter, but leave in the support for setting the tin. I think that's the most useful "simple" fix.
I've pushed a temporary fix which does that.
- Jonathan Morton
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-13 0:21 ` Jonathan Morton
2018-08-13 5:44 ` Jonathan Morton
@ 2018-08-13 11:01 ` Toke Høiland-Jørgensen
2018-08-21 11:25 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-08-13 11:01 UTC (permalink / raw)
To: Jonathan Morton; +Cc: Pete Heist, Cake List
Jonathan Morton <chromatix99@gmail.com> writes:
>> On 13 Aug, 2018, at 12:34 am, Jonathan Morton <chromatix99@gmail.com> wrote:
>>
>>> On 12 Aug, 2018, at 10:42 pm, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>>
>>> Yes it does; setting tc_classid is one way to set the flow class that is
>>> read by the call to tcf_classify(). I guess the problem is that
>>> cake_hash() is bypassed, so the *host_refcnt variables are not
>>> incremented.
>>>
>>> There's been another report of the same issue on github; haven't had
>>> time to look into it yet, but I guess this is the reason...
>>
>> I'll see what I can do tonight, now that I understand the problem.
>
> Oh, this is a mess. Ultimately the problem stems from having
> previously factored out choosing tin and flow to two distinct
> functions, but now there's an external mechanism for overriding both
> at once, which can only hook into one or the other.
>
> The easiest fix is to just remove the broken support for setting the
> flow ID via a filter, but leave in the support for setting the tin. I
> think that's the most useful "simple" fix. It's perhaps worth noting
> that this was the first thing I *removed* when reworking fq_codel into
> the first version of Cake, because I couldn't see a valid use for it.
Well, seeing as lots of people have expressed interest in the feature
since we put it back in it seems that there *is* a use for it :)
> The next simplest fix is to ignore the flow ID override unless we're
> in "flows" mode. We can then make valid assumptions about what should
> go into the host tables.
>
> The *right* fix, if we want to maximise functionality, would be to
> pass the result struct by reference into cake_hash(), where it can
> override the *host* IDs (not the flow ID). Users can then choose
> between using the override as a flow ID (by setting "hosts" mode
> instead of "flows"), or retaining the default host-isolation semantics
> with a revised definition of "host".
Ah, making it possible to override both host and flow mode is a great
idea! I guess we could use the major/minor distinction in the class to
steer this. I'll see if I can't integrate this.
> I feel compelled to point out that baggage like this is probably at
> least some of why Cake is no longer faster than HTB+fq_codel, as it
> used to be.
The benchmarks showing HTB to be faster were performed before the
classification went in, so I very much doubt that :)
-Toke
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-13 11:01 ` Toke Høiland-Jørgensen
@ 2018-08-21 11:25 ` Toke Høiland-Jørgensen
2018-08-21 21:06 ` Pete Heist
0 siblings, 1 reply; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-08-21 11:25 UTC (permalink / raw)
To: Jonathan Morton; +Cc: Pete Heist, Cake List
>> The next simplest fix is to ignore the flow ID override unless we're
>> in "flows" mode. We can then make valid assumptions about what should
>> go into the host tables.
>>
>> The *right* fix, if we want to maximise functionality, would be to
>> pass the result struct by reference into cake_hash(), where it can
>> override the *host* IDs (not the flow ID). Users can then choose
>> between using the override as a flow ID (by setting "hosts" mode
>> instead of "flows"), or retaining the default host-isolation semantics
>> with a revised definition of "host".
>
> Ah, making it possible to override both host and flow mode is a great
> idea! I guess we could use the major/minor distinction in the class to
> steer this. I'll see if I can't integrate this.
So, I implemented this; in the latest commit on github it is again
possible to override the flow hashing by setting the class ID with a TC
filter; and the host hash can be overridden by setting the major number
of the class ID. In my testing the hangs from before are gone, but if
anyone else wants to test, please do!
I'll write up a description of the filter overrides in the man page, and
submit the change upstream as well...
-Toke
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-21 11:25 ` Toke Høiland-Jørgensen
@ 2018-08-21 21:06 ` Pete Heist
2018-08-21 21:17 ` Toke Høiland-Jørgensen
2018-08-22 6:17 ` Jonathan Morton
0 siblings, 2 replies; 21+ messages in thread
From: Pete Heist @ 2018-08-21 21:06 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Jonathan Morton, Cake List
[-- Attachment #1: Type: text/plain, Size: 2498 bytes --]
> On Aug 21, 2018, at 1:25 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
>>> The next simplest fix is to ignore the flow ID override unless we're
>>> in "flows" mode. We can then make valid assumptions about what should
>>> go into the host tables.
>>>
>>> The *right* fix, if we want to maximise functionality, would be to
>>> pass the result struct by reference into cake_hash(), where it can
>>> override the *host* IDs (not the flow ID). Users can then choose
>>> between using the override as a flow ID (by setting "hosts" mode
>>> instead of "flows"), or retaining the default host-isolation semantics
>>> with a revised definition of "host".
>>
>> Ah, making it possible to override both host and flow mode is a great
>> idea! I guess we could use the major/minor distinction in the class to
>> steer this. I'll see if I can't integrate this.
>
> So, I implemented this; in the latest commit on github it is again
> possible to override the flow hashing by setting the class ID with a TC
> filter; and the host hash can be overridden by setting the major number
> of the class ID. In my testing the hangs from before are gone, but if
> anyone else wants to test, please do!
>
> I'll write up a description of the filter overrides in the man page, and
> submit the change upstream as well...
Well that’s good timing for me as I’m wrapping up a small utility/eBPF to classify an arbitrary username to either MAC or IP. Here’s the work in progress, which is not done yet as flow fairness is still under construction, and I haven’t gotten my IPv6 support to pass the rather stubborn eBPF verifier: https://github.com/heistp/tc-users <https://github.com/heistp/tc-users>
With your new code Toke:
- I so far haven’t seen my VM either crash or suddenly fill its disk with logs, which is a bonus. :)
- With the new major/minor ID distinction, I’d probably use major for the user and minor for the flow hash?
Another thing I haven’t looked into yet is that when fq_codel is the qdisc, the eBPF action is only called "once in a while” (start of a new flow?) With cake it’s called for every single packet, which is what I expected to happen, but very different behavior.
Lastly, if anyone has time to review even just a little code for what is or is not good or idiomatic C, post an issue and I’d appreciate it. Yes, I yield to the ‘goto’ proponents when it comes to error handling and resource de-allocation. :)
Pete
[-- Attachment #2: Type: text/html, Size: 3539 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-21 21:06 ` Pete Heist
@ 2018-08-21 21:17 ` Toke Høiland-Jørgensen
2018-08-21 21:46 ` Pete Heist
2018-08-22 9:41 ` Toke Høiland-Jørgensen
2018-08-22 6:17 ` Jonathan Morton
1 sibling, 2 replies; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-08-21 21:17 UTC (permalink / raw)
To: Pete Heist; +Cc: Jonathan Morton, Cake List
On 21 August 2018 23:06:11 CEST, Pete Heist <pete@heistp.net> wrote:
>
>> On Aug 21, 2018, at 1:25 PM, Toke Høiland-Jørgensen <toke@toke.dk>
>wrote:
>>
>>>> The next simplest fix is to ignore the flow ID override unless
>we're
>>>> in "flows" mode. We can then make valid assumptions about what
>should
>>>> go into the host tables.
>>>>
>>>> The *right* fix, if we want to maximise functionality, would be to
>>>> pass the result struct by reference into cake_hash(), where it can
>>>> override the *host* IDs (not the flow ID). Users can then choose
>>>> between using the override as a flow ID (by setting "hosts" mode
>>>> instead of "flows"), or retaining the default host-isolation
>semantics
>>>> with a revised definition of "host".
>>>
>>> Ah, making it possible to override both host and flow mode is a
>great
>>> idea! I guess we could use the major/minor distinction in the class
>to
>>> steer this. I'll see if I can't integrate this.
>>
>> So, I implemented this; in the latest commit on github it is again
>> possible to override the flow hashing by setting the class ID with a
>TC
>> filter; and the host hash can be overridden by setting the major
>number
>> of the class ID. In my testing the hangs from before are gone, but if
>> anyone else wants to test, please do!
>>
>> I'll write up a description of the filter overrides in the man page,
>and
>> submit the change upstream as well...
>
>
>Well that’s good timing for me as I’m wrapping up a small utility/eBPF
>to classify an arbitrary username to either MAC or IP. Here’s the work
>in progress, which is not done yet as flow fairness is still under
>construction, and I haven’t gotten my IPv6 support to pass the rather
>stubborn eBPF verifier: https://github.com/heistp/tc-users
><https://github.com/heistp/tc-users>
Did you see my classifier? Does subnet-to-flow mapping. https://github.com/tohojo/tc-classifier
Feel free to reuse it in whole or in part...
>With your new code Toke:
>- I so far haven’t seen my VM either crash or suddenly fill its disk
>with logs, which is a bonus. :)
Awesome!
>- With the new major/minor ID distinction, I’d probably use major for
>the user and minor for the flow hash?
Yes. See the latest commit in the tc-adv repo for a man page update explaining it. You can also just set the major ID and let cake do the flow hashing...
>Another thing I haven’t looked into yet is that when fq_codel is the
>qdisc, the eBPF action is only called "once in a while” (start of a new
>flow?) With cake it’s called for every single packet, which is what I
>expected to happen, but very different behavior.
Maybe because fq_codel is not splitting gso packets?
>Lastly, if anyone has time to review even just a little code for what
>is or is not good or idiomatic C, post an issue and I’d appreciate it.
>Yes, I yield to the ‘goto’ proponents when it comes to error handling
>and resource de-allocation. :)
I'll take a look tomorrow :)
-Toke
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-21 21:17 ` Toke Høiland-Jørgensen
@ 2018-08-21 21:46 ` Pete Heist
2018-08-22 9:49 ` Toke Høiland-Jørgensen
2018-08-22 9:41 ` Toke Høiland-Jørgensen
1 sibling, 1 reply; 21+ messages in thread
From: Pete Heist @ 2018-08-21 21:46 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Jonathan Morton, Cake List
[-- Attachment #1: Type: text/plain, Size: 2181 bytes --]
> On Aug 21, 2018, at 11:17 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>
>> Well that’s good timing for me as I’m wrapping up a small utility/eBPF
>> to classify an arbitrary username to either MAC or IP. Here’s the work
>> in progress, which is not done yet as flow fairness is still under
>> construction, and I haven’t gotten my IPv6 support to pass the rather
>> stubborn eBPF verifier: https://github.com/heistp/tc-users
>> <https://github.com/heistp/tc-users <https://github.com/heistp/tc-users>>
>
> Did you see my classifier? Does subnet-to-flow mapping. https://github.com/tohojo/tc-classifier <https://github.com/tohojo/tc-classifier>
Yes I did, that helped a lot with the eBPF code! I’ll consult it for LPM trie usage, which will have to be ifdef’d out though for pre-4.11 kernels.
tc-users is similar really but I desired a few things (for FreeNet):
- MAC and IPv6 support
- to map arbitrary usernames to the least used class id
- to minimize the number of map changes when there are a lot of users to sync (not done yet)
- flow fairness (though given your new info below, I think that just got way easier)
- to write the userspace utility in C, for practice
>> - With the new major/minor ID distinction, I’d probably use major for
>> the user and minor for the flow hash?
>
> Yes. See the latest commit in the tc-adv repo for a man page update explaining it. You can also just set the major ID and let cake do the flow hashing…
Aha, that’s terribly convenient but also means I don’t really need to solve the hashing problem (rats), and will be ripping out some of what I started. :)
>> Another thing I haven’t looked into yet is that when fq_codel is the
>> qdisc, the eBPF action is only called "once in a while” (start of a new
>> flow?) With cake it’s called for every single packet, which is what I
>> expected to happen, but very different behavior.
>
> Maybe because fq_codel is not splitting gso packets?
Good one, I wonder, because I see it’s not just “new flow”, I seem to see it called again on the same flow if there’s a pause in packets on it for “some time”.
[-- Attachment #2: Type: text/html, Size: 7152 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-21 21:06 ` Pete Heist
2018-08-21 21:17 ` Toke Høiland-Jørgensen
@ 2018-08-22 6:17 ` Jonathan Morton
2018-08-22 9:30 ` Pete Heist
1 sibling, 1 reply; 21+ messages in thread
From: Jonathan Morton @ 2018-08-22 6:17 UTC (permalink / raw)
To: Pete Heist; +Cc: Toke Høiland-Jørgensen, Cake List
> On 22 Aug, 2018, at 12:06 am, Pete Heist <pete@heistp.net> wrote:
>
> when fq_codel is the qdisc, the eBPF action is only called "once in a while”
One difference between fq_codel and Cake is that the former - which has no shaper - will "bypass" packets when it's empty and there's no back-pressure filling it. In that case no packet classification occurs and filters will not be called. Or at least, that's how it used to be set up; I haven't looked at it recently. Cake does not rely on the same set of assumptions, so will always call the filter.
- Jonathan Morton
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-22 6:17 ` Jonathan Morton
@ 2018-08-22 9:30 ` Pete Heist
2018-08-22 9:37 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 21+ messages in thread
From: Pete Heist @ 2018-08-22 9:30 UTC (permalink / raw)
To: Jonathan Morton; +Cc: Toke Høiland-Jørgensen, Cake List
> On Aug 22, 2018, at 8:17 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 22 Aug, 2018, at 12:06 am, Pete Heist <pete@heistp.net> wrote:
>>
>> when fq_codel is the qdisc, the eBPF action is only called "once in a while”
>
> One difference between fq_codel and Cake is that the former - which has no shaper - will "bypass" packets when it's empty and there's no back-pressure filling it. In that case no packet classification occurs and filters will not be called. Or at least, that's how it used to be set up; I haven't looked at it recently. Cake does not rely on the same set of assumptions, so will always call the filter.
Aha, that sounds likely, I’ll try with htb and a rate limit. Testing with fq_codel was challenging as I had to “do stuff” until my printk’s were eventually called, but it’s easier now that I can use cake. I suppose in my case fq_codel’s behavior would be ok in production, because if there’s no queue then there’s no need to classify. Maybe in some other cases (like gathering stats), it could be problematic.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-22 9:30 ` Pete Heist
@ 2018-08-22 9:37 ` Toke Høiland-Jørgensen
2018-08-22 9:51 ` Pete Heist
0 siblings, 1 reply; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-08-22 9:37 UTC (permalink / raw)
To: Pete Heist, Jonathan Morton; +Cc: Cake List
Pete Heist <pete@heistp.net> writes:
>> On Aug 22, 2018, at 8:17 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>
>>> On 22 Aug, 2018, at 12:06 am, Pete Heist <pete@heistp.net> wrote:
>>>
>>> when fq_codel is the qdisc, the eBPF action is only called "once in a while”
>>
>> One difference between fq_codel and Cake is that the former - which
>> has no shaper - will "bypass" packets when it's empty and there's no
>> back-pressure filling it. In that case no packet classification
>> occurs and filters will not be called. Or at least, that's how it
>> used to be set up; I haven't looked at it recently. Cake does not
>> rely on the same set of assumptions, so will always call the filter.
>
> Aha, that sounds likely, I’ll try with htb and a rate limit. Testing
> with fq_codel was challenging as I had to “do stuff” until my printk’s
> were eventually called, but it’s easier now that I can use cake. I
> suppose in my case fq_codel’s behavior would be ok in production,
> because if there’s no queue then there’s no need to classify. Maybe in
> some other cases (like gathering stats), it could be problematic.
fq_codel turns off the bypass capability if you attach a tc filter to
it, though, so if the issue you're seeing is that you filter function is
not being called, that sounds... strange...
How do you check if the function is being called?
-Toke
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-21 21:17 ` Toke Høiland-Jørgensen
2018-08-21 21:46 ` Pete Heist
@ 2018-08-22 9:41 ` Toke Høiland-Jørgensen
1 sibling, 0 replies; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-08-22 9:41 UTC (permalink / raw)
To: Pete Heist; +Cc: Jonathan Morton, Cake List
Toke Høiland-Jørgensen <toke@toke.dk> writes:
>>Lastly, if anyone has time to review even just a little code for what
>>is or is not good or idiomatic C, post an issue and I’d appreciate it.
>>Yes, I yield to the ‘goto’ proponents when it comes to error handling
>>and resource de-allocation. :)
>
> I'll take a look tomorrow :)
I can tell from the code that you're coming from Go. But the only thing
I found strange on a cursory glance was your error handling method of
returning a pointer that is NULL on success. I know this corresponds to
"return 0 on success", but, well I was thrown off by the "return NULL"
statements in functions indicating success. The usual pattern is to
return the error code, and pass in a pointer to the error struct that is
only filled if the return code is != 0.
This is very much a matter of taste, though, and if you prefer your
current style just keep it :)
-Toke
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-21 21:46 ` Pete Heist
@ 2018-08-22 9:49 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-08-22 9:49 UTC (permalink / raw)
To: Pete Heist; +Cc: Jonathan Morton, Cake List
Pete Heist <pete@heistp.net> writes:
>> On Aug 21, 2018, at 11:17 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>>
>>> Well that’s good timing for me as I’m wrapping up a small utility/eBPF
>>> to classify an arbitrary username to either MAC or IP. Here’s the work
>>> in progress, which is not done yet as flow fairness is still under
>>> construction, and I haven’t gotten my IPv6 support to pass the rather
>>> stubborn eBPF verifier: https://github.com/heistp/tc-users
>>> <https://github.com/heistp/tc-users <https://github.com/heistp/tc-users>>
>>
>> Did you see my classifier? Does subnet-to-flow mapping.
>> https://github.com/tohojo/tc-classifier
>> <https://github.com/tohojo/tc-classifier>
>
> Yes I did, that helped a lot with the eBPF code! I’ll consult it for
> LPM trie usage, which will have to be ifdef’d out though for pre-4.11
> kernels.
Cool! Just wanted to make sure you were reinventing wheels on purpose,
not from lack of a prior stack of wheels to pick from ;)
(I very much consider my classifier a quick solution for a specific use
case; can totally understand you want more features :))
> tc-users is similar really but I desired a few things (for FreeNet):
> - MAC and IPv6 support
Might be useful to add rudimentary encapsulation support as well? I.e.,
at least ip{4,6}-ip{4,6} and vlan tags? Only a few lines of BPF code and
makes it useful in a few more scenarios...
>>> - With the new major/minor ID distinction, I’d probably use major for
>>> the user and minor for the flow hash?
>>
>> Yes. See the latest commit in the tc-adv repo for a man page update
>> explaining it. You can also just set the major ID and let cake do the
>> flow hashing…
>
> Aha, that’s terribly convenient but also means I don’t really need to
> solve the hashing problem (rats), and will be ripping out some of what
> I started. :)
Yeah, I think keeping the flow hashing but being able to specify the
host mapping could turn out to be quite the killer feature, actually.
Unfortunately there are not enough bits in the class ID to specify both
source and destination host IDs, so triple-isolate is not going to work
too well with this. But then I figure that anyone who is deploying this
with their own filtering probably has enough control over the network to
deploy separate CAKE instances in each direction...
-Toke
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-22 9:37 ` Toke Høiland-Jørgensen
@ 2018-08-22 9:51 ` Pete Heist
2018-08-22 10:00 ` Jonathan Morton
0 siblings, 1 reply; 21+ messages in thread
From: Pete Heist @ 2018-08-22 9:51 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Jonathan Morton, Cake List
[-- Attachment #1: Type: text/plain, Size: 2137 bytes --]
> On Aug 22, 2018, at 11:37 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
> Pete Heist <pete@heistp.net> writes:
>
>>> On Aug 22, 2018, at 8:17 AM, Jonathan Morton <chromatix99@gmail.com> wrote:
>>>
>>> One difference between fq_codel and Cake is that the former - which
>>> has no shaper - will "bypass" packets when it's empty and there's no
>>> back-pressure filling it. In that case no packet classification
>>> occurs and filters will not be called. Or at least, that's how it
>>> used to be set up; I haven't looked at it recently. Cake does not
>>> rely on the same set of assumptions, so will always call the filter.
>>
>> Aha, that sounds likely, I’ll try with htb and a rate limit. Testing
>> with fq_codel was challenging as I had to “do stuff” until my printk’s
>> were eventually called, but it’s easier now that I can use cake. I
>> suppose in my case fq_codel’s behavior would be ok in production,
>> because if there’s no queue then there’s no need to classify. Maybe in
>> some other cases (like gathering stats), it could be problematic.
>
> fq_codel turns off the bypass capability if you attach a tc filter to
> it, though, so if the issue you're seeing is that you filter function is
> not being called, that sounds... strange...
>
> How do you check if the function is being called?
With printk and "sudo tc exec bpf dbg” to tail the output. For example, in act_main of this file I just added one more line to print “act_main” if TCU_DEBUG is defined:
https://github.com/heistp/tc-users/blob/master/tc-users-bpf.c <https://github.com/heistp/tc-users/blob/master/tc-users-bpf.c>
And here’s an example of verifier weirdness (not that you need to look into this)…
If IPV6_SUPPORT_V2 is defined but TCU_DEBUG is not, the verifier error is:
"math between pkt pointer and 4294901760 is not allowed"
If IPV6_SUPPORT_V2 is defined but TCU_DEBUG *is* (which just defines the prink func and adds two printk lines, and otherwise works without IPV6_SUPPORT_V2 defined), the verifier error is:
"R3 bitwise operator &= on pointer prohibited"
[-- Attachment #2: Type: text/html, Size: 3299 bytes --]
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-22 9:51 ` Pete Heist
@ 2018-08-22 10:00 ` Jonathan Morton
2018-08-22 10:13 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Morton @ 2018-08-22 10:00 UTC (permalink / raw)
To: Pete Heist; +Cc: Toke Høiland-Jørgensen, Cake List
> On 22 Aug, 2018, at 12:51 pm, Pete Heist <pete@heistp.net> wrote:
>
> "math between pkt pointer and 4294901760 is not allowed"
As a possible clue here, 4294901760 == (2^32) - (2^16).
I suspect both errors are being caused by the call to memcpy(). This potentially inlines a substantial amount of code which may not be eBPF-clean.
- Jonathan Morton
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-22 10:00 ` Jonathan Morton
@ 2018-08-22 10:13 ` Toke Høiland-Jørgensen
2018-08-22 10:32 ` Pete Heist
0 siblings, 1 reply; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-08-22 10:13 UTC (permalink / raw)
To: Jonathan Morton, Pete Heist; +Cc: Cake List
Jonathan Morton <chromatix99@gmail.com> writes:
>> On 22 Aug, 2018, at 12:51 pm, Pete Heist <pete@heistp.net> wrote:
>>
>> "math between pkt pointer and 4294901760 is not allowed"
>
> As a possible clue here, 4294901760 == (2^32) - (2^16).
>
> I suspect both errors are being caused by the call to memcpy(). This
> potentially inlines a substantial amount of code which may not be
> eBPF-clean.
Yeah, I think this is a good guess. You could try using
__builtin_memcpy() instead; that may do fewer pointer shenanigans than
the function-header-defined one...
-Toke
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [Cake] issue with Cake and bpf filter
2018-08-22 10:13 ` Toke Høiland-Jørgensen
@ 2018-08-22 10:32 ` Pete Heist
0 siblings, 0 replies; 21+ messages in thread
From: Pete Heist @ 2018-08-22 10:32 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Jonathan Morton, Cake List
> On Aug 22, 2018, at 12:13 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
> Jonathan Morton <chromatix99@gmail.com> writes:
>
>>> On 22 Aug, 2018, at 12:51 pm, Pete Heist <pete@heistp.net> wrote:
>>>
>>> "math between pkt pointer and 4294901760 is not allowed"
>>
>> As a possible clue here, 4294901760 == (2^32) - (2^16).
>>
>> I suspect both errors are being caused by the call to memcpy(). This
>> potentially inlines a substantial amount of code which may not be
>> eBPF-clean.
>
> Yeah, I think this is a good guess. You could try using
> __builtin_memcpy() instead; that may do fewer pointer shenanigans than
> the function-header-defined one...
Thanks for those ideas. It sounded good to me too, but __builtin_memcpy does the same, as does replacing memcpy with a manually unrolled loop (committed as IPV6_SUPPORT_V3).
Backing up, I’m not even sure why I have to put the IPv6 address on the stack. For IPv4, I just have a pointer to a “struct iphdr” set at an offset from head, not on the stack at all, and it works fine to reference &ip4->daddr directly. For IPv6 (IPV6_SUPPORT_V1), it doesn’t. I have to look at that (2^32) - (2^16) value, seems key, and I might find some clues in the example Dave sent. Or understand the bytecode… :)
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2018-08-22 10:32 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-12 8:17 [Cake] issue with Cake and bpf filter Pete Heist
2018-08-12 10:23 ` Pete Heist
2018-08-12 13:41 ` Jonathan Morton
2018-08-12 19:42 ` Toke Høiland-Jørgensen
2018-08-12 21:34 ` Jonathan Morton
2018-08-13 0:21 ` Jonathan Morton
2018-08-13 5:44 ` Jonathan Morton
2018-08-13 11:01 ` Toke Høiland-Jørgensen
2018-08-21 11:25 ` Toke Høiland-Jørgensen
2018-08-21 21:06 ` Pete Heist
2018-08-21 21:17 ` Toke Høiland-Jørgensen
2018-08-21 21:46 ` Pete Heist
2018-08-22 9:49 ` Toke Høiland-Jørgensen
2018-08-22 9:41 ` Toke Høiland-Jørgensen
2018-08-22 6:17 ` Jonathan Morton
2018-08-22 9:30 ` Pete Heist
2018-08-22 9:37 ` Toke Høiland-Jørgensen
2018-08-22 9:51 ` Pete Heist
2018-08-22 10:00 ` Jonathan Morton
2018-08-22 10:13 ` Toke Høiland-Jørgensen
2018-08-22 10:32 ` Pete Heist
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox