[Make-wifi-fast] emulating wifi better - coupling qdiscs in netem?
Jesper Dangaard Brouer
brouer at redhat.com
Sun Jun 17 07:19:21 EDT 2018
Hi Pete,
Happened to be at the Netfilter Workshop, and discussed nfqueue with
Florian and Marek, and I saw this attempt to use nfqueue, and Florian
points out that you are not using the GRO facility of nfqueue.
I'll quote what Florian said below:
On Sun, 17 Jun 2018 12:45:52 +0200 Florian Westphal <fw at strlen.de> wrote:
> The linked example code is old and does not set
> mnl_attr_put_u32(nlh, NFQA_CFG_FLAGS, htonl(NFQA_CFG_F_GSO));
>
> When requesting the queue.
>
> This means kernel has to do software segmentation of GSO skbs.
>
> Consider using
> https://git.netfilter.org/libnetfilter_queue/tree/examples/nf-queue.c
>
> instead if you need a template, it does this correctly.
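
For reference, the configuration step in that nf-queue.c example looks
roughly like the following (a sketch, not tested here; it assumes nl is an
mnl socket that has already bound the queue with NFQNL_CFG_CMD_BIND):

#include <stdio.h>
#include <stdlib.h>
#include <arpa/inet.h>
#include <libmnl/libmnl.h>
#include <linux/netfilter/nfnetlink_queue.h>
#include <libnetfilter_queue/libnetfilter_queue.h>

static void enable_gso(struct mnl_socket *nl, unsigned int queue_num)
{
    char buf[MNL_SOCKET_BUFFER_SIZE];
    struct nlmsghdr *nlh;

    nlh = nfq_nlmsg_put(buf, NFQNL_MSG_CONFIG, queue_num);
    nfq_nlmsg_cfg_put_params(nlh, NFQNL_COPY_PACKET, 0xffff);

    /* ask the kernel to hand over GSO super-packets instead of
     * software-segmenting them before queueing to userspace */
    mnl_attr_put_u32(nlh, NFQA_CFG_FLAGS, htonl(NFQA_CFG_F_GSO));
    mnl_attr_put_u32(nlh, NFQA_CFG_MASK, htonl(NFQA_CFG_F_GSO));

    if (mnl_socket_sendto(nl, nlh, nlh->nlmsg_len) < 0) {
        perror("mnl_socket_sendto");
        exit(EXIT_FAILURE);
    }
}
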
--Jesper
On Sun, 17 Jun 2018 00:53:03 +0200 Pete Heist <pete at heistp.net> wrote:
> > On Jun 16, 2018, at 12:30 AM, Dave Taht <dave.taht at gmail.com> wrote:
> >
> > Eric just suggested using the iptables NFQUEUE ability to toss
> > packets to userspace.
> >
> > https://home.regit.org/netfilter-en/using-nfqueue-and-libnetfilter_queue/
> > For wifi, at least, timings are not hugely critical, a few hundred
> > usec is something userspace can handle reasonably accurately. I like
> > very much being able to separate out mcast and treat that correctly in
> > userspace, also. I did want to be below 10usec (wifi "bus"
> > arbitration), which I am dubious about....
> >
> > Now as for an implementation language? C++? C? Go? Python? The
> > condition of the wrapper library for Go leaves a bit to be desired
> > ( https://github.com/chifflier/nfqueue-go ) and given a choice I'd
> > MUCH rather use a go than a C.
>
> This sounds cool... So for fun, I compared ping and iperf3 with no-op nfqueue callbacks in both C and Go. As for the hardware setup, I used two lxc containers (effectively just veth) on an APU2.
>
> For the Go program, I used test_nfqueue from the wrapper above (which yes, does need some work) and removed debugging / logging.
>
> For the C program I used this:
> https://github.com/irontec/netfilter-nfqueue-samples/blob/master/sample-helloworld.c
> I removed any per-packet printf calls and compiled with "gcc sample-helloworld.c -o nfq -lnfnetlink -lnetfilter_queue".
>
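(For reference, once the per-packet printf calls are stripped, the no-op
callback in that sample reduces to roughly this; a sketch of the old
libnetfilter_queue callback API, not a verbatim copy of the file:)

#include <stdint.h>
#include <arpa/inet.h>
#include <linux/netfilter.h>            /* NF_ACCEPT */
#include <libnetfilter_queue/libnetfilter_queue.h>

/* No-op callback: accept every packet unmodified. */
static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    uint32_t id = 0;
    struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);

    if (ph)
        id = ntohl(ph->packet_id);

    /* verdict NF_ACCEPT, payload left untouched */
    return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
}
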
> Ping results:
>
> ping without nfqueue:
> root at lsrv:~# iptables -F OUTPUT
> root at lsrv:~# ping -c 500 -i 0.01 -q 10.182.122.11
> 500 packets transmitted, 500 received, 0% packet loss, time 7985ms
> rtt min/avg/max/mdev = 0.056/0.058/0.185/0.011 ms
>
> ping with no-op nfqueue callback in C:
> root at lsrv:~# iptables -A OUTPUT -d 10.182.122.11/32 -j NFQUEUE --queue-num 0
> root at lsrv:~/nfqueue# ping -c 500 -i 0.01 -q 10.182.122.11
> 500 packets transmitted, 500 received, 0% packet loss, time 7981ms
> rtt min/avg/max/mdev = 0.117/0.123/0.384/0.020 ms
>
> ping with no-op nfqueue callback in Go:
> root at lsrv:~# iptables -A OUTPUT -d 10.182.122.11/32 -j NFQUEUE --queue-num 0
> root at lsrv:~# ping -c 500 -i 0.01 -q 10.182.122.11
> 500 packets transmitted, 500 received, 0% packet loss, time 7982ms
> rtt min/avg/max/mdev = 0.095/0.172/0.532/0.042 ms
>
> The mean induced latency (average RTT minus the 58us no-nfqueue baseline) of 65us for C or 114us for Go might be within your parameters, except you mentioned 10us for WiFi bus arbitration, which does indeed look impossible with this setup, even in C.
>
> Iperf3 results:
>
> iperf3 without nfqueue:
> root at lsrv:~# iptables -F OUTPUT
> root at lsrv:~# iperf3 -t 5 -c 10.182.122.11
> Connecting to host 10.182.122.11, port 5201
> [ 4] local 10.182.122.1 port 55810 connected to 10.182.122.11 port 5201
> [ ID] Interval Transfer Bandwidth Retr Cwnd
> [ 4] 0.00-1.00 sec 452 MBytes 3.79 Gbits/sec 0 178 KBytes
> [ 4] 1.00-2.00 sec 454 MBytes 3.82 Gbits/sec 0 320 KBytes
> [ 4] 2.00-3.00 sec 450 MBytes 3.77 Gbits/sec 0 320 KBytes
> [ 4] 3.00-4.00 sec 451 MBytes 3.79 Gbits/sec 0 352 KBytes
> [ 4] 4.00-5.00 sec 451 MBytes 3.79 Gbits/sec 0 352 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-5.00 sec 2.21 GBytes 3.79 Gbits/sec 0 sender
> [ 4] 0.00-5.00 sec 2.21 GBytes 3.79 Gbits/sec receiver
> iperf Done.
>
> iperf3 with no-op nfqueue callback in C:
> root at lsrv:~# iptables -A OUTPUT -d 10.182.122.11/32 -j NFQUEUE --queue-num 0
> root at lsrv:~/nfqueue# iperf3 -t 5 -c 10.182.122.11
> Connecting to host 10.182.122.11, port 5201
> [ 4] local 10.182.122.1 port 55868 connected to 10.182.122.11 port 5201
> [ ID] Interval Transfer Bandwidth Retr Cwnd
> [ 4] 0.00-1.00 sec 17.4 MBytes 146 Mbits/sec 0 107 KBytes
> [ 4] 1.00-2.00 sec 16.9 MBytes 142 Mbits/sec 0 107 KBytes
> [ 4] 2.00-3.00 sec 17.0 MBytes 142 Mbits/sec 0 107 KBytes
> [ 4] 3.00-4.00 sec 17.0 MBytes 142 Mbits/sec 0 107 KBytes
> [ 4] 4.00-5.00 sec 17.0 MBytes 143 Mbits/sec 0 115 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-5.00 sec 85.3 MBytes 143 Mbits/sec 0 sender
> [ 4] 0.00-5.00 sec 84.7 MBytes 142 Mbits/sec receiver
>
> iperf3 with no-op nfqueue callback in Go:
> root at lsrv:~# iptables -A OUTPUT -d 10.182.122.11/32 -j NFQUEUE --queue-num 0
> root at lsrv:~# iperf3 -t 5 -c 10.182.122.11
> Connecting to host 10.182.122.11, port 5201
> [ 4] local 10.182.122.1 port 55864 connected to 10.182.122.11 port 5201
> [ ID] Interval Transfer Bandwidth Retr Cwnd
> [ 4] 0.00-1.00 sec 14.6 MBytes 122 Mbits/sec 0 96.2 KBytes
> [ 4] 1.00-2.00 sec 14.1 MBytes 118 Mbits/sec 0 96.2 KBytes
> [ 4] 2.00-3.00 sec 14.0 MBytes 118 Mbits/sec 0 102 KBytes
> [ 4] 3.00-4.00 sec 14.0 MBytes 117 Mbits/sec 0 102 KBytes
> [ 4] 4.00-5.00 sec 13.7 MBytes 115 Mbits/sec 0 107 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-5.00 sec 70.5 MBytes 118 Mbits/sec 0 sender
> [ 4] 0.00-5.00 sec 69.9 MBytes 117 Mbits/sec receiver
> iperf Done.
>
> So rats, throughput gets brutalized for both C and Go. For Go, a rate of 117 Mbit with a 1500 byte MTU is 9750 packets/sec, which is 103us / packet. Mean induced latency measured by ping is 114us, which is not far off 103us, so the rate slowdown looks to be mostly caused by the per-packet nfqueue calls. The core running test_nfqueue is pinned at 100% during the test. "nice -n -20" does nothing.
>
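(Sanity-checking that arithmetic as a trivial program, using the same
1500-byte-per-packet assumption:)

#include <stdio.h>

int main(void)
{
    double rate_bps   = 117e6;               /* measured iperf3 rate */
    double pkt_bits   = 1500.0 * 8;          /* one full-MTU packet */
    double pps        = rate_bps / pkt_bits; /* ~9750 packets/sec */
    double us_per_pkt = 1e6 / pps;           /* ~103 us per packet */

    printf("%.0f pkt/s, %.0f us/pkt\n", pps, us_per_pkt);
    return 0;
}
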
> Presumably you’ll sometimes be releasing more than one packet at a time(?) so I guess whether or not this is workable depends on how many you release at once, what hardware you’re on and what rates you need to test at. But when you’re trying to test a qdisc, I guess you’d want to minimize the burden you add to the CPU, or else move it to a core the qdisc isn’t running on, or something, so the qdisc itself isn’t affected by the test rig.
>
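(One way to do that core separation is to pin the nfqueue daemon to a spare
core, e.g. with sched_setaffinity as sketched below, or simply by launching
it under taskset; the core number is arbitrary and this is untested:)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Pin the calling process to a single CPU so its per-packet work stays
 * off the core running the qdisc under test. */
static void pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) { /* 0 = self */
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }
}
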
> > There is of course a hideous amount of complexity moved to the daemon,
>
> I can only imagine.
>
> > as a pure fifo ap queue forms aggregates much differently
> > than a fq_codeled one. But, yea! userspace....
>
> This would be awesome if it works out! After that iperf3 test though, I think I may have smashed my dreams of writing a libnetfilter_queue userspace qdisc in Go, or C for that matter.
>
> If this does somehow turn out to be good enough performance-wise, I think you’d have a lot more fun and spend a lot less time on it in Go than C, but that’s just an opinion... :)
>
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer