[Make-wifi-fast] emulating wifi better - coupling qdiscs in netem?

Pete Heist pete at heistp.net
Sat Jun 16 18:53:03 EDT 2018


> On Jun 16, 2018, at 12:30 AM, Dave Taht <dave.taht at gmail.com> wrote:
> 
> Eric just suggested using the iptables NFQUEUE ability to toss
> packets to userspace.
> 
> https://home.regit.org/netfilter-en/using-nfqueue-and-libnetfilter_queue/
> For wifi, at least, timings are not hugely critical, a few hundred
> usec is something userspace can handle reasonably accurately. I like
> very much being able to separate out mcast and treat that correctly in
> userspace, also. I did want to be below 10usec (wifi "bus"
> arbitration), which I am dubious about....
> 
> Now as for an implementation language? C++ C? Go? Python? The
> condition of the wrapper library for go leaves a bit to be desired
> ( https://github.com/chifflier/nfqueue-go ) and given a choice I'd
> MUCH rather use a go than a C.

This sounds cool... So for fun, I compared ping and iperf3 with no-op nfqueue callbacks in both C and Go. For the test setup, I used two LXC containers (so effectively just veth) on an APU2.

For the Go program, I used test_nfqueue from the wrapper above (which yes, does need some work) and removed debugging / logging.

For the C program I used this:
https://github.com/irontec/netfilter-nfqueue-samples/blob/master/sample-helloworld.c
I removed any per-packet printf calls and compiled with "gcc sample-helloworld.c -o nfq -lnfnetlink -lnetfilter_queue".
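
For reference, with the printfs gone the C version reduces to essentially this (a sketch along the lines of the linked sample, not verbatim what I ran):

#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <linux/netfilter.h>                      /* NF_ACCEPT */
#include <libnetfilter_queue/libnetfilter_queue.h>

/* No-op callback: issue an ACCEPT verdict without touching the payload. */
static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
        struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
        uint32_t id = ph ? ntohl(ph->packet_id) : 0;
        return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
}

int main(void)
{
        struct nfq_handle *h = nfq_open();
        if (!h) exit(1);
        nfq_unbind_pf(h, AF_INET);                /* may fail harmlessly on newer kernels */
        nfq_bind_pf(h, AF_INET);
        struct nfq_q_handle *qh = nfq_create_queue(h, 0, &cb, NULL); /* --queue-num 0 */
        if (!qh) exit(1);
        nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff);
        char buf[65536];
        int fd = nfq_fd(h), rv;
        while ((rv = recv(fd, buf, sizeof(buf), 0)) >= 0)
                nfq_handle_packet(h, buf, rv);    /* dispatches to cb() */
        nfq_destroy_queue(qh);
        nfq_close(h);
        return 0;
}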

Ping results:

ping without nfqueue:
root@lsrv:~# iptables -F OUTPUT
root@lsrv:~# ping -c 500 -i 0.01 -q 10.182.122.11
500 packets transmitted, 500 received, 0% packet loss, time 7985ms
rtt min/avg/max/mdev = 0.056/0.058/0.185/0.011 ms

ping with no-op nfqueue callback in C:
root@lsrv:~# iptables -A OUTPUT -d 10.182.122.11/32 -j NFQUEUE --queue-num 0
root@lsrv:~/nfqueue# ping -c 500 -i 0.01 -q 10.182.122.11
500 packets transmitted, 500 received, 0% packet loss, time 7981ms
rtt min/avg/max/mdev = 0.117/0.123/0.384/0.020 ms

ping with no-op nfqueue callback in Go:
root@lsrv:~# iptables -A OUTPUT -d 10.182.122.11/32 -j NFQUEUE --queue-num 0
root@lsrv:~# ping -c 500 -i 0.01 -q 10.182.122.11
500 packets transmitted, 500 received, 0% packet loss, time 7982ms
rtt min/avg/max/mdev = 0.095/0.172/0.532/0.042 ms

The mean induced latency (average RTT minus the 0.058 ms no-nfqueue baseline) of 65us for C or 114us for Go might be within your parameters, but you mentioned 10us for WiFi bus arbitration, which does indeed look impossible with this setup, even in C.

Iperf3 results:

iperf3 without nfqueue:
root@lsrv:~# iptables -F OUTPUT
root@lsrv:~# iperf3 -t 5 -c 10.182.122.11
Connecting to host 10.182.122.11, port 5201
[  4] local 10.182.122.1 port 55810 connected to 10.182.122.11 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   452 MBytes  3.79 Gbits/sec    0    178 KBytes       
[  4]   1.00-2.00   sec   454 MBytes  3.82 Gbits/sec    0    320 KBytes       
[  4]   2.00-3.00   sec   450 MBytes  3.77 Gbits/sec    0    320 KBytes       
[  4]   3.00-4.00   sec   451 MBytes  3.79 Gbits/sec    0    352 KBytes       
[  4]   4.00-5.00   sec   451 MBytes  3.79 Gbits/sec    0    352 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-5.00   sec  2.21 GBytes  3.79 Gbits/sec    0             sender
[  4]   0.00-5.00   sec  2.21 GBytes  3.79 Gbits/sec                  receiver
iperf Done.

iperf3 with no-op nfqueue callback in C:
root@lsrv:~# iptables -A OUTPUT -d 10.182.122.11/32 -j NFQUEUE --queue-num 0
root@lsrv:~/nfqueue# iperf3 -t 5 -c 10.182.122.11
Connecting to host 10.182.122.11, port 5201
[  4] local 10.182.122.1 port 55868 connected to 10.182.122.11 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  17.4 MBytes   146 Mbits/sec    0    107 KBytes       
[  4]   1.00-2.00   sec  16.9 MBytes   142 Mbits/sec    0    107 KBytes       
[  4]   2.00-3.00   sec  17.0 MBytes   142 Mbits/sec    0    107 KBytes       
[  4]   3.00-4.00   sec  17.0 MBytes   142 Mbits/sec    0    107 KBytes       
[  4]   4.00-5.00   sec  17.0 MBytes   143 Mbits/sec    0    115 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-5.00   sec  85.3 MBytes   143 Mbits/sec    0             sender
[  4]   0.00-5.00   sec  84.7 MBytes   142 Mbits/sec                  receiver

iperf3 with no-op nfqueue callback in Go:
root@lsrv:~# iptables -A OUTPUT -d 10.182.122.11/32 -j NFQUEUE --queue-num 0
root@lsrv:~# iperf3 -t 5 -c 10.182.122.11
Connecting to host 10.182.122.11, port 5201
[  4] local 10.182.122.1 port 55864 connected to 10.182.122.11 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  14.6 MBytes   122 Mbits/sec    0   96.2 KBytes       
[  4]   1.00-2.00   sec  14.1 MBytes   118 Mbits/sec    0   96.2 KBytes       
[  4]   2.00-3.00   sec  14.0 MBytes   118 Mbits/sec    0    102 KBytes       
[  4]   3.00-4.00   sec  14.0 MBytes   117 Mbits/sec    0    102 KBytes       
[  4]   4.00-5.00   sec  13.7 MBytes   115 Mbits/sec    0    107 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-5.00   sec  70.5 MBytes   118 Mbits/sec    0             sender
[  4]   0.00-5.00   sec  69.9 MBytes   117 Mbits/sec                  receiver
iperf Done.

So rats, throughput gets brutalized for both C and Go. For Go, a rate of 117 Mbit/s with a 1500-byte MTU works out to 117e6 / (1500 * 8) = 9750 packets/sec, or about 103us per packet. The mean induced latency measured by ping is 114us, which is not far off 103us, so the rate slowdown looks to be mostly caused by the per-packet nfqueue calls. The core running test_nfqueue is pinned at 100% during the test, and "nice -n -20" does nothing.

Presumably you’ll sometimes be releasing more than one packet at a time(?), so whether this is workable depends on how many packets you release at once, what hardware you’re on and what rates you need to test at. But when you’re trying to test a qdisc, you’d want to minimize the burden the test rig adds to the CPU, or else move it to a core the qdisc isn’t running on (see the pinning example below), so the qdisc itself isn’t affected by the test rig.
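
For example, pinning the daemon to a spare core would look something like this (untested here, core number hypothetical):

root@lsrv:~# taskset -c 3 ./nfq

iptables also has an NFQUEUE --queue-balance option for spreading packets across multiple queues (and so multiple cores), which might help with the load, though I haven’t tried it.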

> There is of course a hideous amount of complexity moved to the daemon,

I can only imagine.

> as a pure fifo ap queue forms aggregates much differently
> than a fq_codeled one. But, yea! userspace....

This would be awesome if it works out! After that iperf3 test though, I think I may have smashed my dreams of writing a libnetfilter_queue userspace qdisc in Go, or C for that matter.

If this does somehow turn out to be good enough performance-wise, I think you’d have a lot more fun and spend a lot less time on it in Go than C, but that’s just an opinion... :)
