> On Jun 18, 2018, at 9:44 PM, Dave Taht wrote:
>
> This is still without batch releases, yes?

Yes, I should've tried that earlier, but I’m scratching my head now as to how it works. Perhaps it’s because the old example I’m using for the non-GSO case uses deprecated functions and I ought to just ditch it, but I thought if in my callback I just switched:

    return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);

to

    return nfq_set_verdict_batch(qh, id + 8, NF_ACCEPT);

that my callback might not be called for the subsequent 8 packets I’ve accepted, however it continues to be called for each id sequentially anyway and throughput is no better. If I change 8 to something unreasonable, like 1000000, throughput is cut in half, so it’s doing “something”. There are functions in the newer GSO example like nfq_nlmsg_verdict_put, but I don’t see a batch version of that. So, I’m likely missing something…

BTW I don’t see a change setting SO_BUSY_POLL on nfq’s fd (tried 1000 - 1000000 usec).

> In any case, the now achieved rates and latencies seem sufficient to
> try and adapt these methods to emulating wifi/lte etc better! We only
> need to get to a gbit.

Indeed, it’s there. :)

> Obviously doing more expensive userspace
> processing is going to hurt, and, well, for the sake of argument
> emulating a 32 station wifi 802.11n network would be proof of the
> pudding, but I'd settle for even the simplest case of one ap and two
> stations
> actually rendering sane-looking behavior.
> Originally, when thinking about this, I'd thought we'd use one veth
> per station and toss packets to userspace based on one nfqueue per
> input/output interface. I still lean that way (do we get multicast mac
> addrs on packets this way?), but perhaps a single interface could be
> used and we could
> sort out the src/dst ips and batching in userspace, starting with
> fifos to represent current behavior and gradually working our way back
> up to the fq_codel on wifi emulation. Or, with one veth per station,
> still use a fq_codel qdisc, but I don't see how we can create
> backpressure for that actually to engage.
>
> Better to be reordering the verdict on packets in the batch for an
> fq_codel emulation. I think.

Is it worth measuring the aggregate throughput of 32 iperf3 client veth devices to one server device?

Worth trying to get the newer code into Go? I may have to start over without the wrapper and just write something simpler with newer code.
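
On the batch verdict question: as far as I understand the libnetfilter_queue documentation, nfq_set_verdict_batch applies the verdict to every queued packet whose id is less than or equal to the one passed in, and the kernel still delivers each packet to the callback. If that is right, the saving comes from sending fewer verdict messages, not from fewer callbacks, so the callback would have to hold packets and issue one batch verdict every N packets instead of one on every call. Below is a toy model of that idea in plain C. It does not use libnetfilter_queue at all: pending_queue, enqueue_packet, batch_verdict and on_packet are hypothetical names made up for illustration, and id wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 64

/* Toy stand-in for the list of packets waiting for a verdict.
 * Hypothetical model only, not the kernel's real data structure. */
struct pending_queue {
    uint32_t ids[MAX_PENDING];
    size_t   count;          /* packets currently awaiting a verdict */
    unsigned verdict_msgs;   /* verdict messages sent so far */
};

/* A packet has been handed to userspace and now waits for a verdict. */
static void enqueue_packet(struct pending_queue *q, uint32_t id)
{
    assert(q->count < MAX_PENDING);
    q->ids[q->count++] = id;
}

/* Model of a batch verdict: release every packet that is ALREADY queued
 * and has id <= max_id. Packets that have not arrived yet are unaffected,
 * which is why passing id + 8 gains nothing in this model.
 * Returns the number of packets released by this one message. */
static size_t batch_verdict(struct pending_queue *q, uint32_t max_id)
{
    size_t kept = 0, released = 0;
    for (size_t i = 0; i < q->count; i++) {
        if (q->ids[i] <= max_id)
            released++;
        else
            q->ids[kept++] = q->ids[i];
    }
    q->count = kept;
    q->verdict_msgs++;
    return released;
}

/* Callback sketch: still runs once per packet, but only sends a verdict
 * message when the id is a multiple of `batch`. Returns the number of
 * packets released by this call (0 while holding). */
static size_t on_packet(struct pending_queue *q, uint32_t id, unsigned batch)
{
    enqueue_packet(q, id);
    if (id % batch != 0)
        return 0;            /* hold; a later batch verdict covers it */
    return batch_verdict(q, id);
}
```

Feeding ids 1 through 16 into on_packet with a batch of 8 releases all 16 packets with two verdict messages in this model. The obvious cost is that held packets wait for the batch to fill, so a real version would presumably also need a timer to flush a partial batch.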