From: Pete Heist
Date: Mon, 18 Jun 2018 23:54:36 +0200
To: Dave Taht
Cc: Make-Wifi-fast
Subject: Re: [Make-wifi-fast] emulating wifi better - coupling qdiscs in netem?

> On Jun 18, 2018, at 9:44 PM, Dave Taht <dave.taht@gmail.com> wrote:
>
> This is still without batch releases, yes?

Yes, I should've tried that earlier, but now I'm scratching my head over how it works. Perhaps it's because the old example I'm using for the non-GSO case relies on deprecated functions and I ought to just ditch it. Still, I thought that if I simply switched my callback from

    return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);

to

    return nfq_set_verdict_batch(qh, id + 8, NF_ACCEPT);

my callback would not be called for the next 8 packets, since I'd already accepted them. Instead, it is still called for each id in sequence, and throughput is no better. If I change 8 to something unreasonable, like 1000000, throughput is cut in half, so it's doing "something".

There are functions like nfq_nlmsg_verdict_put in the newer GSO example, but I don't see a batch version of that, so I'm probably missing something…

BTW, I don't see any change from setting SO_BUSY_POLL on nfq's fd (tried 1000 to 1000000 usec).

> In any case, the now achieved rates and latencies seem sufficient to
> try and adapt these methods to emulating wifi/lte etc better! We only
> need to get to a gbit.

Indeed, it's there. :)

> Obviously doing more expensive userspace processing is going to hurt,
> and, well, for the sake of argument emulating a 32 station wifi 802.11n
> network would be proof of the pudding, but I'd settle for even the
> simplest case of one ap and two stations actually rendering
> sane-looking behavior.
>
> Originally, when thinking about this, I'd thought we'd use one veth
> per station and toss packets to userspace based on one nfqueue per
> input/output interface. I still lean that way (do we get multicast mac
> addrs on packets this way?), but perhaps a single interface could be
> used and we could sort out the src/dst ips and batching in userspace,
> starting with fifos to represent current behavior and gradually
> working our way back up to the fq_codel on wifi emulation. Or, with
> one veth per station, still use a fq_codel qdisc, but I don't see how
> we can create backpressure for that actually to engage.
>
> Better to be reordering the verdict on packets in the batch for an
> fq_codel emulation. I think.

Is it worth measuring the aggregate throughput of 32 iperf3 client veth devices to one server device?

Is it worth trying to get the newer code into Go? I may have to start over without the wrapper and just write something simpler with the newer code.
