From: Dave Taht
Date: Thu, 18 Nov 2021 06:35:35 -0800
To: John Efstathiades, Cake List
Subject: Re: [Cake] [PATCH net-next 0/6] lan78xx NAPI Performance Improvements

Dear John:

I am quite impressed by these results.

We have a standard test from flent.org that we tend to use, called
"rrul" (Realtime Response Under Load). It exercises both tx and rx at
the same time and also measures the latency in the stack. Could you
give that a shot, too?

On Thu, Nov 18, 2021 at 3:13 AM John Efstathiades wrote:
>
> This patch set introduces a set of changes to the lan78xx driver
> that were originally developed as part of an investigation into
> the performance of TCP and UDP transfers on an Android system.
> The changes increase the throughput of both UDP and TCP transfers
> and reduce the overall CPU load.
>
> These improvements are also seen on a standard Linux kernel. Typical
> results are included at the end of this document.
>
> The changes to the driver evolved over time. The patches presented
> here attempt to organise the changes into coherent blocks that
> affect logically connected parts of the driver. The patches do not
> reflect the way in which the code evolved during the performance
> investigation.
>
> Each patch produces a working driver that has an incremental
> improvement, but patches 2, 3 and 6 should be considered a single
> update.
>
> The changes affect the following parts of the driver:
>
> 1. Deferred URB processing
>
> The deferred URB processing that was originally done by a tasklet
> is now done by a NAPI polling routine. The NAPI cycle has a fixed
> work budget that controls how many received frames are passed to
> the network stack.
>
> Patch 6 introduces the NAPI polling but depends on the preceding
> patches.
>
> The new NAPI polling routine is also responsible for submitting
> Rx and Tx URBs to the USB host controller.
>
> Moving the URB processing to a NAPI-based system "smoothed"
> incoming and outgoing data flows on the Android system under
> investigation. However, taken in isolation, moving from a tasklet
> approach to a NAPI approach made little or no difference to the
> overall performance.
>
> 2. URB buffer management
>
> The driver creates a pool of Tx URB buffers and a pool of Rx URB
> buffers. Each buffer is large enough to accommodate a packet of the
> maximum MTU size. URBs are allocated from these pools as required.
>
> Patch 2 introduces the new Tx buffer pool.
> Patch 3 introduces the new Rx buffer pool.
>
> 3. Tx pending data
>
> SKBs containing data to be transmitted are added to a queue. The
> driver tracks free Tx URBs and the corresponding free Tx URB space.
> When new Tx URBs are submitted, pending data is copied into the
> URB buffer until the URB buffer is filled or there is no more
> pending data. This maximises utilisation of the LAN78xx internal
> USB and network frame buffers.
>
> New Tx URBs are submitted to the USB host controller as part of the
> NAPI polling cycle.
>
> Patch 2 introduces these changes.
>
> 4. Rx URB completion
>
> A new URB is no longer submitted as part of the URB completion
> callback. New URBs are submitted during the NAPI polling cycle.
>
> Patch 3 introduces these changes.
>
> 5. Rx URB processing
>
> Completed URBs are put onto a queue for processing (as is done in the
> current driver).
> Network packets in completed URBs are copied from the URB buffer
> into dynamically allocated SKBs and passed to the network stack.
>
> The emptied URBs are resubmitted to the USB host controller.
>
> Patch 3 introduces this change. Patch 6 updates the change to use
> NAPI SKBs.
>
> Each packet passed to the network stack is a single NAPI work item.
> If the NAPI work budget is exhausted, the remaining packets in the
> URB are put onto an overflow queue that is processed at the start
> of the next NAPI cycle.
>
> Patch 6 introduces this change.
>
> 6. Driver-specific hard_header_len
>
> The driver-specific hard_header_len adjustment was removed as it
> broke generic receive offload (GRO) processing. Moreover, it was no
> longer required due to the change in Tx pending data management (see
> point 3 above).
>
> Patch 5 introduces this change.
>
> The modification has been tested on four different target machines:
>
> Target           | CPU        | ARCH    | cores | kernel | RAM   |
> -----------------+------------+---------+-------+--------+-------|
> Raspberry Pi 4B  | Cortex-A72 | aarch64 | 4     | 64-bit | 2 GB  |
> Nitrogen8M SBC   | Cortex-A53 | aarch64 | 4     | 64-bit | 2 GB  |
> Compaq Presario  | Pentium D  | i686    | 2     | 32-bit | 4 GB  |
> Dell T3620       | Core i3    | x86_64  | 2+2   | 64-bit | 16 GB |
>
> The targets, apart from the Compaq, each have an on-chip USB3 host
> controller. A PCIe-based USB3 host controller card was added to the
> Compaq to provide the necessary USB3 host interface.
>
> The network throughput was measured using iperf3. The peer device was
> a second Dell T3620 fitted with an Intel i210 network interface. The
> target machine and the peer device were connected via a Netgear GS105
> gigabit switch.
>
> The CPU load was measured using mpstat running on the target machine.
>
> The tables below summarise the throughput and CPU load improvements
> achieved by the updated driver.
>
> The bandwidth is the average bandwidth, in Mbit/s, reported by iperf3
> at the end of a 60-second test.
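A minimal userspace sketch of the Rx behaviour described in points 1 and 5 above: each packet handed to the stack counts as one unit of NAPI work, packets still in a URB when the budget runs out are parked on an overflow queue, and that queue is drained first on the next cycle. This is not the lan78xx code. The types, the helpers `poll_once()` and `deliver_to_stack()`, and the queue sizes are hypothetical, and packets are modelled only by their lengths; in a kernel driver the delivery step would be something like `napi_gro_receive()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a completed Rx URB carries up to MAX_PKTS_PER_URB
 * packets, represented here only by their lengths in bytes. */
#define MAX_PKTS_PER_URB 8
#define OVERFLOW_CAP 64

struct rx_urb {
    size_t npkts;
    int pkt_len[MAX_PKTS_PER_URB];
};

/* Ring buffer used as the overflow queue between NAPI cycles. */
struct pkt_fifo {
    int len[OVERFLOW_CAP];
    size_t head, count;
};

static int fifo_push(struct pkt_fifo *q, int len)
{
    if (q->count == OVERFLOW_CAP)
        return -1;
    q->len[(q->head + q->count) % OVERFLOW_CAP] = len;
    q->count++;
    return 0;
}

static int fifo_pop(struct pkt_fifo *q, int *len)
{
    if (q->count == 0)
        return -1;
    *len = q->len[q->head];
    q->head = (q->head + 1) % OVERFLOW_CAP;
    q->count--;
    return 0;
}

/* Hypothetical stand-in for handing a frame to the network stack. */
static long delivered_bytes;
static int delivered_pkts;

static void deliver_to_stack(int len)
{
    delivered_bytes += len;
    delivered_pkts++;
}

/* One poll cycle. Each delivered packet is one unit of work. Returns the
 * work done, which never exceeds budget. *next_urb indexes the first
 * completed URB not yet processed. */
static int poll_once(struct pkt_fifo *overflow, const struct rx_urb *done,
                     size_t ndone, size_t *next_urb, int budget)
{
    int work = 0;
    int len;

    /* Packets parked by the previous cycle are delivered first. */
    while (work < budget && fifo_pop(overflow, &len) == 0) {
        deliver_to_stack(len);
        work++;
    }

    /* Then completed URBs, while budget remains. */
    while (work < budget && *next_urb < ndone) {
        const struct rx_urb *urb = &done[(*next_urb)++];

        for (size_t i = 0; i < urb->npkts; i++) {
            if (work < budget) {
                deliver_to_stack(urb->pkt_len[i]);
                work++;
            } else {
                /* Budget exhausted: park the rest of this URB. At most
                 * MAX_PKTS_PER_URB - 1 packets are parked per cycle, and
                 * only after the queue was drained, so this cannot fail. */
                (void)fifo_push(overflow, urb->pkt_len[i]);
            }
        }
        /* The emptied URB buffer would be resubmitted here. */
    }
    return work;
}
```

With a budget of 4 and two completed URBs of 3 packets each, the first call delivers 4 packets and parks 2; the second call delivers the 2 parked packets and returns 2. Because that is below the budget, a real NAPI poll routine would then finish the cycle (for example with `napi_complete_done()`).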
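The Tx path in points 2 and 3 can be sketched in the same spirit: packets wait on a pending queue, a free buffer is taken from a preallocated pool, and whole packets are copied in until the next one would not fit. Again this is a hypothetical userspace model rather than the driver's code. `TX_URB_BUF_SIZE`, `TX_POOL_SIZE`, `tx_fill_one()` and the rule that a packet is never split across buffers are assumptions, and any per-packet framing or alignment the LAN78xx hardware may require is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, not taken from the lan78xx driver. */
#define TX_URB_BUF_SIZE 2048
#define TX_POOL_SIZE 4
#define PENDING_CAP 32

struct tx_buf {
    unsigned char data[TX_URB_BUF_SIZE];
    size_t used;   /* bytes copied into this buffer */
    int in_use;    /* 1 while owned by an in-flight URB */
};

struct pkt {
    const unsigned char *data;
    size_t len;
};

struct tx_state {
    struct tx_buf pool[TX_POOL_SIZE];  /* preallocated once, reused */
    struct pkt pending[PENDING_CAP];   /* FIFO of packets awaiting a URB */
    size_t pend_head, pend_count;
};

/* Take a free buffer from the pool, or NULL if all are in flight. */
static struct tx_buf *pool_get(struct tx_state *s)
{
    for (size_t i = 0; i < TX_POOL_SIZE; i++) {
        if (!s->pool[i].in_use) {
            s->pool[i].in_use = 1;
            s->pool[i].used = 0;
            return &s->pool[i];
        }
    }
    return NULL;
}

/* Return a buffer to the pool, e.g. from the URB completion path. */
static void pool_put(struct tx_buf *b)
{
    b->in_use = 0;
}

/* Queue a packet for transmission; rejects packets that can never fit. */
static int tx_enqueue(struct tx_state *s, const unsigned char *data, size_t len)
{
    if (s->pend_count == PENDING_CAP || len > TX_URB_BUF_SIZE)
        return -1;
    struct pkt *p = &s->pending[(s->pend_head + s->pend_count) % PENDING_CAP];
    p->data = data;
    p->len = len;
    s->pend_count++;
    return 0;
}

/* Fill one free buffer with as many whole pending packets as fit.
 * Returns the buffer to submit, or NULL if nothing is pending or the
 * pool is exhausted. */
static struct tx_buf *tx_fill_one(struct tx_state *s)
{
    if (s->pend_count == 0)
        return NULL;
    struct tx_buf *b = pool_get(s);
    if (!b)
        return NULL;
    while (s->pend_count > 0) {
        const struct pkt *p = &s->pending[s->pend_head];
        if (b->used + p->len > TX_URB_BUF_SIZE)
            break;  /* does not fit: it waits for the next buffer */
        memcpy(b->data + b->used, p->data, p->len);
        b->used += p->len;
        s->pend_head = (s->pend_head + 1) % PENDING_CAP;
        s->pend_count--;
    }
    return b;
}
```

Queuing packets of 1000, 900 and 1500 bytes fills the first 2048-byte buffer with 1900 bytes and the second with 1500, so two buffers (two URBs) carry three packets. In the scheme the cover letter describes, filled buffers would be submitted from the NAPI poll and returned to the pool when the URB completes.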
>
> The percentage idle figure is the average idle reported across all
> CPU cores on the target machine for the duration of the test.
>
> TCP Rx (target receiving, peer transmitting)
>
>                  |  Standard Driver   |    NAPI Driver     |
> Target           | Bandwidth | % Idle | Bandwidth | % Idle |
> -----------------+-----------+--------+-----------+--------|
> RPi4 Model B     |       941 |   74.9 |       941 |   91.5 |
> Nitrogen8M       |       941 |   76.2 |       941 |   92.7 |
> Compaq Presario  |       941 |   44.5 |       941 |   82.1 |
> Dell T3620       |       941 |   88.9 |       941 |   98.3 |
>
> TCP Tx (target transmitting, peer receiving)
>
>                  |  Standard Driver   |    NAPI Driver     |
> Target           | Bandwidth | % Idle | Bandwidth | % Idle |
> -----------------+-----------+--------+-----------+--------|
> RPi4 Model B     |       683 |   80.1 |       942 |   97.6 |
> Nitrogen8M       |       942 |   97.8 |       942 |   97.3 |
> Compaq Presario  |       939 |   80.0 |       942 |   91.2 |
> Dell T3620       |       942 |   95.3 |       942 |   97.6 |
>
> UDP Rx (target receiving, peer transmitting)
>
>                  |  Standard Driver   |    NAPI Driver     |
> Target           | Bandwidth | % Idle | Bandwidth | % Idle |
> -----------------+-----------+--------+-----------+--------|
> RPi4 Model B     |         - |      - |  958 (0%) |   76.2 |
> Nitrogen8M       | 690 (25%) |   57.7 |  937 (0%) |   68.5 |
> Compaq Presario  |  958 (0%) |   50.2 |  958 (0%) |   61.6 |
> Dell T3620       |  958 (0%) |   89.6 |  958 (0%) |   85.3 |
>
> The figure in brackets is the percentage packet loss.
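Because the tables report % Idle, the CPU saving is easier to read as busy time, i.e. 100 minus idle (averaged over all cores, as the figures are). A small worked example using the TCP Rx figures above; `busy_pct()` and `load_ratio()` are names made up for this illustration.

```c
#include <assert.h>

/* mpstat-style "% idle" to "% busy": whatever is not idle is CPU load. */
static double busy_pct(double idle_pct)
{
    return 100.0 - idle_pct;
}

/* Fraction of the standard driver's CPU load that the NAPI driver still
 * uses. Assumes idle_before < 100 so the divisor is non-zero. */
static double load_ratio(double idle_before, double idle_after)
{
    return busy_pct(idle_after) / busy_pct(idle_before);
}
```

For the RPi4 TCP Rx case, busy time falls from 25.1% to 8.5%, so the NAPI driver uses roughly 34% of the CPU time the standard driver needed at the same bandwidth.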
>
> UDP Tx (target transmitting, peer receiving)
>
>                  |  Standard Driver   |    NAPI Driver     |
> Target           | Bandwidth | % Idle | Bandwidth | % Idle |
> -----------------+-----------+--------+-----------+--------|
> RPi4 Model B     |       370 |   75.0 |       886 |   78.9 |
> Nitrogen8M       |       710 |   75.0 |       958 |   85.3 |
> Compaq Presario  |       958 |   65.5 |       958 |   76.6 |
> Dell T3620       |       958 |   97.0 |       958 |   97.3 |
>
>
> John Efstathiades (6):
>   lan78xx: Fix memory allocation bug
>   lan78xx: Introduce Tx URB processing improvements
>   lan78xx: Introduce Rx URB processing improvements
>   lan78xx: Re-order rx_submit() to remove forward declaration
>   lan78xx: Remove hardware-specific header update
>   lan78xx: Introduce NAPI polling support
>
>  drivers/net/usb/lan78xx.c | 1211 +++++++++++++++++++++++--------------
>  1 file changed, 769 insertions(+), 442 deletions(-)
>
> --
> 2.25.1
>


--
I tried to build a better future, a few times:
https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org

Dave Täht CEO, TekLibre, LLC