From: Dave Taht
To: Jonathan Morton
Cc: Jesper Dangaard Brouer, "cerowrt-devel@lists.bufferbloat.net"
Date: Fri, 26 Jun 2015 11:12:48 -0700
Subject: [Cerowrt-devel] lacking in BQL in the mvneta, what is the max latency?

Poking harder at drivers/net/ethernet/marvell/mvneta.c: (am I looking at the right driver for the linksys ac1200? mikael?
what does lspci and/or dmesg say for both this and the wifi on this platform?)

1) this thing does not actually need a tx ring buffer structure; it could fair queue all the way down to the hardware.

    /* Update HW with number of TX descriptors to be sent */
    static void mvneta_txq_pend_desc_add(struct mvneta_port *pp,
                                         struct mvneta_tx_queue *txq,
                                         int pend_desc)
    {
            u32 val;

            /* Only 255 descriptors can be added at once ; Assume caller
             * process TX desriptors in quanta less than 256
             */
            val = pend_desc;
            mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
    }

2) And it doesn't look like there are ipv6 checksum offloads...

3) and, sigh, on driver depth:

    /* Max number of allowed TCP segments for software TSO */
    #define MVNETA_MAX_TSO_SEGS 100 // 100!!!!????
    #define MVNETA_MAX_SKB_DESCS (MVNETA_MAX_TSO_SEGS * 2 + MAX_SKB_FRAGS)

later on we get some moderation of this using

    txq->tx_stop_threshold = txq->size - MVNETA_MAX_SKB_DESCS; // 532 - (200 + 16)

which = 316 packets outstanding in the driver ring... times 8 rings = about 2500 possible packets living in the tx rings with enough flows. (not clear to me if the tso/gso stuff is split into a tx op each, but it looks like it)

That's a worst case latency in the driver of about 30ms at a gigabit. (you'd have to have a lot of different flows to exercise all the queues, though. So, for example, rrul is not enough to stress it out. 4 rruls, maybe. Or the rrul_50up or down test would be simpler.)

And of course, if you run the device at 100mbit, 300ms, 10mbit 3 sec...

that, coupled with:

    txq->tx_wake_threshold = txq->tx_stop_threshold / 2;

gets us our ~17ms observed latency under load on this hardware at these speeds.

*Houston, we have found our tx latency!*

4) Having gone this deep... basic BQL support looks straightforward on the xmit side, but we'd have to walk the sent descriptors to get the sum of bytes sent (not a huge problem); it's not clear if all the error out conditions are clean, either.
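A rough sketch of that descriptor walk, as a userspace toy rather than driver code. Every name here (toy_tx_desc, toy_tx_ring, toy_tx_reclaim, TOY_RING_SIZE) is made up for illustration and is not the mvneta layout, and it assumes one descriptor per packet, which would not hold for the TSO path. In a real driver the xmit side would report bytes with netdev_tx_sent_queue() and the totals gathered here would go to netdev_tx_completed_queue().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u  /* hypothetical; the real ring is far larger */

/* Hypothetical stand-in for a TX descriptor; not the mvneta layout. */
struct toy_tx_desc {
    uint32_t bytes;  /* length of the buffer this descriptor carried */
    bool     done;   /* set once the hardware has transmitted it */
};

struct toy_tx_ring {
    struct toy_tx_desc desc[TOY_RING_SIZE];
    uint32_t clean;  /* index of the oldest descriptor not yet reclaimed */
    uint32_t count;  /* descriptors currently outstanding in the ring */
};

struct toy_tx_done {
    uint32_t pkts;
    uint32_t bytes;
};

/* Walk completed descriptors in ring order, reclaim them, and total
 * the packets and bytes that were sent. Stops at the first descriptor
 * the hardware has not finished with. */
struct toy_tx_done toy_tx_reclaim(struct toy_tx_ring *ring)
{
    struct toy_tx_done sum = { 0, 0 };

    assert(ring->count <= TOY_RING_SIZE);
    while (ring->count > 0 && ring->desc[ring->clean].done) {
        struct toy_tx_desc *d = &ring->desc[ring->clean];

        sum.pkts += 1;
        sum.bytes += d->bytes;
        d->done = false;
        d->bytes = 0;
        ring->clean = (ring->clean + 1) % TOY_RING_SIZE;
        ring->count--;
    }
    /* A real driver would now hand sum.pkts and sum.bytes to
     * netdev_tx_completed_queue() for the matching netdev_queue. */
    return sum;
}
```

The point of returning both counts from one walk is that the completion path already visits every sent descriptor, so the byte total comes nearly for free.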
I have no way to compile or test on this platform at the moment. And BQL's behavior is additive and MIAD, which are features I am deeply uncomfortable with on hardware multiqueue.

Still, there is room for vast improvement here. If the wifi driver is fixable, I would vote for selecting this platform as a base for future cerowrt development.
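
For reference, the worst-case ring arithmetic from point 3 can be sketched as a small C helper. The function name and its parameters are hypothetical and not from the driver. It assumes every descriptor holds one full-size frame, and it ignores preamble and inter-frame gap, so treat the result as a rough bound.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of mvneta: microseconds needed to drain
 * `rings` TX rings, each holding `pkts_per_ring` frames of `frame_bytes`
 * bytes, over a link running at `link_mbps` megabits per second. */
uint64_t tx_ring_drain_us(uint32_t rings, uint32_t pkts_per_ring,
                          uint32_t frame_bytes, uint32_t link_mbps)
{
    assert(link_mbps > 0);

    /* Widen before multiplying so large rings cannot overflow 32 bits. */
    uint64_t bits = (uint64_t)rings * pkts_per_ring * frame_bytes * 8u;

    /* 1 Mbit/s moves exactly one bit per microsecond. */
    return bits / link_mbps;
}
```

For example, tx_ring_drain_us(8, 100, 1500, 1000) gives 9600, so every 100 descriptors per ring across 8 rings costs about 9.6ms at a gigabit, and ten times that at 100mbit.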