[Cerowrt-devel] lacking in BQL in the mvneta, what is the max latency?
Dave Taht
dave.taht at gmail.com
Fri Jun 26 11:12:48 PDT 2015
Poking harder at drivers/net/ethernet/marvell/mvneta.c:
(am I looking at the right driver for the linksys ac1200? mikael? what
do lspci and/or dmesg say for both this and the wifi on this
platform?)
1) this thing does not actually need a tx ring buffer structure, it
could fair queue all the way down to the hardware.
/* Update HW with number of TX descriptors to be sent */
static void mvneta_txq_pend_desc_add(struct mvneta_port *pp,
                                     struct mvneta_tx_queue *txq,
                                     int pend_desc)
{
        u32 val;

        /* Only 255 descriptors can be added at once ; Assume caller
         * process TX descriptors in quanta less than 256
         */
        val = pend_desc;
        mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
}
2) And it doesn't look like there are IPv6 checksum offloads...
3) and, sigh, on driver depth:
/* Max number of allowed TCP segments for software TSO */
#define MVNETA_MAX_TSO_SEGS 100 // 100!!!!????
#define MVNETA_MAX_SKB_DESCS (MVNETA_MAX_TSO_SEGS * 2 + MAX_SKB_FRAGS)
later on we get some moderation of this using
txq->tx_stop_threshold = txq->size - MVNETA_MAX_SKB_DESCS; // 532 - (200 + 16)
which = 316 packets outstanding
in the driver ring...
times 8 rings = ~2500
possible packets living in the tx rings with enough flows. (not clear
to me if the tso/gso stuff is split into a tx op each, but it looks
like it)
That's a worst case latency in the driver of about 30ms at a gigabit,
assuming full-size 1500 byte packets. (you'd
have to have a lot of different flows to exercise all the queues,
though. So, for example, rrul is not enough to stress it out. 4 rruls,
maybe. Or the rrul_50up or down test, would be simpler). And of
course, if you run the device at 100mbit, 300ms, 10mbit 3 sec...
that, coupled with:
txq->tx_wake_threshold = txq->tx_stop_threshold / 2;
gets us our ~17ms observed latency under load on this hardware at these speeds.
*Houston, we have found our tx latency!*
4) Having gone this deep... basic BQL support looks straightforward on
the xmit side, but we'd have to walk the sent descriptors to get the
sum of bytes sent (not a huge problem). It's not clear if all the error
out conditions are clean, either.
I have no way to compile or test on this platform at the moment. And
BQL's behavior is additive and MIAD, which are features I am deeply
uncomfortable with on hardware multiqueue. Still, there is room for vast
improvement here.
If the wifi driver is fixable, I would vote for selecting this
platform as a base for future cerowrt development.