[Cerowrt-devel] lacking in BQL in the mvneta, what is the max latency?

Dave Taht dave.taht at gmail.com
Fri Jun 26 14:12:48 EDT 2015


Poking harder at drivers/net/ethernet/marvell/mvneta.c:

(am I looking at the right driver for the linksys ac1200? mikael? what
does lspci and/or dmesg say for both this and the wifi on this
platform?)

1) this thing does not actually need a tx ring buffer structure; it
could fair queue all the way down to the hardware.

/* Update HW with number of TX descriptors to be sent */
static void mvneta_txq_pend_desc_add(struct mvneta_port *pp,
                                     struct mvneta_tx_queue *txq,
                                     int pend_desc)
{
        u32 val;

        /* Only 255 descriptors can be added at once; assume the caller
         * processes TX descriptors in quanta of less than 256
         */
        val = pend_desc;
        mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val);
}
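That 255-descriptor limit means any caller with more pending descriptors would have to split the update across multiple register writes. A minimal userspace sketch of that chunking (pend_desc_writes is a hypothetical illustration, not driver code; the real mvreg_write call is stubbed out as a comment):

```c
#include <assert.h>

/* Illustrative sketch, not driver code: split a pending-descriptor
 * update into register writes of at most 255 descriptors each, as the
 * comment in mvneta_txq_pend_desc_add() requires.  Returns the number
 * of register writes the caller would issue. */
int pend_desc_writes(int pend_desc)
{
    int writes = 0;

    while (pend_desc > 0) {
        int chunk = pend_desc > 255 ? 255 : pend_desc;

        /* mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), chunk); */
        pend_desc -= chunk;
        writes++;
    }
    return writes;
}
```

So a burst of up to 255 costs one write, and anything larger degrades gracefully into a short loop.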

2) And it doesn't look like there are ipv6 checksum offloads...

3) and, sigh, on driver depth:

/* Max number of allowed TCP segments for software TSO */
#define MVNETA_MAX_TSO_SEGS 100 // 100!!!!????

#define MVNETA_MAX_SKB_DESCS (MVNETA_MAX_TSO_SEGS * 2 + MAX_SKB_FRAGS)

later on we get some moderation of this using

        txq->tx_stop_threshold = txq->size - MVNETA_MAX_SKB_DESCS; // 532 - (200 + 16)

which = 316 packets outstanding
in the driver ring...

times 8 rings = ~2530

possible packets living in the tx rings given enough flows. (It's not
clear to me whether the tso/gso segments each become a separate tx op,
but it looks like they do.)

That's a worst case latency in the driver of roughly 30ms at a
gigabit. (You'd have to have a lot of different flows to exercise all
the queues, though, so rrul alone is not enough to stress it out; 4
rruls, maybe, or the rrul_50up or rrul_50down test would be simpler.)
And of course, if you run the device at 100mbit, ~300ms; at 10mbit, ~3
seconds...

that, coupled with:
        txq->tx_wake_threshold = txq->tx_stop_threshold / 2;

gets us our ~17ms observed latency under load on this hardware at these speeds.
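As a sanity check on that number (again assuming MAX_SKB_FRAGS = 16 and the 532-entry ring), the wake threshold comes out to 158 packets per queue, which takes about 19ms to drain with 1500-byte packets if the bottleneck is 100mbit, in the same ballpark as the ~17ms observed:

```c
#include <assert.h>

/* Hedged check of the wake threshold, assuming MAX_SKB_FRAGS = 16 and
 * the 532-entry ring: (532 - 216) / 2 = 158 packets per queue. */
enum { TX_WAKE_THRESHOLD = (532 - (100 * 2 + 16)) / 2 };

/* usec to drain the wake threshold at 100mbit with 1500-byte packets */
long long wake_drain_usec(void)
{
    return TX_WAKE_THRESHOLD * 1500LL * 8 * 1000000 / 100000000;
}
```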

*Houston, we have found our tx latency!*.

4) Having gone this deep... basic BQL support looks straightforward on
the xmit side, but we'd have to walk the sent descriptors to get the
sum of bytes sent (not a huge problem), and it's not clear whether all
the error-out conditions are clean, either.

I have no way to compile or test on this platform at the moment. And
BQL's behavior is additive and MIAD, features I am deeply
uncomfortable with in the presence of hardware multiqueue. Still,
there is room for vast improvement here.
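For reference, an untested sketch of where the standard BQL hooks from <linux/netdevice.h> would land; the surrounding function bodies are elided, and the pkts/bytes accumulation is exactly the descriptor walk mentioned above:

```c
/* Untested sketch, assuming the standard BQL pattern.  On the xmit
 * path, after descriptors are handed to the hardware: */
        netdev_tx_sent_queue(netdev_get_tx_queue(dev, txq->id), skb->len);

/* And in tx-done processing, after walking the sent descriptors to
 * accumulate the packet and byte counts: */
        netdev_tx_completed_queue(netdev_get_tx_queue(dev, txq->id),
                                  pkts, bytes);

/* Plus netdev_reset_queue() wherever the ring is flushed on error or
 * link down, so the BQL state doesn't go stale. */
```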

If the wifi driver is fixable, I would vote for selecting this
platform as a base for future cerowrt development.


