Nathaniel Smith njs at pobox.com
Mon Feb 14 22:28:14 PST 2011


On Mon, Feb 14, 2011 at 7:08 PM, Felix Fietkau <nbd at openwrt.org> wrote:
> On 2011-02-14 6:33 AM, Nathaniel Smith wrote:
>> This is the first place I'm confused. Why would you drop packets
>> inside the driver? Shouldn't dropping packets be the responsibility of
>> the Qdisc feeding your driver, since that's where all the smart AQM
>> and QoS and user-specified-policy knobs live? My understanding is that
>> the driver's job is just to take the minimum number of packets at a
>> time (consistent with throughput, etc.) from the Qdisc and send them
>> on. Or are you talking about dropping in the sense of spending more
>> effort on driver-level retransmit in some cases than others?
> It's the driver's responsibility to aggregate packets. For that, I
> absolutely need queuing under the control of the driver. After a packet
> has an assigned sequence number, the driver cannot hand control over to
> an external qdisc unless it is guaranteed that it gets the packet back
> (possibly with a status info that tells it whether the packet should be
> dropped or not). If packets were dropped outside of the driver after
> they've been tracked, gaps in the aggregation reorder window of the
> receiver would bring the data transfer to an immediate halt.

Ah. I think I understand, but to make sure: the problem is that the
802.11 MAC layer guarantees in-order delivery (at least for packets
within the same QoS class). Therefore, if an A-MPDU aggregate is only
partially received, then the receiving side can't pass *any* parts up
the networking stack -- even the parts that were successfully received
-- until after *all* of the parts are successfully retransmitted (or
the transmitter says never mind, I'm not going to retransmit). Yes?
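If I've got that right, the receiver side is basically the classic
reorder-buffer pattern -- here's a toy sketch of my mental model
(nothing like mac80211's actual code; names and the flat array are
made up, and it assumes seq has already been validated to fall inside
the window):

    /* Toy model of a per-TID BlockAck reorder buffer: frames go up
     * the stack only in sequence-number order, so one missing MPDU
     * stalls everything received behind it. */
    #define WINDOW 64                      /* typical BlockAck window */

    struct reorder_buf {
        struct sk_buff *slot[WINDOW];      /* NULL = not yet received */
        u16 head_seq;                      /* next seq we may release */
    };

    static void rx_mpdu(struct reorder_buf *rb, struct sk_buff *skb, u16 seq)
    {
        rb->slot[seq % WINDOW] = skb;
        /* release the in-order prefix, stopping at the first gap */
        while (rb->slot[rb->head_seq % WINDOW]) {
            netif_receive_skb(rb->slot[rb->head_seq % WINDOW]);
            rb->slot[rb->head_seq % WINDOW] = NULL;
            rb->head_seq++;
        }
    }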

And this is to avoid TCP getting confused by out-of-order packets
(which it might think are lost packets, at least until they arrive and
it has to do the D-SACK dance)? How sad. It would obviously be so much
better if some reordering were possible -- no one really wants the MAC
layer to be holding onto packets for tens of milliseconds. Surely VoIP
people hate this?

So an interesting question is whether, in some circumstances, it would
be better to tolerate reordering around lost packets in the service of
better queue handling. Because if you tell the receiving station to
give up on this MPDU, then you can throw the packet back into the Qdisc...

>> For that I have a crazy idea: what if the driver took each potentially
>> retransmittable packet and handed it *back* to the Qdisc, who then
>> could apply policy to send it to the back of the queue, jump it to the
>> front of the queue for immediate retransmission, throw it away if
>> higher priority traffic has arrived and the queue is full now, etc.
>> You'd probably need to add some API to tell the Qdisc that the packet
>> you want to enqueue has already waited once (I imagine the default
>> dumb Qdisc would want to enqueue such packets at the head of the
>> queue). Perhaps also some way to give up on a packet if it's
>> waited "too long" (but then again, perhaps not!). But as I think about
>> this idea it does grow on me.
> For the ath9k case that would mean having to turn the qdisc code into a
> library that can be completely controlled and that does not free packets
> by itself. I think that would require major changes to the network stack.

Right -- it might well be a good idea in the long run to reorganize
queue handling like this, if it turns out that drivers really need to
have their dirty fingers mixed into AQM and stuff; the Qdisc machinery
certainly doesn't strike me as the cleanest and most mature design in
the kernel. But it's not the easiest place to start...
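For the record, the interface I'm imagining is something like this --
entirely hypothetical, nothing like it exists in the stack today:

    /* Hypothetical qdisc hook for handing un-ACKed packets back. */
    enum requeue_verdict {
        REQUEUE_HEAD,   /* retransmit ASAP: front of the queue */
        REQUEUE_TAIL,   /* deprioritize: back of the queue */
        REQUEUE_DROP,   /* better traffic arrived meanwhile: give up */
    };

    /* 'retries' lets the qdisc treat a packet that has already waited
     * once differently from fresh traffic; a dumb default qdisc would
     * just return REQUEUE_HEAD for anything with retries > 0. */
    enum requeue_verdict qdisc_requeue(struct Qdisc *q,
                                       struct sk_buff *skb,
                                       int retries);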

>>> For aggregation I would like to allow at least the maximum number of
>>> packets that can fit into one A-MPDU, which depends on the selected
>>> rate. Since wireless driver queueing will really only have an effect
>>> when we're running short on airtime, we need to make sure that we reduce
>>> airtime waste caused by PHY headers, interframe spacing, etc.
>>> A-MPDU is a very neat way to do that...
>>
>> If sending N packets is as cheap (in latency terms) as sending 1, then
>> I don't see how queueing up N packets can hurt any!
>>
>> The iwlwifi patches I just sent do the dumbest possible fix, of making
>> the tx queue have a fixed latency instead of a fixed number of
>> packets. I found this attractive because I figured I wasn't smart
>> enough to anticipate all the different things that might affect
>> transmission rate, so better to just measure what was happening and
>> adapt. In principle, if A-MPDU is in use, and that lets us send more
>> packets for the price of one, then this approach would notice that
>> reflected in our packet throughput and the queue size should increase
>> to match.
>>
>> Obviously this could break if the queue size ever dropped too low --
>> you might lose throughput because of the smaller queue size, and then
>> that would lock in the small queue size, causing loss of throughput...
>> but I don't see any major downsides to just setting a minimum
>> allowable queue size, so long as it's accurate.
>>
>> In fact, my intuition is that the only way to improve on just
>> queueing up a full A-MPDU aggregated packet would be to wait until
>> *just before* your transmission time slot rolls around and *then*
>> queueing up a full A-MPDU aggregated packet. If you get to transmit
>> every K milliseconds, and you always refill your queue immediately
>> after transmitting, then in the worst case a high-priority packet
>> might have to wait 2*K ms (K ms sitting at the head of the Qdisc
>> waiting for you to open your queue, then another K ms in the driver
>> waiting to be transmitted). This worst case drops to K ms if you
>> always refill immediately before transmitting. But the possible gains
>> here are bounded by whatever uncertainty you have about the upcoming
>> transmission time, scheduling jitter, and K. I don't know what any of
>> those constants look like in practice.
> The problem with that is that aggregation uses more queues inside the
> driver than can be made visible as network stack queues.
> Queueing is done for every traffic identifier (there are 8 of them,
> which map to 4 hardware queues), for every station individually.
> Because of that, the driver cannot simply pull in more frames at
> convenient points in time, because what it ends up getting in that case
> might just be the entirely wrong batch of frames, or a mix of packets
> for completely different stations, which would also completely kill
> aggregation performance.
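
(For concreteness, the fixed-latency scheme quoted above boils down to
something like this -- a loose sketch of the idea with made-up
constants, not the actual iwlwifi patch:

    /* Size the tx queue in time, not packets: track the per-packet
     * service time with an EWMA and admit only ~TARGET_US worth. */
    #define TARGET_US 2000   /* made-up latency budget: 2 ms */
    #define MIN_QLEN  3      /* floor so a slow spell can't lock in */

    static unsigned int tx_queue_limit(unsigned int srv_ewma_us)
    {
        unsigned int qlen;

        if (srv_ewma_us == 0)
            return MIN_QLEN;
        qlen = TARGET_US / srv_ewma_us;
        return qlen > MIN_QLEN ? qlen : MIN_QLEN;
    }

On each tx completion you fold the measured service time into the
EWMA; if A-MPDU lets N packets complete in one txop, the measured
per-packet time drops by roughly a factor of N and the limit grows to
match.)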

Traffic identifier = 802.11-ese for QoS category, right?

Another thing I really don't understand is how 802.11 QoS is expected
to interact with everyone else's version of QoS. Certainly it's useful
to have some way to pass QoS categories from one station to another,
but within a single station the actual spec'ed mechanisms for handling
the different QoS categories seem to just come down to: you can and
should reorder high-priority packets in front of low-priority packets?
Is there a requirement that a single A-MPDU cannot contain a mix of
different TIDs? It's not obvious to me how per-TID driver queues add
value over standard traffic shaping.
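
For reference, my understanding of the TID-to-queue mapping (TIDs
follow the 802.1D user priorities and collapse onto the four EDCA
access categories -- worth double-checking against the spec):

    enum ac { AC_BK, AC_BE, AC_VI, AC_VO };  /* the 4 hardware queues */

    static const enum ac tid_to_ac[8] = {
        AC_BE, AC_BK, AC_BK, AC_BE,   /* TIDs 0-3 */
        AC_VI, AC_VI, AC_VO, AC_VO,   /* TIDs 4-7 */
    };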

> For fixing this, I'm considering running the A* algorithm (proposed in
> the paper that jg mentioned) on each individual per-station per-tid queue.

That seems like it would work, but the cost is that IIUC A* calculates
the total appropriate queue size. So if you do this, then you should
also set the default txqueuelen to 0, which will disable the kernel's
generic QoS and AQM code entirely.
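(Concretely: the driver setting dev->tx_queue_len = 0 before
register_netdev() -- IIRC dev_activate() then attaches the noqueue
discipline instead of pfifo_fast.)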

Since bufferbloat means that the kernel's generic QoS and AQM code
doesn't work *now*, this would still be a huge improvement. But it seems
suboptimal in the long run, which is why I keep poking away at better
ways to interact with the Qdisc.

I take the point about needing separate buffers for separate stations,
though. That really does seem to require that your queue management
has intimate knowledge of the network situation. (I guess you could
somehow clone some template Qdisc for each station, with packets
implicitly routed into the correct one? What a mess.) Does traffic for
one station even contend with traffic for other stations?
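
The "clone a template" idea in that parenthesis would look vaguely like
this -- pure vapor, no such hooks exist today:

    /* Pure vapor: one clone of a user-configured template qdisc per
     * (station, TID), so policy is shared but buffering is per-flow
     * and aggregation still sees runs of frames for a single pair. */
    struct sta_queues {
        struct Qdisc *q[8];           /* one clone per TID */
    };

    static int sta_enqueue(struct sta_queues *sq, struct sk_buff *skb,
                           int tid)
    {
        struct Qdisc *q = sq->q[tid]; /* implicit routing by (sta, tid) */
        return q->enqueue(skb, q);
    }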

-- Nathaniel

