[Bloat] sigcomm wifi

Michael Welzl michawe at ifi.uio.no
Mon Aug 25 04:01:21 EDT 2014


>> Yep... I remember a neat paper from colleagues at Trento University that piggybacked TCP's ACKs on link layer ACKs, thereby avoiding the collisions between TCP's ACKs and other data packets - really nice. Not sure if it wasn't just simulations, though.
> 
> that's a neat hack, but I don't see it working, except when one end of the wireless link is also the endpoint of the TCP connection (and then only for acks from that device)
> 
> so in a typical wifi environment, it would be one less transmission from the laptop, no change to the AP.
> 
> But even with that, doesn't TCP try to piggyback the ack on the next packet of data anyway? so unless it's a purely one-way dataflow, this still wouldn't help.

Yes, but of course many dataflows are indeed one-way - an HTTP client typically sends a GET or PUT and not much else.


>>> but can the firmware really tell the difference between quality degradation due to interference and collisions with other transmitters?
>> 
>> Well, with heuristics it can, sort of. As a simple example from one older mechanism, consider: multiple consecutive losses are *less* likely from random collisions than from link noise. That sort of thing. Minstrel worked best in our tests, using tables of rates that worked well / didn't work well in the past:
>> http://heim.ifi.uio.no/michawe/research/publications/wowmom2012.pdf
> 
> the question is if this is deployed in any commodity OS stacks. If not, it could only help on the AP, and we are better off just locking the speeds there.

I thought it was widely deployed, but I really don't know. I'm sure others on this list do? What I do know is that it was (is) a part of madwifi.
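
To illustrate the kind of heuristic I meant above - a toy sketch in Python, with an invented threshold, and certainly not what Minstrel actually does:

    # Toy loss classifier: a long run of consecutive losses points to
    # link noise; isolated losses look more like random collisions.
    # The threshold of 3 is invented, purely for illustration.
    def classify_loss(recent_acked, run_threshold=3):
        """recent_acked: list of booleans, True = frame was ACKed."""
        run = 0
        for acked in reversed(recent_acked):
            if acked:
                break
            run += 1
        return "noise" if run >= run_threshold else "collision"

    print(classify_loss([True, True, False, False, False]))  # -> noise

A rate controller could then lower the PHY rate on "noise" but keep the rate (and simply retry) on "collision".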


>>>>> retries of packets that the OS has given up on (including when the user has closed the app that sent them)
>>>>> 
>>>>> Ideally we want the wifi layer to be just like the wired layer, buffer only what's needed to get it on the air without 'dead air' (where the driver is waiting for the OS to give it more data); at that point, we can do the retries from the OS as appropriate.
>>>>> 
>>>>>> I have two questions: 1) is my characterization roughly correct? 2) have people investigated the downsides (negative effect on TCP) of buffering *too little* in wireless equipment? (I suspect so?)  Finding where "too little" begins could give us a better idea of what the ideal buffer length should really be.
>>>>> 
>>>>> too little buffering will reduce the throughput as a result of unused airtime.
>>>> 
>>>> so that's a function of, at least: 1) the incoming traffic rate; 2) the number of retries * f(MAC behavior, number of other senders trying).
>>> 
>>> incoming to the AP you mean?
>> 
>> incoming to whoever is sending and would be retrying - mostly the AP, yes.
> 
> terminology issue here
> 
> a receiver is never going to be retrying, it has nothing to retry. It's the sender that keeps track of what it's sent and retries if it doesn't get an ack.

Sorry, I must have expressed myself very unclearly. I am of course talking about a sender. With "incoming" I meant incoming to the buffer of the device that's sending, from the upstream (or up-stack) sender.
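
To make that "function" a bit more concrete, a back-of-the-envelope sketch in Python - all numbers invented, and the contention / backoff part (the f(MAC behavior, ...) term) ignored:

    # Rough estimate of how much a sender must buffer while the head
    # frame is being retried. Numbers are invented for illustration.
    arrival_rate = 1000            # packets/s arriving from the stack
    airtime_per_attempt = 2e-3     # seconds per transmission attempt
    max_retries = 7                # link-layer retry limit
    worst_service_time = (1 + max_retries) * airtime_per_attempt
    buffer_needed = arrival_rate * worst_service_time
    print(buffer_needed)           # -> 16.0 packets in this example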


>>> It also matters if you are worrying about aggregate throughput of a lot of users, or per-connection throughput for a single user.
>>> 
>>> From a sender's point of view, if it takes 100 time units to send a packet, and 1-5 time units to queue the next packet for transmission, you lose a few percent of your possible airtime and there's very little concern.
>>> 
>>> but if it takes 10 time units to send the packet and 1-5 time units to queue the next packet, you have just lost a lot of potential bandwidth.
>>> 
>>> But from the point of view of the aggregate, these gaps just give someone else a chance to transmit and have very little effect on the amount of traffic arriving at the AP.
>>> 
>>> I was viewing things from the point of view of the app on the laptop.
>> 
>> Yes... I agree, and that's the more common + more reasonable way to think about it. I tend to think upstream, which of course is far less common, but maybe even more problematic. Actually I suspect the following: things get seriously bad when a lot of senders are sending upstream together. This doesn't really happen much in practice - BUT when we have a very, very large number of hosts connected in a conference-style situation, all the HTTP GETs and SMTP messages and whatnot *do* create lots of collisions, a situation that isn't really too common (and maybe not envisioned / parametrized for), and that's why things often get so bad. (At least one of the reasons.)
> 
> the thing is that in the high-density environment, there's not that much the AP can do, most of the problem is related to the mobile endpoints and what they decide to do.

True!
(though, as you say, limiting allowed physical rates on the AP probably helps)
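
For what it's worth, the 100-vs-10 time unit example above works out like this (taking the worst-case 5 time units of dead air), sketched in Python:

    # airtime efficiency = tx_time / (tx_time + dead time between frames)
    for tx, gap in [(100, 5), (10, 5)]:
        print(tx, gap, round(tx / (tx + gap), 2))
    # -> 0.95 at 100 time units (a few percent lost), but
    #    0.67 at 10 time units (a third of the airtime lost)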


>>>>> But at the low data rates involved, the system would have to be extremely busy for this to add up to a significant amount of time, if even one packet at a time is buffered.
>>>> 
>>>>> You are also conflating the effect of the driver/hardware buffering with it doing retries.
>>>> 
>>>> because of the "function" I wrote above: the more you retry, the more you need to buffer when traffic continuously arrives, because you're stuck trying to send a frame again.
>>> 
>>> huh, I'm missing something here, retrying sends would require you to buffer more when sending.
>> 
>> aren't you saying the same thing as I am? Sorry if not - I might have expressed it confusingly somehow
> 
> as I said above, the machine receiving packets doesn't need to buffer them, because it has no need to re-send them. It's the machine sending packets that needs to keep track of what's been sent in case it needs to re-send it.

Sure, that was a plain misunderstanding.


> But this cache of recently sent packets is separate from a queue of packets waiting to be sent.
> 
> the size of the buffer used to track what's been sent isn't a problem. the bufferbloat problem is around the size of the queue for packets waiting to be sent.

This confuses me. Why do you even need a cache of recently sent packets?
Anyway, what I am talking about *is* the size of the queue for packets waiting to be sent - and not only due to aggregation, but also due to link layer retransmits. Per device, at the link layer, packets (frames, really) are sent in sequence AFAIK, and so any frame that has been sent but not yet acknowledged - and may therefore have to be resent - holds up all other frames to that same destination.
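
To put made-up numbers on that head-of-line blocking:

    # Every frame queued behind a stuck head frame waits for all of
    # its retry attempts. Numbers are invented for illustration.
    airtime_per_attempt = 2e-3   # seconds per transmission attempt
    extra_attempts = 7           # retries spent on the head frame
    delay_added = extra_attempts * airtime_per_attempt
    print(delay_added)           # ~0.014 s added to every queued frame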



>>> If people are retrying when they really don't need to, that cuts down on the available airtime.
>> 
>> Yes
>> 
>> 
>>> But if you have continual transmissions taking place, so you have a hard time getting a chance to send your traffic, then you really do have congestion and should be dropping packets to let the sender know that it shouldn't try to generate as much.
>> 
>> Yes; but the complexity that I was pointing at (but maybe it's a simple parameter, more like a 0 or 1 situation in practice?) lies in the word "continual". How long do you try before you decide that the sending TCP should really think it *is* congestion?  To really optimize the behavior, that would have to depend on the RTT, which you can't easily know.
> 
> Again, I think you are mixing two different issues here.

No, I think you misunderstand me -


> 1. waiting for a pause in everyone else's transmissions so that you can transmit without _knowing_ that you are going to clobber someone
> 
>  Even this can get tricky, is that station you are hearing faintly trying to transmit to an AP near you, so you should be quiet? or is it transmitting to a station far enough away from you that you can go ahead and transmit your packet to your AP without interfering with it?

You mean the normal CSMA/CA procedure (+ RTS/CTS)? Sure, that's tricky in itself, but I wasn't talking about that.


> 2. your transmission getting clobbered so the packet doesn't get through, where you need to wait 'long enough' to decide that it's not going to be acknowledged and try again.

I was always only talking about that second bit. I'm sure I wasn't clear enough in writing and I'm sorry for that.
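
As a made-up illustration of why "long enough" interacts with the end-to-end RTT (the point I was trying to get at):

    # If the link layer keeps retrying for longer than the TCP sender's
    # retransmission timeout (RTO), TCP retransmits anyway and the
    # link-layer retries are wasted effort. All numbers are invented.
    attempt_time = 5e-3          # seconds per attempt, incl. backoff
    retries = 10
    link_retry_duration = retries * attempt_time    # 0.05 s
    tcp_rto = 0.2                # 200 ms, a common implementation floor
    print(link_retry_duration < tcp_rto)  # True: retries stay invisible
    # With a much shorter RTT (and hence a shorter RTO), the same retry
    # budget could overshoot the RTO and just produce duplicates.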


>  This is a case where a local proxy server can actually make a big difference to you. The connections between your mobile devices and the local proxy server have a short RTT and so all timeouts can be nice and short, and then the proxy deals with the long RTT connections out to the Internet.

Adding a proxy to these considerations only complicates them: it's a hard enough trade-off when we just ask ourselves: how large should a buffer be, for the sake of link layer retransmissions? (Which is closely related to the question: how often should a link layer try to retransmit before giving up?) That's what my emails were about. I suspect that we don't have a good answer to even these questions, and I suspect that we'd be better off having something dynamic than fixed default values.
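
To sketch what "something dynamic" might look like - purely hypothetical: derive the retry budget from a delay target and the measured per-attempt airtime, instead of using a fixed count:

    # Purely hypothetical: cap link-layer retransmissions by the delay
    # we are willing to add, rather than by a fixed default retry count.
    def retry_budget(target_delay_ms, airtime_per_attempt_ms):
        """Number of attempts that fit into the tolerated delay."""
        return max(1, target_delay_ms // airtime_per_attempt_ms)

    # e.g. tolerate at most 20 ms of added delay; attempts currently
    # cost 4 ms each:
    print(retry_budget(20, 4))   # -> 5 attempts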

Cheers,
Michael



