* [Bloat] lwn.net's tcp small queues vs wifi aggregation solved @ 2018-06-21 4:58 Dave Taht 2018-06-21 9:22 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 32+ messages in thread From: Dave Taht @ 2018-06-21 4:58 UTC (permalink / raw) To: Make-Wifi-fast, bloat Nice war story. I'm glad this last problem with the fq_codel wifi code is solved, and the article points to a few usb wifi dongles that work better now. https://lwn.net/SubscriberLink/757643/b25587e3593e9f9e/ -- Dave Täht CEO, TekLibre, LLC http://www.teklibre.com Tel: 1-669-226-2619 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 4:58 [Bloat] lwn.net's tcp small queues vs wifi aggregation solved Dave Taht @ 2018-06-21 9:22 ` Toke Høiland-Jørgensen 2018-06-21 12:55 ` Eric Dumazet 0 siblings, 1 reply; 32+ messages in thread From: Toke Høiland-Jørgensen @ 2018-06-21 9:22 UTC (permalink / raw) To: Dave Taht, Make-Wifi-fast, bloat Dave Taht <dave.taht@gmail.com> writes: > Nice war story. I'm glad this last problem with the fq_codel wifi code > is solved This wasn't specific to the fq_codel wifi code, but hit all WiFi devices that were running TCP on the local stack. Which would be mostly laptops, I guess... -Toke ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 9:22 ` Toke Høiland-Jørgensen @ 2018-06-21 12:55 ` Eric Dumazet 2018-06-21 15:18 ` Dave Taht 0 siblings, 1 reply; 32+ messages in thread From: Eric Dumazet @ 2018-06-21 12:55 UTC (permalink / raw) To: bloat On 06/21/2018 02:22 AM, Toke Høiland-Jørgensen wrote: > Dave Taht <dave.taht@gmail.com> writes: > >> Nice war story. I'm glad this last problem with the fq_codel wifi code >> is solved > > This wasn't specific to the fq_codel wifi code, but hit all WiFi devices > that were running TCP on the local stack. Which would be mostly laptops, > I guess... Yes. Also switching TCP stack to always GSO has been a major gain for wifi in my tests. (TSQ budget is based on sk_wmem_alloc, tracking truesize of skbs, and not having GSO is considerably inflating the truesize/payload ratio) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a6b2a1dc2a2105f178255fe495eb914b09cb37a tcp: switch to GSO being always on I expect SACK compression to also give a nice boost to wifi. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e tcp: add SACK compression Lastly I am working on adding ACK compression in TCP stack itself. ^ permalink raw reply [flat|nested] 32+ messages in thread
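Eric's truesize/payload point can be made concrete with some back-of-envelope arithmetic. TSQ caps the bytes a socket may have queued below it, measured in skb truesize rather than payload; the per-skb overhead and TSQ budget below are illustrative assumptions, not exact kernel values, but the shape of the result holds: without GSO, per-skb bookkeeping eats a large share of the budget.

```python
# Back-of-envelope sketch: how skb overhead shrinks the payload TSQ admits.
# SKB_OVERHEAD and TSQ_BUDGET are assumed round numbers for illustration,
# not values taken from the kernel source.
SKB_OVERHEAD = 768        # assumed per-skb bookkeeping (head room, shared info)
MSS = 1448                # typical TCP payload bytes per segment
TSQ_BUDGET = 128 * 1024   # assumed cap on sk_wmem_alloc

def payload_in_flight(segs_per_skb):
    """TCP payload TSQ admits when each skb carries segs_per_skb segments."""
    truesize = segs_per_skb * MSS + SKB_OVERHEAD  # one skb head per skb
    skbs = TSQ_BUDGET // truesize                 # whole skbs under budget
    return skbs * segs_per_skb * MSS

no_gso = payload_in_flight(1)    # one-MSS skbs: overhead inflates truesize
gso = payload_in_flight(44)      # ~64 KB GSO super-packet amortizes one head
print(no_gso, gso)               # 85432 127424: ~50% more payload with GSO
```

With these assumed constants, always-on GSO lets roughly half again as much payload sit under the same sk_wmem_alloc budget, which is the gain Eric describes for wifi aggregation.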
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 12:55 ` Eric Dumazet @ 2018-06-21 15:18 ` Dave Taht 2018-06-21 15:31 ` Caleb Cushing ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Dave Taht @ 2018-06-21 15:18 UTC (permalink / raw) To: Eric Dumazet; +Cc: bloat, Make-Wifi-fast On Thu, Jun 21, 2018 at 5:55 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > On 06/21/2018 02:22 AM, Toke Høiland-Jørgensen wrote: >> Dave Taht <dave.taht@gmail.com> writes: >> >>> Nice war story. I'm glad this last problem with the fq_codel wifi code >>> is solved >> >> This wasn't specific to the fq_codel wifi code, but hit all WiFi devices >> that were running TCP on the local stack. Which would be mostly laptops, >> I guess... > > Yes. > > Also switching TCP stack to always GSO has been a major gain for wifi in my tests. > > (TSQ budget is based on sk_wmem_alloc, tracking truesize of skbs, and not having > GSO is considerably inflating the truesize/payload ratio) > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a6b2a1dc2a2105f178255fe495eb914b09cb37a > tcp: switch to GSO being always on > > I expect SACK compression to also give a nice boost to wifi. > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e > tcp: add SACK compression > > Lastly I am working on adding ACK compression in TCP stack itself. One thing I've seen repeatedly on mac80211 aircaps is a tendency for clients to use up two TXOPs rather than one. 
scenario: 1) A tcp burst arrives at the client 2) A single ack migrates down the client stack into the driver, into the device, which then attempts to compete for airtime on that TXOP for that single ack, sometimes waiting 10s of msec to get that op 3) a bunch more acks arrive "slightly late"[1], and then get queued for the next TXOP, waiting, again sometimes 10s of msec (similar scenario in a client making a quick string of web related requests) This is a case where inserting a teeny bit more latency to fill up the queue (ugh!), or a driver having some way to ask the probability of seeing more data in the next 10us, or... something like that, could help. ... [1] if you need coffee through your nose this morning, regarding usage of the phrase "slightly late", read http://www.rawbw.com/~svw/superman.html -- Dave Täht CEO, TekLibre, LLC http://www.teklibre.com Tel: 1-669-226-2619 ^ permalink raw reply [flat|nested] 32+ messages in thread
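The two-TXOP pattern in the scenario above can be sketched with a toy model: if the driver holds the first ACK for a few microseconds instead of contending for airtime immediately, the stragglers catch up and the whole burst rides one TXOP. The 10 µs window, the arrival times, and the function itself are hypothetical illustration, not mac80211 driver code.

```python
# Toy model of the "teeny bit more latency" idea: count TXOPs needed when
# the driver dequeues everything that arrived within hold_us of the first
# packet of each batch. All timings are made up for illustration.
HOLD_US = 10

def txops_used(ack_arrival_us, hold_us):
    """TXOPs consumed if each TXOP carries all ACKs arriving within
    hold_us of the first ACK queued for it."""
    acks = sorted(ack_arrival_us)
    txops = 0
    i = 0
    while i < len(acks):
        deadline = acks[i] + hold_us      # wait this long for stragglers
        while i < len(acks) and acks[i] <= deadline:
            i += 1                        # batch everything inside the window
        txops += 1
    return txops

burst = [0, 3, 5, 8, 9]            # ACKs arriving "slightly late", in us
print(txops_used(burst, 0))        # no hold: one TXOP per ACK -> 5
print(txops_used(burst, HOLD_US))  # 10 us hold: whole burst in one TXOP -> 1
```

The trade-off is exactly the "(ugh!)" in the message: the hold window is added latency that only pays off when more traffic actually shows up inside it.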
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 15:18 ` Dave Taht @ 2018-06-21 15:31 ` Caleb Cushing 2018-06-21 15:46 ` Stephen Hemminger 2018-06-21 15:50 ` Dave Taht 2018-06-21 16:29 ` David Collier-Brown 2018-06-21 16:43 ` Kathleen Nichols 2 siblings, 2 replies; 32+ messages in thread From: Caleb Cushing @ 2018-06-21 15:31 UTC (permalink / raw) To: Dave Taht; +Cc: Eric Dumazet, Make-Wifi-fast, bloat [-- Attachment #1: Type: text/plain, Size: 2785 bytes --] actually... all of my devices, including my desktop connect through wifi these days... and only one of them isn't running some variant of linux. On Thu, Jun 21, 2018 at 10:18 AM Dave Taht <dave.taht@gmail.com> wrote: > On Thu, Jun 21, 2018 at 5:55 AM, Eric Dumazet <eric.dumazet@gmail.com> > wrote: > > > > > > On 06/21/2018 02:22 AM, Toke Høiland-Jørgensen wrote: > >> Dave Taht <dave.taht@gmail.com> writes: > >> > >>> Nice war story. I'm glad this last problem with the fq_codel wifi code > >>> is solved > >> > >> This wasn't specific to the fq_codel wifi code, but hit all WiFi devices > >> that were running TCP on the local stack. Which would be mostly laptops, > >> I guess... > > > > Yes. > > > > Also switching TCP stack to always GSO has been a major gain for wifi in > my tests. > > > > (TSQ budget is based on sk_wmem_alloc, tracking truesize of skbs, and > not having > > GSO is considerably inflating the truesize/payload ratio) > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a6b2a1dc2a2105f178255fe495eb914b09cb37a > > tcp: switch to GSO being always on > > > > I expect SACK compression to also give a nice boost to wifi. > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e > > tcp: add SACK compression > > > > Lastly I am working on adding ACK compression in TCP stack itself. 
> > One thing I've seen repeatedly on mac80211 aircaps is a tendency for > clients to use up two TXOPs rather than one. > > scenario: > > 1) A tcp burst arrives at the client > 2) A single ack migrates down the client stack into the driver, into > the device, which then attempts to compete for airtime on that TXOP > for that single ack, sometimes waiting 10s of msec to get that op > 3) a bunch more acks arrive "slightly late"[1], and then get queued > for the next TXOP, waiting, again sometimes 10s of msec > > (similar scenario in a client making a quick string of web related > requests) > > This is a case where inserting a teeny bit more latency to fill up the > queue (ugh!), or a driver having some way to ask the probability of > seeing more data in the > next 10us, or... something like that, could help. > > ... > > [1] if you need coffee through your nose this morning, regarding usage > of the phrase "slightly late", read > http://www.rawbw.com/~svw/superman.html > > -- > > Dave Täht > CEO, TekLibre, LLC > http://www.teklibre.com > Tel: 1-669-226-2619 <(669)%20226-2619> > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > -- Caleb Cushing http://xenoterracide.com [-- Attachment #2: Type: text/html, Size: 4279 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 15:31 ` Caleb Cushing @ 2018-06-21 15:46 ` Stephen Hemminger 2018-06-21 17:41 ` Caleb Cushing 2018-06-21 15:50 ` Dave Taht 1 sibling, 1 reply; 32+ messages in thread From: Stephen Hemminger @ 2018-06-21 15:46 UTC (permalink / raw) To: Caleb Cushing; +Cc: Dave Taht, Make-Wifi-fast, bloat On Thu, 21 Jun 2018 10:31:18 -0500 Caleb Cushing <xenoterracide@gmail.com> wrote: > actually... all of my devices, including my desktop connect through wifi > these days... and only one of them isn't running some variant of linux. > Sigh. My experience with wifi is that it is not stable enough for that. With both APs I have tried, Linksys ACM3200 and Netgear WNDR3800, I still see random drop-outs. Not sure if these are device resets (ie workarounds) or other issues. These happen independent of firmware (vendor, OpenWRT, or LEDE). So my suspicion is that the Wifi hardware is shite and that the firmware is trying to work around and mask the problem. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 15:46 ` Stephen Hemminger @ 2018-06-21 17:41 ` Caleb Cushing 0 siblings, 0 replies; 32+ messages in thread From: Caleb Cushing @ 2018-06-21 17:41 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Dave Taht, Make-Wifi-fast, bloat [-- Attachment #1: Type: text/plain, Size: 950 bytes --] I'm not disagreeing, just saying that wifi is much more prevalent now than just laptops... literally I only have a cord for emergency use On Thu, Jun 21, 2018 at 10:46 AM Stephen Hemminger < stephen@networkplumber.org> wrote: > On Thu, 21 Jun 2018 10:31:18 -0500 > Caleb Cushing <xenoterracide@gmail.com> wrote: > > > actually... all of my devices, including my desktop connect through wifi > > these days... and only one of them isn't running some variant of linux. > > > > Sigh. My experience with wifi is that it is not stable enough for that. > Both AP's I have tried Linksys ACM3200 or Netgear WNDR3800 I still see > random drop outs. > Not sure if these are device resets (ie workarounds) or other issues. > > These happen independent of firmware (vendor, OpenWRT, or LEDE). > So my suspicion is the that Wifi hardware is shite and that firmware is > trying > to workaround and mask the problem. > -- Caleb Cushing http://xenoterracide.com [-- Attachment #2: Type: text/html, Size: 1493 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 15:31 ` Caleb Cushing 2018-06-21 15:46 ` Stephen Hemminger @ 2018-06-21 15:50 ` Dave Taht 1 sibling, 0 replies; 32+ messages in thread From: Dave Taht @ 2018-06-21 15:50 UTC (permalink / raw) To: Caleb Cushing; +Cc: Eric Dumazet, Make-Wifi-fast, bloat On Thu, Jun 21, 2018 at 8:31 AM, Caleb Cushing <xenoterracide@gmail.com> wrote: > actually... all of my devices, including my desktop connect through wifi > these days... and only one of them isn't running some variant of linux. Yes the tendency of manufacturers to hook things up to the more convenient, but overbuffered and less opaque USB bus has become an increasingly large problem (canonical example - raspberry pi). In the case of LTE, especially, everything is a USB dongle, and the CDC_ETH driver and device spec actually mandates at least 32k of on-chip buffering on the other side of the bus. We had tried at one point (5 years ago) to find ways to apply something BQL-like to this but failed. I am currently getting miserable performance out of the one LTE dongle I have (16K/sec up) but have not gone and fiddled with it with more modern kernels. I ended up just tethering via an android phone, which cracks 1mbit up. The quality of the wifi drivers for USB is almost uniformly miserable, and out of tree. > > On Thu, Jun 21, 2018 at 10:18 AM Dave Taht <dave.taht@gmail.com> wrote: >> >> On Thu, Jun 21, 2018 at 5:55 AM, Eric Dumazet <eric.dumazet@gmail.com> >> wrote: >> > >> > >> > On 06/21/2018 02:22 AM, Toke Høiland-Jørgensen wrote: >> >> Dave Taht <dave.taht@gmail.com> writes: >> >> >> >>> Nice war story. I'm glad this last problem with the fq_codel wifi code >> >>> is solved >> >> >> >> This wasn't specific to the fq_codel wifi code, but hit all WiFi >> >> devices >> >> that were running TCP on the local stack. Which would be mostly >> >> laptops, >> >> I guess... >> > >> > Yes. 
>> > >> > Also switching TCP stack to always GSO has been a major gain for wifi in >> > my tests. >> > >> > (TSQ budget is based on sk_wmem_alloc, tracking truesize of skbs, and >> > not having >> > GSO is considerably inflating the truesize/payload ratio) >> > >> > >> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a6b2a1dc2a2105f178255fe495eb914b09cb37a >> > tcp: switch to GSO being always on >> > >> > I expect SACK compression to also give a nice boost to wifi. >> > >> > >> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e >> > tcp: add SACK compression >> > >> > Lastly I am working on adding ACK compression in TCP stack itself. >> >> One thing I've seen repeatedly on mac80211 aircaps is a tendency for >> clients to use up two TXOPs rather than one. >> >> scenario: >> >> 1) A tcp burst arrives at the client >> 2) A single ack migrates down the client stack into the driver, into >> the device, which then attempts to compete for airtime on that TXOP >> for that single ack, sometimes waiting 10s of msec to get that op >> 3) a bunch more acks arrive "slightly late"[1], and then get queued >> for the next TXOP, waiting, again sometimes 10s of msec >> >> (similar scenario in a client making a quick string of web related >> requests) >> >> This is a case where inserting a teeny bit more latency to fill up the >> queue (ugh!), or a driver having some way to ask the probability of >> seeing more data in the >> next 10us, or... something like that, could help. >> >> ... 
>> >> [1] if you need coffee through your nose this morning, regarding usage >> of the phrase "slightly late", read >> http://www.rawbw.com/~svw/superman.html >> >> -- >> >> Dave Täht >> CEO, TekLibre, LLC >> http://www.teklibre.com >> Tel: 1-669-226-2619 >> _______________________________________________ >> Bloat mailing list >> Bloat@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/bloat > > -- > Caleb Cushing > > http://xenoterracide.com -- Dave Täht CEO, TekLibre, LLC http://www.teklibre.com Tel: 1-669-226-2619 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 15:18 ` Dave Taht 2018-06-21 15:31 ` Caleb Cushing @ 2018-06-21 16:29 ` David Collier-Brown 2018-06-21 16:54 ` Jonathan Morton 2018-06-21 16:43 ` Kathleen Nichols 2 siblings, 1 reply; 32+ messages in thread From: David Collier-Brown @ 2018-06-21 16:29 UTC (permalink / raw) To: bloat On 21/06/18 11:18 AM, Dave Taht wrote > This is a case where inserting a teeny bit more latency to fill up the > queue (ugh!), or a driver having some way to ask the probability of > seeing more data in the > next 10us, or... something like that, could help. Hmmn, that sounds like a pattern seen in physical switching systems: someone with knowledge that another car is coming (especially if it's unexpected) waves a flag at the dispatcher to warn them to leave space and avoid a nasty ka-thump and the extra strain on the couplers (;-)) --dave -- David Collier-Brown, | Always do right. This will gratify System Programmer and Author | some people and astonish the rest davecb@spamcop.net | -- Mark Twain ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 16:29 ` David Collier-Brown @ 2018-06-21 16:54 ` Jonathan Morton 0 siblings, 0 replies; 32+ messages in thread From: Jonathan Morton @ 2018-06-21 16:54 UTC (permalink / raw) To: davecb; +Cc: bloat >> This is a case where inserting a teeny bit more latency to fill up the >> queue (ugh!), or a driver having some way to ask the probability of >> seeing more data in the >> next 10us, or... something like that, could help. > > Hmmn, that sounds like a pattern seen in physical switching systems: someone with knowledge that another car is coming (especially if it's unexpected) waves a flag at the dispatcher to warn them to leave space and avoid a nasty ka-thump and the extra strain on the couplers (;-)) A more relevant railway analogy would be that a passenger train keeps its doors open while waiting for the departure signal to clear, permitting more passengers to board. At large stations the crew will press a TRTS (Train Ready To Start) button on the platform about half a minute before departure time, to prompt setting of the departure route in time, but a conflicting movement may delay the signal actually clearing. - Jonathan Morton ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 15:18 ` Dave Taht 2018-06-21 15:31 ` Caleb Cushing 2018-06-21 16:29 ` David Collier-Brown @ 2018-06-21 16:43 ` Kathleen Nichols 2018-06-21 19:17 ` Dave Taht 2 siblings, 1 reply; 32+ messages in thread From: Kathleen Nichols @ 2018-06-21 16:43 UTC (permalink / raw) To: bloat On 6/21/18 8:18 AM, Dave Taht wrote: > This is a case where inserting a teeny bit more latency to fill up the > queue (ugh!), or a driver having some way to ask the probability of > seeing more data in the > next 10us, or... something like that, could help. > Well, if the driver sees the arriving packets, it could infer that an ack will be produced shortly and will need a sending opportunity. Kathie (we tried this mechanism out for cable data head ends at Com21 and it went into a patent that probably belongs to Arris now. But that was for cable. It is a fact universally acknowledged that a packet of data must be in want of an acknowledgement.) ^ permalink raw reply [flat|nested] 32+ messages in thread
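Kathie's inference can be sketched as a small piece of bookkeeping: a driver that counts inbound TCP data segments per flow can estimate how many ACKs will shortly want a reverse-path sending opportunity. The delayed-ACK assumption (roughly one ACK per two full-sized segments) and every name below are illustrative; this is not the Com21 mechanism or any real driver API.

```python
# Sketch: predict pending ACKs from inbound data segments the driver has
# seen. Assumes classic delayed ACK (one ACK per two full segments); the
# class and its methods are hypothetical, not mac80211 or Com21 code.
import math

class AckPredictor:
    def __init__(self):
        self.inbound = {}                # per-flow count of data segments seen

    def on_rx_data(self, flow):
        """Driver saw an inbound full-sized data segment for this flow."""
        self.inbound[flow] = self.inbound.get(flow, 0) + 1

    def on_tx_ack(self, flow):
        """An ACK went out; under delayed ACK it covers up to two segments."""
        self.inbound[flow] = max(0, self.inbound.get(flow, 0) - 2)

    def pending_acks(self, flow):
        # Roughly one ACK expected per two unacknowledged inbound segments.
        return math.ceil(self.inbound.get(flow, 0) / 2)

p = AckPredictor()
for _ in range(10):
    p.on_rx_data("flow1")            # a 10-segment TCP burst arrives
print(p.pending_acks("flow1"))       # ~5 ACKs expected on the reverse path
```

A driver holding such an estimate could reserve a sending opportunity for the predicted ACKs rather than contending twice, which is the mechanism the Com21 patent anecdote alludes to.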
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 16:43 ` Kathleen Nichols @ 2018-06-21 19:17 ` Dave Taht 2018-06-21 19:41 ` Sebastian Moeller 2018-06-22 14:01 ` Kathleen Nichols 0 siblings, 2 replies; 32+ messages in thread From: Dave Taht @ 2018-06-21 19:17 UTC (permalink / raw) To: Kathleen Nichols; +Cc: bloat On Thu, Jun 21, 2018 at 9:43 AM, Kathleen Nichols <nichols@pollere.com> wrote: > On 6/21/18 8:18 AM, Dave Taht wrote: > >> This is a case where inserting a teeny bit more latency to fill up the >> queue (ugh!), or a driver having some way to ask the probability of >> seeing more data in the >> next 10us, or... something like that, could help. >> > > Well, if the driver sees the arriving packets, it could infer that an > ack will be produced shortly and will need a sending opportunity. Certainly in the case of wifi and lte and other simplex technologies this seems feasible... 'cept that we're all busy finding ways to do ack compression this month and thus the two big tcp packets = 1 ack rule is going away. Still, an estimate, with a short timeout might help. Another thing I've longed for (sometimes) is whether or not an application like a web browser signalling the OS that it has a batch of network packets coming would help... web browser: setsockopt(batch_everything) parse the web page, generate all your dns, tcp requests, etc, etc setsockopt(release_batch) > Kathie > > (we tried this mechanism out for cable data head ends at Com21 and it > went into a patent that probably belongs to Arris now. But that was for > cable. It is a fact universally acknowledged that a packet of data must > be in want of an acknowledgement.) voip doesn't behave this way, but for recognisable protocols like tcp and perhaps quic... 
> _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat -- Dave Täht CEO, TekLibre, LLC http://www.teklibre.com Tel: 1-669-226-2619 ^ permalink raw reply [flat|nested] 32+ messages in thread
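The setsockopt() calls in the message above are imaginary, but the idea can be sketched in userspace: hold everything generated while parsing, then release one combined burst so the link layer gets a chance to aggregate it. BatchedSender and its semantics are a hypothetical illustration of the proposed hint, not an existing kernel or socket API.

```python
# Userspace sketch of the proposed batch_everything/release_batch hint:
# buffer writes issued while "parsing the page", then emit one burst on
# release. Purely illustrative; no such socket option exists.
class BatchedSender:
    def __init__(self, send_fn):
        self.send_fn = send_fn           # e.g. sock.sendall on a real socket
        self.pending = []

    def __enter__(self):                 # setsockopt(batch_everything)
        return self

    def send(self, data):
        self.pending.append(data)        # hold until the batch is released

    def __exit__(self, *exc):            # setsockopt(release_batch)
        if self.pending:
            self.send_fn(b"".join(self.pending))  # one burst on release

sent = []
with BatchedSender(sent.append) as tx:
    tx.send(b"GET /a\r\n")               # requests generated while parsing
    tx.send(b"GET /b\r\n")
print(sent)                              # a single combined burst
```

In the real proposal the batching would happen below the socket layer so DNS lookups, TCP handshakes, and requests from one page load could share transmit opportunities.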
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 19:17 ` Dave Taht @ 2018-06-21 19:41 ` Sebastian Moeller 2018-06-21 19:51 ` Toke Høiland-Jørgensen 2018-06-21 19:54 ` Dave Taht 1 sibling, 2 replies; 32+ messages in thread From: Sebastian Moeller @ 2018-06-21 19:41 UTC (permalink / raw) To: Dave Täht; +Cc: Kathleen Nichols, bloat Hi All, > On Jun 21, 2018, at 21:17, Dave Taht <dave.taht@gmail.com> wrote: > > On Thu, Jun 21, 2018 at 9:43 AM, Kathleen Nichols <nichols@pollere.com> wrote: >> On 6/21/18 8:18 AM, Dave Taht wrote: >> >>> This is a case where inserting a teeny bit more latency to fill up the >>> queue (ugh!), or a driver having some way to ask the probability of >>> seeing more data in the >>> next 10us, or... something like that, could help. >>> >> >> Well, if the driver sees the arriving packets, it could infer that an >> ack will be produced shortly and will need a sending opportunity. > > Certainly in the case of wifi and lte and other simplex technologies > this seems feasible... > > 'cept that we're all busy finding ways to do ack compression this > month and thus the > two big tcp packets = 1 ack rule is going away. Still, an estimate, > with a short timeout > might help. That short timeout seems essential: just because a link is wireless does not mean the ACKs for passing TCP packets will appear shortly; who knows what routing happens after the wireless link (think city-wide mesh network). In a way such a solution should first figure out whether waiting has any chance of being useful, by looking at the typical delay between Data packets and the matching ACKs. > > Another thing I've longed for (sometimes) is whether or not an > application like a web > browser signalling the OS that it has a batch of network packets > coming would help...
To make up for the fact that wireless unfortunately uses a very high per-packet overhead, it just tries to "hide" it by amortizing it over more than one data packet. How about trying to find a better, less wasteful MAC instead ;) (and now we have two problems...) Now really, from a latency perspective it clearly is better to avoid overhead instead of using "batching" to better amortize it, since batching increases latency (I stipulate that there are conditions in which clever batching will not increase the noticeable latency if it can hide inside another latency-increasing process). > > web browser: > setsockopt(batch_everything) > parse the web page, generate all your dns, tcp requests, etc, etc > setsockopt(release_batch) > >> Kathie >> >> (we tried this mechanism out for cable data head ends at Com21 and it >> went into a patent that probably belongs to Arris now. But that was for >> cable. It is a fact universally acknowledged that a packet of data must >> be in want of an acknowledgement.) > > voip doesn't behave this way, but for recognisable protocols like tcp > and perhaps quic... I note that for voip, waiting does not make sense as all packets carry information and keeping jitter low will noticeably increase a call's perceived quality (if just by allowing the application to use a small de-jitter buffer and hence less latency). There is a reason why wifi's voice access class both has the highest probability to get the next tx-slot and also is not allowed to send aggregates (whether that is fully sane is another question, answering which I do not feel competent). I also think that on a docsis system it is probably a decent heuristic to assume that the endpoints will be a few milliseconds away at most (and only due to the coarse docsis grant-request clock).
Best Regards Sebastian > >> _______________________________________________ >> Bloat mailing list >> Bloat@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/bloat > > > > -- > > Dave Täht > CEO, TekLibre, LLC > http://www.teklibre.com > Tel: 1-669-226-2619 > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 32+ messages in thread
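Sebastian's "first figure out whether waiting has any chance of being useful" step could look like a smoothed estimate of the local data-to-ACK turnaround, with a hold timer armed only when ACKs typically arrive inside the window. The EWMA gain, the 10 µs threshold, and the class itself are illustrative assumptions, not any existing stack's logic.

```python
# Sketch: estimate the typical delay between an inbound data packet and
# the matching outbound ACK, and only wait when it usually pays off.
# ALPHA mirrors classic SRTT smoothing; all constants are illustrative.
ALPHA = 0.125

class AckDelayEstimator:
    def __init__(self):
        self.srtt_us = None              # smoothed data->ACK turnaround

    def sample(self, data_rx_us, ack_tx_us):
        """Record one observed data-arrival-to-ACK-sent delay."""
        d = ack_tx_us - data_rx_us
        if self.srtt_us is None:
            self.srtt_us = d
        else:
            self.srtt_us += ALPHA * (d - self.srtt_us)

    def worth_waiting(self, hold_us=10):
        # Only hold the transmit opportunity if ACKs typically show up
        # within the hold window; otherwise send immediately.
        return self.srtt_us is not None and self.srtt_us <= hold_us

est = AckDelayEstimator()
for rx, tx in [(0, 6), (100, 107), (200, 205)]:  # turnarounds of 6, 7, 5 us
    est.sample(rx, tx)
print(est.worth_waiting())               # True: ACKs typically beat 10 us
```

On a city-wide mesh where the estimate would sit at milliseconds, worth_waiting() stays False and no latency is added, which is the safeguard Sebastian asks for.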
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 19:41 ` Sebastian Moeller @ 2018-06-21 19:51 ` Toke Høiland-Jørgensen 2018-06-21 19:54 ` Dave Taht 1 sibling, 0 replies; 32+ messages in thread From: Toke Høiland-Jørgensen @ 2018-06-21 19:51 UTC (permalink / raw) To: Sebastian Moeller, Dave Täht; +Cc: bloat Sebastian Moeller <moeller0@gmx.de> writes: > To make up for the fact that wireless uses unfortunately uses a > very high per packet overhead it just tries to "hide" by > amortizing it over more than one data packet. How about trying > to find a better, less wasteful MAC instead ;) (and now we have > two problems...) Now really from a latency perspective it > clearly is better to ovoid overhead instead of use "batching" to > better amortize it since batching increases latency (I stipulate > that there are condition in which clever batching will not > increase the noticeable latency if it can hide inside another > latency increasing process). Seems that 802.11ax will have some interesting features to this end. Specifically, the spectrum can be split, allowing smaller chunks of it to be used for reverse path transmissions (full-duplex at last?). https://en.wikipedia.org/wiki/802.11ax#Technical_improvements Also, 1024-QAM on 160Mhz channels; omg... -Toke ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 19:41 ` Sebastian Moeller 2018-06-21 19:51 ` Toke Høiland-Jørgensen @ 2018-06-21 19:54 ` Dave Taht 2018-06-21 20:11 ` Sebastian Moeller 1 sibling, 1 reply; 32+ messages in thread From: Dave Taht @ 2018-06-21 19:54 UTC (permalink / raw) To: Sebastian Moeller; +Cc: Kathleen Nichols, bloat On Thu, Jun 21, 2018 at 12:41 PM, Sebastian Moeller <moeller0@gmx.de> wrote: > Hi All, > >> On Jun 21, 2018, at 21:17, Dave Taht <dave.taht@gmail.com> wrote: >> >> On Thu, Jun 21, 2018 at 9:43 AM, Kathleen Nichols <nichols@pollere.com> wrote: >>> On 6/21/18 8:18 AM, Dave Taht wrote: >>> >>>> This is a case where inserting a teeny bit more latency to fill up the >>>> queue (ugh!), or a driver having some way to ask the probability of >>>> seeing more data in the >>>> next 10us, or... something like that, could help. >>>> >>> >>> Well, if the driver sees the arriving packets, it could infer that an >>> ack will be produced shortly and will need a sending opportunity. >> >> Certainly in the case of wifi and lte and other simplex technologies >> this seems feasible... >> >> 'cept that we're all busy finding ways to do ack compression this >> month and thus the >> two big tcp packets = 1 ack rule is going away. Still, an estimate, >> with a short timeout >> might help. > > That short timeout seems essential, just because a link is wireless, does not mean the ACKs for passing TCP packets will appear shortly, who knows what routing happens after the wireless link (think city-wide mesh network). In a way such a solution should first figure out whether waiting has any chance of being useful, by looking at te typical delay between Data packets and the matching ACKs. We are in this discussion, having a few issues with multiple contexts. Mine (and eric's) is in improving wifi clients (laptops, handhelds) behavior, where the tcp stack is local. packet pairing estimates on routers... 
well, if you get an aggregate "in", you should be able to get an aggregate "out" when it traverses the same driver. routerwise, ack compression "done right" will help a bit... it's the "done right" part that's the sticking point. > >> >> Another thing I've longed for (sometimes) is whether or not an >> application like a web >> browser signalling the OS that it has a batch of network packets >> coming would help... > > To make up for the fact that wireless uses unfortunately uses a very high per packet overhead it just tries to "hide" by amortizing it over more than one data packet. How about trying to find a better, less wasteful MAC instead ;) (and now we have two problems...) On my bad days I'd really like to have a do-over on wifi. The only hope I've had has been for LiFi or a ressurection of I haven't poked into what's going on in 5G lately (the mac is "better", but towers being distant does not help), nor have I been tracking 802.11ax for a few years. Lower latency was all over the 802.11ax standard when I last paid attention. Has 802.11ad gone anywhere? >Now really from a latency perspective it clearly is better to ovoid overhead instead of use "batching" to better amortize it since batching increases latency (I stipulate that there are condition in which clever batching will not increase the noticeable latency if it can hide inside another latency increasing process). > >> >> web browser: >> setsockopt(batch_everything) >> parse the web page, generate all your dns, tcp requests, etc, etc >> setsockopt(release_batch) >> >>> Kathie >>> >>> (we tried this mechanism out for cable data head ends at Com21 and it >>> went into a patent that probably belongs to Arris now. But that was for >>> cable. It is a fact universally acknowledged that a packet of data must >>> be in want of an acknowledgement.) >> >> voip doesn't behave this way, but for recognisable protocols like tcp >> and perhaps quic... 
> > I note that for voip, waiting does not make sense as all packets carry information and keeping jitter low will noticeably increase a calls perceived quality (if just by allowing the application yo use a small de-jitter buffer and hence less latency). There is a reason why wifi's voice access class, oith has the highest probability to get the next tx-slot and also is not allowed to send aggregates (whether that is fully sane is another question, answering which I do not feel competent). > I also think that on a docsis system it is probably a decent heuristic to assume that the endpoints will be a few milliseconds away at most (and only due to the coarse docsis grant-request clock). > > Best Regards > Sebastian > > >> >>> _______________________________________________ >>> Bloat mailing list >>> Bloat@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/bloat >> >> >> >> -- >> >> Dave Täht >> CEO, TekLibre, LLC >> http://www.teklibre.com >> Tel: 1-669-226-2619 >> _______________________________________________ >> Bloat mailing list >> Bloat@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/bloat > -- Dave Täht CEO, TekLibre, LLC http://www.teklibre.com Tel: 1-669-226-2619 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 19:54 ` Dave Taht @ 2018-06-21 20:11 ` Sebastian Moeller 0 siblings, 0 replies; 32+ messages in thread From: Sebastian Moeller @ 2018-06-21 20:11 UTC (permalink / raw) To: Dave Täht; +Cc: Kathleen Nichols, bloat Hi Dave, > On Jun 21, 2018, at 21:54, Dave Taht <dave.taht@gmail.com> wrote: > > On Thu, Jun 21, 2018 at 12:41 PM, Sebastian Moeller <moeller0@gmx.de> wrote: >> Hi All, >> >>> On Jun 21, 2018, at 21:17, Dave Taht <dave.taht@gmail.com> wrote: >>> >>> On Thu, Jun 21, 2018 at 9:43 AM, Kathleen Nichols <nichols@pollere.com> wrote: >>>> On 6/21/18 8:18 AM, Dave Taht wrote: >>>> >>>>> This is a case where inserting a teeny bit more latency to fill up the >>>>> queue (ugh!), or a driver having some way to ask the probability of >>>>> seeing more data in the >>>>> next 10us, or... something like that, could help. >>>>> >>>> >>>> Well, if the driver sees the arriving packets, it could infer that an >>>> ack will be produced shortly and will need a sending opportunity. >>> >>> Certainly in the case of wifi and lte and other simplex technologies >>> this seems feasible... >>> >>> 'cept that we're all busy finding ways to do ack compression this >>> month and thus the >>> two big tcp packets = 1 ack rule is going away. Still, an estimate, >>> with a short timeout >>> might help. >> >> That short timeout seems essential, just because a link is wireless, does not mean the ACKs for passing TCP packets will appear shortly, who knows what routing happens after the wireless link (think city-wide mesh network). In a way such a solution should first figure out whether waiting has any chance of being useful, by looking at te typical delay between Data packets and the matching ACKs. > > We are in this discussion, having a few issues with multiple contexts. > Mine (and eric's) is in improving wifi clients (laptops, handhelds) > behavior, where the tcp stack is local. 
Ah, sorry, I got this wrong and was looking at this from the AP's perspective; sorry for the noise... and thanks for the patience > > packet pairing estimates on routers... well, if you get an aggregate > "in", you should be able to get an aggregate "out" when it traverses > the same driver. routerwise, ack compression "done right" will help a > bit... it's the "done right" part that's the sticking point. How will ACK compression help? If done aggressively it will sparse out the ACK stream, potentially making aggregating ACKs infeasible, no? On the other hand, if sparse enough, maybe not aggregating is not too painful? I guess I am just slow today... Best Regards Sebastian > >> >>> Another thing I've longed for (sometimes) is whether or not an >>> application like a web >>> browser signalling the OS that it has a batch of network packets >>> coming would help... >> >> To make up for the fact that wireless unfortunately uses a very high per-packet overhead, it just tries to "hide" it by amortizing it over more than one data packet. How about trying to find a better, less wasteful MAC instead ;) (and now we have two problems...) > > On my bad days I'd really like to have a do-over on wifi. The only > hope I've had has been for LiFi or a resurrection of > > I haven't poked into what's going on in 5G lately (the mac is > "better", but towers being distant does not help), nor have I been > tracking 802.11ax for a few years. Lower latency was all over the > 802.11ax standard when I last paid attention. > > Has 802.11ad gone anywhere? > > >> Now really from a latency perspective it clearly is better to avoid overhead instead of using "batching" to better amortize it, since batching increases latency (I stipulate that there are conditions in which clever batching will not increase the noticeable latency if it can hide inside another latency-increasing process). 
>> >>> web browser: >>> setsockopt(batch_everything) >>> parse the web page, generate all your dns, tcp requests, etc, etc >>> setsockopt(release_batch) >>> >>>> Kathie >>>> >>>> (we tried this mechanism out for cable data head ends at Com21 and it >>>> went into a patent that probably belongs to Arris now. But that was for >>>> cable. It is a fact universally acknowledged that a packet of data must >>>> be in want of an acknowledgement.) >>> >>> voip doesn't behave this way, but for recognisable protocols like tcp >>> and perhaps quic... >> >> I note that for voip, waiting does not make sense as all packets carry information and keeping jitter low will noticeably increase a call's perceived quality (if just by allowing the application to use a small de-jitter buffer and hence less latency). There is a reason why wifi's voice access class both has the highest probability to get the next tx-slot and also is not allowed to send aggregates (whether that is fully sane is another question, answering which I do not feel competent). >> I also think that on a docsis system it is probably a decent heuristic to assume that the endpoints will be a few milliseconds away at most (and only due to the coarse docsis grant-request clock). >> >> Best Regards >> Sebastian >> >> >>> >>>> _______________________________________________ >>>> Bloat mailing list >>>> Bloat@lists.bufferbloat.net >>>> https://lists.bufferbloat.net/listinfo/bloat >>> >>> >>> >>> -- >>> >>> Dave Täht >>> CEO, TekLibre, LLC >>> http://www.teklibre.com >>> Tel: 1-669-226-2619 >>> _______________________________________________ >>> Bloat mailing list >>> Bloat@lists.bufferbloat.net >>> https://lists.bufferbloat.net/listinfo/bloat >> > > > > -- > > Dave Täht > CEO, TekLibre, LLC > http://www.teklibre.com > Tel: 1-669-226-2619 ^ permalink raw reply [flat|nested] 32+ messages in thread
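Sebastian's suggested precondition, first measuring the typical delay between data packets and their matching ACKs and only holding a transmit opportunity open when that delay is short, could be sketched roughly as below. All structure and function names here are hypothetical illustrations, not code from any real driver:

```c
/* Sketch of an "is waiting for the ACK even plausible?" check:
 * track a smoothed data->ACK turnaround time and compare it against
 * the longest hold we are willing to impose on the aggregate. */
#include <stdbool.h>
#include <stdint.h>

struct ack_gap_estimator {
    uint64_t ewma_gap_us;   /* smoothed data->ACK turnaround, microseconds */
};

/* Classic srtt-style EWMA with gain 1/8. */
static void ack_gap_update(struct ack_gap_estimator *e, uint64_t sample_us)
{
    if (e->ewma_gap_us == 0)
        e->ewma_gap_us = sample_us;
    else
        e->ewma_gap_us = e->ewma_gap_us - (e->ewma_gap_us >> 3)
                       + (sample_us >> 3);
}

/* Waiting only makes sense when the expected ACK turnaround is shorter
 * than the maximum hold budget (e.g. a city-wide mesh hop would push
 * the EWMA far above any sane budget, disabling the hold entirely). */
static bool worth_waiting(const struct ack_gap_estimator *e,
                          uint64_t max_hold_us)
{
    return e->ewma_gap_us != 0 && e->ewma_gap_us <= max_hold_us;
}
```

With an 80-90 us smoothed turnaround a 100 us hold budget would permit waiting, while a 50 us budget would not; the point is only that the decision is driven by measurement rather than assumed.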
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-21 19:17 ` Dave Taht 2018-06-21 19:41 ` Sebastian Moeller @ 2018-06-22 14:01 ` Kathleen Nichols 2018-06-22 14:12 ` Jonathan Morton 1 sibling, 1 reply; 32+ messages in thread From: Kathleen Nichols @ 2018-06-22 14:01 UTC (permalink / raw) To: bloat On 6/21/18 12:17 PM, Dave Taht wrote: > On Thu, Jun 21, 2018 at 9:43 AM, Kathleen Nichols <nichols@pollere.com> wrote: >> On 6/21/18 8:18 AM, Dave Taht wrote: >> >>> This is a case where inserting a teeny bit more latency to fill up the >>> queue (ugh!), or a driver having some way to ask the probability of >>> seeing more data in the >>> next 10us, or... something like that, could help. >>> >> >> Well, if the driver sees the arriving packets, it could infer that an >> ack will be produced shortly and will need a sending opportunity. > > Certainly in the case of wifi and lte and other simplex technologies > this seems feasible... > > 'cept that we're all busy finding ways to do ack compression this > month and thus the > two big tcp packets = 1 ack rule is going away. Still, an estimate, > with a short timeout > might help. It would be a poor algorithm that assumed the answer was "1" or "2" or "42". It would be necessary to analyze data to see if something adaptive is possible and it may not be. Your original note was looking for a way for finding out if the probability of seeing more data in the next 10us was sufficiently large to delay "a teeny bit" so that would be the problem statement. > > Another thing I've longed for (sometimes) is whether or not an > application like a web > browser signalling the OS that it has a batch of network packets > coming would help... 
> > web browser: > setsockopt(batch_everything) > parse the web page, generate all your dns, tcp requests, etc, etc > setsockopt(release_batch) > >> Kathie >> >> (we tried this mechanism out for cable data head ends at Com21 and it >> went into a patent that probably belongs to Arris now. But that was for >> cable. It is a fact universally acknowledged that a packet of data must >> be in want of an acknowledgement.) > > voip doesn't behave this way, but for recognisable protocols like tcp > and perhaps quic... > >> _______________________________________________ >> Bloat mailing list >> Bloat@lists.bufferbloat.net >> https://lists.bufferbloat.net/listinfo/bloat > > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-22 14:01 ` Kathleen Nichols @ 2018-06-22 14:12 ` Jonathan Morton 2018-06-22 14:49 ` Michael Richardson 0 siblings, 1 reply; 32+ messages in thread From: Jonathan Morton @ 2018-06-22 14:12 UTC (permalink / raw) To: Kathleen Nichols; +Cc: bloat > On 22 Jun, 2018, at 5:01 pm, Kathleen Nichols <nichols@pollere.com> wrote: > > Your original note was looking for a way > for finding out if the probability of seeing more data in the next 10us > was sufficiently large to delay "a teeny bit" so that would be the > problem statement. I would instead frame the problem as "how can we get hardware to incorporate extra packets, which arrive between the request and grant phases of the MAC, into the same TXOP?" Then we no longer need to think probabilistically, or induce unnecessary delay in the case that no further packets arrive. - Jonathan Morton ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-22 14:12 ` Jonathan Morton @ 2018-06-22 14:49 ` Michael Richardson 2018-06-22 15:02 ` Jonathan Morton 0 siblings, 1 reply; 32+ messages in thread From: Michael Richardson @ 2018-06-22 14:49 UTC (permalink / raw) To: bloat Jonathan Morton <chromatix99@gmail.com> wrote: >> Your original note was looking for a way >> for finding out if the probability of seeing more data in the next 10us >> was sufficiently large to delay "a teeny bit" so that would be the >> problem statement. > I would instead frame the problem as "how can we get hardware to > incorporate extra packets, which arrive between the request and grant > phases of the MAC, into the same TXOP?" Then we no longer need to > think probabilistically, or induce unnecessary delay in the case that > no further packets arrive. I've never looked at the ring/buffer/descriptor structure of the ath9k, but with most ethernet devices, they would just continue reading descriptors until it was empty. Is there some reason that something similar can not occur? Or is the problem at a higher level? Or is that we don't want to enqueue packets so early, because it's a source of bloat? -- ] Never tell me the odds! | ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works | network architect [ ] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-22 14:49 ` Michael Richardson @ 2018-06-22 15:02 ` Jonathan Morton 2018-06-22 21:55 ` Michael Richardson 0 siblings, 1 reply; 32+ messages in thread From: Jonathan Morton @ 2018-06-22 15:02 UTC (permalink / raw) To: Michael Richardson; +Cc: bloat > On 22 Jun, 2018, at 5:49 pm, Michael Richardson <mcr@sandelman.ca> wrote: > >> I would instead frame the problem as "how can we get hardware to >> incorporate extra packets, which arrive between the request and grant >> phases of the MAC, into the same TXOP?" Then we no longer need to >> think probabilistically, or induce unnecessary delay in the case that >> no further packets arrive. > > I've never looked at the ring/buffer/descriptor structure of the ath9k, but > with most ethernet devices, they would just continue reading descriptors > until it was empty. Is there some reason that something similar can not > occur? > > Or is the problem at a higher level? > Or is that we don't want to enqueue packets so early, because it's a source > of bloat? The question is of when the aggregate frame is constructed and "frozen", using only the packets in the queue at that instant. When the MAC grant occurs, transmission must begin immediately, so most hardware prepares the frame in advance of that moment - but how far in advance? Behaviour suggests that it can be as soon as the MAC request is issued, in response to the *first* packet arriving in the queue - so a second TXOP is required for the *subsequent* packets arriving a microsecond later, even though there's technically still plenty of time to reform the aggregate then. In principle it should be possible to delay frame construction until the moment the radio is switched on; there is a short period consumed by a data-independent preamble sequence. In the old days, HW designers would have bent over backwards to make that happen. - Jonathan Morton ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-22 15:02 ` Jonathan Morton @ 2018-06-22 21:55 ` Michael Richardson 2018-06-25 10:38 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 32+ messages in thread From: Michael Richardson @ 2018-06-22 21:55 UTC (permalink / raw) To: bloat [-- Attachment #1: Type: text/plain, Size: 1877 bytes --] Jonathan Morton <chromatix99@gmail.com> wrote: >>> I would instead frame the problem as "how can we get hardware to >>> incorporate extra packets, which arrive between the request and grant >>> phases of the MAC, into the same TXOP?" Then we no longer need to >>> think probabilistically, or induce unnecessary delay in the case that >>> no further packets arrive. >> >> I've never looked at the ring/buffer/descriptor structure of the ath9k, but >> with most ethernet devices, they would just continue reading descriptors >> until it was empty. Is there some reason that something similar can not >> occur? >> >> Or is the problem at a higher level? >> Or is that we don't want to enqueue packets so early, because it's a source >> of bloat? > The question is of when the aggregate frame is constructed and > "frozen", using only the packets in the queue at that instant. When > the MAC grant occurs, transmission must begin immediately, so most > hardware prepares the frame in advance of that moment - but how far in > advance? Oh, I understand now. The aggregate frame has to be constructed, and it's this frame that is actually in the xmit queue. I'm guessing that it's in the hardware, because if it was in the driver, then we could perhaps do something? > In principle it should be possible to delay frame construction until > the moment the radio is switched on; there is a short period consumed > by a data-independent preamble sequence. In the old days, HW designers > would have bent over backwards to make that happen. -- ] Never tell me the odds! 
| ipv6 mesh networks [ ] Michael Richardson, Sandelman Software Works | network architect [ ] mcr@sandelman.ca http://www.sandelman.ca/ | ruby on rails [ [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 464 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-22 21:55 ` Michael Richardson @ 2018-06-25 10:38 ` Toke Høiland-Jørgensen 2018-06-25 23:54 ` Jim Gettys 0 siblings, 1 reply; 32+ messages in thread From: Toke Høiland-Jørgensen @ 2018-06-25 10:38 UTC (permalink / raw) To: Michael Richardson, bloat Michael Richardson <mcr@sandelman.ca> writes: > Jonathan Morton <chromatix99@gmail.com> wrote: > >>> I would instead frame the problem as "how can we get hardware to > >>> incorporate extra packets, which arrive between the request and grant > >>> phases of the MAC, into the same TXOP?" Then we no longer need to > >>> think probabilistically, or induce unnecessary delay in the case that > >>> no further packets arrive. > >> > >> I've never looked at the ring/buffer/descriptor structure of the ath9k, but > >> with most ethernet devices, they would just continue reading descriptors > >> until it was empty. Is there some reason that something similar can not > >> occur? > >> > >> Or is the problem at a higher level? > >> Or is that we don't want to enqueue packets so early, because it's a source > >> of bloat? > > > The question is of when the aggregate frame is constructed and > > "frozen", using only the packets in the queue at that instant. When > > the MAC grant occurs, transmission must begin immediately, so most > > hardware prepares the frame in advance of that moment - but how far in > > advance? > > Oh, I understand now. The aggregate frame has to be constructed, and it's > this frame that is actually in the xmit queue. I'm guessing that it's in the > hardware, because if it was in the driver, then we could perhaps do > something? No, it's in the driver for ath9k. So it would be possible to delay it slightly to try to build a larger one. The timing constraints are too tight to do it reactively when the request is granted, though; so delaying would result in idleness if there are no other flows to queue before then... 
Even for devices that build aggregates in firmware or hardware (as all AC chipsets do), it might be possible to throttle the queues at higher levels to try to get better batching. It's just not obvious that there's an algorithm that can do this in a way that will "do no harm" for other types of traffic, for instance... -Toke ^ permalink raw reply [flat|nested] 32+ messages in thread
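One possible shape for the "do no harm" condition Toke asks for would be to throttle a station's queue only when its recent history shows that batching actually pays off (aggregates regularly carrying more than one MPDU), so that sparse traffic such as VoIP never gets delayed. A hypothetical sketch, not ath9k code:

```c
/* Per-txq batching statistics over a recent window.  The structure,
 * names, and the two-MPDUs-per-aggregate threshold are all invented
 * here for illustration. */
#include <stdbool.h>
#include <stdint.h>

struct txq_stats {
    uint32_t aggregates;    /* aggregates sent in the recent window */
    uint32_t mpdus;         /* MPDUs carried by those aggregates */
};

/* Permit a short hold before finalizing the next aggregate only when
 * the average fill over the window is at least two MPDUs; otherwise
 * (sparse or interactive traffic) always send immediately. */
static bool allow_hold(const struct txq_stats *s)
{
    if (s->aggregates == 0)
        return false;
    return s->mpdus >= 2 * s->aggregates;
}
```

A queue averaging 2.5 MPDUs per aggregate would be eligible for throttling, while one averaging 1.2 would not; whether such a simple gate is actually harmless is, as noted above, the open question.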
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-25 10:38 ` Toke Høiland-Jørgensen @ 2018-06-25 23:54 ` Jim Gettys 2018-06-26 0:07 ` Jonathan Morton 0 siblings, 1 reply; 32+ messages in thread From: Jim Gettys @ 2018-06-25 23:54 UTC (permalink / raw) To: Toke Høiland-Jørgensen; +Cc: Michael Richardson, bloat [-- Attachment #1: Type: text/plain, Size: 3141 bytes --] On Mon, Jun 25, 2018 at 6:38 AM Toke Høiland-Jørgensen <toke@toke.dk> wrote: > Michael Richardson <mcr@sandelman.ca> writes: > > > Jonathan Morton <chromatix99@gmail.com> wrote: > > >>> I would instead frame the problem as "how can we get hardware to > > >>> incorporate extra packets, which arrive between the request and > grant > > >>> phases of the MAC, into the same TXOP?" Then we no longer need > to > > >>> think probabilistically, or induce unnecessary delay in the case > that > > >>> no further packets arrive. > > >> > > >> I've never looked at the ring/buffer/descriptor structure of the > ath9k, but > > >> with most ethernet devices, they would just continue reading > descriptors > > >> until it was empty. Is there some reason that something similar > can not > > >> occur? > > >> > > >> Or is the problem at a higher level? > > >> Or is that we don't want to enqueue packets so early, because > it's a source > > >> of bloat? > > > > > The question is of when the aggregate frame is constructed and > > > "frozen", using only the packets in the queue at that instant. > When > > > the MAC grant occurs, transmission must begin immediately, so most > > > hardware prepares the frame in advance of that moment - but how > far in > > > advance? > > > > Oh, I understand now. The aggregate frame has to be constructed, and > it's > > this frame that is actually in the xmit queue. I'm guessing that it's > in the > > hardware, because if it was in the driver, then we could perhaps do > > something? > > No, it's in the driver for ath9k. 
So it would be possible to delay it > slightly to try to build a larger one. The timing constraints are too > tight to do it reactively when the request is granted, though; so > delaying would result in idleness if there are no other flows to queue > before then... > > Even for devices that build aggregates in firmware or hardware (as all > AC chipsets do), it might be possible to throttle the queues at higher > levels to try to get better batching. It's just not obvious that there's > an algorithm that can do this in a way that will "do no harm" for other > types of traffic, for instance... > > > Isn't this sort of delay a natural consequence of a busy channel? What matters is not conserving txops *all the time*, but only when the channel is busy and there aren't more txops available.... So when you are trying to transmit on a busy channel, that contention time will naturally increase, since you won't be able to get a transmit opportunity immediately. So you should queue up more packets into an aggregate in that case. We only care about conserving txops when they are scarce, not when they are abundant. This principle is why a window system as crazy as X11 is competitive: it naturally becomes more efficient in the face of load (more and more requests batch up and are handled at maximum efficiency, so the system is at maximum efficiency at full load. Or am I missing something here? Jim [-- Attachment #2: Type: text/html, Size: 5487 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-25 23:54 ` Jim Gettys @ 2018-06-26 0:07 ` Jonathan Morton 2018-06-26 0:21 ` David Lang 0 siblings, 1 reply; 32+ messages in thread From: Jonathan Morton @ 2018-06-26 0:07 UTC (permalink / raw) To: Jim Gettys; +Cc: Toke Høiland-Jørgensen, bloat >> No, it's in the driver for ath9k. So it would be possible to delay it >> slightly to try to build a larger one. The timing constraints are too >> tight to do it reactively when the request is granted, though; so >> delaying would result in idleness if there are no other flows to queue >> before then... There has to be some sort of viable compromise here. How about initiating the request immediately, then building the aggregate when the request completes transmission? That should give at least the few microseconds required for the immediately following acks to reach the queue, and be included in the same aggregate. > Isn't this sort of delay a natural consequence of a busy channel? > > What matters is not conserving txops *all the time*, but only when the channel is busy and there aren't more txops available.... > > So when you are trying to transmit on a busy channel, that contention time will naturally increase, since you won't be able to get a transmit opportunity immediately. So you should queue up more packets into an aggregate in that case. > > We only care about conserving txops when they are scarce, not when they are abundant. > > This principle is why a window system as crazy as X11 is competitive: it naturally becomes more efficient in the face of load (more and more requests batch up and are handled at maximum efficiency, so the system is at maximum efficiency at full load. > > Or am I missing something here? The problem is that currently every data aggregate received (one TXOP each from the AP) results in two TXOPs just to acknowledge them, the first one containing only a single ack. 
This is clearly wasteful, given the airtime overhead per TXOP relative to the raw data rate of modern wifi. Relying solely on backpressure would require that the channel was sufficiently busy to prevent the second TXOP from occurring until the following data aggregate is received, and that just seems too delicate to me. - Jonathan Morton ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-26 0:07 ` Jonathan Morton @ 2018-06-26 0:21 ` David Lang 2018-06-26 0:36 ` Simon Barber 0 siblings, 1 reply; 32+ messages in thread From: David Lang @ 2018-06-26 0:21 UTC (permalink / raw) To: Jonathan Morton; +Cc: Jim Gettys, bloat On Tue, 26 Jun 2018, Jonathan Morton wrote: >> We only care about conserving txops when they are scarce, not when they are abundant. >> >> This principle is why a window system as crazy as X11 is competitive: it naturally becomes more efficient in the face of load (more and more requests batch up and are handled at maximum efficiency, so the system is at maximum efficiency at full load). >> >> Or am I missing something here? > > The problem is that currently every data aggregate received (one TXOP each > from the AP) results in two TXOPs just to acknowledge them, the first one > containing only a single ack. This is clearly wasteful, given the airtime > overhead per TXOP relative to the raw data rate of modern wifi. Relying > solely on backpressure would require that the channel was sufficiently busy to > prevent the second TXOP from occurring until the following data aggregate is > received, and that just seems too delicate to me. If there are no other stations competing for airtime, why does it matter that we use two txops? [1] If there are no other stations that you are competing with for airtime, go ahead and use it. If there are other stations that you are competing with for airtime, you are unlikely to get the txop immediately, so as long as you can keep updating the rf packet to send until the txop actually happens, the later data will get folded in. There will be a few times when you do get the txop immediately, and so you do end up 'wasting' a txop, but the vast majority of the time you will be able to combine the packets. 
Now, the trick is figuring out how long we can wait to finalize the rf packet David Lang [1] ignoring the hidden transmitter problem for the moment ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-26 0:21 ` David Lang @ 2018-06-26 0:36 ` Simon Barber 2018-06-26 0:44 ` Jonathan Morton 0 siblings, 1 reply; 32+ messages in thread From: Simon Barber @ 2018-06-26 0:36 UTC (permalink / raw) To: David Lang; +Cc: Jonathan Morton, bloat Most hardware needs the packet finalized before it starts to contend for the medium (as far as I’m aware - let me know if you know differently). One issue is that if RTS/CTS is in use, then the packet duration needs to be known in advance (or at least mid point of the RTS transmission). Simon > On Jun 25, 2018, at 5:21 PM, David Lang <david@lang.hm> wrote: > > On Tue, 26 Jun 2018, Jonathan Morton wrote: > >>> We only care about conserving txops when they are scarce, not when they are abundant. >>> This principle is why a window system as crazy as X11 is competitive: it naturally becomes more efficient in the face of load (more and more requests batch up and are handled at maximum efficiency, so the system is at maximum efficiency at full load. >>> Or am I missing something here? >> >> The problem is that currently every data aggregate received (one TXOP each from the AP) results in two TXOPs just to acknowledge them, the first one containing only a single ack. This is clearly wasteful, given the airtime overhead per TXOP relative to the raw data rate of modern wifi. Relying solely on backpressure would require that the channel was sufficiently busy to prevent the second TXOP from occurring until the following data aggregate is received, and that just seems too delicate to me. > > If there are no other stations competing for airtime, why does it matter that we use two txops? [1] > > If there are no other stations that you are competing with for airtime, go ahead and use it. 
If there are other stations that you are competing with for airtime, you are unlikely to get the txop immediately, so as long as you can keep updating the rf packet to send until the txop actually happens, the later data will get folded in. > > There will be a few times when you do get the txop immediately, and so you do end up 'wasting' a txop, but the vast majority of the time you will be able to combine the packets. > > Now, the trick is figuring out how long we can wait to finalize the rf packet > > David Lang > > > [1] ignoring the hidden transmitter problem for the moment > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-26 0:36 ` Simon Barber @ 2018-06-26 0:44 ` Jonathan Morton 2018-06-26 0:52 ` Jim Gettys ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Jonathan Morton @ 2018-06-26 0:44 UTC (permalink / raw) To: Simon Barber; +Cc: David Lang, bloat > On 26 Jun, 2018, at 3:36 am, Simon Barber <simon@superduper.net> wrote: > > Most hardware needs the packet finalized before it starts to contend for the medium (as far as I’m aware - let me know if you know differently). One issue is that if RTS/CTS is in use, then the packet duration needs to be known in advance (or at least mid point of the RTS transmission). This is a valid argument. I think we could successfully argue for a delay of 1ms, if there isn't already enough data in the queue to fill an aggregate, after the oldest packet arrives until a request is issued. > If there are no other stations competing for airtime, why does it matter that we use two txops? One further argument would be power consumption. Radio transmitters eat batteries for lunch; the only consistently worse offender I can think of is a display backlight, assuming the software is efficient. - Jonathan Morton ^ permalink raw reply [flat|nested] 32+ messages in thread
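Jonathan's proposed compromise above, delaying the medium request until either the aggregate is full or 1 ms has passed since the oldest queued packet arrived, could be sketched as follows. The structure and names are invented for illustration, not taken from any real driver:

```c
/* Hold-budget sketch: request a TXOP once the aggregate budget is
 * full, or once the oldest queued packet has waited HOLD_BUDGET_US.
 * The 1 ms figure comes from the discussion; everything else is a
 * hypothetical simplification. */
#include <stdbool.h>
#include <stdint.h>

#define HOLD_BUDGET_US 1000   /* 1 ms, per the proposal above */

struct agg_queue {
    uint32_t queued_bytes;
    uint32_t agg_limit_bytes;   /* max bytes one aggregate can carry */
    uint64_t oldest_arrival_us; /* arrival time of oldest queued packet */
};

static bool should_request_txop(const struct agg_queue *q, uint64_t now_us)
{
    if (q->queued_bytes == 0)
        return false;                       /* nothing to send */
    if (q->queued_bytes >= q->agg_limit_bytes)
        return true;                        /* aggregate already full */
    return now_us - q->oldest_arrival_us >= HOLD_BUDGET_US;
}
```

This keeps the worst-case added latency bounded at 1 ms while giving trailing ACKs a window to join the same aggregate; the RTS/CTS duration problem Simon raises is untouched, since the frame is still finalized before the request goes out.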
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-26 0:44 ` Jonathan Morton @ 2018-06-26 0:52 ` Jim Gettys 2018-06-26 0:56 ` David Lang 2018-06-26 1:27 ` Dave Taht 2 siblings, 0 replies; 32+ messages in thread From: Jim Gettys @ 2018-06-26 0:52 UTC (permalink / raw) To: Jonathan Morton; +Cc: Simon Barber, bloat [-- Attachment #1: Type: text/plain, Size: 1622 bytes --] On Mon, Jun 25, 2018 at 8:44 PM Jonathan Morton <chromatix99@gmail.com> wrote: > > On 26 Jun, 2018, at 3:36 am, Simon Barber <simon@superduper.net> wrote: > > > > Most hardware needs the packet finalized before it starts to contend for > the medium (as far as I’m aware - let me know if you know differently). One > issue is that if RTS/CTS is in use, then the packet duration needs to be > known in advance (or at least mid point of the RTS transmission). > > This is a valid argument. I think we could successfully argue for a delay > of 1ms, if there isn't already enough data in the queue to fill an > aggregate, after the oldest packet arrives until a request is issued. > > > If there are no other stations competing for airtime, why does it matter > that we use two txops? > > One further argument would be power consumption. Radio transmitters eat > batteries for lunch; the only consistently worse offender I can think of is > a display backlight, assuming the software is efficient. > > > Not clear if this is true; we need current data. In OLPC days, we measured the receive/transmit power consumption, and transmit took essentially no more power than receive. The dominant power consumption was due to signal processing the RF, not the transmitter. Just listening sucked power.... Does someone understand what current 802.11 and actual chip sets consume for power? 
Jim > - Jonathan Morton > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > [-- Attachment #2: Type: text/html, Size: 2824 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved 2018-06-26 0:44 ` Jonathan Morton 2018-06-26 0:52 ` Jim Gettys @ 2018-06-26 0:56 ` David Lang 2018-06-26 11:16 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 32+ messages in thread From: David Lang @ 2018-06-26 0:56 UTC (permalink / raw) To: Jonathan Morton; +Cc: Simon Barber, bloat [-- Attachment #1: Type: TEXT/PLAIN, Size: 2640 bytes --] On Tue, 26 Jun 2018, Jonathan Morton wrote: >> On 26 Jun, 2018, at 3:36 am, Simon Barber <simon@superduper.net> wrote: >> >> Most hardware needs the packet finalized before it starts to contend for the >> medium (as far as I’m aware - let me know if you know differently). One issue >> is that if RTS/CTS is in use, then the packet duration needs to be known in >> advance (or at least mid point of the RTS transmission). > > This is a valid argument. I think we could successfully argue for a delay of > 1ms, if there isn't already enough data in the queue to fill an aggregate, > after the oldest packet arrives until a request is issued. why does the length of the txop need to be known at the time that it's requested? I could see an argument that fairness algorithms need this info, but the per txop overhead is _so_ much larger than the data transmission, that you would have to add a huge amount of data to noticeably affect the length of the transmission. remember, in wifi you don't ask a central point for permission to use X amount of airtime, you wait for everyone to stop transmitting (and then a random time) and then start sending. Nothing else in the area knows that you are going to start transmitting, and it's only once they start decoding the start of the rf packet you are sending that they can see how long it will be before you finish >> If there are no other stations competing for airtime, why does it matter that we use two txops? > > One further argument would be power consumption. 
Radio transmitters eat > batteries for lunch; the only consistently worse offender I can think of is a > display backlight, assuming the software is efficient. True, but this gets back to the question of how frequent this case is. If you are in areas with congestion most of the time, so the common case is to have to wait long enough for the data to be combined, then the difference in power savings is going to be small. 'waiting just in case there is more to send' looks good on specific benchmarks, but it adds latency all the time, even when it's not needed. Now, using a travel analogy I think how we operate today is as if we were a train at a station, when we first are ready to move, the doors are closed and everyone sits inside waiting for permission to move (think of how annoyed you have been sitting in a closed aircraft at an airport waiting to move), and anyone outside has to wait for the next train But if instead we leave the doors open after we request permission, and only close them when we know that we are going to be able to send very soon, late arrivals can board. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved
  2018-06-26  0:56 ` David Lang
@ 2018-06-26 11:16 ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 32+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-06-26 11:16 UTC (permalink / raw)
To: David Lang, Jonathan Morton; +Cc: bloat

David Lang <david@lang.hm> writes:

> On Tue, 26 Jun 2018, Jonathan Morton wrote:
>
>>> On 26 Jun, 2018, at 3:36 am, Simon Barber <simon@superduper.net> wrote:
>>>
>>> Most hardware needs the packet finalized before it starts to contend for the
>>> medium (as far as I’m aware - let me know if you know differently). One issue
>>> is that if RTS/CTS is in use, then the packet duration needs to be known in
>>> advance (or at least mid point of the RTS transmission).
>>
>> This is a valid argument. I think we could successfully argue for a delay of
>> 1ms, if there isn't already enough data in the queue to fill an aggregate,
>> after the oldest packet arrives until a request is issued.
>
> why does the length of the txop need to be known at the time that it's
> requested?

Because that's how the hardware is designed. There are really two
discussions here: (1) what could we do with a clean-slate(ish) design, and
(2) what can we retrofit into existing drivers such as the ath9k.

I think that the answer to (1) is probably 'quite a lot', but unfortunately
the answer to (2) is 'not that much'. We could probably do a little bit
better in ath9k, but for anything newer all bets are off, as this
functionality has moved into firmware.

Now, if there was a hardware vendor that was paying attention and could do
the right thing throughout the stack, that would be awesome of course. But
for Linux at least, sadly it seems that most hardware vendors can barely
figure out how to get *any* driver upstream... :/

Also, from a conceptual point of view, I really think ACK timing issues are
best solved at the TCP stack level. Which Eric is already working on (SACK
compression is already in 4.18, with normal ACK compression to follow).

-Toke

^ permalink raw reply	[flat|nested] 32+ messages in thread
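The idea of coalescing ACKs can be sketched at the level of a software queue: a newer cumulative ACK for the same flow supersedes older pure ACKs still waiting, so only the newest needs to be kept. This is an illustrative analogy only; the actual kernel work Toke refers to delays and merges ACK generation with timers inside the TCP stack, not by rewriting a queue like this.

```python
from collections import namedtuple

# Minimal stand-in for a queued TCP segment: flow id, cumulative
# ACK number, and whether the segment also carries payload data.
Ack = namedtuple("Ack", ["flow", "ackno", "has_data"])

def compress_acks(queue):
    """Collapse runs of pure (data-less) ACKs per flow, keeping only the
    newest cumulative ACK of each run.  Segments carrying data, and
    segments for other flows, are left in place."""
    out = []
    for seg in queue:
        if (not seg.has_data and out
                and not out[-1].has_data
                and out[-1].flow == seg.flow
                and seg.ackno >= out[-1].ackno):
            out[-1] = seg  # newer cumulative ACK supersedes the older one
        else:
            out.append(seg)
    return out

q = [Ack("f1", 1000, False), Ack("f1", 2000, False),
     Ack("f1", 3000, False), Ack("f1", 3000, True)]
print(compress_acks(q))  # the three pure ACKs collapse into the 3000 one
```

The wifi-relevant payoff is that each pure ACK that never hits the air is one less tiny frame competing for a txop.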
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved
  2018-06-26  0:44 ` Jonathan Morton
  2018-06-26  0:52 ` Jim Gettys
  2018-06-26  0:56 ` David Lang
@ 2018-06-26  1:27 ` Dave Taht
  2018-06-26  3:30 ` Simon Barber
  2 siblings, 1 reply; 32+ messages in thread
From: Dave Taht @ 2018-06-26  1:27 UTC (permalink / raw)
To: Jonathan Morton; +Cc: Simon Barber, bloat, Make-Wifi-fast

On Mon, Jun 25, 2018 at 5:44 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>> On 26 Jun, 2018, at 3:36 am, Simon Barber <simon@superduper.net> wrote:
>>
>> Most hardware needs the packet finalized before it starts to contend for
>> the medium (as far as I’m aware - let me know if you know differently). One
>> issue is that if RTS/CTS is in use, then the packet duration needs to be
>> known in advance (or at least mid point of the RTS transmission).
>
> This is a valid argument. I think we could successfully argue for a delay
> of 1ms, if there isn't already enough data in the queue to fill an
> aggregate, after the oldest packet arrives until a request is issued.

Whoa, nelly! In the context of the local tcp stack over wifi, I was making
an observation that I "frequently" saw a pattern of a single ack txop
followed by a bunch in a separate txop, and I suggested a very short (10us)
timeout before committing to the hw - not 1ms.

Aside from this anecdote we have not got real data or statistics. The
closest thing I have to a tool that can take apart wireless aircaps is
here: https://github.com/dtaht/airtime-pie-chart which can be hacked to
take more things apart than it currently does. Looking for this pattern in
more traffic would be revealing in multiple ways. Looking for more patterns
in bigger wifi networks would be good also.

I like Eric's suggestion of doing more ack compression higher up in the
tcp stack.

There are two other things I've suggested in the past we look at.

1) The current fq_codel_for_wifi code has a philosophy of "one aggregate in
the hardware, one ready to go". A simpler modification to fit more in would
be to wait (the best-case estimate for delivering the one in the hardware,
minus a bit) and then form the one ready-to-go.

2) rate limiting mcast and smoothing mcast bursts over time, allowing more
unicast through. presently the mcast queue is infinite and very bursty. The
802.11 std actually suggests mcast be rate limited by htb, where I'd do htb
+ fq + merging dup packets. I was routinely able to blow up the c.h.i.p's
wifi and the babel protocol by flooding it with mcast, as the local mcast
queue could easily grow 16+ seconds long.

um, I'm giving a preso tomorrow and will run behind this thread. It's nice
to see the renewed enthusiasm here, keep it up.

>> If there are no other stations competing for airtime, why does it matter
>> that we use two txops?
>
> One further argument would be power consumption. Radio transmitters eat
> batteries for lunch; the only consistently worse offender I can think of
> is a display backlight, assuming the software is efficient.
>
> - Jonathan Morton
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat

--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619

^ permalink raw reply	[flat|nested] 32+ messages in thread
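Taht's second suggestion (htb-style rate limiting plus merging duplicate mcast packets) could be sketched as a token bucket in front of a deduplicated FIFO. The class below is a hypothetical toy for illustration, not code from any driver; the rate and bucket depth are made-up numbers.

```python
class McastShaper:
    """Toy token-bucket shaper for a multicast queue that also merges
    exact duplicates still waiting in the queue (a simplified take on
    the htb + fq + dup-merge idea from the thread)."""

    def __init__(self, rate_pps, burst):
        self.rate = rate_pps    # tokens (packets) added per second
        self.tokens = burst     # start with a full bucket
        self.burst = burst
        self.queue = []
        self.last = 0.0

    def enqueue(self, pkt):
        """Queue a packet unless an identical one is already waiting."""
        if pkt in self.queue:   # merge duplicate mcast frames
            return False
        self.queue.append(pkt)
        return True

    def dequeue(self, now):
        """Release at most one packet if a token is available at time `now`
        (seconds); otherwise the queue drains later as tokens refill."""
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.queue and self.tokens >= 1:
            self.tokens -= 1
            return self.queue.pop(0)
        return None

s = McastShaper(rate_pps=100, burst=2)
s.enqueue(b"beacon")
s.enqueue(b"beacon")            # duplicate: merged, not queued twice
s.enqueue(b"arp")
print(s.dequeue(0.0), s.dequeue(0.0), s.dequeue(0.0))
```

The key property for the 16-second-queue failure mode Taht describes is that the queue length is bounded by the token rate rather than by how fast senders can flood it, and duplicate floods collapse to a single queued copy.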
* Re: [Bloat] lwn.net's tcp small queues vs wifi aggregation solved
  2018-06-26  1:27 ` Dave Taht
@ 2018-06-26  3:30 ` Simon Barber
  0 siblings, 0 replies; 32+ messages in thread
From: Simon Barber @ 2018-06-26  3:30 UTC (permalink / raw)
To: Dave Taht, Jonathan Morton; +Cc: bloat, Make-Wifi-fast

Current versions of Wireshark have an experimental feature I added to
expose airtime usage per packet and show 802.11 pcaps on a timeline.
Enable it under Preferences->Protocols->802.11 Radio.

Simon

On June 25, 2018 6:27:59 PM Dave Taht <dave.taht@gmail.com> wrote:

> On Mon, Jun 25, 2018 at 5:44 PM, Jonathan Morton <chromatix99@gmail.com> wrote:
>> [...]
>> This is a valid argument. I think we could successfully argue for a delay
>> of 1ms, if there isn't already enough data in the queue to fill an
>> aggregate, after the oldest packet arrives until a request is issued.
>
> Whoa, nelly! In the context of the local tcp stack over wifi, I was
> making an observation that I "frequently" saw a pattern of a single
> ack txop followed by a bunch in a separate txop. and I suggested a
> very short (10us) timeout before committing to the hw - not 1ms.
>
> [remainder of quoted message trimmed]

Sent with AquaMail for Android
https://www.mobisystems.com/aqua-mail

^ permalink raw reply	[flat|nested] 32+ messages in thread
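To get a rough idea of the per-packet airtime numbers such a tool exposes, a single frame's on-air duration can be approximated as a fixed preamble plus the payload serialized at the PHY rate. This sketch ignores IFS gaps, ACK frames, aggregation, and per-symbol rounding, so it is a first-order estimate only, not what Wireshark actually computes:

```python
import math

def frame_airtime_us(payload_bytes, rate_mbps, preamble_us=20):
    """Very rough 802.11 airtime estimate for a single non-aggregated
    frame: a fixed preamble duration plus the payload bits serialized
    at the PHY rate (Mb/s == bits per microsecond)."""
    return preamble_us + math.ceil(payload_bytes * 8 / rate_mbps)

# a 1500-byte frame at 6 Mb/s vs 54 Mb/s: low rates dominate airtime
print(frame_airtime_us(1500, 6), frame_airtime_us(1500, 54))
```

Even this crude model shows why per-packet airtime matters for fairness: the same 1500-byte frame occupies roughly eight times the medium at 6 Mb/s as at 54 Mb/s, which byte-counting schedulers cannot see.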
end of thread, other threads:[~2018-06-26 11:16 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-21  4:58 [Bloat] lwn.net's tcp small queues vs wifi aggregation solved Dave Taht
2018-06-21  9:22 ` Toke Høiland-Jørgensen
2018-06-21 12:55 ` Eric Dumazet
2018-06-21 15:18 ` Dave Taht
2018-06-21 15:31 ` Caleb Cushing
2018-06-21 15:46 ` Stephen Hemminger
2018-06-21 17:41 ` Caleb Cushing
2018-06-21 15:50 ` Dave Taht
2018-06-21 16:29 ` David Collier-Brown
2018-06-21 16:54 ` Jonathan Morton
2018-06-21 16:43 ` Kathleen Nichols
2018-06-21 19:17 ` Dave Taht
2018-06-21 19:41 ` Sebastian Moeller
2018-06-21 19:51 ` Toke Høiland-Jørgensen
2018-06-21 19:54 ` Dave Taht
2018-06-21 20:11 ` Sebastian Moeller
2018-06-22 14:01 ` Kathleen Nichols
2018-06-22 14:12 ` Jonathan Morton
2018-06-22 14:49 ` Michael Richardson
2018-06-22 15:02 ` Jonathan Morton
2018-06-22 21:55 ` Michael Richardson
2018-06-25 10:38 ` Toke Høiland-Jørgensen
2018-06-25 23:54 ` Jim Gettys
2018-06-26  0:07 ` Jonathan Morton
2018-06-26  0:21 ` David Lang
2018-06-26  0:36 ` Simon Barber
2018-06-26  0:44 ` Jonathan Morton
2018-06-26  0:52 ` Jim Gettys
2018-06-26  0:56 ` David Lang
2018-06-26 11:16 ` Toke Høiland-Jørgensen
2018-06-26  1:27 ` Dave Taht
2018-06-26  3:30 ` Simon Barber
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox