Sorry I didn't engage with this, folks. It probably came across as rude, but I've had a large and unexpected career shift ongoing (https://twitter.com/stub_AS/status/1469283183132876809?s=20) and didn't feel up to it, especially as I'm largely abandoning my research along these lines due to these developments. In any case, I have a lot of respect for you folks educating everyone on latency and bufferbloat, and I have been following Dave (Taht)'s great work in the space for a while.

Best,
Ankit

On Jul 19, 2021, at 17:50, George Burdell wrote:

On Sat, Jul 10, 2021 at 01:27:28PM -0700, David Lang wrote:
> any buffer sizing based on the number of packets is wrong. Base your
> buffer size on transmit time and you have a chance of being reasonable.

This is very true. Packets have a dynamic range of 64 bytes to 64k (GRO), and sizing queues in terms of packets leads to bad behavior, particularly on mixed up- and downstream traffic.

Also... people doing AQM and TCP designs tend to almost always test one-way traffic only, and this leads to less than desirable behavior on real-world traffic. Strike that: terrible behavior! A pure single-queue AQM struggles mightily to find a good hit rate when there are a ton of ACKs, DNS, gaming, VoIP, etc., mixed in with the capacity-seeking flows. Nearly every AQM paper you read never tests real, bidirectional traffic. It's a huge blind spot, which is why the bufferbloat effort *starts* with the rrul test and related tests on any new idea we have. bfifos are better, but harder to implement in hardware.

A fun trick: if you are trying to optimize your traffic for real-time communications rather than for a speedtest, you can clamp your TCP MSS to smaller than 600 bytes *at the router*, and your network gets better.
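The MSS-clamping trick works because a FIFO sized in packets holds less *delay* when each packet is smaller. A minimal sketch of the arithmetic, assuming a hypothetical 20 Mbit/s uplink and a 1000-packet queue (the function name and all the numbers are illustrative, not measurements from this thread):

```python
# Illustrative arithmetic: a FIFO counted in packets drains faster when
# packets are smaller, so clamping TCP MSS at the router shrinks the
# worst-case queue delay. All numbers below are assumptions.

def fifo_delay_ms(queue_packets, packet_bytes, uplink_mbps):
    """Worst-case drain time of a packet-counted FIFO, in milliseconds."""
    bits = queue_packets * packet_bytes * 8
    return bits / (uplink_mbps * 1e6) * 1e3

# Hypothetical 20 Mbit/s upstream with a 1000-packet FIFO:
full_mtu = fifo_delay_ms(1000, 1500, 20)   # full-size packets
clamped  = fifo_delay_ms(1000, 600, 20)    # MSS clamped near 600 bytes
print(f"1500-byte packets: {full_mtu:.0f} ms, 600-byte packets: {clamped:.0f} ms")
```

With those assumed numbers the same packet-counted queue goes from roughly 600 ms of potential bloat to roughly 240 ms, which is the effect being described.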
(We really should get around to publishing something on that: when you are plagued by a giant upstream FIFO, filling it with smaller packets really helps, and it's something a smart user could easily do regardless of the ISP's preferences.)

> In cases like wifi where packets aren't sent individually, but are sent
> in blobs of packets going to the same destination, yes... you want to
> buffer at least a blob's worth of packets to each destination so that
> when your transmit slot comes up, you can maximize it.

Nooooooo! This is one of those harder tradeoffs that is pretty counterintuitive. You want per-station queuing, yes. However, the decision as to how much service time to grant each station is absolutely not about maximizing the transmit slot, but about maximizing the number of stations you can serve in reasonable time.

Simple (and inaccurate) example: 100 stations at a 4 ms txop each, stuffed full of *udp* data, is 400 ms per round (plus usually insane numbers of retries). This breaks a lot of things and doesn't respect the closely coupled nature of TCP (please re-read the codel paper!). Cutting the txop in this case to 1 ms cuts inter-station service time... at the cost of "bandwidth" that can't be stuffed into the slow-header-plus-wifi-data-rate equation. But what you really want to do is give the sparsest stations quicker access to the media so they can ramp up to parity (and usually complete their short flows much faster, and then get off).

I run with BE 2.4 ms txops and announce the same in the beacon. I'd be willing to bet your scale conference network would work much better if you did that also.
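The service-round arithmetic above can be sketched directly (the station count and txop sizes are the example's own numbers; the function is just an illustration of the tradeoff, not a scheduler):

```python
# Sketch of the inter-station service-time tradeoff: with N stations
# each granted a full txop, the time before a station is served again
# grows linearly with the txop size.

def round_time_ms(stations, txop_ms):
    """Time for one full service round if every station uses a full txop."""
    return stations * txop_ms

print(round_time_ms(100, 4.0))   # 4 ms txops: 400 ms per round
print(round_time_ms(100, 1.0))   # 1 ms txops: 100 ms per round
print(round_time_ms(100, 2.4))   # the 2.4 ms BE txop mentioned above
```

The tension is that each shorter txop pays the fixed low-rate header cost more often, so the win in inter-station latency comes out of aggregate throughput.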
(It would be better if we could scale txop size to the load, but fq_codel on wifi already does the sparse-station optimization, which translates into many shorter txops than you would see from other wifi schedulers, and the bulk of the problem I see is the *stations*.)

Lastly, you need to defer constructing the blob as long as possible, so you can shoot at, mark, or reschedule (FQ) the packets in there at the last moment before they are committed to the hardware. Ideally you would not construct any blob at all until a few microseconds before the transmit opportunity.

Shifting this back to Starlink: they have a marvelous opportunity to do just this in the dishy, as they are half duplex and could defer grabbing the packets from a sch_cake buffer until precisely before that txop to the sat arrives (my guess would be no more than 400 us, based on what I understand of the ARM chip they are using). This would be much better than what we could do in the ath9k, where we were forced to always have "one in the hardware, one ready to go" due to limitations in that chip. We're making some progress on the openwifi FPGA here, btw...

> Wifi has the added issue that the blob headers are at a much lower data
> rate than the data itself, so you can cram a LOT of data into a blob
> without making a significant difference in the airtime used. So you
> really do want to be able to send full blobs (not at the cost of
> delaying transmission if you don't have a full blob, a mistake some
> people make, but you do want to buffer enough to fill the blobs). And
> given that dropped packets result in timeouts and retransmissions that
> affect the rest of the network, it's not obviously wrong for a lossy
> hop like wifi to retry a failed transmission; it just needs to not
> retry too many times.
>
> David Lang
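The low-rate-header effect described above can be made concrete with a toy airtime model. The 40 us header time and 600 Mbit/s data rate below are assumed round numbers for illustration, not figures from this thread:

```python
# Rough airtime model for an aggregated wifi transmission: the PHY
# preamble/header goes out at a slow legacy rate (modeled as a fixed
# cost) while the aggregated payload rides at the full data rate, so a
# large blob costs only a little more airtime than a small one.
# header_us and data_rate_mbps are assumed illustrative values.

def airtime_us(payload_bytes, header_us=40.0, data_rate_mbps=600.0):
    """Fixed low-rate header time plus payload serialization time."""
    return header_us + payload_bytes * 8 / data_rate_mbps

small = airtime_us(1500)     # one MTU-sized packet
big   = airtime_us(60000)    # a 40x larger aggregate
print(f"{small:.0f} us vs {big:.0f} us for 40x the data")
```

Under these assumptions the 40x larger blob takes only about 14x the airtime, which is why filling blobs is worth it, while delaying transmission just to fill one is not.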
On Sat, 10 Jul 2021, Rodney W. Grimes wrote:
To: Dave Taht
Cc: starlink@lists.bufferbloat.net, Ankit Singla, Sam Kumar
Subject: Re: [Starlink] SatNetLab: A call to arms for the next global Internet testbed

While it is good to have a call to arms like this: ... much information removed, as I only want to reply to one very narrow but, IMHO, very real problem in our networks today ...

Here's another piece of pre-history (alohanet): the TTL field was the "time to live" field. The intent was that the packet would indicate how much time it would be valid before it was discarded. It didn't work out, and it was replaced by hop count, which of course switched networks ignore, and which is only semi-useful for detecting loops and the like.

TTL works perfectly fine where the original assumptions hold: that a device along a network path only hangs on to a packet for a reasonably short duration, and that there is not some "retry" mechanism in place causing this time to explode. BSD, and as far as I can recall almost ALL original IP stacks, had a queue depth limit of 50 packets on egress interfaces. Everything pretty much worked well and the net was happy. Then these base assumptions got blasted in the name of "measurable bandwidth" and the notion that packets are so precious we must not lose them, at almost any cost. Linux crammed the per-interface queue up to 1000, and wifi decided it was reasonable to retry at the link layer so many times that I have seen packets that are >60 seconds old.

Proposed FIX: any device that transmits packets and does not already have an inherent FIXED transmission time MUST consider the current TTL of that packet and give up if more than 10 ms * TTL elapses while it is trying to transmit. AND change the default interface queue size in Linux to 50 for fifo; the codel etc. AQM stuff is fine at 1000, as it has delay targets that prevent the issue that initially bumping the queue to 1000 caused.

... end of Rod's Rant ...
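The proposed give-up rule above can be sketched in a few lines. The 10 ms-per-TTL budget is Rod's number; the function and its arguments are a hypothetical illustration, not an actual stack implementation:

```python
# Minimal sketch of the proposed TTL-derived transmission deadline:
# a device without a fixed transmission time drops a packet once it
# has spent more than 10 ms * TTL trying to transmit it.
import time

MS_PER_TTL = 0.010  # Rod's proposed budget: 10 ms of trying per TTL unit

def should_drop(enqueue_time, ttl, now=None):
    """True if a packet has waited longer than its TTL-derived budget."""
    now = time.monotonic() if now is None else now
    return (now - enqueue_time) > MS_PER_TTL * ttl

# A TTL-64 packet gets a 640 ms budget under this rule:
print(should_drop(enqueue_time=0.0, ttl=64, now=0.5))   # False: 500 ms < 640 ms
print(should_drop(enqueue_time=0.0, ttl=64, now=0.7))   # True: 700 ms > 640 ms
```

Note how this directly rules out the >60-second-old wifi packets mentioned above: no TTL value gives a packet more than about 2.5 seconds of retry budget.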
--
Rod Grimes                                                 rgrimes@freebsd.org

_______________________________________________
Starlink mailing list
Starlink@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/starlink