[SM] How does that generalize to internet access links? My gut feeling is that an FQ scheduler comes close.
Probably not possible. Current 100G fiber SFP optics are the most economical per SERDES/laser interface. Any other SFP is suboptimal, probably for the next decade. Then there are still DSL internet lines, satellite links, etc. And then there's WiFi, which isn't really internet access but the first/last hop to the IAP's last-mile link. It seems way too complicated to generalize to a single solution. It's also a bit of an engineering race and a deployment race, so the targets, driven mostly by market conditions and engineering project priorities, are not fixed.
I do think we can define generalized tests, though that's a digression (and I'm biased too, being a test & measurement engineer).
[SM] I guess often things are obvious only retrospectively, but how could one design a switch differently?
One suggestion is to look at the merchant silicon used by the major integrators that sell into data centers. But keep in mind that the IAP forwarding plane is a moving target, so some form of in-field hardware programmability is probably needed too. The COGS and volumes are very different as well. I think the market and time will provide the final answer (if there is one), and then it will change again ;)
[SM] Is this driven more by the need to aggregate packets to amortize some cost over a larger payload or to reduce the scheduling overhead or to regularize things (as in fixed size DTUs used in DSL with G.INP retransmissions)?
TXOP scarcity is driven by listen before talk (LBT), which is needed for collision avoidance. Unfortunately, WiFi networks without waveguides that share the same carrier have to be separated in time, in a distributed manner, to optimize the overall system. (Adding a scheduling carrier, as mobile networks do, doesn't work well with small WiFi cells, though 802.11ax adds a similar scheduling support mechanism.)
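To make the time-sharing point concrete, here is a minimal sketch of slotted listen-before-talk contention. It is a simplification, not the full 802.11 DCF/EDCA procedure: every station redraws its backoff each round (real stations freeze and resume their counters), and the function name `simulate_lbt` and its parameters are hypothetical. It only illustrates that stations on a shared carrier split the available TXOPs among themselves and lose some to collisions as the station count grows.

```python
import random
from collections import Counter


def simulate_lbt(n_stations, rounds, cw_min=16, cw_max=1024, seed=1):
    """Simplified slotted listen-before-talk contention (hypothetical helper).

    Each round, every backlogged station draws a random backoff from
    [0, cw - 1]. The station with the unique smallest backoff senses the
    medium idle first and wins the TXOP. If two or more stations share
    the smallest value, they collide and double their contention windows.
    """
    rng = random.Random(seed)
    cw = [cw_min] * n_stations
    wins = Counter()
    collisions = 0
    for _ in range(rounds):
        backoff = [rng.randrange(cw[i]) for i in range(n_stations)]
        first = min(backoff)
        contenders = [i for i, b in enumerate(backoff) if b == first]
        if len(contenders) == 1:
            winner = contenders[0]
            wins[winner] += 1
            cw[winner] = cw_min  # reset contention window after a success
        else:
            collisions += 1
            for i in contenders:
                # binary exponential backoff, capped at cw_max
                cw[i] = min(2 * cw[i], cw_max)
    return wins, collisions


if __name__ == "__main__":
    for n in (1, 4, 16):
        wins, collisions = simulate_lbt(n, rounds=10_000)
        print(n, "stations: TXOPs per station", sum(wins.values()) / n,
              "collisions", collisions)
```

Each station's share of TXOPs shrinks roughly as 1/N, and the collision count is airtime that nobody gets to use.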
[SM] I am all for better hardware, but will this ever allow us to regress back to dumb upper layers? I have some doubts, but hey I would not be unhappy if my AQM would stay idle most of the time, because lower layers avoid triggering it.
I doubt the ideal is achievable. Transport enhancements like BBRv2 seem worthwhile. And yes, the "AQM hammer" to mitigate standing-queue bloat is likely going to be needed, since real engineering typically can't achieve an ideal when some resources are shared and finite, as you stated elsewhere.
Many of the new responsiveness-under-load tests are designed to create this potentially "artificial" condition, though often it's real too, so these tests are certainly good for awareness. What they don't do is monitor actual traffic conditions over time and space to see how often AQM had to be activated, or measure how well disaggregating the congested shared queue is working.
In my opinion, devices that support OpenWRT could be instrumented for network telemetry to provide actual measurements, at least for the WiFi hop. There are multiple ways to do this: some require new engineering effort; others require distributed clock sync and so tend to be limited to test labs.
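As a rough illustration of what "how often AQM had to be activated" could mean in field telemetry, here is a sketch that scans per-packet sojourn-time samples and reports the spans where a CoDel-like AQM would have engaged, i.e. sojourn time stayed above a target for at least an interval. The 5 ms / 100 ms values are CoDel's defaults from RFC 8289; the `(dequeue_time_s, sojourn_s)` record format and the `aqm_episodes` name are hypothetical, and real CoDel's control law (drop spacing, exit behavior) is deliberately left out.

```python
from typing import Iterable, List, Tuple

TARGET_S = 0.005    # CoDel default target (5 ms), RFC 8289
INTERVAL_S = 0.100  # CoDel default interval (100 ms), RFC 8289


def aqm_episodes(samples: Iterable[Tuple[float, float]],
                 target: float = TARGET_S,
                 interval: float = INTERVAL_S) -> List[Tuple[float, float]]:
    """Return (start, end) spans where sojourn time stayed above `target`
    for at least `interval`, i.e. where a CoDel-like AQM would have started
    dropping or marking. `samples` are time-ordered (dequeue_time_s,
    sojourn_s) pairs, a hypothetical telemetry record format."""
    episodes = []
    above_since = None  # time of first sample in the current above-target run
    last_t = None       # time of the previous sample
    for t, sojourn in samples:
        if sojourn > target:
            if above_since is None:
                above_since = t
        else:
            # run ended at the previous sample; keep it only if long enough
            if above_since is not None and last_t - above_since >= interval:
                episodes.append((above_since, last_t))
            above_since = None
        last_t = t
    # close a run that is still above target at the end of the trace
    if above_since is not None and last_t - above_since >= interval:
        episodes.append((above_since, last_t))
    return episodes
```

Counting these episodes per hour, per device and per hop would give the "actuals" that a one-shot load test cannot: how often the standing queue really builds, and for how long.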
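The clock-sync requirement comes from one-way delay: a sender timestamp and a receiver timestamp are read from different clocks, so any offset between them shows up one-for-one as delay error. A minimal sketch, assuming an NTP-style four-timestamp exchange (RFC 5905) and a symmetric path, is below; the class and function names are hypothetical, and real lab setups typically rely on PTP with hardware timestamping rather than this software estimate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Exchange:
    """Four timestamps of an NTP-style request/response exchange.
    t1 and t4 are read on clock A; t2 and t3 are read on clock B."""
    t1: float  # A sends request
    t2: float  # B receives request
    t3: float  # B sends response
    t4: float  # A receives response


def offset_and_rtt(x: Exchange) -> Tuple[float, float]:
    """Estimate (clock B - clock A) and the round-trip time, as in NTP.
    The offset is exact only if forward and reverse delays are equal."""
    offset = ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2
    rtt = (x.t4 - x.t1) - (x.t3 - x.t2)
    return offset, rtt


def one_way_delay(tx_on_a: float, rx_on_b: float, offset_b_minus_a: float) -> float:
    """One-way delay A->B from raw timestamps, corrected for clock offset.
    Without the correction, the offset is reported as extra (or negative) delay."""
    return (rx_on_b - offset_b_minus_a) - tx_on_a
```

For example, if clock B runs 0.5 s ahead and each direction takes 2 ms, the raw difference `rx_on_b - tx_on_a` reads about 502 ms, while the offset-corrected value recovers the 2 ms. Any path asymmetry leaks directly into the offset estimate, which is one reason accurate one-way telemetry has tended to stay in test labs.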
Bob