This is a much saner test result[1], showing about a 20% improvement under the rrul_be test. I scaled the topology back to two instances of cake on the middlebox, shaping to 100 Mbit on one side and 10 Mbit on the other, and toggled ack filtering on and off.

The win should grow with download/upload ratios even worse than 10/1, though the rrul is not exactly a test of real traffic. What other ratios are out there, particularly in the DSL world? I can think of a few ways to get more acks to filter out, for example, not using the "sparse flow optimization" for acks.

[1] It also turned out that my test target box, an odroid c2, couldn't push more than 500 Mbit bidirectionally in the first place.
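To put rough numbers on why worse ratios should help ack filtering more, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration, not something measured in this test: full-size 1514-byte data frames, 66-byte acks (Ethernet + IPv4 + TCP with timestamps), one delayed ack per two segments, and a download that saturates the link. The function name and parameters are hypothetical.

```python
def ack_share_of_uplink(down_mbit, up_mbit,
                        data_bytes=1514, ack_bytes=66, segs_per_ack=2):
    """Estimate the fraction of the uplink consumed by acks for a
    saturated download. All defaults are illustrative assumptions."""
    # Ack bandwidth scales with download rate: one ack_bytes-sized
    # packet is sent back for every segs_per_ack data frames received.
    ack_mbit = down_mbit * ack_bytes / (data_bytes * segs_per_ack)
    return ack_mbit / up_mbit


if __name__ == "__main__":
    # Same 100 Mbit downlink, progressively thinner uplinks.
    for down, up in [(100, 10), (100, 5), (100, 2)]:
        share = ack_share_of_uplink(down, up)
        print(f"{down}/{up} Mbit: acks use ~{share:.1%} of the uplink")
```

Under those assumptions, acks for a saturated 100 Mbit download take roughly 22% of a 10 Mbit uplink, and the share doubles each time the uplink halves. Real traffic (and the rrul, which loads both directions at once) will behave differently, so treat this only as a guide to the trend.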