Sounds encouraging. Just a note: I am actually not running ingress
through the IFB. I set the download speed to 0 and perform egress shaping
on the LAN bridge interface (br-lan) instead. Maybe not the lightest setup,
but it gives me a lot of flexibility in classifying the ingress traffic
(i.e. I don't have to use best effort or trust the incoming DSCPs, which my
provider tweaks anyway).

On x86_64 I agree with your observation, no speed problem there. But
running a 1.83 GHz box for routing 4 devices is a tad overkill :)

I use connmark to reclassify TCP streaming traffic from CS0 to AF4x and
UDP traffic to EF (Netflix/YouTube and VoIP respectively). I also
prioritize DNS (AF4x), NTP (EF) and SSH (AF4x), and deprioritize some
traffic to CS1. I do this in my firewall3 config and firewall.user (some of
my rules are MAC based -> chromecast, NAS box, etc.). On egress I set the
mark and save it to the connmark; on ingress (egress of the other iface) I
simply restore the mark. In an ideal world all apps would apply meaningful
DSCPs to their packets and this classification would not be needed. In
general, I tend to trust the DSCPs on egress and don't override them; on
ingress I squash everything.
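
To make the save/restore idea concrete, here is a rough Python model of
the logic (not my actual iptables rules). The port table, the choice of
AF41 for "AF4x" and the names Packet, ConnTable and flow_key are all made
up for illustration; only the DSCP code points are the standard values.

```python
from dataclasses import dataclass

# Standard DSCP code points. AF41 stands in for "AF4x" (an assumption).
DSCP = {"CS0": 0, "CS1": 8, "AF41": 34, "EF": 46}

# Hypothetical port-based rules: (protocol, destination port) -> class.
PORT_RULES = {
    ("udp", 53): "AF41",   # DNS
    ("tcp", 53): "AF41",   # DNS over TCP
    ("udp", 123): "EF",    # NTP
    ("tcp", 22): "AF41",   # SSH
}


@dataclass(frozen=True)
class Packet:
    proto: str
    src: str
    sport: int
    dst: str
    dport: int
    dscp: int = 0


def flow_key(p):
    """Direction-independent key, loosely like a conntrack entry."""
    return (p.proto, frozenset({(p.src, p.sport), (p.dst, p.dport)}))


class ConnTable:
    """Toy stand-in for conntrack holding one saved mark per flow."""

    def __init__(self):
        self.marks = {}

    def egress(self, p):
        """WAN-bound packet: trust a non-default DSCP, else classify.

        The result is saved per flow (the 'connmark save' step).
        """
        if p.dscp != DSCP["CS0"]:
            mark = p.dscp
        else:
            mark = DSCP[PORT_RULES.get((p.proto, p.dport), "CS0")]
        self.marks[flow_key(p)] = mark
        return mark

    def ingress(self, p):
        """LAN-bound packet: ignore its DSCP and restore the saved mark.

        Flows with no saved mark fall back to CS0 (squashed).
        """
        return self.marks.get(flow_key(p), DSCP["CS0"])
```

So a DNS query from a LAN host gets AF41 on the way out, and the reply
gets the same class on the way back in, whatever DSCP the provider put on
it. A packet belonging to a flow that was never marked is treated as CS0.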

I used to run htb + fq_codel and performance was awful; cake performs
*much* better.
