Hi
I am having a hard time shaping 3000 users, with ceil speeds from 10 to 300 Mbps, on a 7/7 Gbps link using HTB + SFQ + tc filters (u32 filters hashed by IP with hashkey mask). I have been tweaking HTB and SFQ parameters for a few days now, with no luck so far.
Everything seems right up to about 4 Gbps of overall download speed with shaping on.
Up to that point I see no significant packet delay, no dropped packets and no high average CPU load (not more than 20%, according to htop).
But when the download speed reaches about 4.5 Gbps (upload is about 500 Mbps), chaos kicks in.
CPU load goes sky high (all 24 x 2.4 GHz physical cores above 90%, or 48 x 2.4 GHz if you count that virtualization is on) and, as a consequence, packets are dropped (as reported by tc -s class sh ...), RTT goes above 200 ms and there are a lot of angry users. This happens from about 7 PM to 11 PM every day.
If I turn shaping off, everything returns to normal immediately: I observe peaks of not more than 5 Gbps (1 second average) and a CPU load of about 5%. So I infer the uplink is not congested.
I use one root HTB qdisc and one root (1:) HTB class.
Under it there are about 20~30 inner classes at the same level (1:xx) to (sort of) separate the users by region.
And under these inner classes go the almost 3000 leaves (1:xxxx).
The largest inner class has about 900 users; the others have progressively fewer, and some of them have just one user.
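
A minimal sketch of a layout like the one described above, for illustration only (it is not the real script). It is a small Python helper that just prints tc commands. The interface name eth0, the 10.0.0.0/24 subnet, the class IDs, the "rate" values and the 256-bucket hash table 2: are all placeholder assumptions; only the overall shape (root HTB, region classes, user leaves with SFQ, u32 hashkey filters) comes from the description.

```python
# Illustrative only: prints tc commands for an HTB + SFQ + u32-hash layout.
# DEV, the 10.0.0.0/24 subnet, class IDs and rates are placeholders.

DEV = "eth0"  # assumed interface name


def build_commands(dev, regions):
    """Return tc commands: root HTB -> 1:1 -> region classes -> user leaves."""
    cmds = [
        f"tc qdisc add dev {dev} root handle 1: htb",
        f"tc class add dev {dev} parent 1: classid 1:1 htb rate 7gbit",
        # u32 root filter plus hash table 2: with 256 buckets.
        f"tc filter add dev {dev} parent 1: prio 5 protocol ip u32",
        f"tc filter add dev {dev} parent 1: prio 5 handle 2: "
        f"protocol ip u32 divisor 256",
        # Hash on the last octet of the destination IP (offset 16 in the
        # IP header) and jump to the matching bucket of table 2:.
        f"tc filter add dev {dev} protocol ip parent 1: prio 5 u32 ht 800:: "
        f"match ip dst 10.0.0.0/24 hashkey mask 0x000000ff at 16 link 2:",
    ]
    for region_minor, (rate, ceil, users) in regions.items():
        # One inner class per region (1:xx).
        cmds.append(
            f"tc class add dev {dev} parent 1:1 classid 1:{region_minor:x} "
            f"htb rate {rate} ceil {ceil}"
        )
        for leaf_minor, last_octet, user_rate, user_ceil in users:
            # One leaf class per user (1:xxxx) with an SFQ qdisc attached.
            cmds.append(
                f"tc class add dev {dev} parent 1:{region_minor:x} "
                f"classid 1:{leaf_minor:x} htb rate {user_rate} ceil {user_ceil}"
            )
            cmds.append(
                f"tc qdisc add dev {dev} parent 1:{leaf_minor:x} sfq perturb 10"
            )
            # Per-user filter placed in the hash bucket for its last octet.
            cmds.append(
                f"tc filter add dev {dev} protocol ip parent 1: prio 5 u32 "
                f"ht 2:{last_octet:x}: match ip dst 10.0.0.{last_octet} "
                f"flowid 1:{leaf_minor:x}"
            )
    return cmds


# Hypothetical data: one region (1:10) with two users.
EXAMPLE = {
    0x10: ("1gbit", "7gbit", [
        (0x1001, 123, "5mbit", "100mbit"),
        (0x1002, 124, "5mbit", "300mbit"),
    ]),
}

if __name__ == "__main__":
    for cmd in build_commands(DEV, EXAMPLE):
        print(cmd)
```

Running it only prints the commands; nothing is applied to any interface. The real setup has 20~30 region classes and almost 3000 leaves built on the same pattern.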
Is the way I'm using HTB+SFQ+TC suitable for this job?
Since the script that creates the shaping environment is too long, I am not posting it here.
What information can I provide to help you guys help me solve this?
Fragments of code, stats, some measurements? Anything else?
Thanks.
Regards
Ethy