[Cake] clogging qdisc
Pete Heist
pete at heistp.net
Sun Dec 30 16:52:44 EST 2018
There’s at least one reason why hfsc is still in use - good rate limiting performance - but I was never able to get its service guarantees working as well as I’d like. I prefer htb’s simpler design and predictable behavior, and I'd speculate that it’s hfsc that’s causing the clogging described.
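For illustration, a minimal htb + fq_codel per-customer setup in the simpler style I mean might look like this (the device name, rates and class ids below are only placeholders, not anyone's actual config):

#Root htb qdisc; unclassified traffic falls into class 1:1
/sbin/tc qdisc add dev eth0 root handle 1: htb default 1
#Parent class at roughly the link rate
/sbin/tc class add dev eth0 parent 1: classid 1:1 htb rate 1gbit ceil 1gbit
#One class per customer: rate is the guarantee, ceil the burst limit
/sbin/tc class add dev eth0 parent 1:1 classid 1:0113 htb rate 12288kbit ceil 14336kbit
#fq_codel leaf instead of sfq, for per-flow queueing plus AQM
/sbin/tc qdisc add dev eth0 parent 1:0113 handle 0113: fq_codel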
This is interesting though, as I'm currently rewriting FreeNet’s QoS script, “due” Jan. 8. It’s personal now, because after an upgrade to Ubiquiti’s AC gear I’ve got some problems at home with high RTT. One of the two causes is the backhaul QoS scripts, which are making a 100 Mbit full-duplex link act like a half-duplex link with high TCP RTT.
I can reproduce it in the lab, and rrul_be tests are looking much better with a simpler queueing strategy and cake. :) Either we’ll be convinced enough that cake is stable on kernel 3.16, or else it may still have to be htb/hfsc+fq_codel; we’ll see...
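As a very rough sketch of what the simpler cake setup on the backhaul could look like (the interface name and shaped rate here are assumptions for illustration, not our actual settings):

#Shape a little below the 100 Mbit link rate so cake owns the queue
/sbin/tc qdisc replace dev eth0 root cake bandwidth 95mbit ethernet

With a single qdisc per interface there is no class tree left to misconfigure, which is part of the appeal.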
> On Dec 30, 2018, at 5:51 PM, Dave Taht <dave.taht at gmail.com> wrote:
>
> real example of an isp configuration
>
> ---------- Forwarded message ---------
> From: Grzegorz Gwóźdź <grzegorz at gwozdz.info>
> Date: Sat, Dec 29, 2018 at 4:25 PM
> Subject: Re: clogging qdisc
> To: <lartc at vger.kernel.org>
>
>
> sch_cake looks promising but is too simple. I've got thousands of
> customers with different tariffs.
>
> My setup (eth0 is FROM customers, eth1 is TO Internet):
>
> /sbin/tc qdisc add dev eth0 root handle 1: hfsc default 1
> /sbin/tc qdisc add dev eth1 root handle 1: hfsc default 1
>
> #Base class
> /sbin/tc class add dev eth0 parent 1: classid 1:1 hfsc sc m1 2048000kbit
> d 10000000 m2 2048000kbit ul m1 2048000kbit d 10000000 m2 2048000kbit
> /sbin/tc class add dev eth1 parent 1: classid 1:1 hfsc sc m1 2048000kbit
> d 10000000 m2 2048000kbit ul m1 2048000kbit d 10000000 m2 2048000kbit
>
> #Hash filters 1 lvl
> /sbin/tc filter add dev eth0 parent 1:0 prio 1 handle 255: protocol ip
> u32 divisor 256
> /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht 800::
> match ip dst 192.168.0.0/16 hashkey mask 0x0000ff00 at 16 link 255:
> /sbin/tc filter add dev eth1 parent 1:0 prio 1 handle 255: protocol ip
> u32 divisor 256
> /sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht 800::
> match ip src 192.168.0.0/16 hashkey mask 0x0000ff00 at 12 link 255:
>
> #Hash filters 2 lvl
> for i in `seq 1 254`; do
> Hi=`printf "%.2x" $i`
> /sbin/tc filter add dev eth0 parent 1:0 prio 1 handle $Hi: protocol
> ip u32 divisor 256
> /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht
> 255:$Hi: match ip dst 192.168.$i.0/24 hashkey mask 0x000000ff at 16 link
> $Hi:
> done
>
> for i in `seq 1 254`; do
> Hi=`printf "%.2x" $i`
> /sbin/tc filter add dev eth1 parent 1:0 prio 1 handle $Hi: protocol
> ip u32 divisor 256
> /sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht
> 255:$Hi: match ip src 192.168.$i.0/24 hashkey mask 0x000000ff at 12 link
> $Hi:
> done
>
> #And for every customer (about 3000):
> ######################
> let dwnrate=12288
> let dwnceil=14336
> /sbin/tc class add dev eth0 parent 1: classid 1:0113 hfsc sc m1
> $dwnceil"kbit" d 30000000 m2 $dwnrate"kbit" ul m1 $dwnceil"kbit" d
> 30000000 m2 $dwnrate"kbit"
> /sbin/tc qdisc add dev eth0 parent 1:0113 handle 0113: sfq perturb 10
>
> let uplrate=3072
> let uplceil=3584
> /sbin/tc class add dev eth1 parent 1: classid 1:0113 hfsc sc m1
> $uplceil"kbit" d 30000000 m2 $uplrate"kbit" ul m1 $uplceil"kbit" d
> 30000000 m2 $uplrate"kbit"
> /sbin/tc qdisc add dev eth1 parent 1:0113 handle 0113: sfq perturb 10
>
> /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht 01:13:
> match ip dst 192.168.1.19/32 flowid 1:0113
> /sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht 01:13:
> match ip src 192.168.1.19/32 flowid 1:0113
> ######################
>
> let dwnrate=8192
> let dwnceil=10240
> /sbin/tc class add dev eth0 parent 1: classid 1:0219 hfsc sc m1
> $dwnceil"kbit" d 30000000 m2 $dwnrate"kbit" ul m1 $dwnceil"kbit" d
> 30000000 m2 $dwnrate"kbit"
> /sbin/tc qdisc add dev eth0 parent 1:0219 handle 0219: sfq perturb 10
>
> let uplrate=2048
> let uplceil=2560
> /sbin/tc class add dev eth1 parent 1: classid 1:0219 hfsc sc m1
> $uplceil"kbit" d 30000000 m2 $uplrate"kbit" ul m1 $uplceil"kbit" d
> 30000000 m2 $uplrate"kbit"
> /sbin/tc qdisc add dev eth1 parent 1:0219 handle 0219: sfq perturb 10
>
>
> /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht 02:19:
> match ip dst 192.168.2.25/32 flowid 1:0219
> /sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht 02:19:
> match ip src 192.168.2.25/32 flowid 1:0219
>
> ######################
>
> I use static routing, and the next container (linked by a bridge common to
> both containers) is doing NAT.
>
>
> I would like to delete classes and filters one by one to find out whether
> a specific customer is causing the trouble...
>
> I can do:
>
> /sbin/tc qdisc del dev eth0 parent 1:0219 handle 0219: sfq perturb 10
>
> but I can't do
>
> /sbin/tc class del dev eth0 parent 1: classid 1:0219
>
> or
>
> /sbin/tc class del dev eth0 parent 1: classid 1:0219 hfsc sc m1
> 10240kbit d 30000000 m2 8192kbit ul m1 10240kbit d 30000000 m2 8192kbit
>
> because:
>
> RTNETLINK answers: Device or resource busy
>
> Why?
>
>
> Deleting filters also does not work as expected
>
> /sbin/tc filter del dev eth0 protocol ip parent 1:0 prio 1 u32 ht 02:19:
> match ip dst 192.168.2.25/32 flowid 1:0219
>
> deletes all filters. After that
>
> tc -s filter ls dev eth0
>
> returns nothing. Why?
>
>
> GG
>
>
> On 28.12.2018 12:57, Dave Taht wrote:
>> I am, of course, always interested in more folk dumping hfsc and
>> complicated designs, and trying sch_cake....
>>
>> On Fri, Dec 28, 2018 at 3:54 AM Alan Goodman
>> <notifications at yescomputersolutions.com> wrote:
>>> Perhaps you should post an example of your tc setup?
>>>
>>> I had a bug a few months back where traffic in important queues would
>>> seemingly randomly get 100% drop rate (as in your example below). Upon
>>> penning an email with the tc setup I realised that I had a leaf class on
>>> the wrong branch and was trying to guarantee 99.9+% of traffic for that
>>> leaf if it had significant traffic... Number 1:2 was swapped for number
>>> 1:1 and everything went back to normal.
>>>
>>> Alan
>>>
>>> On 27/12/2018 22:26, Grzegorz Gwóźdź wrote:
>>>>> Are there any "hacks" in TC allowing to look in the guts?
>>>>>
>>>>> It looks like it's changing state to "clogged" but
>>>>>
>>>>> tc -s class ls dev eth0
>>>>>
>>>>> looks completely normal (only the number of sfq queues created
>>>>> dynamically for each connection grows, since more and more connections
>>>>> are created but not closed)
>>>>
>>>> In fact I've noticed something interesting during the "clogged" state...
>>>>
>>>> a few runs of:
>>>>
>>>> tc -s class ls dev eth0
>>>>
>>>> shows that the filters sort packets correctly, but packets that go into
>>>> the appropriate classes are dropped:
>>>>
>>>> class hfsc 1:1012 parent 1: leaf 1012: sc m1 6144Kbit d 10.0s m2
>>>> 4096Kbit ul m1 6144Kbit d 10.0s m2 4096Kbit
>>>> Sent 103306048 bytes 75008 pkt (dropped 12, overlimits 0 requeues 0)
>>>> backlog 39Kb 127p requeues 0
>>>> period 13718 work 103306048 bytes rtwork 103306048 bytes level 0
>>>>
>>>> and after a while:
>>>>
>>>> class hfsc 1:1012 parent 1: leaf 1012: sc m1 6144Kbit d 10.0s m2
>>>> 4096Kbit ul m1 6144Kbit d 10.0s m2 4096Kbit
>>>> Sent 103306048 bytes 75008 pkt (dropped 116, overlimits 0 requeues 0)
>>>> backlog 39160b 127p requeues 0
>>>> period 13718 work 103306048 bytes rtwork 103306048 bytes level 0
>>>>
>>>> "Sent" stands still and all packets are "dropped"
>>>>
>>>> Some classes still pass packets, but as time goes by more and more
>>>> classes stop passing and start dropping.
>>>>
>>>>
>>>> GG
>>>>
>>
>>
>
>
>
> --
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740
> _______________________________________________
> Cake mailing list
> Cake at lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake