[Cake] Fwd: clogging qdisc

Dave Taht dave.taht at gmail.com
Sun Dec 30 11:51:21 EST 2018


Real example of an ISP configuration.

---------- Forwarded message ---------
From: Grzegorz Gwóźdź <grzegorz at gwozdz.info>
Date: Sat, Dec 29, 2018 at 4:25 PM
Subject: Re: clogging qdisc
To: <lartc at vger.kernel.org>


sch_cake looks promising but is too simple. I've got thousands of
customers with different tariffs.

My setup (eth0 faces the customers, eth1 faces the Internet):

/sbin/tc qdisc add dev eth0 root handle 1: hfsc default 1
/sbin/tc qdisc add dev eth1 root handle 1: hfsc default 1

#Base class
/sbin/tc class add dev eth0 parent 1: classid 1:1 hfsc \
     sc m1 2048000kbit d 10000000 m2 2048000kbit \
     ul m1 2048000kbit d 10000000 m2 2048000kbit
/sbin/tc class add dev eth1 parent 1: classid 1:1 hfsc \
     sc m1 2048000kbit d 10000000 m2 2048000kbit \
     ul m1 2048000kbit d 10000000 m2 2048000kbit

#Hash filters 1 lvl
/sbin/tc filter add dev eth0 parent 1:0 prio 1 handle 255: protocol ip \
     u32 divisor 256
/sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht 800:: \
     match ip dst 192.168.0.0/16 hashkey mask 0x0000ff00 at 16 link 255:
/sbin/tc filter add dev eth1 parent 1:0 prio 1 handle 255: protocol ip \
     u32 divisor 256
/sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht 800:: \
     match ip src 192.168.0.0/16 hashkey mask 0x0000ff00 at 12 link 255:

#Hash filters 2 lvl
for i in `seq 1 254`; do
     Hi=`printf "%.2x" $i`
     /sbin/tc filter add dev eth0 parent 1:0 prio 1 handle $Hi: protocol ip \
          u32 divisor 256
     /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
          ht 255:$Hi: match ip dst 192.168.$i.0/24 \
          hashkey mask 0x000000ff at 16 link $Hi:
done

for i in `seq 1 254`; do
     Hi=`printf "%.2x" $i`
     /sbin/tc filter add dev eth1 parent 1:0 prio 1 handle $Hi: protocol ip \
          u32 divisor 256
     /sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 \
          ht 255:$Hi: match ip src 192.168.$i.0/24 \
          hashkey mask 0x000000ff at 12 link $Hi:
done
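The two hash levels above key on the last two octets of a customer address in 192.168.0.0/16: the first level (table 255:, mask 0x0000ff00) picks a bucket from the third octet and links it to the per-/24 table named after that octet in hex, and that table's bucket is the fourth octet in hex. A minimal sketch, using a hypothetical helper name `u32_ht_for_ip` (not part of the original setup), that computes the `ht table:bucket:` argument a per-customer filter needs:

```shell
# Hypothetical helper: print the u32 "table:bucket:" string for an address
# in 192.168.0.0/16, mirroring the two-level hashing above
# (3rd octet -> table, 4th octet -> bucket, both as two hex digits).
u32_ht_for_ip() {
    rest=${1#*.}            # drop first octet:  "168.2.25"
    rest=${rest#*.}         # drop second octet: "2.25"
    c=${rest%%.*}           # third octet
    d=${rest#*.}            # fourth octet
    printf '%.2x:%.2x:\n' "$c" "$d"
}
```

For example, `u32_ht_for_ip 192.168.2.25` prints `02:19:`, the same `ht` used for that customer's filters below.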

#And for every customer (about 3000):
######################
let dwnrate=12288
let dwnceil=14336
/sbin/tc class add dev eth0 parent 1: classid 1:0113 hfsc \
     sc m1 $dwnceil"kbit" d 30000000 m2 $dwnrate"kbit" \
     ul m1 $dwnceil"kbit" d 30000000 m2 $dwnrate"kbit"
/sbin/tc qdisc add dev eth0 parent 1:0113 handle 0113: sfq perturb 10

let uplrate=3072
let uplceil=3584
/sbin/tc class add dev eth1 parent 1: classid 1:0113 hfsc \
     sc m1 $uplceil"kbit" d 30000000 m2 $uplrate"kbit" \
     ul m1 $uplceil"kbit" d 30000000 m2 $uplrate"kbit"
/sbin/tc qdisc add dev eth1 parent 1:0113 handle 0113: sfq perturb 10

/sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht 01:13: \
     match ip dst 192.168.1.19/32 flowid 1:0113
/sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht 01:13: \
     match ip src 192.168.1.19/32 flowid 1:0113
######################

let dwnrate=8192
let dwnceil=10240
/sbin/tc class add dev eth0 parent 1: classid 1:0219 hfsc \
     sc m1 $dwnceil"kbit" d 30000000 m2 $dwnrate"kbit" \
     ul m1 $dwnceil"kbit" d 30000000 m2 $dwnrate"kbit"
/sbin/tc qdisc add dev eth0 parent 1:0219 handle 0219: sfq perturb 10

let uplrate=2048
let uplceil=2560
/sbin/tc class add dev eth1 parent 1: classid 1:0219 hfsc \
     sc m1 $uplceil"kbit" d 30000000 m2 $uplrate"kbit" \
     ul m1 $uplceil"kbit" d 30000000 m2 $uplrate"kbit"
/sbin/tc qdisc add dev eth1 parent 1:0219 handle 0219: sfq perturb 10


/sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht 02:19: \
     match ip dst 192.168.2.25/32 flowid 1:0219
/sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht 02:19: \
     match ip src 192.168.2.25/32 flowid 1:0219

######################
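Each customer block repeats the same commands, and both the class ID and the u32 bucket come from the address (192.168.1.19 gives classid 1:0113 and ht 01:13:, 192.168.2.25 gives 1:0219 and ht 02:19:). With about 3000 customers, generating the blocks is less error-prone than writing them by hand. A minimal dry-run sketch, assuming a hypothetical function `customer_cmds` that only prints the commands (grouped per device rather than in the original order) so they can be reviewed first:

```shell
# Hypothetical generator (not from the original post): print the tc commands
# for one customer. Rates are in kbit, as in the blocks above.
# Usage: customer_cmds IP DWNRATE DWNCEIL UPLRATE UPLCEIL
customer_cmds() {
    ip=$1; dwnrate=$2; dwnceil=$3; uplrate=$4; uplceil=$5
    rest=${ip#*.}; rest=${rest#*.}          # keep "c.d" of a.b.c.d
    c=${rest%%.*}; d=${rest#*.}
    id=$(printf '%.2x%.2x' "$c" "$d")       # 192.168.1.19 -> 0113
    ht=$(printf '%.2x:%.2x:' "$c" "$d")     # 192.168.1.19 -> 01:13:
    # eth0 matches the customer as destination (download), eth1 as source.
    for spec in "eth0 dst $dwnrate $dwnceil" "eth1 src $uplrate $uplceil"; do
        set -- $spec    # intentional word splitting: dev, match dir, rate, ceil
        echo "/sbin/tc class add dev $1 parent 1: classid 1:$id hfsc" \
             "sc m1 ${4}kbit d 30000000 m2 ${3}kbit" \
             "ul m1 ${4}kbit d 30000000 m2 ${3}kbit"
        echo "/sbin/tc qdisc add dev $1 parent 1:$id handle $id: sfq perturb 10"
        echo "/sbin/tc filter add dev $1 protocol ip parent 1:0 prio 1 u32" \
             "ht $ht match ip $2 $ip/32 flowid 1:$id"
    done
}
```

Once the printed output looks right, it could be applied with, for example, `customer_cmds 192.168.2.25 8192 10240 2048 2560 | sh`.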

I use static routing, and the next container (connected through a bridge
shared by both containers) does the NAT.


I would like to delete classes and filters one by one to find out
whether a specific customer is causing the trouble...

I can do:

/sbin/tc qdisc del dev eth0 parent 1:0219 handle 0219: sfq perturb 10

but I can't do

/sbin/tc class del dev eth0 parent 1: classid 1:0219

or

/sbin/tc class del dev eth0 parent 1: classid 1:0219 hfsc \
     sc m1 10240kbit d 30000000 m2 8192kbit ul m1 10240kbit d 30000000 m2 8192kbit

because:

RTNETLINK answers: Device or resource busy

Why?


Deleting filters also does not work as expected:

/sbin/tc filter del dev eth0 protocol ip parent 1:0 prio 1 u32 ht 02:19: \
     match ip dst 192.168.2.25/32 flowid 1:0219

deletes all filters. After that

tc -s filter ls dev eth0

returns nothing. Why?
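A likely explanation, offered with some hedging: a u32 `tc filter del` that specifies `prio 1` but no `handle` removes the whole u32 instance at that priority, and the `match ... flowid ...` arguments do not narrow it down; since every filter here sits at prio 1, all of them disappear. Removing a single filter generally needs its u32 handle (table:bucket:node), which `tc filter show dev eth0` prints after `fh`. Similarly, HFSC appears to refuse (EBUSY) to delete a class that filters still point to, so removing the customer's filter first should let the class deletion succeed. A minimal sketch, assuming a hypothetical helper `u32_handle_for_flowid` and output lines of the form `... fh 2:19:800 ... flowid 1:219` (tc prints class IDs in hex without leading zeros), that extracts the handle:

```shell
# Hypothetical helper (not from the original post): read `tc filter show`
# output on stdin and print the u32 handle ("fh") of every filter whose
# flowid equals $1. Assumes fh and flowid appear on the same output line.
u32_handle_for_flowid() {
    awk -v want="$1" '
        BEGIN {
            # tc prints 1:0219 as 1:219, so strip leading zeros from the
            # minor part (assumes a non-zero minor number).
            sub(/:0+/, ":", want)
        }
        {
            fh = ""; fid = ""
            for (i = 1; i < NF; i++) {
                if ($i == "fh") fh = $(i + 1)
                if ($i == "flowid") fid = $(i + 1)
            }
            if (fid == want && fh != "") print fh
        }'
}
```

On the live box, the per-customer teardown might then look like this (untested, handle shown as a variable):

     h=$(tc filter show dev eth0 | u32_handle_for_flowid 1:0219)
     tc filter del dev eth0 parent 1: protocol ip prio 1 handle "$h" u32
     tc qdisc del dev eth0 parent 1:0219 handle 0219:
     tc class del dev eth0 classid 1:0219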


GG


On 28.12.2018 12:57, Dave Taht wrote:
> I am, of course, always interested in more folks dumping hfsc and
> complicated designs and trying sch_cake....
>
> On Fri, Dec 28, 2018 at 3:54 AM Alan Goodman
> <notifications at yescomputersolutions.com> wrote:
>> Perhaps you should post an example of your tc setup?
>>
>> I had a bug a few months back where traffic in important queues would
>> seemingly randomly get 100% drop rate (as in your example below).  Upon
>> penning an email with the tc setup I realised that I had a leaf class on
>> the wrong branch and was trying to guarantee 99.9+% of traffic for that
>> leaf if it had significant traffic... Number 1:2 was swapped for number
>> 1:1 and everything went back to normal.
>>
>> Alan
>>
>> On 27/12/2018 22:26, Grzegorz Gwóźdź wrote:
>>>> Are there any "hacks" in tc that allow looking into its guts?
>>>>
>>>> It looks like it's changing state to "clogged" but
>>>>
>>>> tc -s class ls dev eth0
>>>>
>>>> looks completely normal (the only thing that grows is the number of sfq
>>>> queues created dynamically for every connection, since more and more
>>>> connections are created but not closed)
>>>
>>> In fact, I've noticed something interesting during the "clogged" state...
>>>
>>> a few runs of:
>>>
>>> tc -s class ls dev eth0
>>>
>>> shows that the filters classify packets correctly, but packets that go
>>> into the matching classes are dropped:
>>>
>>> class hfsc 1:1012 parent 1: leaf 1012: sc m1 6144Kbit d 10.0s m2
>>> 4096Kbit ul m1 6144Kbit d 10.0s m2 4096Kbit
>>>   Sent 103306048 bytes 75008 pkt (dropped 12, overlimits 0 requeues 0)
>>>   backlog 39Kb 127p requeues 0
>>>   period 13718 work 103306048 bytes rtwork 103306048 bytes level 0
>>>
>>> and after a while:
>>>
>>> class hfsc 1:1012 parent 1: leaf 1012: sc m1 6144Kbit d 10.0s m2
>>> 4096Kbit ul m1 6144Kbit d 10.0s m2 4096Kbit
>>>   Sent 103306048 bytes 75008 pkt (dropped 116, overlimits 0 requeues 0)
>>>   backlog 39160b 127p requeues 0
>>>   period 13718 work 103306048 bytes rtwork 103306048 bytes level 0
>>>
>>> "Sent" stands still and all packets are "dropped".
>>>
>>> Some classes pass packets, but as time goes by, more and more classes
>>> stop passing and start dropping.
>>>
>>>
>>> GG
>>>
>
>
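The symptom in the quoted message (class 1:1012 keeps "Sent" frozen at 103306048 bytes while "dropped" climbs from 12 to 116) can be spotted automatically by comparing two snapshots of `tc -s class ls`. A minimal sketch, assuming the output layout shown above (a `class ...` line followed by a ` Sent ... (dropped N, ...` line) and a hypothetical helper named `stalled_classes`:

```shell
# Hypothetical helper: compare two saved `tc -s class ls` snapshots and print
# every class whose "Sent" byte counter did not move while "dropped" grew.
# Usage: stalled_classes BEFORE_FILE AFTER_FILE
stalled_classes() {
    awk '
        FNR == 1 { snap++ }                   # 1 = before file, 2 = after file
        $1 == "class" { cls = $3 }            # e.g. "1:1012"
        $1 == "Sent" {
            drop = $0
            sub(/.*\(dropped /, "", drop)     # keep text after "(dropped "
            sub(/,.*/, "", drop)              # cut at the following comma
            sent[snap, cls] = $2
            dropped[snap, cls] = drop + 0
        }
        END {
            for (key in sent) {
                split(key, parts, SUBSEP)
                if (parts[1] != 1) continue
                c = parts[2]
                if (((2, c) in sent) && sent[2, c] == sent[1, c] &&
                    dropped[2, c] > dropped[1, c])
                    print c
            }
        }' "$1" "$2"
}
```

For example: `tc -s class ls dev eth0 > before; sleep 10; tc -s class ls dev eth0 > after; stalled_classes before after`. The order of the printed class IDs is not guaranteed.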



-- 

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

