* [Cake] Fwd: clogging qdisc
[not found] ` <ff695b9a-741e-1d41-f94a-258d4189491c@gwozdz.info>
@ 2018-12-30 16:51 ` Dave Taht
2018-12-30 21:52 ` [Cake] " Pete Heist
0 siblings, 1 reply; 2+ messages in thread
From: Dave Taht @ 2018-12-30 16:51 UTC (permalink / raw)
To: Cake List
A real example of an ISP configuration.
---------- Forwarded message ---------
From: Grzegorz Gwóźdź <grzegorz@gwozdz.info>
Date: Sat, Dec 29, 2018 at 4:25 PM
Subject: Re: clogging qdisc
To: <lartc@vger.kernel.org>
sch_cake looks promising, but it is too simple for my case: I've got
thousands of customers with different tariffs.
My setup (eth0 is FROM customers, eth1 is TO Internet):
/sbin/tc qdisc add dev eth0 root handle 1: hfsc default 1
/sbin/tc qdisc add dev eth1 root handle 1: hfsc default 1
#Base class
/sbin/tc class add dev eth0 parent 1: classid 1:1 hfsc \
    sc m1 2048000kbit d 10000000 m2 2048000kbit \
    ul m1 2048000kbit d 10000000 m2 2048000kbit
/sbin/tc class add dev eth1 parent 1: classid 1:1 hfsc \
    sc m1 2048000kbit d 10000000 m2 2048000kbit \
    ul m1 2048000kbit d 10000000 m2 2048000kbit
#Hash filters 1 lvl
/sbin/tc filter add dev eth0 parent 1:0 prio 1 handle 255: protocol ip \
    u32 divisor 256
/sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht 800:: \
    match ip dst 192.168.0.0/16 hashkey mask 0x0000ff00 at 16 link 255:
/sbin/tc filter add dev eth1 parent 1:0 prio 1 handle 255: protocol ip \
    u32 divisor 256
/sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht 800:: \
    match ip src 192.168.0.0/16 hashkey mask 0x0000ff00 at 12 link 255:
#Hash filters 2 lvl
for i in `seq 1 254`; do
    Hi=`printf "%.2x" $i`
    /sbin/tc filter add dev eth0 parent 1:0 prio 1 handle $Hi: protocol ip \
        u32 divisor 256
    /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht 255:$Hi: \
        match ip dst 192.168.$i.0/24 hashkey mask 0x000000ff at 16 link $Hi:
done

for i in `seq 1 254`; do
    Hi=`printf "%.2x" $i`
    /sbin/tc filter add dev eth1 parent 1:0 prio 1 handle $Hi: protocol ip \
        u32 divisor 256
    /sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht 255:$Hi: \
        match ip src 192.168.$i.0/24 hashkey mask 0x000000ff at 12 link $Hi:
done
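To trace how a packet lands in its per-customer bucket with these two hash levels, here is a small illustrative sketch (not part of the production script; the variable names are arbitrary):

# Illustration only: which hash buckets a destination IP traverses,
# matching the filters above.
ip=192.168.1.19
o3=`echo $ip | cut -d. -f3`       # byte selected by mask 0x0000ff00 at 16
o4=`echo $ip | cut -d. -f4`       # byte selected by mask 0x000000ff at 16
ht=`printf "%.2x" $o3`            # first-level bucket / second-level table
bucket=`printf "%.2x" $o4`        # second-level bucket
echo "dst $ip -> ht 255:$ht: -> ht $ht:$bucket: -> flowid 1:$ht$bucket"
# prints: dst 192.168.1.19 -> ht 255:01: -> ht 01:13: -> flowid 1:0113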
#And for every customer (about 3000):
######################
let dwnrate=12288
let dwnceil=14336
/sbin/tc class add dev eth0 parent 1: classid 1:0113 hfsc \
    sc m1 ${dwnceil}kbit d 30000000 m2 ${dwnrate}kbit \
    ul m1 ${dwnceil}kbit d 30000000 m2 ${dwnrate}kbit
/sbin/tc qdisc add dev eth0 parent 1:0113 handle 0113: sfq perturb 10
let uplrate=3072
let uplceil=3584
/sbin/tc class add dev eth1 parent 1: classid 1:0113 hfsc \
    sc m1 ${uplceil}kbit d 30000000 m2 ${uplrate}kbit \
    ul m1 ${uplceil}kbit d 30000000 m2 ${uplrate}kbit
/sbin/tc qdisc add dev eth1 parent 1:0113 handle 0113: sfq perturb 10
/sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht 01:13: \
    match ip dst 192.168.1.19/32 flowid 1:0113
/sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht 01:13: \
    match ip src 192.168.1.19/32 flowid 1:0113
######################
let dwnrate=8192
let dwnceil=10240
/sbin/tc class add dev eth0 parent 1: classid 1:0219 hfsc \
    sc m1 ${dwnceil}kbit d 30000000 m2 ${dwnrate}kbit \
    ul m1 ${dwnceil}kbit d 30000000 m2 ${dwnrate}kbit
/sbin/tc qdisc add dev eth0 parent 1:0219 handle 0219: sfq perturb 10
let uplrate=2048
let uplceil=2560
/sbin/tc class add dev eth1 parent 1: classid 1:0219 hfsc \
    sc m1 ${uplceil}kbit d 30000000 m2 ${uplrate}kbit \
    ul m1 ${uplceil}kbit d 30000000 m2 ${uplrate}kbit
/sbin/tc qdisc add dev eth1 parent 1:0219 handle 0219: sfq perturb 10
/sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 ht 02:19: \
    match ip dst 192.168.2.25/32 flowid 1:0219
/sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 ht 02:19: \
    match ip src 192.168.2.25/32 flowid 1:0219
######################
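The two customer blocks above repeat the same pattern. A parameterized sketch of that pattern follows (illustration only; the add_customer helper name and its argument order are mine, not part of the production script):

# Illustration only: the per-customer pattern above as one helper.
add_customer() {
    ip=$1; dwnrate=$2; dwnceil=$3; uplrate=$4; uplceil=$5
    o3=`echo $ip | cut -d. -f3`; o4=`echo $ip | cut -d. -f4`
    ht=`printf "%.2x" $o3`; bucket=`printf "%.2x" $o4`
    cid="$ht$bucket"                     # e.g. 192.168.1.19 -> 0113

    /sbin/tc class add dev eth0 parent 1: classid 1:$cid hfsc \
        sc m1 ${dwnceil}kbit d 30000000 m2 ${dwnrate}kbit \
        ul m1 ${dwnceil}kbit d 30000000 m2 ${dwnrate}kbit
    /sbin/tc qdisc add dev eth0 parent 1:$cid handle $cid: sfq perturb 10
    /sbin/tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
        ht $ht:$bucket: match ip dst $ip/32 flowid 1:$cid

    /sbin/tc class add dev eth1 parent 1: classid 1:$cid hfsc \
        sc m1 ${uplceil}kbit d 30000000 m2 ${uplrate}kbit \
        ul m1 ${uplceil}kbit d 30000000 m2 ${uplrate}kbit
    /sbin/tc qdisc add dev eth1 parent 1:$cid handle $cid: sfq perturb 10
    /sbin/tc filter add dev eth1 protocol ip parent 1:0 prio 1 u32 \
        ht $ht:$bucket: match ip src $ip/32 flowid 1:$cid
}

# The two examples above, expressed through the helper:
add_customer 192.168.1.19 12288 14336 3072 3584
add_customer 192.168.2.25  8192 10240 2048 2560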
I use static routing, and the next container (linked by a bridge common
to both containers) does the NAT.
I would like to delete classes and filters one by one to find out
whether a specific customer is causing the trouble...
I can do:
/sbin/tc qdisc del dev eth0 parent 1:0219 handle 0219: sfq perturb 10
but I can't do
/sbin/tc class del dev eth0 parent 1: classid 1:0219
or
/sbin/tc class del dev eth0 parent 1: classid 1:0219 hfsc \
    sc m1 10240kbit d 30000000 m2 8192kbit ul m1 10240kbit d 30000000 m2 8192kbit
because:
RTNETLINK answers: Device or resource busy
Why?
Deleting filters also does not work as expected
/sbin/tc filter del dev eth0 protocol ip parent 1:0 prio 1 u32 ht 02:19: \
    match ip dst 192.168.2.25/32 flowid 1:0219
deletes all filters. After that
tc -s filter ls dev eth0
returns nothing. Why?
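For what it's worth, a u32 filter is normally deleted by its specific handle; a delete command without a handle removes every filter at that priority, which would explain the behaviour above, and the EBUSY on the class delete is most likely because a filter still references the class. A rough sketch of an order that should work (the handle value is only an example; read the real one from the fh field of tc filter show):

# Sketch only: delete one customer's filter first, then the class.
# List filters and note the specific handle ("fh") of the one pointing
# at flowid 1:0219:
/sbin/tc filter show dev eth0 parent 1:0
# Delete just that filter by its handle (example value shown):
/sbin/tc filter del dev eth0 parent 1:0 prio 1 protocol ip handle 2:19:800 u32
# With no filter referencing the class, the delete should no longer be busy:
/sbin/tc class del dev eth0 parent 1: classid 1:0219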
GG
On 28.12.2018 12:57, Dave Taht wrote:
> I am, of course, always interested in more folks dumping hfsc and
> complicated designs, and trying sch_cake....
>
> On Fri, Dec 28, 2018 at 3:54 AM Alan Goodman
> <notifications@yescomputersolutions.com> wrote:
>> Perhaps you should post an example of your tc setup?
>>
>> I had a bug a few months back where traffic in important queues would
>> seemingly randomly get 100% drop rate (as in your example below). Upon
>> penning an email with the tc setup I realised that I had a leaf class on
>> the wrong branch and was trying to guarantee 99.9+% of traffic for that
>> leaf if it had significant traffic... Number 1:2 was swapped for number
>> 1:1 and everything went back to normal.
>>
>> Alan
>>
>> On 27/12/2018 22:26, Grzegorz Gwóźdź wrote:
>>>> Are there any "hacks" in tc that allow a look into its guts?
>>>>
>>>> It looks like it's changing state to "clogged" but
>>>>
>>>> tc -s class ls dev eth0
>>>>
>>>> looks completely normal (only the number of sfq queues grows, since they are
>>>> created dynamically for every connection and more and more connections are
>>>> opened but never closed)
>>>
>>> In fact I've noticed something interesting during the "clogged" state...
>>>
>>> a few runs of:
>>>
>>> tc -s class ls dev eth0
>>>
>>> show that the filters sort packets correctly, but packets that go into
>>> the appropriate classes are dropped:
>>>
>>> class hfsc 1:1012 parent 1: leaf 1012: sc m1 6144Kbit d 10.0s m2
>>> 4096Kbit ul m1 6144Kbit d 10.0s m2 4096Kbit
>>> Sent 103306048 bytes 75008 pkt (dropped 12, overlimits 0 requeues 0)
>>> backlog 39Kb 127p requeues 0
>>> period 13718 work 103306048 bytes rtwork 103306048 bytes level 0
>>>
>>> and after a while:
>>>
>>> class hfsc 1:1012 parent 1: leaf 1012: sc m1 6144Kbit d 10.0s m2
>>> 4096Kbit ul m1 6144Kbit d 10.0s m2 4096Kbit
>>> Sent 103306048 bytes 75008 pkt (dropped 116, overlimits 0 requeues 0)
>>> backlog 39160b 127p requeues 0
>>> period 13718 work 103306048 bytes rtwork 103306048 bytes level 0
>>>
>>> "Sent" stands still and all packets are "dropped"
>>>
>>> Some classes still pass packets, but as time goes by more and more classes
>>> stop passing and start dropping.
>>>
>>>
>>> GG
>>>
>
>
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
* Re: [Cake] clogging qdisc
2018-12-30 16:51 ` [Cake] Fwd: clogging qdisc Dave Taht
@ 2018-12-30 21:52 ` Pete Heist
0 siblings, 0 replies; 2+ messages in thread
From: Pete Heist @ 2018-12-30 21:52 UTC (permalink / raw)
To: Dave Taht; +Cc: Cake List
There's at least one reason why hfsc is still in use: good rate-limiting performance. I was never able to get its service guarantees working as well as I'd like, though. I prefer htb's simpler design and predictable behavior, and I'd speculate that it's hfsc that's causing the clogging described.
This is interesting, though, as I'm currently rewriting FreeNet's qos script, "due" Jan. 8. It's personal now, because after an upgrade to Ubiquiti's AC gear I've got some problems at home with high RTT. One of the two causes is the backhaul qos scripts, which make a 100mbit full-duplex link act like a half-duplex link with high TCP RTT.
I can reproduce it in the lab, and rrul_be tests look much better with a simpler queueing strategy and cake. :) Either we'll be convinced that cake is stable on kernel 3.16, or it may still have to be htb/hfsc + fq_codel; we'll see...
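For reference, a minimal sketch of the sort of simpler setup being compared here (my own illustration; the interface name, the 95mbit shaped rate, and the fallback logic are assumptions, not taken from the actual FreeNet script):

IFACE=eth0
# Prefer cake where the kernel (or the out-of-tree module) provides it:
if tc qdisc replace dev $IFACE root cake bandwidth 95mbit 2>/dev/null; then
    echo "shaping with cake"
else
    # Fallback for kernels without sch_cake, e.g. a stock 3.16:
    tc qdisc replace dev $IFACE root handle 1: htb default 10
    tc class add dev $IFACE parent 1: classid 1:10 htb rate 95mbit ceil 95mbit
    tc qdisc add dev $IFACE parent 1:10 fq_codel
    echo "shaping with htb + fq_codel"
fi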