* [Cake] cake and hfsc rate limiters outperforming htb on one-armed router
@ 2018-12-28 0:17 Pete Heist
2018-12-30 20:42 ` Pete Heist
0 siblings, 1 reply; 6+ messages in thread
From: Pete Heist @ 2018-12-28 0:17 UTC
To: Cake List
For whatever reason, I’m seeing the rate limiters in cake and hfsc vastly outperform htb in the one-armed router configuration I described in my previous thread. To simplify things, I apply the qdiscs with a single class only at egress of eth0 on apu1a:
apu2a <— default VLAN —> apu1a <— VLAN 3300 —> apu2b
I use iperf3 from apu2a to apu2b and find the rate at which things break down. Whereas cake and hfsc can both reach around 850mbit, htb is breaking down at around 200mbit, which seems rather strange. This could be a function of the older kernel I have to use, the hardware, or maybe htb just isn’t suited well to this task for some reason. I wish I knew, as I’d rather be using htb for this task than hfsc (especially given the lockup issue with cake)...
——
#!/bin/bash

# point where iperf3 throughput drops below ~93% of theoretical:
# htb: 200mbit
# hfsc: 850mbit
# cake: 850mbit

IFACE=eth0
RATE=850mbit

start_htb() {
    stop
    tc qdisc add dev $IFACE root handle 1: htb default 1
    tc class add dev $IFACE parent 1: classid 1:1 htb rate $RATE ceil $RATE
    tc qdisc add dev $IFACE parent 1:1 handle 10: fq_codel
}

start_hfsc() {
    stop
    tc qdisc add dev $IFACE root handle 1: hfsc default 1
    tc class add dev $IFACE parent 1: classid 1:1 hfsc sc rate $RATE ul rate $RATE
    tc qdisc add dev $IFACE parent 1:1 handle 10: fq_codel
}

start_cake() {
    stop
    tc qdisc add dev $IFACE root cake bandwidth $RATE
}

stop() {
    tc qdisc del dev $IFACE root &>/dev/null
    tc qdisc del dev $IFACE ingress &>/dev/null
}

# run the function named on the command line, e.g. start_htb
"$@"
——
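The breakdown criterion in the script's header comment could be checked with a small helper like the one below. This is only an illustrative sketch, not part of the original script: the function name below_threshold is hypothetical, and it assumes the configured rate can stand in for the "theoretical" throughput, with the iperf3 result read off by hand in mbit.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): succeeds when the
# measured throughput is below 93% of the reference rate. Values in mbit.
# Assumption: the configured rate approximates "theoretical" throughput.
below_threshold() {
    local measured=$1 reference=$2
    # Integer form of: measured / reference < 93 / 100
    [ $(( measured * 100 )) -lt $(( reference * 93 )) ]
}

if below_threshold 180 200; then
    echo "180 of 200 mbit: below 93%"
fi
if ! below_threshold 800 850; then
    echo "800 of 850 mbit: at or above 93%"
fi
```

Integer arithmetic is used so the check works in plain bash without bc or awk.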
root@apu1a:~/rate_limiters# uname -a
Linux apu1a 3.16.7-ckt9-voyage #1 SMP Thu Apr 23 11:10:44 HKT 2015 i686 GNU/Linux
root@apu1a:~/rate_limiters# cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 20
model : 2
model name : AMD G-T40E Processor
stepping : 0
microcode : 0x5000101
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc nonstop_tsc extd_apicid aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt arat hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
bogomips : 1999.83
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
processor : 1
vendor_id : AuthenticAMD
cpu family : 20
model : 2
model name : AMD G-T40E Processor
stepping : 0
microcode : 0x5000101
cpu MHz : 800.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc nonstop_tsc extd_apicid aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt arat hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
bogomips : 1999.83
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
root@apu1a:~/rate_limiters# ethtool -i eth0
driver: r8169
version: 2.3LK-NAPI
firmware-version: rtl_nic/rtl8168e-2.fw
bus-info: 0000:01:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no
* Re: [Cake] cake and hfsc rate limiters outperforming htb on one-armed router
2018-12-28 0:17 [Cake] cake and hfsc rate limiters outperforming htb on one-armed router Pete Heist
@ 2018-12-30 20:42 ` Pete Heist
2018-12-30 21:51 ` Sebastian Moeller
0 siblings, 1 reply; 6+ messages in thread
From: Pete Heist @ 2018-12-30 20:42 UTC
To: Cake List
It’s a bit more complicated than this. The htb rate limiter behaves differently: as the configured rate increases, the actual rate starts to deviate from it early on, but htb handles the “out of CPU” situation rather gracefully. It still maintains control of the queue and just falls short of the specified rate by a growing percentage.
Instead of a single flow test with iperf3, here are rates that each limiter can reach on egress of both apu1a interfaces during an rrul_be test:
# - max limit on APU for one-armed routing, rrul_be test 4+4 flows (firewall on):
# - cake: 210mbit
# - htb+fq_codel: 93%@100mbit, 90%@200mbit, 84%@300mbit, 72%@400mbit, 59%@500mbit
# - hfsc+fq_codel: 310mbit
# - hfsc+cake: 300mbit
The numbers for cake and hfsc are the highest rates reached just before control of the queue is lost. With htb the queue isn’t lost even at 500mbit, for example; the actual rate is just only 59% of what was specified.
I really need to graph the specified rate vs the actual rate, inter-flow and intra-flow latency, stepped 25mbit at a time. I think it would be interesting, so this is on my todo list if there’s time after the ISP config gets done.
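To make the htb+fq_codel line above easier to compare with the other limiters, the percentages can be converted into approximate actual rates. This is a rough sketch using only the figures listed in this message; the function name htb_actual is hypothetical, and integer arithmetic rounds down.

```shell
#!/bin/bash
# Convert "percent achieved @ configured mbit" into an approximate actual
# rate in mbit. The pairs below are the htb+fq_codel results listed above.
htb_actual() {
    local pct=$1 rate=$2
    echo $(( pct * rate / 100 ))
}

for pair in 93:100 90:200 84:300 72:400 59:500; do
    pct=${pair%%:*}    # part before the colon: percent achieved
    rate=${pair##*:}   # part after the colon: configured rate in mbit
    echo "${rate}mbit configured -> ~$(htb_actual "$pct" "$rate")mbit actual"
done
```

By this arithmetic the actual htb rate comes out at roughly 93, 180, 252, 288 and 295mbit, so it levels off just under 300mbit, the same range in which hfsc is reported to lose the queue.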
* Re: [Cake] cake and hfsc rate limiters outperforming htb on one-armed router
2018-12-30 20:42 ` Pete Heist
@ 2018-12-30 21:51 ` Sebastian Moeller
2018-12-30 22:36 ` Pete Heist
0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Moeller @ 2018-12-30 21:51 UTC
To: Pete Heist; +Cc: Cake List
Hi Pete,
you might want to have a look at htb's burst and cburst parameters, as these should allow you to trade latency under load for bandwidth utilization.
Best Regards
Sebastian
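For reference, the burst and cburst parameters mentioned above are set on the htb class. The following is a sketch only, modeled on the start_htb function from the first message: the function name start_htb_burst, the TC dry-run variable, and the BURST value are illustrative assumptions, not settings anyone in this thread recommended.

```shell
#!/bin/bash
# Sketch: a variant of start_htb that sets burst and cburst explicitly.
# BURST is a placeholder value in bytes, not a recommendation.
IFACE=eth0
RATE=500mbit
BURST=3000

# Dry run by default: the commands are only printed.
# Run with TC=tc (as root) to actually apply them.
TC=${TC:-echo tc}

start_htb_burst() {
    $TC qdisc add dev "$IFACE" root handle 1: htb default 1
    $TC class add dev "$IFACE" parent 1: classid 1:1 htb rate "$RATE" ceil "$RATE" \
        burst "$BURST" cburst "$BURST"
    $TC qdisc add dev "$IFACE" parent 1:1 handle 10: fq_codel
}

start_htb_burst
```

Unlike the original script, this sketch does not delete an existing root qdisc first, so on a live interface the old one would need to be removed before applying it.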
* Re: [Cake] cake and hfsc rate limiters outperforming htb on one-armed router
2018-12-30 21:51 ` Sebastian Moeller
@ 2018-12-30 22:36 ` Pete Heist
2018-12-31 0:10 ` Sebastian Moeller
0 siblings, 1 reply; 6+ messages in thread
From: Pete Heist @ 2018-12-30 22:36 UTC
To: Sebastian Moeller; +Cc: Cake List
The experiments I did with those didn’t yield great results: changing a value by one MTU sometimes caused a sudden increase in throughput or inter-flow latency, and the tradeoffs weren’t clear. I’m afraid admins could easily cause problems by fiddling with these. Fortunately, most customer-facing routers have aggregate bitrates that an APU1 can handle even with default htb, or cake. I also appreciate that such settings don’t exist in cake… :)
* Re: [Cake] cake and hfsc rate limiters outperforming htb on one-armed router
2018-12-30 22:36 ` Pete Heist
@ 2018-12-31 0:10 ` Sebastian Moeller
2018-12-31 8:53 ` Pete Heist
0 siblings, 1 reply; 6+ messages in thread
From: Sebastian Moeller @ 2018-12-31 0:10 UTC
To: Pete Heist; +Cc: Cake List
Well, the idea would be to scale the buffer to cover, say, X ms at the configured bandwidth, so HTB could deal with CPU stalls of up to X-Y ms (with Y << X). We just switched sqm-scripts to automatically scale the buffering to 1ms.
I would be interested to learn whether that would increase HTB's utilisation.
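The buffer sizing described above is simple arithmetic and could be sketched as follows. This is an illustration, not the sqm-scripts implementation: the function name burst_bytes is hypothetical, and it takes the rate in kbit/s and the duration in microseconds.

```shell
#!/bin/bash
# Bytes needed to cover `us` microseconds of traffic at `kbit` kbit/s.
#   bits  = kbit * 1000 * us / 1000000
#   bytes = bits / 8            =>  us * kbit / 8000
burst_bytes() {
    local kbit=$1 us=$2
    echo $(( us * kbit / 8000 ))
}

burst_bytes 500000 1000   # 500mbit for 1ms
burst_bytes 100000 1000   # 100mbit for 1ms
```

For 500mbit and 1ms this yields 62500 bytes, and for 100mbit and 1ms it yields 12500 bytes.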
* Re: [Cake] cake and hfsc rate limiters outperforming htb on one-armed router
2018-12-31 0:10 ` Sebastian Moeller
@ 2018-12-31 8:53 ` Pete Heist
0 siblings, 0 replies; 6+ messages in thread
From: Pete Heist @ 2018-12-31 8:53 UTC
To: Sebastian Moeller; +Cc: Cake List
More specifically, for 500mbit I can use a calculated burst/cburst of 62500 bytes (1000 * 500000 / 8000, i.e. 1ms at 500mbit). Here’s the change:
default: 320mbit up / 268mbit down, 3ms latency, 8.8ms tcp rtt
burst/cburst 62500: 200mbit up / 480mbit down, 40ms latency, 40ms tcp rtt
Aggregate throughput goes from 588mbit to 680mbit, but latency skyrockets.
A burst of only 2xMTU doesn’t change much, and 4xMTU already jumps to 30ms latency and only 630mbit aggregate bandwidth.
So for this one-armed router setup on this hardware, I don’t see a worthwhile tradeoff of latency for aggregate throughput. Perhaps in other situations, it could be useful. :)
> On Dec 31, 2018, at 1:10 AM, Sebastian Moeller <moeller0@gmx.de> wrote:
>
> Well the idea would be to scale the buffer to cover, say Xms at the configured bandwidth, so HTB could deal with CPU stalls up to X-Yms (with Y << X)... We just switched sqm-scripts to automatically scale the buffering to 1ms....
> Would be interested to learn whether that would increase HTB's utilisation?
>
>
> On December 30, 2018 11:36:27 PM GMT+01:00, Pete Heist <pete@heistp.net> wrote:
> The experiments I did with those didn’t yield great results, with changing a value by one MTU sometimes causing sudden throughput or inter-flow latency increases, with the tradeoffs not being clear. I’m afraid admins could easily cause problems fiddling with these. Fortunately most customer facing routers have aggregate bitrates that an APU1 can handle even with default htb, or cake. I also appreciate that such settings don’t exist in cake… :)
>
> On Dec 30, 2018, at 10:51 PM, Sebastian Moeller <moeller0@gmx.de> wrote:
>
> Hi Pete,
>
> you might want to have a look at htb's burst and cburst parameters, as these should let you trade latency under load for bandwidth utilization.
>
>
> Best Regards
> Sebastian
>
> On Dec 30, 2018, at 21:42, Pete Heist <pete@heistp.net> wrote:
>
> It’s a bit more complicated than this. It looks like the htb rate limiter is different in that as rates increase the actual rate starts to deviate from the specified rate early on, but it rather gracefully handles the “out of CPU” situation, where it still maintains control of the queue, just gradually fails to meet the rate specified by greater and greater percentages.
>
> Instead of a single flow test with iperf3, here are rates that each limiter can reach on egress of both apu1a interfaces during an rrul_be test:
>
> # - max limit on APU for one-armed routing, rrul_be test 4+4 flows (firewall on):
> # - cake: 210mbit
> # - htb+fq_codel: 93%@100mbit, 90%@200mbit, 84%@300mbit, 72%@400mbit, 59%@500mbit
> # - hfsc+fq_codel: 310mbit
> # - hfsc+cake: 300mbit
>
> The numbers for cake and hfsc are right before loss of queue, and with htb the queue isn’t lost even at 500mbit, for example, just the actual rate is only 59% of what was specified.
>
> I really need to graph the specified rate vs the actual rate, inter-flow and intra-flow latency, stepped 25mbit at a time. I think it would be interesting, so this is on my todo list if there’s time after the ISP config gets done.
>
> On Dec 28, 2018, at 1:17 AM, Pete Heist <pete@heistp.net> wrote:
>
> For whatever reason, I’m seeing the rate limiters in cake and hfsc vastly outperform htb in the one-armed router configuration I described in my previous thread. To simplify things, I apply the qdiscs with a single class only at egress of eth0 on apu1a:
>
> apu2a <— default VLAN —> apu1a <— VLAN 3300 —> apu2b
>
> I use iperf3 from apu2a to apu2b and find the rate at which things break down. Whereas cake and hfsc can both reach around 850mbit, htb is breaking down at around 200mbit, which seems rather strange. This could be a function of the older kernel I have to use, the hardware, or maybe htb just isn’t suited well to this task for some reason. I wish I knew, as I’d rather be using htb for this task than hfsc (especially given the lockup issue with cake)...
>
> ——
>
> #!/bin/bash
>
> # point where iperf3 throughput drops below ~93% of theoretical:
> # htb: 200mbit
> # hfsc: 850mbit
> # cake: 850mbit
>
> IFACE=eth0
> RATE=850mbit
>
> start_htb() {
> stop
> tc qdisc add dev $IFACE root handle 1: htb default 1
> tc class add dev $IFACE parent 1: classid 1:1 htb rate $RATE ceil $RATE
> tc qdisc add dev $IFACE parent 1:1 handle 10: fq_codel
> }
>
> start_hfsc() {
> stop
> tc qdisc add dev $IFACE root handle 1: hfsc default 1
> tc class add dev $IFACE parent 1: classid 1:1 hfsc sc rate $RATE ul rate $RATE
> tc qdisc add dev $IFACE parent 1:1 handle 10: fq_codel
> }
>
> start_cake() {
> stop
> tc qdisc add dev $IFACE root cake bandwidth $RATE
> }
>
> stop() {
> tc qdisc del dev $IFACE root &>/dev/null
> tc qdisc del dev $IFACE ingress &>/dev/null
> }
>
> "$@"
> ——
>
> root@apu1a:~/rate_limiters# uname -a
> Linux apu1a 3.16.7-ckt9-voyage #1 SMP Thu Apr 23 11:10:44 HKT 2015 i686 GNU/Linux
>
> root@apu1a:~/rate_limiters# cat /proc/cpuinfo
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 20
> model : 2
> model name : AMD G-T40E Processor
> stepping : 0
> microcode : 0x5000101
> cpu MHz : 800.000
> cache size : 512 KB
> physical id : 0
> siblings : 2
> core id : 0
> cpu cores : 2
> apicid : 0
> initial apicid : 0
> fdiv_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 6
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc nonstop_tsc extd_apicid aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt arat hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
> bogomips : 1999.83
> clflush size : 64
> cache_alignment : 64
> address sizes : 36 bits physical, 48 bits virtual
> power management: ts ttp tm stc 100mhzsteps hwpstate
>
> processor : 1
> vendor_id : AuthenticAMD
> cpu family : 20
> model : 2
> model name : AMD G-T40E Processor
> stepping : 0
> microcode : 0x5000101
> cpu MHz : 800.000
> cache size : 512 KB
> physical id : 0
> siblings : 2
> core id : 1
> cpu cores : 2
> apicid : 1
> initial apicid : 1
> fdiv_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 6
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc nonstop_tsc extd_apicid aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt arat hw_pstate npt lbrv svm_lock nrip_save pausefilter vmmcall
> bogomips : 1999.83
> clflush size : 64
> cache_alignment : 64
> address sizes : 36 bits physical, 48 bits virtual
> power management: ts ttp tm stc 100mhzsteps hwpstate
>
> root@apu1a:~/rate_limiters# ethtool -i eth0
> driver: r8169
> version: 2.3LK-NAPI
> firmware-version: rtl_nic/rtl8168e-2.fw
> bus-info: 0000:01:00.0
> supports-statistics: yes
> supports-test: no
> supports-eeprom-access: no
> supports-register-dump: yes
> supports-priv-flags: no
>
end of thread, other threads:[~2018-12-31 8:54 UTC | newest]
Thread overview: 6+ messages
2018-12-28 0:17 [Cake] cake and hfsc rate limiters outperforming htb on one-armed router Pete Heist
2018-12-30 20:42 ` Pete Heist
2018-12-30 21:51 ` Sebastian Moeller
2018-12-30 22:36 ` Pete Heist
2018-12-31 0:10 ` Sebastian Moeller
2018-12-31 8:53 ` Pete Heist