[Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN?

General list for discussing Bufferbloat

* [Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN?
@ 2017-03-16 22:14 xnor
       [not found] ` <57A046A6-C0F2-4209-93E5-CA728F4C64EE@gmx.de>
  2017-03-17 21:15 ` Eric Dumazet
  0 siblings, 2 replies; 9+ messages in thread
From: xnor @ 2017-03-16 22:14 UTC
  To: bloat

Hello,

I have a tl-wdr3600 router with openwrt (linux 3.18, config at [1]) and 
want to shape ingress traffic.

The WAN port, interface eth0.2, is connected to a cable modem.
I use the following script:

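# Attach an ingress qdisc to the WAN interface and bring up the ifb device
tc qdisc add dev eth0.2 handle ffff: ingress
ifconfig ifb0 up txqueuelen 1000
# HTB on ifb0: one 18000 kbit class that all traffic defaults into
tc qdisc add dev ifb0 root        handle 1: htb default 10
tc class add dev ifb0 parent 1: classid 1:1 htb rate 18000kbit
tc class add dev ifb0 parent 1:1 classid 1:10 htb rate 18000kbit \
     ceil 18000kbit
# Redirect every packet arriving on eth0.2 to ifb0 so it passes the shaper
tc filter add dev eth0.2 parent ffff: protocol all prio 1 u32 \
     match u32 0 0 action mirred egress redirect dev ifb0
# fq_codel as the leaf qdisc of class 1:10, with ECN marking disabled
tc qdisc add dev ifb0 parent 1:10 handle 100: fq_codel limit 1000 \
     quantum 1514 noecn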

Now the problem is that it is very inaccurate, and this inaccuracy 
increases with number of TCP download connections.
With HTB (it's worse with HFSC) and 20 connections (rates calculated 
with /sys/class/net/eth0.2/statistics/rx_bytes over 5 seconds each):
19012.610 kbit
19266.877 kbit
19303.923 kbit
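A minimal sketch of that measurement (sampling rx_bytes twice, 5 seconds apart), assuming a POSIX shell with awk on the router; `rate_kbit` and `measure_rx` are hypothetical helper names. It uses 1 kbit = 1000 bit, which is how tc interprets `kbit` rates.

```shell
# Rate in kbit/s (1 kbit = 1000 bit) between two byte-counter samples.
# Usage: rate_kbit BYTES_BEFORE BYTES_AFTER SECONDS
rate_kbit() {
    awk -v a="$1" -v b="$2" -v s="$3" \
        'BEGIN { printf "%.3f\n", (b - a) * 8 / s / 1000 }'
}

# Sample /sys/class/net/IFACE/statistics/rx_bytes twice, SECONDS apart.
# Usage: measure_rx IFACE [SECONDS]
measure_rx() {
    iface=$1
    secs=${2:-5}
    f=/sys/class/net/$iface/statistics/rx_bytes
    before=$(cat "$f")
    sleep "$secs"
    after=$(cat "$f")
    rate_kbit "$before" "$after" "$secs"
}
```

On the router, `measure_rx eth0.2 5` would produce one reading of the kind listed above.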

With a single connection and HTB, I only get about 17.2 Mbit. Curiously, 
/sys/class/net/ifb0/statistics show about 18.1 +/- 0.1 Mbit

So I did a tcpdump of eth0.2 and ifb0:
eth0.2's largest frames are 1514 bytes ethernet type IP.
ifb0's are 1518 bytes (including the VLAN tag I guess) with some unknown 
type.
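For a sense of scale, a quick calculation (a sketch, assuming the 4-byte difference between the 1514- and 1518-byte frames is the 802.1Q VLAN tag) compares the tag's relative overhead on a full-size frame with how far the highest 20-connection reading exceeds the configured 18000 kbit; `pct_over` is a hypothetical helper.

```shell
# Percentage by which A exceeds B: (A / B - 1) * 100, two decimals.
pct_over() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a / b - 1) * 100 }'
}

pct_over 1518 1514        # 4-byte VLAN tag on a full-size frame: 0.26
pct_over 19303.923 18000  # highest reading vs. configured rate:   7.24
```

On these numbers the tag by itself accounts for roughly a quarter of a percent, so it seems unlikely to explain an excess of about 7 %.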

What do I need to change to make shaping with ifb accurate? Ideally with 
both HTB and HFSC.

Additional question: What's the effect of the system not having HR 
timers?

[1] https://github.com/openwrt/openwrt/tree/chaos_calmer/target/linux/ar71xx

* Re: [Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN?
       [not found] ` <57A046A6-C0F2-4209-93E5-CA728F4C64EE@gmx.de>
@ 2017-03-17 19:15   ` xnor
  2017-03-17 19:22     ` Jonathan Morton
  2017-03-17 20:11     ` Sebastian Moeller
  0 siblings, 2 replies; 9+ messages in thread
From: xnor @ 2017-03-17 19:15 UTC
  To: bloat

Hey,

please reply to the list address so that everyone can see it.


>  Just a guess, but I believe /sys/class/net/eth0.2/statistics/rx_bytes 
>shows the data before HTB/HFSC had a chance to touch the packets.
Of course. So? HTB/HFSC don't shrink packets, and since I'm only doing 
TCP and also use fq_codel, it should shape ingress quite nicely to my 
configured rate.

>  it would be interesting to repeat the tests but run tcpdump on the 
>interface that delivers the traffic to the test machine on the internal 
>LAN. My prediction is that on that port you will pretty much see the 
>18000Kbps you expect.
The LAN is connected through eth0.1 which is part of a bridge interface 
br-lan (this bridge is the only other interface with an IP address 
besides eth0.2).

With 160 download connections I've just measured (also including the 
average bytes per packet, bpp for short):

eth0.2: 20.3 Mbps download (~1400 bpp)
eth0: 21.6 Mbps download (~800 bpp), 19 Mbps upload (~780 bpp)
eth0.1: 18 Mbps upload (~1490 bpp)
br-lan: 18 Mbps upload (~1490 bpp)

(all numbers approx. accurate to about +/- 0.1 Mbps)
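The bpp figures can be derived from the same sysfs counters by dividing the byte delta by the packet delta. A sketch, assuming the `*_bytes` and `*_packets` counters are sampled together; `bpp` and `measure_bpp` are hypothetical helper names.

```shell
# Average bytes per packet between two samples of the byte and packet counters.
# Usage: bpp BYTES_BEFORE BYTES_AFTER PKTS_BEFORE PKTS_AFTER
bpp() {
    awk -v b0="$1" -v b1="$2" -v p0="$3" -v p1="$4" 'BEGIN {
        if (p1 == p0) { print "n/a"; exit }   # no packets in the interval
        printf "%.0f\n", (b1 - b0) / (p1 - p0)
    }'
}

# Usage: measure_bpp IFACE rx|tx [SECONDS]
measure_bpp() {
    d=/sys/class/net/$1/statistics
    b0=$(cat "$d/${2}_bytes"); p0=$(cat "$d/${2}_packets")
    sleep "${3:-5}"
    b1=$(cat "$d/${2}_bytes"); p1=$(cat "$d/${2}_packets")
    bpp "$b0" "$b1" "$p0" "$p1"
}
```

For example, `measure_bpp eth0 rx 5` averages over everything the CPU port receives, which mixes both traffic directions.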

This is completely saturating the 20 Mbps link and ruins performance.

I tried decreasing the rate to make it work in the above scenario: I 
had to back off to about 14 Mbps (!), because only then, as you can 
guess, does the measured eth0.2 bandwidth drop below the 20 Mbps link 
speed.

With fewer connections the measured eth0.2 bandwidth is closer to the 
configured 18 Mbps, and so it works fine.


>  This seems old, if your router is supported you might want to try lede 
>(https://lede-project.org/releases/17.01/start#), then you could also 
>use the cake qdisc which has a few new tricks up its sleeve, nicer than 
>HTB and HFSC...
I am planning to upgrade, but I highly doubt it'll help and it also 
doesn't help me clear up the confusion with what is going on here.


It's definitely shaping something. The question is: what?


* Re: [Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN?
  2017-03-17 19:15   ` xnor
@ 2017-03-17 19:22     ` Jonathan Morton
  2017-03-17 20:11     ` Sebastian Moeller
  1 sibling, 0 replies; 9+ messages in thread
From: Jonathan Morton @ 2017-03-17 19:22 UTC
  To: xnor; +Cc: bloat


> On 17 Mar, 2017, at 21:15, xnor <xnoreq@gmail.com> wrote:
> 
>> This seems old, if your router is supported you might want to try lede (https://lede-project.org/releases/17.01/start#), then you could also use the cake qdisc which has a few new tricks up its sleeve, nicer than HTB and HFSC…

> I am planning to upgrade, but I highly doubt it'll help and it also doesn't help me clear up the confusion with what is going on here.
> 
> It's definitely shaping something. The question is: what?

If nothing else, it’ll be much easier to diagnose the problem using Cake because it provides lots of data.  The built-in shaper really is much better than either HTB or HFSC, and there’s much less to potentially go wrong in configuring it.

(Disclaimer: I wrote it!)

 - Jonathan Morton


* Re: [Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN?
  2017-03-17 19:15   ` xnor
  2017-03-17 19:22     ` Jonathan Morton
@ 2017-03-17 20:11     ` Sebastian Moeller
  1 sibling, 0 replies; 9+ messages in thread
From: Sebastian Moeller @ 2017-03-17 20:11 UTC
  To: xnor; +Cc: bloat

Hi xnor,

> On Mar 17, 2017, at 20:15, xnor <xnoreq@gmail.com> wrote:
> 
> Hey,
> 
> please reply to the list address so that everyone can see it.

	Sure, can do; it's not a secret (at least not intended as one). Then again, if it had been, you would have effectively spoiled it...

> 
> 
>> Just a guess, but I believe /sys/class/net/eth0.2/statistics/rx_bytes shows the data before HTB/HFSC had a chance to touch the packets.
> Of course. So? HTB/HFSC don't shrink packets, and since I'm only doing TCP and also use fq_codel, it should shape ingress quite nicely to my configured rate.

	Erm, TCP tends to probe the link capacity by repeatedly increasing its sending rate; given enough flows, some will always be pushing against the limit. If you looked at the rate at which packets emerge from the shaper instead of the rate at which they are fed into it, you would IMHO have a stronger argument that the shaper does not shape correctly… If the rate after the shaper is as configured but the rate before it is too high, it could mean that some of your senders do not respond nicely to the slow-down signal, or that the slow-down signal is unintelligible to the senders (e.g. the shaper assumes ECN while the senders do set ECN bits but do not respond properly to ECN signaling).


> 
>> it would be interesting to repeat the tests but run tcpdump on the interface that delivers the traffic to the test machine on the internal LAN. My prediction is that on that port you will pretty much see the 18000Kbps you expect.
> The LAN is connected through eth0.1 which is part of a bridge interface br-lan (this bridge is the only other interface with an IP address besides eth0.2).
> 
> With 160 download connections I've just measured (also including the average bytes per packet, bpp for short):
> 
> eth0.2: 20.3 Mbps download (~1400 bpp)
> eth0: 21.6 Mbps download (~800 bpp), 19 Mbps upload (~780 bpp)
> eth0.1: 18 Mbps upload (~1490 bpp)
> br-lan: 18 Mbps upload (~1490 bpp)
> 
> (all numbers approx. accurate to about +/- 0.1 Mbps)

	I am confused about the reported directions here, but mostly by the average packet sizes on eth0: why are these 100-100*800/1490 = 46.3 % smaller than the internal packet sizes? Does this indicate massive fragmentation on your WAN link?


> 
> This is completely saturating the 20 Mbps link and ruins performance.

	It seems your router has only one port from the SoC to the switch, and the WAN port also lives on the switch, so all packets first need to go from the WAN port to the CPU via eth0 and then out again via eth0 to the LAN port on the switch; is that a correct interpretation? In that case you could also instantiate the internet download shaper as an egress/upload shaper on eth0.1, avoiding the ifb dance...
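A minimal sketch of that alternative, assuming WAN-to-LAN traffic really does leave the router through eth0.1. It reuses the HTB + fq_codel parameters from the original script plus the `quantum 8000` that Eric Dumazet suggests in his reply in this thread. The function name and the `TC` dry-run variable are hypothetical conveniences, not OpenWrt features. Note that a root qdisc on eth0.1 also shapes any traffic the router itself sends to the LAN.

```shell
# Hypothetical helper: shape downloads as they leave towards the LAN
# (egress on eth0.1) instead of redirecting WAN ingress through ifb0.
# Set TC=echo to print the commands instead of running them.
shape_lan_egress() {
    dev=${1:-eth0.1}
    rate=${2:-18000kbit}
    tc_cmd=${TC:-tc}
    $tc_cmd qdisc add dev "$dev" root handle 1: htb default 10
    $tc_cmd class add dev "$dev" parent 1: classid 1:1 htb \
        rate "$rate" quantum 8000
    $tc_cmd class add dev "$dev" parent 1:1 classid 1:10 htb \
        rate "$rate" ceil "$rate" quantum 8000
    $tc_cmd qdisc add dev "$dev" parent 1:10 handle 100: fq_codel \
        limit 1000 quantum 1514 noecn
}
```

Running `TC=echo; shape_lan_egress eth0.1 18000kbit` prints the four tc commands for review before applying them as root.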

> 
> I tried decreasing the rate to make it work in the above scenario: I had to back off to about 14 Mbps (!), because only then, as you can guess, does the measured eth0.2 bandwidth drop below the 20 Mbps link speed.

Well, the true link speed should be a hard ceiling...

> 
> With fewer connections the measured eth0.2 bandwidth is closer to the configured 18 Mbps, and so it works fine.

	Again, incoming data is first handled and accounted for by eth0/eth0.2 before the IFB sees it, so I very much expect to see more than the shaper rate at this point. If the shaper emits the correct rate into your internal network and also drops packets, it seems unavoidable that more than the configured rate's worth of data arrives at the shaper. But I am not an expert and hence might be wrong.

> 
> 
>> This seems old, if your router is supported you might want to try lede (https://lede-project.org/releases/17.01/start#), then you could also use the cake qdisc which has a few new tricks up its sleeve, nicer than HTB and HFSC...
> I am planning to upgrade, but I highly doubt it'll help and it also doesn't help me clear up the confusion with what is going on here.

	Well, you would jump from kernel 3.18(?) to a more recent 4.4-series kernel, which might/should have some bugs fixed. But I understand the desire to understand the current situation better before moving on.

> 
> 
> It's definitely shaping something. The question is: what?

	Packets?

Best Regards

> 
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat


* Re: [Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN?
  2017-03-16 22:14 [Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN? xnor
       [not found] ` <57A046A6-C0F2-4209-93E5-CA728F4C64EE@gmx.de>
@ 2017-03-17 21:15 ` Eric Dumazet
  2017-03-18 15:34   ` xnor
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2017-03-17 21:15 UTC
  To: xnor; +Cc: bloat

On Thu, 2017-03-16 at 22:14 +0000, xnor wrote:
> Hello,
> 
> I have a tl-wdr3600 router with openwrt (linux 3.18, config at [1]) and 
> want to shape ingress traffic.
> 
> The WAN port, interface eth0.2, is connected to a cable modem.
> I use the following script:
> 
> tc qdisc add dev eth0.2 handle ffff: ingress
> ifconfig ifb0 up txqueuelen 1000
> tc qdisc add dev ifb0 root        handle 1: htb default 10
> tc class add dev ifb0 parent 1: classid 1:1 htb rate 18000kbit
> tc class add dev ifb0 parent 1:1 classid 1:10 htb rate 18000kbit \
>      ceil 18000kbit
> tc filter add dev eth0.2 parent ffff: protocol all prio 1 u32 \
>      match u32 0 0 action mirred egress redirect dev ifb0
> tc qdisc add dev ifb0 parent 1:10 handle 100: fq_codel limit 1000 \
>      quantum 1514 noecn
> 
> Now the problem is that it is very inaccurate, and this inaccuracy 
> increases with number of TCP download connections.
> With HTB (it's worse with HFSC) and 20 connections (rates calculated 
> with /sys/class/net/eth0.2/statistics/rx_bytes over 5 seconds each):
> 19012.610 kbit
> 19266.877 kbit
> 19303.923 kbit

Sure, but you shape on ifb0, so the rate you observe on the ingress
interface is before any rate limit...

This could be 77.77 Mbit, and the tc scripts have no say about that.


> 
> With a single connection and HTB, I only get about 17.2 Mbit. Curiously, 
> /sys/class/net/ifb0/statistics show about 18.1 +/- 0.1 Mbit
> 
> So I did a tcpdump of eth0.2 and ifb0:
> eth0.2's largest frames are 1514 bytes ethernet type IP.
> ifb0's are 1518 bytes (including the VLAN tag I guess) with some unknown 
> type.
> 
> 
> What do I need to change to make shaping with ifb accurate? Ideally with 
> both HTB and HFSC.
> 

Have you checked your syslog ?

It seems you would need 'quantum 8000' or something like that.

tc class add dev ifb0 parent 1: classid 1:1 htb rate 18000kbit \
     quantum 8000
tc class add dev ifb0 parent 1:1 classid 1:10 htb rate 18000kbit \
     ceil 18000kbit quantum 8000
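Why the explicit quantum helps here, as a sketch: assuming HTB's default r2q of 10, the kernel derives a class quantum as the rate in bytes per second divided by r2q. When no quantum is given and the result exceeds 200000 bytes, it logs "quantum of class ... is big. Consider r2q change." and clamps the value. Below 1000 bytes it warns that the quantum is small and raises it to 1000. This is presumably the syslog message Eric is pointing at. `htb_default_quantum` is a hypothetical helper.

```shell
# Default HTB quantum in bytes for a rate in kbit/s, assuming r2q = 10:
# quantum = (rate_kbit * 1000 / 8) / r2q. Flags the values the kernel
# warns about and adjusts when no explicit quantum is configured.
htb_default_quantum() {
    rate_kbit=$1
    r2q=${2:-10}
    q=$(( rate_kbit * 1000 / 8 / r2q ))
    if [ "$q" -gt 200000 ]; then
        echo "$q (big: clamped to 200000)"
    elif [ "$q" -lt 1000 ]; then
        echo "$q (small: raised to 1000)"
    else
        echo "$q"
    fi
}

htb_default_quantum 18000   # the 18000kbit classes in the original script
```

With an explicit `quantum 8000` on the class, as in the commands above, the kernel uses that value and skips the derivation.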




* Re: [Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN?
  2017-03-17 21:15 ` Eric Dumazet
@ 2017-03-18 15:34   ` xnor
  2017-03-18 15:46     ` Jonathan Morton
  0 siblings, 1 reply; 9+ messages in thread
From: xnor @ 2017-03-18 15:34 UTC
  To: bloat; +Cc: chromatix99

Hey Jonathan,

I have updated to Lede 17.01.0, installed sch_cake, replaced my 
htb/hfsc+fq_codels with cakes (with bandwidth, docsis, internet 
options), but the problem persists.

I've seen that it has an autorate_ingress option, which doesn't improve 
my situation. But it raises a question:
Why not operate on ingress rates, or at least add an ingress-operation 
option/mode?

If I understood it correctly, and speaking in very simplified terms, 
right now a queue is filled at link speed and drained at the configured 
bandwidth. As the pressure in the queue rises, packets will be 
dropped/marked to reduce the pressure to some tolerable level.

Why not measure ingress rate (averaged over a window related to the 
configured rtt) and increase pressure if the ingress rate surpasses the 
configured bandwidth?

Anyway, I can provide information of cake now. Just let me know what you 
need (tc -d -s qdisc/class/filter ... I guess).

* Re: [Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN?
  2017-03-18 15:34   ` xnor
@ 2017-03-18 15:46     ` Jonathan Morton
  2017-03-24 18:18       ` xnor
  0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Morton @ 2017-03-18 15:46 UTC
  To: xnor; +Cc: bloat

> On 18 Mar, 2017, at 17:34, xnor <xnoreq@gmail.com> wrote:
> 
> If I understood it correctly, and speaking in very simplified terms, right now a queue is filled at link speed and drained at the configured bandwidth. As the pressure in the queue rises, packets will be dropped/marked to reduce the pressure to some tolerable level.
> 
> Why not measure ingress rate (averaged over a window related to the configured rtt) and increase pressure if the ingress rate surpasses the configured bandwidth?

If the ingress rate is higher than the shaped rate, the queue will fill, and this will automatically trigger higher AQM activity.  There is no need to actually measure ingress rate to achieve this, and that’s not what autorate_ingress is meant for.

What you should be looking for is the number of dropped and marked packets.

Marked packets (which appear due to AQM activity on ECN-capable traffic) continue through to the receiver and inform it directly of congestion, which is then communicated to the sender, which in turn is supposed to reduce its congestion window to suit.

Dropped packets represent the *difference* between the ingress rate and the rate actually reaching the receiver, which is informed about congestion by detecting these packets’ *absence*.  It must then tell the sender to re-send those packets, resulting in *more* ingress traffic for a given goodput.
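To read those numbers off `tc -s qdisc`, here is a sketch that assumes the usual "Sent N bytes N pkt (dropped N, overlimits N requeues N)" statistics line. For each qdisc it reports what share of the packets offered to it were dropped, treating "offered" as sent plus dropped and ignoring anything still queued. `drop_share` is a hypothetical helper. ECN marks are reported on separate, qdisc-specific lines and are not parsed here.

```shell
# Reads `tc -s qdisc show dev IFACE` output on stdin and prints, for every
# "Sent" line, the sent packets, dropped packets and drop share in percent.
drop_share() {
    awk '$1 == "Sent" {
        sent = $4 + 0       # "Sent <bytes> bytes <pkts> pkt ..."
        drop = $7 + 0       # "(dropped <n>, ..." -> numeric prefix of "<n>,"
        total = sent + drop
        pct = (total > 0) ? drop * 100 / total : 0
        printf "sent=%d dropped=%d drop%%=%.2f\n", sent, drop, pct
    }'
}

# Usage on the router: tc -s qdisc show dev ifb0 | drop_share
```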

A useful experiment would be to reduce your shaped rate until you see an effect on goodput (measured at the receiver) and ingress rate.  For example, configure for 5Mbps down and 1Mbps up, and see what you actually get.

On some hardware we have seen a perplexing doubling of the configured shaped rate.  This is not a bug in the shaper, but may be due to bad timer hardware which runs faster than realtime.  In this case you might see *no* marking and dropping when configured for the full rate intended.

 - Jonathan Morton

* Re: [Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN?
  2017-03-18 15:46     ` Jonathan Morton
@ 2017-03-24 18:18       ` xnor
  2017-03-25  0:10         ` Jonathan Morton
  0 siblings, 1 reply; 9+ messages in thread
From: xnor @ 2017-03-24 18:18 UTC
  To: Jonathan Morton; +Cc: bloat

Hey Jonathan,

you haven't responded to the cake stats I'd sent you, but I just wanted 
to say that I seem to have solved my problem in an unconventional way.

I've grabbed the code and changed (misused) the autorate_ingress portion 
to do my own rate adjustments based on the measured ingress rate. So if 
the ingress rate starts to exceed the configured rate, I lower the rate 
a bit, and if the sender backs off and the ingress rate drops, the rate 
is slowly raised again, and so on, with reasonable limits and some 
smoothing.
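A rough userspace approximation of that control loop, for illustration only. The actual change lives inside sch_cake.c and is not shown in the thread. The step sizes (2 % down, 0.5 % up), the lower limit of half the target and the 1/8 smoothing weight are hypothetical choices, as are the helper names.

```shell
# Exponentially weighted moving average with weight 1/8 (integer kbit/s).
# Usage: ewma PREVIOUS SAMPLE
ewma() {
    echo $(( ($1 * 7 + $2) / 8 ))
}

# One step of the hypothetical rate adjustment.
# Usage: adjust_rate CURRENT_KBIT TARGET_KBIT SMOOTHED_INGRESS_KBIT
adjust_rate() {
    cur=$1 target=$2 ingress=$3
    floor=$(( target / 2 ))             # never go below half the target
    if [ "$ingress" -gt "$target" ]; then
        new=$(( cur - cur / 50 ))       # ingress too high: back off by 2 %
    else
        new=$(( cur + cur / 200 ))      # ingress fine: creep up by 0.5 %
    fi
    if [ "$new" -lt "$floor" ]; then new=$floor; fi
    if [ "$new" -gt "$target" ]; then new=$target; fi
    echo "$new"
}
```

In a loop, one might feed `adjust_rate` with smoothed rx_bytes-based samples from eth0.2 and apply the result with something like `tc qdisc change dev ifb0 root cake bandwidth <rate>kbit`, assuming the installed cake accepts `change`.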

Btw, sch_cake.c doesn't seem to ever set last_reconfig_time.

* Re: [Bloat] Inaccurate rates with HTB/HFSC+fq_codel on router due to VLAN?
  2017-03-24 18:18       ` xnor
@ 2017-03-25  0:10         ` Jonathan Morton
  0 siblings, 0 replies; 9+ messages in thread
From: Jonathan Morton @ 2017-03-25  0:10 UTC
  To: xnor; +Cc: bloat

> On 24 Mar, 2017, at 20:18, xnor <xnoreq@gmail.com> wrote:
> 
> you haven't responded to the cake stats I'd sent you

Ah, sorry about that.  I see them in the thread, but I must have missed them at the time.

You do indeed have a significant number of packet drops in each case, with the more significant numbers corresponding to more intensive use cases.  This is reflected in the differences you quote between “in” and “out” bandwidth used; the “out” rates are all consistent with the configured rate.  Importantly, you also *don't* have any ECN-marked packets.

Unfortunately, this behaviour is a known limitation of shapers operating downstream of the actual bottleneck; they cannot control the queue as thoroughly as they could if placed upstream of it.  The problem is sharply exacerbated when ECN is not used, since the only signalling mechanism is then to drop packets, but every dropped packet has already consumed time on the bottleneck link and is then not accounted for by the shaper.

You may get significantly better results from turning on ECN negotiation on your hosts.  Most servers already accept ECN negotiation, but won’t initiate it.  (Do we have an ECN tutorial on bufferbloat.net already?)
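On Linux hosts, whether TCP initiates ECN is governed by the `net.ipv4.tcp_ecn` sysctl. A value of 0 disables ECN, 1 requests it on outgoing connections and accepts it on incoming ones, and 2 (the usual default) only accepts it when the peer asks. Also note that the fq_codel line in the original HTB/HFSC script passes `noecn`, which disables ECN marking in that qdisc. A small sketch to decode the current value; `describe_tcp_ecn` is a hypothetical helper.

```shell
# Decode a net.ipv4.tcp_ecn value.
describe_tcp_ecn() {
    case $1 in
        0) echo "disabled" ;;
        1) echo "request on outgoing, accept on incoming" ;;
        2) echo "accept on incoming only" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Usage on a Linux host: describe_tcp_ecn "$(cat /proc/sys/net/ipv4/tcp_ecn)"
# To also initiate ECN (as root):  sysctl -w net.ipv4.tcp_ecn=1
```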

Your solution is an interesting one, but I have a better idea which I could reasonably implement in Cake: account for dropped packets, as well as delivered packets, when an “ingress mode” flag is set.  This avoids having an inner control loop which, I’ve found, is always less accurate and takes longer to respond.
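To make the proposed difference concrete, here is a toy sketch. It is not Cake's code, and the input format and helper name are hypothetical. It shows how many bytes a shaper would charge against its rate budget with and without also counting dropped packets.

```shell
# Toy model only: sum the bytes a shaper charges against its rate budget.
# Input lines on stdin look like "SIZE delivered" or "SIZE dropped".
# Usage: charged_bytes INGRESS_MODE   (1 = also charge dropped packets)
charged_bytes() {
    awk -v ingress="${1:-0}" '
        $2 == "delivered"               { total += $1 }
        $2 == "dropped" && ingress == 1 { total += $1 }
        END { print total + 0 }'
}

printf '1514 delivered\n1514 dropped\n66 delivered\n' | charged_bytes 0   # 1580
printf '1514 delivered\n1514 dropped\n66 delivered\n' | charged_bytes 1   # 3094
```

With a fixed byte budget per unit of time, charging the dropped bytes leaves correspondingly less room for delivered traffic. The shaper therefore slows its delivered rate as drops rise, which appears to be the intent of the proposed flag.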

I’d be grateful if you could open an issue on Github regarding this “ingress mode” flag, as a reminder.

 - Jonathan Morton
