CoDel AQM discussions
* [Codel] cake3 vs sqm+fq_codel at 115/12 mbit (basically comcast's blast service)
@ 2015-04-02 18:05 Dave Taht
  2015-04-02 19:03 ` [Codel] [Cerowrt-devel] " Jonathan Morton
  0 siblings, 1 reply; 2+ messages in thread
From: Dave Taht @ 2015-04-02 18:05 UTC (permalink / raw)
  To: cerowrt-devel, codel

[-- Attachment #1: Type: text/plain, Size: 2517 bytes --]

this is with a special build of openwrt (not CeroWrt) on the tplink
archer c7v2. It rips out the unaligned access hacks, and is compiled
for the mips74k processor in that box.

Even with hostapd
running like crazy for no good reason, we do fq/aqm/ecn perfectly with cake3
at the 115/12 Mbit rate now common from comcast, with about 5% cpu
left over, whereas the sqm+fq_codel version runs out of cpu and falls
apart, as you will see in the attached graphs....

For the longest time we were aiming for a piece of affordable hardware
that could do 300 Mbit download shaping, with no luck. On the low end
(this box is 89 dollars on newegg), maybe this is enough to get
restarted with, while we wait for other stuff to stabilize.

The 115 Mbit service from comcast exhibits about 230ms of latency
under load on downloads without this shaping in place, and 5-25ms with
it. :) As for the uplink, well, I have data for it somewhere, but it
isn't pretty... and it is also totally fixed by cake3 here.

(I still have to benchmark ipv6, I want to share some joy, however
briefly, first.)

That test build is at:

http://snapon.lab.bufferbloat.net/~cero3/archerc7v2/ar71xx/

DO NOT install this on any hardware that is not mips74k (e.g. don't try
the wndr3800). Do feel free to try anything in the above list that is
mips74k.

I would like to try an octeon build with cake3, to see if 115 Mbit can
be achieved there too, but I think more performance analysis and
optimization is needed first.

Anyway, cake3 outputs a ton more statistics:

root@OpenWrt:/# tc -s qdisc show dev eth1
qdisc cake3 8005: root refcnt 2 bandwidth 12Mbit diffserv4 flows
 Sent 437523173 bytes 1386559 pkt (dropped 4317, overlimits 1852389 requeues 0)
 backlog 0b 0p requeues 0
           Class 0     Class 1     Class 2     Class 3
  rate        12Mbit   11250Kbit       9Mbit       3Mbit
  target       5.0ms       5.0ms       5.0ms       6.1ms
interval     105.0ms     105.0ms     105.0ms     106.1ms
Pk delay       5.5ms       301us       295us       196us
Av delay       1.2ms        16us        32us        10us
Sp delay         2us         1us         2us         2us
  pkts        215134     1048937       10377      116428
way inds           4           0           0           0
way miss        4466         143           6          13
way cols           0           0           0           0
  bytes    160066252   257893148     1903096    24102496
  drops         4310           3           0           4
  marks         8037       52634           0       11451

[-- Attachment #2: fq_codel_sqm_archer.png --]
[-- Type: image/png, Size: 96956 bytes --]

[-- Attachment #3: sqm_cake3_archer.png --]
[-- Type: image/png, Size: 85009 bytes --]


* Re: [Codel] [Cerowrt-devel] cake3 vs sqm+fq_codel at 115/12 mbit (basically comcast's blast service)
  2015-04-02 18:05 [Codel] cake3 vs sqm+fq_codel at 115/12 mbit (basically comcast's blast service) Dave Taht
@ 2015-04-02 19:03 ` Jonathan Morton
  0 siblings, 0 replies; 2+ messages in thread
From: Jonathan Morton @ 2015-04-02 19:03 UTC (permalink / raw)
  To: Dave Taht; +Cc: codel, cerowrt-devel

[-- Attachment #1: Type: text/plain, Size: 3200 bytes --]

Awesome.

Oddly enough, cake3 actually gets slightly less throughput than
htb+fq_codel on the Pentium-MMX. However, that's with the simplest
possible htb configuration (since I'm typing it in manually), and with
no firewall rules or NAT in play (just a bridge between two Ethernet
ports).

A couple of notes on the statistics that are now reported:

The rate for each class is now a threshold rather than a limit. The class
is permitted to use more than that bandwidth (up to the global limit), but
will yield to lower priority classes in that condition. This is consistent
with both user expectations and standard PHB specs, and means that traffic
benefits from high priority markings only if it's appropriately sparse.
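A minimal sketch of that yield-when-over-threshold behaviour, assuming four classes ordered from lowest priority (0) to highest (3); this is illustrative logic under my own assumptions, not cake3's actual dequeue code:

```c
#include <stdbool.h>

#define NCLASSES 4

struct class_state {
    unsigned long long threshold_bps; /* configured class rate   */
    unsigned long long current_bps;   /* measured recent rate    */
    bool backlogged;                  /* packets waiting to send */
};

/* Pick the next class to dequeue from, highest priority (3) first.
 * A backlogged class under its rate threshold is served before any
 * lower-priority class; a class over its threshold yields, but may
 * still borrow spare bandwidth (up to the global limit) when no
 * under-threshold class is backlogged. */
int pick_class(const struct class_state cls[NCLASSES])
{
    /* First pass: highest-priority backlogged class under threshold. */
    for (int i = NCLASSES - 1; i >= 0; i--)
        if (cls[i].backlogged && cls[i].current_bps < cls[i].threshold_bps)
            return i;
    /* Second pass: all backlogged classes are over threshold, so any
     * of them may borrow up to the global limit. */
    for (int i = NCLASSES - 1; i >= 0; i--)
        if (cls[i].backlogged)
            return i;
    return -1; /* nothing queued */
}
```

So a high-priority class that exceeds its threshold is skipped in favour of lower-priority traffic, yet still uses the link when it is the only thing queued.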

On that note, I expect the filtering uses of each class to be roughly
as follows:

0 - background bulk traffic, CS1 marked, e.g. BitTorrent. Use as many
parallel connections as you like, without worrying about crowding out
ordinary traffic.

1 - best effort, the great majority of ordinary traffic - web pages,
software updates, whatever. If in doubt, leave it here (default CS0 lands
here).

2 - elevated priority, bandwidth-sensitive traffic, such as streaming
video or a VLAN.

3 - low volume, latency sensitive traffic such as VoIP, online games, NTP,
etc. EF traffic lands here.
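The expected classification above could be sketched as a DSCP lookup. The code points (CS1, EF, AF4x, etc.) are the standard DiffServ values; the exact table cake3 uses may well differ, so treat this mapping as an assumption following the descriptions above:

```c
#include <stdint.h>

/* The four diffserv4 classes, as described above. */
enum cake_class { CLS_BULK = 0, CLS_BEST_EFFORT = 1,
                  CLS_VIDEO = 2, CLS_VOICE = 3 };

/* Map a DSCP code point (top 6 bits of the IP TOS / traffic-class
 * byte) to a class.  Illustrative sketch, not cake3's real table. */
enum cake_class classify_dscp(uint8_t dscp)
{
    switch (dscp) {
    case 0x08:                        /* CS1: background, e.g. BitTorrent */
        return CLS_BULK;
    case 0x2e:                        /* EF: VoIP, games, NTP */
    case 0x30: case 0x38:             /* CS6, CS7: network control */
        return CLS_VOICE;
    case 0x22: case 0x24: case 0x26:  /* AF41-AF43: streaming video */
    case 0x20:                        /* CS4 */
        return CLS_VIDEO;
    default:                          /* CS0 and anything unrecognized */
        return CLS_BEST_EFFORT;
    }
}
```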

A minor frustration for me here - firewall rules on ingress are processed
only after the traffic has already passed through ifb. This means I can't
custom mark my inbound traffic.

Three delay statistics are now reported, all of which are based on EWMAs of
packet sojourn times at dequeue. Pk is biased heavily to high delays (so
should usually report on fat flows), Sp to low delays (so should capture
sparse flows), and Av keeps a true average. The concept of a biased EWMA is
borrowed from ReplayGain and the whole "loudness war" problem that it aims
to solve; some broadcast studios (including the BBC) use audio meters which
work this way.
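The asymmetric weighting behind Pk/Av/Sp can be sketched in a few lines. The weights here are illustrative placeholders, not cake3's actual constants:

```c
/* EWMA with asymmetric weights: the "peak" variant adopts samples
 * above the current average quickly but decays slowly, so it tracks
 * fat-flow delay; the "sparse" variant does the reverse; equal
 * weights give a plain average. */
struct biased_ewma {
    double avg;
    double up_w;    /* weight applied when sample > avg  */
    double down_w;  /* weight applied when sample <= avg */
};

static void ewma_add(struct biased_ewma *e, double sample)
{
    double w = (sample > e->avg) ? e->up_w : e->down_w;
    e->avg += w * (sample - e->avg);
}
```

With `up_w` large and `down_w` small you get a Pk-style meter; swap them for Sp; set them equal for Av.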

The new set-associative hash function also generates extra statistics. The
same 1024 queues are now divided into 128 sets of 8 "ways", and a tag on
each queue tracks which flow is presently using it. This allows hash
collisions to be resolved in most cases, with limited worst case overhead,
greatly improving flow isolation under severely stressed conditions. (It's
difficult to provoke this on a home network, but offices may well
appreciate this feature.)

The "way miss" counter is incremented whenever an empty queue's tag is
changed to assign it to a new flow, signalling a departure from the fast
path for that packet. Expect to see a small percentage of these with normal
traffic.

The "way indirect hit" counter tracks the situations where a hash collision
would have occurred with a plain hash function, but was resolved by the set
associativity. This is also a departure from the fast path.

The "way collision" counter indicates when even set associative hashing is
insufficient - there are more than 8 distinct flows attempting to occupy
queues in the same set. In such a case, the search for an empty queue is
terminated and the packet is placed in the queue matching the plain hash.
NB: so far this code path is completely untested to my knowledge!
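Putting the three counters together, here is a hypothetical sketch of the set-associative lookup over 128 sets of 8 ways, with a per-queue flow tag. The structure follows the description above, but the exact bookkeeping in cake3's real code may differ:

```c
#include <stdint.h>

#define SETS 128
#define WAYS 8             /* 128 sets * 8 ways = 1024 queues */
#define EMPTY_TAG 0        /* tag 0 is reserved for "queue unused" */

struct sa_stats { unsigned way_inds, way_miss, way_cols; };

static uint32_t tags[SETS * WAYS]; /* flow tag owning each queue */

/* Resolve a flow's hash to a queue index.  'hash' selects a set of 8
 * queues; 'tag' (assumed nonzero per flow) disolves collisions within
 * the set.  A miss claims an empty queue, an indirect hit finds the
 * flow in a way other than the plain-hash slot, and a collision
 * (set full of other flows) falls back to the plain hash. */
static unsigned sa_lookup(uint32_t hash, uint32_t tag, struct sa_stats *st)
{
    unsigned base  = (hash % SETS) * WAYS;
    unsigned plain = base + (hash / SETS) % WAYS; /* plain-hash slot */

    /* Fast path: some way in the set already belongs to this flow. */
    for (unsigned w = 0; w < WAYS; w++)
        if (tags[base + w] == tag) {
            if (base + w != plain)
                st->way_inds++;       /* resolved a would-be collision */
            return base + w;
        }
    /* Miss: claim the first empty way for this flow. */
    for (unsigned w = 0; w < WAYS; w++)
        if (tags[base + w] == EMPTY_TAG) {
            tags[base + w] = tag;
            st->way_miss++;
            return base + w;
        }
    /* All 8 ways hold other flows: give up and share the plain slot. */
    st->way_cols++;
    return plain;
}
```

The worst case is bounded at two 8-entry scans per packet, which is the "limited worst case overhead" mentioned above.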

- Jonathan Morton

[-- Attachment #2: Type: text/html, Size: 3554 bytes --]

