[Cerowrt-devel] ping loss "considered harmful"

Thu Mar 5 15:53:30 EST 2015

I had spoken to someone at NZNOG who promised to combine mrtg +
smokeping or cacti + smokeping so as to be able to get long-term
latency and bandwidth numbers on one graph. cc added.

On Thu, Mar 5, 2015 at 12:38 PM, Matt Taggart <matt at lackof.org> wrote:
> Dave Taht writes:
>
>> wow. It never registered to me that users might make a value judgement
>> based on the amount of ping *loss*, rather than latency, and in looking
>> back in time, I can think of multiple people that have said things
>> based on their perception that losing pings was bad, and that
>> sqm-scripts was "worse than something else because of it."
>
> This thread makes me realize that my standard method of measuring latency
> over time might have issues. I use smokeping
>
>   http://oss.oetiker.ch/smokeping/

In sqm-scripts's case, possibly all you have been collecting is
largely worst-case behavior, which I don't mind collecting as it tends
to be pretty good. :)

However, I have been unclear. In the main sqm code (the modern one; I
don't know what version you have), IF you enable dscp squashing on
inbound (the default), you end up with a single fq_codel queue, not 3,
with no classification or ping prioritization. (It is the default
because of all the re-marking I have seen from Comcast.)

So if you are, as I am, monitoring your boxes from the outside, there
is no classification and prioritization present for ping.

Do a "tc -s qdisc show dev ifbwhatever" (the interface name varies by
platform) to see how many queues you have. Here is an example of a
single-queue inbound rate limiter + fq_codel (yea! packet drop AND ecn
working great!):

root@lorna-gw:~# tc -s qdisc show dev ifb4ge00
qdisc htb 1: root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
 Sent 168443514948 bytes 334370551 pkt (dropped 0, overlimits 143273498 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 110: parent 1:10 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
 Sent 168443514948 bytes 334370551 pkt (dropped 17480, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 1514 drop_overlimit 0 new_flow_count 125872421 ecn_mark 1044
  new_flows_len 0 old_flows_len 1

root@lorna-gw:~# uptime
 12:45:35 up 54 days, 22:33,  load average: 0.05, 0.05, 0.04
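
If you would rather not count queues and drops by eye, something like the
following could do it. This is only a rough sketch: the function name
summarize_qdiscs and the SAMPLE string are my own for illustration (SAMPLE
is the output above with the mail line wrapping undone), and the regular
expressions assume the output layout shown here, which may differ between
iproute2 versions.

```python
import re

# Output of "tc -s qdisc show dev ifb4ge00" as posted above, unwrapped.
SAMPLE = """\
qdisc htb 1: root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
 Sent 168443514948 bytes 334370551 pkt (dropped 0, overlimits 143273498 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 110: parent 1:10 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
 Sent 168443514948 bytes 334370551 pkt (dropped 17480, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 1514 drop_overlimit 0 new_flow_count 125872421 ecn_mark 1044
  new_flows_len 0 old_flows_len 1
"""


def summarize_qdiscs(text):
    """Return one dict per qdisc: kind, handle, packets, drops, ECN marks."""
    qdiscs = []
    for line in text.splitlines():
        # A line starting with "qdisc" opens a new qdisc record.
        head = re.match(r"qdisc (\S+) (\S+)", line)
        if head:
            qdiscs.append({"kind": head.group(1), "handle": head.group(2),
                           "pkts": 0, "dropped": 0, "ecn_mark": 0})
            continue
        if not qdiscs:
            continue  # ignore anything before the first qdisc header
        current = qdiscs[-1]
        sent = re.search(r"Sent \d+ bytes (\d+) pkt \(dropped (\d+)", line)
        if sent:
            current["pkts"] = int(sent.group(1))
            current["dropped"] = int(sent.group(2))
        ecn = re.search(r"ecn_mark (\d+)", line)
        if ecn:
            current["ecn_mark"] = int(ecn.group(1))
    return qdiscs


if __name__ == "__main__":
    found = summarize_qdiscs(SAMPLE)
    fq = [q for q in found if q["kind"] == "fq_codel"]
    print("fq_codel queues:", len(fq))
    for q in fq:
        rate = q["dropped"] / q["pkts"] if q["pkts"] else 0.0
        print(q["handle"], "dropped", q["dropped"], "ecn_mark", q["ecn_mark"],
              "drop fraction %.6f" % rate)
```

To use it on a live box you would feed it the real command output instead of
SAMPLE, for example via subprocess.run(["tc", "-s", "qdisc", "show", "dev",
"ifb4ge00"], capture_output=True, text=True).stdout, with your own interface
name substituted.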

DSCP classification, in general, is only useful from within your own
network, going outward.
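
To make "from within your own network" concrete, here is a minimal sketch
of an application marking its own outbound packets. The helper names
dscp_to_tos and open_marked_udp_socket are made up for this example, the
two codepoints are just the standard CS1 and EF values, and whether any
given shaper or ISP honors, ignores, or re-marks them is not something this
code can tell you. It assumes a platform where Python exposes
socket.IP_TOS (Linux does).

```python
import socket

# Standard DSCP codepoints, used here only as examples.
DSCP_CS1 = 8    # commonly treated as background / lower priority
DSCP_EF = 46    # expedited forwarding


def dscp_to_tos(dscp):
    """Shift a 6-bit DSCP value into the upper bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2  # the low two bits of the byte are the ECN field


def open_marked_udp_socket(dscp):
    """Open an IPv4 UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    sock = open_marked_udp_socket(DSCP_CS1)
    print("TOS byte requested:", dscp_to_tos(DSCP_CS1))
    sock.close()
```

The mark only matters on the way out of your network. If inbound squashing
is on, as described above, whatever marks arrive from the outside all land
in the same fq_codel queue.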

> which is a really nice way of measuring and visualizing packet loss and
> variations in latency. I am using the default probe type which uses fping
> (ICMP http://www.fping.org/ ).

I LOVE smokeping and wish very much we had a way to combine it with
mrtg data to see latency AND bandwidth at the same time.

>
> It has been working well, I set it up for a site in advance of setting up
> SQM and then afterwards I can see the changes and determine if more tuning
> is needed.  But if ICMP is having its priority adjusted (up or down), then
> the results might not reflect the latency of other services.
>
> Fortunately the nice thing is that many other probe types exist
>
>   http://oss.oetiker.ch/smokeping/probe/index.en.html
>
> So which probe types would be good to use for bufferbloat measurement? I
> guess the answer is "whatever is important to you", but I also suspect
> there is a set of things that ISPs are known to mess with.
> HTTP? But also maybe HTTPS in case they are doing some sort of transparent
> proxy?
> DNS?
> SIP?
> I suppose you could even do explicit checks for things like Netflix (but
> then it's easy to go off on a tangent of building a net neutrality
> observatory).
>
> On a somewhat related note, I was once using smokeping to measure a fiber
> link to a bandwidth provider and had it configured to ping the router IP on
> the other side of the link. In talking to one of their engineers, I learned
> that they deprioritize ICMP when talking _with_ their routers, so my
> measurements weren't valid. (I don't know if they deprioritize ICMP traffic
> going _through_ their routers)

I do strongly recommend deprioritizing ping slightly, and as I noted, I
have seen many a borken script that actually prioritized it, which is
foolish, at best.

I keep hoping multiple (many!) someones here will go have lunch with
their company's oft lonely, oft starving sysadmin(s), to ask them what
they are doing as to firewalling, QoS and traffic shaping. Most of the
ones I have talked to are quite eager to show off their work, which is
unfortunately often of wildly varying quality and complexity.

I find that an offer of sake and sushi is most conducive to getting
that conversation started.

I certainly would like to see more default corporate
firewall/QoS/shaping rules than I have personally, for various
platforms. Someone's got to have some good ideas in them... and it
would be nice to know how far the bad ones have propagated.

> --
> Matt Taggart
> matt at lackof.org
>
>

-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb