Some time recently I read a casual article (on Medium, I think) that made the point that deep diving into outliers and understanding them has led to a half dozen Nobel prizes, because they lead to discoveries of phenomena that nobody else had even noticed. See for instance the Holmdel Horn:
https://en.wikipedia.org/wiki/Holmdel_Horn_Antenna

To keep sane, I tend to keep outliers and clip them as late as possible, e.g. by choice of graph axis. This way I have the opportunity to notice otherwise hidden patterns.

In mlab data we sometimes see outliers that suggest "out of bounds" data rates, e.g. a repeated test that clearly has a max rate of 50 Mb/s or so, and then every so often a single test at 200 Mb/s or higher. My assumption is that these come from software-managed shapers that occasionally fail to load their configurations properly. (I admit to not having looked hard enough to prove this hypothesis.)

Thanks,
--MM--
The best way to predict the future is to create it.  - Alan Kay

We must not tolerate intolerance; however our response must be carefully measured: too strong would be hypocritical and risks spiraling out of control; too weak risks being mistaken for tacit approval.

On Thu, Jun 17, 2021 at 7:25 AM Nick Buraglio wrote:

> This is much more common in the high performance computing and networking
> space (e.g. perfSONAR, TWAMP, and OWAMP). I have also been pushing "gather
> and store all the data" since I was an engineer working on the
> TeraGrid (which is where I first saw Matt's MTU talk around 2002 or 03,
> BTW). =)
> High fidelity plots of everything that can be gathered are laborious to
> curate but are invaluable for so many reasons. Now we just need a way to
> make it happen everywhere for everyone in a way that's easy.
>
> nb
>
>
> On Thu, Jun 17, 2021 at 8:57 AM Dave Taht wrote:
>
>> Capturing and plotting *all* the data is often revealing.
>>
>> Sometimes plotting the data you are discarding (for what seem like sane
>> reasons) is quite revealing. Saw this on Slashdot this morning; it's
>> good...
>>
>>
>> https://www.newyorker.com/magazine/2021/06/21/when-graphs-are-a-matter-of-life-and-death
>>
>> In the bufferbloat effort I've fought time and time again for folk to
>> stop throwing out data above the 95th percentile, and at the very least to
>> plot everything they threw out to find patterns...
>>
>> dslreports' graphing tools, for example, throw out a ton of "outliers"...
>> and the only reason there is no data past 4 sec here is that the test
>> doesn't run long enough.
>>
>> http://www.dslreports.com/speedtest/results/bufferbloat?up=1
>>
>> (I've been trying to get ahold of someone over there to buy their raw data
>> for years now. They have the biggest collection: 8 years' worth.)
>>
>> mlabs has a similar data reduction issue that they haven't got around to
>> fixing.
>>
>> And more recently we encountered a smoothing problem in Wireshark that
>> made a halt in packet processing look more like a normal TCP cwnd cut...
>>
>> _______________________________________________
>> Starlink mailing list
>> Starlink@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/starlink
>>
>
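A minimal Python sketch of "plot everything you threw out": rather than silently dropping samples above the 95th percentile, split them off and keep them for inspection. The helper names split_at_percentile and flag_out_of_bounds are hypothetical, the nearest-rank percentile definition and the 2x-median threshold are assumptions (not a confirmed way to detect misconfigured shapers), and the rates are made-up values shaped like the 50 Mb/s case described above, not real M-Lab data.

```python
import math
import statistics


def split_at_percentile(samples, pct=95.0):
    """Split samples into (kept, discarded) at the pct-th percentile.

    Instead of silently dropping the tail, return it so it can be plotted
    or inspected. Uses the nearest-rank percentile definition (an
    assumption; tools differ in how they interpolate percentiles).
    """
    if not samples:
        return [], []
    ordered = sorted(samples)
    # Nearest rank: the smallest value with at least pct% of samples <= it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    cutoff = ordered[rank - 1]
    kept = [s for s in samples if s <= cutoff]
    discarded = [s for s in samples if s > cutoff]
    return kept, discarded


def flag_out_of_bounds(rates, factor=2.0):
    """Return rates above factor x the median (hypothetical threshold)."""
    if not rates:
        return []
    limit = factor * statistics.median(rates)
    return [r for r in rates if r > limit]


# Made-up illustrative rates in Mb/s: a link shaped to ~50 Mb/s
# with one test that apparently escaped the shaper.
rates = [49.8, 50.1, 49.9, 50.0, 49.7, 50.2, 49.9, 50.0, 50.1, 49.8,
         50.0, 49.9, 50.1, 50.0, 49.8, 50.2, 49.9, 50.0, 50.1, 210.0]

if __name__ == "__main__":
    kept, discarded = split_at_percentile(rates)
    print(f"kept {len(kept)}, discarded {len(discarded)}: {discarded}")
    # prints "kept 19, discarded 1: [210.0]"
    print("suspicious:", flag_out_of_bounds(rates))
    # prints "suspicious: [210.0]"
```

In this made-up sample the 95th-percentile cut throws away exactly the one test the shaper hypothesis is about; keeping the discarded list (and plotting it alongside the rest) is what keeps it visible.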