<font face="times new roman" size="3"><p style="margin:0;padding:0;">All the points below make sense. Ideally you want to measure the interaction between TCP and FQ_Codel in the "real world". Throughput benchmarks are irrelevant: they are the equivalent of amateur Hot Rod dragstrip competitions among cars that cannot even turn corners.</p>

<p style="margin:0;padding:0;"> </p>

<p style="margin:0;padding:0;">Beyond such measurement being hard, there is no "agreed upon" standard for testing "real world" performance - which is why academics who care little about anything other than publishing go for the "Hot Rod" stuff.</p>

<p style="margin:0;padding:0;"> </p>

<p style="margin:0;padding:0;">In your LWN posting, I think it is worth pointing out that "wrongheaded benchmarks" were exactly what drove the folks who created the bufferbloat problem in the first place. And those people are still alive and kicking (in the wrong direction). But that's how you get tenure.</p>

<p style="margin:0;padding:0;"> </p>

<p style="margin:0;padding:0;">The other issue is "KISS".   I would *seriously* suggest that the idea of "classification" not get too entangled with the problem at this point.</p>

<p style="margin:0;padding:0;"> </p>

<p style="margin:0;padding:0;">Classification has many downsides, most of which will just confuse the inventors, adding what is probably an unnecessarily complex space of design alternatives.  If you must discuss classification (which is another academic wet dream), discuss it as "future research".</p>

<p style="margin:0;padding:0;"> </p>

<p style="margin:0;padding:0;">Two classes (latency critical, and latency as short as possible) should be enough in a network that, for "control loop" reasons, wants to have minimal control latencies *all of the time*. I'm not sure that even two is the desired state - I tend to think one class is better on an end-to-end basis.</p>

<p style="margin:0;padding:0;"> </p>

<p style="margin:0;padding:0;">If you want to stabilize things with faster control loops, just order all queues by "packet entry" timestamps, and move ECN-style marking towards "head-marking" - that is, signaling congestion in each packet as it is transmitted whenever other packets are queued behind it.</p>

<p style="margin:0;padding:0;"> </p>

<p style="margin:0;padding:0;">That creates the most responsive control loops possible on an end-to-end basis for TCP and other congestion-managing protocols.</p>
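
<p style="margin:0;padding:0;"> </p>

<p style="margin:0;padding:0;">A minimal Python sketch of the head-marking idea, under stated assumptions: the names HeadMarkingQueue, Packet, and ce_marked are hypothetical, there is a single queue ordered by entry timestamp, and packets that are not ECN-capable are simply left unmarked (a real queue would need some drop policy for them). It is an illustration of the principle, not anyone's actual implementation.</p>

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Packet:
    entry_time: float   # timestamp taken when the packet entered the queue
    seq: int            # tie-breaker so equal timestamps keep arrival order
    payload: bytes = field(compare=False, default=b"")
    ecn_capable: bool = field(compare=False, default=True)
    ce_marked: bool = field(compare=False, default=False)


class HeadMarkingQueue:
    """Hypothetical single queue ordered by packet entry timestamp."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def enqueue(self, payload, now, ecn_capable=True):
        # Order strictly by entry time; no classification of any kind.
        pkt = Packet(now, next(self._seq), payload, ecn_capable)
        heapq.heappush(self._heap, pkt)

    def dequeue(self):
        if not self._heap:
            return None
        pkt = heapq.heappop(self._heap)
        # Head-marking: signal congestion on the packet being transmitted
        # if any packets are still queued behind it.
        if self._heap and pkt.ecn_capable:
            pkt.ce_marked = True
        return pkt
```

<p style="margin:0;padding:0;">Because the mark rides on the packet leaving the queue rather than on one just arriving at the tail, the sender hears about congestion without first waiting for the marked packet to drain through the backlog.</p>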

<p style="margin:0;padding:0;"> </p>

<p style="margin:0;padding:0;">-----Original Message-----<br />From: "Dave Taht" <dave.taht@gmail.com><br />Sent: Saturday, November 24, 2012 11:19am<br />To: "Toke Høiland-Jørgensen" <toke@toke.dk><br />Cc: "Paolo Valente" <paolo.valente@unimore.it>, "Eric Raymond" <esr@thyrsus.com>, codel@lists.bufferbloat.net, cerowrt-devel@lists.bufferbloat.net, "bloat" <bloat@lists.bufferbloat.net>, paulmck@linux.vnet.ibm.com, "David Woodhouse" <dwmw2@infradead.org>, "John Crispin" <blogic@openwrt.org><br />Subject: Re: [Cerowrt-devel] FQ_Codel lwn draft article review<br /><br /></p>

<div id="SafeStyles1353774248">

<p style="margin:0;padding:0;">On Sat, Nov 24, 2012 at 1:07 AM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:<br />> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:<br />><br />>> I am using these two in a new "Effectiveness of FQ-CoDel" section.<br />>> Chrome can display .svg, and if it becomes a problem, I am sure that<br />>> they can be converted.  Please let me know if some other data would<br />>> make the point better.<br /><br />My primary focus has been on making the kind of internet that over a<br />billion people have, the kind with <10Mbit uplinks, function better. While<br />it's nice to show an improvement on 100Mbit, gigE and higher, I'd<br />rather talk to the 10Mbit and below cases whenever possible.<br /><br />><br />> If you are just trying to show the "ideal" effectiveness of fq_codel,<br />> two attached graphs are from some old tests we did at the UDS showing a<br />> simple ethernet link between two laptops with a single stream going in<br />> each direction. This is of course by no means a real-world test, but on<br />> the other hand they show a very visible factor ~4 improvement in<br />> latency.<br />><br />> These are the same graphs Dave used in his slides, but also in a 100mbit<br />> version.<br /><br />As noted above, 10Mbit is better to show. Secondly, in looking over<br />the 10Mbit graph, I realized that we could also keep injecting new<br />TCP flows every 5 seconds, for shorter periods, to observe<br />what happens.<br /><br />And more importantly, I'd like to avoid falling into the trap that so<br />much network research falls into, which is blithely benchmarking lots<br />of long-duration TCP traffic,<br />rather than the kinds of network traffic we actually see in the real<br />world. 
A real world web page might have a hundred or more DNS lookups<br />and a hundred TCP streams, the vast majority of which are so short as<br />to not get out of slow start.<br /><br />Now - seeing/measuring/graphing that - is *hard* - which is why it is<br />so rarely done. The fact that it's hard, yet accurately measures the real<br />world, says it should be done.<br /><br />However, I can see leveraging the clean 10Mbit trace or a (better)<br />asymmetric 24/5.5 case, and, while pounding it with the existing,<br />simple code for 1 full rate up, 1 full rate down, and a CIR stream for<br />voice, impacting that plot with the Chrome web page benchmark or<br />something similar.<br /><br />Indirectly observing the web load effects on that graph, while timing<br />web page completion, would be good when comparing pfifo_fast and<br />various AQM variants.<br /><br /><br />>> Also, I know what ICMP is, but the UDP variants are new to me.  Could<br />>> you please expand the "EF", "BK", "BE", and "CSS" acronyms?<br />><br />> The UDP ping times are simply roundtrips/second (as measured by netperf)<br />> converted to ping times. The acronyms are diffserv markings, i.e.<br />> EF=expedited forwarding, BK=bulk (CS1 marking), BE=best effort (no<br />> marking).<br /><br />The classification tests are in there for a number of reasons.<br /><br />0) I needed multiple streams in the test anyway.<br /><br />1) Many people keep insisting that classification can work. It<br />doesn't. It never has. Not over the wild and wooly internet. It only<br />rarely does any good at all even on internal networks. It sometimes<br />works on some kinds of UDP streams, but that's it. 
The bulk of the<br />problem is the massive packet streams modern offloads generate, and<br />breaking those up, everywhere possible, any time possible.<br /><br />I had put up a graph last week that showed each classification bucket<br />for a TCP stream being totally ignored...<br /><br />2) Theoretically wireless 802.11e SHOULD respect classification. In<br />fact, it does, on the ath9k, to a large extent. However, on the iwl I<br />have, BE and BK traffic get completely starved by VO and VI traffic,<br />which is something of a bug. I'm certain that due to inadequate<br />testing, 802.11e classification is largely broken in the field, and<br />I'd hoped this test would bring that out to more people.<br /><br />3) I don't mind an effort to make classification work, particularly<br />for traffic clearly marked background, such as bittorrent often is.<br />Perhaps this is an opportunity to get IPv6 done up right, as it seems<br />the diffserv bits are much more rarely fiddled with in transit.<br /><br />> The UDP ping tests tend to not work so well on a loaded link,<br />> however, since netperf stops sending packets after detecting<br />> (excessive(?)) loss. Which is why you only see the UDP ping times on<br />> the first part of the graph.<br /><br />Netperf stops UDP_STREAM exchanges after the first lost UDP packet.<br />This is not helpful.<br /><br />I keep noting that the next phase of the rrul development is to find a<br />good pair of CIR one way measurements that look a bit like voip.<br />Either that test can get added to netperf or we use another tool, or<br />we create one, and I keep hoping for recommendations from various<br />people on this list. Come on, something like this<br />exists? 
Anybody?<br /><br />Another reason for a UDP based voip-like ping test is that ICMP is<br />frequently handled differently than other sorts of streams.<br /><br />A TCP based ping test used to be in there (and should go back) as it<br />shows the impact of packet loss on TCP behavior. (That said, the<br />TCP_RR test is roughly equivalent.)<br /><br />After staring at the tons of data collected over the past year, on<br />wifi, I'm willing to strongly suggest we just drop TCP packets after<br />500ms in the wifi stack, period, as that exceeds the round trip<br />timeout...<br /><br /><br />> The markings are also used on the TCP flows, as seen in the legend for<br />> the up/downloads.<br />><br />>> All sessions were started at T+5, then?<br />><br />> The pings start right away, the transfers start at T+5 seconds. Looks<br />> like the first ~five seconds of transfer is being cut off on those<br />> graphs.<br /><br />Ramping up to 10K packets is silly at gigE, and looks like an outlier.<br /><br />> I think what happens is that one of the streams (the turquoise<br />> one) starts up faster than the other ones, consuming all the bandwidth<br />> for the first couple of seconds until they adjust to the same level.<br /><br />I'm not willing to draw this conclusion from this graph, and need<br />to (or would like someone else to) set up a test in a controlled<br />environment. The wrapper scripts<br />can dump the raw data and I can manually plot using gnuplot or a<br />spreadsheet, but it's tedious...<br /><br />> These initial values are then scaled off the graph as outlier values.<br /><br />Huge need for CDF plots and to present the outliers. In fact I'd like<br />graphs that just presented the outliers. Another way to approach it<br />would be, instead of creating static graphs, to use something like the<br />ds3.js and incorporate the ability to zoom<br />in, around, and so on, on multiple data sets. 
Or leverage mlab's tools.<br /><br />I am no better at JavaScript than Python.<br /><br />> If<br />> you zoom in on the beginning of the graph you can see the turquoise line<br />> coming down from far off the scale in one direction, while the rest come<br />> from off the bottom.<br /><br />Not willing to draw any conclusions. I am.<br /><br />>> Please see attached for update including .git directory.<br />><br />> I got a little lost in all the lists of SFQ, but other than that I found<br />> it quite readable. The diagrams of the queuing algorithms are a tad big,<br />> though, I think. :)<br /><br />I would like to take some serious time to make them better. I'm<br />graphically hopeless; however, I know what I like, and a picture does<br />tell a thousand words.<br /><br />><br />> When is the article going to be published?<br /><br />Well, Jon strongly indicated he'd take an article, and I told him that<br />once I found a theme, co-authors, and time, I'd talk to him again. We<br />seem to be making rapid progress due to Paul stepping up and your<br />graphing tools.<br /><br />So as for publication: when it's done, would be my guess! I would like<br />this to be the best presentation possible, and also address some FUD<br />spread by the recent Cisco PIE presentation.<br /><br />That said, I do feel the need for formal publication in a dead-tree<br />journal somewhere, which could talk to some of the interesting stuff<br />like beating TCP global synchronization (finally), and the RTT info,<br />and maybe also explore the few known flaws of fq_codel...<br /><br /><br /><br /><br />-- <br />Dave Täht<br /><br />Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html<br />_______________________________________________<br />Cerowrt-devel mailing list<br />Cerowrt-devel@lists.bufferbloat.net<br />https://lists.bufferbloat.net/listinfo/cerowrt-devel</p>

</div></font>