Subject: some AQM work thus far (was: Re: cerowrt-bql-3 available)
From: Dave Taht
To: cerowrt-devel@lists.bufferbloat.net, bloat-devel
Date: Mon, 19 Dec 2011 13:04:06 +0100

I've kind of switched my focus to ethernet for the nonce...

On Mon, Dec 19, 2011 at 10:50 AM, Dave Taht wrote:
> (PLEASE NOTE: you can prototype schedulers/AQMs/shapers just fine on an x86 box)

Actually, um, er, you can't. If you want to operate at a line rate - 100Mbit
or 10Mbit, as examples - and do anything with these schedulers, it helps to
have a BQL-enabled kernel. The one I'm using is up at:

http://huchra.bufferbloat.net/~d/bql/

If you want to run via htb or hfsc to create a 'soft' line rate (like 4Mbit),
BQL is not strictly required, but it does seem to help. (There's a sketch of
that setup further down.)

Absolutely required is to disable tso, gso, and ufo on the ethernet driver.
I don't know if this is a bug or not, but although the e1000e 'says' it's
turning tso off when running at a 100Mbit line rate, gso remains on - and
you still have to explicitly turn off tso and gso, in sequence.

So most of my FQ/AQM scripts have the following in them in the pre-up stage:

IFACE=eth0                         # pick an interface...
ethtool -s $IFACE advertise 0x008  # switch to 100Mbit line rate for testing
ethtool -K $IFACE tso off          # turn off TSO (I'm developing an intense hatred for TSO)
ethtool -K $IFACE gso off          # turn off GSO (on by default since feb, sadly)
ethtool -K $IFACE ufo off          # turn off UFO (very few drivers have this on)
ethtool -G $IFACE tx 64            # can't go any lower than this

If you don't turn off tso/gso/ufo, you end up sending colossal superpacket
bursts through the scheduler - up to 64k in size - which really messes with
byte-oriented AQMs, and packet-oriented ones nearly as much. I think this
explains the issues I was having with SFB, as one example, and RED as
another; they both kind of expect to have more (and different) information
in the queue about the available streams.

At 100Mbit, I still get the best results by disabling the automatic MIAD
algorithm in BQL and using a hard limit of 6000 bytes instead.
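(BQL exposes its per-queue limits in sysfs, so 'disabling' the MIAD estimator
really just means pinning the limit. Something like the following is what I
mean - a rough, untested sketch, assuming eth0 and the stock BQL sysfs layout,
with the 6000 byte figure from above:

IFACE=eth0
LIMIT=6000                          # hard byte limit, per the above
for q in /sys/class/net/$IFACE/queues/tx-*/byte_queue_limits
do
    echo $LIMIT > $q/limit_max      # clamp the upper bound...
    echo $LIMIT > $q/limit_min      # ...and the lower bound, so the
                                    # MIAD estimator has nowhere to go
done
cat /sys/class/net/$IFACE/queues/tx-*/byte_queue_limits/limit_max  # sanity check

The loop covers all tx queues, which matters on multiqueue hardware.)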
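(And by a 'soft' line rate I mean roughly this - again only a sketch, with a
placeholder 4Mbit rate and sfq as the leaf; swap in whatever scheduler/AQM
you're actually testing:

IFACE=eth0
RATE=4mbit                                  # pretend 'line' rate
tc qdisc del dev $IFACE root 2> /dev/null   # clear any old setup
tc qdisc add dev $IFACE root handle 1: htb default 1
tc class add dev $IFACE parent 1: classid 1:1 htb rate $RATE ceil $RATE
tc qdisc add dev $IFACE parent 1:1 handle 10: sfq perturb 10

hfsc is structurally the same; the qdisc under test just hangs off the
rate-limited class.)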
I've experimented with the /proc/sys/net/ipv4/tcp_low_latency parameter to no
real effect.

And you have to remember there are always TWO (just like the Sith) systems
involved at minimum. More than once I got a weird result and then realized
that I'd screwed up on the system not under test, or on the system in between.

I should probably return to SFB and see what an FQing server does to it, and
then look at what TSO/GSO does to it.

So, all of the above is seemingly required to get to where an AQM has some
effect on streams, from my x86 laptop at present.

I note that in an experiment I did recently on cerowrt, a tx ring of 16 was
actually slower than a tx ring of 4 at GigE speeds when talking to its netperf
daemon (280 vs 264 Mbit/sec). I still find that result puzzling, but it seems
repeatable. However, what really matters is forwarding performance, and I tend
to think that a larger tx ring will help in that case, but all the same...

A last note: in doing this testing, and observing the results at gigabit on
cerowrt (which can barely host the test daemon and run at 280Mbit, though it
will forward at 500+), it appears that the connection to its local gigabit
switch is so fast and so buffered that a decent portion of the buffering is
happening there. Trying to FQ in software on cerowrt at those speeds rarely
gets a chance to do anything.

Similarly, running at GigE line rate, FQ does not happen very often on the x86
box - packets get sent out in bunches, and the packet scheduler doesn't have
enough time, before dumping any given bunch to the hardware, to do anything
intelligent with it. As near as I can figure, in order to get decent fair
queuing out of a desktop or server at these speeds, how we dequeue packets
from the TCP portion of the stack needs to be rethunk, not just the txqueue
portion I'm fiddling with now.

Another thought would be to rate-limit/FQ/manage, on the home desktops/laptops
themselves, the multiple outgoing streams headed to destinations outside the
home network. I'm not sure to what extent this would help, but getting the
bandwidth ratio down from 1GigE to your actual uplink rate (say 4Mbit - a
factor of roughly 250) might give more valuable packets more opportunities to
move to the head of line on the originating machine, and compensate for the
bursty nature of the incoming streams these days... Even if the rest of your
family is all banging on the network, your outgoing bandwidth estimate is then
only off by a factor of 4, rather than 1000.

I wish I had better tools to analyze the 'fairness' of multiple streams than
the mark #1 eyeball. The closest thing I've found of late is Jain's fairness
index, and it doesn't do this.

I have some screenshots and packet captures of QFQ (with TSO/GSO/UFO off) vs
PFIFO_FAST (with GSO and TSO on) if anybody wants them. The first is pr0n that
Nagle would love; the second, more like a horror movie in comparison.
Actually, I'll stick them up on bug #305 after I get on a less (ironically)
bufferbloated network...

http://www.bufferbloat.net/issues/305

I hope to play with Eric's new adaptive RED over the holiday.

--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net