Date: Wed, 22 Jun 2011 09:17:35 -0600
From: Dave Taht
To: babel-users@lists.alioth.debian.org, bloat-devel
Subject: QoS for system critical packets on wireless

The biggest fallout of the diffserv work I have been doing was
observing that most packets fall into one of three buckets:

1) System control and "mice": less than 1% of all packets. In my
current classification, mice include a bunch of messages like ARP,
NTP, other small UDP exchanges, and most of the icmp6 portion of the
stack. Mice are desperately needed for the network to continue to
function.

2) Unresponsive streams and udp (varying, but generally a pretty
small share).

3) Responsive tcp streams (closer to 99%).

So applying some level of packet prioritization to babel appears to
make sense - AFTER getting the excessive buffers under control, more
on that below - so it can do its work.

At the moment I am arbitrarily classifying babel packets into the CS6
diffserv class, which in the end maps them into the 802.11e VI or VO
classes (VO being mildly more underused). I haven't the faintest idea
whether this actually does anything to the window in which frames are
transmitted via multicast on wireless-b, g or n. Theoretically the
802.11e VI and VO classes have their own tiny timeslots to broadcast
in, separate from BE/BK. (I am getting a spectrum analyzer shortly.)

There is no need to use diffserv specifically; prioritization via
some other classification scheme - be that outright tc queues, or
iptables marking - would serve just as well to jump the queues
internally. (A rough recipe for the marking appears further down.)

There were several (long term!!!) thoughts here:

A) Wireless devices are currently making heroic efforts (deep
buffering, exorbitant retries) to get packets through. Right now,
seeing a big delay between transmit time and reception is a better
indicator of congestion than actual packet loss is; by the time you
see actual packet loss, the network has often already collapsed
completely.

B) Theoretically, using a different 802.11e class reduces that
heroism by some unknown amount.

C) QoS - packet marking and prioritization of any sort - makes babel
control packets jump closer to the head of the internal queues of the
transmitting clients, thus speeding up routing change propagation. By
all means, shoot elephants, not mice. Once you do all this stuff,
packet loss comes closer to being a measure of actual problems in the
air, instead of deep in the stack.

D) The ECN bit could be used to signal congestion on links that are
congested but not yet losing packets.

Moving back to item C, I would like to recommend that babel users on
Linux (at least) try the following (rough commands after this list):

- Reduce txqueuelen on your ethernet and wireless devices, by a lot!
I'm using 4-16 at present.

- Reduce driver buffering, by a lot! I cut one driver from 512
packets buffered down to 3, which made voip feasible. Note that
reducing driver buffering currently costs wireless-n a lot of
throughput when only a single stream is running; in more real-world
scenarios, however, it's hardly noticeable - and it can be fixed one
day, after we get better classification and feedback mechanisms. And
by all means, if you are using a wireless-n device on a mostly-g
network, excessive buffering hurts a lot.

- For wired, we are also using ethtool to reduce tx DMA ring sizes
that are often tuned for GigE (64-256) down to what is optimal for
the real world of far less than 100Mbit, which appears to be in the
range of 4-16, where possible. The correct values still need to be
derived from further experimentation.
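A minimal sketch of the above, assuming a wireless device named wlan0
and a wired device named eth0 (names and values are illustrative, and
not every driver will accept rings this small):

  # shrink the stack's transmit queue (the default is usually 1000)
  ip link set dev wlan0 txqueuelen 8
  ip link set dev eth0 txqueuelen 16

  # inspect, then shrink, the driver's tx DMA ring where supported
  ethtool -g eth0
  ethtool -G eth0 tx 16

Some drivers don't expose -G at all, and others will silently round
the value up, so verify with ethtool -g afterwards.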
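And for the CS6 classification of babel mentioned earlier, something
along these lines - a sketch, assuming babeld is speaking on its
IANA-assigned UDP port 6696; adjust if your build uses another port:

  # mark babel's routing traffic as CS6 in the mangle table
  # (babel runs mostly over IPv6 link-local multicast, hence
  #  ip6tables; add the iptables twin if you carry v4 babel too)
  ip6tables -t mangle -A OUTPUT -p udp --dport 6696 \
            -j DSCP --set-dscp-class CS6
  iptables  -t mangle -A OUTPUT -p udp --dport 6696 \
            -j DSCP --set-dscp-class CS6

If I read the mac80211 code right, the kernel maps the DSCP
precedence bits into the WMM queues, which is how CS6 ends up in VO.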
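On item D, none of the above turns ECN on by itself. Assuming Linux
endpoints, negotiating it for tcp is a one-liner, though some queue
on the path still has to do the actual CE marking for it to mean
anything:

  # 0 = off, 1 = request and accept ECN, 2 = accept only
  sysctl -w net.ipv4.tcp_ecn=1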
Once these changes are made, QoS actually starts to have some effect
on overall network performance again. Without reducing buffer sizes
dramatically, it doesn't.

Apply QoS and packet prioritization to the system critical "mice"
packets at the very least. Rate limit them, but exclude them from
being shot down by other bandwidth control mechanisms. Few people are
applying QoS to ipv6 packets at all, and many icmp messages (in
addition to babel's udp multicast) should be prioritized...

I'm pretty sure, based on the results gathered thus far, that this
will improve the quality of most mesh networks out there, and I'd
love for more people to try these things on a wider scale and let us
know what the effects are.

I've also written elsewhere about the effect of multicast traffic on
wireless, and am trying hard to stop bridging GigE (1000Mbit) and
wireless (a, b, g, n) together wherever possible, as the huge
disparity between their multicast rates is not accounted for in any
QoS scheme available for Linux to date. Addressing large scale
multicast usage effectively is going to take a great deal of work,
and even arp can cause headaches.

-- 
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
http://the-edge.blogspot.com