Development issues regarding the cerowrt test router project
From: Dave Taht <dave.taht@gmail.com>
To: Make-Wifi-fast <make-wifi-fast@lists.bufferbloat.net>,
	 cerowrt-devel@lists.bufferbloat.net
Subject: [Cerowrt-devel] babeld patch enabling ecn
Date: Tue, 28 Aug 2018 09:45:43 -0700
Message-ID: <CAA93jw46K7weUKq2tem9MbX87Cz0gDVqefmdTwF+Ee_V64ohfQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2461 bytes --]

In cleaning up the lab and some long out-of-tree patches, I guess I
should make a push for the following patch to be more thoroughly
tested in openwrt. For review here first before running that
gauntlet....

[PATCH] Disable CS6 and enable ECN in Babeld
This one line patch disables CS6 marking and enables ECN in babeld.

ECN decouples "packet loss from congestion" from "loss due to bad connectivity".

It also moves unicast babel packets into the best effort queue.

The good:

* OpenWrt is fully fq_codeled and doesn't pay much attention to diffserv
* ECN'd Routes stay up even under extreme congestion
* ECN'd Packet loss returns to a measure of connectivity only
* Killing CS6 saves bandwidth - The 802.11n VO queue (where CS6 falls
normally)  cannot aggregate.


I would support a default qos-map for OpenWrt 802.11n devices that essentially
disables the VO queue, since aggregation works so much better post fq_codel
for wifi on ath*, and VO performs poorly in general on all devices
(in fact I'd disable VI and BK universally also) - and then keep CS6 + ECN.


The bad:

* Babel does not do anything to reduce its rate on receipt of CE
  or modify its metrics

Given a choice between losing core connectivity under congestion or not...

* a babeld instance using ECN over a fq_codel'd link will always be
  more reachable than a non-ECN'd one

You can argue that a fq_codel'd link is generically faster than one that is not.

* CS6 does help somewhat on ethernet switches, but the benefit has been
  largely immeasurable
* Lacking an effective response to ECN, a large babel network can fill its
  FQ'd queue with undroppable packets. This problem is generic, actually,
  ECN or no, as the multicast queue is infinite and not fq_codeled.
  Babel protocol packets, however, are light and form a single flow, so babel
  flooding will be unnoticeable except to itself, and if it gets truly out
  of hand the bulk dropper should kick in.
* Babeld should also schedule hellos independently of route announcements
  and manage the route announcement queue better


After 5 years of running this patch in my deployment of babel + fq_codel, IMHO
this is the best of multiple bad alternatives, and it dramatically improves
network reliability.


It is the rough equivalent of adding a minimal "control plane" for
critical packets.


-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619

[-- Attachment #2: 0001-Disable-CS6-and-enable-ECN-in-Babeld.patch --]
[-- Type: application/octet-stream, Size: 3304 bytes --]

From 4ac61f74fc0f2ecc49b5304038e3077aae8f7b04 Mon Sep 17 00:00:00 2001
From: Dave Taht <dave@taht.net>
Date: Tue, 28 Aug 2018 09:38:26 -0700
Subject: [PATCH] Disable CS6 and enable ECN in Babeld

This one line patch disables CS6 marking and enables ECN in babeld.

ECN decouples "packet loss from congestion" from "loss due to bad connectivity".

It also moves unicast babel packets into the best effort queue.

The good:

* OpenWrt is fully fq_codeled and doesn't pay much attention to diffserv
* ECN'd Routes stay up even under extreme congestion
* ECN'd Packet loss returns to a measure of connectivity only
* Killing CS6 saves bandwidth - The 802.11n VO queue (where CS6 falls normally)
  cannot aggregate.

I would support a default qos-map for OpenWrt 802.11n devices that essentially
disables the VO queue, since aggregation works so much better post fq_codel
for wifi on ath*, and VO performs poorly in general on all devices
(in fact I'd disable VI and BK universally also) - and then keep CS6 + ECN.

The bad:

* Babel does not do anything to reduce its rate on receipt of CE
  or modify its metrics

Given a choice between losing core connectivity under congestion or not...

* a babeld instance using ECN over a fq_codel'd link will always be
  more reachable than a non-ECN'd one

You can argue that a fq_codel'd link is generically faster than one that is not.

* CS6 does help somewhat on ethernet switches, but the benefit has been
  largely immeasurable
* Lacking an effective response to ECN, a large babel network can fill its
  FQ'd queue with undroppable packets. This problem is generic, actually,
  ECN or no, as the multicast queue is infinite and not fq_codeled.
  Babel protocol packets, however, are light and form a single flow, so babel
  flooding will be unnoticeable except to itself, and if it gets truly out
  of hand the bulk dropper should kick in.
* Babeld should also schedule hellos independently of route announcements
  and manage the route announcement queue better

After 5 years of running this patch in my deployment of babel + fq_codel, IMHO
this is the best of multiple bad alternatives, and it dramatically improves
network reliability.

It is the rough equivalent of adding a minimal "control plane" for critical packets.

Otherwise congested routers can (and do) "fall off the net" when they shouldn't.

Other routing protocol stacks (OSPF, BATMAN, BMX, ISIS) should look deeply
into the benefits and pitfalls of ECN.

We've long been exploring alternatives in babel (RTT metrics, unicast hellos, etc)
but none have arrived yet. There are also some statistically sound
means of randomizing and interleaving route announcements to get past congestive
drops that could be used...

... but this works "good enough", for now.

Submitted after a rapidly aborted attempt at deploying vanilla openwrt 18.06
on my production 280+ route network.
---
 net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net.c b/net.c
index 1e5890d..88bbab3 100644
--- a/net.c
+++ b/net.c
@@ -46,7 +46,7 @@ babel_socket(int port)
     int s, rc;
     int saved_errno;
     int one = 1, zero = 0;
-    const int ds = 0xc0;        /* CS6 - Network Control */
+    const int ds = 0x02;        /* ECT - Enable ECN */
 
     s = socket(PF_INET6, SOCK_DGRAM, 0);
     if(s < 0)
-- 
2.15.1 (Apple Git-101)

