* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
       [not found] <x1-oTZGm1A7eclvABnv1aK0z1Nc7iI@gwene.org>
@ 2011-02-20  1:59 ` Dave Täht
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Täht @ 2011-02-20  1:59 UTC (permalink / raw)
  To: bloat-devel

>Should we be somehow filtering out and ignoring the packets that get
>dropped, when we're calculating the average packet transmission rate?
>Presumably they're getting dropped almost instantly, so they don't
>really take up queue space and they have abnormally fast transmission
>times, which will tend to cause us to overestimate max_enqueued? They
>should be rare, though, at least. (And presumably we *should* include
>packets that get dropped because their retry timer ran out, since they
>were sitting in the queue for that long.) Possibly we should just
>ignore any packet that was handled in less than, oh, say, a few
>microseconds?

Perhaps doing continuous mean/variance against time would be simpler
than what you propose. Sorry, LISP example, with LaTeX:

unknown <post@gwene.org> writes:

> A friend showed me this about 15 years ago. I use it every time I need to
> calculate the variance of some data set. I always forget the exact details and
> have to derive it again. But, it’s easy enough to derive that it’s never a
> problem.
>
> I had to derive it again on Friday and thought, `I should make sure more people
> get this tool into their utility belts'.
>
> First, a quick refresher on what we’re talking about here. The mean \mu of a
> data set { x_1, x_2, \ldots, x_n } is defined to be \frac{1}{n} \sum_{i=1}^n
> x_i. The variance \sigma^2 is defined to be \frac{1}{n} \sum_{i=1}^n
> (x_i - \mu)^2.
>
> A naïve approach to calculating the variance then goes something like this:
>
>     (defun mean-variance (data)
>       (flet ((square (x) (* x x)))
>         (let* ((n (length data))
>                (sum (reduce #'+ data :initial-value 0))
>                (mu (/ sum n))
>                (vv (reduce #'(lambda (accum xi)
>                                (+ accum (square (- xi mu))))
>                            data :initial-value 0)))
>           (values mu (/ vv n)))))
>
> This code runs through the data list once to count the items, once to calculate
> the mean, and once to calculate the variance. It is easy to see how we could
> count the items at the same time we are summing them. It is not as obvious how
> we can calculate the sum of squared terms involving the mean until we’ve
> calculated the mean.
>
> If we expand the squared term and pull the constant \mu outside of the
> summations it ends up in, we find that:
>
>     \frac{\sum (x_i - \mu)^2}{n} = \frac{\sum x_i^2}{n}
>                                    - 2 \mu \frac{\sum x_i}{n}
>                                    + \mu^2 \frac{\sum 1}{n}
>
> When we recognize that \frac{\sum x_i}{n} = \mu and \sum_{i=1}^n 1 = n, we get:
>
>     \sigma^2 = \frac{\sum x_i^2}{n} - \mu^2
>              = \frac{\sum x_i^2}{n} - \left( \frac{\sum x_i}{n} \right)^2
>
> This leads to the following code:
>
>     (defun mean-variance (data)
>       (flet ((square (x) (* x x)))
>         (destructuring-bind (n xs x2s)
>             (reduce #'(lambda (accum xi)
>                         (list (1+ (first accum))
>                               (+ (second accum) xi)
>                               (+ (third accum) (square xi))))
>                     data :initial-value '(0 0 0))
>           (let ((mu (/ xs n)))
>             (values mu (- (/ x2s n) (square mu)))))))
>
> The code is not as simple, but you gain a great deal of flexibility. You can
> easily convert the above concept to continuously track the mean and variance as
> you iterate through an input stream. You do not have to keep data around to
> iterate through later. You can deal with things one sample at a time.
>
> The same concept extends to higher-order moments, too.
>
> Happy counting.
>
> Link

-- 
Dave Taht
http://nex-6.taht.net

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [RFC] mac80211: implement eBDP algorithm to fight bufferbloat
@ 2011-02-17  1:49 John W. Linville
  2011-02-18 21:21 ` [RFC v2] " John W. Linville
  0 siblings, 1 reply; 18+ messages in thread
From: John W. Linville @ 2011-02-17  1:49 UTC (permalink / raw)
  To: linux-wireless; +Cc: johannes, bloat-devel, Nathaniel J. Smith

This is an implementation of the eBDP algorithm as documented in
Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li,
et al.

	http://www.hamilton.ie/tianji_li/buffersizing.pdf

This implementation timestamps an skb before handing it to the
hardware driver, then computes the service time when the transmit
status is reported back from the driver.  An exponentially weighted
moving average of per-packet service times is used to restrict
queueing delays in the hope of achieving a target packet transmission
latency.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
---
This is preliminary, but it seems to put some limits on latencies
for me.  I haven't even really done much testing, so YMMV...

I'm sure this isn't ideal.  This should be combined with the ALT
algorithm to yield the A* algorithm.  There are parameters that
probably should be tunable (or at least better researched).  This
may not be ideal for 802.11n -- it may even be detrimental to it.

Still, it is an attempt at addressing bufferbloat.  Plus, it should
pertain to all mac80211-based drivers.  So, feel free to test it,
suggest different parameters, report real numbers, post patches,
etc...
:-)

 net/mac80211/ieee80211_i.h |    7 +++++++
 net/mac80211/iface.c       |    6 ++++++
 net/mac80211/status.c      |    9 +++++++++
 net/mac80211/tx.c          |   16 ++++++++++++++++
 4 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index f2ef15d..1c69b29 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -604,6 +604,13 @@ struct ieee80211_sub_if_data {
 		struct dentry *default_mgmt_key;
 	} debugfs;
 #endif
+
+	/* number of packets currently in flight */
+	atomic_t enqueued;
+
+	/* average packet service time */
+	struct ewma tserv_ns_avg;
+
 	/* must be last, dynamically sized area in this! */
 	struct ieee80211_vif vif;
 };
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 5a4e19b..ccfb659 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -857,6 +857,12 @@ static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
 	skb_queue_head_init(&sdata->skb_queue);
 	INIT_WORK(&sdata->work, ieee80211_iface_work);
 
+	/* initialize service time estimate for buffer control */
+	ewma_init(&sdata->tserv_ns_avg, 1, 8);
+	ewma_add(&sdata->tserv_ns_avg, 2 * NSEC_PER_MSEC);
+
+	atomic_set(&sdata->enqueued, 0);
+
 	switch (type) {
 	case NL80211_IFTYPE_P2P_GO:
 		type = NL80211_IFTYPE_AP;
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 010a559..9e5eed3 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -172,6 +172,7 @@ static void ieee80211_frame_acked(struct sta_info *sta, struct sk_buff *skb)
 
 void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 {
+	ktime_t tserv;
 	struct sk_buff *skb2;
 	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
 	struct ieee80211_local *local = hw_to_local(hw);
@@ -188,6 +189,9 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 	bool send_to_cooked;
 	bool acked;
 
+	/* grab timestamp info for buffer control estimates */
+	tserv = ktime_sub(ktime_get(), skb->tstamp);
+
 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++) {
 		/* the HW cannot have attempted that rate */
 		if (i >= hw->max_report_rates) {
@@ -212,6 +216,11 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 		if (memcmp(hdr->addr2, sta->sdata->vif.addr, ETH_ALEN))
 			continue;
 
+		atomic_dec(&sta->sdata->enqueued);
+
+		/* factor current tserv into service time estimate */
+		ewma_add(&sta->sdata->tserv_ns_avg, ktime_to_ns(tserv));
+
 		acked = !!(info->flags & IEEE80211_TX_STAT_ACK);
 		if (!acked && test_sta_flags(sta, WLAN_STA_PS_STA)) {
 			/*
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 17ef4f4..8ccf60b 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1298,6 +1298,8 @@ static int __ieee80211_tx(struct ieee80211_local *local,
 
 	while (skb) {
 		int q = skb_get_queue_mapping(skb);
+		int max_enqueued;
+		unsigned long tserv_ns_avg;
 		__le16 fc;
 
 		spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
@@ -1323,6 +1325,20 @@ static int __ieee80211_tx(struct ieee80211_local *local,
 
 		sdata = vif_to_sdata(info->control.vif);
 
+		/* test for queue admission qualifications */
+		tserv_ns_avg = ewma_read(&sdata->tserv_ns_avg);
+		/* constants 2 msec and offset 5 should be tunable? */
+		max_enqueued = 2 * NSEC_PER_MSEC / tserv_ns_avg + 5;
+		if (atomic_read(&sdata->enqueued) > max_enqueued) {
+			/* silently drop */
+			dev_kfree_skb(skb);
+			return IEEE80211_TX_OK;
+		}
+
+		/* timestamp queue entry and increment queue length */
+		skb->tstamp = ktime_get();
+		atomic_inc(&sdata->enqueued);
+
 		switch (sdata->vif.type) {
 		case NL80211_IFTYPE_MONITOR:
 			info->control.vif = NULL;
-- 
1.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-17  1:49 [RFC] " John W. Linville
@ 2011-02-18 21:21 ` John W. Linville
  2011-02-19  3:44   ` Nathaniel Smith
  ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: John W. Linville @ 2011-02-18 21:21 UTC (permalink / raw)
  To: linux-wireless; +Cc: johannes, bloat-devel, Nathaniel J. Smith

This is an implementation of the eBDP algorithm as documented in
Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li,
et al.

	http://www.hamilton.ie/tianji_li/buffersizing.pdf

This implementation timestamps an skb before handing it to the
hardware driver, then computes the service time when the frame is
freed by the driver.  An exponentially weighted moving average of
per-fragment service times is used to restrict queueing delays in
the hope of achieving a target fragment transmission latency.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
---
v1 -> v2:
- execute algorithm separately for each WMM queue
- change ewma scaling parameters
- calculate max queue len only when new latency data is received
- stop queues when occupancy limit is reached rather than dropping
- use skb->destructor for tracking queue occupancy

Johannes' comment about tx status reporting being unreliable (and
what he was really saying) finally sunk in.  So, this version uses
skb->destructor to track in-flight fragments.  That should handle
fragments that get silently dropped in the driver for whatever reason
without leaking queue capacity.  Correct me if I'm wrong!

This version runs separate instances of the algorithm for each WMM
queue -- I reckon that makes sense.  I still ignore HT aggregation,
as I'm pretty sure I don't understand how it really affects the
issue.

I still haven't done any objective testing, so still no numbers --
feel free to provide your own! ;-)  But for me, this version "feels"
pretty good where I meet the chair.

There are still bits that I would like to be adjustable.
But since I'm still not sure what this ultimately becomes, I'm in no
hurry to finalize a UI.  I would be happy to consider a contribution
from someone else in that regard.

No doubt that this is all still quite preliminary!  I'm pretty sure
it still eats puppies.  So, don't get it wet!  And never, ever feed
it after midnight...

 net/mac80211/ieee80211_i.h |   12 ++++++++++++
 net/mac80211/iface.c       |   12 ++++++++++++
 net/mac80211/status.c      |   23 +++++++++++++++++++++++
 net/mac80211/tx.c          |   26 ++++++++++++++++++++++++++
 4 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index f2ef15d..47e65bd 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -604,6 +604,18 @@ struct ieee80211_sub_if_data {
 		struct dentry *default_mgmt_key;
 	} debugfs;
 #endif
+
+	struct {
+		/* number of packets currently in flight */
+		atomic_t enqueued;
+
+		/* computed maximum packets in flight */
+		int max_enqueued;
+
+		/* average packet service time */
+		struct ewma tserv_ns_avg;
+	} qdata[4];
+
 	/* must be last, dynamically sized area in this! */
 	struct ieee80211_vif vif;
 };
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 5a4e19b..3e04ddb 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -839,6 +839,8 @@ static void ieee80211_iface_work(struct work_struct *work)
 static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
 				  enum nl80211_iftype type)
 {
+	int i;
+
 	/* clear type-dependent union */
 	memset(&sdata->u, 0, sizeof(sdata->u));
 
@@ -857,6 +859,16 @@ static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
 	skb_queue_head_init(&sdata->skb_queue);
 	INIT_WORK(&sdata->work, ieee80211_iface_work);
 
+	for (i = 0; i < 4; i++) {
+		/* initialize service time estimate for buffer control */
+		ewma_init(&sdata->qdata[i].tserv_ns_avg, 1, 64);
+
+		atomic_set(&sdata->qdata[i].enqueued, 0);
+
+		/* default max should be ??? */
+		sdata->qdata[i].max_enqueued = 2;
+	}
+
 	switch (type) {
 	case NL80211_IFTYPE_P2P_GO:
 		type = NL80211_IFTYPE_AP;
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 010a559..8cb6539 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -172,6 +172,7 @@ static void ieee80211_frame_acked(struct sta_info *sta, struct sk_buff *skb)
 
 void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 {
+	ktime_t tserv;
 	struct sk_buff *skb2;
 	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
 	struct ieee80211_local *local = hw_to_local(hw);
@@ -188,6 +189,9 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 	bool send_to_cooked;
 	bool acked;
 
+	/* grab timestamp info for buffer control estimates */
+	tserv = ktime_sub(ktime_get(), skb->tstamp);
+
 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++) {
 		/* the HW cannot have attempted that rate */
 		if (i >= hw->max_report_rates) {
@@ -208,10 +212,29 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 	fc = hdr->frame_control;
 
 	for_each_sta_info(local, hdr->addr1, sta, tmp) {
+		int q = skb_get_queue_mapping(skb);
+		unsigned long tserv_ns_avg;
+
 		/* skip wrong virtual interface */
 		if (memcmp(hdr->addr2, sta->sdata->vif.addr, ETH_ALEN))
 			continue;
 
+		/* factor current tserv into service time estimate */
+		ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
+			 ktime_to_ns(tserv));
+
+		/* determine queue admission qualifications */
+		tserv_ns_avg = ewma_read(&sta->sdata->qdata[q].tserv_ns_avg);
+		if (tserv_ns_avg) {
+			/* latency target and min limit should be tunable? */
+			sta->sdata->qdata[q].max_enqueued =
+				max_t(int, 2, 2 * NSEC_PER_MSEC / tserv_ns_avg);
+		}
+		if (atomic_read(&sta->sdata->qdata[q].enqueued) >=
+		    sta->sdata->qdata[q].max_enqueued &&
+		    !ieee80211_queue_stopped(&local->hw, q))
+			ieee80211_stop_queue(&local->hw, q);
+
 		acked = !!(info->flags & IEEE80211_TX_STAT_ACK);
 		if (!acked && test_sta_flags(sta, WLAN_STA_PS_STA)) {
 			/*
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 17ef4f4..cba9060 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1284,6 +1284,20 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
 	return TX_CONTINUE;
 }
 
+static void ieee80211_tx_skb_free(struct sk_buff *skb)
+{
+	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev);
+	struct ieee80211_local *local = sdata->local;
+	int q = skb_get_queue_mapping(skb);
+
+	/* decrement enqueued count */
+	atomic_dec(&sdata->qdata[q].enqueued);
+
+	/* if queue stopped, wake it */
+	if (ieee80211_queue_stopped(&local->hw, q))
+		ieee80211_wake_queue(&local->hw, q);
+}
+
 static int __ieee80211_tx(struct ieee80211_local *local,
 			  struct sk_buff **skbp,
 			  struct sta_info *sta,
@@ -1323,6 +1337,18 @@ static int __ieee80211_tx(struct ieee80211_local *local,
 
 		sdata = vif_to_sdata(info->control.vif);
 
+		/* timestamp queue entry and increment queue length */
+		skb->tstamp = ktime_get();
+		atomic_inc(&sdata->qdata[q].enqueued);
+
+		/* take ownership of skb */
+		skb_orphan(skb);
+		skb->dev = sdata->dev;
+		skb->destructor = ieee80211_tx_skb_free;
+
+		if (atomic_read(&sdata->qdata[q].enqueued) >=
+		    sdata->qdata[q].max_enqueued)
+			ieee80211_stop_queue(&local->hw, q);
+
 		switch (sdata->vif.type) {
 		case NL80211_IFTYPE_MONITOR:
 			info->control.vif = NULL;
-- 
1.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-18 21:21 ` [RFC v2] " John W. Linville
@ 2011-02-19  3:44   ` Nathaniel Smith
  2011-02-21 18:47     ` John W. Linville
  2011-02-20  0:37   ` Nathaniel Smith
  2011-02-21 15:28   ` Johannes Berg
  2 siblings, 1 reply; 18+ messages in thread
From: Nathaniel Smith @ 2011-02-19  3:44 UTC (permalink / raw)
  To: John W. Linville; +Cc: bloat-devel, johannes, linux-wireless

On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> +	/* grab timestamp info for buffer control estimates */
> +	tserv = ktime_sub(ktime_get(), skb->tstamp);
[...]
> +		ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
> +			 ktime_to_ns(tserv));

I think you're still measuring how long it takes one packet to get
from the end of the queue to the beginning, rather than measuring how
long it takes each packet to go out?

-- Nathaniel

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-19  3:44 ` Nathaniel Smith
@ 2011-02-21 18:47   ` John W. Linville
  2011-02-21 23:26     ` Nathaniel Smith
  0 siblings, 1 reply; 18+ messages in thread
From: John W. Linville @ 2011-02-21 18:47 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: bloat-devel, johannes, linux-wireless

On Fri, Feb 18, 2011 at 07:44:30PM -0800, Nathaniel Smith wrote:
> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
> <linville@tuxdriver.com> wrote:
> > +	/* grab timestamp info for buffer control estimates */
> > +	tserv = ktime_sub(ktime_get(), skb->tstamp);
> [...]
> > +		ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
> > +			 ktime_to_ns(tserv));
>
> I think you're still measuring how long it takes one packet to get
> from the end of the queue to the beginning, rather than measuring how
> long it takes each packet to go out?

Yes, I am measuring how long the driver+device takes to release each
skb back to me (using that as a proxy for how long it takes to get
the fragment to the next hop).  Actually, FWIW I'm only measuring
that time for those skb's that result in a tx status report.

I tried to see how your measurement would be useful, but I just don't
see how the number of frames ahead of me in the queue is relevant to
the measured link latency?  I mean, I realize that having more packets
ahead of me in the queue is likely to increase the latency for this
frame, but I don't understand why I should use that information to
discount the measured latency...?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-21 18:47 ` John W. Linville
@ 2011-02-21 23:26   ` Nathaniel Smith
  2011-02-23 22:28     ` John W. Linville
  0 siblings, 1 reply; 18+ messages in thread
From: Nathaniel Smith @ 2011-02-21 23:26 UTC (permalink / raw)
  To: John W. Linville; +Cc: bloat-devel, johannes, linux-wireless

On Mon, Feb 21, 2011 at 10:47 AM, John W. Linville
<linville@tuxdriver.com> wrote:
> On Fri, Feb 18, 2011 at 07:44:30PM -0800, Nathaniel Smith wrote:
>> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
>> <linville@tuxdriver.com> wrote:
>> > +	/* grab timestamp info for buffer control estimates */
>> > +	tserv = ktime_sub(ktime_get(), skb->tstamp);
>> [...]
>> > +		ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
>> > +			 ktime_to_ns(tserv));
>>
>> I think you're still measuring how long it takes one packet to get
>> from the end of the queue to the beginning, rather than measuring how
>> long it takes each packet to go out?
>
> Yes, I am measuring how long the driver+device takes to release each
> skb back to me (using that as a proxy for how long it takes to get
> the fragment to the next hop).  Actually, FWIW I'm only measuring
> that time for those skb's that result in a tx status report.
>
> I tried to see how your measurement would be useful, but I just don't
> see how the number of frames ahead of me in the queue is relevant to
> the measured link latency?  I mean, I realize that having more packets
> ahead of me in the queue is likely to increase the latency for this
> frame, but I don't understand why I should use that information to
> discount the measured latency...?

It depends on which latency you want to measure. The way that I
reasoned was, suppose that at some given time, the card is able to
transmit 1 fragment every T nanoseconds.
Then it can transmit n fragments in n*T nanoseconds, so if we want
the queue depth to be 2 ms, then we have

    n * T = 2 * NSEC_PER_MSEC
    n = 2 * NSEC_PER_MSEC / T

Which is the calculation that you're doing:

+			sta->sdata->qdata[q].max_enqueued =
+				max_t(int, 2, 2 * NSEC_PER_MSEC / tserv_ns_avg);

But for this calculation to make sense, we need T to be the time it
takes the card to transmit 1 fragment. In your patch, you're not
measuring that. You're measuring the total time between when a packet
is enqueued and when it is transmitted; if there were K packets in the
queue ahead of it, then this is the time to send *all* of them --
you're measuring (K+1)*T. That's why in my patch, I recorded the
current size of the queue when each packet is enqueued, so I could
compute T = total_time / (K+1).

Under saturation conditions, K+1 will always equal max_enqueued, so I
guess in your algorithm, at the steady state we have

    max_enqueued = K+1 = 2 * NSEC_PER_MSEC / ((K+1) * T)
    (K+1)^2 = 2 * NSEC_PER_MSEC / T
    K+1 = sqrt(2 * NSEC_PER_MSEC / T)

So I think under saturation, you converge to setting the queue to the
square root of the appropriate size?

-- Nathaniel

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-21 23:26 ` Nathaniel Smith
@ 2011-02-23 22:28   ` John W. Linville
  2011-02-25 18:21     ` Nathaniel Smith
  0 siblings, 1 reply; 18+ messages in thread
From: John W. Linville @ 2011-02-23 22:28 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: bloat-devel, johannes, linux-wireless

On Mon, Feb 21, 2011 at 03:26:32PM -0800, Nathaniel Smith wrote:
> On Mon, Feb 21, 2011 at 10:47 AM, John W. Linville
> <linville@tuxdriver.com> wrote:
> > I tried to see how your measurement would be useful, but I just don't
> > see how the number of frames ahead of me in the queue is relevant to
> > the measured link latency?  I mean, I realize that having more packets
> > ahead of me in the queue is likely to increase the latency for this
> > frame, but I don't understand why I should use that information to
> > discount the measured latency...?
>
> It depends on which latency you want to measure. The way that I
> reasoned was, suppose that at some given time, the card is able to
> transmit 1 fragment every T nanoseconds. Then it can transmit n
> fragments in n*T nanoseconds, so if we want the queue depth to be 2
> ms, then we have
>
>     n * T = 2 * NSEC_PER_MSEC
>     n = 2 * NSEC_PER_MSEC / T
>
> Which is the calculation that you're doing:
>
> +			sta->sdata->qdata[q].max_enqueued =
> +				max_t(int, 2, 2 * NSEC_PER_MSEC / tserv_ns_avg);
>
> But for this calculation to make sense, we need T to be the time it
> takes the card to transmit 1 fragment. In your patch, you're not
> measuring that. You're measuring the total time between when a packet
> is enqueued and when it is transmitted; if there were K packets in the
> queue ahead of it, then this is the time to send *all* of them --
> you're measuring (K+1)*T. That's why in my patch, I recorded the
> current size of the queue when each packet is enqueued, so I could
> compute T = total_time / (K+1).

Thanks for the math!  I think I see what you are saying now.
Since the measured time is being used to determine the queue size, we
need to factor in the length of the queue that resulted in that
measurement.

Unfortunately, I'm not sure how to apply this with the technique I
am using for the timing measurements. :-(  I'll have to think about
this some more...

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-23 22:28 ` John W. Linville
@ 2011-02-25 18:21   ` Nathaniel Smith
  2011-02-25 18:27     ` Nathaniel Smith
  0 siblings, 1 reply; 18+ messages in thread
From: Nathaniel Smith @ 2011-02-25 18:21 UTC (permalink / raw)
  To: John W. Linville; +Cc: bloat-devel, johannes, linux-wireless

On Wed, Feb 23, 2011 at 2:28 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> On Mon, Feb 21, 2011 at 03:26:32PM -0800, Nathaniel Smith wrote:
>> But for this calculation to make sense, we need T to be the time it
>> takes the card to transmit 1 fragment. In your patch, you're not
>> measuring that. You're measuring the total time between when a packet
>> is enqueued and when it is transmitted; if there were K packets in the
>> queue ahead of it, then this is the time to send *all* of them --
>> you're measuring (K+1)*T. That's why in my patch, I recorded the
>> current size of the queue when each packet is enqueued, so I could
>> compute T = total_time / (K+1).
>
> Thanks for the math!  I think I see what you are saying now.  Since the
> measured time is being used to determine the queue size, we need to
> factor in the length of the queue that resulted in that measurement.
>
> Unfortunately, I'm not sure how to apply this with the technique I
> am using for the timing measurements. :-(  I'll have to think about
> this some more...

I guess an ugly but effective approach would be to steal some bits
from the timestamp...
something like (untested, and probably needs some sign-extension bugs
fixed):

#define LATENCY_BITS 8
#define ENQUEUED_BITS 24
BUILD_BUG_ON(LATENCY_BITS + ENQUEUED_BITS != 32);

static ktime_t _ktime_get_and_stash(int enqueued)
{
	timespec now = ktime_to_timespec(ktime_get());
	now.sec &= (1 << LATENCY_BITS) - 1;
	BUG_ON(enqueued > (1 << ENQUEUED_BITS));
	now.sec |= enqueued << LATENCY_BITS;
	return timespec_to_ktime(now);
}

static u64 _ktime_diff_to_now_and_unstash(ktime_t then, int * enqueued)
{
	timespec ts_then = ktime_to_timespec(then);
	timespec ts_now = ktime_to_timespec(ktime_get());
	*enqueued = ts_then.tv_sec >> LATENCY_BITS;
	ts_then.tv_sec &= (1 << LATENCY_BITS) - 1;
	ts_now.tv_sec &= (1 << LATENCY_BITS) - 1;
	if (ts_now.tv_sec < ts_then.tv_sec)
		ts_now.tv_sec += (1 << LATENCY_BITS);
	timespec_sub(ts_now, ts_then);
}

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-25 18:21 ` Nathaniel Smith
@ 2011-02-25 18:27   ` Nathaniel Smith
  0 siblings, 0 replies; 18+ messages in thread
From: Nathaniel Smith @ 2011-02-25 18:27 UTC (permalink / raw)
  To: John W. Linville; +Cc: bloat-devel, johannes, linux-wireless

On Fri, Feb 25, 2011 at 10:21 AM, Nathaniel Smith <njs@pobox.com> wrote:
> static u64 _ktime_diff_to_now_and_unstash(ktime_t then, int * enqueued)
> {
>	timespec ts_then = ktime_to_timespec(then);
>	timespec ts_now = ktime_to_timespec(ktime_get());
>	*enqueued = ts_then.tv_sec >> LATENCY_BITS;
>	ts_then.tv_sec &= (1 << LATENCY_BITS) - 1;
>	ts_now.tv_sec &= (1 << LATENCY_BITS) - 1;
>	if (ts_now.tv_sec < ts_then.tv_sec)
>		ts_now.tv_sec += (1 << LATENCY_BITS);
>	timespec_sub(ts_now, ts_then);
> }

Err, plus the 'return timespec_to_ns(...)' on the last line, that I
was trying to add when my computer suddenly decided I wanted to send
the message. How embarrassing.

Anyway, not sure this is a *good* idea, but it should work. Hopefully
we don't actually need to measure latencies > 256 seconds in this
context...

-- Nathaniel

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-18 21:21 ` [RFC v2] " John W. Linville
  2011-02-19  3:44   ` Nathaniel Smith
@ 2011-02-20  0:37   ` Nathaniel Smith
  2011-02-20  0:51     ` Jim Gettys
  2011-02-21 18:52     ` John W. Linville
  2011-02-21 15:28   ` Johannes Berg
  2 siblings, 2 replies; 18+ messages in thread
From: Nathaniel Smith @ 2011-02-20  0:37 UTC (permalink / raw)
  To: John W. Linville; +Cc: bloat-devel, johannes, linux-wireless

Actually, a few more comments just occurred to me...

On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> Johannes' comment about tx status reporting being unreliable (and what
> he was really saying) finally sunk-in.  So, this version uses
> skb->destructor to track in-flight fragments.  That should handle
> fragments that get silently dropped in the driver for whatever reason
> without leaking queue capacity.  Correct me if I'm wrong!

Should we be somehow filtering out and ignoring the packets that get
dropped, when we're calculating the average packet transmission rate?
Presumably they're getting dropped almost instantly, so they don't
really take up queue space and they have abnormally fast transmission
times, which will tend to cause us to overestimate max_enqueued? They
should be rare, though, at least. (And presumably we *should* include
packets that get dropped because their retry timer ran out, since they
were sitting in the queue for that long.) Possibly we should just
ignore any packet that was handled in less than, oh, say, a few
microseconds?

Alternatively, if we aren't worried about those packets, then is
there any reason this patch should be mac80211 specific?
> +static void ieee80211_tx_skb_free(struct sk_buff *skb)
> +{
> +	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev);
> +	struct ieee80211_local *local = sdata->local;
> +	int q = skb_get_queue_mapping(skb);
> +
> +	/* decrement enqueued count */
> +	atomic_dec(&sdata->qdata[q].enqueued);
> +
> +	/* if queue stopped, wake it */
> +	if (ieee80211_queue_stopped(&local->hw, q))
> +		ieee80211_wake_queue(&local->hw, q);
> +}

I think you need to check that .enqueued is < max_enqueued here,
instead of waking the queue unconditionally. Suppose the data rate
drops while there's a steady flow -- our max_enqueued value will drop
below the current queue size, but because we wake the queue
unconditionally after each packet goes out, and then only stop it
again after we've queued at least one new packet, we might get 'stuck'
with an over-large queue.

-- Nathaniel

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-20  0:37 ` Nathaniel Smith
@ 2011-02-20  0:51   ` Jim Gettys
  2011-02-20 15:24     ` Dave Täht
  2011-02-21 18:52   ` John W. Linville
  1 sibling, 1 reply; 18+ messages in thread
From: Jim Gettys @ 2011-02-20  0:51 UTC (permalink / raw)
  To: bloat-devel

On 02/19/2011 07:37 PM, Nathaniel Smith wrote:
> Actually, a few more comments just occurred to me...
>
> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
> <linville@tuxdriver.com> wrote:
>> Johannes' comment about tx status reporting being unreliable (and what
>> he was really saying) finally sunk-in.  So, this version uses
>> skb->destructor to track in-flight fragments.  That should handle
>> fragments that get silently dropped in the driver for whatever reason
>> without leaking queue capacity.  Correct me if I'm wrong!
>
> Should we be somehow filtering out and ignoring the packets that get
> dropped, when we're calculating the average packet transmission rate?
> Presumably they're getting dropped almost instantly, so they don't
> really take up queue space and they have abnormally fast transmission
> times, which will tend to cause us to overestimate max_enqueued? They
> should be rare, though, at least. (And presumably we *should* include
> packets that get dropped because their retry timer ran out, since they
> were sitting in the queue for that long.) Possibly we should just
> ignore any packet that was handled in less than, oh, say, a few
> microseconds?

I will note that AQM algorithms that are likely to work will need to
know the actual "goodput" of the link.  (This from talking to Van
Jacobson about the problem we face.)
			- Jim

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-20 0:51 ` Jim Gettys @ 2011-02-20 15:24 ` Dave Täht 0 siblings, 0 replies; 18+ messages in thread From: Dave Täht @ 2011-02-20 15:24 UTC (permalink / raw) To: Jim Gettys; +Cc: bloat-devel Jim Gettys <jg@freedesktop.org> writes: > On 02/19/2011 07:37 PM, Nathaniel Smith wrote: >> Actually, a few more comments just occurred to me... >> >> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville >> <linville@tuxdriver.com> wrote: >>> Johannes' comment about tx status reporting being unreliable (and what >>> he was really saying) finally sunk-in. So, this version uses >>> skb->destructor to track in-flight fragments. That should handle >>> fragments that get silently dropped in the driver for whatever reason >>> without leaking queue capacity. Correct me if I'm wrong! >> >> Should we be somehow filtering out and ignoring the packets that get >> dropped, when we're calculating the average packet transmission rate? >> Presumably they're getting dropped almost instantly, so they don't >> really take up queue space and they have abnormally fast transmission >> times, which will tend to cause us to overestimate max_enqueued? They >> should be rare, though, at least. (And presumably we *should* include >> packets that get dropped because their retry timer ran out, since they >> were sitting in the queue for that long.) Possibly we should just >> ignore any packet that was handled in less than, oh, say, a few >> microseconds? The perfect is the enemy of the good. POGE[1] applies. After getting 2 orders of magnitude improvement with the patches thus far... My reaction to the above is that I'd like to get what is being implemented tested by more folk, across more drivers. Me being one of them! I hope to get builds done today for several openwrt devices and my laggy-lagn laptop. > I will note that AQM algorithms that are likely to work will need to > know the actual "goodput" of the link. 
(This from talking to Van > Jacobson about the problem we face.) I care about AQM, I really do[2]... Certainly exposing more hooks and data to the next layer in the stack from the driver would be good... But I'd like mostly simply to get: A) People using and loving debloated drivers, & more people aware of and fixing the problem... B) The dynamic range available via ethtool to userspace to be vastly increased C) Buffer sizes defaulting to much less, across the board. These are the simplest, most effective, low hanging fruit/fixes. [1] We're making great progress on wireless, how do we make progress on wired? -- Dave Taht http://nex-6.taht.net 1: http://en.wikipedia.org/wiki/Principle_of_good_enough 2: I'll patch SFB into my build and any other of the new AQMs anyone can suggest ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-20 0:37 ` Nathaniel Smith 2011-02-20 0:51 ` Jim Gettys @ 2011-02-21 18:52 ` John W. Linville 1 sibling, 0 replies; 18+ messages in thread From: John W. Linville @ 2011-02-21 18:52 UTC (permalink / raw) To: Nathaniel Smith; +Cc: bloat-devel, johannes, linux-wireless On Sat, Feb 19, 2011 at 04:37:00PM -0800, Nathaniel Smith wrote: > Actually, a few more comments just occurred to me... > > On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville > <linville@tuxdriver.com> wrote: > > Johannes' comment about tx status reporting being unreliable (and what > > he was really saying) finally sunk-in. So, this version uses > > skb->destructor to track in-flight fragments. That should handle > > fragments that get silently dropped in the driver for whatever reason > > without leaking queue capacity. Correct me if I'm wrong! > > Should we be somehow filtering out and ignoring the packets that get > dropped, when we're calculating the average packet transmission rate? > Presumably they're getting dropped almost instantly, so they don't > really take up queue space and they have abnormally fast transmission > times, which will tend to cause us to overestimate max_enqueued? They > should be rare, though, at least. (And presumably we *should* include > packets that get dropped because their retry timer ran out, since they > were sitting in the queue for that long.) Possibly we should just > ignore any packet that was handled in less than, oh, say, a few > microseconds? If you look, I only do the timing measurement for frames that result in a tx status report. Frames that are dropped will only hit ieee80211_tx_skb_free (which reduces the enqueued count but doesn't recalculate max_enqueue). > Alternatively, if we aren't worried about those packets, then is there > any reason this patch should be mac80211 specific? No, in fact I was thinking the same thing. 
Some thought needs to be put to whether or not we can ignore the effects of letting dropped packets affect the latency estimate... > > +static void ieee80211_tx_skb_free(struct sk_buff *skb) > > +{ > > + struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev); > > + struct ieee80211_local *local = sdata->local; > > + int q = skb_get_queue_mapping(skb); > > + > > + /* decrement enqueued count */ > > + atomic_dec(&sdata->qdata[q].enqueued); > > + > > + /* if queue stopped, wake it */ > > + if (ieee80211_queue_stopped(&local->hw, q)) > > + ieee80211_wake_queue(&local->hw, q); > > +} > > I think you need to check that .enqueued is < max_enqueued here, > instead of waking the queue unconditionally. > > Suppose the data rate drops while there's a steady flow -- our > max_enqueued value will drop below the current queue size, but because > we wake the queue unconditionally after each packet goes out, and then > only stop it again after we've queued at least one new packet, we > might get 'stuck' with an over-large queue. Yes, thanks for pointing that out. My non-thorough tests seem to do a better job at holding the latency lower with that change. Thanks for taking a look! John -- John W. Linville Someday the world will need a hero, and you linville@tuxdriver.com might be all we have. Be ready. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-18 21:21 ` [RFC v2] " John W. Linville 2011-02-19 3:44 ` Nathaniel Smith 2011-02-20 0:37 ` Nathaniel Smith @ 2011-02-21 15:28 ` Johannes Berg 2011-02-21 16:12 ` Jim Gettys 2011-02-21 19:06 ` John W. Linville 2 siblings, 2 replies; 18+ messages in thread From: Johannes Berg @ 2011-02-21 15:28 UTC (permalink / raw) To: John W. Linville; +Cc: bloat-devel, linux-wireless, Nathaniel J. Smith On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote: > This is an implementation of the eBDP algorithm as documented in > Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li, > et al. > > http://www.hamilton.ie/tianji_li/buffersizing.pdf > > This implementation timestamps an skb before handing it to the > hardware driver, then computes the service time when the frame is > freed by the driver. An exponentially weighted moving average of per > fragment service times is used to restrict queueing delays in hopes > of achieving a target fragment transmission latency. > > Signed-off-by: John W. Linville <linville@tuxdriver.com> > --- > v1 -> v2: > - execute algorithm separately for each WMM queue > - change ewma scaling parameters > - calculate max queue len only when new latency data is received > - stop queues when occupancy limit is reached rather than dropping > - use skb->destructor for tracking queue occupancy > > Johannes' comment about tx status reporting being unreliable (and what > he was really saying) finally sunk-in. So, this version uses > skb->destructor to track in-flight fragments. That should handle > fragments that get silently dropped in the driver for whatever reason > without leaking queue capacity. Correct me if I'm wrong! Yeah, I had that idea as well. Could unify the existing skb_orphan() call though :-) However, Nathaniel is right -- if the skb is freed right away during tx() you kinda estimate its queue time to be virtually zero. 
That doesn't make a lot of sense and might in certain conditions exacerbate the problem, for example if the system is out of memory more packets might be allowed through than in normal operation etc. Also, for some USB drivers I believe SKB lifetime has no relation to queue size at all because the data is just shuffled into an URB. I'm not sure we can solve this generically. I'm not really sure how this works for USB drivers, I think they queue up frames with the HCI controller rather than directly with the device. Finally, this isn't taking into account any of the issues about aggregation and AP mode. Remember that both with multiple streams (on different ACs) and even more so going to different stations (AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and let's not forget 11ac/ad) there may be vast differences in the time different frames spend on a queue which are not just due to bloated queues. I'm concerned about this since none of it has been taken into account in the paper you're basing this on, all evaluations seem to be pretty much based on a single traffic stream. Overall, I think there should be some more research first. This might help in some cases, but do we know it won't completely break throughput in other cases? johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-21 15:28 ` Johannes Berg @ 2011-02-21 16:12 ` Jim Gettys 2011-02-21 19:15 ` John W. Linville 2011-02-21 19:06 ` John W. Linville 1 sibling, 1 reply; 18+ messages in thread From: Jim Gettys @ 2011-02-21 16:12 UTC (permalink / raw) To: bloat-devel; +Cc: Dan Williams, Javier Cardona, Dave Woodhouse On 02/21/2011 10:28 AM, Johannes Berg wrote: > On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote: >> This is an implementation of the eBDP algorithm as documented in >> Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li, >> et al. >> >> http://www.hamilton.ie/tianji_li/buffersizing.pdf >> >> This implementation timestamps an skb before handing it to the >> hardware driver, then computes the service time when the frame is >> freed by the driver. An exponentially weighted moving average of per >> fragment service times is used to restrict queueing delays in hopes >> of achieving a target fragment transmission latency. >> >> Signed-off-by: John W. Linville<linville@tuxdriver.com> >> --- >> v1 -> v2: >> - execute algorithm separately for each WMM queue >> - change ewma scaling parameters >> - calculate max queue len only when new latency data is received >> - stop queues when occupancy limit is reached rather than dropping >> - use skb->destructor for tracking queue occupancy >> >> Johannes' comment about tx status reporting being unreliable (and what >> he was really saying) finally sunk-in. So, this version uses >> skb->destructor to track in-flight fragments. That should handle >> fragments that get silently dropped in the driver for whatever reason >> without leaking queue capacity. Correct me if I'm wrong! > > Yeah, I had that idea as well. Could unify the existing skb_orphan() > call though :-) > > However, Nathaniel is right -- if the skb is freed right away during > tx() you kinda estimate its queue time to be virtually zero. 
That > doesn't make a lot of sense and might in certain conditions exacerbate > the problem, for example if the system is out of memory more packets > might be allowed through than in normal operation etc. > > Also, for some USB drivers I believe SKB lifetime has no relation to > queue size at all because the data is just shuffled into an URB. I'm not > sure we can solve this generically. I'm not really sure how this works > for USB drivers, I think they queue up frames with the HCI controller > rather than directly with the device. Let me give a concrete example: I checked with Javier Cardona about the Marvell module (libertas driver) used on OLPC a couple months ago. It turns out there are 4 packets of buffering out in the wireless module itself (clearly needed for autonomous forwarding). There is also one packet buffer in the device driver itself; Dave Woodhouse says it simplified the driver greatly. I don't know if anyone has been thinking about how to manage the buffering from top to bottom, with devices that may do internal buffering in various places. > > Finally, this isn't taking into account any of the issues about > aggregation and AP mode. Remember that both with multiple streams (on > different ACs) and even more so going to different stations > (AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and > let's not forget 11ac/ad) there may be vast differences in the time > different frames spend on a queue which are not just due to bloated > queues. I'm concerned about this since none of it has been taken into > account in the paper you're basing this on, all evaluations seem to be > pretty much based on a single traffic stream. > > Overall, I think there should be some more research first. This might > help in some cases, but do we know it won't completely break throughput > in other cases? > - Jim ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-21 16:12 ` Jim Gettys @ 2011-02-21 19:15 ` John W. Linville 0 siblings, 0 replies; 18+ messages in thread From: John W. Linville @ 2011-02-21 19:15 UTC (permalink / raw) To: Jim Gettys; +Cc: Dan Williams, Javier Cardona, bloat-devel, Dave Woodhouse On Mon, Feb 21, 2011 at 11:12:14AM -0500, Jim Gettys wrote: > On 02/21/2011 10:28 AM, Johannes Berg wrote: > >However, Nathaniel is right -- if the skb is freed right away during > >tx() you kinda estimate its queue time to be virtually zero. That > >doesn't make a lot of sense and might in certain conditions exacerbate > >the problem, for example if the system is out of memory more packets > >might be allowed through than in normal operation etc. For those only reading the bloat lists, I replied elsewhere to clarify that the present patch is only calculating max_enqueued for frames that result in a tx status report (i.e. not for dropped frames) -- just in case that detail is somehow relevant... > >Also, for some USB drivers I believe SKB lifetime has no relation to > >queue size at all because the data is just shuffled into an URB. I'm not > >sure we can solve this generically. I'm not really sure how this works > >for USB drivers, I think they queue up frames with the HCI controller > >rather than directly with the device. > > Let me give a concrete example: > > I checked with Javier Cardona about the Marvell module (libertas > driver) used on OLPC a couple months ago. > > It turns out there are 4 packets of buffering out in the wireless > module itself (clearly needed for autonomous forwarding). > > There is also one packet buffer in the device driver itself; Dave > Woodhouse says it simplified the driver greatly. > > I don't know if anyone has been thinking about how to manage the > buffering from top to bottom, with devices that may do internal > buffering in various places. (FWIW, my current patch won't affect the libertas driver...) 
The role I see my patch playing is to evaluate how good a job the driver/device is doing with its own buffers. If it is keeping its latency low, then I will allow it more fragments to buffer. If its latency grows too large, I throttle the number of fragments I allow the driver to see. To a large degree, it doesn't matter how much buffering the driver/device is doing so long as it is moving the frames along quickly. John -- John W. Linville Someday the world will need a hero, and you linville@tuxdriver.com might be all we have. Be ready. ^ permalink raw reply [flat|nested] 18+ messages in thread
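The throttling John describes (EWMA the per-fragment service time seen at tx-status time, then cap in-flight fragments at roughly target latency divided by service time, per Section IV of the Li et al. paper) can be sketched in user-space C. The constants and the floating-point EWMA here are illustrative assumptions; the actual patch uses the kernel's fixed-point EWMA helpers and different parameters.

```c
#include <assert.h>

#define EWMA_WEIGHT 0.125    /* smoothing factor (assumed, not the patch's) */
#define TARGET_US   2000.0   /* target queueing latency (assumed) */

static double ewma_service_us;  /* smoothed per-fragment service time */

/* Feed in one completed fragment's measured service time; returns the
 * new limit on fragments the driver is allowed to hold (the eBDP idea:
 * occupancy limit = target latency / per-fragment service time). */
static int update_max_enqueued(double service_us)
{
    if (ewma_service_us == 0.0)
        ewma_service_us = service_us;    /* seed the average */
    else
        ewma_service_us += EWMA_WEIGHT * (service_us - ewma_service_us);

    int max = (int)(TARGET_US / ewma_service_us);
    return max > 2 ? max : 2;            /* keep a small floor */
}
```

When the link slows down, the rising service-time average shrinks the allowed occupancy, which is what stops the queue before latency balloons.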
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-21 15:28 ` Johannes Berg 2011-02-21 16:12 ` Jim Gettys @ 2011-02-21 19:06 ` John W. Linville 2011-02-21 20:26 ` Tianji Li 2011-02-28 13:07 ` Johannes Berg 1 sibling, 2 replies; 18+ messages in thread From: John W. Linville @ 2011-02-21 19:06 UTC (permalink / raw) To: Johannes Berg; +Cc: bloat-devel, linux-wireless, Nathaniel J. Smith On Mon, Feb 21, 2011 at 04:28:06PM +0100, Johannes Berg wrote: > On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote: > > This is an implementation of the eBDP algorithm as documented in > > Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li, > > et al. > > > > http://www.hamilton.ie/tianji_li/buffersizing.pdf > > > > This implementation timestamps an skb before handing it to the > > hardware driver, then computes the service time when the frame is > > freed by the driver. An exponentially weighted moving average of per > > fragment service times is used to restrict queueing delays in hopes > > of achieving a target fragment transmission latency. > > > > Signed-off-by: John W. Linville <linville@tuxdriver.com> > > --- > > v1 -> v2: > > - execute algorithm separately for each WMM queue > > - change ewma scaling parameters > > - calculate max queue len only when new latency data is received > > - stop queues when occupancy limit is reached rather than dropping > > - use skb->destructor for tracking queue occupancy > > > > Johannes' comment about tx status reporting being unreliable (and what > > he was really saying) finally sunk-in. So, this version uses > > skb->destructor to track in-flight fragments. That should handle > > fragments that get silently dropped in the driver for whatever reason > > without leaking queue capacity. Correct me if I'm wrong! > > Yeah, I had that idea as well. Could unify the existing skb_orphan() > call though :-) The one in ieee80211_skb_resize? Any idea how that would look? 
> However, Nathaniel is right -- if the skb is freed right away during > tx() you kinda estimate its queue time to be virtually zero. That > doesn't make a lot of sense and might in certain conditions exacerbate > the problem, for example if the system is out of memory more packets > might be allowed through than in normal operation etc. As in my reply to Nathaniel, please notice that the timing estimate (and the max_enqueued calculation) only happens for frames that result in a tx status report -- at least for now... However, if this were generalized beyond mac80211 then we wouldn't be able to rely on tx status reports. I can see that dropping frames in the driver would lead to timing estimates that would cascade into a wide-open queue size. But I'm not sure that would be a big deal, since in the long run those dropped frames should still result in IP cwnd reductions, etc...? > Also, for some USB drivers I believe SKB lifetime has no relation to > queue size at all because the data is just shuffled into an URB. I'm not > sure we can solve this generically. I'm not really sure how this works > for USB drivers, I think they queue up frames with the HCI controller > rather than directly with the device. How do you think the time spent handling URBs in the USB stack relates to the time spent transmitting frames? At what point do those SKBs get freed? > Finally, this isn't taking into account any of the issues about > aggregation and AP mode. Remember that both with multiple streams (on > different ACs) and even more so going to different stations > (AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and > let's not forget 11ac/ad) there may be vast differences in the time > different frames spend on a queue which are not just due to bloated > queues. I'm concerned about this since none of it has been taken into > account in the paper you're basing this on, all evaluations seem to be > pretty much based on a single traffic stream. 
Yeah, I'm still not sure we all have our heads around these issues. I mean, on the one hand it seems wrong to limit queueing for one stream or station just because some other stream or station is higher latency. But on the other hand, it seems to me that those streams/stations still have to share the same link and that higher real latency for one stream/station could still result in a higher perceived latency for another stream/station sharing the same link, since they still have to share the same air...no? > Overall, I think there should be some more research first. This might > help in some cases, but do we know it won't completely break throughput > in other cases? That's why it is posted RFC, of course. :-) John -- John W. Linville Someday the world will need a hero, and you linville@tuxdriver.com might be all we have. Be ready. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-21 19:06 ` John W. Linville @ 2011-02-21 20:26 ` Tianji Li 2011-02-28 13:07 ` Johannes Berg 1 sibling, 0 replies; 18+ messages in thread From: Tianji Li @ 2011-02-21 20:26 UTC (permalink / raw) To: John W. Linville, Johannes Berg, Nathaniel J. Smith Cc: linux-wireless, bloat-devel On 02/21/2011 01:06 PM, John W. Linville wrote: > On Mon, Feb 21, 2011 at 04:28:06PM +0100, Johannes Berg wrote: >> On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote: >>> This is an implementation of the eBDP algorithm as documented in >>> Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li, >>> et al. >>> >>> http://www.hamilton.ie/tianji_li/buffersizing.pdf >>> >>> This implementation timestamps an skb before handing it to the >>> hardware driver, then computes the service time when the frame is >>> freed by the driver. An exponentially weighted moving average of per >>> fragment service times is used to restrict queueing delays in hopes >>> of achieving a target fragment transmission latency. >>> >>> Signed-off-by: John W. Linville<linville@tuxdriver.com> >>> --- >>> v1 -> v2: >>> - execute algorithm separately for each WMM queue >>> - change ewma scaling parameters >>> - calculate max queue len only when new latency data is received >>> - stop queues when occupancy limit is reached rather than dropping >>> - use skb->destructor for tracking queue occupancy >>> >>> Johannes' comment about tx status reporting being unreliable (and what >>> he was really saying) finally sunk-in. So, this version uses >>> skb->destructor to track in-flight fragments. That should handle >>> fragments that get silently dropped in the driver for whatever reason >>> without leaking queue capacity. Correct me if I'm wrong! >> >> Yeah, I had that idea as well. Could unify the existing skb_orphan() >> call though :-) > > The one in ieee80211_skb_resize? Any idea how that would look? 
> >> However, Nathaniel is right -- if the skb is freed right away during >> tx() you kinda estimate its queue time to be virtually zero. That >> doesn't make a lot of sense and might in certain conditions exacerbate >> the problem, for example if the system is out of memory more packets >> might be allowed through than in normal operation etc. > > As in my reply to Nathaniel, please notice that the timing estimate > (and the max_enqueued calculation) only happens for frames that result > in a tx status report -- at least for now... > > However, if this were generalized beyond mac80211 then we wouldn't > be able to rely on tx status reports. I can see that dropping frames > in the driver would lead to timing estimates that would cascade into > a wide-open queue size. But I'm not sure that would be a big deal, > since in the long run those dropped frames should still result in IP > cwnd reductions, etc...? > >> Also, for some USB drivers I believe SKB lifetime has no relation to >> queue size at all because the data is just shuffled into an URB. I'm not >> sure we can solve this generically. I'm not really sure how this works >> for USB drivers, I think they queue up frames with the HCI controller >> rather than directly with the device. > > How do you think the time spent handling URBs in the USB stack relates > to the time spent transmitting frames? At what point do those SKBs > get freed? > >> Finally, this isn't taking into account any of the issues about >> aggregation and AP mode. Remember that both with multiple streams (on >> different ACs) and even more so going to different stations >> (AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and >> let's not forget 11ac/ad) there may be vast differences in the time >> different frames spend on a queue which are not just due to bloated >> queues. 
I'm concerned about this since none of it has been taken into >> account in the paper you're basing this on, all evaluations seem to be >> pretty much based on a single traffic stream. > > Yeah, I'm still not sure we all have our heads around these issues. > I mean, on the one hand it seems wrong to limit queueing for one > stream or station just because some other stream or station is > higher latency. But on the other hand, it seems to me that those > streams/stations still have to share the same link and that higher > real latency for one stream/station could still result in a higher > perceived latency for another stream/station sharing the same link, > since they still have to share the same air...no? This is a good point. A buffer builds up when there are long-lived TCP flows. They can block the buffer since they are elastic in the sense that they send more packets when previous ones are acknowledged, which means that if the flow is long, lots of packets will arrive at the buffer almost at the same time. If the buffer is large, the wait to be serviced can be long. This is fine for long-lived flows since when we are downloading a large file, we do not much care whether we are done in 2 minutes or 3. However, if there are a couple of email checks, no one can tolerate that a 'fresh' click takes 3-2=1 minute. To mitigate, we shorten the buffer sizes (by dropping) so that the waiting can be shorter. Since the long-lived flows are dominating, dropping is much more likely to happen on them; some packets from short-lived flows can also be dropped if they are unlucky. Still, due to the elastic (it is both bad and good :-)) nature of TCP, the dropping on long-lived flows makes them back off, which gives time to the short ones. If a buffer is not used by elastic traffic, there is no need to do buffering. (Note that UDP can be elastic as well. The application layer above UDP normally has some logic to back off if the waiting is too long.) In the 802.11 standard, there is only one queue, while in 802.11e/n there are four, and by default only one of them is used for TCP (but this can be changed). There are some other queues in the driver for control purposes, but they do not count. In our paper, we were doing buffer sizing on the 802.11e/n TCP queue only. For 802.11, we need a few buffers on top of the 802.11 standard one to mimic those of 802.11e, and use sizing on the TCP buffer only. The scheduling of which queues should be active at the MAC layer is another issue, which cannot be solved with the sizing logic. AQM may or may not be a better option, but the issue is that it is not enabled even though it is so well known. My 2 cents, Tianji > >> Overall, I think there should be some more research first. This might >> help in some cases, but do we know it won't completely break throughput >> in other cases? > > That's why it is posted RFC, of course. :-) > > John ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-21 19:06 ` John W. Linville 2011-02-21 20:26 ` Tianji Li @ 2011-02-28 13:07 ` Johannes Berg 1 sibling, 0 replies; 18+ messages in thread From: Johannes Berg @ 2011-02-28 13:07 UTC (permalink / raw) To: John W. Linville; +Cc: bloat-devel, linux-wireless, Nathaniel J. Smith On Mon, 2011-02-21 at 14:06 -0500, John W. Linville wrote: > > Yeah, I had that idea as well. Could unify the existing skb_orphan() > > call though :-) > > The one in ieee80211_skb_resize? Any idea how that would look? Yeah. I think it'd have to be moved out of _skb_resize and made unconditional in that path, since eventually with this patch you'd do it anyway. > As in my reply to Nathaniel, please notice that the timing estimate > (and the max_enqueued calculation) only happens for frames that result > in a tx status report -- at least for now... Oops, right. > However, if this were generalized beyond mac80211 then we wouldn't > be able to rely on tx status reports. I can see that dropping frames > in the driver would lead to timing estimates that would cascade into > a wide-open queue size. But I'm not sure that would be a big deal, > since in the long run those dropped frames should still result in IP > cwnd reductions, etc...? I don't think we can generically rely on skb_orphan() in the network stack since that will make socket buffer limits meaningless. In fact, it pains me a bit that we had to do this in wireless before buffering the skb, and doing it unconditionally may be worse? > How do you think the time spent handling URBs in the USB stack relates > to the time spent transmitting frames? At what point do those SKBs > get freed? I honestly don't know. I would hope they are only freed when the URB was processed (i.e. at least DMA'd to the target device) but I suppose a driver might also copy the TX frame completely. > Yeah, I'm still not sure we all have our heads around these issues. 
> I mean, on the one hand it seems wrong to limit queueing for one > stream or station just because some other stream or station is > higher latency. But on the other hand, it seems to me that those > streams/stations still have to share the same link and that higher > real latency for one stream/station could still result in a higher > perceived latency for another stream/station sharing the same link, > since they still have to share the same air...no? Yeah, but retries (robustness) and aggregation (throughput) will invariably affect latency for everybody else using the shared medium. I suppose it would be better if queueing would be limited to a certain amount of air time use *per peer station*, so that each connection can have fairly low latency. However, this seems much harder to do. But what could happen here is that bursty traffic to a far-away (slow) station severely affects latency for a closer station, because the high traffic to that closer station caused a buffering increase. johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
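The per-station airtime limit Johannes suggests could, for instance, take the shape of a deficit-round-robin pass where each peer spends airtime credit rather than a frame count. This is a hypothetical user-space sketch of that idea, not anything in the patch or in mac80211; the quantum and per-frame airtime costs are made-up numbers.

```c
#include <assert.h>

#define QUANTUM_US 1000  /* airtime credit granted per scheduling round (assumed) */

struct station {
    int deficit_us;      /* accumulated airtime credit */
    int frame_cost_us;   /* airtime one frame costs at this peer's rate */
    int backlog;         /* frames queued for this peer */
    int sent;            /* frames actually transmitted */
};

/* One scheduling round: each station may transmit only while it has
 * credit, so a slow (high airtime-per-frame) peer cannot starve a
 * fast one -- they get equal air time, not equal frame counts. */
static void airtime_round(struct station *sta, int n)
{
    for (int i = 0; i < n; i++) {
        sta[i].deficit_us += QUANTUM_US;
        while (sta[i].backlog > 0 &&
               sta[i].deficit_us >= sta[i].frame_cost_us) {
            sta[i].deficit_us -= sta[i].frame_cost_us;
            sta[i].backlog--;
            sta[i].sent++;
        }
    }
}
```

A fast peer (100 us/frame) and a slow one (1000 us/frame) then drain at a 10:1 frame ratio per round while consuming the same airtime, which is the fairness property being asked for.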
end of thread, other threads:[~2011-02-28 13:08 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <x1-oTZGm1A7eclvABnv1aK0z1Nc7iI@gwene.org> 2011-02-20 1:59 ` [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat Dave Täht 2011-02-17 1:49 [RFC] " John W. Linville 2011-02-18 21:21 ` [RFC v2] " John W. Linville 2011-02-19 3:44 ` Nathaniel Smith 2011-02-21 18:47 ` John W. Linville 2011-02-21 23:26 ` Nathaniel Smith 2011-02-23 22:28 ` John W. Linville 2011-02-25 18:21 ` Nathaniel Smith 2011-02-25 18:27 ` Nathaniel Smith 2011-02-20 0:37 ` Nathaniel Smith 2011-02-20 0:51 ` Jim Gettys 2011-02-20 15:24 ` Dave Täht 2011-02-21 18:52 ` John W. Linville 2011-02-21 15:28 ` Johannes Berg 2011-02-21 16:12 ` Jim Gettys 2011-02-21 19:15 ` John W. Linville 2011-02-21 19:06 ` John W. Linville 2011-02-21 20:26 ` Tianji Li 2011-02-28 13:07 ` Johannes Berg