* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
       [not found] <x1-oTZGm1A7eclvABnv1aK0z1Nc7iI@gwene.org>
@ 2011-02-20  1:59 ` Dave Täht
  0 siblings, 0 replies; 18+ messages in thread
From: Dave Täht @ 2011-02-20  1:59 UTC (permalink / raw)
  To: bloat-devel

>Should we be somehow filtering out and ignoring the packets that get
>dropped, when we're calculating the average packet transmission rate?
>Presumably they're getting dropped almost instantly, so they don't
>really take up queue space and they have abnormally fast transmission
>times, which will tend to cause us to overestimate max_enqueued? They
>should be rare, though, at least. (And presumably we *should* include
>packets that get dropped because their retry timer ran out, since they
>were sitting in the queue for that long.) Possibly we should just
>ignore any packet that was handled in less than, oh, say, a few
>microseconds?

Perhaps doing continuous mean/variance against time would be simpler
than what you propose. Sorry, LISP example, with LaTeX:

unknown <post@gwene.org> writes:

> A friend showed me this about 15 years ago. I use it every time I need to
> calculate the variance of some data set. I always forget the exact details and
> have to derive it again. But, it’s easy enough to derive that it’s never a
> problem.
>
> I had to derive it again on Friday and thought, `I should make sure more people
> get this tool into their utility belts'.
>
> First, a quick refresher on what we’re talking about here. The mean \mu of a
> data set { x_1, x_2, \ldots, x_n } is defined to be \frac{1}{n} \sum_{i=1}^n
> x_i. The variance \sigma^2 is defined to be \frac{1}{n} \sum_{i=1}^n
> (x_i - \mu)^2.
>
> A naïve approach to calculating the variance then goes something like this:
>
>     (defun mean-variance (data)
>       (flet ((square (x) (* x x)))
>         (let* ((n (length data))
>                (sum (reduce #'+ data :initial-value 0))
>                (mu (/ sum n))
>                (vv (reduce #'(lambda (accum xi)
>                                (+ accum (square (- xi mu))))
>                            data :initial-value 0)))
>           (values mu (/ vv n)))))
>
> This code runs through the data list once to count the items, once to calculate
> the mean, and once to calculate the variance. It is easy to see how we could
> count the items at the same time we are summing them. It is not as obvious how
> we can calculate the sum of squared terms involving the mean until we’ve
> calculated the mean.
>
> If we expand the squared term and pull the constant \mu outside of the
> summations it ends up in, we find that:
>
>     \frac{\sum (x_i - \mu)^2}{n} = \frac{\sum x_i^2}{n}
>                                    - 2 \mu \frac{\sum x_i}{n}
>                                    + \mu^2 \frac{\sum 1}{n}
>
> When we recognize that \frac{\sum x_i}{n} = \mu and \sum_{i=1}^n 1 = n, we get:
>
>     \sigma^2 = \frac{\sum x_i^2}{n} - \mu^2
>              = \frac{\sum x_i^2}{n} - \left( \frac{\sum x_i}{n} \right)^2
>
> This leads to the following code:
>
>     (defun mean-variance (data)
>       (flet ((square (x) (* x x)))
>         (destructuring-bind (n xs x2s)
>             (reduce #'(lambda (accum xi)
>                         (list (1+ (first accum))
>                               (+ (second accum) xi)
>                               (+ (third accum) (square xi))))
>                     data :initial-value '(0 0 0))
>           (let ((mu (/ xs n)))
>             (values mu (- (/ x2s n) (square mu)))))))
>
> The code is not as simple, but you gain a great deal of flexibility. You can
> easily convert the above concept to continuously track the mean and variance as
> you iterate through an input stream. You do not have to keep data around to
> iterate through later. You can deal with things one sample at a time.
>
> The same concept extends to higher-order moments, too.
>
> Happy counting.
>
> Link

-- 
Dave Taht
http://nex-6.taht.net

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [RFC] mac80211: implement eBDP algorithm to fight bufferbloat
@ 2011-02-17  1:49 John W. Linville
  2011-02-18 21:21 ` [RFC v2] " John W. Linville
  0 siblings, 1 reply; 18+ messages in thread
From: John W. Linville @ 2011-02-17  1:49 UTC (permalink / raw)
  To: linux-wireless; +Cc: johannes, bloat-devel, Nathaniel J. Smith

This is an implementation of the eBDP algorithm as documented in
Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li,
et al.

	http://www.hamilton.ie/tianji_li/buffersizing.pdf

This implementation timestamps an skb before handing it to the
hardware driver, then computes the service time when the transmit
status is reported back from the driver.  An exponentially weighted
moving average of per-packet service times is used to restrict
queueing delays in the hope of achieving a target packet transmission
latency.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
---
This is preliminary, but it seems to put some limits on latencies
for me.  I haven't even really done much testing, so YMMV...

I'm sure this isn't ideal.  This should be combined with the ALT
algorithm to yield the A* algorithm.  There are parameters that
probably should be tunable (or at least better researched).  This
may not be ideal for 802.11n -- it may even be detrimental to it.

Still, it is an attempt at addressing bufferbloat.  Plus, it should
pertain to all mac80211-based drivers.  So, feel free to test it,
suggest different parameters, report real numbers, post patches,
etc...
:-)

 net/mac80211/ieee80211_i.h |    7 +++++++
 net/mac80211/iface.c       |    6 ++++++
 net/mac80211/status.c      |    9 +++++++++
 net/mac80211/tx.c          |   16 ++++++++++++++++
 4 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index f2ef15d..1c69b29 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -604,6 +604,13 @@ struct ieee80211_sub_if_data {
 		struct dentry *default_mgmt_key;
 	} debugfs;
 #endif
+
+	/* number of packets currently in flight */
+	atomic_t enqueued;
+
+	/* average packet service time */
+	struct ewma tserv_ns_avg;
+
 	/* must be last, dynamically sized area in this! */
 	struct ieee80211_vif vif;
 };
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 5a4e19b..ccfb659 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -857,6 +857,12 @@ static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
 	skb_queue_head_init(&sdata->skb_queue);
 	INIT_WORK(&sdata->work, ieee80211_iface_work);
 
+	/* initialize service time estimate for buffer control */
+	ewma_init(&sdata->tserv_ns_avg, 1, 8);
+	ewma_add(&sdata->tserv_ns_avg, 2 * NSEC_PER_MSEC);
+
+	atomic_set(&sdata->enqueued, 0);
+
 	switch (type) {
 	case NL80211_IFTYPE_P2P_GO:
 		type = NL80211_IFTYPE_AP;
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 010a559..9e5eed3 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -172,6 +172,7 @@ static void ieee80211_frame_acked(struct sta_info *sta, struct sk_buff *skb)
 
 void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 {
+	ktime_t tserv;
 	struct sk_buff *skb2;
 	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
 	struct ieee80211_local *local = hw_to_local(hw);
@@ -188,6 +189,9 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 	bool send_to_cooked;
 	bool acked;
 
+	/* grab timestamp info for buffer control estimates */
+	tserv = ktime_sub(ktime_get(), skb->tstamp);
+
 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++) {
 		/* the HW cannot have attempted that rate */
 		if (i >= hw->max_report_rates) {
@@ -212,6 +216,11 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 		if (memcmp(hdr->addr2, sta->sdata->vif.addr, ETH_ALEN))
 			continue;
 
+		atomic_dec(&sta->sdata->enqueued);
+
+		/* factor current tserv into service time estimate */
+		ewma_add(&sta->sdata->tserv_ns_avg, ktime_to_ns(tserv));
+
 		acked = !!(info->flags & IEEE80211_TX_STAT_ACK);
 		if (!acked && test_sta_flags(sta, WLAN_STA_PS_STA)) {
 			/*
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 17ef4f4..8ccf60b 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1298,6 +1298,8 @@ static int __ieee80211_tx(struct ieee80211_local *local,
 
 	while (skb) {
 		int q = skb_get_queue_mapping(skb);
+		int max_enqueued;
+		unsigned long tserv_ns_avg;
 		__le16 fc;
 
 		spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
@@ -1323,6 +1325,20 @@ static int __ieee80211_tx(struct ieee80211_local *local,
 
 		sdata = vif_to_sdata(info->control.vif);
 
+		/* test for queue admission qualifications */
+		tserv_ns_avg = ewma_read(&sdata->tserv_ns_avg);
+		/* constants 2 msec and offset 5 should be tunable? */
+		max_enqueued = 2 * NSEC_PER_MSEC / tserv_ns_avg + 5;
+		if (atomic_read(&sdata->enqueued) > max_enqueued) {
+			/* silently drop */
+			dev_kfree_skb(skb);
+			return IEEE80211_TX_OK;
+		}
+
+		/* timestamp queue entry and increment queue length */
+		skb->tstamp = ktime_get();
+		atomic_inc(&sdata->enqueued);
+
 		switch (sdata->vif.type) {
 		case NL80211_IFTYPE_MONITOR:
 			info->control.vif = NULL;
-- 
1.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread
* [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-17  1:49 [RFC] " John W. Linville
@ 2011-02-18 21:21 ` John W. Linville
  2011-02-19  3:44   ` Nathaniel Smith
  ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: John W. Linville @ 2011-02-18 21:21 UTC (permalink / raw)
  To: linux-wireless; +Cc: johannes, bloat-devel, Nathaniel J. Smith

This is an implementation of the eBDP algorithm as documented in
Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li,
et al.

	http://www.hamilton.ie/tianji_li/buffersizing.pdf

This implementation timestamps an skb before handing it to the
hardware driver, then computes the service time when the frame is
freed by the driver.  An exponentially weighted moving average of
per-fragment service times is used to restrict queueing delays in
the hope of achieving a target fragment transmission latency.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
---
v1 -> v2:
- execute algorithm separately for each WMM queue
- change ewma scaling parameters
- calculate max queue len only when new latency data is received
- stop queues when occupancy limit is reached rather than dropping
- use skb->destructor for tracking queue occupancy

Johannes' comment about tx status reporting being unreliable (and
what he was really saying) finally sunk in.  So, this version uses
skb->destructor to track in-flight fragments.  That should handle
fragments that get silently dropped in the driver for whatever reason
without leaking queue capacity.  Correct me if I'm wrong!

This version runs separate instances of the algorithm for each WMM
queue -- I reckon that makes sense.  I still ignore HT aggregation,
as I'm pretty sure I don't understand how it really affects the
issue.

I still haven't done any objective testing, so still no numbers --
feel free to provide your own! ;-)  But for me, this version "feels"
pretty good where I meet the chair.

There are still bits that I would like to be adjustable.
But since I'm still not sure what this ultimately becomes, I'm in no
hurry to finalize a UI.  I would be happy to consider a contribution
from someone else in that regard.

No doubt that this is all still quite preliminary!  I'm pretty sure
it still eats puppies.  So, don't get it wet!  And never, ever feed
it after midnight...

 net/mac80211/ieee80211_i.h |   12 ++++++++++++
 net/mac80211/iface.c       |   12 ++++++++++++
 net/mac80211/status.c      |   23 +++++++++++++++++++++++
 net/mac80211/tx.c          |   26 ++++++++++++++++++++++++++
 4 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index f2ef15d..47e65bd 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -604,6 +604,18 @@ struct ieee80211_sub_if_data {
 		struct dentry *default_mgmt_key;
 	} debugfs;
 #endif
+
+	struct {
+		/* number of packets currently in flight */
+		atomic_t enqueued;
+
+		/* computed maximum packets in flight */
+		int max_enqueued;
+
+		/* average packet service time */
+		struct ewma tserv_ns_avg;
+	} qdata[4];
+
 	/* must be last, dynamically sized area in this! */
 	struct ieee80211_vif vif;
 };
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 5a4e19b..3e04ddb 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -839,6 +839,8 @@ static void ieee80211_iface_work(struct work_struct *work)
 static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
 				  enum nl80211_iftype type)
 {
+	int i;
+
 	/* clear type-dependent union */
 	memset(&sdata->u, 0, sizeof(sdata->u));
 
@@ -857,6 +859,16 @@ static void ieee80211_setup_sdata(struct ieee80211_sub_if_data *sdata,
 	skb_queue_head_init(&sdata->skb_queue);
 	INIT_WORK(&sdata->work, ieee80211_iface_work);
 
+	for (i = 0; i < 4; i++) {
+		/* initialize service time estimate for buffer control */
+		ewma_init(&sdata->qdata[i].tserv_ns_avg, 1, 64);
+
+		atomic_set(&sdata->qdata[i].enqueued, 0);
+
+		/* default max should be ??? */
+		sdata->qdata[i].max_enqueued = 2;
+	}
+
 	switch (type) {
 	case NL80211_IFTYPE_P2P_GO:
 		type = NL80211_IFTYPE_AP;
diff --git a/net/mac80211/status.c b/net/mac80211/status.c
index 010a559..8cb6539 100644
--- a/net/mac80211/status.c
+++ b/net/mac80211/status.c
@@ -172,6 +172,7 @@ static void ieee80211_frame_acked(struct sta_info *sta, struct sk_buff *skb)
 
 void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 {
+	ktime_t tserv;
 	struct sk_buff *skb2;
 	struct ieee80211_hdr *hdr = (struct ieee80211_hdr *) skb->data;
 	struct ieee80211_local *local = hw_to_local(hw);
@@ -188,6 +189,9 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 	bool send_to_cooked;
 	bool acked;
 
+	/* grab timestamp info for buffer control estimates */
+	tserv = ktime_sub(ktime_get(), skb->tstamp);
+
 	for (i = 0; i < IEEE80211_TX_MAX_RATES; i++) {
 		/* the HW cannot have attempted that rate */
 		if (i >= hw->max_report_rates) {
@@ -208,10 +212,29 @@ void ieee80211_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb)
 	fc = hdr->frame_control;
 
 	for_each_sta_info(local, hdr->addr1, sta, tmp) {
+		int q = skb_get_queue_mapping(skb);
+		unsigned long tserv_ns_avg;
+
 		/* skip wrong virtual interface */
 		if (memcmp(hdr->addr2, sta->sdata->vif.addr, ETH_ALEN))
 			continue;
 
+		/* factor current tserv into service time estimate */
+		ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
+			 ktime_to_ns(tserv));
+
+		/* determine queue admission qualifications */
+		tserv_ns_avg = ewma_read(&sta->sdata->qdata[q].tserv_ns_avg);
+		if (tserv_ns_avg) {
+			/* latency target and min limit should be tunable? */
+			sta->sdata->qdata[q].max_enqueued =
+				max_t(int, 2, 2 * NSEC_PER_MSEC / tserv_ns_avg);
+		}
+		if (atomic_read(&sta->sdata->qdata[q].enqueued) >=
+		    sta->sdata->qdata[q].max_enqueued &&
+		    !ieee80211_queue_stopped(&local->hw, q))
+			ieee80211_stop_queue(&local->hw, q);
+
 		acked = !!(info->flags & IEEE80211_TX_STAT_ACK);
 		if (!acked && test_sta_flags(sta, WLAN_STA_PS_STA)) {
 			/*
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 17ef4f4..cba9060 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1284,6 +1284,20 @@ ieee80211_tx_prepare(struct ieee80211_sub_if_data *sdata,
 	return TX_CONTINUE;
 }
 
+static void ieee80211_tx_skb_free(struct sk_buff *skb)
+{
+	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev);
+	struct ieee80211_local *local = sdata->local;
+	int q = skb_get_queue_mapping(skb);
+
+	/* decrement enqueued count */
+	atomic_dec(&sdata->qdata[q].enqueued);
+
+	/* if queue stopped, wake it */
+	if (ieee80211_queue_stopped(&local->hw, q))
+		ieee80211_wake_queue(&local->hw, q);
+}
+
 static int __ieee80211_tx(struct ieee80211_local *local,
 			  struct sk_buff **skbp,
 			  struct sta_info *sta,
@@ -1323,6 +1337,18 @@ static int __ieee80211_tx(struct ieee80211_local *local,
 
 		sdata = vif_to_sdata(info->control.vif);
 
+		/* timestamp queue entry and increment queue length */
+		skb->tstamp = ktime_get();
+		atomic_inc(&sdata->qdata[q].enqueued);
+
+		/* take ownership of skb */
+		skb_orphan(skb);
+		skb->dev = sdata->dev;
+		skb->destructor = ieee80211_tx_skb_free;
+
+		if (atomic_read(&sdata->qdata[q].enqueued) >=
+		    sdata->qdata[q].max_enqueued)
+			ieee80211_stop_queue(&local->hw, q);
+
 		switch (sdata->vif.type) {
 		case NL80211_IFTYPE_MONITOR:
 			info->control.vif = NULL;
-- 
1.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-18 21:21 ` [RFC v2] " John W. Linville
@ 2011-02-19  3:44   ` Nathaniel Smith
  2011-02-21 18:47     ` John W. Linville
  2011-02-20  0:37   ` Nathaniel Smith
  2011-02-21 15:28   ` Johannes Berg
  2 siblings, 1 reply; 18+ messages in thread
From: Nathaniel Smith @ 2011-02-19  3:44 UTC (permalink / raw)
  To: John W. Linville; +Cc: bloat-devel, johannes, linux-wireless

On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> +	/* grab timestamp info for buffer control estimates */
> +	tserv = ktime_sub(ktime_get(), skb->tstamp);
[...]
> +		ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
> +			 ktime_to_ns(tserv));

I think you're still measuring how long it takes one packet to get
from the end of the queue to the beginning, rather than measuring how
long it takes each packet to go out?

-- Nathaniel

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-19  3:44 ` Nathaniel Smith
@ 2011-02-21 18:47   ` John W. Linville
  2011-02-21 23:26     ` Nathaniel Smith
  0 siblings, 1 reply; 18+ messages in thread
From: John W. Linville @ 2011-02-21 18:47 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: bloat-devel, johannes, linux-wireless

On Fri, Feb 18, 2011 at 07:44:30PM -0800, Nathaniel Smith wrote:
> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
> <linville@tuxdriver.com> wrote:
> > +	/* grab timestamp info for buffer control estimates */
> > +	tserv = ktime_sub(ktime_get(), skb->tstamp);
> [...]
> > +		ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
> > +			 ktime_to_ns(tserv));
>
> I think you're still measuring how long it takes one packet to get
> from the end of the queue to the beginning, rather than measuring how
> long it takes each packet to go out?

Yes, I am measuring how long the driver+device takes to release each
skb back to me (using that as a proxy for how long it takes to get
the fragment to the next hop).  Actually, FWIW I'm only measuring
that time for those skb's that result in a tx status report.

I tried to see how your measurement would be useful, but I just don't
see how the number of frames ahead of me in the queue is relevant to
the measured link latency?  I mean, I realize that having more packets
ahead of me in the queue is likely to increase the latency for this
frame, but I don't understand why I should use that information to
discount the measured latency...?

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-21 18:47 ` John W. Linville
@ 2011-02-21 23:26   ` Nathaniel Smith
  2011-02-23 22:28     ` John W. Linville
  0 siblings, 1 reply; 18+ messages in thread
From: Nathaniel Smith @ 2011-02-21 23:26 UTC (permalink / raw)
  To: John W. Linville; +Cc: bloat-devel, johannes, linux-wireless

On Mon, Feb 21, 2011 at 10:47 AM, John W. Linville
<linville@tuxdriver.com> wrote:
> On Fri, Feb 18, 2011 at 07:44:30PM -0800, Nathaniel Smith wrote:
>> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
>> <linville@tuxdriver.com> wrote:
>> > +	/* grab timestamp info for buffer control estimates */
>> > +	tserv = ktime_sub(ktime_get(), skb->tstamp);
>> [...]
>> > +		ewma_add(&sta->sdata->qdata[q].tserv_ns_avg,
>> > +			 ktime_to_ns(tserv));
>>
>> I think you're still measuring how long it takes one packet to get
>> from the end of the queue to the beginning, rather than measuring how
>> long it takes each packet to go out?
>
> Yes, I am measuring how long the driver+device takes to release each
> skb back to me (using that as a proxy for how long it takes to get
> the fragment to the next hop).  Actually, FWIW I'm only measuring
> that time for those skb's that result in a tx status report.
>
> I tried to see how your measurement would be useful, but I just don't
> see how the number of frames ahead of me in the queue is relevant to
> the measured link latency?  I mean, I realize that having more packets
> ahead of me in the queue is likely to increase the latency for this
> frame, but I don't understand why I should use that information to
> discount the measured latency...?

It depends on which latency you want to measure. The way that I
reasoned was, suppose that at some given time, the card is able to
transmit 1 fragment every T nanoseconds.
Then it can transmit n fragments in n*T nanoseconds, so if we want
the queue depth to be 2 ms, then we have

    n * T = 2 * NSEC_PER_MSEC
    n = 2 * NSEC_PER_MSEC / T

Which is the calculation that you're doing:

+			sta->sdata->qdata[q].max_enqueued =
+				max_t(int, 2, 2 * NSEC_PER_MSEC / tserv_ns_avg);

But for this calculation to make sense, we need T to be the time it
takes the card to transmit 1 fragment. In your patch, you're not
measuring that. You're measuring the total time between when a packet
is enqueued and when it is transmitted; if there were K packets in the
queue ahead of it, then this is the time to send *all* of them --
you're measuring (K+1)*T. That's why in my patch, I recorded the
current size of the queue when each packet is enqueued, so I could
compute T = total_time / (K+1).

Under saturation conditions, K+1 will always equal max_enqueued, so I
guess in your algorithm, at the steady state we have

    max_enqueued = K+1 = 2 * NSEC_PER_MSEC / ((K+1) * T)
    (K+1)^2 = 2 * NSEC_PER_MSEC / T
    K+1 = sqrt(2 * NSEC_PER_MSEC / T)

So I think under saturation, you converge to setting the queue to the
square root of the appropriate size?

-- Nathaniel

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-21 23:26 ` Nathaniel Smith
@ 2011-02-23 22:28   ` John W. Linville
  2011-02-25 18:21     ` Nathaniel Smith
  0 siblings, 1 reply; 18+ messages in thread
From: John W. Linville @ 2011-02-23 22:28 UTC (permalink / raw)
  To: Nathaniel Smith; +Cc: bloat-devel, johannes, linux-wireless

On Mon, Feb 21, 2011 at 03:26:32PM -0800, Nathaniel Smith wrote:
> On Mon, Feb 21, 2011 at 10:47 AM, John W. Linville
> <linville@tuxdriver.com> wrote:
> > I tried to see how your measurement would be useful, but I just don't
> > see how the number of frames ahead of me in the queue is relevant to
> > the measured link latency?  I mean, I realize that having more packets
> > ahead of me in the queue is likely to increase the latency for this
> > frame, but I don't understand why I should use that information to
> > discount the measured latency...?
>
> It depends on which latency you want to measure. The way that I
> reasoned was, suppose that at some given time, the card is able to
> transmit 1 fragment every T nanoseconds. Then it can transmit n
> fragments in n*T nanoseconds, so if we want the queue depth to be 2
> ms, then we have
>
>     n * T = 2 * NSEC_PER_MSEC
>     n = 2 * NSEC_PER_MSEC / T
>
> Which is the calculation that you're doing:
>
> +			sta->sdata->qdata[q].max_enqueued =
> +				max_t(int, 2, 2 * NSEC_PER_MSEC / tserv_ns_avg);
>
> But for this calculation to make sense, we need T to be the time it
> takes the card to transmit 1 fragment. In your patch, you're not
> measuring that. You're measuring the total time between when a packet
> is enqueued and when it is transmitted; if there were K packets in the
> queue ahead of it, then this is the time to send *all* of them --
> you're measuring (K+1)*T. That's why in my patch, I recorded the
> current size of the queue when each packet is enqueued, so I could
> compute T = total_time / (K+1).

Thanks for the math!  I think I see what you are saying now.
Since the measured time is being used to determine the queue size, we
need to factor in the length of the queue that resulted in that
measurement.

Unfortunately, I'm not sure how to apply this with the technique I
am using for the timing measurements. :-(  I'll have to think about
this some more...

John
-- 
John W. Linville		Someday the world will need a hero, and you
linville@tuxdriver.com			might be all we have.  Be ready.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-23 22:28 ` John W. Linville
@ 2011-02-25 18:21   ` Nathaniel Smith
  2011-02-25 18:27     ` Nathaniel Smith
  0 siblings, 1 reply; 18+ messages in thread
From: Nathaniel Smith @ 2011-02-25 18:21 UTC (permalink / raw)
  To: John W. Linville; +Cc: bloat-devel, johannes, linux-wireless

On Wed, Feb 23, 2011 at 2:28 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> On Mon, Feb 21, 2011 at 03:26:32PM -0800, Nathaniel Smith wrote:
>> But for this calculation to make sense, we need T to be the time it
>> takes the card to transmit 1 fragment. In your patch, you're not
>> measuring that. You're measuring the total time between when a packet
>> is enqueued and when it is transmitted; if there were K packets in the
>> queue ahead of it, then this is the time to send *all* of them --
>> you're measuring (K+1)*T. That's why in my patch, I recorded the
>> current size of the queue when each packet is enqueued, so I could
>> compute T = total_time / (K+1).
>
> Thanks for the math!  I think I see what you are saying now.  Since the
> measured time is being used to determine the queue size, we need to
> factor in the length of the queue that resulted in that measurement.
>
> Unfortunately, I'm not sure how to apply this with the technique I
> am using for the timing measurements. :-(  I'll have to think about
> this some more...

I guess an ugly but effective approach would be to steal some bits
from the timestamp...
something like (untested, and probably needs some sign-extension bugs
fixed):

#define LATENCY_BITS 8
#define ENQUEUED_BITS 24
BUILD_BUG_ON(LATENCY_BITS + ENQUEUED_BITS != 32);

static ktime_t _ktime_get_and_stash(int enqueued)
{
	timespec now = ktime_to_timespec(ktime_get());
	now.sec &= (1 << LATENCY_BITS) - 1;
	BUG_ON(enqueued > (1 << ENQUEUED_BITS));
	now.sec |= enqueued << LATENCY_BITS;
	return timespec_to_ktime(now);
}

static u64 _ktime_diff_to_now_and_unstash(ktime_t then, int * enqueued)
{
	timespec ts_then = ktime_to_timespec(then);
	timespec ts_now = ktime_to_timespec(ktime_get());
	*enqueued = ts_then.tv_sec >> LATENCY_BITS;
	ts_then.tv_sec &= (1 << LATENCY_BITS) - 1;
	ts_now.tv_sec &= (1 << LATENCY_BITS) - 1;
	if (ts_now.tv_sec < ts_then.tv_sec)
		ts_now.tv_sec += (1 << LATENCY_BITS);
	timespec_sub(ts_now, ts_then);
}

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-25 18:21 ` Nathaniel Smith
@ 2011-02-25 18:27   ` Nathaniel Smith
  0 siblings, 0 replies; 18+ messages in thread
From: Nathaniel Smith @ 2011-02-25 18:27 UTC (permalink / raw)
  To: John W. Linville; +Cc: bloat-devel, johannes, linux-wireless

On Fri, Feb 25, 2011 at 10:21 AM, Nathaniel Smith <njs@pobox.com> wrote:
> static u64 _ktime_diff_to_now_and_unstash(ktime_t then, int * enqueued)
> {
>	timespec ts_then = ktime_to_timespec(then);
>	timespec ts_now = ktime_to_timespec(ktime_get());
>	*enqueued = ts_then.tv_sec >> LATENCY_BITS;
>	ts_then.tv_sec &= (1 << LATENCY_BITS) - 1;
>	ts_now.tv_sec &= (1 << LATENCY_BITS) - 1;
>	if (ts_now.tv_sec < ts_then.tv_sec)
>		ts_now.tv_sec += (1 << LATENCY_BITS);
>	timespec_sub(ts_now, ts_then);
> }

Err, plus the 'return timespec_to_ns(...)' on the last line, that I
was trying to add when my computer suddenly decided I wanted to send
the message. How embarrassing.

Anyway, not sure this is a *good* idea, but it should work. Hopefully
we don't actually need to measure latencies > 256 seconds in this
context...

-- Nathaniel

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-18 21:21 ` [RFC v2] " John W. Linville
  2011-02-19  3:44   ` Nathaniel Smith
@ 2011-02-20  0:37   ` Nathaniel Smith
  2011-02-20  0:51     ` Jim Gettys
  2011-02-21 18:52     ` John W. Linville
  2011-02-21 15:28   ` Johannes Berg
  2 siblings, 2 replies; 18+ messages in thread
From: Nathaniel Smith @ 2011-02-20  0:37 UTC (permalink / raw)
  To: John W. Linville; +Cc: bloat-devel, johannes, linux-wireless

Actually, a few more comments just occurred to me...

On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
<linville@tuxdriver.com> wrote:
> Johannes' comment about tx status reporting being unreliable (and what
> he was really saying) finally sunk-in.  So, this version uses
> skb->destructor to track in-flight fragments.  That should handle
> fragments that get silently dropped in the driver for whatever reason
> without leaking queue capacity.  Correct me if I'm wrong!

Should we be somehow filtering out and ignoring the packets that get
dropped, when we're calculating the average packet transmission rate?
Presumably they're getting dropped almost instantly, so they don't
really take up queue space and they have abnormally fast transmission
times, which will tend to cause us to overestimate max_enqueued? They
should be rare, though, at least. (And presumably we *should* include
packets that get dropped because their retry timer ran out, since they
were sitting in the queue for that long.) Possibly we should just
ignore any packet that was handled in less than, oh, say, a few
microseconds?

Alternatively, if we aren't worried about those packets, then is
there any reason this patch should be mac80211 specific?
> +static void ieee80211_tx_skb_free(struct sk_buff *skb)
> +{
> +	struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev);
> +	struct ieee80211_local *local = sdata->local;
> +	int q = skb_get_queue_mapping(skb);
> +
> +	/* decrement enqueued count */
> +	atomic_dec(&sdata->qdata[q].enqueued);
> +
> +	/* if queue stopped, wake it */
> +	if (ieee80211_queue_stopped(&local->hw, q))
> +		ieee80211_wake_queue(&local->hw, q);
> +}

I think you need to check that .enqueued is < max_enqueued here,
instead of waking the queue unconditionally. Suppose the data rate
drops while there's a steady flow -- our max_enqueued value will drop
below the current queue size, but because we wake the queue
unconditionally after each packet goes out, and then only stop it
again after we've queued at least one new packet, we might get 'stuck'
with an over-large queue.

-- Nathaniel

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat
  2011-02-20  0:37 ` Nathaniel Smith
@ 2011-02-20  0:51   ` Jim Gettys
  2011-02-20 15:24     ` Dave Täht
  2011-02-21 18:52   ` John W. Linville
  1 sibling, 1 reply; 18+ messages in thread
From: Jim Gettys @ 2011-02-20  0:51 UTC (permalink / raw)
  To: bloat-devel

On 02/19/2011 07:37 PM, Nathaniel Smith wrote:
> Actually, a few more comments just occurred to me...
>
> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville
> <linville@tuxdriver.com> wrote:
>> Johannes' comment about tx status reporting being unreliable (and what
>> he was really saying) finally sunk-in.  So, this version uses
>> skb->destructor to track in-flight fragments.  That should handle
>> fragments that get silently dropped in the driver for whatever reason
>> without leaking queue capacity.  Correct me if I'm wrong!
>
> Should we be somehow filtering out and ignoring the packets that get
> dropped, when we're calculating the average packet transmission rate?
> Presumably they're getting dropped almost instantly, so they don't
> really take up queue space and they have abnormally fast transmission
> times, which will tend to cause us to overestimate max_enqueued? They
> should be rare, though, at least. (And presumably we *should* include
> packets that get dropped because their retry timer ran out, since they
> were sitting in the queue for that long.) Possibly we should just
> ignore any packet that was handled in less than, oh, say, a few
> microseconds?

I will note that AQM algorithms that are likely to work will need to
know the actual "goodput" of the link.  (This from talking to Van
Jacobson about the problem we face.)
			- Jim

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-20 0:51 ` Jim Gettys @ 2011-02-20 15:24 ` Dave Täht 0 siblings, 0 replies; 18+ messages in thread From: Dave Täht @ 2011-02-20 15:24 UTC (permalink / raw) To: Jim Gettys; +Cc: bloat-devel Jim Gettys <jg@freedesktop.org> writes: > On 02/19/2011 07:37 PM, Nathaniel Smith wrote: >> Actually, a few more comments just occurred to me... >> >> On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville >> <linville@tuxdriver.com> wrote: >>> Johannes' comment about tx status reporting being unreliable (and what >>> he was really saying) finally sunk-in. So, this version uses >>> skb->destructor to track in-flight fragments. That should handle >>> fragments that get silently dropped in the driver for whatever reason >>> without leaking queue capacity. Correct me if I'm wrong! >> >> Should we be somehow filtering out and ignoring the packets that get >> dropped, when we're calculating the average packet transmission rate? >> Presumably they're getting dropped almost instantly, so they don't >> really take up queue space and they have abnormally fast transmission >> times, which will tend to cause us to overestimate max_enqueued? They >> should be rare, though, at least. (And presumably we *should* include >> packets that get dropped because their retry timer ran out, since they >> were sitting in the queue for that long.) Possibly we should just >> ignore any packet that was handled in less than, oh, say, a few >> microseconds? The perfect is the enemy of the good. POGE[1] applies. After getting 2 orders of magnitude improvement with the patches thus far... My reaction to the above is that I'd like to get what is being implemented tested by more folk, across more drivers. Me being one of them! I hope to get builds done today for several openwrt devices and my laggy-lagn laptop. > I will note that AQM algorithms that are likely to work will need to > know the actual "goodput" of the link. 
(This from talking to Van > Jacobson about the problem we face.) I care about AQM, I really do[2]... Certainly exposing more hooks and data to the next layer in the stack from the driver would be good... But I'd like mostly simply to get: A) People using and loving debloated drivers, & more people aware of and fixing the problem... B) The dynamic range available via ethtool to userspace to be vastly increased C) Buffer sizes defaulting to much less, across the board. These are the simplest, most effective, low hanging fruit/fixes. [1] We're making great progress on wireless, how do we make progress on wired? -- Dave Taht http://nex-6.taht.net 1: http://en.wikipedia.org/wiki/Principle_of_good_enough 2: I'll patch SFB into my build and any other of the new AQMs anyone can suggest ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-20 0:37 ` Nathaniel Smith 2011-02-20 0:51 ` Jim Gettys @ 2011-02-21 18:52 ` John W. Linville 1 sibling, 0 replies; 18+ messages in thread From: John W. Linville @ 2011-02-21 18:52 UTC (permalink / raw) To: Nathaniel Smith; +Cc: bloat-devel, johannes, linux-wireless On Sat, Feb 19, 2011 at 04:37:00PM -0800, Nathaniel Smith wrote: > Actually, a few more comments just occurred to me... > > On Fri, Feb 18, 2011 at 1:21 PM, John W. Linville > <linville@tuxdriver.com> wrote: > > Johannes' comment about tx status reporting being unreliable (and what > > he was really saying) finally sunk-in. So, this version uses > > skb->destructor to track in-flight fragments. That should handle > > fragments that get silently dropped in the driver for whatever reason > > without leaking queue capacity. Correct me if I'm wrong! > > Should we be somehow filtering out and ignoring the packets that get > dropped, when we're calculating the average packet transmission rate? > Presumably they're getting dropped almost instantly, so they don't > really take up queue space and they have abnormally fast transmission > times, which will tend to cause us to overestimate max_enqueued? They > should be rare, though, at least. (And presumably we *should* include > packets that get dropped because their retry timer ran out, since they > were sitting in the queue for that long.) Possibly we should just > ignore any packet that was handled in less than, oh, say, a few > microseconds? If you look, I only do the timing measurement for frames that result in a tx status report. Frames that are dropped will only hit ieee80211_tx_skb_free (which reduces the enqueued count but doesn't recalculate max_enqueue). > Alternatively, if we aren't worried about those packets, then is there > any reason this patch should be mac80211 specific? No, in fact I was thinking the same thing. 
Some thought needs to be put to whether or not we can ignore the effects of letting dropped packets affect the latency estimate... > > +static void ieee80211_tx_skb_free(struct sk_buff *skb) > > +{ > > + struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(skb->dev); > > + struct ieee80211_local *local = sdata->local; > > + int q = skb_get_queue_mapping(skb); > > + > > + /* decrement enqueued count */ > > + atomic_dec(&sdata->qdata[q].enqueued); > > + > > + /* if queue stopped, wake it */ > > + if (ieee80211_queue_stopped(&local->hw, q)) > > + ieee80211_wake_queue(&local->hw, q); > > +} > > I think you need to check that .enqueued is < max_enqueued here, > instead of waking the queue unconditionally. > > Suppose the data rate drops while there's a steady flow -- our > max_enqueued value will drop below the current queue size, but because > we wake the queue unconditionally after each packet goes out, and then > only stop it again after we've queued at least one new packet, we > might get 'stuck' with an over-large queue. Yes, thanks for pointing that out. My non-thorough tests seem to do a better job at holding the latency lower with that change. Thanks for taking a look! John -- John W. Linville Someday the world will need a hero, and you linville@tuxdriver.com might be all we have. Be ready. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-18 21:21 ` [RFC v2] " John W. Linville 2011-02-19 3:44 ` Nathaniel Smith 2011-02-20 0:37 ` Nathaniel Smith @ 2011-02-21 15:28 ` Johannes Berg 2011-02-21 16:12 ` Jim Gettys 2011-02-21 19:06 ` John W. Linville 2 siblings, 2 replies; 18+ messages in thread From: Johannes Berg @ 2011-02-21 15:28 UTC (permalink / raw) To: John W. Linville; +Cc: bloat-devel, linux-wireless, Nathaniel J. Smith On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote: > This is an implementation of the eBDP algorithm as documented in > Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li, > et al. > > http://www.hamilton.ie/tianji_li/buffersizing.pdf > > This implementation timestamps an skb before handing it to the > hardware driver, then computes the service time when the frame is > freed by the driver. An exponentially weighted moving average of per > fragment service times is used to restrict queueing delays in hopes > of achieving a target fragment transmission latency. > > Signed-off-by: John W. Linville <linville@tuxdriver.com> > --- > v1 -> v2: > - execute algorithm separately for each WMM queue > - change ewma scaling parameters > - calculate max queue len only when new latency data is received > - stop queues when occupancy limit is reached rather than dropping > - use skb->destructor for tracking queue occupancy > > Johannes' comment about tx status reporting being unreliable (and what > he was really saying) finally sunk-in. So, this version uses > skb->destructor to track in-flight fragments. That should handle > fragments that get silently dropped in the driver for whatever reason > without leaking queue capacity. Correct me if I'm wrong! Yeah, I had that idea as well. Could unify the existing skb_orphan() call though :-) However, Nathaniel is right -- if the skb is freed right away during tx() you kinda estimate its queue time to be virtually zero. 
That doesn't make a lot of sense and might in certain conditions exacerbate the problem, for example if the system is out of memory more packets might be allowed through than in normal operation etc. Also, for some USB drivers I believe SKB lifetime has no relation to queue size at all because the data is just shuffled into an URB. I'm not sure we can solve this generically. I'm not really sure how this works for USB drivers, I think they queue up frames with the HCI controller rather than directly with the device. Finally, this isn't taking into account any of the issues about aggregation and AP mode. Remember that both with multiple streams (on different ACs) and even more so going to different stations (AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and let's not forget 11ac/ad) there may be vast differences in the time different frames spend on a queue which are not just due to bloated queues. I'm concerned about this since none of it has been taken into account in the paper you're basing this on, all evaluations seem to be pretty much based on a single traffic stream. Overall, I think there should be some more research first. This might help in some cases, but do we know it won't completely break throughput in other cases? johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-21 15:28 ` Johannes Berg @ 2011-02-21 16:12 ` Jim Gettys 2011-02-21 19:15 ` John W. Linville 2011-02-21 19:06 ` John W. Linville 1 sibling, 1 reply; 18+ messages in thread From: Jim Gettys @ 2011-02-21 16:12 UTC (permalink / raw) To: bloat-devel; +Cc: Dan Williams, Javier Cardona, Dave Woodhouse On 02/21/2011 10:28 AM, Johannes Berg wrote: > On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote: >> This is an implementation of the eBDP algorithm as documented in >> Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li, >> et al. >> >> http://www.hamilton.ie/tianji_li/buffersizing.pdf >> >> This implementation timestamps an skb before handing it to the >> hardware driver, then computes the service time when the frame is >> freed by the driver. An exponentially weighted moving average of per >> fragment service times is used to restrict queueing delays in hopes >> of achieving a target fragment transmission latency. >> >> Signed-off-by: John W. Linville<linville@tuxdriver.com> >> --- >> v1 -> v2: >> - execute algorithm separately for each WMM queue >> - change ewma scaling parameters >> - calculate max queue len only when new latency data is received >> - stop queues when occupancy limit is reached rather than dropping >> - use skb->destructor for tracking queue occupancy >> >> Johannes' comment about tx status reporting being unreliable (and what >> he was really saying) finally sunk-in. So, this version uses >> skb->destructor to track in-flight fragments. That should handle >> fragments that get silently dropped in the driver for whatever reason >> without leaking queue capacity. Correct me if I'm wrong! > > Yeah, I had that idea as well. Could unify the existing skb_orphan() > call though :-) > > However, Nathaniel is right -- if the skb is freed right away during > tx() you kinda estimate its queue time to be virtually zero. 
That > doesn't make a lot of sense and might in certain conditions exacerbate > the problem, for example if the system is out of memory more packets > might be allowed through than in normal operation etc. > > Also, for some USB drivers I believe SKB lifetime has no relation to > queue size at all because the data is just shuffled into an URB. I'm not > sure we can solve this generically. I'm not really sure how this works > for USB drivers, I think they queue up frames with the HCI controller > rather than directly with the device. Let me give a concrete example: I checked with Javier Cardona about the Marvell module (libertas driver) used on OLPC a couple months ago. It turns out there are 4 packets of buffering out in the wireless module itself (clearly needed for autonomous forwarding). There is also one packet buffer in the device driver itself; Dave Woodhouse says it simplified the driver greatly. I don't know if anyone has been thinking about how to manage the buffering from top to bottom, with devices that may do internal buffering in various places. > > Finally, this isn't taking into account any of the issues about > aggregation and AP mode. Remember that both with multiple streams (on > different ACs) and even more so going to different stations > (AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and > let's not forget 11ac/ad) there may be vast differences in the time > different frames spend on a queue which are not just due to bloated > queues. I'm concerned about this since none of it has been taken into > account in the paper you're basing this on, all evaluations seem to be > pretty much based on a single traffic stream. > > Overall, I think there should be some more research first. This might > help in some cases, but do we know it won't completely break throughput > in other cases? > - Jim ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-21 16:12 ` Jim Gettys @ 2011-02-21 19:15 ` John W. Linville 0 siblings, 0 replies; 18+ messages in thread From: John W. Linville @ 2011-02-21 19:15 UTC (permalink / raw) To: Jim Gettys; +Cc: Dan Williams, Javier Cardona, bloat-devel, Dave Woodhouse On Mon, Feb 21, 2011 at 11:12:14AM -0500, Jim Gettys wrote: > On 02/21/2011 10:28 AM, Johannes Berg wrote: > >However, Nathaniel is right -- if the skb is freed right away during > >tx() you kinda estimate its queue time to be virtually zero. That > >doesn't make a lot of sense and might in certain conditions exacerbate > >the problem, for example if the system is out of memory more packets > >might be allowed through than in normal operation etc. For those only reading the bloat lists, I replied elsewhere to clarify that the present patch is only calculating max_enqueued for frames that result in a tx status report (i.e. not for dropped frames) -- just in case that detail is somehow relevant... > >Also, for some USB drivers I believe SKB lifetime has no relation to > >queue size at all because the data is just shuffled into an URB. I'm not > >sure we can solve this generically. I'm not really sure how this works > >for USB drivers, I think they queue up frames with the HCI controller > >rather than directly with the device. > > Let me give a concrete example: > > I checked with Javier Cardona about the Marvell module (libertas > driver) used on OLPC a couple months ago. > > It turns out there are 4 packets of buffering out in the wireless > module itself (clearly needed for autonomous forwarding). > > There is also one packet buffer in the device driver itself; Dave > Woodhouse says it simplified the driver greatly. > > I don't know if anyone has been thinking about how to manage the > buffering from top to bottom, with devices that may do internal > buffering in various places. (FWIW, my current patch won't affect the libertas driver...) 
The role I see my patch playing is to evaluate how good a job the driver/device is doing with its own buffers. If it is keeping its latency low, then I will allow it more fragments to buffer. If its latency grows too large, I throttle the number of fragments I allow the driver to see. To a large degree, it doesn't matter how much buffering the driver/device is doing so long as it is moving the frames along quickly. John -- John W. Linville Someday the world will need a hero, and you linville@tuxdriver.com might be all we have. Be ready. ^ permalink raw reply [flat|nested] 18+ messages in thread
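The throttling John describes (EWMA the per-fragment service time seen at tx-status time, then cap in-flight fragments at roughly target latency divided by service time, per Section IV of the Li et al. paper) can be sketched in user-space C. The constants and the floating-point EWMA here are illustrative assumptions; the actual patch uses the kernel's fixed-point EWMA helpers and different parameters.

```c
#include <assert.h>

#define EWMA_WEIGHT 0.125    /* smoothing factor (assumed, not the patch's) */
#define TARGET_US   2000.0   /* target queueing latency (assumed) */

static double ewma_service_us;  /* smoothed per-fragment service time */

/* Feed in one completed fragment's measured service time; returns the
 * new limit on fragments the driver is allowed to hold (the eBDP idea:
 * occupancy limit = target latency / per-fragment service time). */
static int update_max_enqueued(double service_us)
{
    if (ewma_service_us == 0.0)
        ewma_service_us = service_us;    /* seed the average */
    else
        ewma_service_us += EWMA_WEIGHT * (service_us - ewma_service_us);

    int max = (int)(TARGET_US / ewma_service_us);
    return max > 2 ? max : 2;            /* keep a small floor */
}
```

When the link slows down, the rising service-time average shrinks the allowed occupancy, which is what stops the queue before latency balloons.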
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-21 15:28 ` Johannes Berg 2011-02-21 16:12 ` Jim Gettys @ 2011-02-21 19:06 ` John W. Linville 2011-02-21 20:26 ` Tianji Li 2011-02-28 13:07 ` Johannes Berg 1 sibling, 2 replies; 18+ messages in thread From: John W. Linville @ 2011-02-21 19:06 UTC (permalink / raw) To: Johannes Berg; +Cc: bloat-devel, linux-wireless, Nathaniel J. Smith On Mon, Feb 21, 2011 at 04:28:06PM +0100, Johannes Berg wrote: > On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote: > > This is an implementation of the eBDP algorithm as documented in > > Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li, > > et al. > > > > http://www.hamilton.ie/tianji_li/buffersizing.pdf > > > > This implementation timestamps an skb before handing it to the > > hardware driver, then computes the service time when the frame is > > freed by the driver. An exponentially weighted moving average of per > > fragment service times is used to restrict queueing delays in hopes > > of achieving a target fragment transmission latency. > > > > Signed-off-by: John W. Linville <linville@tuxdriver.com> > > --- > > v1 -> v2: > > - execute algorithm separately for each WMM queue > > - change ewma scaling parameters > > - calculate max queue len only when new latency data is received > > - stop queues when occupancy limit is reached rather than dropping > > - use skb->destructor for tracking queue occupancy > > > > Johannes' comment about tx status reporting being unreliable (and what > > he was really saying) finally sunk-in. So, this version uses > > skb->destructor to track in-flight fragments. That should handle > > fragments that get silently dropped in the driver for whatever reason > > without leaking queue capacity. Correct me if I'm wrong! > > Yeah, I had that idea as well. Could unify the existing skb_orphan() > call though :-) The one in ieee80211_skb_resize? Any idea how that would look? 
> However, Nathaniel is right -- if the skb is freed right away during > tx() you kinda estimate its queue time to be virtually zero. That > doesn't make a lot of sense and might in certain conditions exacerbate > the problem, for example if the system is out of memory more packets > might be allowed through than in normal operation etc. As in my reply to Nathaniel, please notice that the timing estimate (and the max_enqueued calculation) only happens for frames that result in a tx status report -- at least for now... However, if this were generalized beyond mac80211 then we wouldn't be able to rely on tx status reports. I can see that dropping frames in the driver would lead to timing estimates that would cascade into a wide-open queue size. But I'm not sure that would be a big deal, since in the long run those dropped frames should still result in IP cwnd reductions, etc...? > Also, for some USB drivers I believe SKB lifetime has no relation to > queue size at all because the data is just shuffled into an URB. I'm not > sure we can solve this generically. I'm not really sure how this works > for USB drivers, I think they queue up frames with the HCI controller > rather than directly with the device. How do you think the time spent handling URBs in the USB stack relates to the time spent transmitting frames? At what point do those SKBs get freed? > Finally, this isn't taking into account any of the issues about > aggregation and AP mode. Remember that both with multiple streams (on > different ACs) and even more so going to different stations > (AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and > let's not forget 11ac/ad) there may be vast differences in the time > different frames spend on a queue which are not just due to bloated > queues. I'm concerned about this since none of it has been taken into > account in the paper you're basing this on, all evaluations seem to be > pretty much based on a single traffic stream. 
Yeah, I'm still not sure we all have our heads around these issues. I mean, on the one hand it seems wrong to limit queueing for one stream or station just because some other stream or station is higher latency. But on the other hand, it seems to me that those streams/stations still have to share the same link and that higher real latency for one stream/station could still result in a higher perceived latency for another stream/station sharing the same link, since they still have to share the same air...no? > Overall, I think there should be some more research first. This might > help in some cases, but do we know it won't completely break throughput > in other cases? That's why it is posted RFC, of course. :-) John -- John W. Linville Someday the world will need a hero, and you linville@tuxdriver.com might be all we have. Be ready. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-21 19:06 ` John W. Linville @ 2011-02-21 20:26 ` Tianji Li 2011-02-28 13:07 ` Johannes Berg 1 sibling, 0 replies; 18+ messages in thread From: Tianji Li @ 2011-02-21 20:26 UTC (permalink / raw) To: John W. Linville, Johannes Berg, Nathaniel J. Smith Cc: linux-wireless, bloat-devel On 02/21/2011 01:06 PM, John W. Linville wrote: > On Mon, Feb 21, 2011 at 04:28:06PM +0100, Johannes Berg wrote: >> On Fri, 2011-02-18 at 16:21 -0500, John W. Linville wrote: >>> This is an implementation of the eBDP algorithm as documented in >>> Section IV of "Buffer Sizing for 802.11 Based Networks" by Tianji Li, >>> et al. >>> >>> http://www.hamilton.ie/tianji_li/buffersizing.pdf >>> >>> This implementation timestamps an skb before handing it to the >>> hardware driver, then computes the service time when the frame is >>> freed by the driver. An exponentially weighted moving average of per >>> fragment service times is used to restrict queueing delays in hopes >>> of achieving a target fragment transmission latency. >>> >>> Signed-off-by: John W. Linville<linville@tuxdriver.com> >>> --- >>> v1 -> v2: >>> - execute algorithm separately for each WMM queue >>> - change ewma scaling parameters >>> - calculate max queue len only when new latency data is received >>> - stop queues when occupancy limit is reached rather than dropping >>> - use skb->destructor for tracking queue occupancy >>> >>> Johannes' comment about tx status reporting being unreliable (and what >>> he was really saying) finally sunk-in. So, this version uses >>> skb->destructor to track in-flight fragments. That should handle >>> fragments that get silently dropped in the driver for whatever reason >>> without leaking queue capacity. Correct me if I'm wrong! >> >> Yeah, I had that idea as well. Could unify the existing skb_orphan() >> call though :-) > > The one in ieee80211_skb_resize? Any idea how that would look? 
> >> However, Nathaniel is right -- if the skb is freed right away during >> tx() you kinda estimate its queue time to be virtually zero. That >> doesn't make a lot of sense and might in certain conditions exacerbate >> the problem, for example if the system is out of memory more packets >> might be allowed through than in normal operation etc. > > As in my reply to Nathaniel, please notice that the timing estimate > (and the max_enqueued calculation) only happens for frames that result > in a tx status report -- at least for now... > > However, if this were generalized beyond mac80211 then we wouldn't > be able to rely on tx status reports. I can see that dropping frames > in the driver would lead to timing estimates that would cascade into > a wide-open queue size. But I'm not sure that would be a big deal, > since in the long run those dropped frames should still result in IP > cwnd reductions, etc...? > >> Also, for some USB drivers I believe SKB lifetime has no relation to >> queue size at all because the data is just shuffled into an URB. I'm not >> sure we can solve this generically. I'm not really sure how this works >> for USB drivers, I think they queue up frames with the HCI controller >> rather than directly with the device. > > How do you think the time spent handling URBs in the USB stack relates > to the time spent transmitting frames? At what point do those SKBs > get freed? > >> Finally, this isn't taking into account any of the issues about >> aggregation and AP mode. Remember that both with multiple streams (on >> different ACs) and even more so going to different stations >> (AP/IBSS/mesh modes, and likely soon even in STA mode with (T)DLS, and >> let's not forget 11ac/ad) there may be vast differences in the time >> different frames spend on a queue which are not just due to bloated >> queues. 
I'm concerned about this since none of it has been taken into >> account in the paper you're basing this on, all evaluations seem to be >> pretty much based on a single traffic stream. > > Yeah, I'm still not sure we all have our heads around these issues. > I mean, on the one hand it seems wrong to limit queueing for one > stream or station just because some other stream or station is > higher latency. But on the other hand, it seems to me that those > streams/stations still have to share the same link and that higher > real latency for one stream/station could still result in a higher > perceived latency for another stream/station sharing the same link, > since they still have to share the same air...no? This is a good point. A buffer builds up when there are long-lived TCP flows. They can block the buffer since they are elastic in the sense that they send more packets when previous ones are acknowledged, which means that if the flow is long, lots of packets will arrive at the buffer almost at the same time. If the buffer is large, the wait to be serviced can be long. This is fine for long-lived flows since when we are downloading a large file, we do not much care whether we are done in 2 minutes or 3. However, if there are a couple of email checks, no one can tolerate that a 'fresh' click takes 3-2=1 minute. To mitigate, we shorten the buffer sizes (by dropping) so that the waiting can be shorter. Since the long-lived flows are dominating, dropping is much more likely to happen on them; some packets from short-lived flows can also be dropped if they are unlucky. Still, due to the elastic (it is both bad and good :-)) nature of TCP, the dropping on long-lived flows makes them back off, which gives time to the short ones. If a buffer is not used by elastic traffic, there is no need to do buffering. (Note that UDP can be elastic as well. The application layer above UDP normally has some logic to back off if the waiting is too long.) In the 802.11 standard, there is only one queue, while in 802.11e/n there are four, and by default only one of them is used for TCP (but this can be changed). There are some other queues in the driver for control purposes, but they do not count. In our paper, we were doing buffer sizing on the 802.11e/n TCP queue only. For 802.11, we need a few buffers on top of the 802.11 standard one to mimic those of 802.11e, and use sizing on the TCP buffer only. The scheduling of which queues should be active at the MAC layer is another issue, which cannot be solved with the sizing logic. AQM may or may not be a better option, but the issue is that it is not enabled even though it is so well known. My 2 cents, Tianji > >> Overall, I think there should be some more research first. This might >> help in some cases, but do we know it won't completely break throughput >> in other cases? > > That's why it is posted RFC, of course. :-) > > John ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat 2011-02-21 19:06 ` John W. Linville 2011-02-21 20:26 ` Tianji Li @ 2011-02-28 13:07 ` Johannes Berg 1 sibling, 0 replies; 18+ messages in thread From: Johannes Berg @ 2011-02-28 13:07 UTC (permalink / raw) To: John W. Linville; +Cc: bloat-devel, linux-wireless, Nathaniel J. Smith On Mon, 2011-02-21 at 14:06 -0500, John W. Linville wrote: > > Yeah, I had that idea as well. Could unify the existing skb_orphan() > > call though :-) > > The one in ieee80211_skb_resize? Any idea how that would look? Yeah. I think it'd have to be moved out of _skb_resize and made unconditional in that path, since eventually with this patch you'd do it anyway. > As in my reply to Nathaniel, please notice that the timing estimate > (and the max_enqueued calculation) only happens for frames that result > in a tx status report -- at least for now... Oops, right. > However, if this were generalized beyond mac80211 then we wouldn't > be able to rely on tx status reports. I can see that dropping frames > in the driver would lead to timing estimates that would cascade into > a wide-open queue size. But I'm not sure that would be a big deal, > since in the long run those dropped frames should still result in IP > cwnd reductions, etc...? I don't think we can generically rely on skb_orphan() in the network stack since that will make socket buffer limits meaningless. In fact, it pains me a bit that we had to do this in wireless before buffering the skb, and doing it unconditionally may be worse? > How do you think the time spent handling URBs in the USB stack relates > to the time spent transmitting frames? At what point do those SKBs > get freed? I honestly don't know. I would hope they are only freed when the URB was processed (i.e. at least DMA'd to the target device) but I suppose a driver might also copy the TX frame completely. > Yeah, I'm still not sure we all have our heads around these issues. 
> I mean, on the one hand it seems wrong to limit queueing for one > stream or station just because some other stream or station is > higher latency. But on the other hand, it seems to me that those > streams/stations still have to share the same link and that higher > real latency for one stream/station could still result in a higher > perceived latency for another stream/station sharing the same link, > since they still have to share the same air...no? Yeah, but retries (robustness) and aggregation (throughput) will invariably affect latency for everybody else using the shared medium. I suppose it would be better if queueing would be limited to a certain amount of air time use *per peer station*, so that each connection can have fairly low latency. However, this seems much harder to do. But what could happen here is that bursty traffic to a far-away (slow) station severely affects latency for a closer station, because the high traffic to that closer station caused a buffering increase. johannes ^ permalink raw reply [flat|nested] 18+ messages in thread
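The per-station airtime limit Johannes suggests could, for instance, take the shape of a deficit-round-robin pass where each peer spends airtime credit rather than a frame count. This is a hypothetical user-space sketch of that idea, not anything in the patch or in mac80211; the quantum and per-frame airtime costs are made-up numbers.

```c
#include <assert.h>

#define QUANTUM_US 1000  /* airtime credit granted per scheduling round (assumed) */

struct station {
    int deficit_us;      /* accumulated airtime credit */
    int frame_cost_us;   /* airtime one frame costs at this peer's rate */
    int backlog;         /* frames queued for this peer */
    int sent;            /* frames actually transmitted */
};

/* One scheduling round: each station may transmit only while it has
 * credit, so a slow (high airtime-per-frame) peer cannot starve a
 * fast one -- they get equal air time, not equal frame counts. */
static void airtime_round(struct station *sta, int n)
{
    for (int i = 0; i < n; i++) {
        sta[i].deficit_us += QUANTUM_US;
        while (sta[i].backlog > 0 &&
               sta[i].deficit_us >= sta[i].frame_cost_us) {
            sta[i].deficit_us -= sta[i].frame_cost_us;
            sta[i].backlog--;
            sta[i].sent++;
        }
    }
}
```

A fast peer (100 us/frame) and a slow one (1000 us/frame) then drain at a 10:1 frame ratio per round while consuming the same airtime, which is the fairness property being asked for.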
end of thread, other threads:[~2011-02-28 13:08 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <x1-oTZGm1A7eclvABnv1aK0z1Nc7iI@gwene.org> 2011-02-20 1:59 ` [RFC v2] mac80211: implement eBDP algorithm to fight bufferbloat Dave Täht 2011-02-17 1:49 [RFC] " John W. Linville 2011-02-18 21:21 ` [RFC v2] " John W. Linville 2011-02-19 3:44 ` Nathaniel Smith 2011-02-21 18:47 ` John W. Linville 2011-02-21 23:26 ` Nathaniel Smith 2011-02-23 22:28 ` John W. Linville 2011-02-25 18:21 ` Nathaniel Smith 2011-02-25 18:27 ` Nathaniel Smith 2011-02-20 0:37 ` Nathaniel Smith 2011-02-20 0:51 ` Jim Gettys 2011-02-20 15:24 ` Dave Täht 2011-02-21 18:52 ` John W. Linville 2011-02-21 15:28 ` Johannes Berg 2011-02-21 16:12 ` Jim Gettys 2011-02-21 19:15 ` John W. Linville 2011-02-21 19:06 ` John W. Linville 2011-02-21 20:26 ` Tianji Li 2011-02-28 13:07 ` Johannes Berg