Date: Mon, 3 Aug 2015 19:37:47 -0400 (EDT)
From: dpreed@reed.com
To: "David Lang"
Cc: Jonathan Morton, cerowrt-devel@lists.bufferbloat.net, make-wifi-fast@lists.bufferbloat.net
Message-ID: <1438645067.22797794@apps.rackspace.com>
References: <356F5FEE-9FBD-4FF9-AC17-86A642D918A4@gmail.com> <5CC1DC90-DFAF-4A4D-8204-16CD4E20D6E3@gmx.de> <4D24A497-5784-493D-B409-F704804326A7@gmx.de> <1438361254.45977158@apps.rackspace.com> <1438616670.710822730@apps.rackspace.com>
Subject: Re: [Cerowrt-devel] [Make-wifi-fast] [tsvwg] Comments on draft-szigeti-tsvwg-ieee-802-11e
I design and build physical layer radio hardware (using SDR reception and transmission in the 5 GHz and 10 GHz Amateur radio bands).

Fairness is easy in a MAC. 1 usec. is 1,000 linear feet. If the next station knows when its turn is, it can start transmitting within a couple of microseconds of seeing the tail of the last packet, and if there is adequate "sounding" of the physical environment, you can calculate tighter bounds than that. Even if the transmission is 1 Gb/sec, 1 usec. is only 1,000 bits at most.

But at the end-to-end layer, today's networks are at most 20 msec. end-to-end across a continent. The most buffering you want to see on an end-to-end basis is 10 msec.

I disagree strongly that "mice" - small packets - need to be compressed. Most small packets are very, very latency sensitive (acks, etc.). As long as they are a relatively small portion of capacity, they aren't the place to trade latency degradation for throughput. That's another example of focusing on the link rather than the end-to-end network context.

(Local-link acks can be piggy-backed in various ways, so when the local Wireless Ethernet domain is congested, there should be no naked ack packets.)

Why does anyone measure a link in terms of small-packet efficiency?
The end-to-end protocols shouldn't be sending small packets once a queue builds up at the source endpoint.


On Monday, August 3, 2015 12:14pm, "David Lang" said:

> On Mon, 3 Aug 2015, dpreed@reed.com wrote:
>
> > It's not infeasible to make queues shorter. In any case, the throughput of a
> > link does not increase above the point where there is always one packet ready
> > to go by the time the currently outgoing packet is completed. It physically
> > cannot do better than that.
>
> Change 'one packet' to 'one transmission's worth of packets' and I'll agree.
>
> > If hardware designers can't create an interface that achieves that bound I'd
> > be suspicious that they understand how to design hardware. In the case of
> > WiFi, this also includes the MAC protocol being designed so that when the
> > current packet on the air terminates, the next packet can be immediately begun
> > - that's a little more subtle.
>
> On a shared medium (like radio) things are a bit messier.
>
> There are two issues:
>
> 1. You shouldn't just transmit to a new station once you finish sending to the
> first. Fairness requires that you pause and give other stations a chance to
> transmit as well.
>
> 2. There is per-transmission overhead (including the pause mentioned above) that
> can be very significant for small packets, so there is considerable value in
> sending multiple packets at once. It's a lighter version of what you run into
> inserting into reliable databases: you can insert 1000 records in about the same
> time you can insert 2 records separately.
>
> The "stock" answer to this is for hardware and software folks to hold off on
> sending anything in case there is more to send later that it can be batched with.
> This maximizes throughput at the cost of latency.
>
> What should be done instead is to send what you have immediately, and while it's
> sending, queue whatever continues to arrive; then the next chance you have to
> send, you will have more to send. This scales the batch size with congestion,
> minimizing latency at the cost of keeping the channel continually busy, but
> inefficiently busy if you aren't at capacity.
>
> > But my point here is that one needs to look at the efficiency of the system as
> > a whole (in context), and paradoxically to the hardware designer mindset, the
> > proper way to think about that efficiency is NOT about link throughput
> > maximization - instead it is an end-to-end property. One has very little to
> > do with the other. Queueing doesn't affect link throughput beyond the "double
> > buffering" effect noted above: at most one packet queued behind the currently
> > transmitting packet.
> >
> > Regarding TXOP overhead - rather than complicated queueing, just allow packets
> > to be placed in line *while the currently transmitting packet is going out*,
> > and changed up to the point in time when they begin transmitting. This is
> > trivial in hardware.
>
> This is a key thing that a lot of hardware and software folks get wrong. All the
> complexity and bugs that you see around 'blocking/plugging' flows are the result
> of this mistake. As you say, send something as soon as you have it to send. If
> there's more arriving, let it accumulate while the first bit is being sent, and
> the next chance you get to send, send everything that's accumulated.
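The send-immediately, accumulate-while-sending discipline described in the exchange above can be sketched as follows (a minimal illustration, not code from the thread; the `queue.Queue` and the `drain_batch` helper are hypothetical stand-ins for a driver's real transmit path):

```python
import queue

def drain_batch(q):
    """Block for the first packet, then sweep up everything already queued.

    No hold-off timer and no minimum batch size: whatever arrived while
    the previous burst was on the air goes out together in the next one,
    so the batch size scales with congestion by itself.
    """
    batch = [q.get()]                      # send something as soon as we have it
    while True:
        try:
            batch.append(q.get_nowait())   # later arrivals join this burst
        except queue.Empty:                # nothing else waiting; stop sweeping
            break
    return batch
```

Under light load each call returns a single packet, so nothing ever waits on a timer; under load, everything that arrived during the previous burst goes out together at the next transmit opportunity.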
> This minimizes latency, greatly simplifies the code (no need for timers to
> unblock/release the data if more doesn't arrive), and results in the exact same
> throughput under load.
>
> It does have some interesting changes to the utilization curve at part load.
> These could be a problem with wifi under some conditions, but I think the
> trade-off is worth it, since the wifi is going to end up running up to its limit
> sometime anyway, and the part-load problems are just previews of what you would
> run into at full load.
>
> David Lang
>
> > On Friday, July 31, 2015 1:04pm, "Jonathan Morton" said:
> >
> >> I think that is achievable, *even if there is a WiFi network in the
> >> middle*, by thinking about the fact that the shared airwaves in a WiFi network
> >> behave like a single link, so all the queues on individual stations are really
> >> *one queue*, and that the optimal behavior of that link will be achieved if there
> >> is at most one packet queued at a time.
> >
> > I agree that queues should be kept short in general. However, I don't think
> > single-packet queues are achievable in the general case.
> >
> > The general case includes Wi-Fi networks, whose TXOP overhead is so ruinously
> > heavy that sending single MTU-sized packets is inefficient. Aggregating multiple
> > packets into one TXOP requires those several packets to be present in the buffer
> > at that moment.
> >
> > The general case includes links which vary in throughput frequently, perhaps
> > on shorter timescales than an RTT, so either packets must be buffered or capacity
> > is left unused. This also happens to include Wi-Fi, but could easily include a
> > standard wired link whose competing load varies.
> >
> > The endpoints do not have and do not receive sufficient information in
> > sufficient time to reliably make packets arrive at nodes just in time to be
> > transmitted.
> > Not even with ECN, not even with the wet dreams of the DCTCP folks, and not
> > even with ELR (though ELR should be able to make it happen under steady
> > conditions, there are still transient conditions in the general case).
> >
> > - Jonathan Morton
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
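As a rough back-of-envelope check on the TXOP-overhead point in the exchange above (the 100 us fixed per-TXOP cost and 300 Mb/s rate below are round illustrative assumptions, not figures from the thread or from the 802.11 standard):

```python
def txop_efficiency(n_pkts, pkt_bits=1500 * 8, rate_bps=300e6, overhead_s=100e-6):
    """Fraction of air time carrying payload when n_pkts packets share one TXOP.

    Assumes a fixed per-TXOP cost (contention, preamble, interframe spacing,
    acknowledgement) paid once no matter how many packets are aggregated.
    All parameter values are illustrative round numbers.
    """
    payload_s = n_pkts * pkt_bits / rate_bps
    return payload_s / (payload_s + overhead_s)

print(round(txop_efficiency(1), 2))   # one MTU per TXOP: air time mostly overhead (0.29)
print(round(txop_efficiency(32), 2))  # 32 aggregated packets: overhead amortized (0.93)
```

Under these assumptions a single 1500-byte packet uses the air at under 30% efficiency, while a 32-packet aggregate exceeds 90% - which is why several packets must already be present in the buffer at the moment the TXOP is won.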
