* [Bloat] some comments on draft-ietf-tsvwg-byte-pkt-congest-10.txt
From: Dave Taht @ 2013-06-17 19:40 UTC (permalink / raw)
To: tsvwg, bloat, iccrg
This draft "updates" RFC2309 which has already been obsoleted by one
of it's original authors and a replacement draft (
http://tools.ietf.org/html/draft-baker-aqm-recommendation-00 ) is in
progress. Be that as it may...
This draft starts off on the wrong foot, and proceeds downhill
rapidly. I am glad that someone is trying to update the BCP for
current RED usage, but in general I think it is incorrect to
extrapolate from RED's behavior to other AQM behaviors in many cases,
so...
" This document provides recommendations of best current practice for
dropping or marking packets using any active queue management (AQM)
algorithm, such as random early detection (RED), BLUE, pre-congestion
notification (PCN), etc. "
By excluding DRR, SFQ, SQF, Codel, FQ_Codel, PIE and others, and
attempting to generalize from experiences with RED to all AQM
technologies, it does its potential readers a disservice.
I'd change the first sentence to:
" This document provides recommendations of best current practice for
dropping or marking packets using the RED active queue management
(AQM) algorithm, using packet drop and congestion notification"
From that, it is possible to generalize the following, although the
backing argument is suspect, which I'll get into later...
" We give three strong recommendations: (1)
packet size should be taken into account when transports read and
respond to congestion indications, (2) packet size should not be
taken into account when network equipment creates congestion signals
(marking, dropping), and therefore (3) in the specific case of RED,
the byte-mode packet drop variant that drops fewer small packets
should not be used. "
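For anyone who hasn't internalized the RED variants: byte-mode drop
scales the drop probability by packet size, so small packets are
preferentially spared. A minimal sketch in C - invented names, not
the actual Linux sch_red code:

#include <stdbool.h>
#include <stdlib.h>

/* Packet-mode: every packet sees the same drop probability p. */
bool red_drop_packet_mode(double p)
{
    return drand48() < p;
}

/* Byte-mode: drop probability scales with packet size, so a 64-byte
 * ACK is ~23x less likely to be dropped than a 1500-byte data packet.
 * This is the variant the draft deprecates. */
bool red_drop_byte_mode(double p, size_t pkt_len, size_t max_pkt_len)
{
    return drand48() < p * (double)pkt_len / (double)max_pkt_len;
}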
"This memo updates RFC 2309 to deprecate
deliberate preferential treatment of small packets in AQM algorithms."
As RFC2309 itself is being obsoleted, we're going around in circles here.
Before tackling byte-pkt-congest-10 directly - a couple of asides...
...snip snip...
RFC2309 refers to this 1994 paper:
"On the Self-Similar Nature of Ethernet Traffic (Extended Version)"
I would certainly like the model and analysis of this paper repeated
against modern traffic patterns. Has anyone done this? I loved this
paper when it came out....
I heartily agree with the last paragraph in section 2 of RFC2309:
" In short, scheduling algorithms and queue management should be
seen as complementary, not as replacements for each other."
And if we can agree that AQM = Active queue *length* management, and
can come up with a name for FQ+AQM hybrids that works for people (SQM
- smart queue management?), so we know what we're talking about,
certain bits in section 3 get easier to deal with.
Because of the overloading of the term AQM, I'm going to use SQM
throughout what I write below.
As one last general example of problems with RFC2309, some of the
references are to protocols so ancient and non-deployed as to render
the relevant arguments moot:
" voice and video, and also multicast bulk data transport [SRM96].
If no action is taken, such unresponsive flows could lead to a new
congestive collapse.
In general, all UDP-based streaming applications should
incorporate effective congestion avoidance mechanisms. For
example, recent research has shown the possibility of
incorporating congestion avoidance mechanisms such as Receiver-
driven Layered Multicast (RLM) within UDP-based streaming
applications such as packet video [McCanne96] [Bolot94]. Further
research and development on ways to accomplish congestion
avoidance for streaming applications will be very important."
It would be nice for a new draft of some sort to have relevance to
new stuff like webrtc, rather than to non-deployed 20-year-old
protocols.
...ok back to the pkt-congest draft...
" Consensus has emerged over the years concerning the first stage: if
queues cannot be measured in time, whether they should be measured in
bytes or packets. Section 2.1 of this memo records this consensus in
the RFC Series. In summary the choice solely depends on whether the
resource is congested by bytes or packets."
Measuring queues in time is totally feasible, as shown by PIE and
codel, and by implementations on weak hardware such as MIPS. Last
week I got fq_codel running on a Raspberry Pi and a BeagleBone with
no observable hit on CPU usage vs pfifo_fast... I'll do it on an
Arduino if that's what it takes to make the point on even harder
hardware. Anybody got a 68020 or slower to play with?
So an alternative formulation that makes sense is:
" When queues cannot be measured in time, should they be measured in
bytes or packets?
And dropping the rest of this:
" Section 2.1 of this memo records this consensus in
the RFC Series. In summary the choice solely depends on whether the
resource is congested by bytes or packets."
Moving on:
" This memo updates [RFC2309] to deprecate deliberate preferential
treatment of small packets in AQM algorithms. It recommends that (1)
packet size should be taken into account when transports read
congestion indications, (2) not when network equipment writes them.
This memo also adds to the congestion control principles enumerated
in BCP 41 [RFC2914]."
s/AQM/the RED/g across the draft
Still... to get to the meat of my own complaint with the draft and
the philosophy espoused within, I'll pull out two example paragraphs
and try to make my argument....
"However, at the transport layer, TCP congestion control is a widely
deployed protocol that doesn't scale with packet size. To date this
hasn't been a significant problem because most TCP implementations
have been used with similar packet sizes. But, as we design new
congestion control mechanisms, this memo recommends that we should
build in scaling with packet size rather than assuming we should
follow TCP's example."
"Although many control packets happen to be
small, the alternative of network equipment favouring all small
packets would be dangerous. That would create perverse incentives to
split data transfers into smaller packets."
Packets were (at least originally) "a small group or package of
anything". I view the incentives to create larger packets as *far,
far, far, far more perverse* than the incentives to create smaller
ones. There will always be header overhead, there will always be
small signalling packets, and there will always be bulk data that can
*always* be broken up into smaller packets - bulk data that gets in
the way of more interactive traffic if it isn't broken up.
In my talks of late, I toss a 64GB memory stick across the room at
someone - wow! check out that bandwidth! 64GB/sec!! - and then
challenge them to read the last byte of a file on it sequentially.
(hint: most USB sticks barely do 8MB/sec, and you'll be waiting a
looong time)
If it were up to me I'd have held the internet's MTU to 576 bytes
until everybody was running at greater than 1Mbit, and the ratio
between the largest packet and smallest held to less than 10:1 for
eternity.
Big packets affect latency, badly. They are also subject to much
higher potential rates of error. At 1500 bytes, header overhead, even
with ipv6, is pretty minimal, and few gains can be had by going
bigger. Jumbo frames are accepted in only a few circumstances, etc.
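The latency cost of a big packet on a slow link is just arithmetic -
a quick sketch you can check by hand:

#include <stdio.h>

/* How long one packet occupies the wire, in milliseconds. */
static double serialization_ms(double bytes, double mbit_per_sec)
{
    return (bytes * 8.0) / (mbit_per_sec * 1000.0);
}

int main(void)
{
    /* At 1 Mbit/s a single 1500-byte packet blocks the link for
     * 12 ms - by itself enough to wreck a VoIP jitter budget - while
     * a 576-byte packet costs 4.6 ms and a 64-byte ack 0.5 ms. */
    printf("1500B @ 1Mbit = %.1f ms\n", serialization_ms(1500, 1));
    printf(" 576B @ 1Mbit = %.1f ms\n", serialization_ms(576, 1));
    printf("  64B @ 1Mbit = %.2f ms\n", serialization_ms(64, 1));
    return 0;
}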
So the argument and conclusions about larger packet sizes that
permeate this document are the opposite of the argument I'd make,
throughout.
So, as one example:
"6. Security Considerations
This memo recommends that queues do not bias drop probability towards
small packets as this creates a perverse incentive for transports to
break down their flows into tiny segments. One of the benefits of
implementing AQM was meant to be to remove this perverse incentive
that drop-tail queues gave to small packets."
I think the author has got the intended statement wrong ("away from"
rather than "towards"?)...
and my take on it is that drop-tail's behavior towards small packets
was indeed very desirable, and should be retained in an SQM, to keep
latencies low under overload and to create incentives for
right-sizing packets per their actual transport needs.
"In
summary, it says that making drop probability depend on the size of
the packets that bits happen to be divided into simply encourages the
bits to be divided into smaller packets. "
YEA!
" Byte-mode drop would
therefore irreversibly complicate any attempt to fix the Internet's
incentive structures."
s/complicate/enhance/
The document refers to things like RFC5690, and I long ago lost hope
for ECN.
The document then makes some blanket statements that aren't backed by
data (that I'm aware of).
Section 6
" In practice, transports cannot all be trusted to respond to
congestion. So another reason for recommending that queues do not
bias drop probability towards small packets is to avoid the
vulnerability to small packet DDoS attacks that would otherwise
result. One of the benefits of implementing AQM was meant to be to
remove drop-tail's DoS vulnerability to small packets, so we
shouldn't add it back again."
I am only aware of a few small-packet attacks (most UDP DNS attacks
actually aim for amplification). A strict DDoS based on small packets
can be made effective against drop-tail and somewhat effective
against RED.
Most attacks I've looked at are actually MUCH less effective against
an SQM like fq_codel, given the random hash and equal service
guarantees.
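Roughly why: a sketch of fq_codel-style flow classification, with an
invented stand-in for the kernel's perturbed jhash over the 5-tuple:

#include <stdint.h>

#define NUM_QUEUES 1024

/* Toy mixer standing in for the kernel's jhash; the perturbation is
 * a random value the attacker can't observe. */
static uint32_t mix(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x9e3779b1u;
    return h ^ (h >> 16);
}

/* A flood from one 5-tuple lands in one of 1024 queues; DRR still
 * gives every other queue its share, and codel drains the flooded
 * one. Hurting N queues takes N distinct flows, and the random
 * perturbation keeps the attacker from aiming at specific buckets. */
unsigned classify(uint32_t saddr, uint32_t daddr, uint16_t sport,
                  uint16_t dport, uint8_t proto, uint32_t perturbation)
{
    uint32_t h = perturbation;
    h = mix(h, saddr);
    h = mix(h, daddr);
    h = mix(h, ((uint32_t)sport << 16) | dport);
    h = mix(h, proto);
    return h % NUM_QUEUES;
}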
I would certainly like to take a hard look at the tools used in the
attacks that motivated Robust RED; everything in section 6 seems
highly theoretical and needs proof.
Lastly:
Section 5.2 needs to be rethought in the age of wireless, which is
often TXOP-congested rather than congested in either of the two ways
identified - and that is kind of pressing.
"5.2. Bit- & Packet-congestible Network
The position is much less clear-cut if the Internet becomes populated
by a more even mix of both packet-congestible and bit-congestible
resources (see Appendix B.2). This problem is not pressing, because
most Internet resources are designed to be bit-congestible before
packet processing starts to congest (see Section 1.1)."
* Re: [Bloat] some comments on draft-ietf-tsvwg-byte-pkt-congest-10.txt
From: Jonathan Morton @ 2013-06-18 3:40 UTC (permalink / raw)
To: Dave Taht; +Cc: iccrg, tsvwg, bloat
On 17 Jun, 2013, at 10:40 pm, Dave Taht wrote:
> if we can agree that AQM = Active queue *length* management, and
> can come up with a name for FQ+AQM hybrids that works for people (SQM
> - smart queue management?), so we know what we're talking about
SQ = Smart Queueing sounds good to me, for AQM+FQ together. It should probably also be taken to include any third or future category of techniques that is found to work well in concert.
For the fourth (original) corner of the graph, PQ could mean Passive Queueing, meaning neither AQM nor FQ - this would have to refer to ordinary priority queues and packet aggregation as well as a dumb FIFO. That means we need a robust definition of FQ.
So AQM means any technique which seeks to maintain the average length of the queue below the maximum, by proactively dropping or ECN-marking packets.
And FQ means any technique which uses a separate queue for each flow, or stochastically approaches that ideal. Broadly classifying packets into a small number of categories (as has been done since the TOS days) is not sufficient to be FQ.
An example of a third technique would be such broad packet classification, which allocates a separate AQM+FQ combination for each category and dequeues them according to a priority algorithm. We've discussed such things quite recently; the main problem seems to be identifying traffic reliably without application-level cooperation - a perennial difficulty.
> Anybody got a 68020 or slower to play with?
I have a Mac IIcx (16MHz 68030+FPU) with an Ethernet card - an '030 is pretty close to an '020 in performance per clock, if the '020 isn't using an MMU. It has enough RAM to run Linux effectively. It doesn't have DMA, though - a common limitation on single-user machines of the time - which is a severe limit on throughput. Also, it probably needs some major surgery due to age-related component degradation (capacitors), since it currently won't power up. Ironically, I ran into that problem just after fitting the new RAM, though it had been showing signs of it beforehand.
The next closest thing I've got is a 25MHz 486 (an IBM PS/1 in which I swapped the original i486SX for a DX) with two Ethernet cards (one is the well-regarded 3c509B) and an analogue modem fitted; a leftover from my university days a decade ago. I know that it can route T1 level speeds despite not having full-duplex Ethernet or PCI. I'd need to set up Linux afresh on it, of course, but then I could insert it in place of my usual firewall machine for testing. That is, of course, if it hasn't succumbed to the same class of fault as the IIcx.
The next candidate I have after that is an Acorn RiscPC, but I think a 30MHz ARM CPU is sufficiently faster than a 68020 - *any* 68020 - to dilute the point. Also, I don't have an Ethernet card for it, and even if I did, it also lacks DMA hardware AFAIK. However, you could downclock the Raspberry Pi's CPU quite a lot (edit config.txt in /boot) if you want to explore this space - as long as the GPU and RAM clocks remain above some threshold, everything should work. A quick look reveals definite reports of a 50MHz CPU clock working.
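Something like this in config.txt should do it - arm_freq and friends
are the real knobs, but treat the values as the experiment rather
than a known-good recipe:

# /boot/config.txt - underclocking guess, untested
arm_freq=50      # CPU clock in MHz (default 700 on the original Pi)
core_freq=250    # keep GPU/core clock at its default
sdram_freq=400   # keep RAM clock at its default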
If you really want an '020 or '030 with DMA, the best candidates would probably be an Amiga, a NeXT Cube or an early Sun workstation. The latter have probably survived relatively well, and were quite likely used for at least a few of the early performance studies that people now habitually rely on.
- Jonathan Morton
* Re: [Bloat] some comments on draft-ietf-tsvwg-byte-pkt-congest-10.txt
From: Jonathan Morton @ 2013-06-23 23:46 UTC (permalink / raw)
To: Dave Taht; +Cc: bloat mailing list
On 17 Jun, 2013, at 10:40 pm, Dave Taht wrote:
> Anybody got a 68020 or slower to play with?
As a follow-up to this, I just found somewhere to plug in my 486, and it does indeed still work - even the RTC was only about a month off. Booting from a floppy (tomsrtbt) confirms that all of the hardware seems to be in working order, including the 3c509B - the other NIC is an NE2000 clone of some sort, somewhat newer than the 3com card.
However, it needs a fresh install of Linux, which might still take some time to arrange, considering that it won't boot directly from CD, nor will the BIOS directly recognise any hard disk over 528MB. Even tomsrtbt is too old to support the STROKE protocol for determining the real size of a modern hard disk - if 40GB can be considered "modern" - otherwise I could have used it to unpack a Gentoo stage. Back in the day, I used a smaller disk which could be supported via a BIOS extender in the boot sector.
I think I will need to squeeze a 3.x series kernel onto a floppy disk, and use another machine to build up the rootfs on the hard disk. Luckily, a 486 doesn't need many device drivers, but recent kernels seem to be a lot bigger in the core than older ones.
- Jonathan Morton