From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from ivan.Harhan.ORG (unknown [208.221.139.33]) by huchra.bufferbloat.net (Postfix) with SMTP id DCEAA21F198; Sat, 17 Nov 2012 16:53:26 -0800 (PST)
Received: by ivan.Harhan.ORG (5.61.1.6/1.36) id AA22967; Sun, 18 Nov 2012 00:53:17 GMT
Date: Sun, 18 Nov 2012 00:53:17 GMT
From: msokolov@ivan.Harhan.ORG (Michael Spacefalcon)
Message-Id: <1211180053.AA22967@ivan.Harhan.ORG>
To: bloat
Subject: Re: [Bloat] Designer of a new HW gadget wishes to avoid bufferbloat
X-BeenThere: bloat@lists.bufferbloat.net
List-Id: General list for discussing Bufferbloat

Albert Rafetseder wrote:

> To save you the additional ring buffer, the starts-of-packets
> information could be stored as a length or number-of-cells field in
> front of the stream of cells of one packet.

It seems to me that the logic complexity of what you are suggesting (at
least when fitted into the architecture of my implementation) would
exceed that of an additional ring buffer, and the total number of RAM
bits needed to store the necessary information (allowing for the worst
case of each ATM cell being its own packet) would be the same either
way.

In case someone here enjoys reading Verilog, you can see my current
work in progress in this public CVS repository:

$ cvs -d :pserver:anoncvs@ifctfvax.Harhan.ORG:/fs1/IFCTF-cvs co BlitzDSU

The HDLC_to_SDSL logic block is complete in the sense that all logic is
there to produce an outgoing SDSL bit stream whose ultimate data source
is the incoming HDLC one, but there are no provisions for AQM yet,
i.e., it is the simple first version in which the only queue control
mechanism is the "caudal" drop on the fill side of the ring buffer.
All of the logic is completely untested, except that it passes through
the Quartus compiler and produces FPGA configuration bits.
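For readers who would rather see the queue discipline than the Verilog,
here is a purely illustrative Python sketch of a cell ring buffer with
the fill-side ("caudal") drop described above. None of these names
correspond to the actual Verilog; only the pointer arithmetic and the
drop policy are meant to mirror the design.

```python
# Toy model of the HDLC->SDSL cell ring buffer: cells are accepted on
# the fill side and dequeued on the drain side; when the ring is full,
# the NEWEST arrival is discarded (the "caudal" drop).

class CellRing:
    def __init__(self, capacity_cells=128):
        # capacity must be a power of 2, as in the FPGA design,
        # so the pointers can wrap with a simple mask
        assert capacity_cells & (capacity_cells - 1) == 0
        self.capacity = capacity_cells
        self.cells = [None] * capacity_cells
        self.wr = 0  # fill-side (write) pointer
        self.rd = 0  # drain-side (read) pointer

    def occupancy(self):
        return self.wr - self.rd

    def fill(self, cell):
        """Fill side: accept one cell, or drop it when full."""
        if self.occupancy() == self.capacity:
            return False  # caudal drop: newest cell is discarded
        self.cells[self.wr % self.capacity] = cell
        self.wr += 1
        return True

    def drain(self):
        """Drain side: dequeue one cell for the outgoing SDSL stream."""
        if self.occupancy() == 0:
            return None
        cell = self.cells[self.rd % self.capacity]
        self.rd += 1
        return cell

ring = CellRing(capacity_cells=4)
accepted = [ring.fill("cell%d" % i) for i in range(6)]
# the first 4 cells fit; the last 2 suffer the caudal drop
```

A head-drop policy (discarding from the read side instead) would differ
only in which pointer is advanced on overflow, which is why it is
tempting to bolt on later.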
I still need to write the SDSL_to_HDLC logic, then hopefully move the
design from Cyclone II to Cyclone III (i.e., fight through the process
of getting a newer version of Quartus running), then design and build
the PCB for this FPGA to go onto, and only then will we be able to test
any of this logic for real...

Oh, and in the unlikely case that someone is actually bored enough to
look at my FPGA design in detail, one clarification is in order: the
design assumes that the SYSCLK frequency fed to the FPGA will be an
independent clock source that has no relation to the SDSL bit rate or
to the clock arriving from the V.35/HDLC interface, i.e., SYSCLK will
always tick at its own steady rate no matter what happens in the data
path, which SDSL speed is in use, etc.  Furthermore, the logic assumes
that the SYSCLK frequency is significantly higher than the highest
supported data path speed - SDSL tops out at 2.3 Mbps, and I'm thinking
of running SYSCLK at something like 40 MHz - plenty fast for my
purposes, and plenty slow for the Cyclone FPGAs.  This way SYSCLK
becomes the fast clock to which everything else can be synchronized
inside the FPGA.

> In a 46kibibit RAM,

Huh?  46 kibibit?  Where did that number come from?  The "46K" figure
must be from my mention of desiring to use a Cyclone III FPGA, but the
total RAM capacity in those parts is 46 kibibytes, not kibibits - more
specifically, 46 so-called M9K blocks, each holding 9 kibibits
(including parity bits), or 1 kibibyte of data.  But that is the total
RAM capacity of the FPGA part.  We won't be able to devote all of it to
the HDLC->SDSL cell ring buffer, as there are other needs: the
HDLC->SDSL cell header buffer, a possible additional buffer to allow
the drain logic to skip packets, and several buffers for the other
logic block handling the SDSL->HDLC direction.
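To make the kibibit-vs-kibibyte distinction concrete, the RAM budget is
easy to sanity-check with a few lines of arithmetic.  The per-cell
figures (48 payload octets plus a 32-bit header entry) and the 32
cells per Ethernet MTU are the ones used elsewhere in this thread; the
block capacities are the data-sheet values for the Cyclone II M4K and
Cyclone III M9K embedded memory blocks.

```python
# RAM block data capacities: Cyclone II M4K = 4 kibibits = 512 bytes,
# Cyclone III M9K = 9 kibibits incl. parity = 1 kibibyte of data.
M4K_DATA_BYTES = 512
M9K_DATA_BYTES = 1024

# Total Cyclone III capacity quoted in the thread: 46 M9K blocks.
total_kib = 46 * M9K_DATA_BYTES // 1024   # 46 kibiBYTES, not kibibits
total_kibit_with_parity = 46 * 9          # 414 kibibits incl. parity

# Per-cell storage: 48 payload octets + a 32-bit (4-octet) header entry.
CELL_BYTES = 48 + 4
CELLS_PER_MTU = 32   # one 1500-byte Ethernet MTU rounds up to 32 cells

# Cyclone II version: 128-cell ring buffer.
m4k_blocks = 128 * CELL_BYTES // M4K_DATA_BYTES   # 13 blocks = 6.5 KiB
mtus_cii = 128 // CELLS_PER_MTU                   # 4 Ethernet MTUs

# Cyclone III version: 512-cell ring buffer.
m9k_blocks = 512 * CELL_BYTES // M9K_DATA_BYTES   # 26 of the 46 blocks
mtus_ciii = 512 // CELLS_PER_MTU                  # 16 Ethernet MTUs
```

So the 512-cell version consumes 26 of the 46 M9K blocks, leaving 20
for the header buffer's companions and the SDSL->HDLC side.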
(The latter can be set up to have no bottleneck, hence no bufferbloat
issues to discuss on this list, but it still needs a certain amount of
buffer RAM for its packet reassembly and debug capture features.)

Back to the actual usable capacity of the HDLC->SDSL cell ring buffer
we are discussing here: the version currently in the CVS repository
above, which compiles for Cyclone II, has the tick-define set to size
the buffer at 128 ATM cells of 48 payload octets each.  (The current
design requires the buffer capacity in cells to be a power of 2.)
This buffer, together with a smaller companion that stores 32 bits of
header info per cell, for exactly the same number of cells, currently
takes up 13 M4K blocks of the Cyclone II fabric - that is, 6.5
kibibytes.  One Ethernet MTU is 32 ATM cells, hence with a Cyclone II
FPGA we can buffer up to 4 Ethernet MTUs.  But if I can move this
design to a Cyclone III FPGA, I should be able to bump the buffer size
to 512 cells by changing one tick-define.  That would take up 26 M9K
blocks (Cyclone III has M9K blocks, twice the size of Cyclone II's
M4K) out of the 46 available, leaving enough for the SDSL->HDLC logic,
and would hold 16 Ethernet MTUs' worth of ATM cells.

> "Caudal" (near tail / tail-side) drop?

Thanks for the term - it now appears in my Verilog code as the name of
one of the states in the fill logic state machine. :-)

> I wonder if it suffices to head-drop more than one packet's worth of
> cells if required to keep the read and write pointers sufficiently
> far apart.

We'll have to revisit the whole AQM head-drop logic design once we
have this stuff physically running on a board and get the rest of the
system working - not a small task...  I'm hoping that we'll be OK with
this late addition, as long as the FPGA part is sized with some logic
and RAM capacity to spare.

> With a buffer of less than four Ethernet MTUs, what's the worst case
> really?

See above regarding how many Ethernet MTUs we can buffer with the
different FPGA part choices.

SF