[Bloat] one benefit of turning off shaping + fq_codel

Tue Nov 27 17:33:45 EST 2018

>> I wish I knew of a mailing list where I could get a definitive answer
>> on "modern problems with async circuits", or an update on the kind of
>> techniques the new AI chips were using to keep their power consumption
>> so low. I'll keep googling.
> 
> I’d be interested in knowing this as well. This gives some examples of async circuits: https://web.stanford.edu/class/archive/ee/ee371/ee371.1066/lectures/lect_12.pdf
> 
> Page 43, “Bottom Line” mentions that asynchronous design has “some delay matching / overhead issues”. Apparently delay matching means getting the signal outputs on two separate paths to arrive at the same time(?) Presumably overhead refers to the 2x space on the die previously mentioned, for completion detection. Pages 23-25 on “data-bundling constraints” might also highlight some other challenges. Some more current material would be interesting though...

The area overhead is at least partly offset by not having to distribute and gate a coherent clock signal across the entire chip.  I half-remember seeing a figure that clock distribution accounts for about 30% of the area and/or power consumption of a modern deep-sub-micron design.  That is area and power which does not directly contribute to functionality.

Generally there are two major styles of asynchronous logic:

1: Standard combinational logic stages accompanied by self-timing circuits with a matched delay, generally known as "bundled data".  This style has little overhead (probably less than the clock distribution it replaces) but requires local timing closure to assure correct functionality: the timing circuit must have strictly *more* delay than the logic it accompanies.  I suspect that achieving local timing closure is easier than the global timing closure required by conventional synchronous logic.

2: Dual-rail QDI (quasi-delay-insensitive) logic, in which completion is explicitly signalled by the arrival of a result.  This almost completely eliminates timing closure from the logic correctness equation, but the area overhead can be substantial.  Achieving maximum performance in this style can also be challenging, but suitable approaches do exist, e.g.:

	https://brej.org/papers/mapld.pdf
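
As a rough illustration of the two styles above, here is a toy Python sketch.  It is not taken from the linked paper or from any real design flow; the function names and the picosecond figures are invented for the example.  It only shows the bundled-data constraint (matched delay strictly greater than logic delay) and the usual two-wires-per-bit dual-rail encoding, where a word counts as complete once every bit has exactly one rail asserted.

```python
# Toy model of the two asynchronous styles; all names are hypothetical.

def bundled_data_ok(logic_delay_ps, matched_delay_ps):
    """Local timing closure for one bundled-data stage: the matched
    delay line must be strictly slower than the logic it accompanies."""
    return matched_delay_ps > logic_delay_ps


# Dual-rail encoding: each bit travels on two wires (true_rail, false_rail).
NULL = (0, 0)   # spacer: no data has arrived on this bit yet
ONE = (1, 0)
ZERO = (0, 1)
# (1, 1) is an illegal state and is never treated as valid data here.


def encode(bits):
    """Encode a list of 0/1 values as dual-rail pairs."""
    return [ONE if b else ZERO for b in bits]


def is_complete(word):
    """Completion detection: every bit has exactly one rail asserted."""
    return all(t != f for t, f in word)


def decode(word):
    """Recover the 0/1 values from a complete dual-rail word."""
    if not is_complete(word):
        raise ValueError("dual-rail word is not complete")
    return [t for t, _ in word]
```

For example, bundled_data_ok(400, 450) holds while bundled_data_ok(400, 400) does not, and a word such as [ONE, NULL, ONE] is reported as incomplete until its middle bit arrives.  The doubled wiring plus the completion check is where the area overhead of the second style comes from.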

Both styles can inherently adapt timings to thermal and voltage conditions within a design range without much explicit provisioning, and typically have much cleaner power load and EMI characteristics than synchronous logic.  But as you can see from the above, the downsides typically associated with async logic tend to apply to one or the other of the styles, not to both at once.
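
To make the adaptation point concrete, here is a deliberately crude Python sketch.  Everything in it is an assumption for illustration: the delay figures, the scale factors and the 1.25 matched-delay ratio are made up, and it assumes the matched delay line slows down and speeds up in step with the logic it sits next to.  It compares a fixed clock, which has to be set for the slowest corner of the design range, with a self-timed bundled-data stage whose cycle time follows the conditions it is actually running under.

```python
# Toy comparison; every number used with these functions is hypothetical.

def synchronous_period(nominal_delay_ps, worst_case_scale):
    """A fixed clock period must cover the slowest corner in the range."""
    return nominal_delay_ps * worst_case_scale


def self_timed_cycle(nominal_delay_ps, actual_scale, matched_ratio=1.25):
    """A bundled-data stage cycles at its matched delay.  The matched delay
    is assumed to scale with voltage and temperature exactly as the logic
    does, so the safety ratio is preserved at every operating point."""
    return nominal_delay_ps * actual_scale * matched_ratio


if __name__ == "__main__":
    nominal = 400.0          # ps, invented figure
    worst = 1.5              # slowest corner relative to nominal, invented
    print("fixed clock period:", synchronous_period(nominal, worst))
    for scale in (0.75, 1.0, 1.5):
        print("self-timed cycle at scale", scale, ":",
              self_timed_cycle(nominal, scale))
```

With these invented numbers the self-timed stage runs faster than the fixed clock under cool or nominal conditions and slower only at the worst corner, where it is still paying for its matching margin.  Real designs will differ; the sketch only shows the shape of the trade-off.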

 - Jonathan Morton