From: Dave Taht
To: Valdis Kletnieks
Cc: "cerowrt-devel@lists.bufferbloat.net", bloat
Date: Fri, 5 Dec 2014 12:15:46 -0800
Subject: Re: [Bloat] [Cerowrt-devel] Fwd: Will Edwards to give Mill talk in Estonia on 12/10/2014
In-Reply-To: <14518.1417808038@turing-police.cc.vt.edu>
List-Id: General list for discussing Bufferbloat

On Fri, Dec 5,
2014 at 11:33 AM, wrote:
> On Fri, 05 Dec 2014 11:18:57 -0800, Dave Taht said:
>> The Mill is an extremely wide-issue VLIW design, able to issue 30+
>> MIMD operations per cycle. The Mill is inherently a vector machine
>> and can vectorize and pipeline almost all loops in general purpose
>> code.
>
> The big question is whether we know more about writing compilers for VLIW
> machines than we did when the Itanium came out. That was hard enough to
> get just 3 instructions packed per word (of course, the fact that it wasn't
> 3 generic instructions, but 2 of one flavor and 1 of another, didn't help).

Well, in this case half the instructions are one flavor, the other half
another.

But it's the belt concept in the "mill" that is key. Basically, having
tons and tons of fixed addressable registers doesn't work well (as in
the Itanium, SPARC, and other arches) for a variety of reasons...

Taking a classic smaller register set, such as in the x86_64, and
trying to add all these superscalar and out-of-order features to it has
hit a brick wall...

and the best we see in ARMs and MIPS (with way more registers) is
typically two out-of-order ops, total.

Stack machines overly serialize operations and tend to bottleneck on
local cache (see the transputer T800 for the last decent example).

Aside from a bunch of genuinely weirder architectures (see for example
the Propeller, or Dave May's xcore stuff, or Parallella),

the mill's "belt" idea - temporal register addressing - is the first
new idea I've seen in CPU design for a very, very long time. (Perhaps
it was tried in some other architecture?)

Even if the mill can't get to 32 ops/cycle generally (and some of those
ops are overhead in maintaining the belt, but not as much as you might
think), I do think it can get to quite a few, even in branchy code, and
the lower end versions of the arch are comparable in ops/cycle to the
best we can do today with computers running at much faster basic clock
rates.
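
For anyone new to the belt, here is a toy sketch of temporal addressing
in Python. It is not the Mill's actual instruction set. It only assumes
a fixed-length belt where each result drops onto the front, older values
shift back and eventually fall off the end, and operands are named by
how recently they were produced rather than by register number. The
names Belt, drop, op and difference_of_squares are made up for
illustration.

```python
import operator
from collections import deque


class Belt:
    """Toy belt: position 0 is always the most recently produced value."""

    def __init__(self, length=8):
        # A bounded deque discards from the far end when a full belt
        # receives a new value at the front.
        self.slots = deque(maxlen=length)

    def drop(self, value):
        """Place a new result at the front of the belt."""
        self.slots.appendleft(value)

    def get(self, pos):
        """Read the value that is `pos` results old."""
        return self.slots[pos]

    def op(self, fn, *positions):
        """Read operands by belt position, then drop the result."""
        args = [self.slots[p] for p in positions]
        self.drop(fn(*args))


def difference_of_squares(a, b):
    """Compute (a + b) * (a - b) using only belt positions."""
    belt = Belt()
    belt.drop(a)                   # belt: [a]
    belt.drop(b)                   # belt: [b, a]
    belt.op(operator.add, 1, 0)    # belt: [a+b, b, a]
    belt.op(operator.sub, 2, 1)    # belt: [a-b, a+b, b, a]
    belt.op(operator.mul, 1, 0)    # belt: [(a+b)*(a-b), ...]
    return belt.get(0)


if __name__ == "__main__":
    print(difference_of_squares(7, 3))  # prints 40
```

Note that no instruction names a destination: the result always lands
at position 0, and the same value is reached through a different
position after every later drop (a is at position 0, then 1, then 2).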
And the context switch/subroutine call overhead! 4 cycles. Wow. :)

I certainly have quibbles with the presos I've read so far: edge cases
like floating point ops, and other features that seem nice to have but
are not critical to the core architecture... but I long for an FPGA
version, at least, to play with.

I've spent a lot of time trying to come up with a microarchitecture
that could do fq_codel at 10GigE+ speeds (prototyping in the
Parallella's FPGA), and kept dreaming of something like the "Propeller"
at a really high clock rate...

... then I stumbled over this. Sure, it's years out, but, like wow.
Well worth an initial hour to read/think/watch about.

-- 
Dave Täht

http://www.bufferbloat.net/projects/bloat/wiki/Upcoming_Talks