16-bit stack machine implemented on a Cypress CY37128 CPLD
- To: <MISC>
- Subject: 16-bit stack machine implemented on a Cypress CY37128 CPLD
- From: "Myron Plichota" <myron.plichota@xxxxxxxxxxxx>
- Date: Fri, 31 Dec 1999 11:01:53 -0500
I have developed a 16-bit zero-operand stack machine that I call Steamer16.
It fits on the Cypress CY37128 CPLD in an 84-pin PLCC package. With the
125 MHz speed grade, the simulator predicts wirewrapped operation at
20 MHz.
Unfortunately, a dual-stack Forth architecture doesn't fit in the 128
macrocells available. Consequently the design isn't a true Forth chip, but
it is a zero-operand stack machine nonetheless. In the future I would like
to fit a true Forth architecture to one of the CPLD or FPGA architectures
that include on-chip RAM blocks for the stacks.
Fearful that the design would not actually fit the target device, I
minimized the instruction set and architecture to a ridiculous extent, and
it indeed just barely fits. In the future, more elaborate versions may be
implemented on larger devices not suitable for hobby projects due to
exotic packaging. For this reason, the documentation contains nerdy phrases
typical of growth-path specifications, but don't let that distract you from
understanding the Steamer16 initial implementation that exists today.
I plan to design a companion chip, also using the CY37128, to provide a
timer, parallel I/O, a funnel shifter, memory decoder/wait state logic, and
glue logic for a 16-bit 3-port multiplier/accumulator.
I think it might be bad netiquette to attach the 40Kbyte zip file I have
available because of the load on the MISC server. It contains the
assembler, JEDEC file, and side documentation. Interested parties should
e-mail me for a copy. Please withhold any technical questions until you
have read the documentation package.
BTW, I am well aware of the shortcomings of the Steamer16 implementation,
so please don't take me to task over it. My defense is:
1) it fits on a low-cost CPLD in a package hobbyists can deal with
2) companion chips can alleviate some of the shortcomings
3) at 20 MHz, it can clunk through inelegant code sequences quickly
Following is an excerpt from the assembler documentation (STASM.TXT), part
of the zipped package.
Happy New Millennium, MISCers!
Myron Plichota
************************************************************
Programming Model:
The Steamer architecture consists of a program counter (P) and a 3-deep
RPN evaluation stack (TOP, 2ND, 3RD). P is cleared on reset. The stack
registers are undefined until loaded under program control. There is no
program status word or carry flag. P addresses instruction groups, not
necessarily individual instructions. Steamer architecture mandates
operations on natural size words without forbidding other data types.
Steamer16 implements the Steamer architecture in 16 bits, with no
enhancements.
Stack diagrams:
Stack diagrams are used to describe instruction behavior by showing both
the inputs on the stack and the results in a concise notation. The input
list is on the left-hand side of the "--" before/after separator, the
results are on the right-hand side.
e.g. ( 3RD 2ND TOP -- 3RD 2ND TOP)
The input list gives the inputs in order of entry, left to right, and
shows only the requisite stack entries.
The output list shows all three entries. The symbols x, y, and z, are used
to denote the original values of any surviving independent stack entries.
Instruction Descriptions: opcodes are in hexadecimal order
NOP, {0} (           -- x y z)      no operation
lit, {8} (           -- y z data)   P++ read memory at P, increment P
@,   {9} (      addr -- x y data)   read memory at addr
!,   {A} ( data addr -- x x x)      write data to memory at addr
+,   {B} (     n1 n2 -- x x n1+n2)  add 2ND to TOP
AND, {C} (     n1 n2 -- x x n1&n2)  and 2ND to TOP
OR,  {D} (     n1 n2 -- x x n1|n2)  or 2ND to TOP
XOR, {E} (     n1 n2 -- x x n1^n2)  exclusive-or 2ND to TOP
zgo, {F} (  flg addr -- x x x)      if flg equals 0 then jump to addr
                                    else continue
Notes:
1) 3RD is sticky. When the stack shrinks it holds its value.
2) lit, is the only instruction that grows the stack, destroying 3RD.
3) The Steamer16 instruction set contains no additions to the Steamer
required instruction set.
4) Opcodes {1..7} are implemented as no operation and are not part of
the Steamer required instruction set.
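The stack effects listed above can be sketched as a small Python model. This is only an illustration under stated assumptions, not part of STASM.TXT: the class name Steamer16, the step() method, the dict-based memory, and the p constructor argument are all hypothetical. The model applies one opcode at a time and does not fetch or unpack instruction groups, so the caller is assumed to have positioned P at the literal data. The stack registers start at 0 here purely as a placeholder, since the real registers are undefined until loaded.

```python
MASK = 0xFFFF  # Steamer16 operates on 16-bit words


class Steamer16:
    """Hypothetical model of P plus the 3-deep stack (3RD, 2ND, TOP)."""

    def __init__(self, memory, p=0):
        self.mem = memory   # dict: address -> 16-bit word (assumption)
        self.p = p          # P is cleared on reset; p is a test convenience
        # Undefined on real hardware until loaded; 0 is only a placeholder.
        self.third = self.second = self.top = 0

    def _binary(self, result):
        # Two inputs consumed, one result: sticky 3RD is copied into 2ND.
        self.top = result & MASK
        self.second = self.third

    def _drop2(self):
        # Two inputs consumed, no result: all three entries show old 3RD.
        self.top = self.second = self.third

    def step(self, opcode):
        a, b = self.second, self.top          # a = 2ND, b = TOP
        if opcode == 0x8:                     # lit,  ( -- y z data)
            self.third, self.second = self.second, self.top
            self.top = self.mem.get(self.p, 0) & MASK
            self.p = (self.p + 1) & MASK
        elif opcode == 0x9:                   # @,    ( addr -- x y data)
            self.top = self.mem.get(b, 0) & MASK
        elif opcode == 0xA:                   # !,    ( data addr -- x x x)
            self.mem[b] = a
            self._drop2()
        elif opcode == 0xB:                   # +,
            self._binary(a + b)
        elif opcode == 0xC:                   # AND,
            self._binary(a & b)
        elif opcode == 0xD:                   # OR,
            self._binary(a | b)
        elif opcode == 0xE:                   # XOR,
            self._binary(a ^ b)
        elif opcode == 0xF:                   # zgo,  ( flg addr -- x x x)
            if a == 0:
                self.p = b
            self._drop2()
        # Opcodes 0x0..0x7 are no operation: nothing changes.
```

For example, with memory {1: 2, 2: 3} and p=1, stepping opcodes 8, 8, B leaves 5 in TOP and P at 3, and 2ND equals 3RD as note 1 describes.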
Instruction Timing:
Steamer16 executes all instructions in 1 clock cycle. A quartet fetch
cycle is required when the current quartet has finished executing or a
jump is taken. For sequential execution, quartets are fetched and executed
in 5 clocks.
Software delays are deterministic and may be counted from the fetch of
any quartet.
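The timing figures above lend themselves to a quick calculation. The following Python sketch is illustrative only. It takes the 5 clocks per quartet and the 20 MHz clock from the text, and it assumes that the 5 clocks break down as one fetch clock plus four single-clock instructions, which is an inference and not something the documentation states outright. The function names are hypothetical.

```python
CLOCK_HZ = 20_000_000        # maximum clock quoted for the 125 MHz part
CLOCKS_PER_QUARTET = 5       # from the text, for sequential execution
INSTRUCTIONS_PER_QUARTET = 4 # assumption: a quartet holds four instructions


def sequential_time_ns(quartets, clock_hz=CLOCK_HZ):
    """Time to fetch and execute `quartets` quartets with no jumps taken."""
    clocks = quartets * CLOCKS_PER_QUARTET
    return clocks * 1_000_000_000 / clock_hz


def peak_instructions_per_second(clock_hz=CLOCK_HZ):
    """Upper bound on instruction rate for straight-line code."""
    return clock_hz * INSTRUCTIONS_PER_QUARTET // CLOCKS_PER_QUARTET


print(sequential_time_ns(1))            # 250.0 ns per quartet at 20 MHz
print(peak_instructions_per_second())   # 16000000
```

A taken jump abandons the rest of the current quartet and costs a new fetch, so code with many jumps will run below this peak figure.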
The adder for the +, instruction is implemented as a cascade of 8 2-bit
ripple-carry adder cells. Running on a 125 MHz part, the maximum clock
frequency is 20 MHz for unambiguous results.
Instruction timing is not mandated in the Steamer architecture.
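The cascaded adder described above can be modelled in a few lines of Python to show how the carry ripples from cell to cell. This is a behavioural sketch only, not the CPLD equations: the function name add16_cascade is hypothetical, and the carry into the first cell is assumed to be 0. The carry out of the last cell is discarded, consistent with the absence of a carry flag in the programming model.

```python
def add16_cascade(a, b):
    """Add two 16-bit words using eight 2-bit adder cells in cascade."""
    carry = 0          # assumption: no carry into the lowest cell
    result = 0
    for cell in range(8):
        shift = 2 * cell
        # Each cell adds its 2-bit slices plus the carry from the cell below.
        s = ((a >> shift) & 0b11) + ((b >> shift) & 0b11) + carry
        result |= (s & 0b11) << shift
        carry = s >> 2  # passed on to the next cell
    return result       # final carry is dropped: there is no carry flag
```

The worst case is a carry that must pass through all eight cells, for instance adding 1 to FFFF hex, which is the kind of path that limits the clock frequency.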