No Subject
- To: MISC
- From: jfox@xxxxxxxxxx (Jeff Fox)
- Date: Wed, 10 May 1995 19:29:15 -0700
Dear Raul, (and misc readers)
>Apparently C has an address but none of the other registers do?
The coprocessor clock and control registers have addresses as
does the memory processor and cpu configuration register.
Presumably if the coprocessors are not being used certain bits
could be read and written in those registers as a scratch pad.
But you would have to be careful or you would turn on a
coprocessor.
>Jeff Fox:(Chuck:)
> The configuration register stores patterns; the package pins
> display patterns.
>
>This seems to imply to me that the other coprocessors (analog,video,
>serial) use numbers rather than patterns?
I think the documentation in the preliminary specs says these
are the patterns for instructions/data on the coprocessors. I don't
know; I will check. The CPU is the only one with numbers
as its native point of view. I can say from coding that the
video processor and the cpu see addresses as numbers. That
is, a linear sequence of addresses to the cpu is the same linear
sequence of addresses to the video processor. I would expect
this is also true on the other coprocessors. As for their
instruction bits and their data formats, they can be documented
as either numbers or patterns. I have preferred the "always use
number/address" view, but I will have to see after I have written
more coprocessor support code.
If you view from a PC doing cross development you like the
pattern view. A number on the PC is also a pattern on the x21,
but that means that numbers and addresses must be XORed with
AAAAA in the cross assembler/compiler. If you are running on x21
you may prefer the number view, since everything except patterns
in the outside world looks like numbers. Perhaps as I write more
code in assembler I will use # for a number and p for a pattern,
as Chuck does in his assembler.
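The number/pattern relationship described above can be sketched as a single XOR. This is a minimal sketch in Python (not F21 assembler), assuming the AAAAA mask covers the low 20 bits and that any higher address bit passes through unchanged. That assumption agrees with the address/pattern pairs listed in the address map later in this message. The function names are made up for illustration.

```python
MASK = 0xAAAAA  # alternating bits over the low 20 bits, as quoted in the text

def to_pattern(number: int) -> int:
    """Convert a value from the number/address view to the pattern view."""
    return number ^ MASK

def to_number(pattern: int) -> int:
    """XOR is its own inverse, so the same operation converts back."""
    return pattern ^ MASK

# (address, pattern) pairs copied from the address map in this message.
REGISTERS = {
    "I/O port":       (0x16BAAA, 0x1C1000),
    "analog clock":   (0x168AAA, 0x1C2000),
    "network clock":  (0x16EAAA, 0x1C4000),
    "network match":  (0x162AAA, 0x1C8000),
    "configuration":  (0x14AAAA, 0x1E0000),
}

for name, (address, pattern) in REGISTERS.items():
    assert to_pattern(address) == pattern, name
    assert to_number(pattern) == address, name
    print(f"{name:15s} address {address:06X} <-> pattern {pattern:06X}")
```

A cross assembler on the PC would apply `to_pattern` to every literal and address it emits; code running on the chip itself never needs to.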
Documenting the idea is where you get into trouble. Like particles
and waves, these things are not numbers or patterns in themselves; it
is a point of view. If you look from the other side you see everything
reversed.
Jeff Fox: (Chuck:)
INSTRUCTIONS
...
The 27 instruction codes are:
00 else  unconditional jump      08 @R+  fetch, address in R, increment R
01 T=0   jump if T0-19 zero      09 @A+  fetch, address in A, increment A
...
>Are instruction bits pattern bits or numeric bits? I presume that for
>the case of addressing they're [inverted] numeric bits.
I think they are numbers, but Chuck may have documented them
as patterns. You never know; read it carefully and see what
you think. I expect Chuck has documented them as patterns, he
sees things this way. He doesn't think from the number or
address perspective unless he is thinking about the alu.
One of the changes I made to his assembler to port it from
F-PC cross-assembler to P21Forth assembler was to XOR all
of the opcodes with AAAAA. This is because Chuck prefers the
pattern view, and P21Forth and I prefer the number view.
This is pretty much Chuck's original documentation, with just
a few corrections and additions I put in.
Jeff Fox:
macros
A! @A               @
A! !A               !
dup dup -or com     -1
dup dup -or         0
over com and -or    OR
A! push A@ pop      SWAP
# (com) push ;      long_jump
>Chuck:
> When an opcode appears in slots 1-3, it must be complemented. An
> add instruction must be coded nop + if carry needs to propagate
> only 10 places, or nop nop + for a full 20 places. This is not
> required in slot 0 if the instruction fetch provides equivalent
> delay.
>I think it would be really handy to have a cnop macro -- this would
>insert a nop only for slot2..slot4 instructions. one cnop would
>typically provide for 10 bits of carry, two cnops would provide for 20
>bits of carry. Finally, 3 cnops would guarantee the next
>instruction occurs in slot 1.
Macros will be easy to add either to a macro assembler or to an
optimizing native code compiler. The problem I see is that calculating
the timing is more complex on F21 because of all the bits in
the configuration register that control the cpu and memory access
speeds. With bits to make sram and dram and prom run slower or
faster, the time between instruction words is even more variable
than on P21. DRAM code will need fewer nops than 12ns sram code.
But even DRAM timing will cover a much wider range, so you
either have to know how fast the memories being used are going
to be in order to run close to the limit, or you can just be
conservative and write for the worst case (12ns sram), or tune for a
specific setting of the timing bits in the configuration register.
Chuck added smart nops for A! on the P21, and I tried smart nops
in different macro assemblers. The optimizing compiler has to
do similar things.
Jeff Fox: (Chuck:)
INTERRUPT
An interrupt can occur when an instruction word is fetched. The
requested instruction is replaced by a 15-bit call to 00000 in home page.
The current address will be pushed onto R when the call is executed.
At least 3 stack positions must be available (reserved) for interrupts,
2 on data and 1 on return. Register A must be saved and restored.
The cause of the interrupt is in C2-0. C must be read to determine it.
The interrupt is cleared when C is rewritten, which may only occur once.
The source of the interrupt is cleared by the corresponding address bit
in the address of C. It is intended that this code be executed at the
end of interrupt processing (say for C0):
A 015554 # com ( pattern 1 1110 0000 0000 0000 0--1) A!
@A !A A! ;
The address bits A2-0 specify the interrupt(s) to be cleared. @A !A A! ;
must all be in the same word. Another interrupt may occur immediately.
>Raul:
>This isn't completely clear -- what are the stack positions needed
>for? I presume this termination restores A? Or is A trashed? What's
>going on here?
>
>Also, how are interrupt causes encoded? I see more than three
>potential interrupt sources -- perhaps some causes have priority over
>other causes? I suppose that a robust interrupt service routine would
>need to deal with all potential interrupt causes?
The code given uses several positions on the stacks: a return address
is pushed, A is saved, a pattern is put on the stack and then into A,
etc. You could not really write an interrupt routine that didn't use
stack space. So if you have already filled all stack positions and an
interrupt happens, you are screwed. That's all. Just documenting that
interrupts do use the stacks.
There are three sources of interrupts at the present time, three bits.
Three coprocessors that can generate interrupts are documented in the
configuration register. I added a comment somewhere that an external
interrupt on the 8 bit i/o port is planned for the production
chip.
>Raul:
>I presume that for 15ns or faster RAM you need to do nop nop + to get
>complete carry propagation even if the stack was last modified by a
>slot 4 instruction?
Yes, exactly, if you are not adding small numbers or ORing two
parts of a whole. You can for instance use + to OR two numbers
if there is no carry involved. FFC00 003FF + needs no nops.
Jeff Fox: (Chuck:)
MEMORY
CONFIGURATION
Typical: 0-5 DRAMs, 0-3 SRAMs, 1 ROM
5 1Mx4 Page-mode DRAMs: Toshiba TC514400APL-80
or 1 1Mx16 and 1 1Mx4 :
and/or 3 8Kx8 SRAMs :
1 8-bit PCMCIA card :
or 1 8-bit ROM :
>Er.. does this mean the F21 is viable with only ROM and no DRAM or
>SRAM? If so, what's performance like? [I didn't see any mention of
>an on-chip SRAM page.]
There is no on-chip sram, not until Chuck learns how to do it. F21
will have 8k (or perhaps 32k) words of high speed SRAM. Thus the
"and/or 3 8Kx8 SRAMs" line, meaning 12, 15 or 25ns parts. These are
really cheap parts these days ($3). Enough for some useful parallel
apps, but not enough memory for generating video.
To generate video you need 5 * 256kx4 DRAM. These are $4 or $5 each.
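The chip counts above follow from the word width. A rough sketch of the arithmetic, assuming the chips in a bank are wired in parallel so that each one supplies a slice of the word (the `bank` helper is hypothetical, and the prices are the ones quoted above):

```python
def bank(chips: int, depth_words: int, bits_per_chip: int):
    """Return (word width in bits, depth in words) for chips wired in parallel."""
    return chips * bits_per_chip, depth_words

video_dram = bank(chips=5, depth_words=256 * 1024, bits_per_chip=4)
fast_sram = bank(chips=3, depth_words=8 * 1024, bits_per_chip=8)

print(video_dram)  # 20 bits wide, 256K words
print(fast_sram)   # 24 bits wide (enough for a 20 bit word), 8K words

# Five DRAMs at $4 to $5 each.
dram_cost_low, dram_cost_high = 5 * 4, 5 * 5
print(f"video DRAM bank: ${dram_cost_low} to ${dram_cost_high}")
```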
P21 or F21 will run on an 8 bit ram/rom/prom/pcmcia only. The P21
is limited to 250ns access; F21 can do 150 or 250. In fact if you
had a fast 8 bit part you could also access it via the high
speed 8 bit memory address space. This means 40ns access on P21
and 15ns access on F21. So 4 MIPS or less w/ P21 in slow memory
and up to 25 MIPS w/ fast 8 bit parts. With F21, a max of 6 MIPS
from 150ns rom, and 60 MIPS from a single 8 bit ram.
But the 8 bit instruction set is limited. Because the 5 bit
instruction uses 5 bits in the 8 bit memory representation of
jump instructions, only 3 bits are left for the address. This means
that jump, call, T=0 and C=0 can only have 8 addresses on each page.
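The figures in the last two paragraphs can be reproduced with simple arithmetic. This sketch assumes one instruction per 8 bit memory access, which is the simplest model that matches the P21 numbers; under it the F21 bounds come out at about 6.7 and 66.7, a little above the 6 and 60 quoted above. The names are illustrative only.

```python
def peak_mips(access_ns: float) -> float:
    """Peak millions of instructions per second at one instruction per access."""
    return 1000.0 / access_ns

for label, ns in [
    ("P21 slow 8 bit", 250),
    ("P21 fast 8 bit", 40),
    ("F21 slow 8 bit", 150),
    ("F21 fast 8 bit", 15),
]:
    print(f"{label}: {peak_mips(ns):.1f} MIPS at {ns} ns")

# A jump stored in one byte: 5 bits of opcode leave 3 bits of address.
OPCODE_BITS = 5
BYTE_BITS = 8
jump_targets = 2 ** (BYTE_BITS - OPCODE_BITS)
print(f"jump targets per page in 8 bit memory: {jump_targets}")
```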
Jeff Fox:
ADDRESS MAP 1995 DATA
address = pattern
000000 - 0FFFFF   DRAM 20 bit        1 M words (or 256K words)
180000 - 1BFFFF   slow 8 bit SRAM    1 M bytes as 4 256k pages
1C0000 - 1FFFFF   fast 8 bit SRAM    1 M bytes as 4 256k pages
100000 - 101FFF   slow 20 bit SRAM   8 K words
140000 - 141FFF   fast 20 bit SRAM   8 K words
16BAAA   1C1000   I/O port register
168AAA   1C2000   analog clock register
16EAAA   1C4000   network clock register
162AAA   1C8000   network match register
14AAAA   1E0000   configuration register
>Any docs on the network clock and network match registers? Presumably
>the network clock is functionally similar to the analog clock
>register? Presumably the network match register has something to do
>with different serial formats (e.g. synchronous vs asynchronous
>serial, ethernet, ...)? Or perhaps it's for output (as opposed to
>input)? Or perhaps it's for somehow adapting to an otherwise unknown
>input speed? Perhaps the network interface writes out whatever data
>is in memory just before it reads stuff in? I think this area needs
>more doc.
Jeff Fox:
I/O PORT
Write to pattern 1C1000:   0-7    data
                           10-17  direction: pattern 00000 input
                                                     3FC00 output
Read from 1C1000:          0-7    pad
                           8-9    0
                           10-17  direction
                           18-19  0
(The production F21 will provide for external interrupt on the I/O port.)
>Presumably data written to the i/o port is latched while reads are
>unlatched (except of course that the data is placed on the stack)?
What is the problem with the stack? You can say set 4 bits for input
and 4 bits for output, and write a pattern for output where your
output bits are latched, then read and see what is on the unlatched
input bits. Why is the stack a problem?
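The mixed input/output case can be sketched from the bit layout given in the I/O PORT table. This is Python for illustration, assuming a direction bit of 1 means output (as the 3FC00 "output" pattern suggests) and that the values are in the pattern view; the helper names are made up.

```python
DIR_SHIFT = 10  # bits 10-17 hold the direction of pins 0-7

def io_write_word(data: int, output_mask: int) -> int:
    """Build the word written to the I/O port: data in 0-7, direction in 10-17."""
    return ((output_mask & 0xFF) << DIR_SHIFT) | (data & 0xFF)

def io_read_inputs(word: int) -> int:
    """From a word read back, keep only the pad bits of pins set as inputs."""
    pad = word & 0xFF
    direction = (word >> DIR_SHIFT) & 0xFF
    return pad & ~direction & 0xFF

# All eight pins as outputs reproduces the 3FC00 direction pattern.
assert io_write_word(0x00, 0xFF) == 0x3FC00

# Upper 4 pins output (latched), lower 4 pins input (unlatched).
word = io_write_word(0xA0, 0xF0)
print(f"{word:05X}")

# Suppose a later read returns the same direction bits with 5 on the input pins.
print(f"{io_read_inputs(0x3C0A5):X}")
```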
Jeff Fox:
VIDEO 1995 SPECS
... The pixel clock can run up to 20 MHz, with a corresponding
frame rate.
>Presumably the pixel clock is controlled by the crystal. Why only
>20MHz? Bus limitations? Logic complexity?
Don't know. I am confused by this one. Chuck lists 4 different
frequencies in reference to normal video. We all know 3.58 MHz,
and Chuck mentions that there are 455 cycles of 7.x MHz per video
line, and of course a 14.x MHz xtal.
Now I take this to mean that things are divided down. But F21 also
has a USE-UNDIVIDED-VIDEO-CLOCK bit in the video register. So I
hope this means that without the clock divide a 20 MHz xtal can
be programmed the same as a 40 MHz xtal without the divide clock.
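For reference, the normal video frequencies mentioned above are related by simple ratios. The sketch below assumes the standard NTSC color subcarrier of 3.579545 MHz, and assumes that "7.x" and "14.x" are twice and four times that value; the 455 cycles per line is from Chuck's figure and the 525 lines per frame is the usual NTSC count.

```python
SUBCARRIER_HZ = 3_579_545          # NTSC color subcarrier, about 3.58 MHz
HALF_XTAL_HZ = 2 * SUBCARRIER_HZ   # about 7.16 MHz (assumed to be the "7.x")
XTAL_HZ = 4 * SUBCARRIER_HZ        # about 14.318 MHz (assumed to be the "14.x")
CYCLES_PER_LINE = 455              # cycles of the 7.x MHz clock per video line
LINES_PER_FRAME = 525              # standard NTSC frame

line_rate_hz = HALF_XTAL_HZ / CYCLES_PER_LINE
frame_rate_hz = line_rate_hz / LINES_PER_FRAME

print(f"xtal       {XTAL_HZ / 1e6:.3f} MHz")
print(f"line rate  {line_rate_hz:.2f} Hz")
print(f"frame rate {frame_rate_hz:.2f} Hz")
```

With an undivided 14.318 MHz pixel clock there are 910 clocks per line, which is the kind of headroom the higher horizontal resolutions would need.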
I don't know why the limit. It is after all an analog circuit
with BIG analog transistors that may limit some things. I
know the 6 bit D/A A/D is limited to 14 MHz to get reliable
readings in all 6 bits, but that doesn't mean we won't try to
see how fast it can be used with fewer bits of resolution.
I expect to be able to do 768x482 NTSC in addition to 384x482
and I also expect to be able to do RGB up to 800x600, but
will have to experiment with different clocks and different
video formats. I even want to try 18 bit color on three
synchronized analog processors, and see how high a spatial
resolution can be supported.
Jeff