Re: MISC-d Digest V99 #106
- To: MISC
- Subject: Re: MISC-d Digest V99 #106
- From: "GARY B. LAWRENCE" <garyl@xxxxxxxxxxx>
- Date: Fri, 31 Dec 1999 03:39:27 -0700
- References: <199912291211.IAA03284@pisa.rockefeller.edu>
- Reply-To: garyl@xxxxxxxxxxx
MISC-d-request@pisa.rockefeller.edu wrote:
>
> Subject:
>
> MISC-d Digest Volume 99 : Issue 106
>
> Today's Topics:
> Re: MISC-d Digest V99 #105
> Re[2]: MISC-d Digest V99 #105
>
> ---------------------------------------------------------------
>
> Subject: Re: MISC-d Digest V99 #105
> Date: Wed, 29 Dec 1999 00:52:10 EST
> From: "Wayne Morellini" <waynemm@hotmail.com>
> To: MISC
> CC: sz@uc.ru
>
> Hello sz
>
> I'll answer some of the questions because you have been so helpful before,
> and not so many people are using the list nowadays.
>
> From: sz <sz@uc.ru>
> To: "MISC@pisa.rockefeller.edu" <MISC>
> Subject: F21 and possible enhancements.
> Date: Thu, 23 Dec
>
> Hello Misc@pisa.,
>
> I've discussed some of the F21 features in our Russian FIDOnet
> conference, and my opponent pointed out that the F21 runs at 100 MHz
> because it can't go faster in 0.35 micron technology. As far as I remember,
> that's untrue. Am I right?
>
> You are right. The 100 MHz figure was either for the P21 at 1.2 or 0.8 micron, or the
> speed when running from DRAM (due to pad power-up time limitations in the memory
> interface). I do wish they (Chuck's MISC CPUs) would move to DDR RAM,
> SDRAM or SRAM, but that is probably financially out of reach now.
>
> What is the transistor count for pure F21 command execution core?
>
> Good question. Everything was supposed to be less than 15,000 transistors, maybe less than
> 9000, based on the old P21.
100 MHz was the fastest burst speed of the MuP21, as Jeff has listed on
his spec sheets. The fastest burst speed for an F21 that could work from ROM
was 333 MHz. However, the current F21d speed must be judged by
  a. the number of processors running in dynamic RAM and the number of
     times they jump over page boundaries
  b. whether the code is small enough to run in static RAM, in which case
     the main processor can run closer to the 200 MHz range
> Also, what is real state of stack computers?
Current stack computers include the Novix, the Harris RTX (used, I think,
by Ball Brothers for space payloads) and the Patriot Scientific 1000, now
used to run Java applications on set-top boxes. Info on their web site
claims that they can run interrupt-driven Java code faster than picoJava
chips designed for Java code. Both the RTX and the PT-1000 are based on
Chuck's Novix and ShBoom designs.
> ? Useful for tasks that don't require an existing register CPU, and leading
> edge.
>
> Why mainstream goes
> register CPU? Because stacks are slower to access, or because
> mainstream did not develop suitable technology to handle stacks?
>
Mainstream designers stick with upgrades to 8080 code to keep all that old
software working. At some point they will have to change. From what I have
seen of the Merced processor, it looks like they will change. It has three
41-bit opcodes stored in a 128-bit word, and most opcodes work with three
registers, each selected from a range of 128 registers.
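
The arithmetic behind that word layout can be checked with a small sketch. Three 41-bit slots use 123 of the 128 bits; the remaining 5 bits are modelled here as the template field from the published IA-64 bundle format, which the text above does not mention. The helper names `pack_bundle` and `unpack_bundle` are hypothetical, and the slot contents are arbitrary numbers, not real opcodes.

```python
# Sketch only: models the bit layout, not real Merced instruction encoding.
SLOT_BITS = 41        # width of one opcode slot
TEMPLATE_BITS = 5     # assumption: 128 - 3 * 41 = 5 bits left for a template field
REGISTER_BITS = 7     # 2**7 == 128, enough to name one of 128 registers

SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template, slots):
    """Pack a 5-bit template and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template <= TEMPLATE_MASK:
        raise ValueError("template does not fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    word = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot does not fit in 41 bits")
        word |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return word


def unpack_bundle(word):
    """Split a 128-bit integer back into (template, [slot0, slot1, slot2])."""
    template = word & TEMPLATE_MASK
    slots = [
        (word >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    ]
    return template, slots


bundle = pack_bundle(1, [SLOT_MASK, 0, 12345])
print(bundle.bit_length(), unpack_bundle(bundle))
```

Three 7-bit register fields take 21 of the 41 bits in a slot, which leaves 20 bits for everything else in the opcode.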
What Charles Moore is trying to do is have the simplest combination of
hardware and software. The stack computer makes opcode decoding and
addressing as simple as possible.
Chuck can get four five-bit opcodes to work with
  a. the top of stack, and operations combining the top of stack and the
     second stack position
  b. push from the stack to the return stack, or pop from the top of the
     return stack to the stack
  c. store indirect or fetch indirect using an address register
  d. store an address from the stack into the address register, or fetch
     the address from the address register to the stack
  e. use both the address register and the return stack to do a fetch with
     address increment and a store with address increment. This makes for a
     very fast move sequence.
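
The move sequence in item e can be sketched with a toy model. This is not the real F21 instruction set: the method names, the choice of which pointer sits in the address register and which on the return stack, and the 20-bit word that holds four 5-bit opcodes are all assumptions made for illustration.

```python
# Toy model only: names and encodings are hypothetical, not the F21's.

def pack(ops):
    """Pack four 5-bit opcode numbers into one word, first opcode in the high bits.

    Four 5-bit fields need 20 bits; the 20-bit word is an assumption here.
    """
    if len(ops) != 4 or any(not 0 <= op < 32 for op in ops):
        raise ValueError("need exactly four opcodes in the range 0..31")
    word = 0
    for op in ops:
        word = (word << 5) | op
    return word


def unpack(word):
    """Recover the four 5-bit opcode numbers from a packed word."""
    return [(word >> shift) & 0x1F for shift in (15, 10, 5, 0)]


class ToyStackCPU:
    def __init__(self, memory):
        self.mem = memory   # shared list of words
        self.data = []      # data stack
        self.ret = []       # return stack
        self.a = 0          # address register

    def push(self):
        """Item b: move the top of the data stack to the return stack."""
        self.ret.append(self.data.pop())

    def a_store(self):
        """Item d: move the top of the data stack into the address register."""
        self.a = self.data.pop()

    def fetch_a_inc(self):
        """Item e: fetch through the address register, then increment it."""
        self.data.append(self.mem[self.a])
        self.a += 1

    def store_r_inc(self):
        """Item e: store through the top of the return stack, then increment it."""
        self.mem[self.ret[-1]] = self.data.pop()
        self.ret[-1] += 1


def move(cpu, src, dst, count):
    """Copy count words from src to dst using only the operations above."""
    cpu.data.append(dst)
    cpu.push()              # destination pointer lives on the return stack
    cpu.data.append(src)
    cpu.a_store()           # source pointer lives in the address register
    for _ in range(count):
        cpu.fetch_a_inc()
        cpu.store_r_inc()
    cpu.ret.pop()           # discard the used destination pointer


memory = list(range(8)) + [0] * 8
move(ToyStackCPU(memory), 0, 8, 4)
print(memory[8:12])
```

Each copied word costs one fetch and one store, with both pointers updated as a side effect, so the inner loop needs no separate address arithmetic.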
The move is used a great deal in graphics, word processors, database
manipulation and string operations. More complex processors have address
increment operations, but you must be careful when looking at complex
processors. Speed comparisons are made much more difficult because of
  a. opcodes of many clock cycles. The 68000 had internal 32-bit registers,
     but because of multiple cycles per opcode an 8 MHz 68000 was not
     faster than an 8-bit 1 MHz 6502 on some 8-bit input/output operations.
     Higher-speed variants of the 6502 were better for some microcontroller
     applications.
  b. use of cache memory. What happens when the cache is flushed? How much
     code is actually in the cache on average? How good is the cache hit
     rate?
     When later Intel processors are compared to Chuck's MuP21, with 7000
     transistors and a very small die size in a 1.2 micron fab, you are
     comparing the cheapest possible small design, in terms of
     computer-aided design and lowest-cost chip fabrication, to the R&D
     budget of a company so large and powerful that its total valuation
     is greater than the GNP of some third world countries. If Chuck
     could do his designs in the current 0.15 micron size and include his
     entire DRAM and SRAM address space on chip (about the same scale in
     terms of transistors as the latest Intel or Alpha processor), his chip
     could run with continuous on-chip access and show usable continuous
     instruction speeds in the range of 600 million to 1 billion per second.
     Since Chuck is using an integer-only opcode design, this chip would
     still be better only for purely integer applications.
  c. how well the whole system design works. In Intel-based PC clone
     designs there is a lot of glue logic to consider when comparing speeds.
     1. DMA, chip refresh, timer interrupts and a large number of
        input/output interrupts have to be considered when judging how well
        a system design works.
     With the F21 chip,
        one on-board processor handles video (simpler video),
        one processor does analog input and output (sound output, perhaps),
        one processor does network input and output, and
        the main processor sets up all the other processors through memory
        access only.
     Multiple dynamic memory accesses still slow down memory access, but
     problems seen in Windows 95/98, where even serial data as slow as
     28.8 kbit/s can be lost because the system is not handling all
     interrupts fast enough, can be avoided.
     Chuck's chips try to include as many simple system functions as
     possible on chip. This means that very cheap system boards can be made
     that also use a great deal less low-level code. This low-level code
     also executes in less time, which makes it possible for fast
     interrupts such as network packet reception to be handled without
     external first-in first-out buffers. To get my Windows 95 modem to
     work well I need to add a board with a serial FIFO, a 16550 chip, on
     it.
  d. how much work has to be done to produce a compiler that is optimized
     for the processor design.
     1. The complexity of modern compiler design was shown to me by a
        speed comparison made between RISC processors: Alpha, MIPS and
        SPARC. Each one needed a very good compiler to optimize the use of
        register operations to try to beat Intel code and to beat other
        RISC processors.
        One particular speed comparison showed the Alpha behind the other
        two RISC processors when running C code in some situations.
        I did not understand how this was possible, since that particular
        Alpha chip, perhaps 233 MHz or 300 MHz, was faster in opcode speed.
        A friend explained that the Alpha chip was indeed faster, but that
        DEC had not yet written a compiler that was optimal for the Alpha
        chip. Later comparisons showed that their compiler had been
        improved, and the Alpha was shown to be the fastest chip.
     The point I wish to make is that a very large company with years
     of computer science experience, DEC, needed a considerable
     amount of time, perhaps more than three or four years, to make an
     optimal compiler. That is complex!
     In the case of Chuck's chips, a fast small stack design with very
     quick opcode decode and fewer than 32 opcodes, a usable and fairly
     fast Forth system can be implemented with a great deal less effort.
     The person doing the chip fabrication design (mainly Chuck) and the
     person doing the chip simulation and testing (again mainly Chuck) can
     also do the compiler, operating system and driver design (again
     mainly Chuck).
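
The serial-port problem mentioned under point c comes down to how long the system may take to service an interrupt before a received byte is overwritten. The sketch below is a simplified model, not a description of any particular driver: it assumes 10 bits on the wire per byte (8 data bits plus start and stop), takes the 28.8 figure as 28,800 bits per second, and treats the time budget as the number of bytes that can still arrive before an overrun. The function names are hypothetical. The 16-byte FIFO depth and the trigger level of 8 are standard 16550 values.

```python
# Simplified timing model; all function names are illustrative.

def byte_interval_us(bits_per_second, bits_per_byte=10):
    """Time between received bytes in microseconds (assumes 8N1 framing)."""
    return bits_per_byte * 1_000_000 / bits_per_second


def latency_budget_us(bits_per_second, fifo_depth, trigger_level, bits_per_byte=10):
    """Approximate time from the receive interrupt until a byte is lost.

    The interrupt fires when trigger_level bytes are waiting. The remaining
    free slots can still fill, and one more byte can be assembled in the
    shift register, before an overrun occurs.
    """
    bytes_until_overrun = fifo_depth - trigger_level + 1
    return bytes_until_overrun * byte_interval_us(bits_per_second, bits_per_byte)


rate = 28_800
print("unbuffered UART:", round(latency_budget_us(rate, 1, 1)), "us")
print("16550, trigger at 8:", round(latency_budget_us(rate, 16, 8)), "us")
```

Under these assumptions the FIFO stretches the allowed interrupt latency from about one byte time to several, which is why adding a 16550 helps a system that is slow to respond. A processor that answers every interrupt within one byte time would not need the buffer.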
If you check the most recent articles about Chuck's work on Ultra
Technology, Chuck talks about having redone the i21 iTV chip design to use
SDRAM. This makes DRAM access faster and uses the memory chip design that
is most likely to be the lowest cost per bit. In the article he said that
the most recent design had not yet been put on a fab line. If iTV does go
into large-scale production, a new fab run will be done later.
Jeff Fox has put up the text and some pictures from video 1, which shows
Chuck using his OKAD system to display the F21 chip design. I bought the
video, and would encourage anyone to look at that article because it shows
just how productive Chuck is. By using simple, well-thought-out code he
designed an entire chip layout, simulation and test system on a 386
computer using very little DOS or BIOS code. Some people make fun of just
how simple his designs are. He even stated that you should get rid of all
file operations when possible and load the whole application into memory
with only a simple load from disk, and store it with a similar total
memory store to disk. I can understand that: now that systems with 32 MB
to 256 MB are available, a Forth application can be loaded totally in a
small percentage of the total memory without much chance of running out of
memory. Ironically, I spent a few hours getting a create-file operation
to work under gforth. I do not know how I could have so much trouble
with a file operation, because I have done file and/or block operations
on 6502 fig-Forth, F83 Forth, f83s Forth, F-PC Forth and win32for. I
wish to find ways I can work with a simpler hardware and software
combination. Perhaps I could become a little more productive.
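
The "load everything, store everything" approach described above can be sketched in a few lines. This is only an illustration of the idea in Python, not Chuck's code: the function names, the 64 KB image size and the file name `app.img` are all made up for the example.

```python
import os
import tempfile


def save_image(memory, path):
    """Write the whole memory image to disk in one operation."""
    with open(path, "wb") as image_file:
        image_file.write(memory)


def load_image(path):
    """Read the whole image back into a mutable in-memory buffer."""
    with open(path, "rb") as image_file:
        return bytearray(image_file.read())


def demo():
    """Round-trip a small image through a temporary file."""
    memory = bytearray(64 * 1024)      # hypothetical application image
    memory[0:5] = b"hello"             # the application changes its state in memory
    path = os.path.join(tempfile.mkdtemp(), "app.img")
    save_image(memory, path)
    restored = load_image(path)
    return restored == memory


print(demo())
```

The application never opens, seeks in or closes individual files while it runs. All of its state lives in one buffer that is read once at start and written once at the end.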
Anyway, best wishes for the new year, and I hope to chat with all of you
after the new year if the world does not end. gary
> Why go Microsoft, PC, BMX bikes, BMW/Citroen CV2, western Qwerty Keyboard
> etc = fashion, right place, right time etc and being stuck with it.
>
> Best regards,
> sz mailto:sz@uc.ru
>
> Merry Christmas and a happy new Millennium (or Y2KB experience for those in
> system admin;) to you sz and all Miscers.
>
> Wayne
>
> P.S. sz calculate that only 1-2 cycles are needed to render each pixel in
> voxels, with *8 that much for photo realism. I suggest that you stick with
> voxels, maybe useful for a MISC extension, or with programmable silicon a
> hardware accelerator.