Re: Hobbit again

To: MISC
Subject: Re: Hobbit again
From: Jaap van Ganswijk <ganswijk@xxxxxxxxx>
Date: Sun, 24 Mar 1996 12:27:37 +0100
At 12:38 PM 3/21/96 GMT, Andrew Haley:
>> >> The Hobbit has (almost) no registers and uses a cache on the stack memory
>> >> instead. Compared to Forth engines and other pure stack machines, however,
>> >> it has a very orthogonal instruction set and can manipulate not only the
>> >> top of the stack but also local variables higher up on the stack. This
>> >> helps compiled C/Pascal/... a lot.
>> >> Just see the upper positions on the stack as registers in cache...
>> >>
>> >> People try to tell me that this concept can't be efficient, but I don't
>> >> see why...
>> >
>> >It's less efficient because it makes the datapath in the CPU more
>> >complicated.  For example, superscalar execution is inhibited because
>> >it's possible that a stack element may be rewritten by a memory
>> >reference after it has been issued as an operand to an execution unit.
>> >The only way to solve this problem is to have a complex interlock
>> >within the CPU.
>> 
>> We must learn to live with the fact that CPUs are complicated
>> if we want them to do complicated tasks (quickly).
>
>Erm, no.  This is precisely the opposite of reality.  Interlocks and
>MUXes within CPUs cause delays: the way to make a CPU go faster is to
>reduce the number of such things.  The RISC lesson is that if you want
>a fast computer you make it as simple as possible.

RISC is of course a very interesting approach, but it wastes memory
(and disk space and program load time), and because of its
register structure it can't handle a variable number of arguments correctly, etc.

I feel, however, that you approach the problem too much from the
hardware angle.
Writing a compiler is very difficult, especially when there are too many
degrees of freedom and when some problems must be solved iteratively.
You can easily solve one such problem,
but most architectures require several of these optimizations, and they
are interdependent.
Hardware designers often create a lot of problems that they say are
each easily solvable, but all of these problems together
may be too complex to solve (at least in acceptable compile time).

Example:
When you have a small number of registers, like the 68000's 8 data registers,
you may first figure out how many registers a function needs
to calculate all of its expressions, etc., with all local variables
in memory. Then you allocate all remaining registers to local
variables. Now you recalculate the number of temporary registers,
which may have decreased because some of the operands
are already in registers. This process is repeated...
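A minimal sketch of that iterative allocation, under a deliberately simplified, hypothetical cost model: every operand that is not already held in a register is assumed to need its own temporary register, and locals receive registers in order of use count. The names `temps_needed` and `allocate` are illustrative only, not taken from any real compiler.

```python
from collections import Counter


def temps_needed(exprs, in_regs):
    # Hypothetical cost model: each operand of an expression that is not
    # already in a register must be loaded into a temporary register.
    return max((sum(1 for v in e if v not in in_regs) for e in exprs), default=0)


def allocate(exprs, total_regs):
    # Rank local variables by how often they are used (most used first).
    ranked = [v for v, _ in Counter(v for e in exprs for v in e).most_common()]
    in_regs = set()  # start with all locals in memory
    rounds = 0
    while True:
        rounds += 1
        temps = temps_needed(exprs, in_regs)
        free = max(total_regs - temps, 0)
        # Give every register not needed for temporaries to a local.
        new = set(ranked[:free])
        if new == in_regs:  # nothing changed: the allocation has settled
            return in_regs, temps, rounds
        in_regs = new


exprs = [["a", "b", "c", "d"], ["a", "b"], ["e", "f", "a"], ["c", "g", "h", "i", "j"]]
locals_in_regs, temps, rounds = allocate(exprs, total_regs=8)
print(sorted(locals_in_regs), temps, rounds)  # → ['a', 'b', 'c', 'd'] 4 3
```

In this toy model the loop always settles, because putting another local in a register can only lower the temporary count. Once address registers, special-purpose registers, or loop unrolling enter the picture, that simple monotonic behaviour is no longer guaranteed, which is the interdependence being described.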

Adding the complexity of address registers besides
data registers makes things a lot more difficult.
With even more registers with special abilities, as in the
80x86, things get much more complicated still...
When you also want to optimize other things, for example with
loop unrolling, things become completely unpredictable.

As Hennessy and Patterson suggest, there should be a symbiosis
between hardware development and compiler/software development.

>Making local variables appear to be in main memory slows down the
>processor's maximum clock rate and the ability to go superscalar.
>That's what's wrong with it.

I don't see why letting the registers act as a cache on memory would slow
things down. It doesn't have to be associative memory:
just two bits per register are needed, to indicate whether it is filled and/or changed.
Most of the data in the registers will never go to memory!
For task switches, several register banks can be used. When the OS
wants to write into a user task's space, it will first have to flush
that task's cache.
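A minimal sketch of that scheme, assuming a hypothetical direct-mapped layout in which register `i` always caches the stack word at `base + i`, so no associative lookup is needed. Each register carries a "filled" and a "changed" bit; the class and method names are illustrative, not taken from the Hobbit or any real CPU.

```python
class StackRegisterCache:
    """Hypothetical model: n registers cache the stack words at base..base+n-1."""

    def __init__(self, memory, base, n):
        self.memory = memory        # list of words standing in for RAM
        self.base = base            # stack address cached by register 0
        self.regs = [0] * n
        self.filled = [False] * n   # register holds a copy of its memory word
        self.changed = [False] * n  # register differs from memory (needs write-back)
        self.write_backs = 0        # how many words actually reached memory

    def _slot(self, addr):
        # Direct mapping: a range check, not an associative search.
        i = addr - self.base
        return i if 0 <= i < len(self.regs) else None

    def read(self, addr):
        i = self._slot(addr)
        if i is None:               # outside the cached window: go to memory
            return self.memory[addr]
        if not self.filled[i]:      # first touch: fill from memory
            self.regs[i] = self.memory[addr]
            self.filled[i] = True
        return self.regs[i]

    def write(self, addr, value):
        i = self._slot(addr)
        if i is None:
            self.memory[addr] = value
            return
        self.regs[i] = value
        self.filled[i] = True
        self.changed[i] = True

    def flush(self):
        # Write back only changed words, e.g. before the OS touches this task's memory.
        for i, dirty in enumerate(self.changed):
            if dirty:
                self.memory[self.base + i] = self.regs[i]
                self.write_backs += 1
                self.changed[i] = False

    def discard(self):
        # Drop the window without writing back, e.g. when the frame is popped:
        # its locals are dead, so their values never need to reach memory.
        self.filled = [False] * len(self.regs)
        self.changed = [False] * len(self.regs)


memory = [0] * 16
cache = StackRegisterCache(memory, base=8, n=4)
cache.write(8, 5)
cache.write(9, 7)
print(memory[8], cache.read(8))  # → 0 5
cache.flush()
print(memory[8], memory[9], cache.write_backs)  # → 5 7 2
```

In hardware the range check in `_slot` would sit on every memory access; whether that costs clock rate is exactly the point under dispute in this thread.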
 
>> >With CPUs, simpler is better.  The only advantage of Hobbit is that it
>> >allows one to take the address of a stack element: that's a language
>> >misfeature.
>> 
>> I beg to differ:
>> The number of spaces in a computer should be as few as possible,
>> with a clear optimum of there being only one space.
>
>What do you mean by "optimum"?

Best solution.

>Do you mean from a theoretical point of view?

Yes; that way all system software can be written in C, because
C programs can reach everything in that one space.
Lots of problems are solved automatically: when all registers
are in the memory space, a task switch can be done with a simple
block move or a cache flush.
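A minimal sketch of a task switch done as two block moves, assuming (hypothetically) that the register file is mapped into the single memory space at `REG_BASE` and occupies `REG_BYTES` bytes; both constants and the save/restore addresses are made up for illustration.

```python
REG_BASE = 0x00      # hypothetical address of the memory-mapped register file
REG_BYTES = 16 * 4   # hypothetical: sixteen 32-bit registers


def task_switch(memory, save_at, restore_from):
    """Save the running task's registers and load the next task's as two block moves."""
    regs = slice(REG_BASE, REG_BASE + REG_BYTES)
    memory[save_at:save_at + REG_BYTES] = memory[regs]            # registers -> save area
    memory[regs] = memory[restore_from:restore_from + REG_BYTES]  # next task -> registers


memory = bytearray(256)
memory[REG_BASE:REG_BASE + REG_BYTES] = bytes([1]) * REG_BYTES  # task A's live registers
memory[128:128 + REG_BYTES] = bytes([2]) * REG_BYTES            # task B's saved registers
task_switch(memory, save_at=64, restore_from=128)
```

No special save/restore instructions are involved: because the registers live in the same space as everything else, an ordinary copy routine (or, in C, `memcpy`) is enough.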

>It's certainly never going to be the fastest way of doing it.

I'm arguing from the software side. I have seen a lot
of hardware tricks in processors that needed a lot of software
(and execution time) to be 'corrected'.
Some problems just can't be corrected, as in the case of
the Am29000. The 'neat' segmentation solutions in the
80x86 have cost, and still cost, thousands of
programmer-years.

The fact that the 'arrays' of system, debug and test registers
in the 386+ can't be read by indexing makes writing a
monitor/debugger harder, since separate assembler routines
must be written to read/write each of them.

>> Obviously, this space would be the memory space.
>> Other spaces like the register space, the stack space and the
>> IO space should be part of this space.
>> Since the memory space should allow taking the address of an
>> element, any of the other spaces should also allow this...
>
>But that's only possible by having a gazillion read/write ports and
>hazard interlocks and MUXes.  Every one of these costs, both in terms
>of die size and delays.  Every transistor in the datapath contributes
>a delay.

It will always be complex, since the problem is complex...

>> I realize that some would like to improve the world by
>> forbidding 99% of the languages that the world currently uses,
>> but in day-to-day life, if your processor can't handle these
>> languages efficiently, you won't sell your processor,
>> since the world isn't prepared to (and can't) rewrite all
>> the programs it has written so far...
>
>That's right.  And experience has shown that if you want to execute C
>quickly you should use a register based RISC CPU.  Using similar
>process technologies, Hobbit executes C slower than the ARM, for
>example.  And Hobbit is much bigger.

Probably much more time and money has been spent on the ARM.

This is no proof that a Hobbit-like architecture couldn't have
saved a lot of programming effort, which is what I am trying to say...


Oops, what a long text, anybody reached this point? ;-)

Regards,
Jaap

-- Chip Directory
-- http://www.hitex.com/chipdir/ - USA
-- http://www.xs4all.nl/~ganswijk/chipdir/ - Europe
-- Many other international mirror sites from there...