No Subject
- To: MISC
- From: jfox@xxxxxxxxxx (Jeff Fox)
- Date: Mon, 5 Jun 1995 08:30:28 -0700
Dear MISC readers,
>Regarding the 1-bit return instruction for the P32, could it be that Chuck is
>thinking of a "Repeat this fetched group of 6 instructions" operation for very
>fast (but tiny) loops? As I recall, something like that was proposed or
>implemented in ShBoom, Chuck's 32-bit predecessor to Mup21. If so, then the
>normal ; instruction is still needed. Can anyone who attended the SVFIG
>presentation comment on this?
>
>Mike Losh
Chuck said, "One option is to do nothing with those two left over bits."
So although he seems to favor a 1 bit return instruction there is still one
bit unassigned. As for the 1 bit return, I don't know exactly how it will
work. Will it execute a return after the last non-NOP instruction? Will
you have to pad with NOP and have return execute after the 5 bit instructions?
Will the normal return instruction be replaced by something else?
A 1 bit micro-loop might be useful on P32. There was a micro-loop on Sh-Boom,
but it was scrapped in the change from P20 to P21. As the paper I published
at the FORML Conference at Asilomar in 91 indicated, a micro-loop was of
no real value on a 20 bit machine. Since it was an unconditional loop,
it required one other instruction to be a conditional skip (out of the
current instruction, also scrapped), and with a conditional flag needed
somewhere, having two instructions left was just not enough.
The only thing I could actually do with the micro-loop was in transfers.
There was an R!+ instruction in P20, so I had a CMOVE that did @A+ R!+
in a micro-loop. This was the only place where I could make use of the
micro-loop. But I realized that it was optimal for the CPU but not
for the memory interface processor.
Although there was 0 overhead for instruction fetches in the micro-loop
there was a big overhead for offpage memory access. If A and R pointed
to different pages of memory (as they normally would) then every
fetch and store (every one!) would be an offpage access.
I rewrote the code in a more conventional fashion. It had an overhead
of 8 instruction fetches in the inner loop. This is because it would
load 4 words with @A+ @A+ @A+ @A+ so that only 1 in 4 was offpage,
then reorder the data and store it with R!+ R!+ R!+ R!+. This
routine, although much longer and with more instruction load overhead,
was still faster than the micro-loop version because of the offpage
memory load overhead.
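
The comparison above can be put into a rough cost model. This is only a
sketch in Python: the three cycle costs are made-up placeholders, not P20
timings, and the cost of reordering the data is ignored. Only the shape of
the comparison (no instruction fetches but all offpage, versus 8 fetches
per pass and 1 in 4 offpage) comes from the description above.

```python
# Hypothetical relative costs; these are NOT measured P20 timings.
INSTR_FETCH = 2  # assumed cost of fetching one instruction word
ON_PAGE = 2      # assumed cost of an access on the open DRAM page
OFF_PAGE = 6     # assumed cost of an access that changes the DRAM page


def micro_loop_cost(words):
    """@A+ R!+ in a micro-loop with A and R on different pages.

    No instruction fetches inside the loop, but every fetch and
    every store is an offpage access.
    """
    return words * 2 * OFF_PAGE


def unrolled_cost(words, fetches_per_pass=8, group=4):
    """Load 4 words, then store 4 words, per pass of the inner loop.

    Only the first access in each run of 4 is offpage, at the price
    of 8 instruction fetches per pass. `words` is assumed to be a
    multiple of `group`.
    """
    passes = words // group
    per_run = OFF_PAGE + (group - 1) * ON_PAGE
    return passes * (fetches_per_pass * INSTR_FETCH + 2 * per_run)
```

With these placeholder costs the unrolled routine comes out ahead for a
64 word move. Different cost ratios will shift the result, so plug in
real timings before drawing conclusions.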
The micro-loop version would have been faster if both pointers were
on the same page, or if one was in SRAM. But P20 didn't have that
21st bit for memory addressing, so you had to change a configuration
register bit to address SRAM. That meant you couldn't point one register
into DRAM and one into SRAM like you could on P21.
With more than six instructions in P32, a micro-loop might be useful.
If it was a single bit, then you could put six five bit instructions
and one one bit instruction in the micro-loop. I have not studied
the usefulness of this, but someone might like to try writing some
code and see just how useful a micro-loop would be on P32.
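
As a starting point for that kind of experiment, here is a small Python
sketch of the bit budget: six 5 bit slots use 30 bits of a 32 bit word,
leaving the two left over bits. The layout (slots in the high bits, the
two 1 bit flags in the low bits) and the names are my assumptions for
illustration, not anything Chuck has specified.

```python
WORD_BITS = 32
SLOT_BITS = 5
SLOTS = 6
SPARE_BITS = WORD_BITS - SLOTS * SLOT_BITS  # the two left over bits


def pack_word(opcodes, loop_bit=0, other_bit=0):
    """Pack six 5 bit opcodes and two 1 bit flags into one 32 bit word.

    Assumed layout: first opcode in the highest bits, loop_bit in
    bit 1, other_bit in bit 0. Both flags are hypothetical.
    """
    if len(opcodes) != SLOTS:
        raise ValueError("expected exactly six opcodes")
    if loop_bit not in (0, 1) or other_bit not in (0, 1):
        raise ValueError("flag bits must be 0 or 1")
    word = 0
    for op in opcodes:
        if not 0 <= op < (1 << SLOT_BITS):
            raise ValueError("opcode does not fit in 5 bits")
        word = (word << SLOT_BITS) | op
    return (word << SPARE_BITS) | (loop_bit << 1) | other_bit
```

A simulator built on this could count instruction fetches with and
without the loop bit set to see how much a micro-loop would save.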
Chuck doesn't have time for this kind of useful instruction set
experimentation, but does listen to the results of such tests.
One of the things that makes MISC fun is that there is the opportunity
to influence the design of new machines. You may be able to get the
instructions that you really want, if you can show how useful they
would be. You don't have this opportunity with the big chips being
done by the big companies.
Of course Chuck could always decide to not replace the old ; with
something new, or to not implement that last unused instruction bit
on P32. But if someone shows how useful it would be to have
some specific instruction, and it is not excessively difficult to
implement in hardware, it is a chance to have input into the design.
Jeff Fox