No Subject
- To: MISC
- From: jfox@xxxxxxxxxx (Jeff Fox)
- Date: Tue, 30 May 1995 16:57:48 -0700
Dear MISC readers,
Penio asks:
>
>What are the implications of the fast internal speed on the NOPs,
>needed for carry propagation?
Instructions like NOP and carry propagation have a fixed timing
relationship relative to each other. That is, on MuP21 carry moves 8 bits
per instruction clock. We don't know the number yet for F21. It may be
less than 8. Carry also moves a couple of bits less during the + or +*
itself. So the internal clock speed has no effect on this. It will not
matter at all if it is 200, 300, 400, 500 or whatever. The number of bits
propagated per clock will be the same for a given design. What can change
is the memory timing and therefore the time between memory fetches. This
is just as it has been on MuP21. You may need more NOPs if the code is
going to run from SRAM on P21.
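As a rough illustration of why the clock rate drops out, here is a minimal
Python sketch. It assumes a simple model that is not taken from the chip
design: carry ripples a fixed number of bits per instruction slot, slightly
fewer during the add itself, and a worst-case add must ripple across the
whole operand. The function name `nops_needed` and the value 6 for the add
slot are hypothetical (the text above only says "a couple less" than 8).

```python
import math

def nops_needed(operand_bits, bits_per_slot=8, bits_in_add=6):
    """NOP slots needed for a worst-case carry to cross operand_bits.

    bits_per_slot: carry bits rippled per instruction slot (8 on MuP21).
    bits_in_add:   bits rippled during the + itself (assumed 6, i.e.
                   "a couple less" than 8; the real figure is not given).
    """
    remaining = max(0, operand_bits - bits_in_add)
    return math.ceil(remaining / bits_per_slot)

# The clock frequency never appears in the model, so the answer is the
# same at any internal clock rate: it is counted in slots, not in ns.
for bits in (4, 8, 12, 16):
    print(bits, "bit add:", nops_needed(bits), "NOP(s)")
```

If F21 turns out to move fewer than 8 bits per slot, only `bits_per_slot`
changes in this model; the clock rate still plays no part.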
>One possible scenario is that all four instructions will be latched as
>soon as they could, and then wait for a long time till the next four come.
This is the way it works. There is some advantage to a faster internal
clock even if the memory speed is still the limiting factor. This
is because whenever a non-sequential memory access occurs, its time
must be added to the instruction execution time. Prefetch only
happens after any branching or data memory access has completed. So the
sooner you complete a CPU operation, the sooner you begin the next valid
memory access. If you start an invalid access it is just aborted
when the actual memory access starts.
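A minimal sketch of that argument in Python, under a simplified timing
model that is an assumption here, not measured behavior: a word of four
instructions fetched sequentially is hidden behind prefetch, so the slower
of CPU and memory sets the pace, while a non-sequential access cannot start
until the CPU work is done, so the two times add. The function
`word_time_ns` and all nanosecond figures are hypothetical.

```python
def word_time_ns(instr_ns, mem_ns, sequential, slots=4):
    """Approximate time to finish one instruction word.

    sequential=True:  the next word is prefetched while the CPU works,
                      so the slower of the two sets the pace.
    sequential=False: a branch or data access can only start once the
                      CPU work is done, so the two times add up.
    """
    cpu_ns = slots * instr_ns
    if sequential:
        return max(cpu_ns, mem_ns)
    return cpu_ns + mem_ns

MEM_NS = 40  # hypothetical memory cycle, not an F21 specification
for instr_ns in (5, 2):  # hypothetical slower and faster internal clocks
    print(instr_ns, "ns/instr:",
          "sequential", word_time_ns(instr_ns, MEM_NS, True), "ns,",
          "non-sequential", word_time_ns(instr_ns, MEM_NS, False), "ns")
```

In this model the faster clock changes nothing for straight-line code that
is waiting on memory, but it shortens every word that ends in a branch or
a data access.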
>Another scenario is that although the results are stable very quickly,
>the latching is delayed, so that the overall waiting for the next
>instruction to come is distributed among the four slots. This means, that
>the carry for each individual slot will have more time to propagate, and
>longer operands will be able to be added without a NOP.
Slowing down the CPU to do this also means slowing down throughput
whenever memory accesses are performed.
Also the internals of these MISC chips are tuned to the CPU. Signals
are there, and then they go away. Signals can change again in a few
hundred picoseconds. So to spread a delay out across an instruction
might mean adding lots of latches and changing the way things are
tuned to work.
Also on F21 there are three different memory spaces that run at different
speeds. There is a configuration bit to adjust SRAM speed, one for ROM
speed, and three to adjust overall memory timing. How would the chip know
how fast the memory is going to be, so that it could spread out delays,
without quite a large amount of logic? Just how much would you slow down
each instruction when memory timing could go a dozen different ways?
>The motivation is for the following code to work for operands as long
>as possible:
>
>+n 2/ +n 2/
>
>This is a part of a multiplication, which in case of insufficient
>carry propagation time would be carried out like this:
>
>+n 2/ nop nop
>
>thus slowing it twice. Of course, the best scenario is that the delay is
>distributed only between the first and the third slots, so they are
>stretched longer and accommodate longer add operands. This will make some
>programs relying on multi-precision multiplication run twice as fast.
>
>--
>Penio Penev <Penev@venezia.Rockefeller.edu> 1-212-327-7423
Interesting idea. However, it is not the way the hardware works. Chuck
could delay the internal clock to match 1/4 of the memory clock in some
cases. Perhaps code can adjust the memory timing speed bits in the
configuration registers to get the best performance out of some
particular sequence. But the delays are only there between words or
memory accesses.
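To put rough numbers on the multiplication example quoted above, here is a
small Python sketch. It assumes each `+n 2/` pair is one multiply step (an
add followed by a 2/ shift) and that execution is paced by word fetches.
The function name `words_for_multiply` and the 20-bit width are
hypothetical.

```python
def words_for_multiply(multiplier_bits, steps_per_word):
    """Instruction words needed for a shift-and-add multiply.

    Each step is one add plus one shift (two slots), and a word holds
    four slots, so at most two steps fit in one word.
    """
    if steps_per_word not in (1, 2):
        raise ValueError("a four-slot word holds one or two steps")
    return -(-multiplier_bits // steps_per_word)  # ceiling division

BITS = 20  # hypothetical multiplier width
packed = words_for_multiply(BITS, 2)   # +n 2/ +n 2/
padded = words_for_multiply(BITS, 1)   # +n 2/ nop nop
print(packed, "words packed,", padded, "words padded")
```

Since the delays fall between words, the padded form costs twice as many
word fetches, which is where the factor of two in the question comes from.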
Jeff Fox