Re: P64 Math
>>
>>> When you hit 32 bits I think it's worthwhile to use some sort of carry
>>> propagate acceleration. Something like a carry skip adder doesn't cost too
>>> much silicon, and will be *much* faster than a ripple adder for wide words.
>>
>>Can you name numbers? At least a qualitative factor. It still scales
>>linearly to bit length, I presume?
>>
CLA: Pure carry lookahead
Pyramid: Don't ask
Conditional sum:Carry select with 1bit groups, effectively
BCLA(i): Ripple within blocks of size i, CLA across blocks
RCLA(i): CLA across blocks of size i, ripple between blocks
SBCLA(i,j): CLA across blocks of size i, and superblocks of j blocks
ISBCLA(i,j): SBCLA variant - more parallelism
SRCLA(i,j): RCLA with superblocks
Carry complete: Nasty, non-deterministic thing
Serial: One bit per cycle, single full adder & a latch.
Note that T(n) in the following table ignores fanout restrictions,
propagation delay and lots of other things, so it's only an estimate, and
will be badly wrong for some architectures (eg CLA).
Adder Time for n-bit add: T(n)
CLA 4
Carry select 6
Pyramid 2*log2(n)+1
Conditional sum 2*log2(n)+2
SRCLA(4,4) n/8+6
SRCLA(2,8) n/8+6
SRCLA(4,8) n/16+6
Carry complete 2*log2(n)+4 (average)
SBCLA 6 * n^(1/3) - 2
BCLA 4 * n^0.5 - 2
1-level Skip ~4 * n^0.5
RCLA(4) n/2 + 1
RCLA(8) n/4 + 1
Ripple 2n - 1
Serial 3n
And the numbers:
( Cost(and) == Cost(or) == 2
Cost(xor) == 4
Cost(latch) == 8
T is in units of one gate-delay )
For 16 bit words
Adder T(16) Cost Cost:Perf
CLA 4 1264 51
Carry Select 6 440 26
Pyramid 9 342 31
RCLA(4) 10 336 34
RCLA(8) 6 560 34
Conditional-sum 10 698 70
SRCLA(2,4) 10 392 39
ISBCLA(2,4) 10 360 36
Carry-complete 12(avg) 368 44
2-level skip(2,4) 15 240 36
BCLA(4) 14 300 42
SBCLA(2,2) 14 368 52
1-level skip 15 240 36
Ripple 31 224 69
Serial 48 22 11
For 32 bit words
Adder T(32) Cost Cost:Perf
CLA 4 7392 296
Carry Select 6 992 60
SRCLA(2,4) 10 904 90
ISBCLA(2,4) 10 944 94
Pyramid 11 774 85
Conditional-sum 12 1594 191
Carry-complete 14(avg) 736 103
SBCLA(2,2) 18 748 135
RCLA(4) 18 672 121
RCLA(8) 10 700 70
2-level skip(4,2) 20 572 114
BCLA(4) 22 600 132
1-level skip 23 480 110
Ripple 63 448 282
Serial 96 22 21
For 64 bit words
Adder T(64) Cost Cost:Perf
CLA 4 50624 2025
Carry Select 6 2688 161
Pyramid 13 1734 225
SRCLA(4,4) 14 1808 253
SRCLA(4,8) 10 2032 203
Conditional-sum 14 3578 501
Carry-somplete 16(avg) 1472 236
ISBCLA(4,4) 18 1344 242
ISBCLA(4,8) 14 1568 220
SBCLA(4,4) 22 1548 341
2-level skip(4,4) 28 1140 319
1-level skip 29 960 250
RCLA(4) 34 1344 457
RCLA(8) 18 1400 403
BCLA(4) 38 1299 456
BCLA(8) 30 1320 396
Ripple 127 896 1138
Serial 192 22 42
Points to notice.....
The performance values for the CLA, Pyramid & carry-select are completely
wrong - the fan-out & fan-in needed to build a 64bit CLA doesn't bear
thinking about.
The pure CLA and pure pyramid adders are infeasible to build (due to fan-out
problems), but can be used as part of a hybrid adder scheme.
The cost:performance ratio for the serial adder is way, way better than
anything else, though it has a high latency.
Compare areas & times for the ripple adder and the 1-level skip. At all three
word lengths the skip adder is at least twice as fast as the ripple, and only
marginally bigger. The skip adder begins to look like a very good buy.
Take a look at `Computer Arithmetic Systems', Omondi, Prentice-Hall
ISBN 0-13-334301-4 for more details about this sort of stuff.
--
-- Steve Atkins -- <atkinss@inmos.co.uk> -- +44 454 611439 --
<Squeak!> <Squeak!> <Squeak!>....... <Splash!>