Re: Network coprocessor on the F21
- To: Penio Penev <penev>
- Subject: Re: Network coprocessor on the F21
- From: Eugen Leitl <ui22204@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 18 May 1995 21:05:04 +0200 (MET DST)
- Cc: MISC
- In-Reply-To: <Pine.SGI.3.91.950518141434.5431B-100000@pisa>
On Thu, 18 May 1995, Penio Penev wrote:
[ remark on ring topo cut ]
>
> It depends on the task at hand. If the algorithm needs only to broadcast
> data, this could be very efficient.
Broadcast algorithms are a subset of generic message routing.
Since the bandwidth is limited and all links are blocked
during any transfer, only two nodes can communicate over
the ring at any one time. This is better than an isolated
processor, yet by far not as fantastic as I had initially
estimated.
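To illustrate the point about blocked links, here is a toy model in Python. It is a sketch under stated assumptions, not F21 code: the function names, the (src, dst, words) transfer tuples, and the "one word per time unit" cost are all made up for illustration. It compares a ring in which any transfer occupies every link with an idealised set of independent links.

```python
# Toy model (assumption): on the ring, a transfer blocks all links,
# so transfers are strictly serialised. Cost is one time unit per word.

def ring_total_time(transfers):
    """transfers: list of (src, dst, words). Returns total time on the ring."""
    return sum(words for _src, _dst, words in transfers)


def independent_links_lower_bound(transfers):
    """Hypothetical comparison: with independent point-to-point links,
    the run time is at least the busiest node's total traffic
    (assuming a node handles one transfer at a time)."""
    busy = {}
    for src, dst, words in transfers:
        busy[src] = busy.get(src, 0) + words
        busy[dst] = busy.get(dst, 0) + words
    return max(busy.values(), default=0)


if __name__ == "__main__":
    # Three disjoint node pairs, 100 words each.
    pairs = [(0, 1, 100), (2, 3, 100), (4, 5, 100)]
    print(ring_total_time(pairs))                # 300
    print(independent_links_lower_bound(pairs))  # 100
```

With disjoint pairs the ring takes three times as long as the idealised bound, which is the sense in which only one pair can talk at a time.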
> It could be very efficient for a small cluster.
It depends on the problem. Mine will not map very well.
But I am not complaining, mind you. A limited-bandwidth channel
is much better than no channel at all, especially at that low
price.
> It could be very efficient for local bidirectional, if the ring direction
> can be switched on the fly.
>
> For certain algorithms F21 is not the right chip. But, as Jeff will say,
> Chuck is in the custom VLSI business, so once the market niche is
> identified, one can order a specification very well tailored to the problem.
Let us hope he will put in real independent asynchronous links. Several of
them, say about 6-8. Then things will really start getting interesting.
> > > No, it is not needed. There is no need for arbitration at the network
> > > level, since there is only one input and one output. The only arbitration
> > > is done for memory access by the memory coprocessor. But then, it is not a
> > > crossbar -- the low priority devices just wait.
This is not arbitration. This is mere priority scheduling.
A crossbar or its perfect shuffle counterpart could handle several
channels independently. Of course, a simultaneous watch (= parallel
compare) for a header token on each link would be necessary. One could
use a longer delay between individual data packets as a sync token,
followed by dummy pads to ease the timing constraints. Then the next
word read in, the header, would trigger processor action. Of course,
parallel interrupts would have to be disabled. The problem with this
approach is that several high-speed links would generate events at too
high a frequency to be comfortable. We still have to do some computation
besides routing, don't we?
Of course, increasing the packet length would help. So would making the
router completely autonomous. It could be a MISC CPU in its own right,
albeit with custom ops. It could then handle as many events as it likes.
If the routing code does not know the matching method, the main CPU could
be invoked for assistance. The same applies if an incoming data packet is
destined for us.
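The dispatch decision of such an autonomous router can be sketched roughly as follows. This is only an illustration in Python under my own assumptions: the packet layout (first word is the destination id), the routing table as a plain dictionary, and the action names are hypothetical and not taken from any F21 or P32 specification.

```python
# Sketch of the router's per-packet decision (all names hypothetical).
# Assumption: packet[0] is the header word holding the destination node id.

def route(packet, local_id, routing_table):
    """Return (action, link) for one incoming packet.

    "deliver": the packet is for this node, hand it to the main CPU.
    "forward": the router knows the outgoing link and handles it alone.
    "ask_cpu": no matching entry, so the main CPU is asked for assistance.
    """
    header = packet[0]
    if header == local_id:
        return ("deliver", None)
    link = routing_table.get(header)
    if link is None:
        return ("ask_cpu", None)
    return ("forward", link)


if __name__ == "__main__":
    table = {7: 2, 9: 5}               # destination id -> outgoing link
    print(route([7, 0xAB], 3, table))  # ('forward', 2)
    print(route([3, 0xCD], 3, table))  # ('deliver', None)
    print(route([4, 0xEF], 3, table))  # ('ask_cpu', None)
```

Only the "deliver" and "ask_cpu" cases would interrupt the main CPU, so the event rate it sees depends on how much of the traffic the router can forward by itself.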
> >
> > You are making it seem like a plus.
>
> For certain applications it is a plus.
Which ones do you have in mind? Pipelined ones?
> > For a
> > time I thought this to be a kind of 4th Transputer.
>
> P32 might be more closer to a Transputer. It is said to have several
> links. But this is coming later in the year.
I've heard of it. I am watching out for definitive specs. But the P32
may never be released to the broader market. I hope it will be.
-- Eugene
>
> --
> Penio Penev <Penev@venezia.Rockefeller.edu> 1-212-327-7423
>