SV1 Architecture Highlights
“Single Stream Processors” (SSPs)
300 MHz clock (3.33 ns clock period)
Dual-pipe floating-point units
- Can produce two add-multiply results per clock period
- Scalar operations share one of the pipes
Theoretical peak performance: 1.2 GFLOP/sec per SSP
Vector length of 64 words
32 Kword (256 Kbyte) cache for vector, scalar and instruction loads
Four SSPs per module board
- Module-to-memory bandwidth is about 5 GB/sec (6.4 GB/sec peak)
- Each processor can access around 2.5 GB/sec (3.2 GB/sec peak)
Notes:
- Lead-in: First, let us review some of the hardware features that influence the performance of single-processor jobs.
- The CPUs can each produce 4 floating-point results per clock period, but this places a high demand on memory for operands.
- Part of the new memory hierarchy is that each processor has a 32 Kword cache for vector and scalar operations. (We will see more about this in a few slides.)
- To understand single-process memory bandwidth a little better, we need to take a closer look at the hardware layout.
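
As a rough check on the figures above, the short sketch below recomputes the peak rate and compares it with the per-processor memory bandwidth. It uses only the numbers quoted on this slide. The function and constant names are illustrative, the 8-byte word size is inferred from "32 Kword (256 Kbyte)", and treating the "around 2.5 GB/sec" figure as an achievable rate is an assumption.

```python
# Back-of-the-envelope check of the slide's numbers.
# All names here are illustrative; the values come from the slide.

CLOCK_HZ = 300e6             # 300 MHz clock
FLOPS_PER_CLOCK = 4          # two add-multiply results per clock period
BYTES_PER_WORD = 8           # inferred: 32 Kword == 256 Kbyte
PEAK_BW_BYTES = 3.2e9        # per-processor peak bandwidth (bytes/sec)
TYPICAL_BW_BYTES = 2.5e9     # "around 2.5 GB/sec" (assumed achievable rate)


def peak_gflops(clock_hz, flops_per_clock):
    """Theoretical peak in GFLOP/sec."""
    return clock_hz * flops_per_clock / 1e9


def words_per_clock(bw_bytes, clock_hz, bytes_per_word=BYTES_PER_WORD):
    """Memory words deliverable to one processor per clock period."""
    return bw_bytes / bytes_per_word / clock_hz


def words_per_flop(bw_bytes, clock_hz, flops_per_clock):
    """Memory words available per floating-point operation at peak rate."""
    return words_per_clock(bw_bytes, clock_hz) / flops_per_clock


if __name__ == "__main__":
    print("peak GFLOP/sec:", peak_gflops(CLOCK_HZ, FLOPS_PER_CLOCK))
    print("cache bytes:", 32 * 1024 * BYTES_PER_WORD)
    for label, bw in (("peak", PEAK_BW_BYTES), ("typical", TYPICAL_BW_BYTES)):
        print(label, "words/clock:", round(words_per_clock(bw, CLOCK_HZ), 3),
              "words/flop:", round(words_per_flop(bw, CLOCK_HZ, FLOPS_PER_CLOCK), 3))
```

Under these assumptions, memory can supply only a little over one word per clock period to a processor that can retire four floating-point operations in that time, which is the "high demand for operands" mentioned in the notes and one motivation for the cache.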