Computational and Memory Performance
Again, computational performance scales with memory performance
MSP shows significantly better memory throughput than autotasked code
Floating-Point Performance
Larger apparent cache yields a speedup of more than 4x for the MSP executable
Average vector lengths: T90 99.58, SV1 60.53
Notes:
- We can now add two cases to our floating-point results:
- 1) Autotasked and run with 4 CPUs (not optimized)
- 2) Multi-streamed and run on an MSP
- Note the speedup of more than a factor of four for the MSP case.
- - Causes: larger effective cache and more memory bandwidth
- Performance of the autotasked case could be improved using compiler directives
- Notice that, as with the SSP runs, floating-point performance and memory bandwidth scale together
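
The last note can be illustrated with a simple bandwidth-bound estimate. This is only a sketch under stated assumptions, not the method used for the measurements on these slides: the function name, the peak rate, and the bandwidth values are all hypothetical, and the kernel is assumed to be a triad (a[i] = b[i] + s*c[i]) that moves three 8-byte words for every 2 floating-point operations. Under that model, a memory-bound code's flop rate rises in direct proportion to the memory bandwidth it can sustain.

```python
def attainable_mflops(peak_mflops, bandwidth_mb_s, flops_per_byte):
    """Estimate the flop rate of a kernel limited by peak speed or memory traffic.

    The estimate is the smaller of the processor peak and the rate at which
    memory can feed operands (bandwidth times flops done per byte moved).
    """
    return min(peak_mflops, bandwidth_mb_s * flops_per_byte)


# Hypothetical values for illustration only; none are taken from the slides.
PEAK_MFLOPS = 4800.0          # assumed peak rate
TRIAD_INTENSITY = 2.0 / 24.0  # 2 flops per 24 bytes (three 8-byte words)

if __name__ == "__main__":
    for bandwidth in (1000.0, 2000.0, 4000.0):  # assumed MB/s values
        rate = attainable_mflops(PEAK_MFLOPS, bandwidth, TRIAD_INTENSITY)
        print(f"bandwidth {bandwidth:7.1f} MB/s -> about {rate:6.1f} MFLOPS")
```

While the kernel stays below the assumed peak, doubling the bandwidth doubles the estimated flop rate, which is the behaviour described for both the SSP and MSP runs.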