Slide 27 of 41
Notes:
- We can now add two more cases to our floating-point performance results:
- 1) Auto-tasked and run on 4 CPUs (not optimized)
- 2) Multi-streamed and run on an MSP
- Note the more-than-fourfold speedup in the MSP case, due to:
  - a larger effective cache and more memory bandwidth
- The performance of the auto-tasked case could be improved with compiler directives
- Notice that, as with the SSP runs, floating-point performance and memory bandwidth scale together
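One way to read the last point, assuming the kernel is memory-bandwidth bound (a roofline-style model that these notes do not name explicitly): the attainable flop rate is roughly arithmetic intensity (flops per byte) times sustained memory bandwidth. If intensity stays fixed, a k-fold increase in bandwidth gives about a k-fold increase in flop rate. The sketch uses made-up numbers (`intensity`, `bw_autotasked`, `bw_msp`, `work_gflop` are all hypothetical placeholders, not measurements from these runs) to show that arithmetic.

```python
# Hypothetical sketch: every number below is an illustrative placeholder,
# NOT a measurement from the auto-tasked or MSP runs on this slide.

def bandwidth_bound_gflops(intensity_flops_per_byte: float, bandwidth_gb_s: float) -> float:
    """Roofline estimate for a memory-bound kernel:
    attainable GFLOP/s = arithmetic intensity (flops/byte) * sustained bandwidth (GB/s)."""
    return intensity_flops_per_byte * bandwidth_gb_s

def speedup(baseline_time_s: float, new_time_s: float) -> float:
    """Speedup of a run relative to a baseline: ratio of wall-clock times."""
    return baseline_time_s / new_time_s

# Same kernel (fixed arithmetic intensity) on two configurations (assumed values).
intensity = 0.25       # flops per byte
bw_autotasked = 10.0   # GB/s
bw_msp = 45.0          # GB/s
work_gflop = 90.0      # total floating-point work, GFLOP

f_auto = bandwidth_bound_gflops(intensity, bw_autotasked)
f_msp = bandwidth_bound_gflops(intensity, bw_msp)

# Time for a fixed amount of work is work / flop rate.
t_auto = work_gflop / f_auto
t_msp = work_gflop / f_msp

# While the kernel stays bandwidth-bound, all three ratios coincide.
print(f"bandwidth ratio: {bw_msp / bw_autotasked:.2f}")
print(f"flop-rate ratio: {f_msp / f_auto:.2f}")
print(f"speedup:         {speedup(t_auto, t_msp):.2f}")
```

With these placeholder values all three lines print 4.50. In real runs, cache effects (such as the larger effective cache noted above) raise the effective intensity, so measured speedups can exceed the raw bandwidth ratio.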