Computational and Memory Performance

Memory Performance

Again, the computational performance scales with memory performance

MSP shows significantly better memory throughput than autotasked code

Floating-Point Performance

Larger apparent cache yields greater than 4x speedup for MSP executable

Average vector lengths: T90: 99.58; SV1: 60.53
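To put the average vector lengths in context, the short sketch below expresses each as a fraction of the machine's vector register length. The register lengths used here (128 elements for the T90, 64 for the SV1) are not given on the slide; they are assumptions based on the commonly cited hardware specifications, and the function name is purely illustrative.

```python
# Assumed hardware vector register lengths (elements per register).
# These values are NOT stated on the slide; treat them as assumptions.
MAX_VECTOR_LENGTH = {"T90": 128, "SV1": 64}

# Average vector lengths as reported on the slide.
AVERAGE_VECTOR_LENGTH = {"T90": 99.58, "SV1": 60.53}


def vector_utilization(average: float, maximum: float) -> float:
    """Return the average vector length as a fraction of the register length."""
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    return average / maximum


for machine, average in AVERAGE_VECTOR_LENGTH.items():
    fraction = vector_utilization(average, MAX_VECTOR_LENGTH[machine])
    print(f"{machine}: {average} of {MAX_VECTOR_LENGTH[machine]} elements "
          f"({fraction:.1%} of register length)")
```

Under these assumptions, the SV1 runs much closer to its full register length than the T90 does, even though its absolute average is lower.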


Notes:

We can now add two values to our floating-point results:

1) Autotasked and run with 4 CPUs (not optimized)

2) Multi-streamed and run on an MSP

Note the speedup of more than a factor of four for the MSP case.

- attributed to a larger effective cache and more memory bandwidth
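The speedup quoted here is a ratio of sustained rates. A minimal sketch of that arithmetic follows; the function name and the two rates are placeholders chosen for illustration, not measurements from this talk.

```python
def speedup(baseline_rate: float, improved_rate: float) -> float:
    """Return how many times faster improved_rate is than baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return improved_rate / baseline_rate


# Placeholder rates in MFLOPS. These are NOT results from the slides;
# substitute the measured autotasked and MSP values.
autotasked_mflops = 100.0
msp_mflops = 450.0

print(f"MSP speedup over autotasked: "
      f"{speedup(autotasked_mflops, msp_mflops):.2f}x")
```

The same function can be applied to memory bandwidth figures to check whether the two ratios track each other.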

Performance of the autotasked case could be improved using compiler directives

- Notice that, as with the SSP runs, floating-point performance and memory bandwidth scale together