Computational Performance

For the T90, V2 is the better performing code

When targeting MSPs, the entire code must be examined, as in V3, to reduce code with serial dependencies


Notes:

- Recalling that V1, V2 and V3 are the different stages of optimization, we can see several trends in the floating-point performance:

1) After vectorization, memory bank conflicts were a significant bottleneck. The SV1 cache helped to offset this, and we saw a respectable 50% of T90 performance.

2) Eliminating the memory bank conflicts in V2 significantly improved T90 performance, and we dropped to ~30% of it on the SSP.

3) At this point we would stop on older Cray platforms. Adding the third step, removing serial dependencies, allows the MSP code to match T90 performance (30% V2 -> V3).
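
To illustrate the bank-conflict effect in points 1 and 2, here is a minimal sketch of a simplified model: memory is assumed to be word-interleaved across banks, and a constant-stride access stream then touches `num_banks / gcd(stride, num_banks)` distinct banks. The bank count of 64, the array sizes, and the function name `banks_touched` are hypothetical, chosen only for illustration. The notes do not say how the conflicts were removed in V2; padding the leading dimension is shown here as one common technique, not necessarily the one used.

```python
from math import gcd

def banks_touched(stride_words: int, num_banks: int) -> int:
    """Distinct banks visited by a constant-stride access stream,
    assuming simple word-interleaved banking (a simplified model)."""
    return num_banks // gcd(stride_words, num_banks)

NUM_BANKS = 64  # hypothetical bank count, for illustration only

# Stride 1024 is a multiple of the bank count, so every access
# lands in the same bank and the accesses serialize.
conflicting = banks_touched(1024, NUM_BANKS)

# Padding the leading dimension to 1025 makes the stride odd, hence
# coprime with a power-of-two bank count: accesses spread over all banks.
padded = banks_touched(1025, NUM_BANKS)

print(conflicting, padded)
```

In this model the unpadded stride uses a single bank while the padded stride uses all of them, which is why removing conflicts pays off most on a machine without a cache to hide them.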