Nonlinear Wave Equation Solver
Originally tuned for the Cray Y-MP
Loop-intensive, but with a shorter average vector length
Three versions of the code:
- V1 - The most significant loops optimized for tasking and vectorization, but memory bank conflicts remain
- V2 - Bank conflicts eliminated
- V3 - Remaining serial loops optimized for tasking/streaming
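The slides do not say how the V2 bank conflicts were removed. A common technique on interleaved-memory vector machines is to pad an array's leading dimension so that strided accesses are not a multiple of the bank count. The minimal sketch below assumes simple low-order interleaving (bank = word address mod bank count) and a hypothetical 64-bank memory; neither is taken from the SV1 documentation, and the function name is illustrative only.

```python
# Sketch: which memory banks a strided row walk through a column-major
# (Fortran-style) array touches, ASSUMING low-order interleaving
# (bank = word address mod NUM_BANKS). NUM_BANKS = 64 is hypothetical.

NUM_BANKS = 64


def banks_touched(leading_dim: int, n_cols: int, num_banks: int = NUM_BANKS) -> set[int]:
    """Return the set of banks hit when stepping along one row of the array.

    Element (i, j) lives at word address i + j * leading_dim, so walking a row
    (fixed i, varying j) is a stride-`leading_dim` access. Row 0 is used here.
    """
    return {(j * leading_dim) % num_banks for j in range(n_cols)}


if __name__ == "__main__":
    # Leading dimension equal to the bank count: every access hits one bank.
    print(len(banks_touched(leading_dim=64, n_cols=64)))
    # Padding the leading dimension by one word spreads accesses over all banks.
    print(len(banks_touched(leading_dim=65, n_cols=64)))
```

With a leading dimension of 64, all 64 accesses pile onto a single bank and serialize; declaring the array one word larger (65) sends each access to a different bank, at the cost of a little unused memory.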
A traditional development cycle would go through V1 and V2 to reach the highest single-processor performance
A third step (V3) is necessary to maximize performance on the MSP
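Why the V3 step matters can be estimated with Amdahl's law: any loop left serial runs on a single stream while the rest of the MSP idles. The sketch below assumes four streams per MSP, perfect scaling of the streamed work, and hypothetical parallel fractions; it is an idealized bound, not a measurement of this solver.

```python
# Sketch: Amdahl's-law bound on MSP speedup when some loops stay serial.
# ASSUMPTIONS: 4 streams per MSP, streamed work scales perfectly, serial work
# runs on one stream. The parallel fractions below are hypothetical.


def msp_speedup(parallel_fraction: float, streams: int = 4) -> float:
    """Ideal speedup over a single stream when only `parallel_fraction` of the work streams."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / streams)


if __name__ == "__main__":
    for p in (0.80, 0.95, 1.00):
        print(f"parallel fraction {p:.2f}: speedup {msp_speedup(p):.2f}x")
```

Under these assumptions, leaving 20% of the work serial caps the speedup at 2.5x instead of 4x, which is why the remaining serial loops have to be streamed as well.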
Notes:
- Now let us look at an application that was originally tuned for the Y-MP and see what additional tuning techniques are needed to migrate it to MSPs on the SV1
- I will reference three versions of the code that correspond to the levels of optimization:
- 1) V1 - optimize the most significant loops for tasking and vectorization
- 2) V2 - minimize memory bank conflicts
- 3) V3 - eliminate serial dependencies