Nonlinear Wave Equation Solver
Originally tuned for the Cray Y-MP
Loop-intensive, but with a shorter average vector length
Three versions of the code:
- V1 - Most significant loops optimized for tasking and vectorization, but memory bank conflicts remain
- V2 - Bank conflicts eliminated
- V3 - Remaining serial loops optimized for tasking/streaming
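The step from V1 to V2 removes memory bank conflicts. The source does not say how this was done in the solver. One common approach is to pad an array's leading dimension so that strided accesses no longer land in the same bank. The sketch below is an illustration only: the bank count `N_BANKS`, the word-interleaved mapping `address % N_BANKS`, and the helper names `distinct_banks` and `padded_leading_dim` are all assumptions, not details of the original code or of any particular machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical machine parameter: number of interleaved memory banks.
   Assumed to be a power of two; not taken from the source. */
enum { N_BANKS = 64 };

/* Count the distinct banks touched by n_access elements of a strided
   sweep, assuming word address a is stored in bank a % N_BANKS. */
int distinct_banks(size_t stride, size_t n_access)
{
    int seen[N_BANKS] = {0};
    int count = 0;

    assert(stride > 0);
    for (size_t i = 0; i < n_access; ++i) {
        size_t bank = (i * stride) % N_BANKS;
        if (!seen[bank]) {
            seen[bank] = 1;
            ++count;
        }
    }
    return count;
}

/* Pad an even leading dimension by one element. An odd stride shares no
   factor with a power-of-two bank count, so consecutive strided accesses
   cycle through every bank before repeating. */
size_t padded_leading_dim(size_t n)
{
    return (n % 2 == 0) ? n + 1 : n;
}
```

Under these assumptions, a sweep with stride 64 touches a single bank, so every access waits on the one before it, while the padded stride of 65 spreads 64 accesses over all 64 banks.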
A traditional development cycle would go through V1 and V2 to reach the highest single-processor performance
The third step (V3) is necessary to maximize performance on the MSP
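Tasking or streaming a loop means dividing its iterations among the processors that cooperate on it. On the Cray systems this is normally done by the compiler, with directives. The sketch below only illustrates the underlying idea of an even block partition. The type `chunk_t`, the function `chunk_for`, and the worker count are hypothetical names and values chosen for the example, not part of the solver or of any Cray API.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open iteration range [begin, end) assigned to one worker. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_t;

/* Split n_iter loop iterations into n_workers contiguous blocks whose
   sizes differ by at most one. The first (n_iter % n_workers) workers
   each take one extra iteration. */
chunk_t chunk_for(size_t n_iter, size_t n_workers, size_t worker)
{
    assert(n_workers > 0 && worker < n_workers);

    size_t base = n_iter / n_workers;
    size_t extra = n_iter % n_workers;
    chunk_t c;

    c.begin = worker * base + (worker < extra ? worker : extra);
    c.end = c.begin + base + (worker < extra ? 1 : 0);
    return c;
}
```

With 10 iterations and 4 workers, the blocks are [0, 3), [3, 6), [6, 8), and [8, 10). A loop left serial runs on one processor only, which is why the remaining serial loops matter for V3.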