Table of Contents
Performance Tuning for the Cray SV1
Goals
Outline
SV1 Basics
SV1 Architecture
SV1 Architecture
The SV1 Data Cache
The Multi-Stream Processor
The Multi-Stream Processor
Memory Structure
Compilers
Single-SSP Performance Issues
The SV1 Data Cache
General Considerations
Vector Code Performance
Example: Vector Code
Vector Code Performance
In-Cache vs Out-of-Cache
Cache Hit Rate
Example: Laplace Equation Solver
Performance: Vector Version
Performance: Scalar Version
Memory Bandwidth: Vector
Memory Bandwidth: Scalar
Memory Bank Conflicts
Performance Impact due to Memory Stride
Performance Impact due to Memory Stride
Some Lessons
Multi-Processor Performance Issues
The Multi-Stream Processor
General Considerations
Laplace Equation Solver - Streamed
Computational and Memory Performance
Hold Issue Conditions and Cache Performance
Nonlinear Wave Equation Solver
Code Version 2 - Streamed
Code Version 3 - Streamed
Computational Performance
Memory Performance
Memory/Floating Point Performance
Some Lessons
Conclusions
Authors: James Giulliani and David Robertson
Email: dgr@osc.edu