Performance Tuning for the Cray SV1

Table of Contents

Performance Tuning for the Cray SV1

Goals

Outline

SV1 Basics

SV1 Architecture

SV1 Architecture

The SV1 Data Cache

The Multi-Stream Processor

The Multi-Stream Processor

Memory Structure

Compilers

Single-SSP Performance Issues

The SV1 Data Cache

General Considerations

Vector Code Performance

Example: Vector Code

Vector Code Performance

In-Cache vs Out-of-Cache

Cache Hit Rate

Example: Laplace Equation Solver

Performance: Vector Version

Performance: Scalar Version

Memory Bandwidth: Vector

Memory Bandwidth: Scalar

Memory Bank Conflicts

Performance Impact due to Memory Stride

Performance Impact due to Memory Stride

Some Lessons

Multi-Processor Performance Issues

The Multi-Stream Processor

General Considerations

Laplace Equation Solver - Streamed

Computational and Memory Performance

Hold Issue Conditions and Cache Performance

Nonlinear Wave Equation Solver

Code Version 2 - Streamed

Code Version 3 - Streamed

Computational Performance

Memory Performance

Memory/Floating Point Performance

Some Lessons

Conclusions

Authors: James Giulliani and David Robertson

Email: dgr@osc.edu