Single Processor Performance Issues: Lessons Learned

The SV1 is still a vector machine!

Basic cache optimization rules apply:
- best to structure application so that data fits in cache (cache blocking)
- best to access data with small memory strides
- especially avoid strides that are a large power of two
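
The blocking and stride rules above can be illustrated with a tiled matrix transpose. This is only a sketch, not code from the talk: the function name transpose_blocked and the tile edge TILE are hypothetical, and the right tile size depends on the cache size and on how much vector length one is willing to give up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge (an assumption, not an SV1 figure); choose it so
 * that one source tile and one destination tile fit in cache together. */
#define TILE 32

/* Transpose the n x n row-major matrix src into dst one tile at a time.
 * Within a tile the source is read with stride 1 and the destination is
 * written with stride n, but only TILE rows of dst are live at once, so
 * the working set stays small. */
void transpose_blocked(size_t n, const double *src, double *dst)
{
    assert(src != dst); /* out-of-place only */

    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t jj = 0; jj < n; jj += TILE) {
            /* Clip the tile at the matrix edge when n is not a multiple
             * of TILE. */
            size_t i_end = (ii + TILE < n) ? ii + TILE : n;
            size_t j_end = (jj + TILE < n) ? jj + TILE : n;

            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

The inner loop here runs over at most TILE elements, so it trades vector length for cache reuse, which is the trade-off this slide recommends considering first.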

Probably best to focus on these first, rather than on long vector lengths (VLs)

Unlike in caches with line sizes greater than one word, thrashing is no worse than computing without the cache

The cache can hide the effects of bank conflicts
- Bank conflicts are also at their worst for power-of-two strides
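
Why power-of-two strides are the worst case can be seen in a toy model of interleaved memory. The sketch below assumes that consecutive words map to consecutive banks and that the bank count is a power of two; NBANKS = 64 and both function names are hypothetical and are not SV1 specifications.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical number of memory banks (an assumption for illustration). */
#define NBANKS 64

/* Count how many distinct banks are touched by naccess references that
 * start at word 0 and advance by `stride` words, assuming word w lives
 * in bank w % NBANKS. Fewer banks means more conflicts. */
size_t banks_touched(size_t stride, size_t naccess)
{
    unsigned char seen[NBANKS] = {0};
    size_t count = 0;

    for (size_t k = 0; k < naccess; k++) {
        size_t bank = (k * stride) % NBANKS;
        if (!seen[bank]) {
            seen[bank] = 1;
            count++;
        }
    }
    return count;
}

/* Pad a leading dimension to the next odd value. An odd stride shares no
 * factor with a power-of-two bank count, so it cycles through every bank. */
size_t padded_leading_dim(size_t n)
{
    assert((NBANKS & (NBANKS - 1)) == 0); /* the trick needs 2^k banks */
    return (n % 2 == 0) ? n + 1 : n;
}
```

In this model a stride of 1024 words sends every reference to the same bank, while padding the leading dimension to 1025 spreads 64 consecutive references over all 64 banks.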

Acceptable cache hit rates are in the 40-60% range, depending on application

Many codes can achieve 50% of T90 performance

Notes:

Program for vectorization

Optimize for best cache use to maximize memory bandwidth
- Floating-point performance scales with memory bandwidth