Vector Code Performance
Our experience suggests that typical performance is 30-50% of a T90 for codes that vectorize well
- Tends toward the lower end of this range for long vector lengths or for codes that demand heavy memory bandwidth
- Tends toward the higher end for shorter vector lengths
Problems that take advantage of the cache also run very well
- Problems that fit in cache
- Codes with exploitable temporal locality (data reuse)
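As an illustration of exploitable temporal locality, here is a minimal sketch of a cache-blocked matrix multiply in C. The function name `matmul_blocked` and the tile edge `TILE` are hypothetical and not taken from any particular code or machine; the value 32 is only an assumption and would need tuning so that the working tiles fit in cache. The innermost loop is unit stride with no loop-carried dependence, so it should remain a candidate for vectorization.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 32  /* assumed tile edge; tune so the working tiles fit in cache */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Computes c += a * b for n x n row-major matrices, one tile at a time.
 * Each tile of b is reused for up to TILE rows of a while it is still
 * cache resident, instead of being fetched from memory for every row. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    assert(n == 0 || (a && b && c));
    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = min_sz(ii + TILE, n);
        for (size_t kk = 0; kk < n; kk += TILE) {
            size_t k_end = min_sz(kk + TILE, n);
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = min_sz(jj + TILE, n);
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        /* unit-stride inner loop over one row of the tile */
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

The caller is expected to zero `c` first if a plain product is wanted. Whether blocking pays off on a given system depends on cache size and vector length, so this is a starting point to measure, not a recommendation.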
For scalar code, performance is typically 60-80% of the T90
Notes:
- Best performance will be seen when you can take advantage of the vector hardware
- Because vector operations place high demands on the memory system, paying attention to cache utilization is important to keep the functional units satisfied
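One way to reduce the load that vector loops place on the memory system is loop fusion, sketched below in C. Both function names are hypothetical and the example is illustrative only. The two-pass version writes an intermediate array `t` to memory and reads it back, while the fused version keeps the intermediate in a register. Both keep unit-stride loops without loop-carried dependences, and `restrict` states the assumption that the arrays do not overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Two passes: the intermediate t makes a round trip through memory. */
void scale_add_two_pass(size_t n, double alpha,
                        const double *restrict x, const double *restrict y,
                        double *restrict t, double *restrict z)
{
    assert(n == 0 || (x && y && t && z));
    for (size_t i = 0; i < n; i++)
        t[i] = alpha * x[i];
    for (size_t i = 0; i < n; i++)
        z[i] = t[i] + y[i];
}

/* Fused: same result, with two loads and one store per element and
 * no traffic for the intermediate array. */
void scale_add_fused(size_t n, double alpha,
                     const double *restrict x, const double *restrict y,
                     double *restrict z)
{
    assert(n == 0 || (x && y && z));
    for (size_t i = 0; i < n; i++)
        z[i] = alpha * x[i] + y[i];
}
```

The fused form moves three array streams instead of five, which matters most when the vectors are too long to stay in cache between the two passes.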