Vector Code Performance
Our experience suggests that typical performance is 30-70% of a T90 for codes that vectorize well
Tends toward the lower end of this range for long vector lengths or for codes that need high memory bandwidth
Tends toward the higher end for shorter vector lengths
Problems that fit in cache run very nicely!
Codes with exploitable temporal locality (data reuse) also run very well
For scalar code, performance is typically 60-80% of a T90
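One common way to create the temporal locality (data reuse) mentioned above is loop blocking, also called tiling: the loops are restructured so that a small tile of data is reused many times while it is still in cache. The sketch below is illustrative only and is not taken from the codes measured here. The function names and the block size of 4 are hypothetical, and Python is used only to show the loop structure. A real code would use a compiled language, with the block size tuned to the cache.

```python
def matmul_naive(a, b, n):
    """Reference n x n matrix product using the textbook i-j-k loop order."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c


def matmul_blocked(a, b, n, block=4):
    """Same product, computed tile by tile.

    Each (ii, kk, jj) step touches only a block x block tile of a, b and c,
    so the tile elements are reused repeatedly before moving on.
    The block size is an assumed, untuned value.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # min() handles edge tiles when n is not a multiple of block
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i][k]  # reused across the whole inner j loop
                        for j in range(jj, min(jj + block, n)):
                            c[i][j] += aik * b[k][j]
    return c
```

Both functions compute the same result. Only the order in which memory is touched differs, and that order determines how much of the working set stays in cache between uses.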