Example: Vector Code
5 multiplies, 3 adds, 2 reciprocals per iteration
For n of order 2000
- vector lengths near maximum
- all arrays fit in cache
For n of order 7000, data does not all fit in cache
Notes:
- Let us look at a simple example that will allow us to see some of the characteristics of the cache
- The following kernel vectorizes well and shows a nice balance among the functional units.
- Because the cache is write through and write allocate, memory stores are written to cache. So the assignment will fill up a cache location as well as the four arrays on the right.
- Given this, for n up to approx 6000 we should see good performance as all arrays will fit in cache
- As n is increased, performance should begin to taper off as cache locations are overwritten with data before they can be reused