General Considerations
Effective cache size quadrupled
- Can result in super-linear speedups for appropriately sized problems
Loops that don't vectorize (e.g. outer loops) can be streamed
- Can stream outer loops while vectorizing inner loops
Must eliminate conditions that inhibit streaming
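
The points above can be illustrated with a minimal sketch in plain C. It assumes that a loop-carried dependence is one of the conditions that inhibits streaming, which the slide does not spell out. The function names `scale_rows` and `prefix_rows` are hypothetical, and no compiler directives are shown because the slide names none. Whether a given compiler actually streams or vectorizes these loops depends on the compiler and its options.

```c
#include <stddef.h>

/* Hypothetical example: scale each row of an n-by-m row-major matrix
 * by its own factor.
 * Outer loop: every row is independent of the others, so its
 * iterations could be divided among streams.
 * Inner loop: unit stride with no dependences, so it is a natural
 * candidate for vectorization. */
void scale_rows(size_t n, size_t m, double *a, const double *factor)
{
    for (size_t i = 0; i < n; i++) {         /* streaming candidate */
        const double f = factor[i];
        double *row = a + i * m;
        for (size_t j = 0; j < m; j++) {     /* vectorization candidate */
            row[j] *= f;
        }
    }
}

/* Hypothetical counter-example: accumulate each row into the next.
 * Row i reads the updated row i-1, so the outer loop carries a
 * dependence from one iteration to the next. Under the assumption
 * stated above, this would have to be restructured before the outer
 * loop could be streamed. The inner loop is still independent
 * across j. */
void prefix_rows(size_t n, size_t m, double *a)
{
    for (size_t i = 1; i < n; i++) {         /* carries a dependence */
        for (size_t j = 0; j < m; j++) {
            a[i * m + j] += a[(i - 1) * m + j];
        }
    }
}
```

In `scale_rows` the rows can be processed in any order with the same result. In `prefix_rows` they cannot, which is the kind of property that has to be removed or worked around first.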