Automatic scheduling: High Performance Fortran (HPF)
HPF offers block or cyclic data distribution
- independently for each dimension of an array
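
The two distributions can be modelled as index-to-processor maps. The sketch below is Python rather than HPF, uses 0-based indices (Fortran arrays are normally 1-based), and the function names are hypothetical. It assumes the usual block size of ceil(n/p). A rows-BLOCK, columns-CYCLIC layout corresponds roughly to an HPF directive such as `!HPF$ DISTRIBUTE A(BLOCK, CYCLIC)`.

```python
def block_owner(i, n, p):
    """Processor owning index i when n elements form p contiguous blocks."""
    block_size = -(-n // p)  # ceiling of n / p
    return i // block_size


def cyclic_owner(i, p):
    """Processor owning index i when elements are dealt out round-robin."""
    return i % p


def owner_2d(i, j, n_rows, p_rows, p_cols):
    """Owner coordinates of element (i, j): rows BLOCK, columns CYCLIC."""
    # Each dimension is mapped independently of the other.
    return (block_owner(i, n_rows, p_rows), cyclic_owner(j, p_cols))


print([block_owner(i, 8, 4) for i in range(8)])   # block map of 8 elements on 4 processors
print([cyclic_owner(i, 4) for i in range(8)])     # cyclic map of the same elements
print(owner_2d(5, 5, 8, 2, 2))                    # owner on a 2 x 2 processor grid
```

Block keeps neighbouring elements on one processor, which tends to suit stencil-style access; cyclic spreads them out, which tends to help when work per element is uneven.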
Scheduling is via the owner-computes rule
- s = u*x + w*v is computed by the owner(s) of s
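
A minimal sketch of the owner-computes rule, simulated sequentially in Python. It assumes s is an array with a cyclic distribution over p processors; the function name and the `work` bookkeeping are hypothetical and exist only to show who computes what. A real HPF compiler would also generate communication for any operands the owner does not hold locally, which this sketch omits.

```python
def owner_computes(u, x, w, v, p):
    """Evaluate s = u*x + w*v elementwise, each element by the owner of s[i]."""
    n = len(u)
    s = [None] * n
    work = {rank: [] for rank in range(p)}  # indices each processor computed
    for rank in range(p):                   # stands in for p processors running in parallel
        for i in range(n):
            if i % p == rank:               # assumed cyclic ownership of s
                s[i] = u[i] * x[i] + w[i] * v[i]
                work[rank].append(i)
    return s, work


s, work = owner_computes([1, 2, 3, 4], [1, 1, 1, 1], [2, 2, 2, 2], [1, 2, 3, 4], p=2)
print(s)
print(work)
```

Every element of s is written exactly once, by exactly one processor, so the data distribution of the left-hand side fully determines the schedule.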
The parallelism model is generally flat
- all nodes are working on the same loop nest
- global barrier synchronization is sufficient
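
The flat model can be illustrated with threads and a single global barrier. This is a Python sketch using the standard `threading.Barrier`, not HPF; the function name, the two-phase loop body, and the cyclic split of iterations are assumptions chosen for illustration. Phase 2 reads elements written by other workers in phase 1, so one barrier between the phases is the only synchronization required.

```python
import threading


def run_flat(n, p):
    """Run a two-phase loop on p workers separated by one global barrier."""
    a = [0] * n
    b = [0] * n
    barrier = threading.Barrier(p)

    def worker(rank):
        mine = range(rank, n, p)        # this worker's share of the iterations
        for i in mine:                  # phase 1: every worker runs the same loop
            a[i] = i * i
        barrier.wait()                  # all of a[] is written before anyone reads it
        for i in mine:                  # phase 2: reads a neighbour's element
            b[i] = a[i] + a[(i + 1) % n]

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return b


print(run_flat(4, 2))
```

No point-to-point or nested synchronization appears: all workers execute the same phases in lockstep.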
Compilers for HPF have come a long way
Extensions have been added for layout of irregular data structures
- experimentation with these features is ongoing