Parallel programming is still hard
Programming is too tedious
Architecture changes too often
Locality optimization competes with load balancing
Dynamic load re-balancing changes “who’s where”
Data layout for irregular data structures is painful
Memory per node is often insufficient
Data races are far too common
Debugging tools are primitive
Some of these problems should be correctable.
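
As a concrete illustration of the "data races" item above, here is a minimal sketch in Python. It is not taken from the original material: the function names `count_without_lock` and `count_with_lock` are hypothetical, chosen only for this example. Several threads increment a shared counter. Without synchronization, the read-modify-write in `counter += 1` can interleave across threads and lose updates. Guarding it with a lock makes the result deterministic.

```python
import threading


def count_without_lock(num_threads: int, increments: int) -> int:
    """Increment a shared counter from several threads with no synchronization.

    The result may be lower than num_threads * increments, because
    `counter += 1` is a read-modify-write that threads can interleave.
    """
    counter = 0

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            counter += 1  # unsynchronized: updates can be lost

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def count_with_lock(num_threads: int, increments: int) -> int:
    """Same workload, but the increment is protected by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:  # only one thread at a time runs the increment
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

The unlocked version may happen to return the full count on a given run, which is part of what makes such bugs hard to find with primitive debugging tools. The locked version always returns `num_threads * increments`, at the cost of serializing the increments.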