Slide 30 of 41
Notes:
-Here is the modified code after the 3rd round of optimization
-By adding dummy arrays for the partial sums and moving the global reduction outside of the loop, the partial sums can now be streamed
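
A minimal sketch of the idea in C. This is not the code shown on the slide: the function names, the `partial` scratch array, and the row-major layout are all assumptions made for illustration. The first version funnels every iteration through one shared accumulator. The second gives each outer iteration its own slot in a dummy array and does the global reduction once, after the loop, so the outer iterations no longer depend on each other.

```c
#include <stddef.h>

/* Before (hypothetical): every outer iteration updates the same
 * accumulator, so the iterations are chained through `total`. */
double sum_rows_shared(const double *a, size_t nrows, size_t ncols)
{
    double total = 0.0;
    for (size_t i = 0; i < nrows; ++i)
        for (size_t j = 0; j < ncols; ++j)
            total += a[i * ncols + j];
    return total;
}

/* After (hypothetical): outer iteration i writes only partial[i], a
 * caller-supplied scratch ("dummy") array of length nrows. The outer
 * iterations are now independent of one another. */
double sum_rows_partial(const double *a, size_t nrows, size_t ncols,
                        double *partial)
{
    for (size_t i = 0; i < nrows; ++i) {
        double row_sum = 0.0;
        for (size_t j = 0; j < ncols; ++j)
            row_sum += a[i * ncols + j];
        partial[i] = row_sum;
    }

    /* Global reduction, moved outside the main loop. */
    double total = 0.0;
    for (size_t i = 0; i < nrows; ++i)
        total += partial[i];
    return total;
}
```

The trade-off in this sketch is one extra array of length `nrows` and a short serial pass at the end, in exchange for removing the dependence between outer iterations. With floating-point data the two versions can differ in the last bits because the additions happen in a different order.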