Hold Issue Conditions and Cache Performance
Large problem size (two arrays of about 4 million elements each) limits potential cache read hits
Better memory performance of tasked and streamed code reduces waits on vector registers
High wait on vector functional units indicates good performance: instructions are held because the units are busy, not because memory is slow
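
A rough sketch of why the problem size limits cache read hits: the combined footprint of the two arrays can be compared with the cache capacity. The 8-byte element size and the 256 KB cache size below are assumptions chosen for illustration, not figures from these slides; substitute the values for the machine being measured.

```python
# Estimate the working set of the two arrays against an assumed cache size.
# ASSUMED_BYTES_PER_ELEMENT and ASSUMED_CACHE_BYTES are hypothetical values,
# not taken from the slides.

def working_set_bytes(num_arrays, elements_per_array, bytes_per_element):
    """Return the total number of bytes touched by the arrays."""
    return num_arrays * elements_per_array * bytes_per_element

ASSUMED_BYTES_PER_ELEMENT = 8        # assumption: 64-bit words
ASSUMED_CACHE_BYTES = 256 * 1024     # assumption: 256 KB cache
ELEMENTS_PER_ARRAY = 4_000_000       # "about 4 million elements each"

working_set = working_set_bytes(2, ELEMENTS_PER_ARRAY, ASSUMED_BYTES_PER_ELEMENT)
ratio = working_set / ASSUMED_CACHE_BYTES

print(f"Working set: {working_set / 1e6:.0f} MB")
print(f"Working set is about {ratio:.0f} times the assumed cache size")
```

Under these assumptions the arrays are far larger than the cache, so a single pass over them evicts data long before it could be read again.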