Numerics: Numerical core of the JDS format
! inner loop over the entries of jagged diagonal j
do i = 1, jd_ptr(j+1) - jd_ptr(j)
   Y(i) = Y(i) + value(jd_ptr(j)+i-1) * X(col_ind(jd_ptr(j)+i-1))
end do
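
To make the kernel above concrete, here is a minimal, self-contained sketch of a complete JDS matrix-vector multiply on a hand-built 4x4 example. Several things are assumptions for illustration and do not come from the slide: the outer loop bound `njd` (number of jagged diagonals), the row permutation array `perm`, the work array `y_perm`, and the convention that `col_ind` holds original (unpermuted) column indices, so only the result vector has to be permuted back. Real implementations may handle the permutation differently.

```fortran
program jds_spmv_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 4, njd = 3, nnz = 7

  ! Example matrix (rows in original order):
  !   row 1: 1 0 2 0
  !   row 2: 0 3 0 0
  !   row 3: 4 5 6 0
  !   row 4: 0 0 0 7
  ! Rows sorted by decreasing number of nonzeros: 3, 1, 2, 4 (hypothetical perm array)
  integer :: perm(n)        = [3, 1, 2, 4]
  ! jd_ptr(j) is the start of jagged diagonal j in value / col_ind
  integer :: jd_ptr(njd+1)  = [1, 5, 7, 8]
  integer :: col_ind(nnz)   = [1, 1, 2, 4, 2, 3, 3]
  real(dp) :: value(nnz)    = [4.0_dp, 1.0_dp, 3.0_dp, 7.0_dp, 5.0_dp, 2.0_dp, 6.0_dp]

  real(dp) :: x(n), y_perm(n), y(n)
  ! Dense reference result A*x for x = (1, 2, 3, 4)
  real(dp), parameter :: y_ref(n) = [7.0_dp, 6.0_dp, 32.0_dp, 28.0_dp]
  integer :: i, j

  x = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp]
  y_perm = 0.0_dp

  ! Outer loop over jagged diagonals, inner loop as in the kernel above
  do j = 1, njd
    do i = 1, jd_ptr(j+1) - jd_ptr(j)
      y_perm(i) = y_perm(i) + value(jd_ptr(j)+i-1) * x(col_ind(jd_ptr(j)+i-1))
    end do
  end do

  ! Undo the row permutation to obtain y in the original row order
  do i = 1, n
    y(perm(i)) = y_perm(i)
  end do

  ! Self-check against the dense reference
  if (maxval(abs(y - y_ref)) > 1.0e-12_dp) error stop "JDS result differs from dense reference"
  print '(4f8.1)', y
end program jds_spmv_demo
```

The first jagged diagonal covers all four rows and later diagonals get shorter, which is why the inner loop length is close to the matrix dimension for the leading diagonals.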
Long inner loop (length ~ DMat): auto-parallelization / auto-vectorization
MPI parallelization: i-loop structure maintained
Memory bound: 2 Flop per (4 Load + 1 Store)
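
The memory-bound estimate can be written out as a code balance. This is a rough sketch under assumptions not stated on the slide: every load and store counts as one word going to main memory (no cache reuse of X), and the symbols $B_c$, $b_S$ and $P_{\max}$ are introduced here for illustration only.

```latex
% Per inner-loop iteration: loads of value, col_ind, X, Y; store of Y;
% one multiply and one add.
\[
  B_c \;=\; \frac{4\ \text{loads} + 1\ \text{store}}{2\ \text{flops}}
      \;=\; 2.5\ \frac{\text{words}}{\text{flop}}
\]
% With a sustained memory bandwidth b_S (in words per second), the
% attainable performance of the kernel is bounded by
\[
  P_{\max} \;=\; \frac{b_S}{B_c} \;=\; 0.4\, b_S \ \ \text{flops per second.}
\]
```

If part of X stays in cache, the effective balance drops below 2.5 words per flop and the bound becomes correspondingly less strict.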