CUG Archive

Accelerating the Big Data Analytics Suite

Authors: Pierre Carrier (Hewlett Packard Enterprise), Scott Moe (Advanced Micro Devices, Inc; Microsoft Azure), Colin Wahl (Hewlett Packard Enterprise), Alessandro Fanfarillo (Advanced Micro Devices, Inc)

Abstract: The Big Data Analytics Suite (BDAS) contains three classic machine learning codes: K-Means, Principal Component Analysis (PCA), and Support Vector Machine (SVM). This article describes how the three CPU codes, originally written in R, have been rewritten in C++ with HIP and MPI and recast into GEMM-centric operations that take full advantage of the heterogeneous architecture of the Frontier system. The new accelerated implementation of K-Means is now 80% GEMM-centric and PCA is 99% GEMM-centric, and a new SVM implementation will make SVM 20% GEMM-centric. Once the SVM work is completed, the entire machine learning suite will be GEMM-driven. The article also discusses AMD Tensile optimization of the GEMM operation for the extremely tall-and-skinny matrices that arise in BDAS. Compared with the original CPU R codes running on the same number of Frontier nodes, the new accelerated versions are 320X, 360X, and 120X faster for K-Means, PCA, and SVM, respectively. Future integration with Python, especially in the context of Dragon, and the inclusion of various precision types are also discussed.
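The abstract does not say how K-Means is recast into GEMM operations. One common approach, assumed here and not taken from the paper, expands the squared distance as ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, so that the m x k cross-term matrix for all points and centroids becomes a single GEMM on a tall-and-skinny data matrix (many points, few features). The function names `gemm_abt` and `assign_clusters` are hypothetical, and the naive triple loop only stands in for a tuned GPU GEMM (for example one from rocBLAS); it is a minimal sketch, not the BDAS implementation.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Naive row-major GEMM: C (m x n) = alpha * A (m x k) * B^T + beta * C,
// where B is stored as n x k. A real GPU code would call a tuned BLAS
// routine here; the plain loop keeps this sketch self-contained.
void gemm_abt(std::size_t m, std::size_t n, std::size_t k, double alpha,
              const std::vector<double>& A, const std::vector<double>& B,
              double beta, std::vector<double>& C) {
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double acc = 0.0;
      for (std::size_t l = 0; l < k; ++l) acc += A[i * k + l] * B[j * k + l];
      C[i * n + j] = alpha * acc + beta * C[i * n + j];
    }
  }
}

// Hypothetical K-Means assignment step: label each of the m points (rows of X,
// dimension d) with the index of its nearest centroid (rows of Cent, k x d).
// The dominant cost, the m x k matrix of -2 * x.c terms, is one GEMM call.
std::vector<std::size_t> assign_clusters(std::size_t m, std::size_t k,
                                         std::size_t d,
                                         const std::vector<double>& X,
                                         const std::vector<double>& Cent) {
  // Squared norm of each centroid: ||c_j||^2.
  std::vector<double> cnorm(k, 0.0);
  for (std::size_t j = 0; j < k; ++j)
    for (std::size_t l = 0; l < d; ++l)
      cnorm[j] += Cent[j * d + l] * Cent[j * d + l];

  // D = -2 * X * Cent^T, an m x k matrix computed by GEMM.
  std::vector<double> D(m * k, 0.0);
  gemm_abt(m, k, d, -2.0, X, Cent, 0.0, D);

  std::vector<std::size_t> label(m, 0);
  for (std::size_t i = 0; i < m; ++i) {
    // ||x_i||^2 is identical for every centroid, so it is omitted:
    // it shifts all distances equally and does not change the argmin.
    double best = std::numeric_limits<double>::infinity();
    for (std::size_t j = 0; j < k; ++j) {
      double dist = D[i * k + j] + cnorm[j];
      if (dist < best) {
        best = dist;
        label[i] = j;
      }
    }
  }
  return label;
}
```

For example, the four 2-D points (0,0), (0,1), (10,10), (10,11) with centroids (0,0.5) and (10,10.5) receive labels 0, 0, 1, 1. Because m is typically far larger than k and d, the GEMM operand X is tall and skinny, which is the shape the abstract identifies as the target of the Tensile tuning.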
