Authors: Howard Pritchard (Los Alamos National Laboratory), Thomas Naughton (Oak Ridge National Laboratory), Amir Shehata (Oak Ridge National Laboratory), David Bernholdt (Oak Ridge National Laboratory)
Abstract: Open MPI for HPE Cray EX Systems
Open MPI is an open-source implementation of the MPI-3 standard that is developed and maintained by collaborators from academia, industry, and national laboratories.
Oak Ridge National Laboratory (ORNL) and Los Alamos National Laboratory (LANL) are collaborating on porting and optimizing Open MPI and related components for use on HPE Cray EX systems, with a focus on the DOE Frontier and Aurora exascale systems.
A key component of this effort is the development of a new LinkX OpenFabrics Interfaces (OFI) provider. In this paper, we describe enhancements to Open MPI, OpenPMIx runtime components, and the LinkX OFI provider. We present performance results for point-to-point and collective communication operations using both the vendor CXI provider and the LinkX provider, including results obtained using GPU accelerators. We also discuss recommended deployment options for EX systems, along with future work.
Paper: PDF