Authors: Jesse Hanley (Oak Ridge National Laboratory), Dustin Leverman (Oak Ridge National Laboratory), Christopher Coffman (Oak Ridge National Laboratory), Bradley Gipson (Oak Ridge National Laboratory), Christopher Brumgard (Oak Ridge National Laboratory), Rick Mohr (Oak Ridge National Laboratory)
Abstract: The world’s first exascale supercomputer, OLCF’s ‘Frontier’, debuted last year and is allocated for INCITE awards this year. OLCF partnered with HPE to design, procure, and deploy a parallel file system to support the demands of this new machine. This file system is based on the ClusterStor E1000 storage platform and has been integrated into the OLCF site.
With a usable namespace of 679 PB, this cluster employs several newer Lustre features to provide a solution that combines the performance of NVMe with the capacity of traditional hard disk drives. We present the architecture and configuration of this system and detail the steps taken to operationalize the file system cluster. We aim to provide this material as a community resource for others who are designing or deploying storage systems.
Paper: PDF