Authors: Daniel Fulton (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Laurie Stephey (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Shane Canon (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Brandon Cook (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Adam Lavely (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract: Although containers provide many operational advantages, including flexibility, portability, and reproducibility, a fully containerized ecosystem for HPC systems does not yet exist. To date, containers in HPC typically require both substantial user expertise and additional container and job configuration. In this paper, we argue that a fully containerized HPC platform is compelling for both HPC administrators and users, offer ideas for what this platform might look like, and identify gaps that must be addressed to move from the current state of the art to this containers-everywhere approach. Additionally, we discuss enabling core functionality, including communicating with the Slurm scheduler, using custom user-designed images, and using tracing tools and debuggers inside containers. We argue that, to achieve the greatest benefit for both HPC administrators and users, a model is needed that serves both novice users, who have not yet adopted container technologies, and expert users, who have already embraced containers. The aspiration of this work is to move towards a model in which all users can reap the benefits of working in a containerized environment without being experts in containers or even knowing that they are inside one.
Paper: PDF