Authors: Steven Robson (EPCC, The University of Edinburgh), Kieran Leach (EPCC, The University of Edinburgh), Stephen Booth (EPCC, The University of Edinburgh), Greg Blow (EPCC, The University of Edinburgh), Maciej Hamczyk (EPCC, The University of Edinburgh), Philip Cass (EPCC, The University of Edinburgh)
Abstract: EPCC operates a variety of services, including ARCHER2, commissioned by UK Research and Innovation (UKRI) on behalf of the UK Government as the UK’s Tier-1 service, and Cirrus, a UKRI Tier-2 service.
EPCC historically operated its services using the PBS Pro scheduler. These included HECToR and ARCHER, the previous national Tier-1 services, and Cirrus until mid-2020. In 2020, EPCC began hosting ARCHER2, which uses the Slurm scheduler, and also rebuilt and upgraded Cirrus, replacing PBS Pro with Slurm.
Since 2020, EPCC have developed a number of approaches to managing and configuring the Slurm scheduler that have significantly improved service manageability and user experience. This paper describes the key features of these approaches.
We present the conclusions drawn from several years of working with Slurm, namely that:
• Slurm has proved valuable as a scheduler because of its configurability and flexibility
• A common scheduler configuration setup can be usefully shared across services, even when they have different requirements and levels of heterogeneity
• Automation can considerably reduce the effort required by staff supporting Slurm