CUG Archive

Papers

Powersched: A HPC System Power and Energy Management Framework

Authors: Marcel Marquardt (Hewlett Packard Enterprise), Jan Mäder (Hewlett Packard Enterprise), Tobias Schiffmann (Hewlett Packard Enterprise), Christian Simmendinger (Hewlett Packard Enterprise), Torsten Wilde (Hewlett Packard Enterprise)

Abstract: Supercomputers can consume huge amounts of energy. Rising power consumption has already led large HPC sites to run overprovisioned supercomputers, where the peak power demand of the system can exceed the power available at the site. In addition, energy prices, especially in Europe, have increased dramatically over the last year. To address both problems, we see the need to run HPC applications at their most energy-efficient sweet spot in terms of instructions per watt.

The Powersched framework presented in this paper is part of HPE’s vision of a holistic system power management software stack. The framework implements a proof-of-concept prototype for in-band, application-aware power and energy management. It can manage overprovisioned systems while at the same time steering HPC workloads toward their energetic sweet spot. Powersched records CPU profiling data while changing system runtime parameters, such as the available power per CPU package. Using machine learning, it derives an optimal sweet spot for the given workload and its profiling counter footprint.

We present details of the Powersched framework, its implementation, and first results from a clustering-based machine learning technique, which show an average energy saving of around 14% with an average runtime increase of less than 2%.
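
To make the notion of an energetic sweet spot concrete, the following is a minimal sketch of how one could pick a package power cap from a sweep of measurements. It is not the Powersched implementation: the paper uses a clustering-based machine learning technique, whereas this sketch simply searches the measured samples directly. The names `Sample`, `instructions_per_joule`, `pick_sweet_spot`, and `max_slowdown`, as well as all numbers, are assumptions made for illustration. Instructions per joule is used here as one reading of "instructions per watt" (instructions per second per watt).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One measurement interval at a fixed package power cap (hypothetical layout)."""
    power_cap_w: float   # power limit applied to the CPU package, in watts
    instructions: float  # instructions retired during the interval
    energy_j: float      # energy consumed during the interval, in joules
    runtime_s: float     # wall-clock duration of the interval, in seconds


def instructions_per_joule(sample: Sample) -> float:
    """Energy efficiency of a sample; higher is better."""
    return sample.instructions / sample.energy_j


def pick_sweet_spot(samples: list[Sample], max_slowdown: float = 0.02) -> Sample:
    """Return the most energy-efficient sample within a runtime budget.

    The budget is relative to the fastest sample in the sweep:
    a sample qualifies if its runtime is at most (1 + max_slowdown)
    times the shortest runtime observed.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    fastest = min(s.runtime_s for s in samples)
    allowed = [s for s in samples if s.runtime_s <= fastest * (1.0 + max_slowdown)]
    return max(allowed, key=instructions_per_joule)


# Invented illustration data, not measurements from the paper.
sweep = [
    Sample(power_cap_w=280.0, instructions=1e12, energy_j=28000.0, runtime_s=100.0),
    Sample(power_cap_w=240.0, instructions=1e12, energy_j=24240.0, runtime_s=101.0),
    Sample(power_cap_w=200.0, instructions=1e12, energy_j=20300.0, runtime_s=101.5),
    Sample(power_cap_w=160.0, instructions=1e12, energy_j=18400.0, runtime_s=115.0),
]

best = pick_sweet_spot(sweep)
print(f"selected power cap: {best.power_cap_w} W")
```

In this sketch, the runtime budget stops the search from always choosing the lowest cap: the 160 W sample uses the least energy but is rejected because it runs too slowly relative to the fastest sample.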

Paper: PDF
