Sunday, May 7th

1:30pm-2:45pm | Programming Environments, Applications, and Documentation (PEAD)

PEAD Introduction
Christopher Fuson (Oak Ridge National Laboratory)
Abstract: Welcome and agenda overview.

HPE Documentation
Peggy Sanchez and Barbara Chapman (HPE)
Abstract: Documentation is a powerful tool used by CUG member sites and HPE to help users make effective use of center HPC resources. Available 24x7, documentation allows centers to reach a global user community and to cover a wide range of technical information with varying levels of detail and focus. During this BOF, HPE representatives will walk through existing documentation portals and discuss new and upcoming features. The goal of the BOF is to provide an opportunity for CUG member sites and HPE to discuss existing and future documentation needs.

Training
Christopher Fuson (Oak Ridge National Laboratory), Eva Siegmann (Stony Brook), Marco De La Pierre (Pawsey), and Barbara Chapman (HPE)
Abstract: HPC resources are large and complex. To use a center's HPC resources effectively, users must understand hardware configurations, data storage options, available scientific software, programming environments, batch schedulers, and everything in between. Centers and HPE invest significant effort in developing, organizing, and presenting training opportunities for the user community. In this BOF, representatives from multiple centers will discuss their center's training efforts. HPE will present its available training offerings and discuss future training needs with those in attendance.

Birds of a Feather

3:00pm-5:00pm | Programming Environments, Applications, and Documentation (PEAD)

User Module Environment
Kaylie Anderson (HPE), John Holmen (Oak Ridge National Laboratory), and Pascal Elahi (Pawsey)
Abstract: A center's HPC community is often composed of user groups with very diverse needs and goals. A resource's user environment must support multiple workflows, each with varying compiler, library, and tool requirements. User modules provide a mechanism to support an array of workflows with varied needs. During this BOF, HPE will discuss the CPE module environment, including new and upcoming Lmod features. Center representatives will also discuss use cases and methods used to augment the provided environment.

PE Updates and Testing
Jeff Hudson (HPE); Abhinav Thota (Indiana University); Guilherme Peretti-Pezzi (Swiss National Supercomputing Centre); Juan Herrera (EPCC, The University of Edinburgh); and Koutsaniti Eirini (Swiss National Supercomputing Centre)
Abstract: HPC programming environments can be very complex, containing libraries, compilers, and tools that must work together to provide an effective resource to a center's user community. Over a resource's lifespan, upgrades can impact not only an individual component but also the ability of multiple components to work together successfully. Testing at various stages of a resource's lifespan is crucial to ensure the numerous hardware and software components are in working order. The goal of this BOF is to provide a venue for CUG member sites to share techniques, best practices, and lessons learned for resource testing. During the BOF, HPE representatives will discuss the environment, process, and tools used to test the CPE. Center representatives will discuss testing, including the use of the ReFrame framework for regression testing.
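For readers unfamiliar with ReFrame, the regression-testing framework mentioned above, the following minimal sketch shows the general shape of a check. The system and environment wildcards, the mpi_hello.c source file, and the expected output pattern are illustrative assumptions, not material from the session.

```python
# Minimal ReFrame check (sketch only; assumes a recent ReFrame release).
# The source file, output pattern, and task count below are hypothetical.
import reframe as rfm
import reframe.utility.sanity as sn


@rfm.simple_test
class MpiHelloCheck(rfm.RegressionTest):
    descr = 'Smoke test: an MPI program builds and runs under the current PE'
    valid_systems = ['*']           # any configured system partition
    valid_prog_environs = ['*']     # any configured programming environment
    build_system = 'SingleSource'
    sourcepath = 'mpi_hello.c'      # hypothetical source shipped with the test
    num_tasks = 4

    @sanity_function
    def assert_all_ranks_reported(self):
        # Expect one "Hello from rank N" line per MPI rank in stdout.
        ranks = sn.extractall(r'Hello from rank (\d+)', self.stdout, 1, int)
        return sn.assert_eq(sn.count(ranks), self.num_tasks)
```

Running something like `reframe -c <checks_dir> -r` would then compile and launch the check on each selected system and programming-environment combination.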
Future Directions for Fortran
Barbara Chapman, John Levesque, and Bill Long (HPE)
Abstract: While there is significant change in the architectures of HPC platforms and in the applications deployed on them, Fortran has remained a key programming language for scientific and technical application development in the exascale era. HPE's Cray Programming Environment (CPE) continues to vigorously support the use of Fortran in HPC. CPE maintains its own Fortran compiler, which is continuously evolving to support the latest versions of the Fortran standard as well as both OpenMP and OpenACC directives. While Fortran has many inherent benefits for numerically intensive computations, it does not enjoy the support of a large developer community such as the one behind C++. Moreover, Fortran is seldom taught in Computer Science departments, and many application developers are not aware of its benefits, including its more recent features. Given that Fortran code dominates many HPC center workloads, what should we be doing as a community to maintain its relevance and ensure that it will meet future HPC needs? What can HPE do to encourage such efforts? The goal of this BOF is to provide an open forum for CUG members to share their thoughts on the future of Fortran in HPC and to discuss these questions.

Birds of a Feather

Monday, May 8th

8:30am-10:00am | Tutorial 1A

Using the HPE Cray Programming Environment (HCPE) to Port and Optimize Applications to hybrid systems with GPUs using OpenMP Offload or OpenACC
Harvey Richardson, John Levesque, Nina Mujkanovic, and Alfio Lazzaro (HPE)
Abstract: This tutorial will consist of instruction, demonstration, and hands-on use of the HPE Cray Programming Environment to port applications to hybrid systems with GPUs using OpenMP Offload and/or OpenACC directives. This will be a full-day tutorial, with lectures and exercises given throughout the day. Attendees will learn about the compiler, performance analysis tools, and debuggers targeting GPU usage. Later in the day we will have a video call with several of the OpenMP, OpenACC, and MPI developers so attendees can ask questions that come up throughout the day. The entire day will consist of attendees either using the HPE Cray Programming Environment on their own applications and/or working on assignments we will have available. Access to several systems will be made available to the attendees. The best way to learn the compiler and tool capabilities is to use them on your own application, so the hands-on session is extremely important. Both AMD and NVIDIA GPU systems will be available. The tutorial will cover the process of taking an all-MPI application, first identifying the computational bottlenecks using performance analysis tools, and then incrementally adding GPU directives to move the application to the GPU. Performance analysis tools will then be used to analyze the performance of the application running on the GPU to identify data motion, computational bottlenecks, and other issues.
Tutorial

Tutorial 1B

Advanced Topics for Cray System Management for HPE Cray EX Systems
Harold Longley (Hewlett Packard Enterprise)
Abstract: This tutorial session discusses several management topics for Cray System Management (CSM) and related software on the HPE Cray EX Supercomputer. This includes configuration, system monitoring, health validation, compute node environment tuning, boot troubleshooting, and extending the system management REST API toolset.

Tutorial

Tutorial 1C

Supercomputer Affinity on HPE Systems
Edgar A. Leon and Jane E. Herriman (Lawrence Livermore National Laboratory)
Abstract: HPE's Frontier supercomputer at Oak Ridge National Laboratory is the world's first exascale machine. To provide such computing power, HPE and its partners rely on a complex heterogeneous architecture with four NUMA domains, 64 SMT-2 CPU cores, and 8 GPUs per node. This complexity can create unnecessary data movement that negatively affects performance, scalability, and power. The key to minimizing data movement is to leverage hardware locality: minimize data movement between hardware components by changing an application's affinity policies, that is, the rules that assign processes, threads, and GPU kernels to the hardware.

Tutorial

10:30am-12:00pm | Tutorial 1A Continued

Using the HPE Cray Programming Environment (HCPE) to Port and Optimize Applications to hybrid systems with GPUs using OpenMP Offload or OpenACC
Harvey Richardson, John Levesque, Nina Mujkanovic, and Alfio Lazzaro (HPE)
Abstract: See Tutorial 1A above.

Tutorial

Tutorial 1B Continued

Advanced Topics for Cray System Management for HPE Cray EX Systems
Harold Longley (Hewlett Packard Enterprise)
Abstract: See Tutorial 1B above.
Tutorial

Tutorial 1C Continued

Supercomputer Affinity on HPE Systems
Edgar A. Leon and Jane E. Herriman (Lawrence Livermore National Laboratory)
Abstract: See Tutorial 1C above.

Tutorial

1:00pm-2:30pm | Tutorial 2A

System Monitoring with CSM and HPCM
Jeff Hanson (HPE)
Abstract: CSM and HPCM provide extensive system monitoring features. This tutorial will:
- Review the architecture and features of each stack
- Review optional features and how to enable them
- Present methods for extracting monitoring telemetry to customer data repositories
- Present use cases and methods for analyzing Slingshot telemetry at the node, switch, and fabric levels
- Present AIOps technology and how it is used and managed
- Present methods for scaling the monitoring infrastructure
- Present methods for managing the monitoring infrastructure

Tutorial

Tutorial 2B

Analyzing the Slingshot Fabric with the Slingshot Dashboard
Nilakantan Mahadevan, Forest Godfrey, and Jose Mendes (Hewlett Packard Enterprise)
Abstract: Understanding the performance and correctness of the Slingshot fabric is a complex task. To aid customers in this task, HPE is providing the Slingshot Fabric Dashboard. The dashboard can help in understanding fabric load and link failure rates, as well as give an overview of current fabric status. This tutorial will cover configuration and use of the dashboard to understand real-world issues on Slingshot fabrics. Sample data (either synthetically generated or taken from internal test systems) will be used to illustrate real-world problems from large systems such as Frontier and LUMI. The tutorial will include hands-on use of the dashboard.

Tutorial

Tutorial 2C

Omnitools: Performance Analysis Tools for AMD GPUs
George Markomanolis and Samuel Antao (AMD)
Abstract: The top entries of the TOP500 list feature systems enabled with AMD Instinct GPUs, including the world's and Europe's fastest supercomputers, Frontier and LUMI, respectively. As these systems enter production, application teams will require the ability to profile applications to ascertain performance. To enable this, AMD released two new profiling tools in 2022: Omnitrace and Omniperf. These tools are the result of close collaborations between AMD development teams and computational scientists aimed at unpicking performance bottlenecks in applications and identifying improvement strategies. Omnitrace targets end-to-end application performance, generating timelines that cover MPI, OpenMP, Kokkos, Python, and more. It enables the developer to identify relevant hardware counters to collect and to generate information on performance-limiting kernels.
Omniperf can then be used to seek further insight into these kernels through roofline analysis, memory chart analysis, and read-outs of many metrics including cache access, GPU utilization, and speed-of-light analysis. In this tutorial we will present advanced features of these tools, with live demonstrations, and provide numerous hands-on examples for attendees to identify and mitigate bottlenecks in scientific and machine learning applications running on AMD GPUs.

Tutorial

3:00pm-4:30pm | Tutorial 2A Continued

System Monitoring with CSM and HPCM
Jeff Hanson (HPE)
Abstract: See Tutorial 2A above.

Tutorial

Tutorial 2B Continued

Analyzing the Slingshot Fabric with the Slingshot Dashboard
Nilakantan Mahadevan, Forest Godfrey, and Jose Mendes (Hewlett Packard Enterprise)
Abstract: See Tutorial 2B above.

Tutorial

Tutorial 2C Continued

Omnitools: Performance Analysis Tools for AMD GPUs
George Markomanolis and Samuel Antao (AMD)
Abstract: See Tutorial 2C above.
Tutorial

4:35pm-6:00pm | BoF 1A

Systems Monitoring Working Group BOF
Craig West (Australian Bureau of Meteorology); Lena Lopatina (Los Alamos National Laboratory); and Stephen Leak (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)
Abstract: The System Monitoring Working Group (SMWG) is a CUG SIG (special interest group) that enables collaboration between HPE Cray and its customers on system monitoring capabilities. The working group includes representatives from many HPE Cray member sites. We meet to discuss and collaborate on any issues related to system monitoring.

Birds of a Feather

BoF 1B

Extending Software Tools for CSM
Harold Longley and Ryan Haasken (Hewlett Packard Enterprise), Doug Jacobsen and Brian Friesen (NERSC), Alden Stradling and Graham van Heule (Los Alamos National Laboratory), and Miguel Gila and Manuel Sopena Ballesteros (Swiss National Supercomputing Centre)
Abstract: This BOF encourages discussion among those who have extended the software toolset for managing HPE Cray EX systems with Cray System Management (CSM) or who want to use the extensions developed by other customers. The system management environment with CSM software on HPE Cray EX systems was designed to be extensible. An API gateway with integrated authentication and authorization is used to contact the containerized microservices and included open-source software.

Birds of a Feather

BoF 1C

HPC System Test: Challenges and Lessons Learned Deploying Bleeding-Edge Network Technologies
Veronica G. Melesse Vergara (Oak Ridge National Laboratory) and Bilel Hadri (King Abdullah University of Science and Technology)
Abstract: In the last couple of years, CUG sites across the world have begun deploying and testing the latest generation of HPE Cray EX systems, including Perlmutter at NERSC (USA), LUMI at CSC (Finland), Frontier at ORNL (USA), and Setonix at Pawsey (Australia), among others. In this birds-of-a-feather session, we aim to gather center and vendor staff from across the globe to describe the deployment processes used, discuss challenges encountered, and share lessons learned during the deployment of HPE's Slingshot 11 technology at different scales. The session will first feature speakers from the HPE Engineering and ORNL HPC Scalable Systems teams, followed by an interactive discussion encouraging participants to share their own experiences. First, Forest Godfrey (HPE) will discuss the Slingshot testing and troubleshooting methodologies developed by HPE. Then, Matt Ezell (ORNL) will provide a summary of the deployment and configuration from a center perspective. The moderated discussion will invite staff from Indiana University, NERSC, LLNL, CSC, and Pawsey to share their experiences, challenges, and tools developed. The session will focus on three primary goals: (1) identify common challenges across sites, (2) gather information on tests and tools that could be leveraged by the community, and (3) define and share best practices that can be used to validate the functionality, performance, and stability of Slingshot 11 based systems. At the end of the session, we will develop a technical report summarizing recommendations, workarounds implemented, gaps identified, and shareable tests that have proven to be helpful in detecting Slingshot 11 issues.
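As a concrete illustration of the kind of shareable test discussed in this BOF, the sketch below measures point-to-point bandwidth between rank pairs with mpi4py. It is an assumed example for this program, not one of the tools the speakers will present; the message size, repetition count, and one-rank-per-node launch convention are arbitrary choices.

```python
# Illustrative pairwise bandwidth probe (mpi4py + NumPy); not an official test.
# Launch with one rank per node, e.g.: srun -N 8 --ntasks-per-node=1 python pair_bw.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
assert size % 2 == 0, 'run with an even number of ranks'

NBYTES = 1 << 24   # 16 MiB message (arbitrary)
REPS = 20
half = size // 2
partner = rank + half if rank < half else rank - half
buf = np.ones(NBYTES, dtype=np.uint8)

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(REPS):
    if rank < half:
        comm.Send(buf, dest=partner, tag=0)
        comm.Recv(buf, source=partner, tag=1)
    else:
        comm.Recv(buf, source=partner, tag=0)
        comm.Send(buf, dest=partner, tag=1)
elapsed = MPI.Wtime() - t0

# Each iteration moves NBYTES in each direction between one pair of nodes.
if rank < half:
    gb_s = 2 * REPS * NBYTES / elapsed / 1e9
    print(f'pair ({rank},{partner}): {gb_s:.2f} GB/s')
```

A pair reporting a markedly lower figure than its peers is a hint to inspect the links serving those two nodes.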
Birds of a Feather

Tuesday, May 9th

8:30am-10:00am | Plenary: Welcome, Keynote

CUG Welcome
Ashley Barker (Oak Ridge National Laboratory)
Abstract: Welcome from the CUG President.

Keynote: Daring to think of the impossible: The story of Vlasiator
Minna Palmroth (University of Helsinki, Finnish Meteorological Institute)
Abstract: Vlasiator is the world's first global Eulerian hybrid-Vlasov simulation code, going beyond magnetohydrodynamics in the solar wind-magnetosphere-ionosphere system. This presentation tells the story of Vlasiator. An important enabler of Vlasiator has been the rapid increase of computational resources over the last decade, but equally important have been the open-minded, courageous forerunners who have embraced this new opportunity, both as developers and as co-authors of our papers. Typically, when starting a new coding project, people think about the presently available resources. But when development continues for multiple years, the resources change. If one instead targets upcoming resources, one is always in possession of a code that does not contain large legacy parts unable to utilize the latest resources. It will be interesting to see how many modelling groups will take the opportunity to benefit from current high-performance computing trends, and where we will be in the next 10 years.

AMD Together We Advance - A Look Back and a Look Forward
David Cownie (AMD)
Abstract: We'll take a look back at the progress made since CUG19 in Montreal and review the current state of AMD. We'll also explore the exciting future of AMD and how we can advance together.

Multi-dimensional HPC for Breakthrough Results
Branden Bauer (Altair Engineering, Inc.)
Abstract: Today's hyper-competitive HPC landscape demands more than simple scheduling for CPUs. We'll show you how to optimize across software licenses and storage requirements, accelerate with GPUs, burst to the cloud, submit jobs from anywhere, and more.

Plenary

10:30am-12:00pm | Plenary: Sponsor talk, Best Paper, New sites

CSC Finland: 52 years of leadership in European HPC
Kimmo Koski (CSC - IT Center for Science Ltd.)
Abstract: Kimmo Koski, CSC's President & CEO, will give a presentation entitled "CSC Finland: 52 years of leadership in European HPC".

CUG Election Candidate Statement
Ashley Barker (Oak Ridge National Laboratory)
Abstract: CUG Election Candidate Statement.

Balancing Workloads in More Ways than One
Veronica G. Melesse Vergara, Paul Peltz, Nick Hagerty, Christopher Zimmer, Reuben Budiardja, Dan Dietz, Thomas Papatheodore, Christopher Coffman, and Benton Sparks (Oak Ridge National Laboratory)
Abstract: The newest system deployed by Oak Ridge National Laboratory (ORNL) as part of the National Climate-Computing Research Center (NCRC) strategic partnership between the U.S. Department of Energy and the National Oceanic and Atmospheric Administration (NOAA), named C5, is an HPE Cray EX 3000 supercomputer with 1,792 nodes interconnected with HPE's Slingshot 10 technology. Each node comprises two 64-core AMD EPYC 7H12 processors and 256 GB of DRAM. In this paper, we describe the process ORNL used to deploy C5 and discuss the challenges we encountered during execution of the acceptance test plan.
These challenges included balancing: (1) production workloads running in parallel on the Gaea collection of systems; (2) the mixture and distribution of tests executed simultaneously on C5 against f2, the shared Lustre parallel file system; (3) the compute and file system resources available; and (4) schedule and resource constraints. Part of the work done to overcome these challenges included expanding monitoring capabilities in the OLCF Test Harness, which are described here. Finally, we present results from NOAA benchmarks and OLCF applications used in this study that could be useful to other centers deploying similar systems.

Komondor, the Hungarian breed
Zoltan Kiss (KIFÜ)
Abstract: Security is one of the main concerns of running a massively parallel multiuser environment. Given the high costs of obtaining and running such a system, novel methods need to be introduced to protect it from threats coming from outside or inside, while keeping the system robust and user friendly. KIFÜ just set up its Komondor Cray EX system this year, and since it also operates the Academic Identification system in Hungary, we are investigating ways to improve HPC security connected to federated IDs by utilizing open-source toolsets. We will quickly introduce our open-source tools that offer firewall functionality, OTP-based multifactor authentication, and a fully secured HPC Portal integration with Jupyter support.

High Performance Remote Linux Desktops with ThinLinc
Pierre Ossman (Cendio AB)
Abstract: ThinLinc is a scalable remote desktop solution for Linux, built with open-source components. It enables access to graphical Linux applications and full desktop environments. CUG member sites have used ThinLinc to provide users with access to applications like MATLAB or VMD, as well as a "Remote Research Desktop". This environment allows users to run their entire workflow, from data retrieval and preparation to job submission and post-processing. A "Remote Research Desktop" also enables access to interactive applications and to jobs running in a batch system. Cendio, the company behind ThinLinc, is a strong supporter of open-source projects and the main contributor to projects like TigerVNC, noVNC, and others.

Plenary

1:00pm-2:30pm | Technical Session 1A (Pekka Manninen)

CPE Update
Barbara Chapman (HPE)
Abstract: The HPE Cray Programming Environment (CPE) provides a suite of integrated programming tools that facilitate application development on a diverse range of HPC systems delivered by HPE. It consists of an integrated set of compilers, math libraries, communications libraries, debuggers, and performance tools that enable the creation, evolution, and adaptation of portable application codes written using mainstream programming languages and the most widely used parallel programming models. Its components are optimized to provide scalable performance on a variety of hardware configurations.

Deploying Alternative User Environments on Alps
Jonathan Coles, Benjamin Donald Cumming, Theofilos-Ioannis Manitaras, Jean-Guillaume Piccinali, and Simon Pintarelli (CSCS) and Harmen Stoppels (Stoppels Consulting)
Abstract: We describe a method for defining, building, and deploying alternative programming environments alongside the CPE on the HPE Cray EX Alps infrastructure at CSCS.
This addresses an important strategic need at CSCS to deliver tailored environments within our versatile cluster (vCluster) configuration. We provide compact, testable, optimized software environments that can be updated independently of the CPE release cycle. The environments are defined with a descriptive YAML recipe, which is processed by a novel configuration tool that builds the software stack using Spack and generates a SquashFS image. Cray-MPICH is provided through a custom Spack package without the need for a CPE installation. We describe the command-line tools and Slurm plugin that facilitate loading environments per user and per job. Through a series of benchmarks we demonstrate application and micro-benchmark performance that matches the CPE.

Automating Software Stack Deployment on an HPE Cray EX Supercomputer
Pascal Jahan Elahi, Cristian Di Pietrantonio, Marco De La Pierre, and Deva Kumar Deeptimahanti (Pawsey Supercomputing Research Centre)
Abstract: The complexity and diversity of scientific software, in conjunction with a desire for reproducibility, led to the development of package managers such as Spack and EasyBuild, with the purpose of compiling and installing optimised software on supercomputers. In this paper, we present how Pawsey leverages such tools to deploy its system-wide software stack. Two aspects of Pawsey software stack deployment are discussed: the first comprises organisation, accessibility, interoperability with the HPE Cray EX environment, and the choice of technologies such as containers, derived from a set of policies and requirements; the second is the (almost) automated, self-contained deployment process using Spack and Bash scripts. This process clones a specific version of Spack, configures it, runs it to build the software stack using environments, deploys Singularity Registry HPC to set up the desired containers-as-modules, and then generates bespoke module files. The deployment is tested using the ReFrame framework. Meeting the requirements of our user base necessitated patching Spack, writing new Spack recipes, and patching existing recipes and/or software source code to build properly within the Cray Programming Environment. The whole Spack configuration at Pawsey is made publicly accessible on GitHub for the benefit of the broader HPC community.

Paper, Presentation

Technical Session 1B (Lena M Lopatina)

Building Efficient AI Pipelines with Self-Learning Data Foundation for AI
Annmary Justine, Aalap Tripathy, Revathy Venkataramanan, Sergey Serebryakov, Martin Foltin, Cong Xu, Suparna Bhattacharya, and Paolo Faraboschi (Hewlett Packard Enterprise)
Abstract: Development of trustworthy AI models often requires significant effort, resources, and energy. Available tools focus on optimization of individual AI pipeline stages but lack end-to-end optimization and reuse of historical experience from similar pipelines. This leads to excessive resource consumption from running unnecessary AI experiments with poor data or parameters. Hewlett Packard Labs is developing a novel Self-Learning Data Foundation for AI infrastructure that captures and learns from AI pipeline metadata to optimize the pipelines.
We show examples of how the Data Foundation is helping AI practitioners: i) enable reproducibility, audit trails, and incremental model development across distributed sites spanning edge, high-performance computing, and cloud (e.g., in particle trajectory reconstruction, autonomous microscopy computational steering, etc.); ii) reduce the number of AI model training experiments by initializing AutoML training runs based on historical experience; and iii) track resource consumption and carbon footprint through different stages of the AI lifecycle, enabling energy-aware pipeline optimizations. The visibility of pipeline metadata beyond training to inference and retraining provides insights about end-to-end tradeoffs between runtime, accuracy, and energy efficiency. The Data Foundation is built on the open-source Common Metadata Framework, which can be integrated with third-party workflow management, experiment tracking, data versioning, and storage back ends.

ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales
Xingfu Wu, Prasanna Balaprakash, Michael Kruse, Jaehoon Koo, Brice Videau, Paul Hovland, and Valerie Taylor (Argonne National Laboratory); Brad Geltz and Siddhartha Jana (Intel Corporation); and Mary Hall (University of Utah)
Abstract: As we enter the exascale computing era, efficiently utilizing power and optimizing the performance of scientific applications under power and energy constraints has become critical and challenging. We propose a low-overhead autotuning framework to autotune performance and energy for various hybrid MPI/OpenMP scientific applications at large scales and to explore the tradeoffs between application runtime and power/energy for energy-efficient application execution; we then use this framework to autotune four ECP proxy applications: XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with a Random Forest surrogate model to effectively search parameter spaces with up to 6 million different configurations on two large-scale production systems, Theta at Argonne National Laboratory and Summit at Oak Ridge National Laboratory. The experimental results show that our autotuning framework has low overhead at large scales and achieves good scalability. Using the proposed autotuning framework to identify the best configurations, we achieve up to 91.59% performance improvement, up to 21.2% energy savings, and up to 37.84% EDP improvement on up to 4,096 nodes.

Benchmarking High-End ARM Systems with Scientific Applications: Performance and Energy Efficiency
Nikolay Simakov, Robert DeLeon, Joseph White, Mathew Jones, and Thomas Furlani (Center for Computational Research, SUNY University at Buffalo) and Eva Siegmann and Robert Harrison (Institute for Advanced Computational Science, Stony Brook University)
Abstract: Motivated by our positive experience with the Ookami HPE/Cray Apollo-80 system (the first open Fujitsu A64FX system in the USA), we report benchmark results on modern ARM processors (Amazon Graviton 2/3, Fujitsu A64FX, Ampere Altra, ThunderX2). Comparison is made to x86 systems (Intel and AMD) and hybrid Intel x86/NVIDIA GPU systems. The benchmarking was done with the application kernel module of XDMoD. XDMoD, developed at the University at Buffalo, is a comprehensive suite for HPC resource utilization and performance monitoring.
The applications span multiple HPC fields and paradigms: HPCC (several HPC benchmarks), NWChem (ab initio chemistry), OpenFOAM (partial differential equation solver, hydrodynamics), GROMACS (biomolecular simulation), AI Benchmark Alpha (AI benchmark), and Enzo (adaptive mesh refinement, astrophysical simulation).

Paper, Presentation

Technical Session 1C (Chris Fuson)

Flexible Slurm configuration for large scale HPC
Steven Robson, Kieran Leach, Stephen Booth, Greg Blow, Maciej Hamczyk, and Philip Cass (EPCC, The University of Edinburgh)
Abstract: EPCC operates a variety of services including ARCHER2, commissioned by UK Research and Innovation on behalf of the UK Government as the UK's Tier-1 service, and Cirrus, a UKRI Tier-2 service.

Supporting Many Task Workloads on Frontier using PMIx and PRRTE
Wael Elwasif and Thomas Naughton (Oak Ridge National Laboratory)
Abstract: Large-scale many-task ensembles are increasingly used as the basic building block for scientific applications running on leadership-class platforms. Workflow engines are used to coordinate the execution of such ensembles and make use of lower-level system software to manage the lifetime of processes. PMIx (Process Management Interface for Exascale) is a standard for interaction with system resource and task management software. The OpenPMIx reference implementation provides a useful basis for workflow engines running on large-scale HPC systems.

Slurm 23.02, 23.11, and Beyond
Tim Wickberg (SchedMD LLC, Slurm)
Abstract: This presentation will provide a technical overview of new features and functionality being released in the open-source Slurm workload manager versions 23.02 (released February 2023) and 23.11 (to be released November 2023).

Paper, Presentation

3:00pm-5:00pm | Technical Session 2A (Frank M. Indiviglio)

Deploying Cloud-Native HPC Clusters on HPE Cray EX
Felipe A. Cruz, Alejandro J. Dabin, and Manuel Sopena Ballesteros (Swiss National Supercomputing Centre)
Abstract: The software stack that manages a High-Performance Computing (HPC) cluster is a collection of applications and services put together by multiple engineers. Integrating all the software components is often complex and challenging. Therefore, the engineering effort frequently focuses on minimizing service disruption rather than on delivering new features. In this work, we introduce a cloud-native architecture for delivering an HPC cluster on top of HPE Cray EX that streamlines the development, operation, maintenance, and administration of the many services that compose an HPC cluster. Under a cloud-native approach, an HPC cluster is architected as a collection of small, loosely coupled services that can be independently delivered. Moreover, we leverage an on-prem cloud platform deployment that enables a self-service model for engineers to introduce controlled changes to the cluster while streamlining service and infrastructure automation. The presented cloud-native architecture is a starting point for delivering HPC clusters that are more resilient and scalable to operate.
Software-defined Multi-tenancy on HPE Cray EX Supercomputers
Richard Duckworth, Vinay Gavirangaswamy, David Gloe, and Brad Klein (HPE)
Abstract: Sandia National Laboratories' Red Storm system was designed to support "switching" hardware to isolate computation and data between data classification levels. This enabled Sandia and derivative system architectures to adapt investments in capability computing to evolving needs. Today, industry demand for multi-tenancy in modern converged HPC and AI platforms has not waned, but expectations around how the solution should be delivered have changed, as have the types of workloads being run. The industry is now strongly advocating for and investing in cloud-like platforms that treat multi-tenancy as a first-principles capability, align with modern DevOps management techniques, support resource elasticity, and enable customers to deliver their own IaaS, PaaS, and SaaS solutions. Enter HPE Cray Systems Management (CSM). CSM is a Kubernetes-based, turnkey, open-source, API-driven HPC systems software solution. Using CSM as a foundation, we have developed a software-defined multi-tenancy architecture, anchored by a tenancy "controller hub" called the Tenant and Partition Management System (TAPMS). Through extant features in CSM, TAPMS inherits the availability, scale, resiliency, disaster recovery, and security properties of the platform. This paper presents TAPMS, the supporting architecture, and the resulting composable, declarative tenant configuration interfaces that TAPMS and the underlying Kubernetes Operator pattern enable.

New User Experiences with K3s and MetalLB on Managed Nodes
Alan Mutschelknaus and Jeremy Duckworth (HPE)
Abstract: Traditional managed nodes on HPE Cray EX systems, dedicated to user compilations and job launch, have not supported user interaction beyond the standard SSH shell environment. This model works for many use cases, but it does not provide the flexibility that industry solutions around container orchestration offer. While User Access Instances (UAIs) are available as containerized login environments, they currently run in the Cray System Management Kubernetes cluster and would be better suited to run alongside other user processes.

The WLCG Journey at CSCS: from Piz Daint to Alps
Riccardo Di Maria, Miguel Gila, Dino Conciatore, Giuseppe Lo Re, Elia Oggian, and Dario Petrusic (ETH Zurich, CSCS)
Abstract: The Swiss National Supercomputing Centre (CSCS), in close collaboration with the Swiss Institute for Particle Physics (CHiPP), provides the Worldwide LHC Computing Grid (WLCG) project with cutting-edge HPC and HTC resources. These are reachable through a number of Computing Elements (CEs) that, along with a Storage Element (SE), characterise CSCS as a Tier-2 Grid site. The current flagship system, an HPE Cray XC named Piz Daint, has been the platform where all the computing requirements for the Tier-2 have been met for the last six years. With the commissioning of the future flagship infrastructure, an HPE Cray EX referred to as Alps, CSCS is gradually moving the computational resources to the new environment. The Centre has been investing heavily in the concept of Infrastructure as Code (IaC) and is embracing the multi-tenancy paradigm for its infrastructure.
As a result, the project leverages modern approaches and technologies borrowed from the cloud to perform a complete redesign of the service. The goal of this contribution is to describe the journey, design choices, and challenges encountered along the way to implementing the new WLCG platform, which other projects, such as the Cherenkov Telescope Array (CTA), are also benefiting from.

Paper, Presentation

Technical Session 2B (Lena M Lopatina)

Deploying a Parallel File System for the World's First Exascale Supercomputer
Jesse Hanley, Dustin Leverman, Christopher Coffman, Bradley Gipson, Christopher Brumgard, and Rick Mohr (Oak Ridge National Laboratory)
Abstract: The world's first exascale supercomputer, OLCF's Frontier, debuted last year and is allocated for INCITE awards this year. OLCF partnered with HPE to design, procure, and deploy a parallel file system to support the demands of this new machine. This file system is based on the ClusterStor E1000 storage platform and has been integrated into the OLCF site.

Hiding I/O using SMT on the ARCHER2 HPE Cray EX system
Shrey Bhardwaj, Paul Bartholomew, and Mark Parsons (EPCC, The University of Edinburgh)
Abstract: In modern HPC systems, the I/O bottleneck limits the overall application wall clock time. To address this problem, this work tests the hypothesis that the effective I/O bandwidth can be improved by using SMT on ARCHER2, an HPE Cray EX supercomputing system. This was achieved by developing a benchmark library, iocomp, which uses MPI to separate the computation and I/O processes. These processes can then be mapped to both threads of a single core using SMT on ARCHER2, or to separate cores for comparison. For preliminary testing, the STREAM benchmark is used as the computational kernel and the iocomp library is used for the I/O operations. Timers are added to the application to record the computation time and the wall time. Preliminary results show that when SMT is used, the wall clock time is 30% greater than when placing the computation and I/O processes onto separate cores using a full ARCHER2 node. As the STREAM benchmark is an unrealistic test case, the HPCG and HPL benchmarks will next be used to test the hypothesis.

MPI-IO Local Aggregation as Collective Buffering for NVMe Lustre Storage Targets
Michael Moore and Ashwin Reghunandanan (Hewlett Packard Enterprise) and Lisa Gerhardt (Lawrence Berkeley National Laboratory)
Abstract: HPC I/O workloads using shared-file access on distributed file systems such as Lustre have historically achieved lower performance relative to an optimal file-per-process workload. Optimizations at different levels of the application and file system stacks have alleviated many of the performance limitations for disk-based Lustre storage targets (OSTs). While many of the shared-file optimizations in Lustre and MPI-IO provide performance benefits on NVMe-based OSTs, the existing optimizations do not allow full utilization of the high throughput and random-access performance characteristics of the NVMe OSTs on existing systems. A new optimization in HPE Cray MPI, part of the HPE Cray Programming Environment, builds on existing shared-file optimizations and the performance characteristics of NVMe-backed OSTs to improve shared-file write performance for those targets.
This paper discusses the motivation and implementation of that new shared-file write optimization, MPI-IO Local Aggregation as Collective Buffering, for NVMe-based Lustre OSTs like those in the HPE Cray ClusterStor E1000 storage system. It also describes the new feature and how to evaluate application MPI-IO collective operation performance through HPE Cray MPI's MPI-IO statistics. Finally, results of benchmarks using the new collective MPI-IO write optimization are presented.

Kfabric Lustre Network Driver
Chris Horn, Ian Ziemba, Amith Abraham, Ron Gredvig, and John Fragalla (Hewlett Packard Enterprise)
Abstract: Lustre is a parallel distributed file system used for large-scale cluster computing. Lustre's performance scalability makes Cray ClusterStor the ideal storage system to pair with Cray EX computer systems. Between a high-performance compute system and storage system, it is necessary to deploy an equally performant and scalable network fabric and related software stack. Kfabric is a high-performance fabric software library based on libfabric, and it is optimized for bulk data and storage transfers. The Lustre Kfabric Network Driver (kfilnd) leverages Kfabric interfaces and the Kfabric kCXI provider (kfi_cxi) to enable Remote Direct Memory Access (RDMA) for Lustre file system communication on Slingshot networks. This presentation provides details on the designs of these new technologies and reports on some early lessons learned from their deployment at scale. We will provide a brief overview of the kfilnd, kfi_cxi, and cxi software. We will discuss some of the challenges we encountered in the areas of serviceability and resiliency, as well as some recent improvements we have made in those areas. Finally, we will provide a short preview of future work. This information should prepare system administrators to better operate and service Lustre clients and servers on Cray ClusterStor and Cray EX systems with Slingshot.

Paper, Presentation

Technical Session 2C (Brett Bode)

Stress-less MPI Stress Tests
Pascal Elahi and Craig Meyer (Pawsey Supercomputing Research Centre)
Abstract: The Message Passing Interface (MPI) is critical for running jobs at scale on High Performance Computing (HPC) systems. Consequently, it is common practice to test the MPI on an HPC system with packages such as the Ohio State University Micro-benchmarks and other MPI-enabled codes. However, these packages' focus on benchmarking or on specific, simple communication patterns means they leave much of the MPI deployment untested. As a consequence, users of Pawsey's HPE Cray EX system encountered issues with the MPI at initial deployment, even after the system had passed acceptance tests. We present here a suite of MPI tests that stress the MPI library, focusing on a wide variety of communication patterns. These tests were critical to uncovering or isolating a number of underlying issues with the communication libraries on our newly deployed HPE Cray EX system. We will also discuss in detail any issues uncovered by these tests that remain unresolved and what impact they might have on users of EX systems. We have integrated these tests into the ReFrame framework so that they can be deployed on any system.
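To make the idea of stressing "a wide variety of communication patterns" concrete, here is a small illustrative sketch in mpi4py that exercises a ring exchange and an all-to-all at several message sizes. It is not the Pawsey test suite itself; the pattern selection, sizes, and validation are assumptions chosen for illustration.

```python
# Illustrative communication-pattern exerciser (mpi4py + NumPy); not the suite
# described in the talk. Run with e.g.: srun -n 64 python pattern_stress.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()


def ring_exchange(nbytes):
    """Each rank sends to its right neighbour and receives from its left."""
    sendbuf = np.full(nbytes, rank % 256, dtype=np.uint8)
    recvbuf = np.empty(nbytes, dtype=np.uint8)
    right, left = (rank + 1) % size, (rank - 1) % size
    comm.Sendrecv(sendbuf, dest=right, recvbuf=recvbuf, source=left)
    assert recvbuf[0] == left % 256, 'unexpected payload from left neighbour'


def alltoall_exchange(nbytes):
    """Every rank exchanges an nbytes block with every other rank."""
    sendbuf = np.full(size * nbytes, rank % 256, dtype=np.uint8)
    recvbuf = np.empty_like(sendbuf)
    comm.Alltoall(sendbuf, recvbuf)


for nbytes in (8, 4096, 1 << 20):
    comm.Barrier()
    t0 = MPI.Wtime()
    ring_exchange(nbytes)
    alltoall_exchange(nbytes)
    comm.Barrier()
    if rank == 0:
        print(f'{nbytes:>8} B: ring + alltoall completed in {MPI.Wtime() - t0:.4f} s')
```

Wrapping runs like this in ReFrame checks, as the abstract describes, would let the same patterns be replayed automatically after every software upgrade.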
Leveraging Libfabric to Compare Containerized MPI Applications' Performance Over Slingshot 11
Alberto Madonna (Swiss National Supercomputing Centre)
Abstract: The capability to flexibly access different high-speed network hardware is fundamental when pursuing performance portability of containerized HPC applications. Solutions should strive to provide two desirable qualities: first, to be independent of the MPI implementation, allowing developers to use the best flavor for their application; second, to minimize modifications to the container image software stack, improving workflow reproducibility. Recent developments demonstrated the possibility of achieving these traits by using the libfabric communication framework as middleware, abstracting the network hardware from the MPI libraries. In a nutshell, libfabric components are either added or replaced at container creation time, enabling containers to leverage optimized fabrics. This presentation provides an overview of such an approach and then describes early experiences in applying the technique on a system featuring the HPE Slingshot 11 interconnect. Experimental results are showcased, comparing open-source and proprietary MPI implementations across synthetic benchmarks and a selection of real-world scientific applications.

Designing HPE Cray Message Passing Toolkit Software Stack for HPE Cray EX supercomputers
Krishna Kandalla, Naveen Ravi, Kim McMahon, Larry Kaplan, and Mark Pagel (Hewlett Packard Enterprise)
Abstract: The Frontier supercomputer at ORNL is the world's first supercomputer to break the exascale barrier, and it is based on the HPE Cray EX architecture. The HPE Cray EX architecture is designed to be highly flexible and relies on HPE Slingshot technology. The HPE Cray Programming Environment is a key software component that is tightly integrated with the broader HPE Cray EX hardware and software ecosystem to offer high performance and improved programmer productivity on HPE Cray EX systems. The HPE Cray Message Passing Toolkit is one of the building blocks of the HPE Cray Programming Environment. It comprises the HPE Cray MPI and HPE Cray OpenSHMEMX software stacks. HPE Cray MPI is a proprietary implementation of the MPI specification, serves as the primary MPI stack on HPE Cray EX supercomputers, and was instrumental in surpassing the exascale performance barrier on Frontier. HPE Cray OpenSHMEMX is a proprietary implementation of the OpenSHMEM specification and is the premier SHMEM implementation on HPE EX systems. Both libraries leverage years of innovation to offer high-performance and scalable communication capabilities. This talk offers an overview of the HPE Cray MPI and HPE Cray OpenSHMEMX stacks on HPE Cray EX supercomputers.
Open MPI for HPE Cray EX Systems
Howard Pritchard (Los Alamos National Laboratory) and Thomas Naughton, Amir Shehata, and David Bernholdt (Oak Ridge National Laboratory)
Abstract: Open MPI for HPE Cray EX Systems.

Paper, Presentation

5:05pm-5:50pm | BoF 2A

Energy-based allocations and charging on large scale HPC systems
Sridutt Bhalachandra and Norman Bourassa (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Maciej Cytowski and Cristian Di Pietrantonio (Pawsey Supercomputing Research Centre); Martin Bernreuther and Björn Dick (High Performance Computing Center Stuttgart); Juan Rodriguez Herrera, Alan Simpson, and Andrew Turner (EPCC, The University of Edinburgh); and Torsten Wilde (Hewlett Packard)
Abstract: Explicitly including per-job energy costs has not, historically, been important when allocating and charging for resources on HPC systems, because electricity costs for large HPC services over their lifetime have been dwarfed by the costs of procuring the systems. This is no longer true: for the ARCHER2 UK National Supercomputing Service, the lifetime electricity costs are estimated at more than 50% of the hardware costs. There is now wider interest in how allocation and charging schemes could be changed from residency-based approaches (e.g., core hours) to approaches that explicitly include an energy component. In this session, we will provide a forum for interested parties to come together to discuss energy-based allocation and charging and the issues surrounding the introduction of such schemes. We will invite participants to join a cross-site working group to continue sharing experience on energy-based charging.

Birds of a Feather

BoF 2B

HPCM Users
Jeff Hanson and Cornel Boac (Hewlett Packard)
Abstract: HPCM had a BoF at the virtual CUG in 2021 and again in 2022; both were 45-minute sessions. I propose a similar session for 2023.

Birds of a Feather

BoF 2C

Data Mobility Service – Data curation and intelligent management
Torben Kling Petersen (HPE)
Abstract: With storage requirements approaching exabyte levels, billions of files, and availability becoming more important to computational workflows, new paradigms for managing data are required. Many systems today are required to last for 5-7 years, support multi-tenancy data protection, and allow for geo-distributed data federation, making their management increasingly complicated. To address these challenges, HPE is developing the Data Mobility Service, a suite of tools comprising one of several functions of future HPE GreenLake offerings, for use in both on-prem and off-prem solutions. While requirements are still being defined, the concept is to manage data ranging from node-local storage, through parallel file systems, to archives. Local archiving, managed today via DMF7, can be stretched to multi-site data movement and potentially global federation, all managed as a dynamic service using HPE's GreenLake Cloud Platform. This talk will provide an outline of HPE's plans for the Data Mobility Service, to gather feedback and facilitate development of the service to meet customer needs. This includes ephemeral file systems, data access following FAIR principles, and open-source research projects such as HPE's Common Metadata Framework. Additionally, the use of policy-based data movement workflows will be discussed.
Birds of a Feather

Wednesday, May 10th

8:30am-10:00am | Plenary: CUG Elections, Panel

CUG Business
Ashley Barker (Oak Ridge National Laboratory)
Abstract: CUG Business meeting and elections.

Women in HPC presents: Equity in Technical Leadership
Kelly Rowland and Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); Ann Gentile (Sandia National Laboratories); Lipi Gupta and Yun (Helen) He (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); Lena Lopatina (Los Alamos National Laboratory); Verónica Melesse Vergara (Oak Ridge National Laboratory); Hai Ah Nam (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); Amy Neeser (University of California, Berkeley); Jean Sexton (Lawrence Berkeley National Laboratory); and Laurie Stephey and Zhengji Zhao (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)
Abstract: The San Francisco Bay Area chapter of Women in HPC invites everyone to attend this discussion of equity in technical leadership. Technical leadership includes the skills to lead, direct, and manage technical projects; we have invited a panel of experts on this topic to explore what it looks like to support individuals in groups underrepresented in HPC as technical leaders at various career stages.

The End of Heterogeneous Programming
Alex Koehler (NVIDIA)
Abstract: Heterogeneous programming is dying, and NVIDIA is killing it. Heterogeneous programming used to be required to get applications working on GPUs. As a result of a decade of co-design between ISO language parallelism and GPU hardware, it is now possible to develop GPU applications strictly based on ISO language parallelism, with no explicit support for heterogeneity, and have the performance be compelling. As a result of support for ISO language parallelism, the use of pragmas/directives as the portable way to port applications to GPUs should be considered deprecated.

Arm in HPC
David Lecomber (Arm)
Abstract: HPE and Cray have a long and collaborative history of firsts in the story of Arm in HPC, and with the upcoming systems announced in Europe and the US there will be more to come. We'll talk about that progress and what Arm is doing to make HPC successful on its architecture.

CUG Business, Plenary

10:30am-11:40am | Plenary: HPE Update

HPE update by Gerald Kleyn
Gerald Kleyn (HPE)
Abstract: HPE update by Gerald Kleyn.

Vendor, Plenary

2:15pm-3:45pm | Technical Session 3A (Martti Louhivuori)

Monitoring and characterizing GPU usage
Le Mai Weakley, Scott Michael, Abhinav Thota, Laura Huber, Ben Fulton, and Matthew Kusz (Indiana University)
Abstract: For systems with an accelerator component, it is important from an operational and planning perspective to understand how and to what extent the accelerators are being used. Having a framework for tracking the utilization of accelerator resources is important both for judging how efficiently a system is being used and for capacity and configuration planning of future systems. In addition to tracking total utilization and accelerator efficiency numbers, some attention should also be paid to the types of research and workflows being executed on the system.
In the past, the demand for accelerator resources was largely driven by more traditional simulation codes, such as molecular dynamics. But with the growing popularity of deep learning and artificial intelligence workflows, accelerators have become even more highly sought after and are being used in new ways. Provisioning resources to researchers via an allocation system allows sites to track a project's usage and workflow as well as the scientific impact of the project. With such tools and data in hand, characterizing the GPU utilization of deep learning frameworks versus more traditional GPU-enabled applications becomes possible. In this paper we present a survey of GPU monitoring tools used at HPC sites and a framework for tracking the utilization of NVIDIA GPUs on Slurm-scheduled HPC systems used at Indiana University. We also present an analysis of accelerator utilization on multiple systems, including an HPE Apollo system targeting AI workflows and a Cray EX system. Evaluating and Influencing Extreme-Scale Monitoring Implementations Evaluating and Influencing Extreme-Scale Monitoring Implementations Jim Brandt (Sandia National Laboratories), Chris Morrone (Lawrence Livermore National Laboratory), Eric Roman (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Ann Gentile (Sandia National Laboratories), Tom Tucker (Open Grid Computing), Jeff Hanson (HPE), and Kathleen Shoga and Alec Scott (Lawrence Livermore National Laboratory) Abstract Over the past decade we have been able to gain new insights into application resource utilization and to detect and diagnose problems with decreased latency through fine-grained monitoring of our HPC systems while incurring no statistically significant performance penalty. STREAM: A Scalable Federated HPC Telemetry Platform STREAM: A Scalable Federated HPC Telemetry Platform Ryan Adamson (Oak Ridge National Laboratory) Abstract Obtaining and analyzing high performance computing (HPC) telemetry in real time is a complex task that can impact algorithmic performance, operating costs, and ultimately scientific outcomes. If your organization operates multiple HPC systems, filesystems, and clusters, telemetry streams can be synthesized in order to ease the operational and analytics burden. In order to collect this telemetry, the Oak Ridge Leadership Computing Facility (OLCF) has deployed STREAM (Streaming Telemetry for Resource Events, Analytics, and Monitoring), which is a distributed and high-performance message bus based on Apache Kafka. STREAM collects center-wide performance information and must interface with many sources, including five HPE-deployed supercomputers, each with its own Kafka cluster managed by HPCM. OLCF supercomputers and their attached scratch filesystems currently send more than 300 million messages over 200 topics to produce around 1.3 terabytes per day of telemetry data to STREAM. This paper describes the architectural principles that enable STREAM to be both resilient and highly performant while supporting multiple upstream Kafka clusters and other data sources. It also discusses the design challenges and decisions faced in adapting our existing system-monitoring infrastructure to support the first exascale computing platform.
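The two monitoring abstracts above describe, respectively, tracking per-job GPU utilization on Slurm-scheduled systems and funnelling center-wide telemetry through an Apache Kafka message bus. The following is a minimal sketch of how those two ideas fit together, assuming the pynvml and kafka-python packages, an NVIDIA node, and a reachable broker; the broker address, topic name, and sampling interval are illustrative assumptions, not details of the Indiana University or OLCF deployments.

    # Sample NVIDIA GPU utilization and publish it to a Kafka topic (sketch only).
    import json, os, socket, time
    import pynvml
    from kafka import KafkaProducer

    pynvml.nvmlInit()
    producer = KafkaProducer(bootstrap_servers="broker.example.org:9092",
                             value_serializer=lambda v: json.dumps(v).encode())

    while True:
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            producer.send("gpu-utilization", {
                "host": socket.gethostname(),
                "gpu": i,
                "sm_util_pct": util.gpu,        # SM utilization, percent
                "mem_util_pct": util.memory,    # memory-controller utilization, percent
                "job": os.environ.get("SLURM_JOB_ID"),
                "ts": time.time(),
            })
        producer.flush()
        time.sleep(30)                          # assumed sampling interval

Tagging each sample with the Slurm job ID is what lets utilization later be attributed to projects and workflow types, as the Indiana University abstract describes.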
Paper, Presentation Technical Session 3B Tina Declerck HPE’s Holistic system Power and energy Management (HPM) vision HPE’s Holistic system Power and energy Management (HPM) vision Torsten Wilde, Larry Kaplan, and Andy Warner (Hewlett Packard Enterprise) Abstract With the movement towards a carbon neutral and sustainable economy, the landscape of HPC system management and optimization is changing rapidly. Rising energy prices leading to concerns of increased OPEX, regulatory concerns around data center sustainability (reduction of carbon footprint, total power burden on the grid), and the expected increase in system power consumption with upcoming technologies require a fresh take on data center and system operation along with power/energy management. Powersched: A HPC System Power and Energy Management Framework Powersched: A HPC System Power and Energy Management Framework Marcel Marquardt, Jan Mäder, Tobias Schiffmann, Christian Simmendinger, and Torsten Wilde (Hewlett Packard Enterprise) Abstract Supercomputers can consume huge amounts of energy. This rising power consumption has already led to a situation in which large HPC sites run overprovisioned supercomputers, where the peak power demand of the system can exceed the power available at a given site. In addition, energy prices, especially in Europe, have dramatically increased over the last year. To address both problems, we see the need to run HPC applications at a maximally energy-efficient sweet spot in terms of instructions per watt. Power Capping of Heterogeneous Systems Power Capping of Heterogeneous Systems Andrew Nieuwsma and Torsten Wilde (Hewlett Packard Enterprise) Abstract The landscape of HPC is changing rapidly because of rising energy prices and concerns of increased OPEX, regulatory concerns around data center sustainability (reduction of carbon footprint, total power burden on the grid), and the expected increase in system power consumption as systems get larger. Customers are asking for solutions that help them manage the changing landscape. Paper, Presentation Technical Session 3C Helen He Building AMD ROCm from Source on a Supercomputer Building AMD ROCm from Source on a Supercomputer Cristian Di Pietrantonio (Pawsey Supercomputing Research Centre) Abstract ROCm is an open-source software development platform for GPU computing created by AMD to accompany its GPU hardware that is being increasingly adopted to build the next generation of supercomputers. We argue that the fast-paced evolution of the software platform and the complexities of installing software on a supercomputer mandate a more flexible installation process for ROCm than the available installation methods. Arkouda: A high-performance data analytics package Arkouda: A high-performance data analytics package Michelle Strout (Hewlett Packard Enterprise, University of Arizona); Brad Chamberlain (Hewlett Packard Enterprise, University of Washington); Elliot Ronaghan (Hewlett Packard Enterprise); and Scott Bachman (Hewlett Packard Enterprise, NCAR) Abstract This talk describes Arkouda, a Python package the Chapel team at HPE has co-developed with the U.S. DoD to support data science using familiar interfaces at massive scales (think "TB-scale arrays") and interactive rates (think "seconds to small numbers of minutes per operation"). In more detail: Arkouda supports a key subset of the NumPy and Pandas Dataframe interfaces out of the box, serving as a virtual drop-in replacement for those operations.
However, Arkouda's arrays can be transparently distributed across the memories of multiple compute nodes, and its operations are computed in parallel using all of the nodes' processor cores. This is achieved due to Arkouda's use of a client-server model in which the server is written in Chapel and runs on the local system, a cluster, cloud, or supercomputer. In practice, users have run Arkouda operations on data sets of up to 56 TB of memory in seconds to minutes using up to 112k cores. We have also seen Arkouda outperform NumPy for single-node computations and significantly outperform Dask at scale. This talk will describe Arkouda's feature set, architecture, usage models, and performance results. For more information about Arkouda, see https://github.com/Bears-R-Us/arkouda. (See the short client sketch below.) HPC workflow orchestration using the ipython notebook platform HPC workflow orchestration using the ipython notebook platform Jonathan Sparks and Ayad Jassim (Hewlett Packard Enterprise) Abstract This paper describes a methodology using novel ipython notebook technologies to orchestrate complex HPC workflows and illustrates how this infrastructure can aid developer productivity from the edge to the supercomputer. Notebooks' inherent capability to incorporate multi-language code segments and declarative metadata are two essential building blocks used in workflow modeling. We present a reference architecture using notebook infrastructure and components leveraging the accessibility and features of the web interface rather than the traditional SSH-based remote shell typically exposed by HPC systems. This architecture employs a unified framework to seamlessly support HPC workflow understanding and the generation of distributed tasks, allowing for better system and code team productivity. In addition to presenting a reference architecture, we investigate several competing technologies exposing the strengths and weaknesses of these approaches through the lens of HPC. The proposed framework uses existing companion works, such as Streamflow and Common Workflow Language (CWL), and shows that these components can fully describe complex real-world workflows. This paper uses exemplar use cases, such as Computational Fluid Dynamics (CFD) and weather models, illustrating edge-to-core dynamic workflow migration exposed by this platform and developer tools such as Microsoft Visual Studio plugins. Paper, Presentation 4:00pm-5:00pmTechnical Session 4C Tina Declerck Delta: Living on the Edge of Slingshot Support. Delta: Living on the Edge of Slingshot Support. Brett Bode, David King, Greg Bauer, Galen Arnold, and Robert Brunner (National Center for Supercomputing Applications/University of Illinois) Abstract The Delta system is a fairly standard x86 CPU cluster except that it utilizes HPE/Cray’s Slingshot interconnect. HPE supports Slingshot in a range of software and hardware environments, but clearly systems running Cray OS or SLES and managed by HPCM are the core supported platforms. Delta differs from those platforms in a number of ways including the use of xCAT to manage Red Hat Enterprise Linux based nodes, DDN-based storage attached to the fabric in multiple ways, and the use of Spack to build most of the user space software environment. This presentation will cover the initial installation of the Delta software stack on the initial Slingshot 10 system and then the various changes that we made to upgrade to Slingshot 11 and our final production software stack.
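Relating to the Arkouda abstract in Technical Session 3C above: the client-server model it describes means that familiar NumPy-style calls are issued from Python but executed by a Chapel server running on the cluster. A minimal sketch of what that looks like from the client side follows; the host, port, and array sizes are illustrative assumptions, and it presumes an arkouda_server is already running.

    # Minimal Arkouda client sketch (assumes a running arkouda_server).
    import arkouda as ak

    ak.connect("localhost", 5555)          # attach to the Chapel server

    a = ak.randint(0, 2**32, 10**8)        # arrays live server-side, possibly
    b = ak.randint(0, 2**32, 10**8)        # distributed over many nodes

    print((a + b).sum())                   # elementwise add and reduction on the server

    g = ak.GroupBy(a % 10)                 # distributed group-by, Pandas-style
    keys, counts = g.count()
    print(keys.to_ndarray(), counts.to_ndarray())

    ak.disconnect()

From the user's point of view this reads like ordinary NumPy/Pandas code, which is the "virtual drop-in replacement" point the abstract makes; only the connect and disconnect calls betray the server behind it.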
Cray Systems Management (CSM) Security Policy Engine Cray Systems Management (CSM) Security Policy Engine Srinivasa Rao Yarlagadda, Jeremy Duckworth, Viswanadha Murthy MVD, and Amarnath Chilumukuru (Hewlett Packard Enterprise) Abstract Cray Systems Management (CSM) provides enhanced security controls to enable a trustworthy HPC and AI system solution. Most of the exploits in Kubernetes occur due to inadequate security controls and misconfiguration. Kyverno is a policy engine designed for Kubernetes that runs as a dynamic admission controller. Kyverno uses the Kubernetes admission webhooks to validate, mutate, and generate Kubernetes resources. Image signature verification to prevent software supply chain attacks can also be achieved. Paper, Presentation 4:00pm-5:30pmTechnical Session 4A Craig West Polaris and Acceptance Testing Polaris and Acceptance Testing Brian Homerding, Ben Lenard, Cyrus Blackworth, Alex Kulyavtsev, Carissa Holohan, Gordon McPheeters, Eric Pershy, Paul Rich, Doug Waldron, Michael Zhang, Kevin Harms, Ti Leggett, and William Allcock (Argonne National Laboratory) Abstract Argonne Leadership Computing Facility (ALCF) is home to Polaris, a 44 peak PetaFLOP (PF) system developed in collaboration with Hewlett Packard Enterprise (HPE) and NVIDIA. Polaris is a heterogeneous system with 560 nodes utilizing NVIDIA GPUs along with an HPE Slingshot interconnect and an HDR200 InfiniBand network to storage. Due to hardware availability, the delivery was performed in multiple stages. We introduce both hardware and software components of Polaris and discuss the performance of our thorough benchmarking analysis. ALCF policy is to perform a rigorous multi-week acceptance testing (AT) evaluation for every major system to ensure the capabilities of that system can support ALCF users’ science application needs and meet ALCF system operational metrics. The various system components are thoroughly tested to ensure the system will be stable for production operation, function correctly, and fulfill performance expectations for scientific workloads. We will discuss how ALCF used Jenkins and ReFrame to perform the AT of the base Polaris system as well as a second AT to evaluate the Polaris CPU upgrade. We will present our approach for deploying Jenkins to streamline the AT evaluation with benchmarking improvements and lessons learned from the successful acceptance of the heterogeneous system, Polaris. Frontier Node Health Checking and State Management Frontier Node Health Checking and State Management Matthew A. Ezell (Oak Ridge National Laboratory) Abstract The HPE Cray EX235a compute blade that powers Frontier packs significant computational power in a small form factor. These complex nodes contain one CPU, 4 AMD GPUs (which present as 8 devices), 4 Slingshot NICs, and 2 NVMe devices. During the process of Frontier’s bring-up, as HPE and ORNL staff observed issues on nodes they would develop a health check to automatically detect the problem. A simple bash script called checknode collected these tests into one central location to ensure that each component in the node is working according to its specifications. ORNL developed procedures that ensure checknode is run before allowing nodes to be used by the workload manager. The full checknode script runs on boot before Slurm starts, and a reduced set of tests run during the epilog of every Slurm job. Errors detected by checknode will cause the node to be marked as “drain” in Slurm with the error message stored in the Slurm “reason” field.
Upon a healthy run of checknode, it can automatically undrain/resume a node as long as the “reason” was set by checknode itself. This presentation will discuss some of the checks present in checknode as well as outline the node state management workflow. (See the short drain/resume sketch below.) Performance Portable Thread Cooperation On AMD and NVIDIA GPUs Performance Portable Thread Cooperation On AMD and NVIDIA GPUs Sebastian Keller (ETH Zurich, CSCS) Abstract One difference between the AMD CDNA2 and NVIDIA GPU architectures lies in the implementation of shared memory or local data store. In the former, the local data store resides in a dedicated area of the compute units, whereas on NVIDIA GPUs, shared memory competes with normal stack variables for space in the register file. By implementing a simple N-body kernel as an example, it is explored whether this architectural difference leads to different choices for thread cooperation, i.e. data exchange through shared memory or intra-warp exchange with shuffle instructions, performing optimally on one GPU vs. the other, and what this means for performance portability. Paper, Presentation Technical Session 4B Abhinav S. Thota Using Containers to Deliver User Environments on HPE Cray EX Using Containers to Deliver User Environments on HPE Cray EX Felipe A. Cruz and Alberto Madonna (Swiss National Supercomputing Centre) Abstract In HPC systems, the user environment is the layer of the software stack with everything needed to support users' workflows for development, debugging, testing, and job execution. As such, environments can include compilers, libraries, environment variables, and command-line tools. User environments meet additional challenges: the need to provide stability and flexibility in large systems with thousands of users; as user environments are coupled and built over the system software layer, they are subject to overall system cadence and validation with all that this entails. Towards a "Containers Everywhere" HPC Platform Towards a "Containers Everywhere" HPC Platform Daniel Fulton, Laurie Stephey, Shane Canon, Brandon Cook, and Adam Lavely (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract Although containers provide many operational advantages including flexibility, portability and reproducibility, a fully containerized ecosystem for HPC systems does not yet exist. To date, containers in HPC typically require both substantial user expertise and additional container and job configuration. In this paper, we argue that a fully containerized HPC platform is compelling for both HPC administrators and users, offer ideas for what this platform might look like, and identify gaps that must be addressed to move from the current state of the art to this containers-everywhere approach. Additionally, we will discuss enabling core functionality, including communicating with the Slurm scheduler, using custom user-designed images, and using tracing/debuggers inside containers. We argue that to achieve the greatest benefit for both HPC administrators and users, a model is needed that will enable both novice users, who have not yet adopted container technologies, as well as expert users who have already embraced containers. The aspiration of this work is to move towards a model in which all users can reap the benefits of working in a containerized environment without being an expert in containers or even knowing that they are inside of one.
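Relating to the Frontier node-health abstract in Technical Session 4A above: the workflow it describes is a set of per-component checks whose failure drains the node in Slurm, with an automatic resume only when the drain reason was set by the health check itself. A minimal sketch of that drain/resume pattern follows; the individual checks, expected device counts, and reason prefix are illustrative assumptions, not the contents of the real checknode script (which is a site-maintained bash tool).

    # Sketch of the drain/resume pattern described for Frontier's checknode.
    import glob, socket, subprocess

    REASON_PREFIX = "healthcheck:"   # only auto-resume nodes this script drained

    def nvme_ok():
        return len(glob.glob("/dev/nvme?n1")) >= 2          # assumed device count

    def hsn_ok():
        return len(glob.glob("/sys/class/net/hsn*")) >= 4   # assumed NIC count

    CHECKS = {"nvme": nvme_ok, "hsn": hsn_ok}

    def main():
        node = socket.gethostname()
        failed = [name for name, check in CHECKS.items() if not check()]
        if failed:
            subprocess.run(["scontrol", "update", f"NodeName={node}",
                            "State=DRAIN",
                            f"Reason={REASON_PREFIX}{','.join(failed)}"], check=True)
            return
        info = subprocess.run(["scontrol", "show", "node", node],
                              capture_output=True, text=True).stdout
        if f"Reason={REASON_PREFIX}" in info:                # we drained it; resume
            subprocess.run(["scontrol", "update", f"NodeName={node}",
                            "State=RESUME"], check=True)

    if __name__ == "__main__":
        main()

Gating the resume on the stored reason is the key detail from the abstract: it prevents the health check from un-draining nodes that an administrator drained on purpose.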
Reducing File System Stress Caused by Large Python Installations Using Containers Reducing File System Stress Caused by Large Python Installations Using Containers Henrik Nortamo (CSC - IT Center for Science Ltd.) Abstract We present Tykky, a tool used on LUMI to create large Python-based installations on parallel file systems, which do not handle a large number of small files well. Paper, Presentation 5:00pm-5:30pmBoF 3C Next Steps in System Management Next Steps in System Management Andy Warne, Cornel Boac, Alok Prakash, and Dennis Walker (HPE) Abstract Since the acquisition of Cray by HPE, development and deployment has continued for legacy-HPE and legacy-Cray system management solutions (HPCM and CSM, respectively). This presentation discusses the next stage in the evolution of system management for HPE systems. It will provide background, including crucial use cases, and describe how HPE plans to: continue and build on the innovations offered by CSM; support new use cases; and offer enhanced support for critical operational workflows. Birds of a Feather | Thursday, May 11th8:30am-10:00amTechnical Session 5A Helen He Climate Change Adaptation Digital Twin to support decision making Climate Change Adaptation Digital Twin to support decision making Jenni Kontkanen (CSC - IT Center for Science Ltd.); Mario Acosta, Pierre-Antoine Bretonnière, and Miguel Castrillo (Barcelona Supercomputing Centre); Paolo Davini (ISAC-CNR – Institute of Atmospheric Sciences and Climate, Consiglio Nazionale delle Ricerche); Francisco Doblas-Reyes (Barcelona Supercomputing Centre, Institució Catalana de Recerca i Estudis Avançats); Barbara Früh (DWD – Deutscher Wetterdienst); Jost von Hardenberg (Politecnico di Torino); Thomas Jung (Alfred Wegener Institute Helmholtz Center for Polar and Marine Research); Heikki Järvinen (University of Helsinki); Jan Keller (DWD – Deutscher Wetterdienst); Daniel Klocke (MPI-M – Max Planck Institute for Meteorology); Outi Sievi-Korte (CSC - IT Center for Science Ltd.); Sami Niemelä (Finnish Meteorological Institute); Bjorn Stevens (MPI-M – Max Planck Institute for Meteorology); Stephan Thober (Helmholtz Centre for Environmental Research); and Pekka Manninen (CSC - IT Center for Science Ltd.) Abstract To guide climate change adaptation efforts, there is a need for developing new types of climate information systems that provide timely information on local and regional impacts of climate change. We aim towards this by developing a Climate Change Adaptation Digital Twin, as part of the European Commission’s Destination Earth programme. Early Experiences on the OLCF Frontier System with AthenaPK and Parthenon-Hydro Early Experiences on the OLCF Frontier System with AthenaPK and Parthenon-Hydro John Holmen (Oak Ridge National Laboratory), Philipp Grete (Hamburg Observatory University of Hamburg), and Veronica Vergara Larrea (Oak Ridge National Laboratory) Abstract The Oak Ridge Leadership Computing Facility (OLCF) has been preparing the nation’s first exascale system, Frontier, for production and end users. Frontier is based on HPE Cray’s new EX architecture and Slingshot interconnect and features 74 cabinets of optimized 3rd Gen AMD EPYC CPUs for HPC and AI and AMD Instinct MI250X accelerators. As a part of this preparation, “real-world” user codes have been selected to help assess the functionality, performance, and usability of the system.
This paper describes early experiences using the system in collaboration with the Hamburg Observatory for two selected codes, which have since been adopted in the OLCF Test Harness. Experiences discussed include efforts to resolve performance variability and per-cycle slowdowns. Results are shown for a performance portable astrophysical magnetohydrodynamics code, AthenaPK, and a miniapp stressing the core functionality of a performance portable block-structured adaptive mesh refinement (AMR) framework, Parthenon-Hydro. These results show good scaling characteristics to the full system. At the largest scale, the Parthenon-Hydro miniapp reaches a total of 1.7×10^13 zone-cycles/s on 9,216 nodes (73,728 logical GPUs) at ≈92% weak scaling parallel efficiency (starting from a single node using a second-order, finite-volume method). LUMI - Delivering Real-World Application Performance at Scale through Collaboration LUMI - Delivering Real-World Application Performance at Scale through Collaboration Alistair Hart (HPE); Nicholas Malaya (AMD); Fredrik Robertsén (CSC - IT Center for Science Ltd.); Aniello Esposito, Diana Moise, Andrei Poenaru, and Peter Wauligmann (HPE); and Alessandro Fanfarillo, Samuel Antao, and George Markomanolis (AMD) Abstract EuroHPC's LUMI is a Top-3 HPC facility. Achieving performance for both synthetic benchmarks (like HPL) and real-world applications has involved a multi-year technical collaboration between HPE, AMD and the LUMI Consortium coordinated by CSC. Paper, Presentation Technical Session 5B Zhengji Zhao Improving energy efficiency on the ARCHER2 UK National Supercomputing Service Improving energy efficiency on the ARCHER2 UK National Supercomputing Service Adrian Jackson, Alan Simpson, and Andrew Turner (EPCC, The University of Edinburgh) Abstract Energy and power efficiency of modern supercomputers are important for several reasons: reducing energy costs, being more “friendly” to power generation grids and potentially reducing emissions. While energy efficiency has always been important in data centre operation, the energy efficiency of the applications running on the infrastructure has received less attention, usually because costs for large HPC services have historically been heavily dominated by the capital costs of procuring the systems. This is no longer true: for the ARCHER2 UK National Supercomputing Service, the lifetime electricity costs were projected to be around 50% of the hardware costs. Recently, the ARCHER2 team at EPCC have undertaken work to improve the energy efficiency of the service, giving a cumulative saving of more than 20% in power draw of the compute cabinets without significantly affecting application performance. In this presentation we will describe this work and its impact. We also discuss the differences between energy efficiency and emissions efficiency on large HPC systems within the context of net zero initiatives and the potentially competing demands between the two goals. Reducing HPC energy footprint for large scale GPU accelerated workloads Reducing HPC energy footprint for large scale GPU accelerated workloads Gabriel Hautreux and Etienne Malaboeuf (CINES) Abstract This paper presents a parametric approach to reducing the energy footprint of a large-scale GPU-driven HPC system. Frequency capping as well as power capping approaches are tested and compared. This study is performed on Adastra at CINES, the #11 system in the Top500 and the #3 system in the Green500.
We hope the results of this study will be of help to accelerator-enabled HPC centers seeking to reduce their energy footprint by applying policies on either accelerator frequency or power capping at the node level. Estimating energy-efficiency in quantum optimization algorithms. Estimating energy-efficiency in quantum optimization algorithms. Rolando Pablo Hong Enriquez (Hewlett Packard Labs); Rosa Badia (Barcelona Supercomputing Center); Barbara Chapman (Hewlett Packard Enterprise); and Kirk Bresniker, Aditya Dhakal, Eitan Frachtenberg, Ninad Hogade, Gourav Rattihalli, Pedro Bruel, Alok Mishra, and Dejan Milojicic (Hewlett Packard Labs) Abstract Since the dawn of Quantum Computing (QC), theoretical developments like Shor’s algorithm proved the conceptual superiority of QC over traditional computing. However, such quantum supremacy claims are difficult to achieve in practice due to the technical challenges of realizing noiseless qubits. In the near future, QC applications will need to rely on noisy quantum devices that offload part of their work to classical devices. A way to achieve this is by using Parameterized Quantum Circuits (PQCs) in optimization or even machine learning tasks. The energy consumption of quantum algorithms has been poorly studied. Here we explore several optimization algorithms using both theoretical insights and numerical experiments to understand their impact on energy consumption. Specifically, we highlight why and how algorithms like Quantum Natural Gradient Descent, Simultaneous Perturbation Stochastic Approximations or Circuit Learning methods are at least 2× to 4× more energy efficient than their classical counterparts. We also discuss why Feedback-Based Quantum Optimization is energy-inefficient and how a technique like Rosalin could boost the energy efficiency of other algorithms by a factor of ≥ 20×. Paper, Presentation Technical Session 5C Bilel Hadri Morpheus unleashed: Fast cross-platform SpMV on emerging architectures Morpheus unleashed: Fast cross-platform SpMV on emerging architectures Christodoulos Stylianou, Mark Klaisoongnoen, Ricardo Jesus, Nick Brown, and Michele Weiland (EPCC, The University of Edinburgh) Abstract Sparse matrices and linear algebra are at the heart of scientific simulations. Over the years, more than 70 sparse matrix storage formats have been developed, targeting a wide range of hardware architectures and matrix types, each of which exploits the particular strengths of an architecture, or the specific sparsity patterns of the matrices. Deploying HPC Seismic Redatuming on HPE/Cray Systems Deploying HPC Seismic Redatuming on HPE/Cray Systems Hatem Ltaief (KAUST) Abstract Seismic redatuming entails the repositioning of seismic data from the surface of the Earth to a subsurface level closer to where reflections originated. This operation requires intensive data movement, a large memory footprint, and extensive computations due to the requirement to repeatedly apply a Multi-Dimensional Convolution (MDC) operator that manipulates large, formally dense, complex-valued frequency matrices. We present a high-performance implementation of Seismic Redatuming by Inversion (SRI), which combines algebraic compression with mixed-precision (MP) computations. First, we improve the memory footprint of the MDC operator using Tile Low-Rank Matrix-Vector Multiplications (TLR-MVMs). Mixed precision computations are then used to increase the arithmetic intensity of the operator using several FP32/FP16/INT8 MP TLR-MVM implementations.
The numerical robustness of our synergistic approach is validated on a benchmark 3D synthetic seismic dataset and its use is demonstrated on various hardware architectures, including XC40 (Intel Haswell), Apollo Series (Fujitsu ARM, AMD Rome, and NVIDIA A100), and EX (AMD Genoa). We achieve up to 63X memory footprint reduction and 12.5X speedup against the traditional dense MVM kernel. We release our code at: https://github.com/DIG-Kaust/TLR-MDC. Accelerating the Big Data Analytics Suite Accelerating the Big Data Analytics Suite Pierre Carrier (Hewlett Packard Enterprise); Scott Moe (Advanced Micro Devices, Inc; Microsoft Azure); Colin Wahl (Hewlett Packard Enterprise); and Alessandro Fanfarillo (Advanced Micro Devices, Inc) Abstract The Big Data Analytics Suite (BDAS) contains three classic machine learning codes: K-Means, Principal Component Analysis (PCA), and Support Vector Machine (SVM). This article describes how the three CPU codes, originally written in R, have been rewritten in C++ with HIP and MPI, and recast into GEMM-centric operations, taking full advantage of the heterogeneous architecture of the Frontier system. The new accelerated implementation of K-Means is now 80% GEMM-centric, PCA is 99% GEMM-centric, and finally, a new implementation in SVM will make it 20% GEMM-centric. Once completed in SVM, the entire machine learning suite will be GEMM-driven. A discussion about AMD Tensile optimization of the GEMM operation adapted to extremely tall-and-skinny matrices in BDAS will be included. The improvements from the original CPU R codes to the new accelerated versions, referenced to the same number of Frontier nodes in use, are 320X, 360X and 120X, respectively, for K-Means, PCA, and SVM. Future integration with Python will be discussed, especially in the context of Dragon, and inclusion of various precision types. Paper, Presentation 10:30am-11:00amCUG2023 General Session CUG2024 Site Presentation CUG2024 Site Presentation Maciej Cytowski (Pawsey) Abstract Plenary session to hear from the CUG2024 site representative. Plenary 11:00am-12:00pmTechnical Session 6A Jussi Heikonen Frontier As a Machine for Science: How To Build an Exascale Computer and Why You Would Want To Frontier As a Machine for Science: How To Build an Exascale Computer and Why You Would Want To Bronson Messer and Jim Rogers (Oak Ridge National Laboratory) Abstract We will present details of the project to build and operate Frontier--the world's first exascale computer at Oak Ridge National Laboratory--with a decidedly different set of emphases than the well-known "speeds and feeds" for the machine. First, we will describe some of the technical details of several of the facilities-related innovations necessary to field a supercomputer at this scale, including methods to deal with the prodigious power densities required, the move from medium-temperature to high-temperature water for the removal of waste heat, and the structural changes necessary in the datacenter to accommodate the sheer mass of the machine. Then, we will give an overview of the initial slate of science projects on Frontier for 2023, including projects from the Exascale Computing Project, the Oak Ridge Leadership Computing Facility's (OLCF) Center for Accelerated Application Readiness (CAAR), and the INCITE allocation program. We will present early results from a number of these projects and review the aims of several others.
Our aim is to highlight the totality of the collaboration necessary between the OLCF, vendors, subcontractors, scientists, and software developers to ultimately realize the full promise of exascale computing to advance science across a range of disciplines. The Ookami Apollo80 system: Progress, Challenges and Next Steps The Ookami Apollo80 system: Progress, Challenges and Next Steps Eva Siegmann and Robert Harrison (Stony Brook University) Abstract In this talk, we will share experiences with running the Apollo80 system, Ookami. Ookami is an NSF-funded testbed located at Stony Brook University. Since January 2021 it has given researchers worldwide access to 176 Fujitsu A64FX processors. To date, nearly 100 projects and 270 users have been onboarded. Since October 2022, Ookami has also been an ACCESS resource provider; ACCESS is a network of advanced computational resources within the US. Paper, Presentation Technical Session 6B Bilel Hadri Just One More Maintenance: Operating the Perlmutter Supercomputer While Upgrading to Slingshot 11 Just One More Maintenance: Operating the Perlmutter Supercomputer While Upgrading to Slingshot 11 Douglas Jacobsen (NERSC) Abstract The Perlmutter Supercomputer was originally integrated with the Slingshot 10 NIC, and as part of the Phase 2 Integration of the system, the compute, storage, and management systems were all upgraded to Slingshot 11. Owing to the lengthy nature of this process, NERSC leveraged the flexibility of the Cray EX platform to iteratively upgrade the system, one cabinet at a time, to Slingshot 11. In this presentation we will discuss the process for doing this, the benefits we enjoyed as well as the lessons we learned in doing so. This includes detailed discussions of various aspects of Slingshot software and configuration management, early experiences transitioning between the various LNet networking drivers (ko2iblnd for SS10, to ksocklnd for hybrid SS10/SS11, to kfilnd for SS11) for Lustre and DVS, hard-learned best practices for stabilizing the Slingshot network while expanding it (to add cabinets), as well as a description of the tools we use to keep all this working. Finally, we'll close with the current state of affairs on the system and the experiences and lessons we've learned in making the fully SS11 system operational. Orchestration of Exascale Fabrics using the Slingshot Fabric Manager: Practical Examples from LUMI and Frontier Orchestration of Exascale Fabrics using the Slingshot Fabric Manager: Practical Examples from LUMI and Frontier Jose Mendes and Forest Godfrey (Hewlett Packard Enterprise) Abstract The large number of devices required to support Exascale-sized fabrics presents a unique set of challenges for orchestration and configuration management software. These challenges occur not only in the large size of configuration files but in other aspects of the software such as responsiveness, end user interface, software reliability, and consistency of operations. These challenges sometimes require pragmatic rather than aesthetically appealing solutions. Varied workloads across the Slingshot customer base increase the complexity of the choices made. Paper, Presentation Technical Session 6C Jim Williams Slingshot and HPC Storage – Choosing the right Lustre Network Driver (LND) Slingshot and HPC Storage – Choosing the right Lustre Network Driver (LND) John Fragalla (HPE) Abstract Slingshot is HPE’s flagship HPC interconnect, and efficient data access is critical for HPC application performance.
When upgrading from Slingshot-10 to Slingshot-11, building new configurations starting with Slingshot-11 (with the HPE Slingshot NIC), attaching non-performant HPC compute nodes, and/or externally routing a Lustre Network (LNET), there are several Lustre network drivers (LNDs), such as kfilnd, ko2iblnd, and ksocklnd, that are recommended alone or in combination based on various criteria. In this presentation, we will recommend which LND should be used when, show performance results for each of the drivers, and outline storage connectivity considerations when moving from Slingshot-10 to Slingshot-11. Journey in Slingshot HSN segmentation using VLANs Journey in Slingshot HSN segmentation using VLANs Chris Gamboni and Miguel Gila (Swiss National Supercomputing Centre) Abstract The Swiss National Supercomputing Center (CSCS) based in Lugano, Switzerland, offers HPC facilities to academic users where co-investing customers can purchase "dedicated" scientific computing capacity. Paper, Presentation 1:00pm-2:30pmTechnical Session 7A Eva Siegmann Porting a large cosmology code to GPU, a case study examining JAX and OpenMP. Porting a large cosmology code to GPU, a case study examining JAX and OpenMP. Nestor Demeure (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Theodore Kisner and Reijo Keskitalo (Computational Cosmology Center, Lawrence Berkeley National Laboratory; Department of Physics, University of California Berkeley); Rollin Thomas (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Julian Borrill (Computational Cosmology Center, Lawrence Berkeley National Laboratory; Space Sciences Laboratory, University of California Berkeley); and Wahid Bhimji (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract In recent years, a common pattern has emerged where numerical software is designed around a Python interface calling high-performance kernels written in a lower-level language. (A minimal JAX illustration of this pattern appears below.) Adding GPU support to TurboGAP. Towards exascale molecular dynamics with machine learning potentials Adding GPU support to TurboGAP. Towards exascale molecular dynamics with machine learning potentials Cristian-Vasile Achim and Martti Louhivuori (CSC - IT Center for Science Ltd.), Miguel Caro (Aalto University), and Jussi Heikonen (CSC - IT Center for Science Ltd.) Abstract TurboGAP is a state-of-the-art Fortran code for atomistic simulations with machine-learning-based interatomic potentials within the Gaussian approximation potential framework. It is a relatively large code that offers different levels of functionality and it is still in development. This is why porting all code to C/C++ is not a viable option. Our main focus is improving the prediction of energies and forces, in particular increasing the number of operations per node by offloading as much as possible to GPUs. The LUMI supercomputer is the main target of our work; however, we also want to enable users to perform calculations on their own GPU-enabled machines. Directive-based approaches are in theory portable and require very few changes to the code, but in practice this depends on compiler availability and on the ability of individual users to set up the environment. Furthermore, the performance is not guaranteed. This is why native GPU programming was preferred for offloading selected hotspots. While increasing the implementation complexity, this approach ensures reasonable portability without losing performance.
In our presentation we will explain in detail the pros and cons of the mentioned approaches, the strategy used for porting, and early performance results for A100 and MI250X devices. Scalable High-Fidelity Simulation of Turbulence With Neko Using Accelerators Scalable High-Fidelity Simulation of Turbulence With Neko Using Accelerators Niclas Jansson, Martin Karp, Jacob Wahlgren, and Stefano Markidis (KTH Royal Institute of Technology) and Philipp Schlatter (KTH Royal Institute of Technology) Abstract Recent trends toward including more diverse and heterogeneous hardware in High-Performance Computing are challenging scientific software developers in their pursuit of efficient numerical methods with sustained performance across a diverse set of platforms. As a result, researchers are today forced to re-factor their codes to leverage these powerful new heterogeneous systems. We present Neko – a portable framework for high-fidelity spectral element flow simulations. Unlike prior works, Neko adopts a modern object-oriented Fortran 2008 approach, allowing multi-tier abstractions of the solver stack and facilitating various hardware backends ranging from general-purpose processors and accelerators down to exotic vector processors and Field-Programmable Gate Arrays (FPGAs). Focusing on the performance and portability of Neko, we describe the framework's device abstraction layer managing device memory, data transfer and kernel launches from Fortran, allowing for a solver written in a hardware-neutral yet performant way. Accelerator-specific optimisations are also discussed, with auto-tuning of key kernels and various communication strategies using device-aware MPI. Finally, we present performance measurements on a wide range of computing platforms, including the EuroHPC pre-exascale system LUMI, where Neko achieves excellent parallel efficiency for a large DNS of turbulent fluid flow using up to 80% of the entire LUMI supercomputer. Paper, Presentation Technical Session 7B Chris Fuson Observability, Monitoring, and In Situ Analytics in Exascale Applications Observability, Monitoring, and In Situ Analytics in Exascale Applications Dewi Yokelson (University of Oregon), Oskar Lappi (University of Helsinki), Srinivasan Ramesh (NVIDIA), Miikka Vaisala (Academia Sinica), Kevin Huck (University of Oregon), Touko Puro (University of Helsinki), Boyana Norris (University of Oregon), Maarit Korpi-Laag (Aalto University), Keijo Heljanko (University of Helsinki), and Allen Malony (University of Oregon) Abstract With the rise of exascale systems and large, data-centric workflows, the need to observe and analyze high performance computing (HPC) applications during their execution is becoming increasingly important. HPC applications are typically not designed with online monitoring in mind; therefore, the observability challenge lies in being able to access and analyze interesting events with low overhead while seamlessly integrating such capabilities into existing and new applications. We explore how our service-based observation, monitoring, and analytics (SOMA) approach to collecting and aggregating both application-specific telemetry data and performance data addresses these needs. We present our SOMA framework and demonstrate its viability with LULESH, a hydrodynamics proxy application. Then we focus on Astaroth, a multi-GPU library for stencil computations, highlighting the integration of the TAU and APEX performance tools and SOMA for application and performance data monitoring.
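The cosmology-porting case study in Technical Session 7A above describes the now-common pattern of a Python front end dispatching to compiled high-performance kernels, with JAX as one of the approaches it examines. Below is a minimal, generic illustration of that pattern with JAX; the kernel is an invented toy N-body acceleration, not code from the case study, and the sizes are arbitrary.

    # Toy illustration of the "Python interface, compiled kernel" pattern with JAX.
    # jax.jit traces the function once and compiles it for the available backend
    # (CPU or GPU), so the same Python-level code runs on either.
    import jax
    import jax.numpy as jnp

    @jax.jit
    def accelerations(pos, mass, eps=1e-3):
        # pairwise displacements, shape (N, N, 3)
        d = pos[None, :, :] - pos[:, None, :]
        r2 = (d * d).sum(-1) + eps**2
        inv_r3 = jnp.where(r2 > eps**2, r2 ** -1.5, 0.0)   # zero self-interaction
        return (d * (mass[None, :, None] * inv_r3[:, :, None])).sum(axis=1)

    key = jax.random.PRNGKey(0)
    pos = jax.random.normal(key, (1024, 3))
    mass = jnp.ones(1024)
    print(accelerations(pos, mass)[:2])

The appeal of the pattern, as the abstract notes, is that the numerics stay in Python while the heavy lifting is compiled; the talk examines this JAX route alongside an OpenMP-based one.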
Assessing Memory Bandwidth on ARCHER2 and LUMI Using CAMP Assessing Memory Bandwidth on ARCHER2 and LUMI Using CAMP Wenqing Peng, Adrian Jackson, and Evgenij Belikov (EPCC, The University of Edinburgh) Abstract In this paper we present intra-node bandwidth measurements on ARCHER2 (AMD Rome) and LUMI (AMD Milan) using the open-source CAMP (Configurable App for Memory Probing) tool, which is a configurable micro-benchmark that allows varying operational intensity, thread counts and placement, and memory access patterns including contiguous, strided, various types of stencils, and random. We also gather information on power consumption from the Slurm batch scheduler to correlate it with the access patterns used. For comparison, we run another set of the measurements on a node on NEXTGenIO (Intel Ice Lake). Additionally, we extend CAMP to increase its resolution so that we can assess the range of operational intensities between zero and two in more detail compared to previous results. Moreover, we illustrate the mechanism for using custom kernels in CAMP using the dot product as an example. Our results confirm and extend previous results showing that maximum bandwidth is reached using a fraction of threads compared to the maximum number of available cores on a node. In particular, for memory access with a stride of four and for a contiguous access case, we observe up to 11% higher bandwidth using 16 threads compared to the full node using 128 cores on an ARCHER2 node and up to 15% on LUMI, especially for operational intensities below 0.5. This suggests that underpopulation may be a viable option to achieve higher performance compared to full node utilisation, and thus that benchmarking should include tests using only a fraction of the available cores per node. Additionally, sub-NUMA-node awareness may be required to reach the highest performance. (See the short operational-intensity sketch below.) Overview of SPEC HPC Benchmarks and Details of the SPEChpc 2021 Benchmark Overview of SPEC HPC Benchmarks and Details of the SPEChpc 2021 Benchmark Robert Henschel (Indiana University) and Veronica Melesse Vergara (Oak Ridge National Laboratory) Abstract The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse standardized benchmarks and tools to evaluate performance and energy efficiency for the newest generation of computing systems. The SPEC High Performance Group (HPG) focuses specifically on developing industry-standard benchmarks for HPC systems and has a track record of producing high-quality benchmarks serving both academia and industry. This talk provides an overview of the HPC benchmarks that are available from SPEC and SPEC HPG and then dives into the details of the newest benchmark, SPEChpc 2021. This benchmark covers all prevalent programming models, supports hybrid execution on CPUs and GPUs and scales from a single node all the way to thousands of nodes and GPUs. In addition to talking about the architecture and use cases of the benchmark, results on relevant HPE/Cray systems will be presented and an outlook of future benchmark directions will be given.
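The CAMP abstract above sweeps operational intensities between zero and two and uses a dot product as a custom kernel; the small worked sketch below shows the bookkeeping behind those terms and the naive roofline bound they imply. The peak FLOP rate and memory bandwidth are placeholder assumptions, not measured ARCHER2 or LUMI figures.

    # Operational intensity of a float64 dot product, plus a naive roofline bound.
    # Peak numbers below are placeholders, not ARCHER2/LUMI measurements.
    N = 10**8
    flops = 2 * N                  # one multiply and one add per element pair
    bytes_moved = 2 * N * 8        # two float64 input streams read from memory
    oi = flops / bytes_moved       # = 0.125 flop/byte, well inside CAMP's 0..2 range

    peak_flops = 3.0e12            # assumed node peak, flop/s
    peak_bw = 3.0e11               # assumed node memory bandwidth, byte/s

    attainable = min(peak_flops, oi * peak_bw)
    print(f"OI = {oi:.3f} flop/byte, roofline bound = {attainable / 1e9:.1f} Gflop/s")

At such low intensities the bound is set entirely by memory bandwidth, which is why the abstract's observation that a fraction of the cores can already saturate bandwidth translates directly into the underpopulation advice it gives.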
Paper, Presentation Technical Session 7C Tina Declerck VASP Performance on HPE Cray EX Based on NVIDIA A100 GPUs and AMD Milan CPUs VASP Performance on HPE Cray EX Based on NVIDIA A100 GPUs and AMD Milan CPUs Zhengji Zhao and Brian Austin (Lawrence Berkeley National Laboratory); Stefan Maintz (NVIDIA); and Martijn Marsman (University of Vienna, VASP Software GmbH) Abstract NERSC’s new supercomputer, Perlmutter, an HPE Cray EX system, has recently entered production. NERSC users are transitioning from a Cray XC40 system based on Intel Haswell and KNL processors to Perlmutter, with NVIDIA A100 GPUs and AMD Milan CPUs offering more on-node parallelism and NUMA domains. VASP, a widely-used materials science code that uses about 20% of NERSC's computing cycles, has been ported to GPUs using OpenACC. For applications to achieve optimal performance, features specific to Cray EX must be explored, including the build and runtime options. In this paper, we present a performance analysis of representative VASP workloads on Perlmutter, addressing practical questions concerning hundreds of VASP users: What types of VASP workloads are suitable to run on GPUs? What is the optimal number of GPU nodes to use for a given problem size? How many MPI processes should share a GPU? What Slingshot options improve VASP performance? Is it worthwhile to enable OpenMP threads when running on GPU nodes? How many threads per task perform best on Milan CPU nodes? What are the most effective ways to minimize charging and energy costs when running VASP jobs on Perlmutter? This paper will serve as a Cray EX performance guide for VASP users and others. Containerization Workflow and Performance of Weather and Climate Applications Containerization Workflow and Performance of Weather and Climate Applications Usama Anber and Paulo Souza (Hewlett Packard Enterprise) Abstract Containers are a relatively new concept for weather and climate applications due to their immense complexity and infrastructure, platform, and environment dependencies. The accuracy of predictions made by these applications requires a collaborative effort from the scientific community in academia and national laboratories. However, the setup and configuration of these applications is not a straightforward task and often fails on different HPC architectures. In this paper, we show that containers offer a solution to this problem. We first demonstrate the containerization workflow of these applications, which facilitates portability and bursting out into the public cloud. We also demonstrate scalability and performance in comparison to the bare-metal version of the applications, which is another major concern to the weather and climate community. Integration of Modern HPC Performance Analysis in Vlasiator for Sustained Exascale Integration of Modern HPC Performance Analysis in Vlasiator for Sustained Exascale Camille Coti (Ecole de Technologie Superieure); Yann Pfau-Kempf, Markus Battarbee, and Urs Ganse (University of Helsinki); Sameer Shende, Kevin Huck, and Jordi Rodriguez (University of Oregon); Leo Kotipalo (University of Helsinki); Allen Malony (University of Oregon); and Minna Palmroth (University of Helsinki, Finnish Meteorological Institute) Abstract Delivering sustained exascale applications requires a dedicated development and optimization effort to leverage the power of heterogeneous architectures, complex memory hierarchies, and scalable interconnect technologies fueling exascale computing.
The challenge is complicated by the fact that application codes might need to use hybrid programming methods, more sophisticated memory management, and adaptive algorithms to leverage exascale capabilities. However, it is also the case that the factors contributing to performance variability will be more prevalent and difficult to understand. Thus, it is important that exascale environments include robust performance analysis technologies that application teams can integrate within their code development and use to conduct experiments that can help elucidate performance problems. The paper reports on a collaboration to integrate exascale-ready performance tools (TAU and APEX from the University of Oregon) with the Vlasiator application, a leading simulation code from the University of Helsinki for modeling the space plasma environment of the Earth. Our goal is to show the benefits of modern performance tool integration in Vlasiator as it is being ported to the EU’s flagship supercomputer in Finland (LUMI). In addition to presenting Vlasiator performance results, we will offer useful guidance and approaches for other exascale application projects. Paper, Presentation 2:45pm-4:15pmTechnical Session 8A Tina Declerck HPC Cluster CI/CD Image Build Pipelining HPC Cluster CI/CD Image Build Pipelining Travis Cotton (Los Alamos National Laboratory) Abstract Building images for HPC clusters tends to be a monolithic process, requiring complete rebuilds when new packages or configurations are added and when existing images are updated. It is also generally a manual process, heavily involving a system administrator and system-specific custom tooling. Rebuilding images from scratch can be time consuming and updating existing images can introduce unwanted/unexpected changes to production systems. These problems can be mitigated by using existing container models, creating and layering images to produce the final “production”-ready result. This allows for rapid turnaround and a guarantee that existing layers remain unchanged while safely updating others. We can then leverage this layer-based image building to allow for a more automated process using Continuous Integration/Continuous Delivery (CI/CD) pipelines. Leveraging standard tools for configuration management and version control combined with the OCI standard of layer-based image building and CI/CD pipelines, we can create an automated and even distributed image-building workflow while remaining customizable for specific sites and systems. LA-UR-23-20251 The Tri-Lab Operating System Stack (TOSS) on Cray EX Supercomputers The Tri-Lab Operating System Stack (TOSS) on Cray EX Supercomputers Adam Bertsch and Trent D'Hooge (Lawrence Livermore National Laboratory) Abstract We present the Tri-Lab Operating System Stack (TOSS) running on production Cray EX supercomputers and targeted for the El Capitan exascale platform. TOSS is a RedHat-derived operating system that has been enhanced at Lawrence Livermore National Laboratory (LLNL) to streamline High Performance Computing (HPC) workloads and systems management. TOSS runs LLNL’s four Cray EX supercomputers, three of which are on the Top 500 supercomputer list, in addition to dozens of commodity HPC clusters. TOSS includes resource management with Flux, configuration management via Ansible, the Lustre parallel filesystem, and a collection of tools for node and image management. TOSS also supports desirable features including high availability for image servers and zero-downtime rolling updates.
We compare and contrast the TOSS system management philosophy with Cray System Management (CSM). We point out the interfaces between TOSS and low level components of the Cray EX ecosystem, enabled by open source software and open interfaces to proprietary components, as well as demonstrate compatibility with higher level Cray software such as Cray PE. We discuss challenges encountered throughout the process, presenting a case study of our journey to deploying an open software stack on Cray EX. Manta, a cli to simplify common Shasta operations Manta, a cli to simplify common Shasta operations Manuel Sopena Ballesteros (Swiss National Supercomputing Centre, ETH Zurich) Abstract CSM is a great implementation of infrastructure as code that enables the Swiss National Supercomputing Centre (CSCS) to embrace the multi-tenancy paradigm on their HPE Cray EX infrastructures (Alps and PreAlps). But with these new capabilities, new engineering challenges arise. Compared to traditional HPC environments, the complexity of operating the environment has increased, and things like debugging the provisioning and configuration of the nodes have become more complicated: typical operations like reading the CFS Ansible logs or opening the node console require good knowledge of Kubernetes. In this presentation we will introduce Manta, a CLI built to address some of these issues, simplifying the most common operations related to compute node operation (boot, reboot, etc.) and configuration with CFS, using HSM groups as a reference framework. Paper, Presentation Technical Session 8B Cristian-Vasile Achim Header Only Porting: a light-weight header-only library for CUDA/HIP porting Header Only Porting: a light-weight header-only library for CUDA/HIP porting Martti Louhivuori (CSC - IT Center for Science Ltd.) Abstract LUMI, the fastest supercomputer in Europe, is an HPE Cray EX system that uses AMD MI250X GPUs in its GPU partition. To target the MI250X GPUs one can use the HIP portability layer. STAX, HPC meta-containers from edge to core workflow orchestration STAX, HPC meta-containers from edge to core workflow orchestration Jonathan Sparks (Hewlett Packard Enterprise) Abstract Containerized computing has changed the landscape for developers and deployment of science-based codes, including HPC, AI, and enterprise ISV applications. As new system architectures become more complex and diverse regarding choices of CPUs, accelerators, and networks, these requirements impose a tremendous challenge on application developers seeking to create portable and performant containerized applications. This paper will present a higher-level abstraction of containers – STAX. Whereas a container typically encapsulates a single application with associated libraries and configuration, a STAX container is a meta-container, including metadata for orchestration, components for portability between HPC compute environments, and productivity tools for enhanced workflow execution. We will discuss and demonstrate how the HPE Cray containerized programming environment addresses the portability and migration of workflows at the “edge” and the “core” for HPC. Paper, Presentation Technical Session 8C Zhengji Zhao Performance Study on CPU-based Machine Learning with PyTorch Performance Study on CPU-based Machine Learning with PyTorch Smeet Chheda, Anthony Curtis, Eva Siegmann, and Barbara Chapman (Stony Brook University) Abstract Over the past decade we have seen a surge in research in Machine Learning.
Deep neural networks represent a subclass of machine learning and are computationally intensive. Traditionally, GPUs have been leveraged to accelerate the training of such deep networks by taking advantage of parallelization and the many-core architecture. As the datasets and models grow larger, scaling the training or inference task can help reduce the time to solution for research or production purposes. The supercomputer Fugaku established state-of-the-art results in multiple benchmarks in machine learning by scaling ARM-based CPU technology. To that end, we study and present the performance of machine learning training and inference tasks on the 64-bit ARM CPU architecture by exploiting its features, namely the Scalable Vector Extension (SVE) in ARMv8-A. For our work, we utilize the Ookami testbed equipped with A64FX processors in the FX700 system and the Stampede2 Cluster equipped with Intel Skylake processors for performance comparisons, including throughput with respect to peak power consumption. (See the short thread-sweep sketch below.) Towards Training Trillion Parameter Models on HPE GPU Systems Towards Training Trillion Parameter Models on HPE GPU Systems Pierre Carrier, Manjunath Sripadarao, Shruti Ramesh, and Stephen Fleischman (Hewlett-Packard Enterprise) Abstract Large Language Models (LLMs) are deep neural networks with hundreds of billions of weight parameters, typically trained as next-word predictors, with attention, over trillions of text tokens. They can be adapted with very little (1-shot) or no fine-tuning (zero-shot) to a wide range of natural language tasks. The current industry standard is a third-generation transformer model, the Generative Pre-trained Transformer (GPT-3), with sizes of up to 530 billion parameters. A Deep Dive into NVIDIA's HPC Software A Deep Dive into NVIDIA's HPC Software Jeff Hammond, Jeff Larkin, and Axel Koehler (NVIDIA) Abstract The NVIDIA HPC SDK provides a full suite of compilers, math libraries, communication libraries, profilers, and debuggers for the NVIDIA platform. The HPC SDK is freely available to all developers on x86, Arm, and Power platforms and is included on HPE systems with NVIDIA GPUs. This presentation will provide a description of the technologies available in the HPC SDK and an update on recent developments. We will discuss NVIDIA's preferred methods of programming to the NVIDIA platform, including support for parallel programming in ISO C++, Fortran, and Python, compiler directives, and CUDA C++ or Fortran. Paper, Presentation 4:20pm-4:30pmCUG 23 Closing Plenary | Friday, May 12th8:00am-4:30pmSustainable HPC Operations Workshop (in Kajaani) Workshop in Kajaani |
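Relating to the CPU-based PyTorch study in Technical Session 8C above, which measures training and inference performance across thread counts and architectures: below is a minimal sketch of the kind of intra-op thread sweep such a study implies. The toy model, batch size, and thread counts are illustrative assumptions, and nothing here is specific to A64FX, SVE, or the Ookami and Stampede2 systems measured in the paper.

    # Sweep PyTorch's intra-op thread count and measure CPU inference throughput.
    # The model, batch size, and thread counts are toy assumptions for illustration.
    import time
    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096), torch.nn.ReLU(),
        torch.nn.Linear(4096, 1024)).eval()
    x = torch.randn(256, 1024)

    for threads in (1, 4, 16, 48):
        torch.set_num_threads(threads)          # intra-op parallelism
        with torch.inference_mode():
            model(x)                            # warm-up pass
            t0 = time.perf_counter()
            for _ in range(50):
                model(x)
            dt = time.perf_counter() - t0
        print(f"{threads:3d} threads: {50 * x.shape[0] / dt:,.0f} samples/s")

Sweeps like this expose the same effect the CAMP and PyTorch abstracts both point to: throughput does not necessarily keep scaling with core count, so the best thread count, and its power cost, is worth measuring rather than assuming.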
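The ISO-language parallelism mentioned in the A Deep Dive into NVIDIA's HPC Software abstract above can be as simple as the C++ standard parallel algorithms. The sketch below is an illustration rather than material from the presentation: a DAXPY written with std::transform and an execution policy, which a recent CPU toolchain runs multi-threaded and which the NVIDIA HPC SDK's nvc++ compiler can offload to a GPU via its -stdpar option.

```cpp
// Sketch: DAXPY with the ISO C++17 parallel algorithms.
// CPU:  g++ -std=c++17 -O2 daxpy_stdpar.cpp -ltbb
// GPU:  nvc++ -std=c++17 -stdpar=gpu daxpy_stdpar.cpp   (NVIDIA HPC SDK)
#include <algorithm>
#include <execution>
#include <vector>
#include <cstdio>

int main() {
  const std::size_t n = 1 << 24;
  const double a = 2.5;
  std::vector<double> x(n, 1.0), y(n, 2.0);

  // y = a*x + y, element-wise, under an unordered parallel execution policy.
  std::transform(std::execution::par_unseq,
                 x.begin(), x.end(), y.begin(), y.begin(),
                 [a](double xi, double yi) { return a * xi + yi; });

  std::printf("y[0] = %.2f\n", y[0]);  // expect 4.50
  return 0;
}
```

Compiler directives (OpenMP, OpenACC) and CUDA C++ or Fortran remain available when finer control over data movement and kernels is needed, as the abstract notes.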