Birds of a Feather Programming Environments, Applications, and Documentation (PEAD) The PEAD (Programming Environments, Applications, and Documentation) is a CUG Special Interest Group that provides a forum for discussion and information exchange between CUG sites and Cray/HPE. The group's focus includes system usability, performance of programming environments (including compilers, libraries, and tools), scientific applications running on Cray/HPE systems, user support, communication, and documentation. The group hosts meetings at CUG each year to help foster discussions on these topics between HPE and member sites.
Following a successful event at last year's CUG, this year the PEAD SIG will meet Sunday, May 05, from 1:00 PM - 5:00 PM. We are planning topics surrounding the HPE PE roadmap, training collaborations, HPE documentation, as well as Fortran support. All topics will be interactive and discussion-based. Registration for the event is required. Lunch will be available for everyone who registers for the meeting. Birds of a Feather Programming Environments, Applications, and Documentation (PEAD) Collaborative Development of HPC Training Materials Lipi Gupta (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Ann Backhaus (Pawsey Supercomputing Research Centre), and Jane Herriman (Lawrence Livermore National Laboratory) Abstract HPC resources can only be used effectively when they are accessible to their users. Therefore, as CUG member sites continue to invest in HPE-Cray systems and software, it is critical to offer training sessions, training resources, and documentation to educate their seasoned scientific users about developments in their HPC resources and to bring newer users up to speed on HPC topics more broadly. There is often significant commonality in existing and needed training materials across CUG member sites. To improve the efficacy of their efforts, many HPC specialists working on training and education have expressed interest in collaborating in the curation, sharing, creation, and improvement of HPC and HPE-Cray specific training materials. This collaborative effort will improve learning outcomes for users at each site and improve scientific throughput. Therefore, we propose holding a BoF session to facilitate this collaboration.
Birds of a Feather BoF 1B High Performance Data-centre Digital Twins Matthias Maiterth (Oak Ridge National Laboratory); Tim Dykes and Jess Jones (HPE HPC/AI EMEA Research Lab); Adrian Jackson and Michele Weiland (EPCC, The University of Edinburgh); and Wes Brewer (Oak Ridge National Laboratory) Abstract Digital Twins are the basis of a rapidly growing field of study that pairs physical systems with their digital representation to enhance understanding of the physical counterpart. Digital twins of data-centres are rapidly becoming a reality, using bi-directional feedback loops to link operational telemetry with models of associated sub-systems, which are combined with visualisation and analytics components. Such digital twins promise improved system behaviour prediction, optimised system operation, and ultimately better-informed decision-making for operating large-scale compute centre installations with associated power, cooling, and management infrastructure. Birds of a Feather BoF 1A OpenCHAMI for collaborators and the collaborator-curious Travis Cotton and Alex Lovell-Troy (Los Alamos National Laboratory) Abstract OpenCHAMI, or OCHAMI (Open, Composable, Heterogeneous, Adaptable, Management Infrastructure), was founded in 2023 as a collaboration between CSCS, the University of Bristol, NERSC, LANL, and HPE to foster development of open source software originally written to manage Cray/HPE’s exascale-class systems. The OpenCHAMI BoF at CUG will include presentations and demonstrations that share project insights and discuss future directions for enhancing collaboration in the community. Birds of a Feather BoF 1C HPE Slingshot Birds of a Feather Jesse Treger (HPE) Abstract This birds-of-a-feather session will provide an opportunity for users to ask questions and share advice on managing and using HPE Slingshot systems, as well as to hear and provide input into HPE's Slingshot software roadmap.
The HPE Slingshot software scope covers capabilities both for the administrators who operate and manage the system and the fabric, and for HPC/AI application writers and users of the HPE Slingshot NIC’s Libfabric provider. Users will be encouraged to share desired use cases, learnings, and best-known methods. Birds of a Feather BoF 2B Birds of a Feather on Artificial Intelligence and Machine Learning for HPC Workload Analysis (AIMLHPCWorkload2024) Kadidia Konate and Richard Gerber (Lawrence Berkeley National Laboratory) Abstract HPC systems already produce terabytes of monitoring, usage, and performance data each day, ranging from that produced by low-level hardware telemetry and error reporting systems, to hardware performance counters, to job scheduling and system logs, along with natural-language text from administrator troubleshooting tickets and notes. Systems of the future will be even larger and more complex. This will increase the challenges of monitoring and characterizing user behaviors on these systems. Meanwhile, machine learning and artificial intelligence techniques have already started to demonstrate effectiveness for characterizing and extracting knowledge from large and complex datasets, but these efforts are just beginning to realize the full value of their potential across a wide variety of domains. For these reasons, we propose the First BoF on Artificial Intelligence and Machine Learning for HPC Workload Analysis. This BoF will provide a much-needed opportunity not only for discussing cutting-edge research ideas but also for bringing together researchers working across the disciplines of data science, machine learning, statistics, applied mathematics, systems design, systems monitoring, systems resilience, and hardware architecture. The proposed BoF aims to help the community advance towards better and more efficient monitoring and understanding of the usage of large-scale computing systems.
Machine learning and deep learning insights from the CUG systems of participating BoF authors will be shared. Our approach is to share best practices and listen to the audience, constructing new paths forward for HPC workload analysis. Our objective is to gather feedback from the audience and be highly interactive. Birds of a Feather BoF 2A 2024 HPC Testathon: Experiences and Results Veronica Melesse Vergara (Oak Ridge National Laboratory), Bilel Hadri (King Abdullah University of Science and Technology), and Maciej Cytowski (Pawsey Supercomputing Research Centre) Abstract The increasing complexity of HPC architectures requires a larger number of tests for thorough system evaluation post-installation and pre-software upgrades. HPC centers and vendors employ various methodologies for system evaluation throughout its lifespan, not only at the beginning during the installation and acceptance time, but also regularly during maintenance windows. The HPC Testathon 2024, co-organised with CUG2024, is a first-of-its-kind event allowing HPC professionals to gain hands-on experience with different HPC system testing environments and tests. The event has already been supported by several CUG sites, including KAUST, ORNL, LLNL, LANL, and Pawsey, as well as Microsoft. The goal is to share some of the most useful tests among participating institutions to leverage others' experience and further improve each site’s approaches. The event builds on the success of previous workshops of the HPC System Test Working Group, with contributions from many CUG sites and vendors. Birds of a Feather BoF 2C Architecting a Cloud-based Supercomputing as-a-Service Solution Pete Mendygral and Kirti Devi (Hewlett Packard Enterprise) Abstract The requirements from emerging HPC/AI workflows are pushing HPC systems to have more cloud-like capabilities and cloud environments to have more HPC-like capabilities.
Some key examples of this are workflows that combine data acquisition, real-time analysis, model training and inference, and traditional simulation. The requirements include heterogeneous hardware, tailored to the different components of the workflow, high availability of key support services, and elastic access to resources. They may also require technologies outside of traditional HPC, such as Kubernetes. Birds of a Feather BoF 3A System Monitoring Working Group Craig West (BOM) and Stephen Leak (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract The System Monitoring Working Group (SMWG) is a CUG special interest group (SIG) that enables collaboration between HPE Cray and their customers on system monitoring capabilities. The Systems Monitoring Working Group includes representatives from many HPE Cray member sites. We meet to discuss and collaborate on issues related to system monitoring. Break Coffee Break Break Coffee Break Break Coffee Break (sponsored by Altair) Break Coffee Break (sponsored by Linaro) Break Coffee Break (sponsored by SchedMD) Break Coffee Break (sponsored by Thinlinc) Break Coffee Break (sponsored by VAST) Break Coffee Break (sponsored by Pier Group) Break Coffee Break Break Coffee Break CUG Board CUG Board & Sponsors Lunch (closed) CUG sponsors (non-HPE) are invited to join the CUG Board for an informal lunch discussion. CUG Board HPE Executive Lunch (closed) HPE Executives and representatives are invited to join the CUG Board for an informal lunch discussion. CUG Board New CUG Board / Old CUG Board Lunch (closed) Newly elected board members are invited to an informal lunch with the prior CUG Board to discuss remaining activities for the week as well as future plans. Please bring food from the standard lunch buffet to the private area at the far end of the restaurant.
CUG Program Committee Program Committee Dinner (invite only) Participants who helped with the reviews and the program committee are invited to a private event.
Meet at 6:30 pm at the Beer Corner to be seated at 7:00 pm in The Mark, which is within COMO The Treasury, https://statebuildings.com/functions/the-mark/. CUG Program Committee CUG Advisory Board Lunch Cabinet (closed) The CUG Advisory Board comprises chairs and liaisons from the special interest groups and program committee members. This session is typically led by the CUG Vice President to discuss the program, provide guidance to session chairs for the week, and to receive feedback to improve processes and content for future events. CUG Program Committee CUG Advisory Board The CUG Advisory Board comprises chairs and liaisons from the special interest groups and program committee members. This session is typically led by the CUG Vice President to receive direct feedback from the conference and improve future events. Lunch Lunch (open to PEAD and XTreme participants) Lunch Lunch (sponsored by Nvidia) Lunch Lunch (sponsored by Codee) Lunch Lunch (sponsored by Nvidia) Lunch Lunch (sponsored by Codee) Networking/Social Event WHPC+ Australasia and AMD Diversity and Inclusion Breakfast Women in High Performance Computing Australasia (WHPC+) and AMD invite you to attend a community networking breakfast from 7:00 to 8:20am at the Westin Perth, in the Banksia Room. WHPC+ was created to promote diversity in the HPC industry by encouraging new people into the field and retaining those who are already here. This event is generously sponsored by AMD, which is very supportive of the Australasian chapter. This event is conveniently located in the beautiful Westin Perth hotel so that you can easily get to the first meeting session of the day at 8:30 am in Ballroom 2. Come along to meet and learn from others who are championing diversity and inclusion in HPC!
While this event is free to attend, numbers are capped, so registration is required: https://pawsey.org.au/event/whpc-australasia-and-amd-diversity-and-inclusion-breakfast/ Presentation, Paper Technical Session 1B Chair: Jim Williams (Los Alamos National Laboratory) Enhancing HPC Service Management on Alps using FirecREST API Juan Pablo Dorsch, Andreas Fink, Eirini Koutsaniti, and Rafael Sarmiento (Swiss National Supercomputing Centre) Abstract With the evolution of scientific computational needs, there is a growing demand for enhanced resource access and sophisticated services beyond traditional HPC offerings. These demands encompass a wide array of services and use cases, from interactive computing platforms like JupyterHub to the integration of Continuous Integration (CI) pipelines with tools such as GitHub Actions and GitLab runners, and the automation of complex workflows in Machine Learning using Airflow. Automated Hardware-Aware Node Selection for Cluster Computing Manuel Sopena Ballesteros, Miguel Gila, Matteo Chesi, and Mark Klein (Swiss National Supercomputing Centre, ETH Zurich) Abstract This paper introduces algorithms for automating the grouping of compute nodes into clusters based on user-defined hardware requirements and for simultaneously identifying potential hardware failures in HPC data centers. Addressing the challenges of dynamic workloads, the algorithms extract detailed hardware information through CSM APIs, automating node selection aligned with user-defined criteria. The automation streamlines node assignment, reducing human error and expediting the selection process. Versatile Software-defined Cluster on Cray HPE EX Systems Maxime Martinasso, Mark Klein, Benjamin Cumming, Miguel Gila, and Felipe Cruz (Swiss National Supercomputing Centre, ETH Zurich) Abstract This presentation introduces the versatile software-defined cluster (vCluster), a novel set of technologies for HPC infrastructure such as Cray HPE EX systems.
This integration offers a service-oriented approach to computing resources, maintaining infrastructure independence and avoiding vendor lock-in. The vCluster technology bridges the gap between Cloud abstraction and the vertically integrated HPC stack, enabling large-scale infrastructures to support multiple scientific domains with specifically tailored services. Presentation, Paper Technical Session 1A Chair: Lena M Lopatina (LANL) CPE Updates Barbara Chapman (HPE) Abstract The HPE Cray Programming Environment (CPE) provides a suite of integrated programming tools for application development on a diverse range of HPC systems delivered by HPE. Its compilers, math libraries, communications libraries, debuggers, and performance tools enable the creation, enhancement, and optimization of application codes written using mainstream programming languages and the most widely used parallel programming models. A Deep Dive Into NVIDIA's HPC Software Jeff Larkin and Becca Zandstein (NVIDIA) Abstract NVIDIA's HPC Software enables developers to build applications that take advantage of every aspect of the hardware available to them: CPU, GPU, and interconnect. In this presentation, you will learn the latest information on NVIDIA's HPC compilers, libraries, and tools. You will learn how NVIDIA's HPC software makes application developers productive and their applications portable and performant. This presentation will give an overview of NVIDIA's HPC SDK, optimized libraries for the GPU and CPU, performance tools, scalable libraries, Python support, and more. Slurm 24.05 and Beyond Tim Wickberg (SchedMD LLC) Abstract Slurm is the open-source workload manager used on the majority of the TOP500 systems.
Presentation, Paper Technical Session 1C Chair: Chris Fuson (Oak Ridge National Laboratory) Towards the Development of an Exascale Network Digital Twin John Holmen (Oak Ridge National Laboratory); Md Nahid Newaz (Oakland University); and Srikanth Yoginath, Matthias Maiterth, Amir Shehata, Nick Hagerty, Christopher Zimmer, and Wesley Brewer (Oak Ridge National Laboratory) Abstract Exascale high performance computing (HPC) systems introduce new challenges related to fault tolerance due to the large component counts needed to operate at such scales. For example, the exascale Frontier system consists of approximately 60 million components. These counts warrant the investigation of new approaches for helping to ensure the functionality, performance, and usability of such systems. An approach explored by the ExaDigiT project is the use of digital twins to help inform decisions related to the physical Frontier system. This paper discusses a subset of ExaDigiT’s Facility Digital Twin (FDT), the Network Digital Twin (NDT), which focuses on Frontier’s network as a target use case. We present the various strategies tested and early challenges faced towards the development of an exascale NDT, with the hope that such knowledge would benefit other practitioners who are interested in developing a similar digital twin. A Performance Deep Dive into HPC-AI Workflows with Digital Twins Ana Gainaru (Oak Ridge National Laboratory); Greg Eisenhauer (Georgia Institute of Technology); and Fred Suter, Norbert Podhorszki, and Scott Klasky (Oak Ridge National Laboratory) Abstract The landscape of High-Performance Computing (HPC) is evolving. Traditional HPC simulations are merging with advanced visualization and AI techniques for analysis, resulting in intricate workflows that push the boundaries of current benchmarks and performance models.
Here we focus on workflows that couple, in near real time, digital twins and low-fidelity Artificial Intelligence (AI) simulations with ongoing experiments or high-fidelity simulations to continuously drive the latter towards optimal results. It is expected that digital twin workflows will play a crucial role in optimizing the performance of next-generation simulations and instruments. This paper highlights performance limitations for the convergence of AI digital twins and HPC simulations by modeling and analyzing several I/O strategies at scale on HPE/Cray machines. We expose the limitations of relying on existing methods that benchmark individual components for these novel workflows, and propose a performance roofline model to predict the performance of these workflows on future machines and for more complex tasks. Additional layers of analytics and visualization further complicate the performance landscape. Understanding the unique performance characteristics of these intricate HPC-AI hybrid workflows is essential for designing future architectures and algorithms that can fully harness their potential. Optimizing Checkpoint-Restart Mechanisms for HPC with DMTCP in Containers at NERSC Madan Timalsina, Lisa Gerhardt, Johannes Blaschke, Nicholas Tyler, and William Arndt (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract This paper presents an in-depth examination of checkpoint-restart mechanisms in High-Performance Computing (HPC). It focuses on the use of Distributed MultiThreaded CheckPointing (DMTCP) in various computational settings, including both within and outside of containers. The study is grounded in real-world applications running on NERSC Perlmutter, a state-of-the-art supercomputing system. We discuss the advantages of checkpoint-restart in managing complex and lengthy computations in HPC, highlighting its efficiency and reliability in such environments.
The role of DMTCP in enhancing these workflows, especially in multi-threaded and distributed applications, is thoroughly explored. Additionally, the paper delves into the use of HPC containers, such as Shifter and Podman-HPC, which aid in the management of computational tasks, ensuring uniform performance across different environments. The methods, results, and potential future directions of this research, including its application in various scientific domains, are also covered, showcasing the critical advancements made in computational methodologies through this study. Presentation, Paper Technical Session 2B Chair: Lena M Lopatina (LANL) EMOI: CSCS Extensible Monitoring and Observability Infrastructure Massimo Benini (CSCS); Jeff Hanson (HPE); and Dino Conciatore, Gianni Mario Ricciardi, Michele Brambilla, Monica Frisoni, Mathilde Gianolli, Gianna Marano, and Jean-Guillaume Piccinali (CSCS) Abstract The Swiss National Supercomputing Centre (CSCS) is enhancing its computational capabilities through the expansion of the Alps architecture, a Cray HPE EX system equipped with approximately 5000 GH200 modules, in addition to the pre-existing 1000 nodes of a diverse combination of CPUs and GPUs. CSCS has developed an Extensible Monitoring and Observability Infrastructure (EMOI), designed to manage the substantial data influx and provide insightful analysis of the infrastructure's behavior. This paper presents the architecture and capabilities of EMOI at CSCS, emphasizing its scalability and adaptability to handle the increasing volume of monitoring data generated by the Alps infrastructure. We detail the integration of the Cray System Management (CSM) and Cray System Monitoring Application (SMA) within EMOI. The paper describes our hardware infrastructure, leveraging Kubernetes for dynamic deployment of data collection and analysis tools, and outlines our GitOps strategy for efficient service management.
We also explore the distinctions in data models across various node architectures within the Alps system, focusing on power consumption data and its relevance concerning global supercomputing challenges. The insights and methodologies presented in this paper are anticipated to be beneficial not only to CSCS, but also to other HPE/Cray sites facing similar challenges in supercomputing infrastructure management. Swordfish/Redfish and ClusterStor - Using Advanced Monitoring to Improve Insight into Complex I/O Workflows. Torben Kling Petersen, Tim Morneau, Dan Matthews, and Nathan Rutman (HPE) Abstract HPC storage systems today are complex solutions. Unlike typical compute environments, a single storage component operating subpar can have a significant impact on productivity. Further, as a storage solution ages, capacity fills up and is used unevenly. Understanding these changes and the reasons for performance bottlenecks is increasingly important. With the addition of a full RESTful monitoring API based on Swordfish, we now have the tools required to improve overall storage monitoring. Swordfish is a collaboration between DMTF and SNIA that extends the DMTF Redfish interface to provide a standardized, accessible way to represent and manage storage and file systems in both individual customer and cloud environments. A custom implementation of both Swordfish and Redfish in the ClusterStor software stack provides new ways of gaining insights into the inner workings of an HPE ClusterStor E1000 storage system or either of its forthcoming descendants, C500 and E2000. This paper is intended as an introduction to this new API, including examples and guidance on how it can be used to improve storage monitoring as well as understanding of how traditional HPC and/or modern AI/ML workflows behave and evolve over time.
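As a minimal illustration of the kind of storage monitoring a Swordfish/Redfish-style API enables, the sketch below reduces a Swordfish-style volume resource to the fields a health dashboard cares about. The sample payload, the `ost0001` volume name, and the exact property layout are illustrative assumptions, not ClusterStor's actual schema; a real client would fetch the JSON over HTTPS from the storage controller instead of embedding it.

```python
import json

# Illustrative Swordfish-style volume resource (NOT vendor-exact); real
# endpoints return this kind of document from paths under /redfish/v1/.
SAMPLE_VOLUME = json.dumps({
    "@odata.id": "/redfish/v1/StorageServices/1/Volumes/ost0001",
    "Name": "ost0001",
    "Capacity": {"Data": {"AllocatedBytes": 96_000_000_000_000,
                          "ConsumedBytes": 81_600_000_000_000}},
    "Status": {"Health": "OK", "State": "Enabled"},
})

def summarize_volume(payload: str) -> dict:
    """Extract name, health, and fill percentage from a volume resource."""
    vol = json.loads(payload)
    data = vol["Capacity"]["Data"]
    return {
        "name": vol["Name"],
        "health": vol["Status"]["Health"],
        "fill_pct": round(100.0 * data["ConsumedBytes"] / data["AllocatedBytes"], 1),
    }

if __name__ == "__main__":
    print(summarize_volume(SAMPLE_VOLUME))
```

Polling such summaries per volume over time is one way uneven capacity fill, as described in the abstract, could be tracked.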
CADDY: Scalable Summarizations over Voluminous Telemetry Data for Efficient Monitoring Saptashwa Mitra, Scott Ragland, Vanessa Zambrano, Dipanwita Mallick, Charlie Vollmer, Lance Kelley, and Nithin Singh Mohan (Hewlett Packard Enterprise) Abstract In the rapidly evolving landscape of High-Performance Computing (HPC), the efficient management and analysis of telemetry data are pivotal for ensuring system robustness and performance optimization. As HPC systems scale in complexity and capability, traditional data processing methodologies struggle to meet the demands of rapid real-time analytics and large-scale data management. This paper introduces an innovative framework, Caddy, which employs a novel approach to HPC telemetry storage and interactive analysis. Built on the foundation of HPE's Slingshot interconnect and the Fabric AIOps (FAIO) system, Caddy aims to address the critical need for a memory-efficient, scalable, and real-time analytical solution for seamless monitoring over large HPC environments. Command Lines vs. Requested Resources: How Well Do They Align? Ben Fulton, Abhinav Thota, Scott Michael, and Jefferson Davis (Indiana University) Abstract In the context of high-performance computing, a significant portion of users do not develop their own code from scratch but rely on existing software packages and libraries tailored for specific scientific or computational tasks. Many of these open source scientific software packages provide a variety of methods to use them efficiently on multicore, multinode, or large-memory systems. In this paper, we examine a set of applications that users run on Indiana University supercomputers, and determine for those applications the software parameter settings controlling CPU parallelism, GPU parallelism, and memory usage. We then investigate the common ways users employ these parameters and measure the degree of success with which they take advantage of available resources.
By comparing the data collected by XALT on the command-line parameters used with the corresponding Slurm resource requests, we are able to determine the degree to which users take advantage of the resources they request. This knowledge will inform how we can better provide example usages for the software available on our systems, and will inform future software development efforts, guiding the design of more efficient, user-friendly, and adaptable tools that align closely with the specific needs of the HPC community. Presentation, Paper Technical Session 2A Chair: Jim Rogers (Oak Ridge National Laboratory) Updated Node Power Management For New HPE Cray EX255a and EX254n Blades Brian Collum and Steven Martin (Hewlett Packard Enterprise) Abstract Cray EX nodes have always supported a form of power capping that would allow customers to lower power usage of specific nodes as desired. With the introduction of the HPE Cray EX254n (NVIDIA Grace Hopper) and HPE Cray EX255a (AMD MI300a), this became critical as the overall rack power pushed beyond the maximum supported at some customer sites. With the HPE Cray EX254n in particular, the total TDP of the modules exceeds the maximum that can be delivered by the Cray EX infrastructure. This drove the decision to set a power limit on the Grace Hopper modules by default (a first for Cray EX). This presentation will walk through the design goals of the blades, how power capping is implemented in the firmware, and how to configure the power limit in a running system. The presentation will also go through how to view the current limits configured via the node controller's Redfish API and in-band tools, where applicable, and how the in-band tools interact with the out-of-band configurations.
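Out-of-band power limits of the kind described above are typically set via a Redfish PATCH. The sketch below builds the target URL and JSON body using the classic DMTF `PowerControl`/`PowerLimit` schema; the chassis id `Node0`, the 2754 W value, and the exact resource path are assumptions for illustration, since node-controller firmware versions can expose different schema locations, and the request itself is not sent here.

```python
import json

def power_limit_patch(chassis_id: str, watts: int) -> tuple[str, str]:
    """Build the URL and JSON body for a DMTF-Redfish power-limit PATCH.

    Uses the classic PowerControl schema; actual paths and property names
    vary by BMC/node-controller firmware, so treat this as illustrative.
    """
    url = f"/redfish/v1/Chassis/{chassis_id}/Power"
    body = {"PowerControl": [{"PowerLimit": {"LimitInWatts": watts}}]}
    return url, json.dumps(body)

if __name__ == "__main__":
    # Hypothetical example: cap a node chassis at 2754 W.
    url, body = power_limit_patch("Node0", 2754)
    print(url)
    print(body)
```

A real client would send this body with an authenticated HTTP PATCH to the node controller and then GET the same resource to confirm the configured limit.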
HPE Cray EX Power Monitoring Counters Steven Martin, Brian Collum, and Sean Byland (HPE) Abstract HPE Cray Power Monitoring (PM) Counters were first deployed on Cray XC30 systems, and several papers were presented at CUG in 2014 that described their use. PM Counters expose power, energy, and related metadata collected out of band directly to in-band consumers. Since their introduction, PM Counters have been supported on all blades designed for use in Cray XC and HPE Cray EX Supercomputer systems. PM Counters have continued to be important as system and application power and energy consumption continues to be a top priority for system vendors, application developers, and the wider HPC research community. Over the last decade, the design of PM Counters has remained very stable, with only minor updates to support evolving node architecture changes. This presentation will give a brief history and overview of PM Counters basics; it will then present details of PM Counters on the latest HPE Cray EX supercomputer blades announced at SC23, and then discuss opportunities and challenges in supporting PM Counters on NGI (Next Generation Infrastructure). The presentation will conclude with a reinforcement of the value of PM Counters in supporting research, development, and testing of energy efficient and sustainable HPC systems. First Analysis on Cooling Temperature Impacts on MI250x Exascale Nodes (HPE Cray EX235A) Torsten Wilde (HPE), Michael Ott (LRZ), and Pete Guyan (HPE) Abstract With the focus on sustainable data center operations, the community is moving from chilled-water cooling to warm-water cooling solutions, expecting that running at higher system inlet temperatures enables more energy-efficient facility operation. Since the overall efficiency is determined by the combination of facility infrastructure and system behavior, understanding the impact of different system inlet cooling temperatures on the system performance and efficiency is important.
This inaugural presentation covers the analysis of the impact of the inlet cooling temperature on an Exascale compute blade when running HPL and HPCG. Data was collected using the HPE-LRZ PreExascale co-design project system (HPE Cray EX2500 with four modified Frontier blades optimized for higher cooling temperature support) installed at the Leibniz Supercomputing Centre. Our analysis will show that higher cooling temperatures increase node power consumption, leading to a reduction in node performance and overall energy efficiency. Results are presented for different inlet temperatures showing that the overall node efficiency is reduced by around 6% for HPCG and 4% for HPL (25°C vs. 40°C inlet temperature). Combining facility and system, warm-water cooling is more efficient, but the most efficient cooling temperature depends on the application and the efficiency of the cooling infrastructure. EVeREST: An Effective and Versatile Runtime Energy Saving Tool Anna Yue (Hewlett Packard Enterprise, University of Minnesota) and Sanyam Mehta and Torsten Wilde (Hewlett Packard Enterprise) Abstract Amid conflicting demands for better application performance and energy efficiency, HPC systems must be able to identify opportunities to save power/energy without compromising performance, while ideally being transparent to the user. We identify three primary challenges for a successful energy-saving solution: versatility to operate across processors of different types (e.g., CPUs and GPUs) and from different vendors; effectiveness in finding energy-saving opportunities and making the right power-performance tradeoffs; and the ability to handle parallel applications involving communication. We propose Everest, a lightweight runtime tool that switches to the ideal clock frequency, computed from dynamic application characterization, for individual application phases/regions while meeting a specified performance target.
Everest achieves versatility by relying on the minimum possible set of performance events for the needed characterization and power estimation. Region-awareness and accurate computation of MPI slack time allow Everest to find enhanced energy-saving opportunities and thus save up to 20% more energy than existing solutions on CPUs. These energy savings rise to up to 30% and are more prominent on GPUs, where Everest doubly benefits from its unique idle-time characterization and from choosing to sacrifice an allowed/acceptable performance loss. Presentation, Paper Technical Session 2C Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory) Optimising the Processing and Storage of Radio Astronomy Data Alexander Williamson (International Centre for Radio Astronomy Research, University of Western Australia); Pascal Elahi (Pawsey Supercomputing Research Centre); Richard Dodson and Jonghwan Rhee (International Centre for Radio Astronomy Research, University of Western Australia); and Qian Gong (Oak Ridge National Laboratory) Abstract Next-generation radio astronomy telescopes are challenging existing data analysis paradigms, as they have an order of magnitude larger collecting area and bandwidth. The two primary problems encountered when processing this data are the need for storage and the fact that processing is primarily I/O-limited. An example of this is the data deluge expected from the SKA-Low Telescope of about 300 PB per year. To remedy these issues, we have demonstrated lossy and lossless compression of data from an existing precursor telescope, the Australian Square Kilometre Array Pathfinder (ASKAP), using the MGARD and ADIOS2 libraries. We find data processing is faster by a factor of 7 and achieve compression ratios from a factor of 7 (lossless) up to 37 (lossy with an absolute error bound of 0.001).
We will discuss the effectiveness of lossy MGARD compression and its adherence to the designated error bounds, the trade-off between these error bounds and the corresponding compression ratios, as well as the potential consequences of these I/O and storage improvements for the science quality of the data products. Performance and scaling of the LFRic weather and climate model on different generations of HPE Cray EX supercomputers J. Mark Bull (EPCC, The University of Edinburgh); Andrew Coughtrie (Met Office, UK); Deva Deeptimahanti (Pawsey Supercomputing Research Centre); Mark Hedley (Met Office, UK); Caoimhin Laoide-Kemp (EPCC, The University of Edinburgh); Christopher Maynard (Met Office); Harry Shepherd (Met Office, UK); Sebastiaan Van De Bund and Michele Weiland (EPCC, The University of Edinburgh); and Benjamin Went (Met Office, UK) Abstract This study presents scaling results and a performance analysis across different supercomputers and compilers for the Met Office weather and climate model, LFRic. The model is shown to scale to large numbers of nodes, meeting its design criterion of exploiting parallelism to achieve good scaling. The model is written in a Domain Specific Language embedded in modern Fortran and uses a Domain Specific Compiler, PSyclone, to generate the parallel code. The performance analysis shows the effect of algorithmic choices, such as redundant computation, and of scaling with OpenMP threads. The analysis can be used to motivate a discussion of future work to improve the OpenMP performance of other parts of the code. Finally, an analysis of the performance tuning of the I/O server, XIOS, is presented.
Disaggregated memory in OpenSHMEM applications – Approach and Benefits Clarete Crasta, Sharad Singhal, Faizan Barmawer, Ramesh Chaurasiya, Sajeesh KV, Dave Emberson, Harumi Kuno, and John Byrne (Hewlett Packard) Abstract HPC architectures often handle High Performance Data Analytics (HPDA) and Explorative Data Analytics (EDA) workloads where the working data set cannot be easily partitioned or is too large to fit into local node memory. This poses challenges for programming models such as OpenSHMEM[1] or MPI[2], where all data in the working set is assumed to fit in the memory of the participating compute nodes. Additionally, existing HPC programming models use expensive all-to-all communication to share data and results between nodes. The data and results are ephemeral and require additional work to save them for analysis by other applications or subsequent invocations of the same application. Emerging disaggregated architectures, including CXL GFAM, enable data to be held in external memory accessible to all compute nodes, thus providing a new approach to handling large data sets in HPC applications. Most HPC libraries do not currently support disaggregated memory models. In this paper, we present how disaggregated memory can be accessed by existing programming models such as OpenSHMEM and the benefits of using disaggregated memory in these models. Migrating Complex Workflows to the Exascale: Challenges for Radio Astronomy Pascal Jahan Elahi (Pawsey Supercomputing Research Centre) and Matt Austin, Eric Bastholm, Paulus Lahur, Wasim Raja, Maxim Voronkov, Mark Wieringa, Matthew Whiting, Daniel Mitchell, and Stephen Ord (CSIRO) Abstract Real-time processing of radio astronomy data presents a unique challenge for HPC centers.
The science data processing contains memory-bound codes, CPU-bound codes, portions of the pipeline consisting of large numbers of embarrassingly parallel jobs combined with large numbers of moderate- to large-scale MPI jobs, and I/O ranging from parallel I/O writing large files to small jobs writing a large number of small files, all combined in a workflow with a complex job dependency graph and real-world time constraints from radio telescope observations. We present the migration of the Australian Square Kilometre Array Pathfinder Telescope's science processing pipeline from one of Pawsey's older Cray XC systems to our HPE-Cray EX system, Setonix. We also discuss the migration from bare-metal deployment of the complex software stack to a containerized, more modular deployment of the workflow. We detail the challenges faced and how the migration unearthed issues in the original deployment of the EX system. The lessons learned in the migration of such a complex software stack and workflow are valuable for other centers. Presentation, Paper Technical Session 3B Chair: Gabriel Hautreux (CINES) Spack Based Production Programming Environments on Cray Shasta Paul Ferrell and Timothy Goetsch (LANL) Abstract The Cray Programming Environment (CPE) provided for Cray Shasta OS based clusters offers a small but solid set of tools for developers and cluster users. The CPE includes Cray MPICH, Cray Libsci, the Cray debugging tools, and support for a range of compilers – and not much else. Users expect a wide range of additional software on these clusters, and frequently request new software outside of what's provided by the CPE or what can be provided via the system packages. Foregoing our old manual installation process, the LANL HPC Programming and Runtime Environments Team has instead opted to utilize Spack as the installation mechanism for most additional software on all of our new HPE/Cray Shasta clusters.
This brings with it several distinct advantages - Spack's vast library of package recipes, well-defined software inventories, automatically generated modulefiles, and binary packages produced through our CI infrastructure. It also brings with it substantial issues – markedly higher staffing requirements, longer turnaround times for software requests, a more challenging build debug process, and questionable long-term maintainability. Our paper will detail our approach and the benefits and pitfalls of using Spack to install and maintain production software environments. Containers-first user environments on HPE Cray EX Felipe Cruz and Alberto Madonna (Swiss National Supercomputing Centre) Abstract In High-Performance Computing (HPC), managing the user environment is a critical and complex task. It involves composing a mix of software that includes compilers, libraries, tools, environment settings, and their respective versions, all of which depend on each other in intricate ways. Traditional approaches to managing user environments often struggle to find a balance between stability and flexibility, especially in large systems serving diverse user needs. Cloud-Native Slurm management on HPE Cray EX Felipe A. Cruz, Manuel Sopena, and Guilherme Peretti-Pezzi (Swiss National Supercomputing Centre) Abstract This work introduces a cloud-native deployment of the Slurm HPC Workload Manager, leveraging microservices, containerization, and on-premises cloud platforms to enhance efficiency and scalability. Utilizing Kubernetes and Nomad's APIs alongside DevOps tools, the system automates system operations, simplifies service configuration, and standardizes monitoring. However, implementing a cloud-native architecture poses challenges, including complex containerization and resource management issues that are intrinsic to HPC.
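The Spack workflow described above ultimately reduces to building packages in an order that respects a dependency graph (Spack's real concretizer does far more). A toy sketch of that ordering step using Python's standard-library topological sorter, with hypothetical package names:

```python
# Toy sketch of dependency ordering, the core problem a package manager
# like Spack must solve before building. Package names are hypothetical.
from graphlib import TopologicalSorter

deps = {
    "myapp": {"mpich-shim", "hdf5"},   # hypothetical top-level package
    "hdf5": {"zlib"},
    "mpich-shim": set(),
    "zlib": set(),
}
order = list(TopologicalSorter(deps).static_order())
# Every package appears after all of its dependencies.
pos = {p: i for i, p in enumerate(order)}
assert all(pos[d] < pos[p] for p, ds in deps.items() for d in ds)
print(order)
```

In practice each node would also carry a compiler, variant, and version choice, which is where the real complexity (and the staffing cost noted above) comes from.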
Presentation, Paper Technical Session 3A Chair: Bilel Hadri (KAUST Supercomputing Lab) Early Application Experiences on Aurora at ALCF: Moving From Petascale to Exascale Systems Colleen Bertoni, JaeHyuk Kwack, Thomas Applencourt, Abhishek Bagusetty, Yasaman Ghadar, Brian Homerding, Christopher Knight, Ye Luo, Mathialakan Thavappiragasam, John Tramm, Esteban Rangel, Umesh Unnikrishnan, Timothy J. Williams, and Scott Parker (Argonne National Laboratory) Abstract Aurora, installed in 2023, is the newest system being prepared for production at the Argonne Leadership Computing Facility (ALCF). Throughout multiple years of preparation, the ALCF has tracked the progress of over 40 applications from the Exascale Computing Project and ALCF's Early Science Project in terms of their ability to run on Aurora and their performance on Aurora compared to other systems. In addition, the ALCF has been tracking bugs and issues reported by application developers. This broad tracking of applications in a standardized way, as well as the tracking of over 1100 bugs and issues via source code reproducers, has been essential to ensuring the usability of Aurora. It has also helped ensure a smoother transition to Aurora for applications that run on past or current production systems, such as Polaris, the ALCF's current production system. To gain insight into the current state of the applications ported to Aurora, a set of applications is compared in terms of single-GPU and single-node performance on Aurora and Polaris. On average, the Figure-of-Merit performance for the set of applications was 1.3x greater on a single GPU of Aurora than on a single GPU of Polaris. The intra-node parallel efficiency of the set of applications was similar between Aurora and Polaris.
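Per-application speedups like the Figure-of-Merit ratios above are conventionally summarized with the geometric mean, so a single outlier application cannot dominate the average. A minimal sketch; the per-application ratios below are invented, not the paper's measurements:

```python
# Geometric mean of per-application speedup ratios (hypothetical data).
import math

def geomean(xs):
    # Equivalent to (x1 * x2 * ... * xn) ** (1/n), computed in log space
    # to avoid overflow/underflow for long lists.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

fom_speedups = [1.1, 0.9, 1.6, 1.4, 1.7]   # made-up per-app FoM ratios
print(round(geomean(fom_speedups), 2))
```

Note the geometric mean of ratios is symmetric: inverting every ratio inverts the mean, which an arithmetic mean of ratios does not guarantee.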
Streaming Data in HPC Workflows Using ADIOS Greg Eisenhauer (Georgia Institute of Technology); Norbert Podhorszki, Ana Gainaru, and Scott Klasky (Oak Ridge National Laboratory); Philip Davis and Manish Parashar (University of Utah); Matthew Wolf (Samsung SAIT); Eric Suchtya (Oak Ridge National Laboratory); Erick Fredj (Toga Networks, Jerusalem College of Technology); Vicente Bolea (Kitware, Inc); Franz Pöschel, Klaus Steiniger, and Michael Bussmann (Center for Advanced Systems Understanding); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf); and Sunita Chandrasekaran (University of Delaware) Abstract The "IO Wall" problem, in which the gap between computation rate and data access rate grows continuously, poses significant problems for scientific workflows that have traditionally relied upon the filesystem for intermediate storage between workflow stages. One way to avoid this problem in scientific workflows is to stream data directly from producers to consumers, avoiding storage entirely. However, the manner in which this is accomplished is key to both performance and usability. This paper presents the Sustainable Staging Transport (SST), an approach which allows direct streaming between traditional file writers and readers with few application changes. SST is an ADIOS "engine", accessible via standard ADIOS APIs, and because ADIOS allows engines to be chosen at run-time, many existing file-oriented ADIOS workflows can utilize SST for direct application-to-application communication without any source code changes.
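The engine-swap idea can be sketched with a small mock: the application writes through one interface, and a run-time setting decides whether bytes go to a file or directly to a stream consumer. The class and function names below are invented for illustration; the real mechanism is ADIOS2's IO/Engine abstraction with engine names such as "BPFile" and "SST".

```python
# Mock of run-time engine selection (invented API, not adios2's).
import io

class FileEngine:
    def __init__(self):
        self.sink = io.BytesIO()       # stands in for the parallel filesystem
    def put(self, data):
        self.sink.write(data)

class StreamEngine:
    def __init__(self, consumer):
        self.consumer = consumer       # stands in for a connected reader
    def put(self, data):
        self.consumer.append(data)     # delivered directly, no file

def open_engine(kind, consumer=None):
    # The swap point: same writer code, different transport, chosen at run time.
    return StreamEngine(consumer) if kind == "SST" else FileEngine()

received = []
for kind in ("BPFile", "SST"):
    engine = open_engine(kind, received)
    engine.put(b"timestep-0")          # application code is identical
print(len(received))  # → 1 (only the SST run reached the consumer)
```

Because the writer code never mentions the transport, switching a workflow from files to streaming is a configuration change rather than a source change, which is the property the abstract highlights.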
This paper describes the design of SST and presents performance results from various applications that use SST: feeding model training with simulation data at substantially higher bandwidth than the theoretical limits of Frontier's file system, strongly coupling separately developed applications for multiphysics multiscale simulation, and in situ analysis and visualization of data to complete all data processing shortly after the simulation finishes. Enrichment and Acceleration of Edge to Exascale Computational Steering STEM Workflow using Common Metadata Framework Gayathri Saranathan (Hewlett Packard Enterprise, Hewlett Packard Labs); Martin Foltin, Aalap Tripathy, and Annmary Justine (Hewlett Packard Enterprise); Ayana Ghosh, Maxim Ziatdinov, and Kevin Roccapriore (Oak Ridge National Laboratory); and Suparna Bhattacharya, Paolo Faraboschi, and Sreenivas Rangan Sukumaran (Hewlett Packard Enterprise) Abstract Computational steering of experiments with the help of Artificial Intelligence (AI) has the potential to accelerate scientific discovery. Fulfilling this promise will require innovations in workflows and algorithms for experiment control. In this work, we developed data management infrastructure that facilitates such innovation by enabling fine-grained partitioning and optimization of complex workflows between computational and experimental facilities with dynamic data sharing. This is enabled by the Common Metadata Framework (CMF), which tracks workflow data lineages and provides visibility across facility boundaries for relevant data subsets. We demonstrate the benefits on a novel Scanning Transmission Electron Microscopy (STEM) workflow of nanoparticle plasmonic transitions in materials science that crosses facility boundaries several times. The AI control model is seeded by a meta-model developed at a computational facility using data from other experimental sites to help impart prior knowledge and reduce measurement cost and sample degradation.
The model is incrementally refined by successive measurements at an experimental facility. The evolution of model uncertainties is captured in CMF and fed back to the computational facility for analysis of potential new physical phenomena. The relevant experimental results can be used to calibrate molecular dynamics simulations at the computational facility that in turn influence the AI model refinement trajectory. Presentation, Paper Technical Session 3C Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) CSM-based Software Stack Overview 2024 Harold Longley and Jason Sollom (Hewlett Packard Enterprise) Abstract The Cray System Management (CSM) software stack has been enhanced within the past year to improve the operating experience for an HPE Cray EX system. Features to be discussed include more automated software installation and upgrade, system backup and using that backup for disaster recovery reinstallation, automation for concurrent rolling reboots of management nodes, enhancements in CSM Diagnostics tooling, improved SMA monitoring tools, Slingshot switch Orchestrated Maintenance and LACP traffic load sharing, a containerized login environment, new compute node features (default kernel configuration, Low Noise Mode improvements, OS Noise Detection, DVS node health monitoring, Dynamic Kernel Module Support, and containers on compute nodes), improved multi-tenancy support, and image management for the aarch64 architecture. Overview of HPCM Peter Guyan and Sue Miller (HPE) Abstract This talk will give a brief overview of how HPCM is deployed on an HPE solution. It will describe what an admin node does, what Quorum HA is, and why we use SU-Leaders. This will include when to use Quorum HA and SU-Leaders, an introduction to the "cm" command suite, and a brief overview of the monitoring tools present with HPCM.
With a new set of monitoring tools available in HPCM 1.11, the new pipeline will be described, along with how to enable the components needed. Seamless Cluster Migration in CSM Miguel Gila and Manuel Sopena Ballesteros (Swiss National Supercomputing Centre) Abstract The ability to effortlessly migrate compute clusters between sites or zones is common in the cloud world, and recently it has also become a necessity for supercomputing facilities like the Swiss National Supercomputing Centre (CSCS), where its multi-region flagship infrastructure Alps serves multiple tenants and customers, some of them with very different development and operational requirements. Presentation, Paper Technical Session 4B Chair: Brett Bode (National Center for Supercomputing Applications/University of Illinois, National Center for Supercomputing Applications) Scalability and Performance of OFI and UCX on ARCHER2 Jaffery Irudayasamy, Juan F. R. Herrera, Evgenij Belikov, and Michael Bareford (EPCC, The University of Edinburgh) Abstract OpenFabrics Interfaces (OFI) and Unified Communication X (UCX) are both transport protocols that underlie the HPE Cray MPICH library on HPC systems like ARCHER2. They can be selected at runtime by users. This paper presents the scalability and performance of the OFI and UCX transport layer protocol implementations on ARCHER2, an HPE Cray EX system that features the Slingshot 10 interconnect. We use ReproMPI microbenchmarks to study the performance of MPI collectives and run experiments using some of the most commonly used applications on ARCHER2. The results show that in most cases OFI and UCX performance is comparable at under 32 nodes (16384 cores), but for larger numbers of nodes OFI runs more reliably. Ultimately, when it comes to applications there is no one-size-fits-all solution, and profiling can facilitate tuning for best performance.
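Microbenchmark suites in the ReproMPI style repeat each collective many times and reduce the timings to robust statistics rather than reporting a single measurement. A stdlib sketch of that reduction step, applied to hypothetical per-repetition timings in microseconds:

```python
# Reduce repeated benchmark timings to robust summary statistics
# (the timing values are invented for illustration).
import statistics

def summarise(timings):
    ordered = sorted(timings)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))],
    }

ofi = [101, 99, 103, 100, 250, 102, 98, 101, 100, 99]   # one outlier rep
ucx = [97, 96, 99, 98, 97, 99, 96, 98, 97, 98]
print(summarise(ofi)["median"], summarise(ucx)["median"])
```

The median of the OFI run is barely moved by its 250 µs outlier, whereas an arithmetic mean would be; this is why min/median comparisons are preferred when contrasting transports.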
Using P4 for Cassini-3 Software Development Environment Hardik Soni, Frank Zago, Khaled Diab, Igor Gorodetsky, and Puneet Sharma (HPE) Abstract We present a novel approach for co-developing the hardware of Network Interface Card (NIC) ASICs (e.g., Slingshot Cassini-3) and their software stacks. Due to the increasing gap between network link bandwidths and the compute power of hosts, critical parts of the software stack of many applications are offloaded to NICs for efficient processing. By processing certain functions using specialized hardware blocks in NICs, compute resources can be better utilized for actual application processing. Therefore, the use cases and design of NICs rapidly evolve with advancements in the transmission capacity of network links. To reduce time-to-market for next-generation NICs with complex features implemented in hardware, we leverage the software ecosystem of programmable networks and compiler technology for the development, testing, and verification of the Slingshot NICs. Running NCCL and RCCL Applications on HPE Slingshot NIC Jesse Treger and Caio Davi (HPE) Abstract There has been a rise in Machine Learning applications on GPU-equipped High-Performance Computing systems, motivated by increasing demands for model and dataset sizes. Although there are no fundamental limitations preventing running these workloads in MPI runtimes, the in-house chip makers' communications collectives libraries have become the preferred deployments. In this presentation, we will show how these libraries work and how to take full advantage of the HPE Slingshot interconnect. We will review the use and configuration of the two most common of these, NCCL and RCCL, to run using the HPE Slingshot NIC RDMA capability. We will explain the key parameters for the given environments and provide recommendations for the optimal settings.
Enabling NCCL on Slingshot 11 at NERSC Jim Dinan (NVIDIA), Peter Harrington (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Igor Gorodetsky (HPE), Josh Romero (NVIDIA), Steven Farrell (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Ian Ziemba (HPE), and Wahid Bhimji and Shashank Subramanian (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract The NVIDIA Collective Communications Library (NCCL) is widely used for multi-node AI applications as well as other scientific codes. Initial deployments of Slingshot 11 (SS11) did not support this library at high performance, with detrimental impacts on the performance of Deep Learning applications on SS11-based HPC systems. We describe collaborative efforts between NERSC, NVIDIA, and HPE to develop capabilities for NCCL on SS11. This involved development of the Libfabric NCCL plugin and an extensive period of testing and refinement utilizing the Perlmutter HPC system at NERSC. In this presentation, we will describe the development required as well as the performance improvements measured on Perlmutter for both benchmarks and cutting-edge scientific AI applications. Presentation, Paper Technical Session 4A Chair: Lena M Lopatina (LANL) Multi-stage Approach for Identifying Defective Hardware in Frontier Nick Hagerty (Oak Ridge National Laboratory), Andy Warner (HPE), and Jordan Webb (Oak Ridge National Laboratory) Abstract In June 2022, the long-awaited exaflop compute barrier (1 quintillion floating-point operations per second) was surpassed on the TOP500 list by Frontier, an HPE Cray EX supercomputer at Oak Ridge National Laboratory (ORNL). Drawing peak power of 21.1 MW, Frontier demonstrated 1.1 exaflops of computational capability, much of which is supplied by more than 37,000 AMD Instinct MI250X graphics processing units (GPUs).
With a single GPU drawing up to 560W thermal design power (TDP), each AMD MI250X draws 2x more power under load than the NVIDIA V100 GPUs used in Frontier's predecessor at ORNL, the 200 petaflop IBM POWER9 supercomputer, Summit. There are many other major technological advances in the memory, compute, power, and infrastructure of Frontier that are new to production environments. Frontier's mission to enable ground-breaking research in U.S. energy, economic, and national security sectors is fulfilled through leadership-class workloads, which are workloads that demand greater than 20% of the supercomputer. These large workloads are vulnerable to defective and failing computing hardware. The rate of failing hardware is quantified through the mean time between failures (MTBF), the length of time between hardware-level failures anywhere in the system. In this work, we describe the multi-stage approaches to stabilizing and maintaining the functionality of the hardware on Frontier. Three strategies are discussed: the first two utilize leadership-class tests to target improving the MTBF of Frontier, while the third utilizes single-node validation to efficiently identify individual instances of defective hardware in Frontier. We provide summarized data from each of the three strategies, then classify the diverse set of failures and discuss trends in defective hardware, before discussing several key challenges to identifying defective hardware and improving the MTBF of Frontier. From Frontier to Framework: Enhancing Hardware Triage for Exascale Machines Isa Wazirzada, Abhishek Mehta, and Vinanti Phadke (Hewlett Packard Enterprise) Abstract Supercomputers are complex systems that bring together bleeding-edge technologies. Take the example of the first exascale system, Frontier, installed at Oak Ridge National Laboratory. Frontier consists of more than 9400 compute nodes embedded in the HPE Cray EX4000 infrastructure.
Each compute node consists of an AMD CPU, four MI250 GPUs interlinked by high-speed XGMI interfaces, as well as Slingshot-11 high-speed NICs. The system comprises over 150,000 node-level components interconnected in an extremely dense mechanical framework and is cooled via warm-temperature liquid cooling. As impressive as all these technologies are on their own, the real value lies in bringing them together in a system to achieve sustained performance over time. With that in mind, it behooves us to recognize that system quality attributes such as diagnosability and serviceability are critical to achieving high levels of availability throughout the service life of a system. Therefore, for HPE and our customers, providing a product-level hardware triage framework will help reduce the return-to-service time for failed components, provide a standardized approach to diagnosing hardware failures, reduce the number of no-trouble-found replacements, and minimize the need for SMEs from R&D to directly support systems in the field. Full-stack Approach to HPC Testing Pascal Jahan Elahi and Craig Meyer (Pawsey Supercomputing Research Centre) Abstract A user of a High Performance Computing (HPC) system relies on a multitude of components, both on the user-facing side, such as modules, and in lower-level system software, such as Message Passing Interface (MPI) libraries. Thus, all these different aspects must be tested to guarantee an HPC system is production ready. We present here a suite of tests that covers this larger space, which not only focuses on benchmarking or sanity checks but also provides some diagnostic information in case failures are encountered. These tests cover the job scheduler, here Slurm; the MPI library, critical for running jobs at scale; GPUs, a vital part of any energy-efficient HPC system; and the performance of compilers that are part of the Cray Programming Environment (CPE).
Some tests were critical to uncovering a number of underlying issues with the communication libraries on a newly deployed HPE-Cray EX Shasta system that had gone undetected in other acceptance tests; others identified bugs within the job scheduler. The tests are implemented in the ReFrame framework and are open source. An Approach to Continuous Testing Francine Lapid and Shivam Mehta (Los Alamos National Laboratory) Abstract The National Nuclear Security Administration Department of Energy supercomputers at Los Alamos National Laboratory (LANL) are integral to supporting the lab's mission and therefore need to be reliable and performant. To identify potential problems ahead of time while minimizing the interruption to users' work, the High Performance Computing (HPC) Division at LANL implemented a Continuous Testing framework and the necessary infrastructure to automatically and frequently run a series of tests and proxy applications. The tests, which benchmark various system components, were integrated into the Pavilion2 testing framework and are launched in small Slurm jobs across each machine every weekend. The results of the tests are summarized in a comprehensive Splunk dashboard, enabling continuous monitoring of the health of the machines over time without having to parse through each run's output and logs. This project is currently running on three of LANL's newest fleet of Cray Shasta machines – Chicoma, Razorback, and Rocinante – and will eventually be implemented on all production clusters at LANL. This paper details the different components of the Continuous Testing framework, the resulting setup, and the impact it has on our HPC workflow.
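The dashboard roll-up at the heart of such continuous-testing setups can be sketched in a few lines: collapse per-run test records into a pass rate per machine so drift is visible without reading individual logs. The record format and results below are hypothetical, not LANL's actual data.

```python
# Toy roll-up of continuous-testing results into per-machine pass rates
# (machine/test names and outcomes are invented).
from collections import defaultdict

runs = [
    ("chicoma", "stream", True), ("chicoma", "osu_bw", True),
    ("chicoma", "hpl", False), ("rocinante", "stream", True),
    ("rocinante", "osu_bw", True), ("rocinante", "hpl", True),
]

def pass_rates(records):
    totals, passes = defaultdict(int), defaultdict(int)
    for machine, _test, ok in records:
        totals[machine] += 1
        passes[machine] += ok          # bool counts as 0/1
    return {m: passes[m] / totals[m] for m in totals}

rates = pass_rates(runs)
print({m: round(r, 2) for m, r in sorted(rates.items())})
```

Tracking this rate per weekend run, rather than per test, is what lets a dashboard surface slow degradation in a machine's health over time.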
Presentation, Paper Technical Session 4C Chair: Gabriel Hautreux (CINES) LLM Serving With Efficient KV-Cache Management Using Triggered Operations Aditya Dhakal, Pedro Bruel, Gourav Rattihalli, Sai Rahul Chalamalasetti, and Dejan Milojicic (Hewlett Packard Enterprise, Hewlett Packard Labs) Abstract A large language model (LLM)'s Key-Value (KV) cache requires enormous GPU memory during inference. For faster query processing and conversational memory in chat applications, this cache is stored to answer subsequent user queries. However, if the cache is buffered on the GPU, its large memory requirement prevents multiplexing and requires cache buffering in remote storage. In current systems, transferring and retrieving the cache requires the CPU to coordinate with the GPU and push and fetch the data through the network, increasing the overall latency. This paper proposes lower-overhead KV cache storage and retrieval with SmartNICs capable of triggered operations, such as Cassini. Triggered operations enqueue pre-defined data transfer instructions on the NIC. A GPU thread can trigger these instructions once the LLM computes the KV cache for a token. Cassini Network Interface Cards (NICs) then transfer the cache, bypassing the CPU and network stack and improving data-transfer latency. Our experiments show that data transfer with triggered operations provides a 19× speedup for transfers ranging from 32 KB to 5 TB. From Chatbots to Interfaces: Diversifying the Application of Large Language Models for Enhanced Usability Jonathan Sparks, Pierre Carrier, and Gallig Renaud (Hewlett Packard Enterprise) Abstract This paper explores the application of Large Language Models (LLMs) in three distinct scenarios, demonstrating their potential to aid user experience and efficiency. Firstly, we examine the application of LLMs such as OpenAI's GPT or Llama in chatbots to assist in programming environments, providing real-time assistance to developers.
Secondly, we explore using LLMs and Python to search an internal document corpus for performance engineering, significantly improving the retrieval of relevant information from extensive technical documentation. Lastly, we investigate using LLMs as an interface to system batch schedulers, such as Slurm or PBS, replacing domain-specific languages and prompts with natural text. This approach democratizes access to complex systems, fostering ease of use and enhancing the user experience. Through these use cases, we underscore the versatility and potential of LLMs, highlighting their role as an aid to system operation and user experience. Delivering Large Language Model Platforms With HPC Laura Huber, Abhinav Thota, Scott Michael, and Jefferson Davis (Indiana University) Abstract In 2023, we saw a huge rise in the capability and popularity of large language models (LLMs). OpenAI released ChatGPT 3.0 to the public in November 2022, and since then, many closed and open-source LLMs have been released. It has been reported that training ChatGPT 3.0 took more than 10,000 GPU cards, making training a foundational LLM out of reach for many research teams and HPC centers, but there are many ways to use a pre-trained LLM with GPUs at scales available at an HPC center: for example, fine-tuning a pre-trained model with site- or application-specific data, or augmenting the model with Retrieval Augmented Generation (RAG) to add specific knowledge to a pre-trained model. The availability of open-source LLMs has opened the field for individual researchers and service providers to do their own custom training and run their own chatbots. In this paper, we describe how we deployed and evaluated open-source LLMs on Quartz and Big Red 200, a Cray EX supercomputer, and provisioned access to these deployments to a select group of HPC users.
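The retrieval step of Retrieval Augmented Generation can be sketched minimally: score site documents against the user's question and prepend the best matches to the prompt sent to the model. The documents and the naive token-overlap scoring below are toy illustrations, not a production retriever (which would use embeddings).

```python
# Minimal RAG retrieval sketch (toy documents, toy overlap scoring).
def score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)                  # shared-token count as relevance

docs = [
    "Submit jobs with sbatch and monitor them with squeue",
    "Load compilers through the module command",
    "GPU nodes require the gpu partition in your job script",
]

def build_prompt(query, docs, k=1):
    # Keep the k highest-scoring documents as context for the model.
    top = sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]
    return "Context:\n" + "\n".join(top) + f"\n\nQuestion: {query}"

prompt = build_prompt("How do I submit jobs?", docs)
print(prompt)
```

The model never needs retraining: site knowledge enters purely through the assembled prompt, which is what makes RAG feasible at the GPU scales available to an individual center.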
System for Recommendation and Evaluation of Large Language Models for practical tasks in Science Cong Xu, Tarun Kumar, Martin Foltin, Annmary Justine, Sergey Serebryakov, Arpit Shah, Agnieszka Ciborowska, Ashish Mishra, Gyanaranjan Nayak, Suparna Bhattacharya, and Paolo Faraboschi (Hewlett Packard Enterprise) Abstract Large Language Models (LLMs) are showing an impressive ability to reason and answer complex questions after a few contextual prompts. This has transformational potential for scientific productivity and new discoveries. However, LLMs commonly suffer from hallucination problems, and their performance strongly depends on the specific task and model. Several ad-hoc LLM benchmarking studies have been reported for specific tasks in medicine, biology, and chemistry. The science community typically prefers open-source models, including derivatives of Llama-2, Galactica, etc. These models usually perform worse than the commercial GPT-4, requiring more scrutiny. In this work, we developed an automated platform that co-optimizes LLM recommendation and evaluation for user-specified tasks to make it easier for science practitioners to select and evaluate promising LLMs for their use case. Uniquely, it uses a recommender engine trained on the characteristics of a large number of training datasets, LLM models, and inference data to find a list of the best candidate models for user-specified prompt templates. It then evaluates and ranks these candidates using an automated reviewer that performs multi-metric assessment and, uniquely, iteratively revisits the evaluation to validate its truthfulness, employing the reflection approach and self-consistency technique. The evaluation platform seamlessly integrates with HPE MLDE (Machine Learning Development Environment).
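The self-consistency idea mentioned above can be sketched in its simplest form: sample several independent judgements of the same item and keep the majority verdict, so that one noisy evaluation does not decide a ranking. The votes below are hypothetical, and this is only the voting step, not the paper's full reviewer.

```python
# Majority-vote self-consistency over repeated judgements (toy data).
from collections import Counter

def self_consistent(verdicts):
    # Return the most common verdict and the fraction of votes it received.
    winner, count = Counter(verdicts).most_common(1)[0]
    return winner, count / len(verdicts)

votes = ["correct", "correct", "hallucinated", "correct", "correct"]
verdict, agreement = self_consistent(votes)
print(verdict, agreement)  # → correct 0.8
```

The agreement fraction doubles as a confidence signal: low agreement flags items where the automated reviewer itself is unreliable and a human check is warranted.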
Presentation, Paper Technical Session 5B Chair: Adrian Jackson (EPCC, The University of Edinburgh) Leveraging GNU Parallel for Optimal Utilization of HPC Resources on Frontier and Perlmutter Supercomputers Ketan Maheshwari (Oak Ridge National Laboratory), William Arndt (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), and Rafael Ferreira Da Silva (Oak Ridge National Laboratory) Abstract In the realm of HPC, efficiently exploiting parallelism on large-scale systems presents a significant challenge, particularly when aiming for a low-barrier, nonprogrammatic approach. This presentation showcases the effective application of GNU Parallel for optimizing the use of four key resources at OLCF's Frontier and NERSC's Perlmutter supercomputers: CPUs, GPUs, NVMe storage, and large-scale storage systems. Our work positions GNU Parallel not merely as a utility tool but as a mainstream, productive instrument for diverse HPC workloads. We illustrate GNU Parallel's capacity to efficiently harness CPU resources across up to 9,000 Frontier compute nodes, with a manageable increase in overhead even as node utilization scales up. This demonstration includes a detailed analysis of overhead metrics, revealing that overhead remains around 5 minutes for up to 7,000 nodes and under 10 minutes for up to 9,000 nodes comprising up to 1.15 million parallel tasks. Our approach is further elucidated through a series of practical vignettes, ranging from daily tasks to massively scaled operations and underpinned by real-world applications and practical use cases. These vignettes offer generalized solutions for common computational and I/O patterns observed in real-world applications, validated through scalable examples on the Frontier and Perlmutter supercomputers.
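GNU Parallel itself is a shell tool (its basic shape is `parallel 'cmd {}' ::: inputs`); the pattern it enables is task farming: a long list of independent tasks distributed over a fixed pool of workers with no inter-task coordination. A Python analogue of the same pattern, with a stand-in workload:

```python
# Task-farming analogue of GNU Parallel's pattern: many independent
# tasks, fixed worker pool, no coordination between tasks.
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # Stand-in for one independent unit of work (one file, one sample, ...).
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, range(100)))
print(sum(results))  # → 328350
```

Because tasks share nothing, the only scaling cost is dispatch overhead, which is why the abstract's per-launch overhead stays in minutes even at a million-task scale.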
Portable Support for GPUs and Distributed-Memory Parallelism in Chapel Andrew Stone and Engin Kayraklioglu (Hewlett Packard Enterprise) Abstract Abstract Writing performant programs on modern supercomputers means targeting parallelism at multiple levels of scale: from vectorization in a single core, to multicore CPUs, to nodes with multiple CPUs, to GPUs and nodes with multiple GPUs. Traditionally, this has meant writing applications using multiple programming models; for example, an application might be written as a combination of MPI, OpenMP, and/or CUDA. This use of multiple programming models complicates an application’s implementation, maintainability, and portability. PaCER: Accelerating Science on Setonix Maciej Cytowski and Ann Backhaus (Pawsey Supercomputing Research Centre) and Joseph Schoonover (Fluid Numerics) Abstract Abstract The Pawsey Centre for Extreme-Scale Readiness (PaCER) was a unique Australian program supporting researchers in developing and optimising codes on Setonix, Australia’s most powerful research supercomputer, based on AMD Milan CPUs and MI250X GPUs. The focus of the PaCER program was on both extreme-scale research (algorithm design, code optimisation, application and workflow readiness) and using the computational infrastructure to facilitate research for producing world-class scientific outcomes. PaCER was a collaborative partnership between Pawsey and supercomputing vendors that provided early access to HPC tools and infrastructure, training, and exclusive hackathons focused on performance at scale. PaCER supported 10 projects spanning 18 international and national institutions, involving more than 60 researchers and 10 Research Software Engineers. The biggest success of the project is the creation of a supercomputing software developers’ community, the first of its kind in Australia. 
In this talk, we will describe the collaboration model created by PaCER and cover significant achievements of the project, including the series of GPU programming mentored sprints. We will also share some of the lessons learned and future plans. Presentation, Paper Technical Session 5A Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory) Power and Performance analysis of GraceHopper superchips on HPE Cray EX systems Benjamin Donald Cumming and Miguel Gila (CSCS), Brian Collum (HPE), Sebastian Keller (CSCS), and Bryan Villalon and Steven James Martin (HPE) Abstract Abstract Systems with tightly-integrated CPU and GPU on the same module or package will be deployed in 2024, based on NVIDIA GH200 and AMD MI300A processors. There are some significant differences from current systems, of which the key ones for this presentation are: Accelerating Scientific Workflows with the NVIDIA Grace Hopper Platform Gabriel Noaje (NVIDIA) Abstract Abstract This session will focus on providing a demonstration of top ML frameworks, HPC applications, and tools for data science on NVIDIA's Grace Hopper and Grace CPU Superchip. These superchips are the cornerstones of versatile and power-efficient supercomputers worldwide that combine Grace CPUs, Hopper GPUs, and extreme-scale networking technology in a standards-compliant server. We'll showcase recent results from key applications like PyTorch, JAX, WRF, GROMACS, and NAMD. We'll also provide lessons learned and experiences to help guide developers creating their own applications for NVIDIA Grace superchips. This session is a strong starting point for anyone looking to understand, and develop for, NVIDIA Grace Hopper or the Grace CPU Superchip. 
GROMACS on AMD GPU-Based HPC Platforms: Using SYCL for Performance and Portability Andrey Alekseenko (KTH/SciLifeLab); Szilárd Páll (KTH/PDC); and Erik Lindahl (KTH, Stockholm University) Abstract Abstract GROMACS is a widely-used molecular dynamics software package with a focus on performance, portability, and maintainability across a broad range of platforms. Thanks to its early algorithmic redesign and flexible heterogeneous parallelization, GROMACS has successfully harnessed GPU accelerators for more than a decade. With the diversification of accelerator platforms in HPC and no obvious choice for a well-suited multi-vendor programming model, the GROMACS project found itself at a crossroads. The performance and portability requirements, as well as a strong preference for a standards-based programming model, motivated our choice to use SYCL for production on both new HPC GPU platforms: AMD and Intel. Since the GROMACS 2022 release, the SYCL backend has been the primary means to target AMD GPUs in preparation for exascale HPC architectures like LUMI and Frontier. SYCL is a cross-platform, royalty-free, C++17-based standard for programming hardware accelerators, from embedded to HPC. It allows using the same code to target GPUs from all three major vendors with minimal specialization, which offers major portability benefits. While SYCL implementations build on native compilers and runtimes, whether such an approach is performant is not immediately evident. Biomolecular simulations have challenging performance characteristics: latency sensitivity, the need for strong scaling, and typical iteration times as short as hundreds of microseconds. Hence, obtaining good performance across the range of problem sizes and scaling regimes is particularly challenging. Here, we share the results of our work on readying GROMACS for AMD GPU platforms using SYCL, and demonstrate performance on Cray EX235a machines with MI250X accelerators. 
Our findings illustrate that portability is possible without major performance compromises. We provide a detailed analysis of node-level kernel and runtime performance with the aim of sharing best practices with the HPC community on using SYCL as a performance-portable GPU framework. Presentation, Paper Technical Session 5C Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) Cray EX Security Experiences Ben Matthews (NCAR/UCAR) Abstract Abstract Security is an important, if sometimes overlooked, part of running an HPC system. We will describe the out-of-box experience with the Cray EX system from a security perspective. Several security issues (and mitigations for them) present in the Cray EX (HPCM) software stack and, perhaps, HPC systems in general will be described. The process for reporting, patching, and publishing these issues will be discussed as well as some thoughts on where to look for and how to reduce the risk posed by as yet undiscovered vulnerabilities. Finally, some general advice for securing Cray and other HPC systems will be provided. Best of Times, Worst of Times: A Cautionary Tale of Vulnerability Handling Aaron Scantlin (National Energy Research Scientific Computing Center) Abstract Abstract Vulnerabilities are an unfortunate reality in modern computing - while there's plenty of discussion around the importance of detection and patching in HPC, there's not as much chatter about how to handle vulnerabilities discovered at one's institution. While this process is currently ad-hoc and depends in large part on the maintainers of the software in question, one HPC center's recent experience with the discovery of a critical vulnerability within Lustre within COS, reporting that vulnerability to HPE, and the subsequent handling of that information by both groups suggests that there's room for improvement in a variety of areas on both sides. 
In this presentation, NERSC Security will: AIOps Empowered: Failure Prediction in System Management Software Tools Deepak Nanjundaiah and Subrahmanya Vinayak Joshi (HPE) Abstract Abstract In the realm of High-Performance Computing (HPC), our project addresses the escalating challenge of failures, particularly with the anticipated complexity surge in Exascale systems. Our innovative Semi-Supervised Failure Prediction service, applicable to various system management software tools, utilizes deep learning on telemetry data for real-time, end-to-end failure prediction. Our approach centers on deep learning models proficient in deciphering intricate patterns within extensive datasets. From data acquisition to prediction, our solution seamlessly integrates with system management software, analyzing critical metrics like CPU usage, memory status, and network activity. By learning from historical data, the model distinguishes between normal and failure states, providing real-time predictions before potential failures. With a semi-supervised learning approach using both labeled and unlabeled data, our model adapts effectively to diverse failure scenarios. Integrated with the AIOps service in system management tools, it offers organizations a proactive edge, enabling early intervention for cost reduction, minimized downtime, and improved data center efficiency. In our upcoming presentation, we will delve into a concise results overview, showcasing the benefits of our approach in enhancing predictive capabilities and providing organizations with strategic advantages in system management. This functionality is being considered for future inclusion in HPE system management products. Presentation, Paper Technical Session 6B Chair: Paul L. Peltz Jr. 
(Oak Ridge National Laboratory) POD: Reconfiguring Compute and Storage Resources Between Cray EX Systems Eric Roman and Tina Declerck (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Sean Lynn (Hewlett Packard Enterprise) Abstract Abstract This paper describes how a set of liquid-cooled compute and air-cooled storage resources can be reconfigured between Cray EX systems. In each configuration, the resources function as a native set of managed nodes and/or directly attached storage resources to the associated host system. In a normal production configuration, for example, users can run compute jobs on the reconfigured compute nodes or read and write data to the corresponding filesystems. To administrators, these compute nodes are managed like conventional compute nodes, and the storage is managed by the Neo software stack. The Slingshot network is connected to each of the possible host systems, and the systems management networks interconnect via a layer 3 EVPN VXLAN tunnel. NERSC has implemented this reconfigurable architecture on a set of liquid-cooled and air-cooled resources termed POD. POD resources have been successfully transitioned between NERSC's development, staging, and production systems and are currently configured and in use on NERSC's flagship system, Perlmutter. Zero Downtime System Upgrade Strategy Alden Stradling and Joshi Fullop (Los Alamos National Laboratory) Abstract Abstract In a perfect world, HPC system downtime would be easy to minimize. Just keep a perfect copy of the production cluster to prevent scaling surprises. Multitenancy on HPE Cray EX: network segmentation and isolation Chris Gamboni (Swiss National Supercomputing Centre) Abstract Abstract The Swiss National Supercomputing Centre (CSCS) has developed a strategy to provide Infrastructure as a Service (IaaS) to select customers co-investing in the Alps infrastructure. 
A critical component of this service is the ability to segregate and isolate the Alps network for various IaaS tenants, enabling them to integrate their own site networks. This necessitates the management of network multitenancy capabilities within the infrastructure. The Alps system, powered by HPE Cray EX machines and managed through Cray System Management (CSM) with the Slingshot interconnect, utilizes VLANs for node segmentation at the High-Speed Network (HSN) level. Implementing network multitenancy with CSM requires a novel configuration approach in the node management network. Presentation, Paper Technical Session 6A Chair: Chris Fuson (ORNL, Oak Ridge National Laboratory) Unification of Alerting Engines for Monitoring in System Management Raghul Vasudevan, Ambresh Gupta, and Sinchana Karnik (Hewlett Packard) Abstract Abstract Unified alerting provides a single interface or platform in system management to create and manage alerts for various components within HPC systems. The system management monitoring stack collects different types of telemetry from various system components and stores events and logs in OpenSearch and metrics in Timescale. HPE Cray EX255a Telemetry - Improved Configurability and Performance Sean Byland, Steven Martin, and Brian Collum (HPE) Abstract Abstract The new HPE Cray EX255a blade (two nodes, each with 4 AMD MI300A sockets) has power demands and additional sensors that require a more robust power subsystem and data collection capability. This necessitated careful evaluation and changes to how we manage, collect, and publish sensor data. We factored all sensor access parameters and operational characteristics for the EX255a node cards into a standardized file format. These files define default values for compilation. The files can be edited and unmarshalled into runtime-accessible structures, enabling testing, tuning, and experimentation with alternative settings. 
Starting at the hardware access layer and working up the stack we optimized the code paths to enable collection of more sensors in our fixed time budget. This work is the foundation for future work that could enable the ability for higher-level management and monitoring software to customize data collection on behalf of users. Best Practices for deployment of LDMS on the HPE Cray EX platform James Brandt, Kevin Stroup, and Ann Gentile (Sandia National Laboratories) Abstract Abstract The Lightweight Distributed Metric Service (LDMS) has been deployed on some of the largest Cray systems over the past decade to enable low overhead capture of system and application metrics of interest. LDMS has evolved over time to provide new capabilities and associated configuration options to address the ever increasing size and heterogeneity of HPC systems. In the last quarter of 2023 a working group was formed to formalize “best practices” for LDMS deployment on large-scale HPC systems as well as to help guide future configuration management approaches and mechanisms in the LDMS open source/development project. We present the results from this working group as they apply to base-level configurations of samplers and aggregators, authentication mechanisms, and practices to simplify deployment, including use of pre-built Docker containers. For those interested in automated aggregator load balancing and resilience to host failure we describe capabilities of the LDMS distributed configuration manager (Maestro). Finally, we present planned extensions and the capabilities they provide. Presentation, Paper Technical Session 6C Chair: Bilel Hadri (KAUST Supercomputing Lab) ClusterStor Tiering, Overview, Setup, and Performance Nathan Rutman (Hewlett Packard) Abstract Abstract ClusterStor Tiering is a suite of software features designed to enhance the usability and management of hybrid storage systems, combining both flash and disk components. 
Specifically crafted for monitoring and maintaining file layouts and free space on E1000 flash and disk tiers, Tiering offers a range of customizable capabilities through data management policies. Administrators can tailor fine-grained indexing controls, orchestrate file migrations between Object Storage Targets (OSTs) or pools, execute restriping processes, perform purges, and generate reports. These actions are intelligently triggered by preset timers or dynamically in response to system conditions, such as reaching capacity thresholds. Leveraging a scale-out architecture, Tiering efficiently handles the movement of large data volumes. Utilizing the System Management Unit (SMU) for all functions, additional data mover nodes can be configured to augment throughput. Key functionalities supported by Tiering include scalable search, transparent tiering, parallel data movers, data purging, and reporting. Exploring new software-defined storage technology using VAST on Cray EX systems Mark Klein, Chris Gamboni, Gennaro Oliva, and Salvatore Di Nardo (Swiss National Supercomputing Centre, ETH Zurich); Maria Gutierrez (VAST Data); and Riccardo Di Maria and Miguel Gila (Swiss National Supercomputing Centre, ETH Zurich) Abstract Abstract Alps is the Swiss national supercomputing centre's multi-tenant software-defined infrastructure. This paper describes the configuration and experiences of getting VAST working as a performant filesystem option on the HPE Cray EX line of supercomputers and highlights the possibility of attaching additional storage options over the edge routers of these systems. Reducing Mean Time to Resolution (MTTR) for complex HPC-based systems with next-generation automated service tools Michael Cush (HPE) Abstract Abstract After years of experience with Cray’s System Snapshot Analyzer (SSA), the HPC Call Home team worked to develop a new, more flexible, scalable, open, and secure Call Home infrastructure to support our future HPC products. 
Becoming part of HPE allowed us to take advantage of and include HPE’s highly secure Remote Data Access (RDA) capabilities as part of that new infrastructure. A key design point was to make the new product useful even for sites that are not typically uploading data – which sounds rather odd for a “call home” tool set. Other points were the maintenance of a pluggable and highly configurable collection framework partnered with an efficient storage methodology. This paper will discuss the design and highlight where enhancements were made. Example collection plugins will be reviewed. Finally, the paper will seek to answer the question, “So why should I run SDU?” Presentation, Paper Technical Session 7A Chair: John Holmen (Oak Ridge National Laboratory) Proactive Precision: Enhancing High-Performance Computing with Early Job Failure Detection Dipanwita Mallick, Siddhi Potdar, Saptashwa Mitra, Nithin Mohan, and Charlie Vollmer (Hewlett Packard Enterprise) Abstract Abstract In the high-performance computing (HPC) realm, swiftly identifying job failures is critical to optimize resource allocation and ensure system efficiency. Given the high costs and extensive resource demands of HPC systems, the impact of job failures, particularly post-resource allocation, is significant. These failures, crucial in time-sensitive research domains, can derail progress and obstruct objectives. Proactive failure detection allows administrators to quickly enact corrective actions, like job resubmission or reconfiguration, reducing downtime and enhancing user satisfaction. Our approach includes predicting job failures at the initial stages, analyzing failure causes, and developing preventive strategies. By implementing a robust data collection process within the HPC system and utilizing the Slurm workload manager, we have streamlined the data handling procedures. 
Our methodology involves data preprocessing, feature engineering, and using machine learning models optimized with cross-validation, addressing class imbalances, and focusing on precision, recall, and F1-score metrics. This thorough approach aims to improve resource optimization and prevent future inefficiencies in HPC systems. Presentation, Paper Technical Session 8B Chair: Raj Gautam (ExxonMobil) Using HPE-Provided Resources to Integrate HPE Support into Internal Incident Management John Gann, Daniel Gens, and Elizabeth Bautista (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract Abstract High Performance Computing (HPC) centers need to streamline incident management workflows while keeping information synchronized between internal tickets and vendor support cases. Prior to HPE’s acquisition of Cray, NERSC created an integration between their ServiceNow incident management platform and the CrayPort platform. This integration became obsolete once HPE took over Cray, leaving NERSC staff no choice but to enter information manually every time a new incident was opened or required updating. Further, this manual entry needed to be performed in both ServiceNow and HPE’s platform. Presentation, Paper Technical Session 8A Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) Optimizing I/O Patterns to Speed up Non-contiguous Data Retrieval and Analyses Scott Klasky, Qian Gong, and Norbert Podhorszki (Oak Ridge National Laboratory) Abstract Abstract Scientific applications running on Exascale supercomputers generate massive data requiring efficient storage for future analysis. While simulations leverage thousands of nodes for writing, reading and analysis typically utilize limited resources, i.e., a handful of nodes. 
As a result, users commonly query only a particular plane of a multidimensional array or read data from files in strides, hoping that the overall I/O and data processing cost is reduced. This presentation delves into these non-contiguous and striding I/O patterns commonly employed in scientific data analyses. Using visualization as examples, we reveal the detrimental impact of non-contiguous file access on overall throughput, counteracting the speedup gained from analyses with reduced data volume. Recognizing the pattern of scientific data access – primarily written once and read frequently – we propose to refactor data at the time of writing into a format leading to efficient retrieval. We investigate several data organization and refactoring strategies, assessing their impact on reading performance, writing performance, and error incurred on post-analysis across several commonly used query and post-analyses tasks. Our experiments are conducted on the Frontier Supercomputer at Oak Ridge National Laboratory, providing insights for optimizing I/O operations in the Exascale computing era. Presentation, Paper Technical Session 8C Chair: Jim Rogers (Oak Ridge National Laboratory) Building LDMS Slingshot Switch Samplers Kevin Stroup, Cory Lueninghoener, Jim Brandt, and Ann Gentile (Sandia National Laboratories) Abstract Abstract The Lightweight Distributed Metric Service (LDMS) is widely used for monitoring HPC systems and is integrated in HPE’s Cray System Management architecture as well as HPE’s High Performance Cluster Management architecture. One of the important components of an HPC system to monitor is the high-speed interconnect. In the case of HPE Cray EX family of systems, that interconnect is the Slingshot high-speed network. LDMS utilizes “samplers” to gather data about network metrics, including some metrics that can only be determined by a sampler running on the Slingshot switches. 
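The non-contiguous access penalty described in the I/O-optimization abstract above comes down to storage layout. A small NumPy sketch (illustrative only, not the authors' implementation) shows why a plane along the slowest axis of a row-major array is one contiguous run while the same-sized plane along the fastest axis is scattered, and how refactoring at write time can make the frequently read plane contiguous:

```python
import numpy as np

# A 3-D array stored in C (row-major) order, as most simulation output is.
a = np.arange(4 * 5 * 6).reshape(4, 5, 6)

plane_slow = a[2, :, :]   # fixes the slowest axis: one contiguous run
plane_fast = a[:, :, 3]   # fixes the fastest axis: 20 scattered elements

print(plane_slow.flags['C_CONTIGUOUS'])  # True
print(plane_fast.flags['C_CONTIGUOUS'])  # False

# Refactoring at write time (here, a transposed copy) makes the
# frequently-read plane contiguous, trading write cost for read speed.
b = np.ascontiguousarray(a.transpose(2, 0, 1))
print(b[3, :, :].flags['C_CONTIGUOUS'])  # True
```

On disk the same logic applies: each scattered element of `plane_fast` becomes a separate small read, which is the throughput killer the abstract measures at scale.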
Plenary Plenary: Welcome, Keynote Welcome to Country Ceremony Maciej Cytowski (Pawsey) Abstract Abstract A long tradition of the Noongar people across the southwest of WA is the "Welcome to Country" ceremony, which is meant to call out to the spiritual ancestors of the land, to advise them that new friends have come to visit, and to look over and protect them from harm during their stay. This ceremony varies from person to person, where some share stories, songs, and dances reflecting the song lines and connections of the land and cultural heritage of the place where the gathering occurs. This has been done since time immemorial as a valued and respected ceremony as part of our events. Convergence of Energy Efficient Scientific Computing and GenAI Gabriel Noaje (NVIDIA) Abstract Abstract NVIDIA Grace Hopper Superchips are a scale-up architecture ideal for scientific computing workflows involving CPUs and GPUs. This session dives deep into HPC and AI workload performance results with a technical focus on the specific features of Grace-Hopper that accelerate each workload. Explore how Grace-Hopper's distinctive coupling of the CPU/GPU hardware and the accompanying software stack create a platform which increases developer productivity, accelerates existing applications, and facilitates new standard programming models in C++, Fortran, and Python. Attendees will hear about some of the early customers working with these innovative products as they apply this innovative, energy-efficient platform towards their scientific, generative AI, and industrial use cases. High Performance Remote Linux Desktops with ThinLinc Robert Henschel and Aaron Sowry (Cendio AB) Abstract Abstract ThinLinc is a scalable remote desktop solution for Linux, built with open-source components. It enables access to graphical Linux applications and full desktop environments. 
CUG member sites have used ThinLinc to provide users with access to applications like MATLAB or VMD, as well as a “High Performance Research Desktop”. This environment allows users to run their entire workflow, from data retrieval and preparation to job submission and post-processing. A “High Performance Research Desktop” also enables access to interactive applications and running jobs in a batch system. Cendio, the company behind ThinLinc, is a strong supporter of open-source projects and the main contributor to projects like TigerVNC, noVNC, and others. Unlocking Exascale Debugging and Performance Engineering with Linaro Forge Marcin Krzysztofik (Linaro) Abstract Abstract Dive into the future of code development and see how Linaro Forge is reshaping what's possible in the world of parallel computing. Linaro Forge unveils the latest advancements: with Linaro DDT, MAP, and Performance Reports, we're setting new standards in scalability and ease-of-use. Discover how these tools have become the go-to solution for developers seeking to push the boundaries of code optimization and performance engineering. Plenary Plenary: CUG site, HPE update Plenary Plenary: CUG Board Updates (Open), CUG Elections, and Best papers CUG Board Updates, SIG Presentations, and Board Elections – Open Session Ashley Barker (Oak Ridge National Laboratory) Abstract Abstract CUG update from the board, overview of conference submissions and proceedings, Special Interest Group Updates, CUG Board Elections. 
Nine Months in the life of an all-flash file system Lisa Gerhardt, Stephen Simms, David Fox, Ershaad Basheer, and Kirill Lozinskiy (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Michael Moore (HPE); and Wahid Bhimji (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract Abstract NERSC’s Perlmutter scratch filesystem, an all-flash Lustre storage system running on Cray ClusterStor E1000 Storage Systems, has a capacity of 36 PetaBytes and a theoretical peak performance exceeding 7 TeraBytes per second across HPE’s Slingshot network fabric. Deploying an all-flash Lustre filesystem was a leap forward in an attempt to meet the diverse I/O needs of NERSC. With over 10,000 users representing over 1,000 different projects that span multiple disciplines, a file system that could overcome the performance limitations of spinning disk and reduce performance variation was very desirable. While solid state provided excellent performance gains, there were still challenges that required observation and tuning. Working with HPE’s storage team, NERSC staff engaged in an iterative process that through time, increased performance and provided more predictable outcomes. Through the use of IOR runs and OBDfilter scans, NERSC staff were able to closely monitor the performance of the file system at regular intervals to inform the process and chart progress. This paper will document the results of and report insights derived from over 6 months of NERSC’s continuous performance testing, and provide a comprehensive discussion of the tuning and adjustments that were made to improve performance. Isambard-AI: a leadership-class supercomputer optimised specifically for Artificial Intelligence Simon McIntosh-Smith, Sadaf Alam, and Christopher Woods (University of Bristol) Abstract Abstract Isambard-AI is a new, leadership-class supercomputer, designed to support AI-related research. 
Based on the HPE Cray EX4000 system, and housed in a new, energy-efficient Modular Data Centre in Bristol, UK, Isambard-AI employs 5,448 NVIDIA Grace-Hopper GPUs to deliver over 21 ExaFLOP/s of 8-bit floating point performance for LLM training, and over 250 PetaFLOP/s of 64-bit performance, for under 5MW. Isambard-AI integrates two all-flash storage systems: a 20 PiByte Cray ClusterStor and a 3.5 PiByte VAST solution. Combined, these give Isambard-AI flexibility for training, inference, and secure data access and sharing. But it is the software stack where Isambard-AI will be most different from traditional HPC systems. Isambard-AI is designed to support users who may have been using GPUs in the cloud, and so access will more typically be via Jupyter notebooks, MLOps, or other web-based, interactive interfaces, rather than the approach used on traditional supercomputers of ssh’ing into a system before submitting jobs to a batch scheduler. Its stack is designed to be quickly and regularly upgraded to keep pace with the rapid evolution of AI software, with full support for containers. Phase 1 of Isambard-AI is due online in May/June 2024, with the full system expected in production by the end of the year. Plenary Plenary: Sponsors talks, HPE 1-100 The Biggest Change to HPC Job Scheduling and Resource Management in 30 Years Branden Bauer (Altair Engineering, Inc.) Abstract Abstract HPC is rapidly becoming more complex. Administrators must support a wide range of new workloads that mix AI/ML with HPC while pulling data from varied data sources and a compute environment consisting of assorted structures, including GPUs, CPUs, and new accelerators. How do we meet this complexity as an industry while delivering better scalability and efficiency? Introducing the biggest change to HPC resource management in 30 years: Altair® Liquid Scheduling™. 
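As a rough consistency check on the Isambard-AI figures quoted above, the implied per-GPU rates can be recovered by simple division. Published GH200 peak rates vary with precision and configuration, so this is arithmetic on the abstract's numbers only, not a vendor specification:

```python
# Quoted Isambard-AI system totals.
n_gpus = 5448
fp8_total = 21e18      # 21 ExaFLOP/s at 8-bit precision
fp64_total = 250e15    # 250 PetaFLOP/s at 64-bit precision

# Implied per-GPU rates: roughly 3.85 PFLOP/s (FP8) and 46 TFLOP/s (FP64).
fp8_per_gpu_pflops = fp8_total / n_gpus / 1e15
fp64_per_gpu_tflops = fp64_total / n_gpus / 1e12
print(round(fp8_per_gpu_pflops, 2), round(fp64_per_gpu_tflops, 1))
```

Both implied rates sit in the range one would expect for a Grace-Hopper superchip, so the headline totals are internally consistent with the 5,448-GPU count.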
Codee: Automatic Code Inspection Tools for Performance and Code Modernization Manuel Arenaz (Codee) Abstract Abstract Codee is a suite of software development tools that help improve the performance of C/C++/Fortran applications, providing a systematic, more predictable approach that leverages parallel programming best practices. Codee Static Code Analyzer enforces C/C++/Fortran performance optimization best practices for the target environment: hardware, compiler, and operating system. It provides innovative Coding Assistant capabilities to enable semi-automatic source code rewriting, inserting OpenMP or OpenACC directives in your codes to run on CPUs or offload to accelerator devices such as GPUs, so that novice programmers can write code at the expert level. Codee provides integrations with IDEs and CI/CD frameworks to make it possible to Shift Left Performance. In this presentation we will also talk about how to use Codee in conjunction with the Cray tools, including compilers (CCE) and performance tools (e.g. CrayPat, Reveal). Plenary Plenary: CUG 2024, Invited speakers Advancing Gas Turbine Development using HPC: Challenges and Rewards Richard D. Sandberg and Melissa Kozul (University of Melbourne) Abstract Abstract The majority of research on the gas turbines used for aircraft propulsion was historically carried out via costly laboratory tests. This approach generally yields the overall heat transfer or loss of components, rather than providing sufficient detail to dissect the range of physical phenomena impacting the performance of a design. On the other hand, Computational Fluid Dynamics (CFD) yields the full, detailed air flow-field and has become a key gas turbine design tool. Yet, industrial design cycles rely on low-order models so that performing hundreds of engine-relevant analyses is tractable. 
The accuracy of these models, however, is notoriously challenged by the complex flow conditions within aircraft engines, including strong pressure gradients, separated flow and laminar to turbulent transition mechanisms. Plenary CUG 2024 Closing Presentation, Paper Technical Session 1B Chair: Jim Williams (Los Alamos National Laboratory) Enhancing HPC Service Management on Alps using FirecREST API Juan Pablo Dorsch, Andreas Fink, Eirini Koutsaniti, and Rafael Sarmiento (Swiss National Supercomputing Centre) Abstract Abstract With the evolution of scientific computational needs, there is a growing demand for enhanced resource access and sophisticated services beyond traditional HPC offerings. These demands encompass a wide array of services and use cases, from interactive computing platforms like JupyterHub to the integration of Continuous Integration (CI) pipelines with tools such as GitHub Actions and GitLab runners, and the automation of complex workflows in Machine Learning using AirFlow. Automated Hardware-Aware Node Selection for Cluster Computing Manuel Sopena Ballesteros, Miguel Gila, Matteo Chesi, and Mark Klein (Swiss National Supercomputing Centre, ETH Zurich) Abstract Abstract This paper introduces algorithms for automating the grouping of compute nodes into clusters based on user-defined hardware requirements and simultaneously identifies potential hardware failures in HPC data centers. Addressing the challenges of dynamic workloads, the algorithms extract detailed hardware information through CSM APIs, automating node selection aligned with user-defined criteria. The automation streamlines node assignment, reducing human error and expediting the selection process. 
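The hardware-aware node selection described in the last abstract above can be sketched generically: filter a node inventory against user-defined minimums, then take the requested count. The field names and matching rule here are hypothetical placeholders; the actual CSM API data model differs:

```python
def select_nodes(inventory, requirements, count):
    """Return up to `count` node names whose attributes satisfy every
    requirement (minimum values for numeric fields, equality otherwise)."""
    def matches(node):
        for key, wanted in requirements.items():
            have = node.get(key)
            if isinstance(wanted, (int, float)):
                if have is None or have < wanted:
                    return False
            elif have != wanted:
                return False
        return True

    eligible = [n["name"] for n in inventory if matches(n)]
    return eligible[:count]

# Hypothetical inventory, standing in for data pulled via the CSM APIs.
inventory = [
    {"name": "nid001", "mem_gib": 256, "gpus": 4, "gpu_model": "MI250X"},
    {"name": "nid002", "mem_gib": 512, "gpus": 4, "gpu_model": "MI250X"},
    {"name": "nid003", "mem_gib": 512, "gpus": 0, "gpu_model": None},
]
picked = select_nodes(inventory, {"mem_gib": 512, "gpus": 4}, count=2)
```

Nodes that fail a criterion they should nominally meet (e.g. a blade reporting fewer GPUs than its model ships with) are exactly the hardware-failure candidates the paper's algorithms flag as a side effect of selection.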
Versatile Software-defined Cluster on Cray HPE EX Systems Maxime Martinasso, Mark Klein, Benjamin Cumming, Miguel Gila, and Felipe Cruz (Swiss National Supercomputing Centre, ETH Zurich) Abstract This presentation introduces the versatile software-defined cluster (vCluster), a novel set of technologies for HPC infrastructure such as Cray HPE EX systems. This integration offers a service-oriented approach to computing resources, maintaining infrastructure independence and avoiding vendor lock-in. The vCluster technology bridges the gap between Cloud abstraction and the vertically integrated HPC stack, enabling large-scale infrastructures to support multiple scientific domains with specifically tailored services. Presentation, Paper Technical Session 1A Chair: Lena M Lopatina (LANL) CPE Updates Barbara Chapman (HPE) Abstract The HPE Cray Programming Environment (CPE) provides a suite of integrated programming tools for application development on a diverse range of HPC systems delivered by HPE. Its compilers, math libraries, communications libraries, debuggers, and performance tools enable the creation, enhancement, and optimization of application codes written using mainstream programming languages and the most widely used parallel programming models. A Deep Dive Into NVIDIA's HPC Software Jeff Larkin and Becca Zandstein (NVIDIA) Abstract NVIDIA's HPC Software enables developers to build applications that take advantage of every aspect of the hardware available to them: CPU, GPU, and interconnect. In this presentation you will learn the latest information on NVIDIA's HPC compilers, libraries, and tools, and how NVIDIA's HPC software makes application developers productive and their applications portable and performant. This presentation will give an overview of NVIDIA's HPC SDK, optimized libraries for the GPU and CPU, performance tools, scalable libraries, Python support, and more. 
Slurm 24.05 and Beyond Tim Wickberg (SchedMD LLC) Abstract Slurm is the open-source workload manager used on the majority of the TOP500 systems. Presentation, Paper Technical Session 1C Chair: Chris Fuson (Oak Ridge National Laboratory) Towards the Development of an Exascale Network Digital Twin John Holmen (Oak Ridge National Laboratory); Md Nahid Newaz (Oakland University); and Srikanth Yoginath, Matthias Maiterth, Amir Shehata, Nick Hagerty, Christopher Zimmer, and Wesley Brewer (Oak Ridge National Laboratory) Abstract Exascale high performance computing (HPC) systems introduce new challenges related to fault tolerance due to the large component counts needed to operate at such scales. For example, the exascale Frontier system consists of approximately 60 million components. These counts warrant the investigation of new approaches for helping to ensure the functionality, performance, and usability of such systems. An approach explored by the ExaDigiT project is the use of digital twins to help inform decisions related to the physical Frontier system. This paper discusses a subset of ExaDigiT’s Facility Digital Twin (FDT), the Network Digital Twin (NDT), which focuses on Frontier’s network as a target use case. We present the various strategies tested and the early challenges faced in the development of an exascale NDT, in the hope that this knowledge will benefit other practitioners interested in developing a similar digital twin. A Performance Deep Dive into HPC-AI Workflows with Digital Twins Ana Gainaru (Oak Ridge National Laboratory); Greg Eisenhauer (Georgia Institute of Technology); and Fred Suter, Norbert Podhorszki, and Scott Klasky (Oak Ridge National Laboratory) Abstract The landscape of High-Performance Computing (HPC) is evolving. 
Traditional HPC simulations are merging with advanced visualization and AI techniques for analysis, resulting in intricate workflows that push the boundaries of current benchmarks and performance models. Here we focus on workflows that couple, in near real time, digital twins and low-fidelity Artificial Intelligence (AI) simulations with ongoing experiments or high-fidelity simulations to continuously drive the latter towards optimal results. It is expected that digital twin workflows will play a crucial role in optimizing the performance of next-generation simulations and instruments. This paper highlights performance limitations for the convergence of AI digital twins and HPC simulations by modeling and analyzing several I/O strategies at scale on HPE/Cray machines. We expose the limitations of relying on existing methods that benchmark individual components for these novel workflows, and propose a performance roofline model to predict the performance of these workflows on future machines and for more complex tasks. Additional layers of analytics and visualization further complicate the performance landscape. Understanding the unique performance characteristics of these intricate HPC-AI hybrid workflows is essential for designing future architectures and algorithms that can fully harness their potential. Optimizing Checkpoint-Restart Mechanisms for HPC with DMTCP in Containers at NERSC Madan Timalsina, Lisa Gerhardt, Johannes Blaschke, Nicholas Tyler, and William Arndt (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract This paper presents an in-depth examination of checkpoint-restart mechanisms in High-Performance Computing (HPC). It focuses on the use of Distributed MultiThreaded CheckPointing (DMTCP) in various computational settings, including both within and outside of containers. 
The study is grounded in real-world applications running on NERSC Perlmutter, a state-of-the-art supercomputing system. We discuss the advantages of checkpoint-restart in managing complex and lengthy computations in HPC, highlighting its efficiency and reliability in such environments. The role of DMTCP in enhancing these workflows, especially in multi-threaded and distributed applications, is thoroughly explored. Additionally, the paper delves into the use of HPC containers, such as Shifter and Podman-HPC, which aid in the management of computational tasks, ensuring uniform performance across different environments. The methods, results, and potential future directions of this research, including its application in various scientific domains, are also covered, showcasing the critical advancements made in computational methodologies through this study. Presentation, Paper Technical Session 2B Chair: Lena M Lopatina (LANL) EMOI: CSCS Extensible Monitoring and Observability Infrastructure Massimo Benini (CSCS); Jeff Hanson (HPE); and Dino Conciatore, Gianni Mario Ricciardi, Michele Brambilla, Monica Frisoni, Mathilde Gianolli, Gianna Marano, and Jean-Guillaume Piccinali (CSCS) Abstract The Swiss National Supercomputing Centre (CSCS) is enhancing its computational capabilities through the expansion of the Alps architecture, a Cray HPE EX system equipped with approximately 5000 GH200 modules, in addition to the pre-existing 1000 nodes of a diverse combination of CPUs and GPUs. CSCS has developed an Extensible Monitoring and Observability Infrastructure (EMOI), designed to manage the substantial data influx and provide insightful analysis of the infrastructure's behavior. This paper presents the architecture and capabilities of EMOI at CSCS, emphasizing its scalability and adaptability to handle the increasing volume of monitoring data generated by the Alps infrastructure. 
We detail the integration of the Cray System Management (CSM) and Cray System Monitoring Application (SMA) within EMOI. The paper describes our hardware infrastructure, leveraging Kubernetes for dynamic deployment of data collection and analysis tools, and outlines our GitOps strategy for efficient service management. We also explore the distinctions in data models across various node architectures within the Alps system, focusing on power consumption data and its relevance concerning global supercomputing challenges. The insights and methodologies presented in this paper are anticipated to be beneficial not only to CSCS, but also to other HPE/Cray sites facing similar challenges in supercomputing infrastructure management. Swordfish/Redfish and ClusterStor - Using Advanced Monitoring to Improve Insight into Complex I/O Workflows Torben Kling Petersen, Tim Morneau, Dan Matthews, and Nathan Rutman (HPE) Abstract HPC storage systems today are complex solutions. Unlike typical compute environments, a single storage component operating subpar can have a significant impact on productivity. Further, as a storage solution ages, capacity fills up and is used unevenly. Understanding these changes and the reasons for performance bottlenecks is increasingly important. With the addition of a full RESTful monitoring API based on Swordfish, we now have the tools required to improve overall storage monitoring. Swordfish is a collaboration between DMTF and SNIA that extends the DMTF Redfish interface to provide a standardized, accessible way to represent and manage storage and file systems in both individual customer and cloud environments. A custom implementation of both Swordfish and Redfish in the ClusterStor software stack provides new ways of gaining insights into the inner workings of an HPE ClusterStor E1000 storage system or either of its forthcoming descendants, the C500 and E2000. 
This paper is intended as an introduction to this new API, including examples and guidance on how it can be used to improve storage monitoring as well as understanding of how traditional HPC and/or modern AI/ML workflows behave and evolve over time. CADDY: Scalable Summarizations over Voluminous Telemetry Data for Efficient Monitoring Saptashwa Mitra, Scott Ragland, Vanessa Zambrano, Dipanwita Mallick, Charlie Vollmer, Lance Kelley, and Nithin Singh Mohan (Hewlett Packard Enterprise) Abstract In the rapidly evolving landscape of High-Performance Computing (HPC), the efficient management and analysis of telemetry data is pivotal for ensuring system robustness and performance optimization. As HPC systems scale in complexity and capability, traditional data processing methodologies struggle to meet the demands of rapid real-time analytics and large-scale data management. This paper introduces an innovative framework, Caddy, which employs a novel approach to HPC telemetry storage and interactive analysis. Built on the foundation of HPE's Slingshot interconnect and the Fabric AIOps (FAIO) system, Caddy aims to address the critical need for a memory-efficient, scalable, and real-time analytical solution for seamless monitoring over large HPC environments. Command Lines vs. Requested Resources: How Well Do They Align? Ben Fulton, Abhinav Thota, Scott Michael, and Jefferson Davis (Indiana University) Abstract In the context of high-performance computing, a significant portion of users do not develop their own code from scratch but rely on existing software packages and libraries tailored for specific scientific or computational tasks. Many of these open source scientific software packages provide a variety of methods to efficiently use them in multicore, multinode, or large-memory systems. 
In this paper, we examine a set of applications that users run on Indiana University supercomputers, and determine for those applications the software parameter settings controlling CPU parallelism, GPU parallelism, and memory usage. We then investigate the common ways users employ these parameters and measure the degree of success with which they take advantage of available resources. By comparing data collected from XALT on the command line parameters used with the Slurm resource requests, we are able to determine the degree to which users take advantage of the resources they request. This knowledge will inform how we can better provide example usage for the software available on our systems, and will inform future software development efforts, guiding the design of more efficient, user-friendly, and adaptable tools that align closely with the specific needs of the HPC community. Presentation, Paper Technical Session 2A Chair: Jim Rogers (Oak Ridge National Laboratory) Updated Node Power Management For New HPE Cray EX255a and EX254n Blades Brian Collum and Steven Martin (Hewlett Packard Enterprise) Abstract Cray EX nodes have always supported a form of power capping that allows customers to lower the power usage of specific nodes as desired. With the introduction of the HPE Cray EX254n (NVIDIA Grace Hopper) and HPE Cray EX255a (AMD MI300A), this became critical as the overall rack power pushed beyond the maximum supported at some customer sites. With the HPE Cray EX254n in particular, the total TDP of the modules exceeds the maximum that can be delivered by the Cray EX infrastructure. This drove the decision to set a power limit on the Grace Hopper modules by default (a first for Cray EX). This presentation will walk through the design goals of the blades, how power capping is implemented in the firmware, and how to configure the power limit in a running system. 
The presentation will also go through how to view the currently configured limits via the node controller's Redfish API and in-band tools, where applicable, and how the in-band tools interact with the out-of-band configurations. HPE Cray EX Power Monitoring Counters Steven Martin, Brian Collum, and Sean Byland (HPE) Abstract HPE Cray Power Monitoring (PM) Counters were first deployed on Cray XC30 systems, and several papers presented at CUG in 2014 described their use. PM Counters expose power, energy, and related metadata collected out of band directly to in-band consumers. Since their introduction, PM Counters have been supported on all blades designed for use in Cray XC and HPE Cray EX Supercomputer systems. PM Counters have remained important as system and application power and energy consumption continues to be a top priority for system vendors, application developers, and the wider HPC research community. Over the last decade, the design of PM Counters has remained very stable, with only minor updates to support evolving node architecture changes. This presentation will give a brief history and overview of PM Counters basics, then present details of PM Counters on the latest HPE Cray EX supercomputer blades announced at SC23, and then discuss opportunities and challenges in supporting PM Counters on NGI (Next Generation Infrastructure). The presentation will conclude with a reinforcement of the value of PM Counters in supporting research, development, and testing of energy-efficient and sustainable HPC systems. First Analysis on Cooling Temperature Impacts on MI250X Exascale Nodes (HPE Cray EX235a) Torsten Wilde (HPE), Michael Ott (LRZ), and Pete Guyan (HPE) Abstract With the focus on sustainable data center operations, the community is moving from chilled water cooling to warm water cooling solutions, expecting that running at higher system inlet temperatures enables more energy-efficient facility operation. 
Since the overall efficiency is determined by the combination of facility infrastructure and system behavior, understanding the impact of different system inlet cooling temperatures on system performance and efficiency is important. This inaugural presentation covers the analysis of the impact of the inlet cooling temperature on an Exascale compute blade when running HPL and HPCG. Data was collected using the HPE-LRZ PreExascale co-design project system (HPE Cray EX2500 with four modified Frontier blades optimized for higher cooling temperature support) installed at the Leibniz Supercomputing Centre. Our analysis will show that higher cooling temperatures increase node power consumption, leading to a reduction in node performance and overall energy efficiency. Results are presented for different inlet temperatures showing that overall node efficiency is reduced by around 6% for HPCG and 4% for HPL (25°C vs. 40°C inlet temperature). Combining facility and system, warm water cooling is more efficient, but the most efficient cooling temperature depends on the application and the efficiency of the cooling infrastructure. EVeREST: An Effective and Versatile Runtime Energy Saving Tool Anna Yue (Hewlett Packard Enterprise, University of Minnesota) and Sanyam Mehta and Torsten Wilde (Hewlett Packard Enterprise) Abstract Amid conflicting demands for better application performance and energy efficiency, HPC systems must be able to identify opportunities to save power/energy without compromising performance, while ideally being transparent to the user. We identify three primary challenges for a successful energy saving solution: versatility to operate across processors of different types (e.g., CPUs and GPUs) and from different vendors, effectiveness in finding energy saving opportunities and making the right power-performance tradeoffs, and ability to handle parallel applications involving communication. 
We propose Everest, a lightweight runtime tool that switches to the ideal clock frequency, computed from dynamic application characterization, for individual application phases/regions while meeting a specified performance target. Everest achieves versatility by relying on the minimum possible set of performance events for the needed characterization and power estimation. Region-awareness and accurate computation of MPI slack time allow Everest to find enhanced energy saving opportunities and thus save up to 20% more energy than existing solutions on CPUs. These energy savings rise to up to 30% and are more prominent on GPUs, where Everest doubly benefits from its unique idle time characterization and by choosing to sacrifice an allowed/acceptable performance loss. Presentation, Paper Technical Session 2C Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory) Optimising the Processing and Storage of Radio Astronomy Data Alexander Williamson (International Centre for Radio Astronomy Research, University of Western Australia); Pascal Elahi (Pawsey Supercomputing Research Centre); Richard Dodson and Jonghwan Rhee (International Centre for Radio Astronomy Research, University of Western Australia); and Qian Gong (Oak Ridge National Laboratory) Abstract The next generation of radio astronomy telescopes is challenging existing data analysis paradigms, with an order of magnitude larger collecting area and bandwidth. The two primary problems encountered when processing this data are the need for storage and that processing is primarily I/O limited. An example of this is the data deluge expected from the SKA-Low Telescope of about 300 PB per year. To remedy these issues, we have demonstrated lossy and lossless compression of data from an existing precursor telescope, the Australian Square Kilometre Array Pathfinder (ASKAP), using the MGARD and ADIOS2 libraries. 
We find data processing is faster by a factor of 7 and achieve compression ratios from a factor of 7 (lossless) up to 37 (lossy, with an absolute error bound of 0.001). We will discuss the effectiveness of lossy MGARD compression and its adherence to the designated error bounds, the trade-off between these error bounds and the corresponding compression ratios, as well as the potential consequences of these I/O and storage improvements on the science quality of the data products. Performance and scaling of the LFRic weather and climate model on different generations of HPE Cray EX supercomputers J. Mark Bull (EPCC, The University of Edinburgh); Andrew Coughtrie (Met Office, UK); Deva Deeptimahanti (Pawsey Supercomputing Research Centre); Mark Hedley (Met Office, UK); Caoimhin Laoide-Kemp (EPCC, The University of Edinburgh); Christopher Maynard (Met Office); Harry Shepherd (Met Office, UK); Sebastiaan Van De Bund and Michele Weiland (EPCC, The University of Edinburgh); and Benjamin Went (Met Office, UK) Abstract This study presents scaling results and a performance analysis across different supercomputers and compilers for the Met Office weather and climate model, LFRic. The model is shown to scale to large numbers of nodes, meeting the design criterion of exploiting parallelism to achieve good scaling. The model is written in a Domain Specific Language, embedded in modern Fortran, and uses a Domain Specific Compiler, PSyclone, to generate the parallel code. The performance analysis shows the effect of choice of algorithm, such as redundant computation, and of scaling with OpenMP threads. The analysis can be used to motivate a discussion of future work to improve the OpenMP performance of other parts of the code. Finally, an analysis of the performance tuning of the I/O server, XIOS, is presented. 
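The error-bounded lossy compression described in the ASKAP abstract above rests on a simple contract: every reconstructed value must lie within a fixed absolute error bound of the original. A minimal sketch of that contract uses uniform scalar quantization; note that MGARD itself uses a multilevel decomposition, not this simple scheme, and the data values here are invented.

```python
# Minimal sketch of error-bounded lossy compression: uniform scalar
# quantization with bin width 2*eb guarantees
#   |x - decompress(compress(x))| <= eb  for every value x.
# MGARD's actual algorithm (multigrid decomposition) is far more
# sophisticated; this only illustrates the error-bound contract.

def compress(values, eb):
    """Map each value to an integer bin index; bins are 2*eb wide."""
    return [round(v / (2 * eb)) for v in values]

def decompress(indices, eb):
    """Reconstruct the bin centers from the indices."""
    return [i * 2 * eb for i in indices]

data = [0.1234, -3.3, 7.77, 0.0005]   # invented sample values
eb = 0.001                            # absolute error bound, as in the abstract
restored = decompress(compress(data, eb), eb)
worst = max(abs(a - b) for a, b in zip(data, restored))
assert worst <= eb
```

The compression gain then comes from entropy-coding the integer indices, which are far more repetitive than the raw floating-point values; loosening `eb` widens the bins and raises the ratio, which is the trade-off the talk examines.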
Disaggregated memory in OpenSHMEM applications – Approach and Benefits Clarete Crasta, Sharad Singhal, Faizan Barmawer, Ramesh Chaurasiya, Sajeesh KV, Dave Emberson, Harumi Kuno, and John Byrne (Hewlett Packard) Abstract HPC architectures most often handle High Performance Data Analytics (HPDA) and Explorative Data Analytics (EDA) workloads where the working data set cannot be easily partitioned or is too large to fit into local node memory. This poses challenges for programming models such as OpenSHMEM[1] or MPI[2], where all data in the working set is assumed to fit in the memory of the participating compute nodes. Additionally, existing HPC programming models use expensive all-to-all communication to share data and results between nodes. The data and results are ephemeral and require additional work to save for analysis by other applications or subsequent invocations of the same application. Emerging disaggregated architectures, including CXL GFAM, enable data to be held in external memory accessible to all compute nodes, thus providing a new approach to handling large data sets in HPC applications. Most HPC libraries do not currently support disaggregated memory models. In this paper, we present how disaggregated memory can be accessed by existing programming models such as OpenSHMEM, and the benefits of using disaggregated memory in these models. Migrating Complex Workflows to the Exascale: Challenges for Radio Astronomy Pascal Jahan Elahi (Pawsey Supercomputing Research Centre) and Matt Austin, Eric Bastholm, Paulus Lahur, Wasim Raja, Maxim Voronkov, Mark Wieringa, Matthew Whiting, Daniel Mitchell, and Stephen Ord (CSIRO) Abstract Real-time processing of radio astronomy data presents a unique challenge for HPC centers. 
The science data processing contains memory-bound codes, CPU-bound codes, portions of the pipeline consisting of large numbers of embarrassingly parallel jobs combined with large numbers of moderate- to large-scale MPI jobs, and IO that ranges from parallel IO writing large files to small jobs writing a large number of small files, all combined in a workflow with a complex job dependency graph and real-world time constraints from radio telescope observations. We present the migration of the Australian Square Kilometre Array Pathfinder Telescope's science processing pipeline from one of Pawsey's older Cray XC systems to our HPE-Cray EX system, Setonix. We also discuss the migration from bare-metal deployment of the complex software stack to a containerized, more modular deployment of the workflow. We detail the challenges faced and how the migration unearthed issues in the original deployment of the EX system. The lessons learned in the migration of such a complex software stack and workflow are valuable for other centers. Presentation, Paper Technical Session 3B Chair: Gabriel Hautreux (CINES) Spack Based Production Programming Environments on Cray Shasta Paul Ferrell and Timothy Goetsch (LANL) Abstract The Cray Programming Environment (CPE) provided for Cray Shasta OS based clusters provides a small but solid set of tools for developers and cluster users. The CPE includes Cray MPICH, Cray LibSci, the Cray debugging tools, and support for a range of compilers – and not much else. Users expect a wide range of additional software on these clusters, and frequently request new software outside of what’s provided by the CPE or what can be provided via the system packages. Foregoing our old manual installation process, the LANL HPC Programming and Runtime Environments Team has instead opted to utilize Spack as the installation mechanism for most additional software on all of our new HPE/Cray Shasta clusters. 
This brings with it several distinct advantages: Spack’s vast library of package recipes, well-defined software inventories, automatically generated modulefiles, and binary packages produced through our CI infrastructure. It also brings substantial issues: notably higher manpower requirements, longer turnaround times for software requests, a more challenging build debug process, and questionable long-term maintainability. Our paper will detail our approach and the benefits and pitfalls of using Spack to install and maintain production software environments. Containers-first user environments on HPE Cray EX Felipe Cruz and Alberto Madonna (Swiss National Supercomputing Centre) Abstract In High-Performance Computing (HPC), managing the user environment is a critical and complex task. It involves composing a mix of software that includes compilers, libraries, tools, environment settings, and their respective versions, all of which depend on each other in intricate ways. Traditional approaches to managing user environments often struggle to find a balance between stability and flexibility, especially in large systems serving diverse user needs. Cloud-Native Slurm management on HPE Cray EX Felipe A. Cruz, Manuel Sopena, and Guilherme Peretti-Pezzi (Swiss National Supercomputing Centre) Abstract This work introduces a cloud-native deployment of the Slurm HPC Workload Manager, leveraging microservices, containerization, and on-premises cloud platforms to enhance efficiency and scalability. Utilizing Kubernetes and Nomad's APIs alongside DevOps tools, the system automates system operations, simplifies service configuration, and standardizes monitoring. However, implementing a cloud-native architecture poses challenges, including complex containerization and resource management issues that are intrinsic to HPC. 
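A Spack-based production environment like the one LANL describes is normally declared in a `spack.yaml` environment manifest. As a minimal sketch, the snippet below generates such a manifest programmatically; the package specs are illustrative placeholders, not LANL's actual software inventory.

```python
# Illustrative sketch: emit a minimal spack.yaml environment manifest.
# The spec list is a placeholder, not LANL's production inventory.

def spack_manifest(specs, unify=True):
    """Render a minimal spack.yaml manifest as a string.

    `unify=True` asks Spack's concretizer to resolve all specs into a
    single consistent dependency graph, which is what a shared
    production environment typically wants."""
    lines = ["spack:", "  specs:"]
    lines += [f"  - {s}" for s in specs]
    lines += ["  concretizer:", f"    unify: {str(unify).lower()}"]
    return "\n".join(lines) + "\n"

manifest = spack_manifest(["hdf5 +mpi", "petsc", "py-numpy"])
print(manifest)
```

Checking a generated manifest like this into the CI infrastructure mentioned above is one way the "well-defined software inventories" benefit materializes: the environment is reproducible from a single declarative file.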
Presentation, Paper Technical Session 3A Chair: Bilel Hadri (KAUST Supercomputing Lab) Early Application Experiences on Aurora at ALCF: Moving From Petascale to Exascale Systems Colleen Bertoni, JaeHyuk Kwack, Thomas Applencourt, Abhishek Bagusetty, Yasaman Ghadar, Brian Homerding, Christopher Knight, Ye Luo, Mathialakan Thavappiragasam, John Tramm, Esteban Rangel, Umesh Unnikrishnan, Timothy J. Williams, and Scott Parker (Argonne National Laboratory) Abstract Aurora, installed in 2023, is the newest system being prepared for production at the Argonne Leadership Computing Facility (ALCF). Throughout multiple years of preparation, the ALCF has tracked the progress of over 40 applications from the Exascale Computing Project and ALCF's Early Science Project in terms of their ability to run on Aurora and their performance on Aurora compared to other systems. In addition, the ALCF has been tracking bugs and issues reported by application developers. This broad tracking of applications in a standardized way, along with the tracking of over 1100 bugs and issues via source code reproducers, has been essential to ensuring the usability of Aurora. It has also helped ensure a smoother transition to Aurora for applications that run on past or current production systems, like Polaris, the ALCF's current production system. To gain insight into the current state of the applications ported to Aurora, a set of applications is compared in terms of single-GPU and single-node performance on Aurora and Polaris. On average, the Figure-of-Merit performance for the set of applications was 1.3x greater on a single GPU of Aurora than on a single GPU of Polaris. The intra-node parallel efficiency of the set of applications was similar between Aurora and Polaris. 
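The two metrics quoted in the Aurora abstract, per-GPU speedup and intra-node parallel efficiency, can be computed directly from per-application Figures of Merit. A small sketch, with invented FoM numbers rather than the paper's data:

```python
# Sketch: per-GPU speedup and intra-node parallel efficiency from
# Figure-of-Merit (FoM, higher is better) measurements.
# All numbers below are invented for illustration.

def speedup(fom_new, fom_ref):
    """FoM ratio of the new system over the reference system."""
    return fom_new / fom_ref

def parallel_efficiency(fom_node, fom_single_gpu, gpus_per_node):
    """Fraction of ideal scaling achieved across a full node."""
    return fom_node / (fom_single_gpu * gpus_per_node)

# One hypothetical application:
fom_polaris_gpu = 100.0
fom_aurora_gpu = 130.0    # a 1.3x per-GPU speedup, as in the abstract's average
fom_aurora_node = 624.0   # hypothetical measurement across 6 GPUs

s = speedup(fom_aurora_gpu, fom_polaris_gpu)
e = parallel_efficiency(fom_aurora_node, fom_aurora_gpu, 6)
```

Comparing FoM ratios rather than raw times lets heterogeneous applications (each with its own science metric) be averaged into a single cross-system summary, which is how a portfolio-wide "1.3x on average" claim is built.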
Streaming Data in HPC Workflows Using ADIOS Greg Eisenhauer (Georgia Institute of Technology); Norbert Podhorszki, Ana Gainaru, and Scott Klasky (Oak Ridge National Laboratory); Philip Davis and Manish Parashar (University of Utah); Matthew Wolf (Samsung SAIT); Eric Suchtya (Oak Ridge National Laboratory); Erick Fredj (Toga Networks, Jerusalem College of Technology); Vicente Bolea (Kitware, Inc); Franz Pöschel, Klaus Steiniger, and Michael Bussmann (Center for Advanced Systems Understanding); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf); and Sunita Chandrasekaran (University of Delaware) Abstract The "IO Wall" problem, in which the gap between computation rate and data access rate grows continuously, poses significant problems for scientific workflows, which have traditionally relied upon the filesystem for intermediate storage between workflow stages. One way to avoid this problem is to stream data directly from producers to consumers, avoiding storage entirely. However, the manner in which this is accomplished is key to both performance and usability. This paper presents the Sustainable Staging Transport (SST), an approach that allows direct streaming between traditional file writers and readers with few application changes. SST is an ADIOS "engine", accessible via standard ADIOS APIs, and because ADIOS allows engines to be chosen at run-time, many existing file-oriented ADIOS workflows can utilize SST for direct application-to-application communication without any source code changes. 
This paper describes the design of SST and presents performance results from various applications that use SST: feeding model training with simulation data at substantially higher bandwidth than the theoretical limits of Frontier's file system, strongly coupling separately developed applications for multiphysics multiscale simulation, and performing in situ analysis and visualization of data to complete all data processing shortly after the simulation finishes. Enrichment and Acceleration of Edge to Exascale Computational Steering STEM Workflow using Common Metadata Framework Gayathri Saranathan (Hewlett Packard Enterprise, Hewlett Packard Labs); Martin Foltin, Aalap Tripathy, and Annmary Justine (Hewlett Packard Enterprise); Ayana Ghosh, Maxim Ziatdinov, and Kevin Roccapriore (Oak Ridge National Laboratory); and Suparna Bhattacharya, Paolo Faraboschi, and Sreenivas Rangan Sukumaran (Hewlett Packard Enterprise) Abstract Computational steering of experiments with the help of Artificial Intelligence (AI) has the potential to accelerate scientific discovery. Fulfilling this promise will require innovations in workflows and algorithms for experiment control. In this work we developed data management infrastructure that facilitates such innovation by enabling fine-grained partitioning and optimization of complex workflows between computational and experimental facilities with dynamic data sharing. This is enabled by the Common Metadata Framework (CMF), which tracks workflow data lineages and provides visibility across facility boundaries for relevant data subsets. We demonstrate the benefits on a novel Scanning Transmission Electron Microscopy (STEM) workflow of nanoparticle plasmonic transitions in materials science that crosses facility boundaries several times. The AI control model is seeded by a meta-model developed at a computational facility using data from other experimental sites to help impart prior knowledge and reduce measurement cost and sample degradation. 
The model is incrementally refined by successive measurements at an experimental facility. The evolution of model uncertainties is captured in CMF and fed back to the computational facility for analysis of potential new physical phenomena. The relevant experimental results can be used to calibrate molecular dynamics simulations at the computational facility that in turn influence the AI model refinement trajectory. Presentation, Paper Technical Session 3C Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) CSM-based Software Stack Overview 2024 Harold Longley and Jason Sollom (Hewlett Packard Enterprise) Abstract The Cray System Management (CSM) software stack has been enhanced within the past year to improve the operating experience for an HPE Cray EX system. Features to be discussed include more automated software installation and upgrade, system backup and using that backup for disaster recovery reinstallation, automation for concurrent rolling reboots of management nodes, enhancements in CSM Diagnostics tooling, improved SMA monitoring tools, Slingshot switch Orchestrated Maintenance and LACP traffic load sharing, a containerized login environment, new compute node features (default kernel configuration, Low Noise Mode improvements, OS Noise Detection, DVS node health monitoring, Dynamic Kernel Module Support, and containers on compute nodes), improved multi-tenancy support, and image management for the aarch64 architecture. Overview of HPCM Peter Guyan and Sue Miller (HPE) Abstract This talk will give a brief overview of how HPCM is deployed on an HPE solution. It will describe what an admin node does, what Quorum HA is, and why we use SU-Leaders, including when to use Quorum HA and SU-Leaders, and will provide an introduction to the “cm” command suite and a brief overview of the monitoring tools present in HPCM. 
With a new set of monitoring tools available in HPCM 1.11, we will describe the new monitoring pipeline and how to enable the components it needs. Seamless Cluster Migration in CSM Miguel Gila and Manuel Sopena Ballesteros (Swiss National Supercomputing Centre) Abstract The ability to effortlessly migrate compute clusters between sites or zones is common in the cloud world, and it has recently also become a necessity for supercomputing facilities like the Swiss National Supercomputing Centre (CSCS), where its multi-region flagship infrastructure, Alps, serves multiple tenants and customers, some of them with very different development and operational requirements. Presentation, Paper Technical Session 4B Chair: Brett Bode (National Center for Supercomputing Applications/University of Illinois) Scalability and Performance of OFI and UCX on ARCHER2 Jaffery Irudayasamy, Juan F. R. Herrera, Evgenij Belikov, and Michael Bareford (EPCC, The University of Edinburgh) Abstract OpenFabrics Interfaces (OFI) and Unified Communication X (UCX) are both transport protocols that underlie the HPE Cray MPICH library on HPC systems like ARCHER2, and they can be selected at runtime by users. This paper presents the scalability and performance of the OFI and UCX transport layer protocol implementations on ARCHER2, an HPE Cray EX system that features the Slingshot 10 interconnect. We use ReproMPI microbenchmarks to study the performance of MPI collectives and run experiments using some of the most commonly used applications on ARCHER2. The results show that in most cases OFI and UCX performance is comparable at under 32 nodes (16384 cores), but at larger node counts OFI runs more reliably. Ultimately, when it comes to applications there is no one-size-fits-all solution, and profiling can facilitate tuning for best performance. 
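The runtime transport selection described above can be sketched as a job-script fragment. The module names follow the convention documented for ARCHER2-class Cray PE installations and may differ at other sites; `my_mpi_app` is a placeholder application name:

```shell
# Sketch: choosing the MPI transport layer for HPE Cray MPICH at job level.
# Module names are assumptions based on ARCHER2-style Cray PE setups.

# Default stack: Cray MPICH over OFI (libfabric).
module load craype-network-ofi

# Alternative stack: switch the network target and MPICH build to UCX.
# module load craype-network-ucx
# module swap cray-mpich cray-mpich-ucx

# Optional, illustrative UCX tuning: restrict the transports UCX considers.
# export UCX_TLS=rc,self,sm

srun ./my_mpi_app   # my_mpi_app is a placeholder
```

Because both stacks implement the same MPI interface, the application binary itself is typically unchanged; only the launch environment differs.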
Using P4 for Cassini-3 Software Development Environment Hardik Soni, Frank Zago, Khaled Diab, Igor Gorodetsky, and Puneet Sharma (HPE) Abstract We present a novel approach for co-developing the hardware of Network Interface Card (NIC) ASICs (e.g., Slingshot Cassini-3) and their software stacks. Due to the increasing gap between network link bandwidths and the compute power of hosts, critical parts of the software stacks of many applications are offloaded to NICs for efficient processing. By processing certain functions using specialized hardware blocks in NICs, compute resources can be better utilized for actual application processing. Therefore, the use cases and design of NICs evolve rapidly with advancements in the transmission capacity of network links. To reduce time-to-market for next-generation NICs with complex features implemented in hardware, we leverage the software ecosystem of programmable networks and compiler technology for development, testing, and verification of the Slingshot NICs. Running NCCL and RCCL Applications on HPE Slingshot NIC Jesse Treger and Caio Davi (HPE) Abstract There has been a rise in Machine Learning applications on High-Performance Computing systems equipped with GPUs, motivated by increasing demands for model and dataset sizes. Although there are no fundamental limitations preventing these workloads from running in MPI runtimes, the in-house chip makers' communications collectives libraries have become the preferred deployments. In this presentation, we will show how these libraries work and how to take full advantage of the HPE Slingshot interconnect. We will review the use and configuration of the two most common of these, NCCL and RCCL, to run using the HPE Slingshot NIC RDMA capability. We will explain the key parameters for the given environments and provide recommendations for optimal settings. 
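As a rough illustration of the kind of environment configuration such a presentation covers, the fragment below sets a few NCCL/libfabric variables commonly discussed for Slingshot systems. The specific variables and values here are illustrative assumptions, not vendor recommendations; consult the presentation and site documentation for actual settings. `my_nccl_app` is a placeholder:

```shell
# Sketch: example environment for NCCL/RCCL over the Slingshot NIC via the
# libfabric plugin. Values are placeholders for illustration only.

export NCCL_SOCKET_IFNAME=hsn      # bootstrap over the high-speed network interfaces
export NCCL_NET_GDR_LEVEL=PHB      # allow GPUDirect RDMA through the host bridge
export NCCL_CROSS_NIC=1            # permit flows to cross NICs between ranks
export FI_CXI_RDZV_THRESHOLD=0     # CXI provider rendezvous tuning (example value)

srun ./my_nccl_app                 # my_nccl_app is a placeholder
```

The same pattern applies to RCCL on AMD GPU systems, which reads most of the same `NCCL_*` environment variables.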
Enabling NCCL on Slingshot 11 at NERSC Jim Dinan (NVIDIA), Peter Harrington (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Igor Gorodetsky (HPE), Josh Romero (NVIDIA), Steven Farrell (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Ian Ziemba (HPE), and Wahid Bhimji and Shashank Subramanian (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract The NVIDIA Collective Communications Library (NCCL) is widely used for multi-node AI applications as well as other scientific codes. Initial deployments of Slingshot 11 (SS11) did not support this library at high performance, leading to detrimental impacts on the performance of Deep Learning applications on SS11-based HPC systems. We describe collaborative efforts between NERSC, NVIDIA, and HPE to develop capabilities for NCCL on SS11. This involved development of the Libfabric NCCL plugin and an extensive period of testing and refinement on the Perlmutter HPC system at NERSC. In this presentation we will describe the development required as well as the performance improvements measured on Perlmutter for both benchmarks and cutting-edge scientific AI applications. Presentation, Paper Technical Session 4A Chair: Lena M Lopatina (LANL) Multi-stage Approach for Identifying Defective Hardware in Frontier Nick Hagerty (Oak Ridge National Laboratory), Andy Warner (HPE), and Jordan Webb (Oak Ridge National Laboratory) Abstract In June 2022, the long-awaited exaflop compute barrier (1 quintillion floating-point operations per second) was surpassed on the TOP500 list by Frontier, an HPE Cray EX supercomputer at Oak Ridge National Laboratory (ORNL). Drawing a peak power of 21.1 MW, Frontier demonstrated 1.1 exaflops of computational capability, much of which is supplied by more than 37,000 AMD Instinct MI250X graphics processing units (GPUs). 
With a single GPU drawing up to 560 W thermal design power (TDP), each AMD MI250X draws twice as much power under load as the NVIDIA V100 GPUs used in Frontier’s predecessor at ORNL, the 200-petaflop IBM POWER9 supercomputer Summit. There are many other major technological advances in the memory, compute, power, and infrastructure of Frontier that are new to production environments. Frontier's mission to enable ground-breaking research in the U.S. energy, economic, and national security sectors is fulfilled through leadership-class workloads, which are workloads that demand greater than 20% of the supercomputer. These large workloads are vulnerable to defective and failing computing hardware. The rate of failing hardware is quantified through the mean time between failures (MTBF), the mean length of time between hardware-level failures anywhere in the system. In this work, we describe the multi-stage approaches to stabilizing and maintaining the functionality of the hardware on Frontier. Three strategies are discussed: the first two utilize leadership-class tests to improve the MTBF of Frontier, and the third utilizes single-node validation to efficiently identify individual instances of defective hardware in Frontier. We provide summarized data from each of the three strategies, then classify the diverse set of failures and discuss trends in defective hardware, before discussing several key challenges to identifying defective hardware and improving the MTBF of Frontier. From Frontier to Framework: Enhancing Hardware Triage for Exascale Machines Isa Wazirzada, Abhishek Mehta, and Vinanti Phadke (Hewlett Packard Enterprise) Abstract Supercomputers are complex systems that bring together bleeding-edge technologies. Take the example of the first exascale system, Frontier, installed at Oak Ridge National Laboratory. Frontier consists of more than 9,400 compute nodes embedded in the HPE Cray EX4000 infrastructure. 
Each compute node consists of an AMD CPU, four MI250X GPUs interlinked by high-speed XGMI interfaces, and Slingshot-11 high-speed NICs. The system comprises over 150,000 node-level components interconnected in an extremely dense mechanical framework and is cooled via warm-temperature liquid cooling. As impressive as all these technologies are on their own, the real value lies in bringing them together in a system that achieves sustained performance over time. With that in mind, it behooves us to recognize that system quality attributes such as diagnosability and serviceability are critical to achieving high levels of availability throughout the service life of a system. Therefore, for HPE and our customers, providing a product-level hardware triage framework will help reduce the return-to-service time for failed components, provide a standardized approach to diagnosing hardware failures, reduce the number of no-trouble-found replacements, and minimize the need for SMEs from R&D to directly support systems in the field. Full-stack Approach to HPC Testing Pascal Jahan Elahi and Craig Meyer (Pawsey Supercomputing Research Centre) Abstract A user of a High Performance Computing (HPC) system relies on a multitude of components, both on the user-facing side, such as modules, and in lower-level system software, such as Message Passing Interface (MPI) libraries. Thus, all these different aspects must be tested to guarantee that an HPC system is production ready. We present here a suite of tests that covers this larger space, which not only focuses on benchmarking and sanity checks but also provides some diagnostic information when failures are encountered. These tests cover the job scheduler (here, Slurm); the MPI library, critical for running jobs at scale; GPUs, a vital part of any energy-efficient HPC system; and the performance of the compilers that are part of the Cray Programming Environment (CPE). 
These tests were critical to uncovering a number of underlying issues with the communication libraries on a newly deployed HPE Cray EX Shasta system that had gone undetected in other acceptance tests; they also identified bugs within the job scheduler. The tests are implemented in the ReFrame framework and are open source. An Approach to Continuous Testing Francine Lapid and Shivam Mehta (Los Alamos National Laboratory) Abstract The Department of Energy's National Nuclear Security Administration supercomputers at Los Alamos National Laboratory (LANL) are integral to supporting the lab’s mission and therefore need to be reliable and performant. To identify potential problems ahead of time while minimizing the interruption to users’ work, the High Performance Computing (HPC) Division at LANL implemented a Continuous Testing framework and the necessary infrastructure to automatically and frequently run a series of tests and proxy applications. The tests, which benchmark various system components, were integrated into the Pavilion2 testing framework and are launched in small Slurm jobs across each machine every weekend. The results of the tests are summarized in a comprehensive Splunk dashboard, enabling continuous monitoring of the health of the machines over time without having to parse through each run’s output and logs. This project is currently running on three of LANL’s newest fleet of Cray Shasta machines (Chicoma, Razorback, and Rocinante) and will eventually be implemented on all production clusters at LANL. This paper details the different components of the Continuous Testing framework, the resulting setup, and the impact it has on our HPC workflow. 
Presentation, Paper Technical Session 4C Chair: Gabriel Hautreux (CINES) LLM Serving With Efficient KV-Cache Management Using Triggered Operations Aditya Dhakal, Pedro Bruel, Gourav Rattihalli, Sai Rahul Chalamalasetti, and Dejan Milojicic (Hewlett Packard Enterprise, Hewlett Packard Labs) Abstract A large language model (LLM)’s Key-Value (KV) cache requires enormous GPU memory during inference. For faster query processing and conversational memory in chat applications, this cache is stored to answer subsequent user queries. However, if the cache is buffered on the GPU, its large memory requirement prevents multiplexing and requires cache buffering in remote storage. In current systems, transferring and retrieving the cache requires the CPU to coordinate with the GPU and push and fetch the data through the network, increasing the overall latency. This paper proposes lower-overhead KV cache storage and retrieval using SmartNICs capable of triggered operations, such as Cassini. Triggered operations enqueue pre-defined data transfer instructions on the NIC. A GPU thread can trigger these instructions once the LLM computes the KV cache for a token. Cassini Network Interface Cards (NICs) then transfer the cache, bypassing the CPU and network stack and improving data-transfer latency. Our experiments show that data transfer with triggered operations provides a 19× speedup in transfers ranging from 32 KB to 5 TB. From Chatbots to Interfaces: Diversifying the Application of Large Language Models for Enhanced Usability Jonathan Sparks, Pierre Carrier, and Gallig Renaud (Hewlett Packard Enterprise) Abstract This paper explores the application of Large Language Models (LLMs) in three distinct scenarios, demonstrating their potential to improve user experience and efficiency. Firstly, we examine the application of LLMs such as OpenAI’s GPT or Llama in chatbots that assist in programming environments, providing real-time assistance to developers. 
Secondly, we explore using LLMs and Python to search an internal document corpus for performance engineering, significantly improving the retrieval of relevant information from extensive technical documentation. Lastly, we investigate using LLMs as an interface to system batch schedulers, such as Slurm or PBS, replacing domain-specific languages and prompts with natural text. This approach democratizes access to complex systems, fostering ease of use and enhancing the user experience. Through these use cases, we underscore the versatility and potential of LLMs, highlighting their role as an aid to system operation and user experience. Delivering Large Language Model Platforms With HPC Laura Huber, Abhinav Thota, Scott Michael, and Jefferson Davis (Indiana University) Abstract In 2023, we saw a huge rise in the capability and popularity of large language models (LLMs). OpenAI released ChatGPT 3.0 to the public in November 2022, and since then, many closed and open-source LLMs have been released. It has been reported that training ChatGPT 3.0 took more than 10,000 GPU cards, making training a foundational LLM out of reach for many research teams and HPC centers, but there are many ways to use a pre-trained LLM with GPUs at scales available at an HPC center. For example, one can fine-tune a pre-trained model with site- or application-specific data, or augment the model with Retrieval-Augmented Generation (RAG) to add specific knowledge. The availability of open-source LLMs has opened the field for individual researchers and service providers to do their own custom training and run their own chatbots. In this paper, we describe how we deployed and evaluated open-source LLMs on Quartz and Big Red 200, a Cray EX supercomputer, and provisioned access to these deployments for a select group of HPC users. 
System for Recommendation and Evaluation of Large Language Models for Practical Tasks in Science Cong Xu, Tarun Kumar, Martin Foltin, Annmary Justine, Sergey Serebryakov, Arpit Shah, Agnieszka Ciborowska, Ashish Mishra, Gyanaranjan Nayak, Suparna Bhattacharya, and Paolo Faraboschi (Hewlett Packard Enterprise) Abstract Large Language Models (LLMs) are showing an impressive ability to reason and answer complex questions after a few contextual prompts. This has transformational potential for scientific productivity and new discoveries. However, LLMs commonly suffer from hallucination problems, and their performance depends strongly on the specific task and model. Several ad-hoc LLM benchmarking studies have been reported for specific tasks in medicine, biology, and chemistry. The science community typically prefers open-source models, including derivatives of Llama-2, Galactica, etc. These models usually perform worse than the commercial GPT-4, requiring more scrutiny. In this work we developed an automated platform that co-optimizes LLM recommendation and evaluation for user-specified tasks to make it easier for science practitioners to select and evaluate promising LLMs for their use case. Uniquely, it uses a recommender engine trained on the characteristics of a large number of training datasets, LLM models, and inference data to find a list of the best candidate models for user-specified prompt templates. It then evaluates and ranks these candidates using an automated reviewer that performs multi-metric assessment and, uniquely, iteratively revisits the evaluation to validate its truthfulness, employing the reflection approach and the self-consistency technique. The evaluation platform integrates seamlessly with HPE MLDE (Machine Learning Development Environment). 
Presentation, Paper Technical Session 5B Chair: Adrian Jackson (EPCC, The University of Edinburgh) Leveraging GNU Parallel for Optimal Utilization of HPC Resources on Frontier and Perlmutter Supercomputers Ketan Maheshwari (Oak Ridge National Laboratory), William Arndt (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), and Rafael Ferreira Da Silva (Oak Ridge National Laboratory) Abstract In the realm of HPC, efficiently exploiting parallelism on large-scale systems presents a significant challenge, particularly when aiming for a low-barrier, nonprogrammatic approach. This presentation showcases the effective application of GNU Parallel for optimizing the use of four key resources at OLCF’s Frontier and NERSC’s Perlmutter supercomputers: CPUs, GPUs, NVMe storage, and large-scale storage systems. Our work positions GNU Parallel not merely as a utility tool but as a mainstream, productive instrument for diverse HPC workloads. We illustrate GNU Parallel’s capacity to efficiently harness CPU resources across up to 9,000 Frontier compute nodes, with a manageable increase in overhead even as node utilization scales up. A detailed analysis of overhead metrics reveals that overhead remains around 5 minutes for up to 7,000 nodes and under 10 minutes for up to 9,000 nodes comprising up to 1.15 million parallel tasks. Our approach is further elucidated through a series of practical vignettes, ranging from daily tasks to massively scaled operations. These vignettes offer generalized solutions for common computational and I/O patterns observed in real-world applications, validated through scalable examples on the Frontier and Perlmutter supercomputers. 
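A minimal sketch of the low-barrier pattern described above: inside a Slurm allocation, GNU Parallel fans a list of independent tasks out across the allocated nodes as single-task srun steps. The node count, task count, and `process_input` script are illustrative placeholders, not values from the presentation:

```shell
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --time=00:30:00
# Sketch: run 512 independent tasks across a 4-node allocation,
# keeping up to 128 task steps in flight at once. GNU Parallel
# substitutes each input value for {} in the command.

seq 1 512 | parallel --jobs 128 \
    srun --nodes=1 --ntasks=1 --exact ./process_input {}
```

The same pipeline shape scales to much larger node counts; only the `--jobs` limit and the input list change, which is what makes the approach nonprogrammatic.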
Portable Support for GPUs and Distributed-Memory Parallelism in Chapel Andrew Stone and Engin Kayraklioglu (Hewlett Packard Enterprise) Abstract Writing performant programs on modern supercomputers means targeting parallelism at multiple levels of scale: from vectorization in a single core, to multicore CPUs, to nodes with multiple CPUs, to GPUs and nodes with multiple GPUs. Traditionally, this has meant writing applications using multiple programming models; for example, an application might be written as a combination of MPI, OpenMP, and/or CUDA. This use of multiple programming models complicates an application’s implementation, maintainability, and portability. PaCER: Accelerating Science on Setonix Maciej Cytowski and Ann Backhaus (Pawsey Supercomputing Research Centre) and Joseph Schoonover (Fluid Numerics) Abstract The Pawsey Centre for Extreme-Scale Readiness (PaCER) was a unique Australian program supporting researchers in developing and optimising codes on Setonix, Australia’s most powerful research supercomputer, based on AMD Milan CPUs and MI250X GPUs. The focus of the PaCER program was on both extreme-scale research (algorithm design, code optimisation, application and workflow readiness) and using the computational infrastructure to facilitate research that produces world-class scientific outcomes. PaCER was a collaborative partnership between Pawsey and supercomputing vendors that provided early access to HPC tools and infrastructure, training, and exclusive hackathons focused on performance at scale. PaCER supported 10 projects gathering 18 international and national institutions, more than 60 researchers, and 10 Research Software Engineers. The biggest success of the project is the creation of a supercomputing software developers’ community, the first of its kind in Australia. 
In this talk, we will describe the collaboration model created by PaCER and cover significant achievements of the project, including the series of GPU programming mentored sprints. We will also share some of the lessons learned and future plans. Presentation, Paper Technical Session 5A Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory) Power and Performance Analysis of Grace Hopper Superchips on HPE Cray EX Systems Benjamin Donald Cumming and Miguel Gila (CSCS), Brian Collum (HPE), Sebastian Keller (CSCS), and Bryan Villalon and Steven James Martin (HPE) Abstract Systems with tightly integrated CPU and GPU on the same module or package will be deployed in 2024, based on NVIDIA GH200 and AMD MI300A processors. There are some significant differences from current systems, of which the key ones for this presentation are: Accelerating Scientific Workflows with the NVIDIA Grace Hopper Platform Gabriel Noaje (NVIDIA) Abstract This session will provide a demonstration of top ML frameworks, HPC applications, and tools for data science on NVIDIA's Grace Hopper and Grace CPU Superchips. These superchips are the cornerstones of versatile and power-efficient supercomputers worldwide that combine Grace CPUs, Hopper GPUs, and extreme-scale networking technology in a standards-compliant server. We'll showcase recent results from key applications like PyTorch, JAX, WRF, GROMACS, and NAMD. We'll also share lessons learned and experiences to help guide developers creating their own applications for NVIDIA Grace superchips. This session is a strong starting point for anyone looking to understand, and develop for, NVIDIA Grace Hopper or the Grace CPU Superchip. 
GROMACS on AMD GPU-Based HPC Platforms: Using SYCL for Performance and Portability Andrey Alekseenko (KTH/SciLifeLab); Szilárd Páll (KTH/PDC); and Erik Lindahl (KTH, Stockholm University) Abstract GROMACS is a widely used molecular dynamics software package with a focus on performance, portability, and maintainability across a broad range of platforms. Thanks to its early algorithmic redesign and flexible heterogeneous parallelization, GROMACS has successfully harnessed GPU accelerators for more than a decade. With the diversification of accelerator platforms in HPC and no obvious choice of a well-suited multi-vendor programming model, the GROMACS project found itself at a crossroads. The performance and portability requirements, as well as a strong preference for a standards-based programming model, motivated our choice to use SYCL for production on both new HPC GPU platforms: AMD and Intel. Since the GROMACS 2022 release, the SYCL backend has been the primary means to target AMD GPUs in preparation for exascale HPC architectures like LUMI and Frontier. SYCL is a cross-platform, royalty-free, C++17-based standard for programming hardware accelerators, from embedded devices to HPC. It allows using the same code to target GPUs from all three major vendors with minimal specialization, which offers major portability benefits. While SYCL implementations build on native compilers and runtimes, whether such an approach is performant is not immediately evident. Biomolecular simulations have challenging performance characteristics: latency sensitivity, the need for strong scaling, and typical iteration times as short as hundreds of microseconds. Hence, obtaining good performance across the range of problem sizes and scaling regimes is particularly challenging. Here, we share the results of our work on readying GROMACS for AMD GPU platforms using SYCL, and demonstrate performance on Cray EX235a machines with MI250X accelerators. 
Our findings illustrate that portability is possible without major performance compromises. We provide a detailed analysis of node-level kernel and runtime performance with the aim of sharing best practices with the HPC community on using SYCL as a performance-portable GPU framework. Presentation, Paper Technical Session 5C Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) Cray EX Security Experiences Ben Matthews (NCAR/UCAR) Abstract Security is an important, if sometimes overlooked, part of running an HPC system. We will describe the out-of-box experience with the Cray EX system from a security perspective. Several security issues (and mitigations for them) present in the Cray EX (HPCM) software stack and, perhaps, in HPC systems in general will be described. The process for reporting, patching, and publishing these issues will be discussed, as well as some thoughts on where to look for as-yet-undiscovered vulnerabilities and how to reduce the risk they pose. Finally, some general advice for securing Cray and other HPC systems will be provided. Best of Times, Worst of Times: A Cautionary Tale of Vulnerability Handling Aaron Scantlin (National Energy Research Scientific Computing Center) Abstract Vulnerabilities are an unfortunate reality in modern computing. While there's plenty of discussion around the importance of detection and patching in HPC, there's not as much chatter about how to handle vulnerabilities discovered at one's own institution. While this process is currently ad hoc and depends in large part on the maintainers of the software in question, one HPC center's recent experience with the discovery of a critical vulnerability in Lustre within COS, reporting that vulnerability to HPE, and the subsequent handling of that information by both groups suggests that there's room for improvement in a variety of areas on both sides. 
In this presentation, NERSC Security will: AIOps Empowered: Failure Prediction in System Management Software Tools Deepak Nanjundaiah and Subrahmanya Vinayak Joshi (HPE) Abstract In the realm of High-Performance Computing (HPC), our project addresses the escalating challenge of failures, particularly with the anticipated surge in complexity of Exascale systems. Our innovative Semi-Supervised Failure Prediction service, applicable to various system management software tools, utilizes deep learning on telemetry data for real-time, end-to-end failure prediction. Our approach centers on deep learning models proficient in deciphering intricate patterns within extensive datasets. From data acquisition to prediction, our solution integrates seamlessly with system management software, analyzing critical metrics like CPU usage, memory status, and network activity. By learning from historical data, the model distinguishes between normal and failure states, providing real-time predictions before potential failures. With a semi-supervised learning approach using both labeled and unlabeled data, our model adapts effectively to diverse failure scenarios. Integrated with the AIOps service in system management tools, it offers organizations a proactive edge, enabling early intervention for cost reduction, minimized downtime, and improved data center efficiency. In our upcoming presentation, we will delve into a concise results overview, showcasing the benefits of our approach in enhancing predictive capabilities and providing organizations with strategic advantages in system management. This functionality is being considered for future inclusion in HPE system management products. Presentation, Paper Technical Session 6B Chair: Paul L. Peltz Jr. 
(Oak Ridge National Laboratory) POD: Reconfiguring Compute and Storage Resources Between Cray EX Systems Eric Roman and Tina Declerck (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Sean Lynn (Hewlett Packard Enterprise) Abstract This paper describes how a set of liquid-cooled compute and air-cooled storage resources can be reconfigured between Cray EX systems. In each configuration, the resources function as a native set of managed nodes and/or directly attached storage resources on the associated host system. In a normal production configuration, for example, users can run compute jobs on the reconfigured compute nodes or read and write data to the corresponding filesystems. To administrators, these compute nodes are managed like conventional compute nodes, and the storage is managed by the Neo software stack. The Slingshot network is connected to each of the possible host systems, and the systems management networks interconnect via a layer 3 EVPN VXLAN tunnel. NERSC has implemented this reconfigurable architecture on a set of liquid-cooled and air-cooled resources termed POD. POD resources have been successfully transitioned between NERSC's development, staging, and production systems and are currently configured and in use on NERSC's flagship system, Perlmutter. Zero Downtime System Upgrade Strategy Alden Stradling and Joshi Fullop (Los Alamos National Laboratory) Abstract In a perfect world, HPC system downtime would be easy to minimize: just keep a perfect copy of the production cluster to prevent scaling surprises. Multitenancy on HPE Cray EX: network segmentation and isolation Chris Gamboni (Swiss National Supercomputing Centre) Abstract The Swiss National Supercomputing Centre (CSCS) has developed a strategy to provide Infrastructure as a Service (IaaS) to select customers co-investing in the Alps infrastructure. 
A critical component of this service is the ability to segregate and isolate the Alps network for various IaaS tenants, enabling them to integrate their own site networks. This necessitates the management of network multitenancy capabilities within the infrastructure. The Alps system, powered by HPE Cray EX machines and managed through Cray System Management (CSM) with the Slingshot interconnect, utilizes VLANs for node segmentation at the High-Speed Network (HSN) level. Implementing network multitenancy with CSM requires a novel configuration approach in the node management network. Presentation, Paper Technical Session 6A Chair: Chris Fuson (Oak Ridge National Laboratory) Unification of Alerting Engines for Monitoring in System Management Raghul Vasudevan, Ambresh Gupta, and Sinchana Karnik (Hewlett Packard) Abstract Unified alerting provides a single interface or platform in system management for creating and managing alerts for the various components within HPC systems. The system management monitoring stack collects different types of telemetry from various system components and stores events and logs in OpenSearch and metrics in Timescale. HPE Cray EX255a Telemetry - Improved Configurability and Performance Sean Byland, Steven Martin, and Brian Collum (HPE) Abstract The new HPE Cray EX255a blade (two nodes, each with 4 AMD MI300A sockets) has power demands and additional sensors that require a more robust power subsystem and data collection capability. This necessitated careful evaluation and changes to how we manage, collect, and publish sensor data. We factored all sensor access parameters and operational characteristics for the EX255a node cards into a standardized file format. These files define default values for compilation. The files can be edited and unmarshalled into runtime-accessible structures, enabling testing, tuning, and experimentation with alternative settings. 
Starting at the hardware access layer and working up the stack, we optimized the code paths to enable collection of more sensors within our fixed time budget. This work is the foundation for future work that could enable higher-level management and monitoring software to customize data collection on behalf of users. Best Practices for deployment of LDMS on the HPE Cray EX platform James Brandt, Kevin Stroup, and Ann Gentile (Sandia National Laboratories) Abstract The Lightweight Distributed Metric Service (LDMS) has been deployed on some of the largest Cray systems over the past decade to enable low-overhead capture of system and application metrics of interest. LDMS has evolved over time to provide new capabilities and associated configuration options to address the ever-increasing size and heterogeneity of HPC systems. In the last quarter of 2023, a working group was formed to formalize “best practices” for LDMS deployment on large-scale HPC systems as well as to help guide future configuration management approaches and mechanisms in the LDMS open-source development project. We present the results from this working group as they apply to base-level configurations of samplers and aggregators, authentication mechanisms, and practices to simplify deployment, including use of pre-built Docker containers. For those interested in automated aggregator load balancing and resilience to host failure, we describe capabilities of the LDMS distributed configuration manager (Maestro). Finally, we present planned extensions and the capabilities they provide. Presentation, Paper Technical Session 6C Chair: Bilel Hadri (KAUST Supercomputing Lab) ClusterStor Tiering, Overview, Setup, and Performance Nathan Rutman (Hewlett Packard) Abstract ClusterStor Tiering is a suite of software features designed to enhance the usability and management of hybrid storage systems, combining both flash and disk components.
Specifically crafted for monitoring and maintaining file layouts and free space on E1000 flash and disk tiers, Tiering offers a range of customizable capabilities through data management policies. Administrators can tailor fine-grained indexing controls, orchestrate file migrations between Object Storage Targets (OSTs) or pools, execute restriping processes, perform purges, and generate reports. These actions are intelligently triggered by preset timers or dynamically in response to system conditions, such as reaching capacity thresholds. Leveraging a scale-out architecture, Tiering efficiently handles the movement of large data volumes. Tiering uses the System Management Unit (SMU) for all functions; additional data mover nodes can be configured to augment throughput. Key functionalities supported by Tiering include scalable search, transparent tiering, parallel data movers, data purging, and reporting. Exploring new software-defined storage technology using VAST on Cray EX systems Mark Klein, Chris Gamboni, Gennaro Oliva, and Salvatore Di Nardo (Swiss National Supercomputing Centre, ETH Zurich); Maria Gutierrez (VAST Data); and Riccardo Di Maria and Miguel Gila (Swiss National Supercomputing Centre, ETH Zurich) Abstract Alps is the Swiss National Supercomputing Centre's multi-tenant software-defined infrastructure. This paper describes the configuration and experiences of getting VAST working as a performant filesystem option on the HPE Cray EX line of supercomputers and highlights the possibility of attaching additional storage options over the edge routers of these systems. Reducing Mean Time to Resolution (MTTR) for complex HPC-based systems with next generation automated service tools Michael Cush (HPE) Abstract After years of experience with Cray’s System Snapshot Analyzer (SSA), the HPC Call Home team worked to develop a new, more flexible, scalable, open, and secure Call Home infrastructure to support our future HPC products.
Becoming part of HPE allowed us to take advantage of and include HPE’s highly secure Remote Data Access (RDA) capabilities as part of that new infrastructure. A key design point was to make the new product useful even for sites that are not typically uploading data – which sounds rather odd for a “call home” tool set. Other points were the maintenance of a pluggable and highly configurable collection framework partnered with an efficient storage methodology. This paper will discuss the design and highlight where enhancements were made. Example collection plugins will be reviewed. Finally, the paper will seek to answer the question, “So why should I run SDU?” Presentation, Paper Technical Session 7A Chair: John Holmen (Oak Ridge National Laboratory) Proactive Precision: Enhancing High-Performance Computing with Early Job Failure Detection Dipanwita Mallick, Siddhi Potdar, Saptashwa Mitra, Nithin Mohan, and Charlie Vollmer (Hewlett Packard Enterprise) Abstract In the high-performance computing (HPC) realm, swiftly identifying job failures is critical to optimize resource allocation and ensure system efficiency. Given the high costs and extensive resource demands of HPC systems, the impact of job failures, particularly post-resource allocation, is significant. These failures, especially damaging in time-sensitive research domains, can derail progress and obstruct objectives. Proactive failure detection allows administrators to quickly enact corrective actions, like job resubmission or reconfiguration, reducing downtime and enhancing user satisfaction. Our approach includes predicting job failures at the initial stages, analyzing failure causes, and developing preventive strategies. By implementing a robust data collection process within the HPC system and utilizing the Slurm workload manager, we have streamlined the data handling procedures.
Our methodology involves data preprocessing, feature engineering, and using machine learning models optimized with cross-validation, addressing class imbalances, and focusing on precision, recall, and F1-score metrics. This thorough approach aims to improve resource optimization and prevent future inefficiencies in HPC systems. Presentation, Paper Technical Session 8B Chair: Raj Gautam (ExxonMobil) Using HPE-Provided Resources to Integrate HPE Support into Internal Incident Management John Gann, Daniel Gens, and Elizabeth Bautista (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract High Performance Computing (HPC) centers need to streamline incident management workflows while keeping information synchronized between internal tickets and vendor support cases. Prior to HPE’s acquisition of Cray, NERSC created an integration between their ServiceNow incident management platform and the Crayport platform. This integration became obsolete once HPE took over Cray, and NERSC staff had no choice other than to input information manually every time a new incident was opened or required updating. Further, this manual entry needed to be performed in both ServiceNow and HPE’s platform. Presentation, Paper Technical Session 8A Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) Optimizing I/O Patterns to Speed up Non-contiguous Data Retrieval and Analyses Scott Klasky, Qian Gong, and Norbert Podhorszki (Oak Ridge National Laboratory) Abstract Scientific applications running on exascale supercomputers generate massive data requiring efficient storage for future analysis. While simulations leverage thousands of nodes for writing, reading and analysis typically utilize limited resources, i.e., a handful of nodes.
As a result, users commonly query only a particular plane of a multidimensional array or read data from files in strides, hoping that the overall I/O and data processing cost is reduced. This presentation delves into these non-contiguous and striding I/O patterns commonly employed in scientific data analyses. Using visualization as an example, we reveal the detrimental impact of non-contiguous file access on overall throughput, counteracting the speedup gained from analyses with reduced data volume. Recognizing the pattern of scientific data access – primarily written once and read frequently – we propose to refactor data at the time of writing into a format that leads to efficient retrieval. We investigate several data organization and refactoring strategies, assessing their impact on reading performance, writing performance, and error incurred on post-analysis across several commonly used query and post-analysis tasks. Our experiments are conducted on the Frontier supercomputer at Oak Ridge National Laboratory, providing insights for optimizing I/O operations in the exascale computing era. Presentation, Paper Technical Session 8C Chair: Jim Rogers (Oak Ridge National Laboratory) Building LDMS Slingshot Switch Samplers Kevin Stroup, Cory Lueninghoener, Jim Brandt, and Ann Gentile (Sandia National Laboratories) Abstract The Lightweight Distributed Metric Service (LDMS) is widely used for monitoring HPC systems and is integrated in HPE’s Cray System Management architecture as well as HPE’s High Performance Cluster Management architecture. One of the important components of an HPC system to monitor is the high-speed interconnect. In the case of the HPE Cray EX family of systems, that interconnect is the Slingshot high-speed network. LDMS utilizes “samplers” to gather data about network metrics, including some metrics that can only be determined by a sampler running on the Slingshot switches.
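For readers unfamiliar with how LDMS samplers are wired up, a sampler plugin is loaded, configured, and started through ldmsd directives. The fragment below is a minimal sketch in LDMS v4 configuration syntax; the host name, component id, interval, and the choice of the meminfo plugin are illustrative assumptions, not taken from the abstracts above:

```
# Load, configure, and start a sampler plugin on a node-level ldmsd
load name=meminfo
config name=meminfo producer=nid000001 instance=nid000001/meminfo component_id=1
start name=meminfo interval=1000000 offset=0   # sample every 1 s
```

An aggregator-level ldmsd would then add this daemon as a producer and store the collected metric sets; switch-resident samplers such as those described above follow the same plugin model.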
Tutorial Tutorial 1B Image Deployment and System Monitoring with HPCM Peter Guyan, Sue Miller, Andy Warner, and Raghul Vasudevan (HPE) Abstract Creating an image combining the components needed for an HPE cluster compute node on HPCM can be a daunting task. This tutorial will show how a recipe is curated, documented, created, and deployed for a GPU-enabled compute node. The image needed to perform user tasks requires the base OS, HPE Cray Supercomputing Programming Environment software, Slingshot host software, OS updates, GPU drivers, and a workload manager. Tutorial Tutorial 1A Supercomputer Affinity on HPE Systems Edgar A. Leon and Jane E. Herriman (Lawrence Livermore National Laboratory) Abstract When we consider the grand challenges addressed by supercomputing, we likely imagine large machines, like Lawrence Livermore National Laboratory's El Capitan or Oak Ridge National Laboratory's Frontier, and parallel applications that can leverage those machines. Yet, these two pillars of HPC, HPC hardware and HPC software, are not enough to ensure excellent application performance. When unaware of the topology of the underlying hardware, even well-designed software applications can fail to achieve full performance on top-notch systems. Affinity, how software maps to and leverages local hardware resources, forms a third pillar critical to HPC. Tutorial Tutorial 1C Omnitools: Performance Analysis Tools for AMD GPUs Samuel Antao (AMD) Abstract The top entries of the TOP500 list feature systems enabled with AMD Instinct GPUs, including the world's and Europe's fastest supercomputers, Frontier and LUMI, respectively. As these systems are already in production, application teams require the ability to profile applications to ascertain performance. To enable this, AMD released two new profiling tools in 2022: Omnitrace and Omniperf.
These tools are the result of close collaborations between AMD development teams and computational scientists aimed at unpicking performance bottlenecks in applications and identifying improvement strategies. Omnitrace targets end-to-end application performance, generating timelines that cover MPI, OpenMP, Kokkos, Python, etc. It enables the developer to identify relevant hardware counters to collect and generate information on performance-limiting kernels. Omniperf can then be used to seek further insight into these kernels through roofline analysis, memory chart analysis, and read-outs of many metrics including cache access, GPU utilization, and speed-of-light analysis. In this tutorial, we will present advanced features of these tools, with live demonstrations, and provide numerous hands-on examples for attendees to identify and mitigate bottlenecks in scientific and machine learning applications running on AMD GPUs. We will present the latest developments of the profiling tools along with examples and their relation to the hardware counters. Tutorial Tutorial 1B Continued Image Deployment and System Monitoring with HPCM Peter Guyan, Sue Miller, Andy Warner, and Raghul Vasudevan (HPE) Abstract Creating an image combining the components needed for an HPE cluster compute node on HPCM can be a daunting task. This tutorial will show how a recipe is curated, documented, created, and deployed for a GPU-enabled compute node. The image needed to perform user tasks requires the base OS, HPE Cray Supercomputing Programming Environment software, Slingshot host software, OS updates, GPU drivers, and a workload manager. Tutorial Tutorial 1A Continued Supercomputer Affinity on HPE Systems Edgar A. Leon and Jane E.
Herriman (Lawrence Livermore National Laboratory) Abstract When we consider the grand challenges addressed by supercomputing, we likely imagine large machines, like Lawrence Livermore National Laboratory's El Capitan or Oak Ridge National Laboratory's Frontier, and parallel applications that can leverage those machines. Yet, these two pillars of HPC, HPC hardware and HPC software, are not enough to ensure excellent application performance. When unaware of the topology of the underlying hardware, even well-designed software applications can fail to achieve full performance on top-notch systems. Affinity, how software maps to and leverages local hardware resources, forms a third pillar critical to HPC. Tutorial Tutorial 1C Continued Omnitools: Performance Analysis Tools for AMD GPUs Samuel Antao (AMD) Abstract The top entries of the TOP500 list feature systems enabled with AMD Instinct GPUs, including the world's and Europe's fastest supercomputers, Frontier and LUMI, respectively. As these systems are already in production, application teams require the ability to profile applications to ascertain performance. To enable this, AMD released two new profiling tools in 2022: Omnitrace and Omniperf. These tools are the result of close collaborations between AMD development teams and computational scientists aimed at unpicking performance bottlenecks in applications and identifying improvement strategies. Omnitrace targets end-to-end application performance, generating timelines that cover MPI, OpenMP, Kokkos, Python, etc. It enables the developer to identify relevant hardware counters to collect and generate information on performance-limiting kernels. Omniperf can then be used to seek further insight into these kernels through roofline analysis, memory chart analysis, and read-outs of many metrics including cache access, GPU utilization, and speed-of-light analysis.
In this tutorial, we will present advanced features of these tools, with live demonstrations, and provide numerous hands-on examples for attendees to identify and mitigate bottlenecks in scientific and machine learning applications running on AMD GPUs. We will present the latest developments of the profiling tools along with examples and their relation to the hardware counters. Tutorial Tutorial 2B Automated Inspection of C/C++/Fortran Code Using Codee for Performance Optimization on HPE/Cray Manuel Arenaz (Codee - Appentra Solutions) Abstract Codee is a suite of software development tools that helps improve the performance of C/C++/Fortran applications, providing a systematic, more predictable approach that leverages parallel programming best practices. Codee Static Code Analyzer provides a systematic, predictable approach to enforce C/C++/Fortran performance optimization best practices for the target environment: hardware, compiler, and operating system. It provides innovative Coding Assistant capabilities to enable semi-automatic source code rewriting, inserting OpenMP or OpenACC directives in your codes to run on CPUs or offload to accelerator devices such as GPUs, so that novice programmers can write code at the expert level. Codee provides integrations with IDEs and CI/CD frameworks to make it possible to Shift Left Performance. In this tutorial the participants will be introduced to Codee and to the first Open Catalog of Best Practices for Performance, using short demos and hands-on exercises with step-by-step guides for HPE/Cray systems like Perlmutter. The participants will learn to use the Codee tools, starting with simple, well-known kernels and quickly jumping to large HPC codes like WRF.
Tutorial Tutorial 2A Monitoring, Tuning, and Troubleshooting a CSM system Harold Longley and Jason Sollom (Hewlett Packard Enterprise) Abstract Once all the software has been installed, there are many tools available on a CSM-based HPE Cray EX system to monitor, test, and alert for proper health and operation of the system, tune the system software for a specific workload or a diverse job mixture, and troubleshoot any problems which arise. This tutorial will provide exposure to these tools and describe how to use them to tackle problems such as managed node boot failures, detecting and reacting to service unhealthiness related to storage or memory issues on management nodes, performance variability during job execution, finding nodes which underperform for CPU or network speed compared to other nodes, detecting faulty hardware on its way toward failure and after it fails, and other problems. There are some housekeeping activities which should be utilized to ensure continued system health. Don’t forget to run the tools to check and monitor system health. Differences in the toolset for the software stacks released with CSM 1.3, 1.4, and 1.5 systems will be highlighted while discussing these topics. Tutorial Tutorial 2C MGARD & ADIOS-2: A framework for extreme scale I/O with online data reduction Scott Klasky, Qian Gong, and Norbert Podhorszki (ORNL) Abstract This full-day tutorial provides a comprehensive introduction to the critical components of building a complex scientific workflow, encompassing storage I/O, data compression, in situ data processing, remote data access, and visualization. Through real-world applications and live demonstrations, attendees will learn how these tools enhance storage and I/O and enable the creation of streaming and file-coupled workflows with ease. The tutorial will also showcase utilizing these tools for remote data access over a wide-area network and conducting local analysis.
The day will conclude with an integrated workflow of simulation, data compression, analysis, and visualization, showcasing configurations for file-based or in-situ analysis workflows. Tutorial Tutorial 2B Continued Automated Inspection of C/C++/Fortran Code Using Codee for Performance Optimization on HPE/Cray Manuel Arenaz (Codee - Appentra Solutions) Abstract Codee is a suite of software development tools that helps improve the performance of C/C++/Fortran applications, providing a systematic, more predictable approach that leverages parallel programming best practices. Codee Static Code Analyzer provides a systematic, predictable approach to enforce C/C++/Fortran performance optimization best practices for the target environment: hardware, compiler, and operating system. It provides innovative Coding Assistant capabilities to enable semi-automatic source code rewriting, inserting OpenMP or OpenACC directives in your codes to run on CPUs or offload to accelerator devices such as GPUs, so that novice programmers can write code at the expert level. Codee provides integrations with IDEs and CI/CD frameworks to make it possible to Shift Left Performance. In this tutorial the participants will be introduced to Codee and to the first Open Catalog of Best Practices for Performance, using short demos and hands-on exercises with step-by-step guides for HPE/Cray systems like Perlmutter. The participants will learn to use the Codee tools, starting with simple, well-known kernels and quickly jumping to large HPC codes like WRF.
Tutorial Tutorial 2A Continued Monitoring, Tuning, and Troubleshooting a CSM system Harold Longley and Jason Sollom (Hewlett Packard Enterprise) Abstract Once all the software has been installed, there are many tools available on a CSM-based HPE Cray EX system to monitor, test, and alert for proper health and operation of the system, tune the system software for a specific workload or a diverse job mixture, and troubleshoot any problems which arise. This tutorial will provide exposure to these tools and describe how to use them to tackle problems such as managed node boot failures, detecting and reacting to service unhealthiness related to storage or memory issues on management nodes, performance variability during job execution, finding nodes which underperform for CPU or network speed compared to other nodes, detecting faulty hardware on its way toward failure and after it fails, and other problems. There are some housekeeping activities which should be utilized to ensure continued system health. Don’t forget to run the tools to check and monitor system health. Differences in the toolset for the software stacks released with CSM 1.3, 1.4, and 1.5 systems will be highlighted while discussing these topics. Tutorial Tutorial 2C Continued MGARD & ADIOS-2: A framework for extreme scale I/O with online data reduction Scott Klasky, Qian Gong, and Norbert Podhorszki (ORNL) Abstract This full-day tutorial provides a comprehensive introduction to the critical components of building a complex scientific workflow, encompassing storage I/O, data compression, in situ data processing, remote data access, and visualization. Through real-world applications and live demonstrations, attendees will learn how these tools enhance storage and I/O and enable the creation of streaming and file-coupled workflows with ease. The tutorial will also showcase utilizing these tools for remote data access over a wide-area network and conducting local analysis.
The day will conclude with an integrated workflow of simulation, data compression, analysis, and visualization, showcasing configurations for file-based or in-situ analysis workflows. Tutorial Lightning Tutorial 7B Data Science Beyond the Laptop: Handling Data of Any Size with Arkouda Ben McDonald and Michelle Strout (HPE) Abstract Attendees of this tutorial will learn how to perform data analysis at scale using Arkouda and gain an understanding of how distributed computing can benefit their work. Tutorial Lightning Tutorial 7C Exploring high performance object storage using DAOS Adrian Jackson (EPCC, The University of Edinburgh) Abstract Recently we have seen a change in the diversity of applications utilising high performance computing (HPC), from primarily computational simulation approaches to a more varied application mix including machine learning and data analytics. With this diversification in workloads, there has also been a diversification in I/O patterns: the movements in, and requirements on, data storage and access. Data storage technologies in HPC have long been optimised for large-scale bulk operations focussed on high-bandwidth/low-metadata operations. However, many applications now exhibit non-optimal I/O patterns for large-scale parallel filesystems, with large amounts of small I/O operations, non-contiguous data access, and increases in read as well as write I/O loads. XTreme (Approved NDA Members Only)
Birds of a Feather Programming Environments, Applications, and Documentation (PEAD) Break Coffee Break Break Coffee Break Break Coffee Break (sponsored by Altair) Break Coffee Break (sponsored by Linaro) Break Coffee Break (sponsored by SchedMD) Break Coffee Break (sponsored by Thinlinc) Break Coffee Break (sponsored by VAST) Break Coffee Break (sponsored by Pier Group) Break Coffee Break Break Coffee Break CUG Board CUG Board & Sponsors Lunch (closed) CUG sponsors (non-HPE) are invited to join the CUG Board for an informal lunch discussion. CUG Board HPE Executive Lunch (closed) HPE Executives and representatives are invited to join the CUG Board for an informal lunch discussion. CUG Board New CUG Board / Old CUG Board Lunch (closed) Newly elected board members are invited to an informal lunch with the prior CUG Board to discuss remaining activities for the week as well as future plans. Please bring food from the standard lunch buffet to the private area at the far end of the restaurant. CUG Program Committee Program Committee Dinner (invite only) Participants that helped with the reviews and program committee are invited to a private event.
6.30pm meet at the Beer Corner to be seated at 7pm in The Mark, which is within COMO The Treasury, https://statebuildings.com/functions/the-mark/. CUG Program Committee CUG Advisory Board Lunch Cabinet (closed) The CUG Advisory Board comprises chairs and liaisons from the special interest groups and program committee members. This session is typically led by the CUG Vice President to discuss the program, provide guidance to session chairs for the week, and receive feedback to improve processes and content for future events. CUG Program Committee CUG Advisory Board The CUG Advisory Board comprises chairs and liaisons from the special interest groups and program committee members. This session is typically led by the CUG Vice President to receive direct feedback from the conference and improve future events. Lunch Lunch (open to PEAD and XTreme participants) Lunch Lunch (sponsored by Nvidia) Lunch Lunch (sponsored by Codee) Lunch Lunch (sponsored by Nvidia) Lunch Lunch (sponsored by Codee) Networking/Social Event WHPC+ Australasia and AMD Diversity and Inclusion Breakfast Women in High Performance Computing Australasia (WHPC+) and AMD invite you to attend a community networking breakfast from 7:00 to 8:20am at the Westin Perth, in the Banksia Room. WHPC+ was created to promote diversity in the HPC industry by encouraging new people into the field and retaining those who are already here. This event is generously sponsored by AMD, who are very supportive of the Australasian chapter. This event is conveniently located in the beautiful Westin Perth hotel so that you can easily get to the first meeting session of the day at 8:30 am in Ballroom 2. Come along to meet and learn from others who are championing diversity and inclusion in HPC!
While this event is free to attend, numbers are capped so registration is required: https://pawsey.org.au/event/whpc-australasia-and-amd-diversity-and-inclusion-breakfast/ Presentation, Paper Technical Session 1B Chair: Jim Williams (Los Alamos National Laboratory) Presentation, Paper Technical Session 1A Chair: Lena M Lopatina (LANL) Presentation, Paper Technical Session 1C Chair: Chris Fuson (ORNL, Oak Ridge National Laboratory) Presentation, Paper Technical Session 2B Chair: Lena M Lopatina (LANL) Swordfish/Redfish and ClusterStor - Using Advanced Monitoring to Improve Insight into Complex I/O Workflows. Presentation, Paper Technical Session 2A Chair: Jim Rogers (Oak Ridge National Laboratory) Presentation, Paper Technical Session 2C Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory) Presentation, Paper Technical Session 3B Chair: Gabriel Hautreux (CINES) Presentation, Paper Technical Session 3A Chair: Bilel Hadri (KAUST Supercomputing Lab) Presentation, Paper Technical Session 3C Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) Presentation, Paper Technical Session 4B Chair: Brett Bode (National Center for Supercomputing Applications/University of Illinois, National Center for Supercomputing Applications) Presentation, Paper Technical Session 4A Chair: Lena M Lopatina (LANL) Presentation, Paper Technical Session 4C Chair: Gabriel Hautreux (CINES) From Chatbots to Interfaces: Diversifying the Application of Large Language Models for Enhanced Usability Presentation, Paper Technical Session 5B Chair: Adrian Jackson (EPCC, The University of Edinburgh) Presentation, Paper Technical Session 5A Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory) Presentation, Paper Technical Session 5C Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) Presentation, Paper Technical Session 6B Chair: Paul L. Peltz Jr. (Oak Ridge National Laboratory) Presentation, Paper Technical Session 6A Chair: Chris Fuson (ORNL, Oak Ridge National Laboratory) Presentation, Paper Technical Session 6C Chair: Bilel Hadri (KAUST Supercomputing Lab) Presentation, Paper Technical Session 7A Chair: John Holmen (Oak Ridge National Laboratory) Presentation, Paper Technical Session 8B Chair: Raj Gautam (ExxonMobil) Presentation, Paper Technical Session 8A Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) Presentation, Paper Technical Session 8C Chair: Jim Rogers (Oak Ridge National Laboratory) Plenary Plenary: Welcome, Keynote Plenary Plenary: CUG site, HPE update Plenary Plenary: CUG Board Updates (Open), CUG Elections, and Best papers Plenary Plenary: Sponsors talks, HPE 1-100 Plenary Plenary: CUG 2024, Invited speakers Plenary CUG 2024 Closing