CUG2024 Proceedings




Birds of a Feather
Birds of a Feather
Programming Environments, Applications, and Documentation (PEAD)
PEAD (Programming Environments, Applications, and Documentation) is a CUG Special Interest Group that provides a forum for discussion and information exchange between CUG sites and Cray/HPE. The group's focus includes system usability, performance of programming environments (including compilers, libraries, and tools), scientific applications running on Cray/HPE systems, user support, communication, and documentation. The group hosts meetings at CUG each year to help foster discussions surrounding these topics between HPE and member sites. Following a successful event at last year's CUG, this year the PEAD SIG will meet Sunday, May 5, from 1:00 PM to 5:00 PM. We are planning topics surrounding the HPE PE roadmap, training collaborations, HPE documentation, and Fortran support. All topics will be interactive and discussion-based. Registration for the event is required. Lunch will be available for everyone who registers for the meeting.
PEAD Introduction
Chris Fuson (Oak Ridge National Laboratory)
Abstract
HPE Fortran Support
Bill Long (HPE)
Abstract
HPE/Cray CPE Roadmap
Barbara Chapman (HPE)
Abstract
Programming Environment Management BoF
Nick Hagerty (Oak Ridge National Laboratory) and David Carlson (Stony Brook University)
Abstract
Birds of a Feather
Programming Environments, Applications, and Documentation (PEAD)
HPE Documentation and Training Updates
Peggy Sanchez (HPE)
Abstract
Collaborative Development of HPC Training Materials
Lipi Gupta (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Ann Backhaus (Pawsey Supercomputing Research Centre), and Jane Herriman (Lawrence Livermore National Laboratory)
Abstract
pdf
HPE User Engagement Survey
Barbara Chapman, Peggy Sanchez, and Kaylie Anderson (HPE)
Abstract
Open Discussions
Chris Fuson (Oak Ridge National Laboratory)
Abstract
Birds of a Feather
BoF 1B
High Performance Data-centre Digital Twins
Matthias Maiterth (Oak Ridge National Laboratory); Tim Dykes and Jess Jones (HPE HPC/AI EMEA Research Lab); Adrian Jackson and Michele Weiland (EPCC, The University of Edinburgh); and Wes Brewer (Oak Ridge National Laboratory)
Abstract
pdf
Birds of a Feather
BoF 1A
OpenCHAMI for collaborators and the collaborator-curious
Travis Cotton and Alex Lovell-Troy (Los Alamos National Laboratory)
Abstract
pdf
Birds of a Feather
BoF 1C
HPE Slingshot Birds of a Feather
Jesse Treger (HPE)
Abstract
Birds of a Feather
BoF 2B
Bird of Feather on Artificial Intelligence and Machine Learning for HPC Workload Analysis (AIMLHPCWorkload2024)
Kadidia Konate and Richard Gerber (Lawrence Berkeley National Laboratory)
Abstract
Birds of a Feather
BoF 2A
2024 HPC Testathon: Experiences and Results
Veronica Melesse Vergara (Oak Ridge National Laboratory), Bilel Hadri (King Abdullah University of Science and Technology), and Maciej Cytowski (Pawsey Supercomputing Research Centre)
Abstract
Birds of a Feather
BoF 2C
Architecting a Cloud-based Supercomputing as-a-Service Solution
Pete Mendygral and Kirti Devi (Hewlett Packard Enterprise)
Abstract
Birds of a Feather
BoF 3A
System Monitoring Working Group
Craig West (BOM) and Stephen Leak (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract

Break
Break
Coffee Break
Break
Coffee Break
Break
Coffee Break (sponsored by Altair)
Break
Coffee Break (sponsored by Linaro)
Break
Coffee Break (sponsored by SchedMD)
Break
Coffee Break (sponsored by Thinlinc)
Break
Coffee Break (sponsored by VAST)
Break
Coffee Break (sponsored by Pier Group)
Break
Coffee Break
Break
Coffee Break

CUG Board
CUG Board
CUG Board & Sponsors Lunch (closed)
CUG sponsors (non-HPE) are invited to join the CUG Board for an informal lunch discussion.
CUG Board
HPE Executive Lunch (closed)
HPE Executives and representatives are invited to join the CUG Board for an informal lunch discussion.
CUG Board
New CUG Board / Old CUG Board Lunch (closed)
Newly elected board members are invited to an informal lunch with the prior CUG Board to discuss remaining activities for the week as well as future plans. Please bring food from the standard lunch buffet to the private area at the far end of the restaurant.

CUG Program Committee
CUG Program Committee
Program Committee Dinner (invite only)
Participants who helped with the reviews and the program committee are invited to a private event. Meet at 6:30pm at the Beer Corner to be seated at 7pm in The Mark, which is within COMO The Treasury, https://statebuildings.com/functions/the-mark/.
CUG Program Committee
CUG Advisory Board Lunch Cabinet (closed)
The CUG Advisory Board is comprised of chairs and liaisons from the special interest groups and program committee members. This session is typically led by the CUG Vice President to discuss the program, provide guidance to session chairs for the week, and receive feedback to improve processes and content for future events.
CUG Program Committee
CUG Advisory Board
The CUG Advisory Board is comprised of chairs and liaisons from the special interest groups and program committee members. This session is typically led by the CUG Vice President to receive direct feedback from the conference and improve future events.

Lunch
Lunch
Lunch (open to PEAD and XTreme participants)
Lunch
Lunch (sponsored by Nvidia)
Lunch
Lunch (sponsored by Codee)
Lunch
Lunch (sponsored by Nvidia)
Lunch
Lunch (sponsored by Codee)

Networking/Social Event
Networking/Social Event
WHPC+ Australasia and AMD Diversity and Inclusion Breakfast
Women in High Performance Computing Australasia (WHPC+) and AMD invite you to attend a community networking breakfast from 7:00 to 8:20 am at the Westin Perth, in the Banksia Room. WHPC+ was created to promote diversity in the HPC industry by encouraging new people into the field and retaining those who are already here. This event is generously sponsored by AMD, who are very supportive of the Australasian chapter. This event is conveniently located in the beautiful Westin Perth hotel so that you can easily get to the first meeting session of the day at 8:30 am in Ballroom 2. Come along to meet and learn from others who are championing diversity and inclusion in HPC! While this event is free to attend, numbers are capped, so registration is required: https://pawsey.org.au/event/whpc-australasia-and-amd-diversity-and-inclusion-breakfast/

Papers
Presentation, Paper
Technical Session 1B
Chair: Jim Williams (Los Alamos National Laboratory)
Enhancing HPC Service Management on Alps using FirecREST API
Juan Pablo Dorsch, Andreas Fink, Eirini Koutsaniti, and Rafael Sarmiento (Swiss National Supercomputing Centre)
Abstract
pdf, pdf
Automated Hardware-Aware Node Selection for Cluster Computing
Manuel Sopena Ballesteros, Miguel Gila, Matteo Chesi, and Mark Klein (Swiss National Supercomputing Centre, ETH Zurich)
Abstract
pdf, pdf
Versatile Software-defined Cluster on Cray HPE EX Systems
Maxime Martinasso, Mark Klein, Benjamin Cumming, Miguel Gila, and Felipe Cruz (Swiss National Supercomputing Centre, ETH Zurich)
Abstract
pdf
Presentation, Paper
Technical Session 1A
Chair: Lena M Lopatina (LANL)
CPE Updates
Barbara Chapman (HPE)
Abstract
A Deep Dive Into NVIDIA's HPC Software
Jeff Larkin and Becca Zandstein (NVIDIA)
Abstract
Slurm 24.05 and Beyond
Tim Wickberg (SchedMD LLC)
Abstract
pdf
Presentation, Paper
Technical Session 1C
Chair: Chris Fuson (ORNL, Oak Ridge National Laboratory)
Towards the Development of an Exascale Network Digital Twin
John Holmen (Oak Ridge National Laboratory); Md Nahid Newaz (Oakland University); and Srikanth Yoginath, Matthias Maiterth, Amir Shehata, Nick Hagerty, Christopher Zimmer, and Wesley Brewer (Oak Ridge National Laboratory)
Abstract
pdf, pdf
A Performance Deep Dive into HPC-AI Workflows with Digital Twins
Ana Gainaru (Oak Ridge National Laboratory); Greg Eisenhauer (Georgia Institute of Technology); and Fred Suter, Norbert Podhorszki, and Scott Klasky (Oak Ridge National Laboratory)
Abstract
pdf
Optimizing Checkpoint-Restart Mechanisms for HPC with DMTCP in Containers at NERSC
Madan Timalsina, Lisa Gerhardt, Johannes Blaschke, Nicholas Tyler, and William Arndt (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 2B
Chair: Lena M Lopatina (LANL)
EMOI: CSCS Extensible Monitoring and Observability Infrastructure
Massimo Benini (CSCS); Jeff Hanson (HPE); and Dino Conciatore, Gianni Mario Ricciardi, Michele Brambilla, Monica Frisoni, Mathilde Gianolli, Gianna Marano, and Jean-Guillaume Piccinali (CSCS)
Abstract
pdf, pdf
Swordfish/Redfish and ClusterStor - Using Advanced Monitoring to Improve Insight into Complex I/O Workflows.
Torben Kling Petersen, Tim Morneau, Dan Matthews, and Nathan Rutman (HPE)
Abstract
pdf
CADDY: Scalable Summarizations over Voluminous Telemetry Data for Efficient Monitoring
Saptashwa Mitra, Scott Ragland, Vanessa Zambrano, Dipanwita Mallick, Charlie Vollmer, Lance Kelley, and Nithin Singh Mohan (Hewlett Packard Enterprise)
Abstract
pdf, pdf
Command Lines vs. Requested Resources: How Well Do They Align?
Ben Fulton, Abhinav Thota, Scott Michael, and Jefferson Davis (Indiana University)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 2A
Chair: Jim Rogers (Oak Ridge National Laboratory)
Updated Node Power Management For New HPE Cray EX255a and EX254n Blades
Brian Collum and Steven Martin (Hewlett Packard Enterprise)
Abstract
pdf
HPE Cray EX Power Monitoring Counters
Steven Martin, Brian Collum, and Sean Byland (HPE)
Abstract
pdf
First Analysis on Cooling Temperature Impacts on MI250x Exascale Nodes (HPE Cray EX235A)
Torsten Wilde (HPE), Michael Ott (LRZ), and Pete Guyan (HPE)
Abstract
pdf
EVeREST: An Effective and Versatile Runtime Energy Saving Tool
Anna Yue (Hewlett Packard Enterprise, University of Minnesota) and Sanyam Mehta and Torsten Wilde (Hewlett Packard Enterprise)
Abstract
pdf
Presentation, Paper
Technical Session 2C
Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory)
Optimising the Processing and Storage of Radio Astronomy Data
Alexander Williamson (International Centre for Radio Astronomy Research, University of Western Australia); Pascal Elahi (Pawsey Supercomputing Research Centre); Richard Dodson and Jonghwan Rhee (International Centre for Radio Astronomy Research, University of Western Australia); and Qian Gong (Oak Ridge National Laboratory)
Abstract
pdf
Performance and scaling of the LFRic weather and climate model on different generations of HPE Cray EX supercomputers
J. Mark Bull (EPCC, The University of Edinburgh); Andrew Coughtrie (Met Office, UK); Deva Deeptimahanti (Pawsey Supercomputing Research Centre); Mark Hedley (Met Office, UK); Caoimhin Laoide-Kemp (EPCC, The University of Edinburgh); Christopher Maynard (Met Office); Harry Shepherd (Met Office, UK); Sebastiaan Van De Bund and Michele Weiland (EPCC, The University of Edinburgh); and Benjamin Went (Met Office, UK)
Abstract
pdf, pdf
Disaggregated memory in OpenSHMEM applications – Approach and Benefits
Clarete Crasta, Sharad Singhal, Faizan Barmawer, Ramesh Chaurasiya, Sajeesh KV, Dave Emberson, Harumi Kuno, and John Byrne (Hewlett Packard)
Abstract
pdf
Migrating Complex Workflows to the Exascale: Challenges for Radio Astronomy
Pascal Jahan Elahi (Pawsey Supercomputing Research Centre) and Matt Austin, Eric Bastholm, Paulus Lahur, Wasim Raja, Maxim Voronkov, Mark Wieringa, Matthew Whiting, Daniel Mitchell, and Stephen Ord (CSIRO)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 3B
Chair: Gabriel Hautreux (CINES)
Spack Based Production Programming Environments on Cray Shasta
Paul Ferrell and Timothy Goetsch (LANL)
Abstract
pdf, pdf
Containers-first user environments on HPE Cray EX
Felipe Cruz and Alberto Madonna (Swiss National Supercomputing Centre)
Abstract
pdf
Cloud-Native Slurm management on HPE Cray EX
Felipe A. Cruz, Manuel Sopena, and Guilherme Peretti-Pezzi (Swiss National Supercomputing Centre)
Abstract
Presentation, Paper
Technical Session 3A
Chair: Bilel Hadri (KAUST Supercomputing Lab)
Early Application Experiences on Aurora at ALCF: Moving From Petascale to Exascale Systems
Colleen Bertoni, JaeHyuk Kwack, Thomas Applencourt, Abhishek Bagusetty, Yasaman Ghadar, Brian Homerding, Christopher Knight, Ye Luo, Mathialakan Thavappiragasam, John Tramm, Esteban Rangel, Umesh Unnikrishnan, Timothy J. Williams, and Scott Parker (Argonne National Laboratory)
Abstract
pdf, pdf
Streaming Data in HPC Workflows Using ADIOS
Greg Eisenhauer (Georgia Institute of Technology); Norbert Podhorszki, Ana Gainaru, and Scott Klasky (Oak Ridge National Laboratory); Philip Davis and Manish Parashar (University of Utah); Matthew Wolf (Samsung SAIT); Eric Suchtya (Oak Ridge National Laboratory); Erick Fredj (Toga Networks, Jerusalem College of Technology); Vicente Bolea (Kitware, Inc); Franz Pöschel, Klaus Steiniger, and Michael Bussmann (Center for Advanced Systems Understanding); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf); and Sunita Chandrasekaran (University of Delaware)
Abstract
pdf, pdf
Enrichment and Acceleration of Edge to Exascale Computational Steering STEM Workflow using Common Metadata Framework
Gayathri Saranathan (Hewlett Packard Enterprise, Hewlett Packard Labs); Martin Foltin, Aalap Tripathy, and Annmary Justine (Hewlett Packard Enterprise); Ayana Ghosh, Maxim Ziatdinov, and Kevin Roccapriore (Oak Ridge National Laboratory); and Suparna Bhattacharya, Paolo Faraboschi, and Sreenivas Rangan Sukumaran (Hewlett Packard Enterprise)
Abstract
pdf
Presentation, Paper
Technical Session 3C
Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)
CSM-based Software Stack Overview 2024
Harold Longley and Jason Sollom (Hewlett Packard Enterprise)
Abstract
pdf
Overview of HPCM
Peter Guyan and Sue Miller (HPE)
Abstract
Seamless Cluster Migration in CSM
Miguel Gila and Manuel Sopena Ballesteros (Swiss National Supercomputing Centre)
Abstract
pdf
Presentation, Paper
Technical Session 4B
Chair: Brett Bode (National Center for Supercomputing Applications/University of Illinois, National Center for Supercomputing Applications)
Scalability and Performance of OFI and UCX on ARCHER2
Jaffery Irudayasamy, Juan F. R. Herrera, Evgenij Belikov, and Michael Bareford (EPCC, The University of Edinburgh)
Abstract
pdf
Using P4 for Cassini-3 Software Development Environment
Hardik Soni, Frank Zago, Khaled Diab, Igor Gorodetsky, and Puneet Sharma (HPE)
Abstract
Running NCCL and RCCL Applications on HPE Slingshot NIC
Jesse Treger and Caio Davi (HPE)
Abstract
Enabling NCCL on Slingshot 11 at NERSC
Jim Dinan (NVIDIA), Peter Harrington (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Igor Gorodetsky (HPE), Josh Romero (NVIDIA), Steven Farrell (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Ian Ziemba (HPE), and Wahid Bhimji and Shashank Subramanian (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract
Presentation, Paper
Technical Session 4A
Chair: Lena M Lopatina (LANL)
Multi-stage Approach for Identifying Defective Hardware in Frontier
Nick Hagerty (Oak Ridge National Laboratory), Andy Warner (HPE), and Jordan Webb (Oak Ridge National Laboratory)
Abstract
pdf
From Frontier to Framework: Enhancing Hardware Triage for Exascale Machines
Isa Wazirzada, Abhishek Mehta, and Vinanti Phadke (Hewlett Packard Enterprise)
Abstract
pdf, pdf
Full-stack Approach to HPC Testing
Pascal Jahan Elahi and Craig Meyer (Pawsey Supercomputing Research Centre)
Abstract
pdf, pdf
An Approach to Continuous Testing
Francine Lapid and Shivam Mehta (Los Alamos National Laboratory)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 4C
Chair: Gabriel Hautreux (CINES)
LLM Serving With Efficient KV-Cache Management Using Triggered Operations
Aditya Dhakal, Pedro Bruel, Gourav Rattihalli, Sai Rahul Chalamalasetti, and Dejan Milojicic (Hewlett Packard Enterprise, Hewlett Packard Labs)
Abstract
pdf, pdf
From Chatbots to Interfaces: Diversifying the Application of Large Language Models for Enhanced Usability
Jonathan Sparks, Pierre Carrier, and Gallig Renaud (Hewlett Packard Enterprise)
Abstract
pdf, pdf
Delivering Large Language Model Platforms With HPC
Laura Huber, Abhinav Thota, Scott Michael, and Jefferson Davis (Indiana University)
Abstract
pdf, pdf
System for Recommendation and Evaluation of Large Language Models for practical tasks in Science
Cong Xu, Tarun Kumar, Martin Foltin, Annmary Justine, Sergey Serebryakov, Arpit Shah, Agnieszka Ciborowska, Ashish Mishra, Gyanaranjan Nayak, Suparna Bhattacharya, and Paolo Faraboschi (Hewlett Packard Enterprise)
Abstract
Presentation, Paper
Technical Session 5B
Chair: Adrian Jackson (EPCC, The University of Edinburgh)
Leveraging GNU Parallel for Optimal Utilization of HPC Resources on Frontier and Perlmutter Supercomputers
Ketan Maheshwari (Oak Ridge National Laboratory), William Arndt (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), and Rafael Ferreira Da Silva (Oak Ridge National Laboratory)
Abstract
Portable Support for GPUs and Distributed-Memory Parallelism in Chapel
Andrew Stone and Engin Kayraklioglu (Hewlett Packard Enterprise)
Abstract
pdf
PaCER: Accelerating Science on Setonix
Maciej Cytowski and Ann Backhaus (Pawsey Supercomputing Research Centre) and Joseph Schoonover (Fluid Numerics)
Abstract
Presentation, Paper
Technical Session 5A
Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory)
Power and Performance analysis of GraceHopper superchips on HPE Cray EX systems
Benjamin Donald Cumming and Miguel Gila (CSCS), Brian Collum (HPE), Sebastian Keller (CSCS), and Bryan Villalon and Steven James Martin (HPE)
Abstract
Accelerating Scientific Workflows with the NVIDIA Grace Hopper Platform
Gabriel Noaje (NVIDIA)
Abstract
GROMACS on AMD GPU-Based HPC Platforms: Using SYCL for Performance and Portability
Andrey Alekseenko (KTH/SciLifeLab); Szilárd Páll (KTH/PDC); and Erik Lindahl (KTH, Stockholm University)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 5C
Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)
Cray EX Security Experiences
Ben Matthews (NCAR/UCAR)
Abstract
pdf
Best of Times, Worst of Times: A Cautionary Tale of Vulnerability Handling
Aaron Scantlin (National Energy Research Scientific Computing Center)
Abstract
AIOPS Empowered: Failure Prediction in System Management Software Tools
Deepak Nanjundaiah and Subrahmanya Vinayak Joshi (HPE)
Abstract
Presentation, Paper
Technical Session 6B
Chair: Paul L. Peltz Jr. (Oak Ridge National Laboratory)
POD: Reconfiguring Compute and Storage Resources Between Cray EX Systems
Eric Roman and Tina Declerck (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Sean Lynn (Hewlett Packard Enterprise)
Abstract
Zero Downtime System Upgrade Strategy
Alden Stradling and Joshi Fullop (Los Alamos National Laboratory)
Abstract
pdf, pdf
Multitenancy on HPE Cray EX: network segmentation and isolation
Chris Gamboni (Swiss National Supercomputing Centre)
Abstract
Presentation, Paper
Technical Session 6A
Chair: Chris Fuson (ORNL, Oak Ridge National Laboratory)
Unification of Alerting Engines for Monitoring in System Management
Raghul Vasudevan, Ambresh Gupta, and Sinchana Karnik (Hewlett Packard)
Abstract
HPE Cray EX255a Telemetry - Improved Configurability and Performance
Sean Byland, Steven Martin, and Brian Collum (HPE)
Abstract
Best Practices for deployment of LDMS on the HPE Cray EX platform
James Brandt, Kevin Stroup, and Ann Gentile (Sandia National Laboratories)
Abstract
Presentation, Paper
Technical Session 6C
Chair: Bilel Hadri (KAUST Supercomputing Lab)
ClusterStor Tiering, Overview, Setup, and Performance
Nathan Rutman (Hewlett Packard)
Abstract
pdf
Exploring new software-defined storage technology using VAST on Cray EX systems
Mark Klein, Chris Gamboni, Gennaro Oliva, and Salvatore Di Nardo (Swiss National Supercomputing Centre, ETH Zurich); Maria Gutierrez (VAST Data); and Riccardo Di Maria and Miguel Gila (Swiss National Supercomputing Centre, ETH Zurich)
Abstract
Reducing Mean Time to Resolution (MTTR) for complex HPC-based systems with next generation automated service tools.
Michael Cush (HPE)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 7A
Chair: John Holmen (Oak Ridge National Laboratory)
Proactive Precision: Enhancing High-Performance Computing with Early Job Failure Detection
Dipanwita Mallick, Siddhi Potdar, Saptashwa Mitra, Nithin Mohan, and Charlie Vollmer (Hewlett Packard Enterprise)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 8B
Chair: Raj Gautam (ExxonMobil)
Using HPE-Provided Resources to Integrate HPE Support into Internal Incident Management
John Gann, Daniel Gens, and Elizabeth Bautista (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 8A
Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)
Optimizing I/O Patterns to Speed up Non-contiguous Data Retrieval and Analyses
Scott Klasky, Qian Gong, and Norbert Podhorszki (Oak Ridge National Laboratory)
Abstract
Presentation, Paper
Technical Session 8C
Chair: Jim Rogers (Oak Ridge National Laboratory)
Building LDMS Slingshot Switch Samplers
Kevin Stroup, Cory Lueninghoener, Jim Brandt, and Ann Gentile (Sandia National Laboratories)
Abstract

Plenary
Plenary
Plenary: Welcome, Keynote
Opening
Ashley Barker (Oak Ridge National Laboratory)
Abstract
Welcome to Country Ceremony
Maciej Cytowski (Pawsey)
Abstract
Welcome from the CUG President
Ashley Barker (Oak Ridge National Laboratory)
Abstract
Talk by Dr. Sarah Pearce, SKA-Low Telescope Director
Sarah Pearce (SKA-Low Telescope)
Abstract
Convergence of Energy Efficient Scientific Computing and GenAI
Gabriel Noaje (NVIDIA)
Abstract
High Performance Remote Linux Desktops with ThinLinc
Robert Henschel and Aaron Sowry (Cendio AB)
Abstract
Unlocking Exascale Debugging and Performance Engineering with Linaro Forge
Marcin Krzysztofik (Linaro)
Abstract
Plenary
Plenary: CUG site, HPE update
Welcome by Pawsey
Mark Stickells (Pawsey Supercomputing Centre)
Abstract
HPE corporate update by Gerald Kleyn
Gerald Kleyn (HPE)
Abstract
Plenary
Plenary: CUG Board Updates (Open), CUG Elections, and Best papers
CUG Board Updates, SIG Presentations, and Board Elections – Open Session
Ashley Barker (Oak Ridge National Laboratory)
Abstract
Nine Months in the life of an all-flash file system
Lisa Gerhardt, Stephen Simms, David Fox, Ershaad Basheer, and Kirill Lozinskiy (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Michael Moore (HPE); and Wahid Bhimji (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract
pdf, pdf
Isambard-AI: a leadership-class supercomputer optimised specifically for Artificial Intelligence
Simon McIntosh-Smith, Sadaf Alam, and Christopher Woods (University of Bristol)
Abstract
pdf, pdf
Plenary
Plenary: Sponsors talks, HPE 1-100
The Biggest Change to HPC Job Scheduling and Resource Management in 30 Years
Branden Bauer (Altair Engineering, Inc.)
Abstract
Codee: Automatic Code Inspection Tools for Performance and Code Modernization
Manuel Arenaz (Codee)
Abstract
AMD Together We Advance Supercomputing
Adam Bestavros (AMD)
Abstract
HPE 1 on 100 (HPE Customers only: no HPE partners or CUG sponsors)
Trish Damkroger (HPE)
Abstract
Plenary
Plenary: CUG 2024, Invited speakers
CUG2025 site presentation
TBD TBD (TBD)
Abstract
VAST Data
Maria Perez Gutierrez (VAST)
Abstract
Advancing Gas Turbine Development using HPC: Challenges and Rewards
Richard D. Sandberg and Melissa Kozul (University of Melbourne)
Abstract
Plenary
CUG 2024 Closing

Presentations
Presentation, Paper
Technical Session 1B
Chair: Jim Williams (Los Alamos National Laboratory)
Enhancing HPC Service Management on Alps using FirecREST API
Juan Pablo Dorsch, Andreas Fink, Eirini Koutsaniti, and Rafael Sarmiento (Swiss National Supercomputing Centre)
Abstract
pdf, pdf
Automated Hardware-Aware Node Selection for Cluster Computing
Manuel Sopena Ballesteros, Miguel Gila, Matteo Chesi, and Mark Klein (Swiss National Supercomputing Centre, ETH Zurich)
Abstract
pdf, pdf
Versatile Software-defined Cluster on Cray HPE EX Systems
Maxime Martinasso, Mark Klein, Benjamin Cumming, Miguel Gila, and Felipe Cruz (Swiss National Supercomputing Centre, ETH Zurich)
Abstract
pdf
Presentation, Paper
Technical Session 1A
Chair: Lena M Lopatina (LANL)
CPE Updates
Barbara Chapman (HPE)
Abstract
A Deep Dive Into NVIDIA's HPC Software
Jeff Larkin and Becca Zandstein (NVIDIA)
Abstract
Slurm 24.05 and Beyond
Tim Wickberg (SchedMD LLC)
Abstract
pdf
Presentation, Paper
Technical Session 1C
Chair: Chris Fuson (ORNL, Oak Ridge National Laboratory)
Towards the Development of an Exascale Network Digital Twin
John Holmen (Oak Ridge National Laboratory); Md Nahid Newaz (Oakland University); and Srikanth Yoginath, Matthias Maiterth, Amir Shehata, Nick Hagerty, Christopher Zimmer, and Wesley Brewer (Oak Ridge National Laboratory)
Abstract
pdf, pdf
A Performance Deep Dive into HPC-AI Workflows with Digital Twins
Ana Gainaru (Oak Ridge National Laboratory); Greg Eisenhauer (Georgia Institute of Technology); and Fred Suter, Norbert Podhorszki, and Scott Klasky (Oak Ridge National Laboratory)
Abstract
pdf
Optimizing Checkpoint-Restart Mechanisms for HPC with DMTCP in Containers at NERSC
Madan Timalsina, Lisa Gerhardt, Johannes Blaschke, Nicholas Tyler, and William Arndt (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 2B
Chair: Lena M Lopatina (LANL)
EMOI: CSCS Extensible Monitoring and Observability Infrastructure
Massimo Benini (CSCS); Jeff Hanson (HPE); and Dino Conciatore, Gianni Mario Ricciardi, Michele Brambilla, Monica Frisoni, Mathilde Gianolli, Gianna Marano, and Jean-Guillaume Piccinali (CSCS)
Abstract
pdf, pdf
Swordfish/Redfish and ClusterStor - Using Advanced Monitoring to Improve Insight into Complex I/O Workflows.
Torben Kling Petersen, Tim Morneau, Dan Matthews, and Nathan Rutman (HPE)
Abstract
pdf
CADDY: Scalable Summarizations over Voluminous Telemetry Data for Efficient Monitoring
Saptashwa Mitra, Scott Ragland, Vanessa Zambrano, Dipanwita Mallick, Charlie Vollmer, Lance Kelley, and Nithin Singh Mohan (Hewlett Packard Enterprise)
Abstract
pdf, pdf
Command Lines vs. Requested Resources: How Well Do They Align?
Ben Fulton, Abhinav Thota, Scott Michael, and Jefferson Davis (Indiana University)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 2A
Chair: Jim Rogers (Oak Ridge National Laboratory)
Updated Node Power Management For New HPE Cray EX255a and EX254n Blades
Brian Collum and Steven Martin (Hewlett Packard Enterprise)
Abstract
pdf
HPE Cray EX Power Monitoring Counters
Steven Martin, Brian Collum, and Sean Byland (HPE)
Abstract
pdf
First Analysis on Cooling Temperature Impacts on MI250x Exascale Nodes (HPE Cray EX235A)
Torsten Wilde (HPE), Michael Ott (LRZ), and Pete Guyan (HPE)
Abstract
pdf
EVeREST: An Effective and Versatile Runtime Energy Saving Tool
Anna Yue (Hewlett Packard Enterprise, University of Minnesota) and Sanyam Mehta and Torsten Wilde (Hewlett Packard Enterprise)
Abstract
pdf
Presentation, Paper
Technical Session 2C
Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory)
Optimising the Processing and Storage of Radio Astronomy Data
Alexander Williamson (International Centre for Radio Astronomy Research, University of Western Australia); Pascal Elahi (Pawsey Supercomputing Research Centre); Richard Dodson and Jonghwan Rhee (International Centre for Radio Astronomy Research, University of Western Australia); and Qian Gong (Oak Ridge National Laboratory)
Abstract
pdf
Performance and scaling of the LFRic weather and climate model on different generations of HPE Cray EX supercomputers
J. Mark Bull (EPCC, The University of Edinburgh); Andrew Coughtrie (Met Office, UK); Deva Deeptimahanti (Pawsey Supercomputing Research Centre); Mark Hedley (Met Office, UK); Caoimhin Laoide-Kemp (EPCC, The University of Edinburgh); Christopher Maynard (Met Office); Harry Shepherd (Met Office, UK); Sebastiaan Van De Bund and Michele Weiland (EPCC, The University of Edinburgh); and Benjamin Went (Met Office, UK)
Abstract
pdf, pdf
Disaggregated memory in OpenSHMEM applications – Approach and Benefits
Clarete Crasta, Sharad Singhal, Faizan Barmawer, Ramesh Chaurasiya, Sajeesh KV, Dave Emberson, Harumi Kuno, and John Byrne (Hewlett Packard)
Abstract
pdf
Migrating Complex Workflows to the Exascale: Challenges for Radio Astronomy
Pascal Jahan Elahi (Pawsey Supercomputing Research Centre) and Matt Austin, Eric Bastholm, Paulus Lahur, Wasim Raja, Maxim Voronkov, Mark Wieringa, Matthew Whiting, Daniel Mitchell, and Stephen Ord (CSIRO)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 3B
Chair: Gabriel Hautreux (CINES)
Spack Based Production Programming Environments on Cray Shasta
Paul Ferrell and Timothy Goetsch (LANL)
Abstract
pdf, pdf
Containers-first user environments on HPE Cray EX
Felipe Cruz and Alberto Madonna (Swiss National Supercomputing Centre)
Abstract
pdf
Cloud-Native Slurm management on HPE Cray EX
Felipe A. Cruz, Manuel Sopena, and Guilherme Peretti-Pezzi (Swiss National Supercomputing Centre)
Abstract
Presentation, Paper
Technical Session 3A
Chair: Bilel Hadri (KAUST Supercomputing Lab)
Early Application Experiences on Aurora at ALCF: Moving From Petascale to Exascale Systems
Colleen Bertoni, JaeHyuk Kwack, Thomas Applencourt, Abhishek Bagusetty, Yasaman Ghadar, Brian Homerding, Christopher Knight, Ye Luo, Mathialakan Thavappiragasam, John Tramm, Esteban Rangel, Umesh Unnikrishnan, Timothy J. Williams, and Scott Parker (Argonne National Laboratory)
Abstract
pdf, pdf
Streaming Data in HPC Workflows Using ADIOS
Greg Eisenhauer (Georgia Institute of Technology); Norbert Podhorszki, Ana Gainaru, and Scott Klasky (Oak Ridge National Laboratory); Philip Davis and Manish Parashar (University of Utah); Matthew Wolf (Samsung SAIT); Eric Suchtya (Oak Ridge National Laboratory); Erick Fredj (Toga Networks, Jerusalem College of Technology); Vicente Bolea (Kitware, Inc); Franz Pöschel, Klaus Steiniger, and Michael Bussmann (Center for Advanced Systems Understanding); Richard Pausch (Helmholtz-Zentrum Dresden-Rossendorf); and Sunita Chandrasekaran (University of Delaware)
Abstract
pdf, pdf
Enrichment and Acceleration of Edge to Exascale Computational Steering STEM Workflow using Common Metadata Framework
Gayathri Saranathan (Hewlett Packard Enterprise, Hewlett Packard Labs); Martin Foltin, Aalap Tripathy, and Annmary Justine (Hewlett Packard Enterprise); Ayana Ghosh, Maxim Ziatdinov, and Kevin Roccapriore (Oak Ridge National Laboratory); and Suparna Bhattacharya, Paolo Faraboschi, and Sreenivas Rangan Sukumaran (Hewlett Packard Enterprise)
Abstract
pdf
Presentation, Paper
Technical Session 3C
Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)
CSM-based Software Stack Overview 2024
Harold Longley and Jason Sollom (Hewlett Packard Enterprise)
Abstract
pdf
Overview of HPCM
Peter Guyan and Sue Miller (HPE)
Abstract
Seamless Cluster Migration in CSM
Miguel Gila and Manuel Sopena Ballesteros (Swiss National Supercomputing Centre)
Abstract
pdf
Presentation, Paper
Technical Session 4B
Chair: Brett Bode (National Center for Supercomputing Applications/University of Illinois, National Center for Supercomputing Applications)
Scalability and Performance of OFI and UCX on ARCHER2
Jaffery Irudayasamy and Juan F. R. Herrera (EPCC, The University of Edinburgh); Evgenij Belikov (EPCC, The University of Edinburgh); and Michael Bareford (EPCC, The University of Edinburgh)
Abstract
pdf
Using P4 for Cassini-3 Software Development Environment
Hardik Soni, Frank Zago, Khaled Diab, Igor Gorodetsky, and Puneet Sharma (HPE)
Abstract
Running NCCL and RCCL Applications on HPE Slingshot NIC
Jesse Treger and Caio Davi (HPE)
Abstract
Enabling NCCL on Slingshot 11 at NERSC
Jim Dinan (NVIDIA), Peter Harrington (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Igor Gorodetsky (HPE), Josh Romero (NVIDIA), Steven Farrell (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Ian Ziemba (HPE), and Wahid Bhimji and Shashank Subramanian (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract
Presentation, Paper
Technical Session 4A
Chair: Lena M Lopatina (LANL)
Multi-stage Approach for Identifying Defective Hardware in Frontier
Nick Hagerty (Oak Ridge National Laboratory), Andy Warner (HPE), and Jordan Webb (Oak Ridge National Laboratory)
Abstract
pdf
From Frontier to Framework: Enhancing Hardware Triage for Exascale Machines
Isa Wazirzada, Abhishek Mehta, and Vinanti Phadke (Hewlett Packard Enterprise)
Abstract
pdf, pdf
Full-stack Approach to HPC Testing
Pascal Jahan Elahi and Craig Meyer (Pawsey Supercomputing Research Centre)
Abstract
pdf, pdf
An Approach to Continuous Testing
Francine Lapid and Shivam Mehta (Los Alamos National Laboratory)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 4C
Chair: Gabriel Hautreux (CINES)
LLM Serving With Efficient KV-Cache Management Using Triggered Operations
Aditya Dhakal, Pedro Bruel, Gourav Rattihalli, Sai Rahul Chalamalasetti, and Dejan Milojicic (Hewlett Packard Enterprise, Hewlett Packard Labs)
Abstract
pdf, pdf
From Chatbots to Interfaces: Diversifying the Application of Large Language Models for Enhanced Usability
Jonathan Sparks, Pierre Carrier, and Gallig Renaud (Hewlett Packard Enterprise)
Abstract
pdf, pdf
Delivering Large Language Model Platforms With HPC
Laura Huber, Abhinav Thota, Scott Michael, and Jefferson Davis (Indiana University)
Abstract
pdf, pdf
System for Recommendation and Evaluation of Large Language Models for practical tasks in Science
Cong Xu, Tarun Kumar, Martin Foltin, Annmary Justine, Sergey Serebryakov, Arpit Shah, Agnieszka Ciborowska, Ashish Mishra, Gyanaranjan Nayak, Suparna Bhattacharya, and Paolo Faraboschi (Hewlett Packard Enterprise)
Abstract
Presentation, Paper
Technical Session 5B
Chair: Adrian Jackson (EPCC, The University of Edinburgh)
Leveraging GNU Parallel for Optimal Utilization of HPC Resources on Frontier and Perlmutter Supercomputers
Ketan Maheshwari (Oak Ridge National Laboratory), William Arndt (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), and Rafael Ferreira Da Silva (Oak Ridge National Laboratory)
Abstract
Portable Support for GPUs and Distributed-Memory Parallelism in Chapel
Andrew Stone and Engin Kayraklioglu (Hewlett Packard Enterprise)
Abstract
pdf
PaCER: Accelerating Science on Setonix
Maciej Cytowski and Ann Backhaus (Pawsey Supercomputing Research Centre) and Joseph Schoonover (Fluid Numerics)
Abstract
Presentation, Paper
Technical Session 5A
Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory)
Power and Performance analysis of GraceHopper superchips on HPE Cray EX systems
Benjamin Donald Cumming and Miguel Gila (CSCS), Brian Collum (HPE), Sebastian Keller (CSCS), and Bryan Villalon and Steven James Martin (HPE)
Abstract
Accelerating Scientific Workflows with the NVIDIA Grace Hopper Platform
Gabriel Noaje (NVIDIA)
Abstract
GROMACS on AMD GPU-Based HPC Platforms: Using SYCL for Performance and Portability
Andrey Alekseenko (KTH/SciLifeLab); Szilárd Páll (KTH/PDC); and Erik Lindahl (KTH, Stockholm University)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 5C
Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)
Cray EX Security Experiences
Ben Matthews (NCAR/UCAR)
Abstract
pdf
Best of Times, Worst of Times: A Cautionary Tale of Vulnerability Handling
Aaron Scantlin (National Energy Research Scientific Computing Center)
Abstract
AIOPS Empowered: Failure Prediction in System Management Software Tools
Deepak Nanjundaiah and Subrahmanya Vinayak Joshi (HPE)
Abstract
Presentation, Paper
Technical Session 6B
Chair: Paul L. Peltz Jr. (Oak Ridge National Laboratory)
POD: Reconfiguring Compute and Storage Resources Between Cray EX Systems
Eric Roman and Tina Declerck (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Sean Lynn (Hewlett Packard Enterprise)
Abstract
Zero Downtime System Upgrade Strategy
Alden Stradling and Joshi Fullop (Los Alamos National Laboratory)
Abstract
pdf, pdf
Multitenancy on HPE Cray EX: network segmentation and isolation
Chris Gamboni (Swiss National Supercomputing Centre)
Abstract
Presentation, Paper
Technical Session 6A
Chair: Chris Fuson (Oak Ridge National Laboratory)
Unification of Alerting Engines for Monitoring in System Management
Raghul Vasudevan, Ambresh Gupta, and Sinchana Karnik (Hewlett Packard)
Abstract
HPE Cray EX255a Telemetry - Improved Configurability and Performance
Sean Byland, Steven Martin, and Brian Collum (HPE)
Abstract
Best Practices for deployment of LDMS on the HPE Cray EX platform
James Brandt, Kevin Stroup, and Ann Gentile (Sandia National Laboratories)
Abstract
Presentation, Paper
Technical Session 6C
Chair: Bilel Hadri (KAUST Supercomputing Lab)
ClusterStor Tiering, Overview, Setup, and Performance
Nathan Rutman (Hewlett Packard)
Abstract
pdf
Exploring new software-defined storage technology using VAST on Cray EX systems
Mark Klein, Chris Gamboni, Gennaro Oliva, and Salvatore Di Nardo (Swiss National Supercomputing Centre, ETH Zurich); Maria Gutierrez (VAST Data); and Riccardo Di Maria and Miguel Gila (Swiss National Supercomputing Centre, ETH Zurich)
Abstract
Reducing Mean Time to Resolution (MTTR) for complex HPC-based systems with next generation automated service tools.
Michael Cush (HPE)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 7A
Chair: John Holmen (Oak Ridge National Laboratory)
Proactive Precision: Enhancing High-Performance Computing with Early Job Failure Detection
Dipanwita Mallick, Siddhi Potdar, Saptashwa Mitra, Nithin Mohan, and Charlie Vollmer (Hewlett Packard Enterprise)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 8B
Chair: Raj Gautam (ExxonMobil)
Using HPE-Provided Resources to Integrate HPE Support into Internal Incident Management
John Gann, Daniel Gens, and Elizabeth Bautista (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract
pdf, pdf
Presentation, Paper
Technical Session 8A
Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)
Optimizing I/O Patterns to Speed up Non-contiguous Data Retrieval and Analyses
Scott Klasky, Qian Gong, and Norbert Podhorszki (Oak Ridge National Laboratory)
Abstract
Presentation, Paper
Technical Session 8C
Chair: Jim Rogers (Oak Ridge National Laboratory)
Building LDMS Slingshot Switch Samplers
Kevin Stroup, Cory Lueninghoener, Jim Brandt, and Ann Gentile (Sandia National Laboratories)
Abstract

Tutorials
Tutorial
Tutorial 1B
Image Deployment and System Monitoring with HPCM
Peter Guyan, Sue Miller, Andy Warner, and Raghul Vasudevan (HPE)
Abstract
Tutorial
Tutorial 1A
Supercomputer Affinity on HPE Systems
Edgar A. Leon and Jane E. Herriman (Lawrence Livermore National Laboratory)
Abstract
pdf
Tutorial
Tutorial 1C
Omnitools: Performance Analysis Tools for AMD GPUs
Samuel Antao (AMD)
Abstract
pdf
Tutorial
Tutorial 1B Continued
Image Deployment and System Monitoring with HPCM
Peter Guyan, Sue Miller, Andy Warner, and Raghul Vasudevan (HPE)
Abstract
Tutorial
Tutorial 1A Continued
Supercomputer Affinity on HPE Systems
Edgar A. Leon and Jane E. Herriman (Lawrence Livermore National Laboratory)
Abstract
pdf
Tutorial
Tutorial 1C Continued
Omnitools: Performance Analysis Tools for AMD GPUs
Samuel Antao (AMD)
Abstract
pdf
Tutorial
Tutorial 2B
Automated Inspection of C/C++/Fortran Code Using Codee for Performance Optimization on HPE/Cray
Manuel Arenaz (Codee - Appentra Solutions)
Abstract
pdf, zip
Tutorial
Tutorial 2A
Monitoring, Tuning, and Troubleshooting a CSM system
Harold Longley and Jason Sollom (Hewlett Packard Enterprise)
Abstract
pdf
Tutorial
Tutorial 2C
MGARD & ADIOS-2: A framework for extreme scale I/O with online data reduction
Scott Klasky, Qian Gong, and Norbert Podhorszki (ORNL)
Abstract
Tutorial
Tutorial 2B Continued
Automated Inspection of C/C++/Fortran Code Using Codee for Performance Optimization on HPE/Cray
Manuel Arenaz (Codee - Appentra Solutions)
Abstract
pdf, zip
Tutorial
Tutorial 2A Continued
Monitoring, Tuning, and Troubleshooting a CSM system
Harold Longley and Jason Sollom (Hewlett Packard Enterprise)
Abstract
pdf
Tutorial
Tutorial 2C Continued
MGARD & ADIOS-2: A framework for extreme scale I/O with online data reduction
Scott Klasky, Qian Gong, and Norbert Podhorszki (ORNL)
Abstract
Tutorial
Lightning Tutorial 7B
Data Science Beyond the Laptop: Handling Data of Any Size with Arkouda
Ben McDonald and Michelle Strout (HPE)
Abstract
Tutorial
Lightning Tutorial 7C
Exploring high performance object storage using DAOS
Adrian Jackson (EPCC, The University of Edinburgh)
Abstract

XTreme
XTreme (Approved NDA Members Only)

Created 2024-5-7 17:35