CUG2025 Proceedings




Birds of a Feather
Birds of a Feather
Programming Environments, Applications, and Documentation (PEAD)
Introduction PEAD
Chris Fuson (Oak Ridge National Laboratory)
Abstract
CPE in a Container
Kaylie Anderson (HPE), Ben Cumming (Swiss National Supercomputing Centre), Subil Abraham (Oak Ridge National Laboratory), and Panchapakesan Chitra Shyamshankar (Argonne National Laboratory)
Abstract
Python Management
Chun Sun (HPE); Cristian Di Pietrantonio (Pawsey); Dave Carlson (Stony Brook University); and Juan Herrera (EPCC, The University of Edinburgh)
Abstract
Birds of a Feather
Programming Environments, Applications, and Documentation (PEAD)
CPE Update
Barbara Chapman (HPE)
CPE Testing
Barbara Chapman (HPE), Cristian Di Pietrantonio (Pawsey), Brian Vanderwende (NCAR), Brandon Cook (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), and Cedric Jourdain (CINES)
Abstract
Exploring the Challenges of the World-Class HPE Cray Programming Environment for Modern Software Development in Fortran
Manuel Arenaz (Codee)
Abstract
pdf
Open Floor Discussion
Chris Fuson (Oak Ridge National Laboratory)
Abstract
Birds of a Feather
BoF 1D
Security BoF
Aaron Scantlin (National Energy Research Scientific Computing Center)
Abstract
pdf
Birds of a Feather
BoF 2D
Kubernetes on HPE Supercomputers BoF
Sadaf Alam (University of Bristol), Dino Conciatore (Swiss National Supercomputing Centre), and Jesse L. Treger (HPE)
Abstract
pdf, pdf
Birds of a Feather
BoF 1B
CUG SIG System Monitoring Working Group BoF
Massimo Benini (CSCS - ETH Zurich), Lena Lopatina (Los Alamos National Laboratory), and Jeff Hanson and Pete Guyan (HPE)
Abstract
pdf, pdf
Birds of a Feather
BoF 1C
Sharing is Caring: Tackling Node-Sharing Challenges at CUG Sites
Tim Robinson (Swiss National Supercomputing Centre, ETH Zurich); Tim Wickberg (SchedMD LLC); Pengfei Ding (Lawrence Berkeley National Laboratory); and Cristian Di Pietrantonio (Pawsey Supercomputing Research Centre)
Abstract
pdf
Birds of a Feather
BoF 1A
CSM updates, iSCSI boot content projection, and other CSM topics
Harold Longley, Dennis Walker, Ravi Bissa, Jason Coverston, Siri Vias Khalsa, Ashalatha A. M, and Ravikanth Nalla (HPE)
Abstract
pdf
Birds of a Feather
BoF 3D
Rethinking Interactive HPC Resource Access: Enhancing Security and Flexibility
Maxime Martinasso (Swiss National Supercomputing Centre), Sadaf Alam (University of Bristol), and Isa Wazirzada and Larry Kaplan (HPE)
Abstract
pdf, pdf
Birds of a Feather
BoF 2B
Managing System Reliability: From system acceptance through production
Pete Guyan and Sue Miller (HPE)
Abstract
Birds of a Feather
BoF 2C
HPE Slingshot Birds-of-a-Feather
Jesse Treger (HPE)
Abstract
Birds of a Feather
BoF 2A
CPE Futures
Barbara Chapman (HPE, Stony Brook University) and Kaylie Anderson and Chun Sun (HPE)
Abstract
Paper, Presentation, Birds of a Feather
Technical Session 7A: AI/ML GPU Workloads
Session Chair: Raj Gautam (ExxonMobil)
Porting Radio Astronomy Correlation to Setonix, a HPE Cray EX system powered by AMD GPUs
Cristian Di Pietrantonio (Pawsey Supercomputing Research Centre, Curtin Institute for Radio Astronomy); Marcin Sokolowski (Curtin Institute of Radio Astronomy); Christopher Harris (Pawsey Supercomputing Research Centre); and Daniel Price and Randal Wayth (SKAO)
Abstract
pdf, pdf
Evaluating the Performance of Containerized ML and LLM Applications on the Frontier and Odo Supercomputers
Bishwo Dahal (University of Louisiana Monroe, Oak Ridge National Laboratory) and Elijah Maccarthy and Subil Abraham (Oak Ridge National Laboratory)
Abstract
pdf, pdf
BoF on Transforming Hybrid Workflows: The Role of HPE Cray Supercomputing User Services Software in Bridging HPC and AI
Tulsi Mishra, Dean Roe, and Larry Kaplan (HPE)
Abstract
pdf

Break
Break
Coffee Break
Break
Coffee Break
Break
Coffee Break
Break
Coffee Break Sponsored by SchedMD
Break
Coffee Break Sponsored by Pier Group
Break
Coffee Break Sponsored by Linaro
Break
Coffee Break Sponsored by VAST
Break
Coffee Break Sponsored by Altair
Break
Coffee Break
Break
Coffee Break

CUG Program Committee
CUG Program Committee
CUG Advisory Board (closed)

Lunch
Lunch
CUG Board / New Sites Lunch (closed)
Lunch
Lunch / PEAD & XTreme SIG Participants
Lunch
CUG Advisory Board Lunch Cabinet (closed)
Lunch
Lunch Sponsored by Codee
Lunch
CUG Board & Sponsors Lunch (closed)
Lunch
Lunch Sponsored by NVIDIA
Lunch
HPE Executive Lunch (closed)
Lunch
Lunch Sponsored by Codee
Lunch
Lunch Sponsored by NVIDIA

Networking/Social Event
Networking/Social Event
Welcome Reception
Networking/Social Event
Program Committee Dinner (invite only)
Networking/Social Event
HPE Networking Event
HPE will host its annual CUG community networking reception from 6:00 to 8:00 pm ET at the Lokal Eatery & Bar. Lokal is located at 2 2nd St, Jersey City, NJ 07302, along Jersey City’s waterfront, allowing CUG guests to enjoy expansive views of the Manhattan skyline. The event is co-presented by AMD, and all registered CUG attendees and their guests are invited to attend for a reception with light hors d’oeuvres and drinks. The first bus will leave at 5:55 pm; Lokal is also about a 10-minute walk from the CUG hotel. The last bus will depart from Lokal at 8:00 pm.
Networking/Social Event
CUG AMD Night Out
CUG Night Out at Hudson House, 2 Chapel Ave, Jersey City, NJ

We invite all registered attendees and guests with a paid CUG Night Out ticket to join us for an unforgettable evening at Hudson House. Situated at the end of Port Liberte in Jersey City, NJ, the venue is an arm’s length away from the Hudson River and boasts a panoramic view of the Statue of Liberty; the Brooklyn, Manhattan, and Verrazzano Bridges; and, of course, the NYC skyline. Coaches will depart from outside the Westin Jersey City Hotel at 18:10 and arrive at Hudson House for a drinks reception before seating for dinner at approximately 19:15. If you are making your own way to the venue, please use the full address, as Google Maps otherwise takes you to a different address. Hudson House, 2 Chapel Ave, is approximately a 15- to 20-minute drive. Our first bus will return to the hotel at approximately 21:00.

Papers
Paper, Presentation
Technical Session 1B: Workload manager
Session Chair: David Carlson (Institute for Advanced Computational Science, Stony Brook University)
Slinky: The Missing Link Between Slurm and Kubernetes
Tim Wickberg (SchedMD LLC)
Abstract
pdf
How Best to Leverage Cloud for (Big) HPC Sites
Bill Nitzberg and Ian Littlewood (Altair Engineering, Inc.)
Abstract
pdf
Divide and Rule: Automated Workload Distribution for Efficient User Support Services
Luca Marsella (Swiss National Supercomputing Centre)
Abstract
pdf
Paper, Presentation
Technical Session 1C: Software deployment
Session Chair: Chris Fuson (ORNL, Oak Ridge National Laboratory)
Deploying and Tracking Software with NCCS Software Provisioning
Asa Rentschler, Nicholas Hagerty, Elijah Maccarthy, and Edwin F. Posada Correa (Oak Ridge National Laboratory)
Abstract
pdf, pdf
Modern Software Deployment on a Multi-Tenant Cray-EX System
Ben Cumming, Andreas Fink, Simon Pintarelli, and John Biddiscombe (CSCS)
Abstract
pdf
Employing a Software-Driven Approach to Scalable HPC System Management
Aaron Barlow (Oak Ridge National Laboratory)
Abstract
pdf
Paper, Presentation
Technical Session 1A: Multitenancy
Session Chair: Juan F R Herrera (EPCC, The University of Edinburgh)
Infrastructure as a Service with Strong Tenant Separation on a Supercomputer
Riccardo Di Maria, Chris Gamboni, Manuel Sopena Ballesteros, Hussein Harake, Mark Klein, Marco Passerini, Miguel Gila, Maxime Martinasso, and Thomas C. Schulthess (Swiss National Supercomputing Centre) and Alun Ashton, Derek Feichtinger, Marc Caubet, Elsa Germann, Hans-Nikolai Viessmann, Achim Gsell, and Krisztian Pozsa (Paul Scherrer Institute)
Abstract
pdf, pdf
Dynamic Network Perimeterization: Isolating Tenant Workloads With VLANs, VNIs, & ACLs
Nikhil Mukundan, Dennis Walker, Stephen Han, Atif Ali, Siri Vias Khalsa, Amit Jain, Vishal Bhatia, and Vinay Karanth (HPE)
Abstract
pdf, pdf
CSCS' journey towards complete platform automation in a multi-tenant environment
Miguel Gila, Ivano Bonesana, and Alejandro Dabin (Swiss National Supercomputing Centre, CSCS)
Abstract
pdf
Paper, Presentation
Technical Session 2B: Security & Configuration Management
Session Chair: Jim Williams (Los Alamos National Laboratory)
Pragmatic Security Audits: Fortifying HPC Environments at a Consumable Pace
Alden Stradling (Los Alamos National Laboratory) and Monica Dessouky and Dennis Walker (HPE)
Abstract
pdf, pdf
Experimenting with Security Compliance Checking using ReFrame
Victor Holanda Rusu, Matteo Basso, Chris Gamboni, Fabio Zambrino, and Massimo Benini (Swiss National Supercomputing Centre)
Abstract
pdf, pdf
From Weeks to Hours: Harnessing Configuration Management and Deployment Pipelines
Dennis Walker and Siri Vias Khalsa (HPE) and Alex Lovell-Troy (Los Alamos National Laboratory)
Abstract
pdf, pdf
Rev Up Compute Node Reboots: 2x to 5x Faster
Dennis Walker (HPE) and Paul Selwood (Met Office, UK / NERC CMS)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 2C: Climate applications
Session Chair: Maciej Cytowski (Pawsey Supercomputing Research Centre)
Bit-reproducibility in UK Met Office Weather and Climate Applications
David Acreman (HPE)
Abstract
pdf
Enabling km-scale coupled climate simulations with ICON on AMD GPUs
Jussi Enkovaara (CSC - IT Center for Science Ltd.)
Abstract
pdf
MARBLChapel: Fortran-Chapel Interoperability in an Ocean Simulation
Brandon Neth and Ben Harshbarger (HPE); Scott Bachman ([C]Worthy); and Michelle Mills Strout (HPE, University of Arizona)
Abstract
pdf
Redefining Weather Forecasting Systems: The Transition to ICON and Alps
Mauro Bianco, Matthias Kraushaar, and Roberto Aielli (ETH Zurich); Oliver Fuhrer (Federal Office of Meteorology and Climatology MeteoSwiss); and Thomas Schulthess (ETH Zurich)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 2A: Slingshot
Session Chair: Brett Bode (National Center for Supercomputing Applications/University of Illinois, National Center for Supercomputing Applications)
The HPE Slingshot 400 Expedition
Houfar Azgomi, Duncan Roweth, Gregory Faanes, and Jesse Treger (HPE)
Abstract
pdf, pdf
Introduction To HPE Slingshot NIC Libfabric Environment Variables
Jesse Treger and Ian Ziemba (HPE)
Abstract
pdf
Math in Your Network: Slingshot Hardware Accelerated Reductions
Forest Godfrey and Duncan Roweth (HPE)
Abstract
pdf
Slingshot Host Software Ethernet Tuning
Ravi Bissa, Ian Ziemba, Duncan Roweth, and Forest Godfrey (HPE)
Abstract
pdf
Plenary, Paper
Plenary Session: CUG Organizational Update and Best Paper Presentation
CUG Organizational Update
Ashley Barker (Oak Ridge National Laboratory)
Abstract
Evolving HPC services to enable ML workloads on HPE Cray EX
Stefano Schuppli, Fawzi Mohamed, Henrique Mendonca, Nina Mujkanovic, Elia Palme, Dino Conciatore, Lukas Drescher, Miguel Gila, Pim Witlox, Joost VandeVondele, Maxime Martinasso, Torsten Hoefler, and Thomas Schulthess (Swiss National Supercomputing Centre)
Abstract
pdf, pdf
Alps, a versatile research infrastructure
Maxime Martinasso (Swiss National Supercomputing Centre, ETH Zurich) and Mark Klein and Thomas Schulthess (Swiss National Supercomputing Centre)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 3B: HPCM
Session Chair: Matthew A. Ezell (Oak Ridge National Laboratory)
A Brief Summary of the HPCM (HPE Performance Cluster Manager) Evolution Over Recent Releases
Sue Miller, Lee Morecroft, and Peter Guyan (HPE)
Abstract
System Visualization Using Rackmap
Troy Dey and Peter Guyan (HPE)
Abstract
Harvesting, Storing and Processing Data from our HPCM Systems
Ben Lenard, Eric Pershey, Brian Toonen, Peter Upton, Doug Waldron, Lisa Childers, Micheal Zhang, and Bryan Brickman (Argonne National Laboratory)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 3C: Future Technology
Session Chair: Juan F R Herrera (EPCC, The University of Edinburgh)
Evolving Sarus to augment Podman for HPC on Cray EX
Alberto Madonna, Gwangmu Lee, and Felipe Cruz (Swiss National Supercomputing Centre)
Abstract
pdf
What is RISC-V and why should we care?
Nick Brown (EPCC)
Abstract
pdf, pdf
A Full Stack Framework for High Performance Quantum-Classical Computing
Xin Zhan, K. Grace Johnson, and Soumitra Chatterjee (HPE); Barbara Chapman (HPE, Stony Brook University); and Masoud Mohseni, Kirk Bresniker, and Ray Beausoleil (HPE)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 3A: Data Centers
Session Chair: Lena M Lopatina (LANL)
Causality inference for Digital Twins in GPU Data Centers and Smart Grids.
Rolando Pablo Hong Enriquez, Pavana Prakash, Ebad Taheri, and Aditya Dhakal (HPE); Matthias Maiterth and Wesley Brewer (Oak Ridge National Laboratory); and Dejan Milojicic (HPE)
Abstract
pdf, pdf
AlpsB – a Geographically Distributed Infrastructure to Facilitate Large-Scale Training of Weather and Climate AI Models
Alex Upton, Jerome Tissieres, and Maxime Martinasso (Swiss National Supercomputing Centre)
Abstract
pdf
Co-design, deployment and operation of a Modular Data Centre (MDC) with air and direct-liquid cooled supercomputers
Sadaf Alam (University of Bristol); Emma Akinyemi, Martin Podstata, and Jan Over (HPE); and Simon McIntosh-Smith, Ross Barnes, Naomi Harris, and Dave Moore (University of Bristol)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 4B: GPU Energy Efficiency
Session Chair: Maciej Cytowski (Pawsey Supercomputing Research Centre)
Optimizing GPU Frequency for Sustainable HPC: Lessons Learned from a Year of Production on Adastra, an AMD GPU Supercomputer
Gabriel Hautreux, Naïma Alaoui, and Etienne Malaboeuf (CINES)
Abstract
pdf, pdf
Fine-Grained Application Energy and Power Measurements on the Frontier Exascale System
Oscar Hernandez and Wael Elwasif (Oak Ridge National Laboratory)
Abstract
pdf, pdf
EVeREST: An Effective and Versatile Runtime Energy Saving Tool for GPUs
Anna Yue, Torsten Wilde, Sanyam Mehta, and Barbara Chapman (HPE)
Abstract
pdf
HPE Cray EX225a (MI300a) Blade Power Capping and HBM Page Retirement
Steven Martin, Randy Law, Leo Flores, Ron Urwin, and Larry Kaplan (HPE)
Abstract
pdf
Paper, Presentation
Technical Session 4C: Monitoring
Session Chair: David Carlson (Institute for Advanced Computational Science, Stony Brook University)
Utilization and Performance Monitoring of Ookami, an ARM Fujitsu A64FX Testbed Cluster with XDMoD
Nikolay A. Simakov, Joseph P. White, and Matthew D. Jones (SUNY University at Buffalo) and Eva Siegmann, David Carlson, and Robert J. Harrison (Stony Brook University)
Abstract
pdf
HPE Slingshot Monitoring Software: Actionable Insights for HPC and AI Systems
Sahil Patel (HPE)
Abstract
pdf
LDMS New Features for Deployment in Advanced Environments and Feedback for Operations
Jim Brandt, Ben Schwaller, Jennifer Green, Ben Allan, Cory Lueninghoener, Evan Donato, Vanessa Surjadidjaja, Sara Walton, and Ann Gentile (Sandia National Laboratories)
Abstract
pdf
Proactive Health Monitoring and Maintenance of High-Speed Slingshot Fabrics in HPC Environments
Michael Cush, Jeff Kabel, Michael Schmit, Michael Accola, and Forest Godfrey (HPE)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 4A: New Deployment
Session Chair: Jim Rogers (Oak Ridge National Laboratory)
A journey to provide GH200
Mark Klein, Thomas Schulthess, Jonathan Coles, and Miguel Gila (Swiss National Supercomputing Centre, ETH Zurich)
Abstract
pdf
Evaluating AMD MI300A APU: Performance Insights on LLM Training via Knowledge Distillation
Dennis Dickmann (Seedbox); Philipp Offenhäuser (HPE); Rishabh Saxena (HLRS, University of Stuttgart); George Markomanolis (AMD); Alessandro Rigazzi (HPE HPC/AI EMEA Research Lab); Patrick Keller (HPE); and Kerem Kayabay and Dennis Hoppe (HLRS, University of Stuttgart)
Abstract
pdf, pdf
Evaluation of the Nvidia Grace Superchip in the HPE/Cray XD Isambard 3 supercomputer
Thomas Green and Sadaf Alam (University of Bristol)
Abstract
pdf, pdf
Separating concerns: Decoupling the Slingshot Fabric Manager from Cray System Management
Riccardo Di Maria and Chris Gamboni (Swiss National Supercomputing Centre), Davide Tacchella and Isa Wazirzada (HPE), and Mark Klein (Swiss National Supercomputing Centre)
Abstract
pdf
Paper, Presentation
Technical Session 5B: Maintaining Large Systems
Session Chair: Aaron Scantlin (National Energy Research Scientific Computing Center)
Hardware Triage Tool: Enhancements and Extensions
Isa Muhammad Wazirzada, Abhishek Mehta, Vinanti Phadke, and Bhuvan Meda Rajesh (HPE)
Abstract
pdf
Detecting operating system noise with detect-detour
Nagaraju KN, Clark Snyder, Dean Roe, and Larry Kaplan (HPE)
Abstract
pdf, pdf
Analyzing a Lifetime of Failures on a Cray XC40 Supercomputer
Kevin Brown and Tanwi Mallick (Argonne National Laboratory), Zhiling Lan (University of Illinois Chicago), Robert Ross (Argonne National Laboratory), and Christopher Carothers (Rensselaer Polytechnic Institute)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 5C: Filesystems & I/O
Session Chair: Raj Gautam (ExxonMobil)
E2000 Performance From Microbenchmarks to Applications
William Loewe, Michael Moore, Sakib Samar, and Chris Walker (HPE)
Abstract
pdf, pdf
Towards Empirical Roofline Modeling of Distributed Data Services: Mapping the Boundaries of RPC Throughput
Philip Carns, Matthieu Dorier, Rob Latham, Shane Snyder, and Amal Gueroudji (Argonne National Laboratory); Seth Ockerman (University of Wisconsin-Madison); Jerome Soumagne (HPE); Dong Dai (University of Delaware); and Robert Ross (Argonne National Laboratory)
Abstract
pdf, pdf
HPC workload characterization using eBPF
Shubh Pachchigar and Brandon Cook (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Brian Friesen (Lawrence Berkeley National Laboratory)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 5A: Slingshot & MPI Tuning
Session Chair: Brett Bode (National Center for Supercomputing Applications/University of Illinois, National Center for Supercomputing Applications)
MPI implementation optimization for Slingshot network
Rahulkumar Gayatri, Adam Lavely, Neil Mehta, Brandon Cook, and Afton Geil (Lawrence Berkeley National Laboratory)
Abstract
pdf, pdf
Using Different MPI Implementations on HPE Cray EX Supercomputers for Native and Containerized Applications Execution
Maciej Pawlik and Maciej Szpindler (Academic Computer Centre CYFRONET), Marcin Krotkiewski (University of Oslo), and Alfio Lazzaro (HPE)
Abstract
pdf
Scaling MPI Applications on Aurora
Nilakantan Mahadevan (Hewlett Packard Enterprise); Premanand Sakarda (Intel Corporation); Scott Parker, Servesh Muralidharan, Vitali Morozov, and Victor Anisimov (Argonne National Laboratory); Huda Ibeid, Anthony-Trung Nguyen, and Aditya Nishtala (Intel Corporation); Larry Kaplan and Michael Woodacre (Hewlett Packard Enterprise); and Kalyan Kumaran and JaeHyuk Kwack (Argonne National Laboratory)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 6B: Framework for HPC-AI workflows
Session Chair: Chris Fuson (ORNL, Oak Ridge National Laboratory)
Framework for tracking metadata, lineage and model provenance in hybrid simulation-AI HPC exascale workflows
Martin Foltin, Andrew Shao, Rishabh Sharma, Shreyas Kulkarni, Annmary Justine Koomthanam, Aalap Tripathy, and Cong Xu (HPE); Wenqian Dong (Oregon State University); Suparna Bhattacharya (HPE); Brian Sammuli (General Atomics); and Paolo Faraboschi (HPE)
Abstract
pdf, pdf
Search and Query Framework for Workflows with HPC and AI Models
Christopher Rickett, Sreenivas Sukumar, and Karlon West (HPE)
Abstract
pdf, pdf
FirecREST v2: Lessons Learned from Redesigning an API for Scalable HPC Resource Access
Elia Palme and Juan Pablo Dorsch (CSCS - ETH Zurich); Ali Khosravi and Giovanni Pizzi (PSI Center for Scientific Computing, Theory, and Data); and Francesco Pagnamenta, Andrea Ceriani, Eirini Koutsaniti, Rafael Sarmiento, Ivano Bonesana, and Alejandro Dabin (CSCS - ETH Zurich)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 6C: Programming Models
Session Chair: Benjamin Cumming (CSCS, ETH Zurich)
Designing GPU-aware OpenSHMEM for HPE Cray EX and XD Systems
Danielle Sikich, Naveen Namashivayam Ravichandrasekaran, Md Rahman, Elliot Joseph Ronaghan, Nathan Wichmann, and William Okuno (HPE)
Abstract
pdf, pdf
Quantifying Message Aggregation Optimisations for Energy Savings in PGAS Models
Aaron Welch and Oscar Hernandez (Oak Ridge National Laboratory) and Stephen Poole and Wendy Poole (Los Alamos National Laboratory)
Abstract
pdf, pdf
Accelerating LArTPC Simulations: Enhancing larnd-sim with GPU Optimization Techniques
Madan Timalsina (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); Matt Kramer (Lawrence Berkeley National Laboratory); Pengfei Ding (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); Ronan Doherty (Trinity College Dublin); Rishabh Dave (UC Berkeley); Nicholas Tyler, Urjoshi Sinha, and William Arndt (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); and Callum Wilkinson (Lawrence Berkeley National Laboratory)
Abstract
pdf
Paper, Presentation
Technical Session 6A: DAOS
Session Chair: Jesse A. Hanley (Oak Ridge National Laboratory)
DAOS - New Horizons for High Performance Storage
Michael Hennecke and Jerome Soumagne (HPE)
Abstract
pdf
Enhancing RPC on Slingshot for Aurora’s DAOS Storage System
Jerome Soumagne, Alexander Oganezov, Ian Ziemba, and Steve Welch (HPE); Philip Carns and Kevin Harms (Argonne National Laboratory); and John Carrier, Johann Lombardi, Mohamad Chaarawi, Zhen Liang, and Scott Peirce (HPE)
Abstract
pdf, pdf
Global Distributed Client-side Cache for DAOS
Clarete R. Crasta, John L Byrne, Abhishek Dwaraki, David Emberson, Harumi Kuno, Sekwon Lee, Ramya Ahobala Rao, Shreyas Vinayaka Basri K S, Amitha C, Chinmay Ghosh, Rishi Kesh Kumar Rajak, Sriram Ravishankar, Porno Shome, and Lance Evans (HPE)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 7B: Access Nodes & Kubernetes Management
Session Chair: Jim Williams (Los Alamos National Laboratory)
Addressing Resource Constraints on Aurora with Admin Access Nodes
Peter Upton, Ben Lenard, Ben Allen, and Cyrus Blackworth (Argonne National Laboratory)
Abstract
pdf, pdf
HPE Slingshot in the Kubernetes Ecosystem
Caio Davi and Jesse Treger (HPE)
Abstract
pdf, pdf
Building non-standard images for CSM systems
Harold Longley, Isa Wazirzada, Dennis Walker, Andy Warner, and Davide Tacchella (HPE)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 7C: Application Performance
Session Chair: Juan F R Herrera (EPCC, The University of Edinburgh)
Task-decomposed Overlapped Pressure Preconditioner for Sustained Strong Scalability on Accelerated Exascale Systems
Niclas Jansson (KTH Royal Institute of Technology)
Abstract
pdf
Supernovae in HPC: Benchmarking FLASH Across Advanced Computing Clusters
Joshua Martin, Eva Siegmann, and Alan Calder (Stony Brook University, Institute of Advanced Computational Science)
Abstract
pdf
Expanding Community Access to Real-World HPC Application I/O Characterization Data Using Darshan
Shane Snyder, Philip Carns, Robert Ross, Robert Latham, and Kevin Harms (Argonne National Laboratory)
Abstract
pdf, pdf
Paper, Presentation, Birds of a Feather
Technical Session 7A: AI/ML GPU Workloads
Session Chair: Raj Gautam (ExxonMobil)
Porting Radio Astronomy Correlation to Setonix, a HPE Cray EX system powered by AMD GPUs
Cristian Di Pietrantonio (Pawsey Supercomputing Research Centre, Curtin Institute for Radio Astronomy); Marcin Sokolowski (Curtin Institute of Radio Astronomy); Christopher Harris (Pawsey Supercomputing Research Centre); and Daniel Price and Randal Wayth (SKAO)
Abstract
pdf, pdf
Evaluating the Performance of Containerized ML and LLM Applications on the Frontier and Odo Supercomputers
Bishwo Dahal (University of Louisiana Monroe, Oak Ridge National Laboratory) and Elijah Maccarthy and Subil Abraham (Oak Ridge National Laboratory)
Abstract
pdf, pdf
BoF on Transforming Hybrid Workflows: The Role of HPE Cray Supercomputing User Services Software in Bridging HPC and AI
Tulsi Mishra, Dean Roe, and Larry Kaplan (HPE)
Abstract
pdf

Plenary
Plenary
Plenary Session: CUG 2025 Welcome, Keynote Presentation
Welcome from the CUG President, Ashley Barker
Ashley Barker (Oak Ridge National Laboratory)
Abstract
Keynote: What I’ve Learned About Supercomputing from Blowing Up Stars, Michael Zingale (Stony Brook University)
Michael Zingale (Stony Brook University)
Abstract
New Member Site: Introducing LRZ
Markus Michael Müller (LRZ)
Abstract
pdf
CUG 2026 Elections: Candidate Statements
Lipi Gupta (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract
Plenary
Plenary Session: Stony Brook LOC Welcome, HPE Update
Welcome by Stony Brook University
Robert Harrison (Stony Brook University)
Abstract
Altair: AI/ML Intelligent Scheduling for HPC with Altair®
Bill Nitzberg (Altair Engineering, Inc.)
Abstract
pdf
NVIDIA HPC Software - Expanding HPC with Python & AI
Becca Zandstein (NVIDIA)
Abstract
pdf
HPE Corporate Update, Gerald Kleyn
Gerald Kleyn (HPE)
Abstract
Plenary, Paper
Plenary Session: CUG Organizational Update and Best Paper Presentation
CUG Organizational Update
Ashley Barker (Oak Ridge National Laboratory)
Abstract
Evolving HPC services to enable ML workloads on HPE Cray EX
Stefano Schuppli, Fawzi Mohamed, Henrique Mendonca, Nina Mujkanovic, Elia Palme, Dino Conciatore, Lukas Drescher, Miguel Gila, Pim Witlox, Joost VandeVondele, Maxime Martinasso, Torsten Hoefler, and Thomas Schulthess (Swiss National Supercomputing Centre)
Abstract
pdf, pdf
Alps, a versatile research infrastructure
Maxime Martinasso (Swiss National Supercomputing Centre, ETH Zurich) and Mark Klein and Thomas Schulthess (Swiss National Supercomputing Centre)
Abstract
pdf, pdf
Plenary, Vendor
Plenary: Sponsors Talks, HPE 1-100
Linaro: Unlocking Exascale Debugging and Performance Engineering with Linaro Forge
Rudy Shand (Linaro Ltd)
Abstract
pdf
Codee: A Tool to Enhance Correctness, Modernization, Security, Portability and Optimization in Fortran and C/C++ Software Applications
Manuel Arenaz (Codee)
Abstract
pdf
AMD: The Unreasonable Effectiveness of FP64 Precision Arithmetic
Nicholas Malaya (AMD)
Abstract
HPE 1 on 100 with Trish Damkroger (HPE Customers only. No HPE partners or CUG sponsors)
Trish Damkroger (HPE)
Abstract
Plenary
Plenary: CUG 2026, Panel
New Member Site: Introducing GeoSphere
Martin Shivraj Saini (Geosphere)
Abstract
pdf
New Member Site: Introducing Cyfronet
Patryk Lasoń (Academic Computer Centre Cyfronet AGH)
Abstract
pdf
VAST Data Platform
Jan Heichler (VAST)
Abstract
pdf
CUG2026 site presentation
TBD TBD (TBD)
Abstract
Panel: The Future of Precision in HPC, which FP is the Right One?
Ashley Barker (Oak Ridge National Laboratory)
Abstract
Plenary
CUG 2025 Closing Remarks

Presentations
Paper, Presentation
Technical Session 1B: Workload manager
Session Chair: David Carlson (Institute for Advanced Computational Science, Stony Brook University)
Slinky: The Missing Link Between Slurm and Kubernetes
Tim Wickberg (SchedMD LLC)
Abstract
pdf
How Best to Leverage Cloud for (Big) HPC Sites
Bill Nitzberg and Ian Littlewood (Altair Engineering, Inc.)
Abstract
pdf
Divide and Rule: Automated Workload Distribution for Efficient User Support Services
Luca Marsella (Swiss National Supercomputing Centre)
Abstract
pdf
Paper, Presentation
Technical Session 1C: Software deployment
Session Chair: Chris Fuson (ORNL, Oak Ridge National Laboratory)
Deploying and Tracking Software with NCCS Software Provisioning
Asa Rentschler, Nicholas Hagerty, Elijah Maccarthy, and Edwin F. Posada Correa (Oak Ridge National Laboratory)
Abstract
pdf, pdf
Modern Software Deployment on a Multi-Tenant Cray-EX System
Ben Cumming, Andreas Fink, Simon Pintarelli, and John Biddiscombe (CSCS)
Abstract
pdf
Employing a Software-Driven Approach to Scalable HPC System Management
Aaron Barlow (Oak Ridge National Laboratory)
Abstract
pdf
Paper, Presentation
Technical Session 1A: Multitenancy
Session Chair: Juan F R Herrera (EPCC, The University of Edinburgh)
Infrastructure as a Service with Strong Tenant Separation on a Supercomputer
Riccardo Di Maria, Chris Gamboni, Manuel Sopena Ballesteros, Hussein Harake, Mark Klein, Marco Passerini, Miguel Gila, Maxime Martinasso, and Thomas C. Schulthess (Swiss National Supercomputing Centre) and Alun Ashton, Derek Feichtinger, Marc Caubet, Elsa Germann, Hans-Nikolai Viessmann, Achim Gsell, and Krisztian Pozsa (Paul Scherrer Institute)
Abstract
pdf, pdf
Dynamic Network Perimeterization: Isolating Tenant Workloads With VLANs, VNIs, & ACLs
Nikhil Mukundan, Dennis Walker, Stephen Han, Atif Ali, Siri Vias Khalsa, Amit Jain, Vishal Bhatia, and Vinay Karanth (HPE)
Abstract
pdf, pdf
CSCS' journey towards complete platform automation in a multi-tenant environment
Miguel Gila, Ivano Bonesana, and Alejandro Dabin (Swiss National Supercomputing Centre, CSCS)
Abstract
pdf
Paper, Presentation
Technical Session 2B: Security & Configuration Management
Session Chair: Jim Williams (Los Alamos National Laboratory)
Pragmatic Security Audits: Fortifying HPC Environments at a Consumable Pace
Alden Stradling (Los Alamos National Laboratory) and Monica Dessouky and Dennis Walker (HPE)
Abstract
pdf, pdf
Experimenting with Security Compliance Checking using ReFrame
Victor Holanda Rusu, Matteo Basso, Chris Gamboni, Fabio Zambrino, and Massimo Benini (Swiss National Supercomputing Centre)
Abstract
pdf, pdf
From Weeks to Hours: Harnessing Configuration Management and Deployment Pipelines
Dennis Walker and Siri Vias Khalsa (HPE) and Alex Lovell-Troy (Los Alamos National Laboratory)
Abstract
pdf, pdf
Rev Up Compute Node Reboots: 2x to 5x Faster
Dennis Walker (HPE) and Paul Selwood (Met Office, UK / NERC CMS)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 2C: Climate applications
Session Chair: Maciej Cytowski (Pawsey Supercomputing Research Centre)
Bit-reproducibility in UK Met Office Weather and Climate Applications
David Acreman (HPE)
Abstract
pdf
Enabling km-scale coupled climate simulations with ICON on AMD GPUs
Jussi Enkovaara (CSC - IT Center for Science Ltd.)
Abstract
pdf
MARBLChapel: Fortran-Chapel Interoperability in an Ocean Simulation
Brandon Neth and Ben Harshbarger (HPE); Scott Bachman ([C]Worthy); and Michelle Mills Strout (HPE, University of Arizona)
Abstract
pdf
Redefining Weather Forecasting Systems: The Transition to ICON and Alps
Mauro Bianco, Matthias Kraushaar, and Roberto Aielli (ETH Zurich); Oliver Fuhrer (Federal Office of Meteorology and Climatology MeteoSwiss); and Thomas Schulthess (ETH Zurich)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 2A: Slingshot
Session Chair: Brett Bode (National Center for Supercomputing Applications/University of Illinois, National Center for Supercomputing Applications)
The HPE Slingshot 400 Expedition
Houfar Azgomi, Duncan Roweth, Gregory Faanes, and Jesse Treger (HPE)
Abstract
pdf, pdf
Introduction To HPE Slingshot NIC Libfabric Environment Variables
Jesse Treger and Ian Ziemba (HPE)
Abstract
pdf
Math in Your Network: Slingshot Hardware Accelerated Reductions
Forest Godfrey and Duncan Roweth (HPE)
Abstract
pdf
Slingshot Host Software Ethernet Tuning
Ravi Bissa, Ian Ziemba, Duncan Roweth, and Forest Godfrey (HPE)
Abstract
pdf
Paper, Presentation
Technical Session 3B: HPCM
Session Chair: Matthew A. Ezell (Oak Ridge National Laboratory)
A Brief Summary of the HPCM (HPE Performance Cluster Manager) Evolution Over Recent Releases
Sue Miller, Lee Morecroft, and Peter Guyan (HPE)
Abstract
System Visualization Using Rackmap
Troy Dey and Peter Guyan (HPE)
Abstract
Harvesting, Storing and Processing Data from our HPCM Systems
Ben Lenard, Eric Pershey, Brian Toonen, Peter Upton, Doug Waldron, Lisa Childers, Micheal Zhang, and Bryan Brickman (Argonne National Laboratory)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 3C: Future Technology
Session Chair: Juan F R Herrera (EPCC, The University of Edinburgh)
Evolving Sarus to augment Podman for HPC on Cray EX
Alberto Madonna, Gwangmu Lee, and Felipe Cruz (Swiss National Supercomputing Centre)
Abstract
pdf
What is RISC-V and why should we care?
Nick Brown (EPCC)
Abstract
pdf, pdf
A Full Stack Framework for High Performance Quantum-Classical Computing
Xin Zhan, K. Grace Johnson, and Soumitra Chatterjee (HPE); Barbara Chapman (HPE, Stony Brook University); and Masoud Mohseni, Kirk Bresniker, and Ray Beausoleil (HPE)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 3A: Data Centers
Session Chair: Lena M Lopatina (LANL)
Causality inference for Digital Twins in GPU Data Centers and Smart Grids
Rolando Pablo Hong Enriquez, Pavana Prakash, Ebad Taheri, and Aditya Dhakal (HPE); Matthias Maiterth and Wesley Brewer (Oak Ridge National Laboratory); and Dejan Milojicic (HPE)
Abstract
pdf, pdf
AlpsB – a Geographically Distributed Infrastructure to Facilitate Large-Scale Training of Weather and Climate AI Models
Alex Upton, Jerome Tissieres, and Maxime Martinasso (Swiss National Supercomputing Centre)
Abstract
pdf
Co-design, deployment and operation of a Modular Data Centre (MDC) with air and direct-liquid cooled supercomputers
Sadaf Alam (University of Bristol); Emma Akinyemi, Martin Podstata, and Jan Over (HPE); and Simon McIntosh-Smith, Ross Barnes, Naomi Harris, and Dave Moore (University of Bristol)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 4B: GPU Energy Efficiency
Session Chair: Maciej Cytowski (Pawsey Supercomputing Research Centre)
Optimizing GPU Frequency for Sustainable HPC: Lessons Learned from a Year of Production on Adastra, an AMD GPU Supercomputer
Gabriel Hautreux, Naïma Alaoui, and Etienne Malaboeuf (CINES)
Abstract
pdf, pdf
Fine-Grained Application Energy and Power Measurements on the Frontier Exascale System
Oscar Hernandez and Wael Elwasif (Oak Ridge National Laboratory)
Abstract
pdf, pdf
EVeREST: An Effective and Versatile Runtime Energy Saving Tool for GPUs
Anna Yue, Torsten Wilde, Sanyam Mehta, and Barbara Chapman (HPE)
Abstract
pdf
HPE Cray EX225a (MI300a) Blade Power Capping and HBM Page Retirement
Steven Martin, Randy Law, Leo Flores, Ron Urwin, and Larry Kaplan (HPE)
Abstract
pdf
Paper, Presentation
Technical Session 4C: Monitoring
Session Chair: David Carlson (Institute for Advanced Computational Science, Stony Brook University)
Utilization and Performance Monitoring of Ookami, an ARM Fujitsu A64FX Testbed Cluster with XDMoD
Nikolay A. Simakov, Joseph P. White, and Matthew D. Jones (SUNY University at Buffalo) and Eva Siegmann, David Carlson, and Robert J. Harrison (Stony Brook University)
Abstract
pdf
HPE Slingshot Monitoring Software: Actionable Insights for HPC and AI Systems
Sahil Patel (HPE)
Abstract
pdf
LDMS New Features for Deployment in Advanced Environments and Feedback for Operations
Jim Brandt, Ben Schwaller, Jennifer Green, Ben Allan, Cory Lueninghoener, Evan Donato, Vanessa Surjadidjaja, Sara Walton, and Ann Gentile (Sandia National Laboratories)
Abstract
pdf
Proactive Health Monitoring and Maintenance of High-Speed Slingshot Fabrics in HPC Environments
Michael Cush, Jeff Kabel, Michael Schmit, Michael Accola, and Forest Godfrey (HPE)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 4A: New Deployment
Session Chair: Jim Rogers (Oak Ridge National Laboratory)
A journey to provide GH200
Mark Klein, Thomas Schulthess, Jonathan Coles, and Miguel Gila (Swiss National Supercomputing Centre, ETH Zurich)
Abstract
pdf
Evaluating AMD MI300A APU: Performance Insights on LLM Training via Knowledge Distillation
Dennis Dickmann (Seedbox); Philipp Offenhäuser (HPE); Rishabh Saxena (HLRS, University of Stuttgart); George Markomanolis (AMD); Alessandro Rigazzi (HPE HPC/AI EMEA Research Lab); Patrick Keller (HPE); and Kerem Kayabay and Dennis Hoppe (HLRS, University of Stuttgart)
Abstract
pdf, pdf
Evaluation of the Nvidia Grace Superchip in the HPE/Cray XD Isambard 3 supercomputer
Thomas Green and Sadaf Alam (University of Bristol)
Abstract
pdf, pdf
Separating concerns: Decoupling the Slingshot Fabric Manager from Cray System Management
Riccardo Di Maria and Chris Gamboni (Swiss National Supercomputing Centre), Davide Tacchella and Isa Wazirzada (HPE), and Mark Klein (Swiss National Supercomputing Centre)
Abstract
pdf
Paper, Presentation
Technical Session 5B: Maintaining Large Systems
Session Chair: Aaron Scantlin (National Energy Research Scientific Computing Center)
Hardware Triage Tool: Enhancements and Extensions
Isa Muhammad Wazirzada, Abhishek Mehta, Vinanti Phadke, and Bhuvan Meda Rajesh (HPE)
Abstract
pdf
Detecting operating system noise with detect-detour
Nagaraju KN, Clark Snyder, Dean Roe, and Larry Kaplan (HPE)
Abstract
pdf, pdf
Analyzing a Lifetime of Failures on a Cray XC40 Supercomputer
Kevin Brown and Tanwi Mallick (Argonne National Laboratory), Zhiling Lan (University of Illinois Chicago), Robert Ross (Argonne National Laboratory), and Christopher Carothers (Rensselaer Polytechnic Institute)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 5C: Filesystems & I/O
Session Chair: Raj Gautam (ExxonMobil)
E2000 Performance From Microbenchmarks to Applications
William Loewe, Michael Moore, Sakib Samar, and Chris Walker (HPE)
Abstract
pdf, pdf
Towards Empirical Roofline Modeling of Distributed Data Services: Mapping the Boundaries of RPC Throughput
Philip Carns, Matthieu Dorier, Rob Latham, Shane Snyder, and Amal Gueroudji (Argonne National Laboratory); Seth Ockerman (University of Wisconsin-Madison); Jerome Soumagne (HPE); Dong Dai (University of Delaware); and Robert Ross (Argonne National Laboratory)
Abstract
pdf, pdf
HPC workload characterization using eBPF
Shubh Pachchigar and Brandon Cook (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Brian Friesen (Lawrence Berkeley National Laboratory)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 5A: Slingshot & MPI Tuning
Session Chair: Brett Bode (National Center for Supercomputing Applications/University of Illinois)
MPI implementation optimization for Slingshot network
Rahulkumar Gayatri, Adam Lavely, Neil Mehta, Brandon Cook, and Afton Geil (Lawrence Berkeley National Laboratory)
Abstract
pdf, pdf
Using Different MPI Implementations on HPE Cray EX Supercomputers for Native and Containerized Applications Execution
Maciej Pawlik and Maciej Szpindler (Academic Computer Centre CYFRONET), Marcin Krotkiewski (University of Oslo), and Alfio Lazzaro (HPE)
Abstract
pdf
Scaling MPI Applications on Aurora
Nilakantan Mahadevan (Hewlett Packard Enterprise); Premanand Sakarda (Intel Corporation); Scott Parker, Servesh Muralidharan, Vitali Morozov, and Victor Anisimov (Argonne National Laboratory); Huda Ibeid, Anthony-Trung Nguyen, and Aditya Nishtala (Intel Corporation); Larry Kaplan and Michael Woodacre (Hewlett Packard Enterprise); and Kalyan Kumaran and JaeHyuk Kwack (Argonne National Laboratory)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 6B: Framework for HPC-AI workflows
Session Chair: Chris Fuson (Oak Ridge National Laboratory)
Framework for tracking metadata, lineage and model provenance in hybrid simulation-AI HPC exascale workflows
Martin Foltin, Andrew Shao, Rishabh Sharma, Shreyas Kulkarni, Annmary Justine Koomthanam, Aalap Tripathy, and Cong Xu (HPE); Wenqian Dong (Oregon State University); Suparna Bhattacharya (HPE); Brian Sammuli (General Atomics); and Paolo Faraboschi (HPE)
Abstract
pdf, pdf
Search and Query Framework for Workflows with HPC and AI Models
Christopher Rickett, Sreenivas Sukumar, and Karlon West (HPE)
Abstract
pdf, pdf
FirecREST v2: Lessons Learned from Redesigning an API for Scalable HPC Resource Access
Elia Palme and Juan Pablo Dorsch (CSCS - ETH Zurich); Ali Khosravi and Giovanni Pizzi (PSI Center for Scientific Computing, Theory, and Data); and Francesco Pagnamenta, Andrea Ceriani, Eirini Koutsaniti, Rafael Sarmiento, Ivano Bonesana, and Alejandro Dabin (CSCS - ETH Zurich)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 6C: Programming Models
Session Chair: Benjamin Cumming (CSCS, ETH Zurich)
Designing GPU-aware OpenSHMEM for HPE Cray EX and XD Systems
Danielle Sikich, Naveen Namashivayam Ravichandrasekaran, Md Rahman, Elliot Joseph Ronaghan, Nathan Wichmann, and William Okuno (HPE)
Abstract
pdf, pdf
Quantifying Message Aggregation Optimisations for Energy Savings in PGAS Models
Aaron Welch and Oscar Hernandez (Oak Ridge National Laboratory) and Stephen Poole and Wendy Poole (Los Alamos National Laboratory)
Abstract
pdf, pdf
Accelerating LArTPC Simulations: Enhancing larnd-sim with GPU Optimization Techniques
Madan Timalsina (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); Matt Kramer (Lawrence Berkeley National Laboratory); Pengfei Ding (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); Ronan Doherty (Trinity College Dublin); Rishabh Dave (UC Berkeley); Nicholas Tyler, Urjoshi Sinha, and William Arndt (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); and Callum Wilkinson (Lawrence Berkeley National Laboratory)
Abstract
pdf
Paper, Presentation
Technical Session 6A: DAOS
Session Chair: Jesse A. Hanley (Oak Ridge National Laboratory)
DAOS - New Horizons for High Performance Storage
Michael Hennecke and Jerome Soumagne (HPE)
Abstract
pdf
Enhancing RPC on Slingshot for Aurora’s DAOS Storage System
Jerome Soumagne, Alexander Oganezov, Ian Ziemba, and Steve Welch (HPE); Philip Carns and Kevin Harms (Argonne National Laboratory); and John Carrier, Johann Lombardi, Mohamad Chaarawi, Zhen Liang, and Scott Peirce (HPE)
Abstract
pdf, pdf
Global Distributed Client-side Cache for DAOS
Clarete R. Crasta, John L Byrne, Abhishek Dwaraki, David Emberson, Harumi Kuno, Sekwon Lee, Ramya Ahobala Rao, Shreyas Vinayaka Basri K S, Amitha C, Chinmay Ghosh, Rishi Kesh Kumar Rajak, Sriram Ravishankar, Porno Shome, and Lance Evans (HPE)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 7B: Access Nodes & Kubernetes Management
Session Chair: Jim Williams (Los Alamos National Laboratory)
Addressing Resource Constraints on Aurora with Admin Access Nodes
Peter Upton, Ben Lenard, Ben Allen, and Cyrus Blackworth (Argonne National Laboratory)
Abstract
pdf, pdf
HPE Slingshot in the Kubernetes Ecosystem
Caio Davi and Jesse Treger (HPE)
Abstract
pdf, pdf
Building non-standard images for CSM systems
Harold Longley, Isa Wazirzada, Dennis Walker, Andy Warner, and Davide Tacchella (HPE)
Abstract
pdf, pdf
Paper, Presentation
Technical Session 7C: Application Performance
Session Chair: Juan F R Herrera (EPCC, The University of Edinburgh)
Task-decomposed Overlapped Pressure Preconditioner for Sustained Strong Scalability on Accelerated Exascale Systems
Niclas Jansson (KTH Royal Institute of Technology)
Abstract
pdf
Supernovae in HPC: Benchmarking FLASH Across Advanced Computing Clusters
Joshua Martin, Eva Siegmann, and Alan Calder (Stony Brook University, Institute of Advanced Computational Science)
Abstract
pdf
Expanding Community Access to Real-World HPC Application I/O Characterization Data Using Darshan
Shane Snyder, Philip Carns, Robert Ross, Robert Latham, and Kevin Harms (Argonne National Laboratory)
Abstract
pdf, pdf
Paper, Presentation, Birds of a Feather
Technical Session 7A: AI/ML GPU Workloads
Session Chair: Raj Gautam (ExxonMobil)
Porting Radio Astronomy Correlation to Setonix, a HPE Cray EX system powered by AMD GPUs
Cristian Di Pietrantonio (Pawsey Supercomputing Research Centre, Curtin Institute for Radio Astronomy); Marcin Sokolowski (Curtin Institute of Radio Astronomy); Christopher Harris (Pawsey Supercomputing Research Centre); and Daniel Price and Randal Wayth (SKAO)
Abstract
pdf, pdf
Evaluating the Performance of Containerized ML and LLM Applications on the Frontier and Odo Supercomputers
Bishwo Dahal (University of Louisiana Monroe, Oak Ridge National Laboratory) and Elijah Maccarthy and Subil Abraham (Oak Ridge National Laboratory)
Abstract
pdf, pdf
BoF on Transforming Hybrid Workflows: The Role of HPE Cray Supercomputing User Services Software in Bridging HPC and AI
Tulsi Mishra, Dean Roe, and Larry Kaplan (HPE)
Abstract
pdf

Program Event Contents
Program Event Content
Expanding Horizons in AI with HPC Workshop
This workshop, Expanding Horizons in AI with HPC, is held at Stony Brook University and explores the intersection of AI and HPC, focusing on how advanced computing can accelerate AI research and applications. As AI models become more complex and data-intensive, traditional computing systems struggle to meet the demand for scalability, efficiency, and speed. HPC addresses this by providing the infrastructure needed to train large-scale models, improve AI algorithms, and enable breakthroughs in fields such as deep learning, natural language processing, and autonomous systems.

Registration and more details are available here: https://cug.org/cug-2025-aiwithhpc-workshop-2/

Tutorials
Tutorial
Tutorial 1B
Hands on with uenv and CPE in a container with Grace Hopper on Alps
Ben Cumming and Tim Robinson (Swiss National Supercomputing Centre, ETH Zurich)
Abstract
pdf
Tutorial
Tutorial 1C
Best Practices For Operating and Maintaining Slingshot Fabrics
Forest Godfrey (Hewlett Packard Enterprise)
Abstract
pdf
Tutorial
Tutorial 1A
Monitoring HPE Cray HPC systems
Harold Longley, Sue Miller, Pete Guyan, and Raghul Vasudevan (HPE)
Abstract
pdf, gz, gz
Tutorial
Tutorial 1D
Exploring High Performance Storage with DAOS
Adrian Jackson (EPCC, The University of Edinburgh) and Mohamad Chaarawi and Kenneth Cain (HPE)
Abstract
pdf
Tutorial
Tutorial 1B Continued
Hands on with uenv and CPE in a container with Grace Hopper on Alps
Ben Cumming and Tim Robinson (Swiss National Supercomputing Centre, ETH Zurich)
Abstract
pdf
Tutorial
Tutorial 1C Continued
Best Practices For Operating and Maintaining Slingshot Fabrics
Forest Godfrey (Hewlett Packard Enterprise)
Abstract
pdf
Tutorial
Tutorial 1A Continued
Monitoring HPE Cray HPC systems
Harold Longley, Sue Miller, Pete Guyan, and Raghul Vasudevan (HPE)
Abstract
pdf, gz, gz
Tutorial
Tutorial 1D Continued
Exploring High Performance Storage with DAOS
Adrian Jackson (EPCC, The University of Edinburgh) and Mohamad Chaarawi and Kenneth Cain (HPE)
Abstract
pdf
Tutorial
Tutorial 2B
Automated Inspection of Fortran/C/C++ Code Using Codee for Correctness, Modernization, Optimization, and Security on HPE/Cray
Manuel Arenaz (Codee - Appentra Solutions)
Abstract
pdf, pdf
Tutorial
Tutorial 2C
Performance Analysis on AMD GPUs
Georgios Markomanolis (AMD)
Abstract
pdf
Tutorial
Tutorial 1A Continued
Monitoring HPE Cray HPC systems
Harold Longley, Sue Miller, Pete Guyan, and Raghul Vasudevan (HPE)
Abstract
pdf, gz, gz
Tutorial
Tutorial 2B Continued
Automated Inspection of Fortran/C/C++ Code Using Codee for Correctness, Modernization, Optimization, and Security on HPE/Cray
Manuel Arenaz (Codee - Appentra Solutions)
Abstract
pdf, pdf
Tutorial
Tutorial 2C Continued
Performance Analysis on AMD GPUs
Georgios Markomanolis (AMD)
Abstract
pdf
Tutorial
Tutorial 1A Continued
Monitoring HPE Cray HPC systems
Harold Longley, Sue Miller, Pete Guyan, and Raghul Vasudevan (HPE)
Abstract
pdf, gz, gz

Vendors
Plenary, Vendor
Plenary: Sponsors Talks, HPE 1-100
Linaro: Unlocking Exascale Debugging and Performance Engineering with Linaro Forge
Rudy Shand (Linaro Ltd)
Abstract
pdf
Codee: A Tool to Enhance Correctness, Modernization, Security, Portability and Optimization in Fortran and C/C++ Software Applications
Manuel Arenaz (Codee)
Abstract
pdf
AMD: The Unreasonable Effectiveness of FP64 Precision Arithmetic
Nicholas Malaya (AMD)
Abstract
HPE 1 on 100 with Trish Damkroger (HPE Customers only. No HPE partners or CUG sponsors)
Trish Damkroger (HPE)
Abstract

XTreme
XTreme (Approved NDA Members Only)
XTreme (Under NDA, Members Only)
XTreme (Approved NDA Members Only)
XTreme (Under NDA, Members Only)
XTreme (Approved NDA Members Only)
XTreme (Under NDA, Members Only)
XTreme (Approved NDA Members Only)
XTreme (Under NDA, Members Only)

Created 2025-5-15 2:51