Saturday, April 30th

4:00pm-6:00pm  CUG Board Meeting (closed) (Board)

Sunday, May 1st

9:00am-5:00pm  XTreme (Approved NDA Members Only) (CUG Business)
12:00pm-1:00pm  CUG Advisory Council Meeting (CUG Business, CUG Program Committee)

Monday, May 2nd

8:30am-12:00pm  Tutorial 1A: Debugging and Performance Profiling on HPE Cray Supercomputers with AMD GPUs
Stephen Abbott, Constantinos Makrides, and Trey White (Hewlett Packard Enterprise)
Abstract: This half-day tutorial will walk through practical examples of debugging and performance profiling MPI programs that use AMD GPUs on HPE Cray supercomputers. The tutorial will cover the range of tools provided by both the HPE Cray Programming Environment and the AMD ROCm platform. Examples will include scalable backtraces with ATP and STAT, parallel GPU debugging with rocgdb and gdb4hpc, debugging with runtime output logs, performance profiling and tracing with both rocprof and the HPE Cray Performance Analysis Tools, and performance debugging using compiler logs.

8:30am-4:30pm  Tutorial 1B: Cray System Management for HPE Cray EX Systems
Harold Longley (Hewlett Packard Enterprise)
Abstract: This tutorial session will address important questions about using Cray System Management (CSM) and related software products on HPE Cray EX systems. How do you manage an HPE Cray EX system using CSM? What is in the DevOps-ready system management paradigm? Which CLI tools access the containerized microservices via the well-documented RESTful APIs? How are containers orchestrated by Kubernetes to provide highly available, resilient services on management nodes? How does identity and access management protect critical resources? How is the system performing? What are the tools for the collection, monitoring, and analysis of telemetry and log data?

10:00am-10:30am  Coffee Break

12:00pm-1:00pm  CUG Board / New Sites Lunch (closed); Lunch (sponsored by NVIDIA)

1:00pm-1:30pm  Technical Session 0A (Presentation)
NVIDIA HPC SDK Update
Jeff Larkin (NVIDIA)
Abstract: The NVIDIA HPC SDK provides a full suite of compilers, math libraries, communication libraries, profilers, and debuggers for the NVIDIA platform. The HPC SDK is freely available to all developers on x86, Arm, and Power platforms and is included on HPE systems with NVIDIA GPUs. This presentation will provide a description of the technologies available in the HPC SDK and an update on recent developments. We will discuss NVIDIA's preferred methods of programming for the NVIDIA platform, including support for parallel programming in ISO C++, Fortran, and Python, compiler directives, and CUDA C++ or Fortran.
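The standard-language parallelism mentioned in this abstract can be illustrated with a short, hedged sketch. The code below is not from the talk; it is a minimal ISO C++17 parallel-algorithm ("stdpar") SAXPY of the kind the HPC SDK's nvc++ compiler can offload to GPUs (the -stdpar flag is one route).

```cpp
// Hedged sketch, not code from the talk: an ISO C++17 parallel algorithm
// ("stdpar") that the HPC SDK's nvc++ compiler can offload to a GPU, e.g.
//   nvc++ -stdpar=gpu saxpy.cpp
#include <algorithm>
#include <cstdio>
#include <execution>
#include <vector>

int main() {
    const std::size_t n = 1 << 20;
    const float a = 2.0f;
    std::vector<float> x(n, 1.0f), y(n, 3.0f);

    // SAXPY written with a standard parallel algorithm; no vendor API needed.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(), y.begin(), y.begin(),
                   [a](float xi, float yi) { return a * xi + yi; });

    std::printf("y[0] = %f\n", y[0]);  // expect 5.0
    return 0;
}
```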
1:30pm-2:30pm  BoF 1A
Programming Environments, Applications, and Documentation (PEAD) Special Interest Group meeting
Chris Fuson (Oak Ridge National Laboratory), Ryan Ward and Bill Sparks (Hewlett Packard), Bilel Hadri (King Abdullah University of Science and Technology), Guilherme Peretti-Pezzi (Swiss National Supercomputing Centre), and Stephen Leak (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract: The Programming Environments, Applications and Documentation Special Interest Group ("the SIG") has as its mission to provide a forum for the exchange of information related to the usability and performance of programming environments (including compilers, libraries, and tools) and scientific applications running on Cray systems. Related topics in user support and communication (e.g. documentation) are also covered by the SIG.

2:30pm-3:00pm  Coffee Break (sponsored by DDN)

3:00pm-5:00pm  BoF 2A
Future Directions for HPE's Cray Programming Environment
Barbara Chapman and Nicolas Dube (HPE)
Abstract: Much is changing in HPC: large-scale platforms are becoming architecturally more heterogeneous and diverse, applications are growing in their complexity, and workflows increasingly combine AI with HPC. New ideas in parallel programming are being explored. Application workflows may combine the use of HPC systems with other resources, including data repositories.

HPC Support Documentation Management and Best Practices
Chris Fuson (Oak Ridge National Laboratory), Victor Holanda (Swiss National Supercomputing Centre), Bilel Hadri (King Abdullah University of Science and Technology), Sanchez Peggy (Hewlett Packard), and Stephen Leak (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract: HPC centers provide large, complex computational and storage resources to large user communities who often span diverse science domains, utilize varying workflows, and have varying levels of experience. Resource variations, including the HPE Cray XE, XC, and Apollo systems, combined with site-specific policies and configurations can make it challenging for users to effectively use a center's valuable resources.

Tuesday, May 3rd

8:20am-10:00am  Plenary: Welcome, Keynote - Trey Breckenridge; Jim Rogers; Kevin Stroup (Vendor, Invited Talk)
CUG Welcome
Abstract: Welcome to CUG 2022.

The Landscape of Stellar Death
Anna Ho (UC Berkeley, LBNL)
Abstract: The night sky is dynamic: stars in distant galaxies explode and collide, and supermassive black holes at the centers of galaxies shred stars that have wandered too close. By using modern telescopes to patrol the sky, we can discover phenomena lasting from a fraction of a second to decades. In this talk, I will describe my work using one such telescope, the Zwicky Transient Facility, to study some of the most extreme explosions in the universe in terms of energy and timescale: the deaths of stars. I will describe the data that we work with, the kind of analysis we perform, and the scientific questions we are finally able to answer.

The Path to Zetta Scale
Wilfred Gomes and Thomas Krueger (Intel Corporation)
Abstract: This talk covers the technology pipeline to get to zetta scale. We build on the foundation of our data center GPU architecture, codenamed Ponte Vecchio (PVC). PVC enables a new class of exaflop HPC/AI supercomputers implemented using active logic-on-logic 3D stacking with Foveros and EMIB, and incorporates over 100B transistors spread across 47 tiles and 5 process nodes to deliver more than 1 peta-op of performance and enable exaflop computing. Using this as background, we describe the path forward and the challenges to create zetta-scale computing this decade.

10:00am-10:30am  Coffee Break (sponsored by SchedMD)

10:30am-12:00pm  Plenary: CUG Business, New Sites - Trey Breckenridge; Scott Michael; Jim Rogers; Kevin Stroup (New Site, Vendor, CUG Business, CUG Program Committee)
CUG Business
Abstract: CUG Board reporting.

CINES, French national supercomputing center
Gabriel Hautreux (CINES)
Abstract: CINES, the French national supercomputing center.

Achieving Energy Efficiency in Long-Term Storage
Matt Ninesling (Spectra Logic)
Abstract: With increasing concern about global warming, there will be a greater focus on electrical energy consumption generated from non-renewable energy sources. As the demand for information technology is predicted to grow by six times over the next decade, the challenge will be how to satisfy this demand while, at the same time, not increasing, and preferably decreasing, the associated CO2 emissions. Spectra will discuss possible methods for doing so.

CSC Finland re-enters CUG with LUMI
Pekka Manninen (CSC - IT Center for Science Ltd.)
Abstract: CSC - IT Center for Science has been a Cray site since the late 1980s, with a recent side-step to another vendor, but now returns to CUG with a recent installation of LUMI, a 550 Petaflop/s HPE Cray EX system. In this address I will present our new LUMI datacenter and the LUMI system.

Altair: Multi-dimensional HPC for Breakthrough Results
Branden Bauer (Altair Engineering, Inc.)
Abstract: Today's hyper-competitive HPC landscape demands more than simple scheduling for CPUs. We'll show you how to optimize across software licenses and storage requirements, accelerate with GPUs, burst to the cloud, submit jobs from anywhere, and more.

GDIT's High Performance Computing in Support of NOAA
Alan Powers (General Dynamics Information Technology)
Abstract: GDIT is part of the General Dynamics family of companies, with 28K+ employees, and provides HPC support under a dozen federal and state contracts. This presentation will compare and contrast the HPC work GDIT supports for the National Oceanic and Atmospheric Administration under two contracts across five data centers. The first is the NOAA Research and Development High Performance Computing Support (NOAA RDHPCS) contract, which entails systems across three locations (ESRL in Boulder, CO; GFDL in Princeton, NJ; and NESCC in Fairmont, WV). The latter site hosts a Cray CS500, which debuted at no. 88 on the Top500. The other is the NOAA Weather and Climate Operational Supercomputing Systems 2 (NOAA WCOSS2) contract, under which GDIT manages two identical, dedicated HPE Cray EX 4000 systems (2,560 AMD Rome compute nodes) in Phoenix, AZ and Manassas, VA. These systems debuted at nos. 37 and 38 on the Top500. NOAA's requirements are among the toughest in the HPC industry to meet, ensuring that weather forecasts are generated on time, every time.

Supercomputing at Microsoft
Mike Kiernan (Microsoft)
Abstract: This short talk will introduce Microsoft's deployment and use of supercomputers, including Cray systems.

Update on Arm in HPC
Brent Gorda (Arm)
Abstract: Brent will give an update on the Arm business model, activity, and progress in the HPC market. Arm continues to gain traction and share in cloud and HPC via partners around the world.

New CUG Site: Engineering Research Development Center DoD Supercomputing Resource Center
George Moncrief (ERDC DSRC)
Abstract: The ERDC DoD Supercomputing Resource Center (DSRC) is one of five DSRCs operated by the DoD High Performance Computing Modernization Program (HPCMP). We serve the high performance computing (HPC) needs of engineers and scientists throughout the DoD by providing a complete HPC environment, training, and expertise in the CFD (Computational Fluid Dynamics), CSM (Computational Structural Mechanics), and EQM (Environmental Quality Modeling and Simulation) Computational Technology Areas.

New Site: AFW & ORNL HPC collaboration
David Hladky (AF LCMC, STI Tech)
Abstract: The US Air Force and the US Department of Energy's (DOE's) Oak Ridge National Laboratory (ORNL) launched a new high-performance weather forecasting computer system. Procured and managed by ORNL's National Center for Computational Sciences (NCCS), the two Shasta supercomputers built by HPE Cray, a Hewlett Packard Enterprise company, provide a platform for some of the most advanced weather modeling in the world.

NVIDIA
Timothy Costa (NVIDIA)
Abstract: TBD

12:00pm-1:00pm  Allyship in HPC presented by Women in HPC (open to all); Lunch (sponsored by NVIDIA)

1:00pm-2:30pm  Technical Session 1A - Jim Williams (Presentation)
HPE Cray EX Shasta 22.03 Cray System Management Overview
Harold Longley (Hewlett Packard Enterprise)
Abstract: Many changes have happened in the Shasta software products for HPE Cray EX systems in the last year. The changes from all software products (CSM, SMA, SDU, SAT, HFP, Slingshot, SHS, SLE, COS, UAN, CPE, WLM (Slurm and PBS Pro), Analytics, and CSM-diags) will be highlighted from the system management perspective for improvements in scalability and various functional areas. A brief overview of the Shasta software architecture will describe the Kubernetes-based management environment with microservices that can be accessed via REST API or several CLI tools. The changes related to installation, upgrade, and rolling upgrade will be described. Starting with the installation of Shasta 1.4 on an HPE Cray EX system, the software can now be upgraded to Shasta 1.5 or later releases of the entire recipe or of individual products.

Augmenting HPCM System Management with Phoenix
Matthew Ezell (Oak Ridge National Laboratory)
Abstract: HPE markets and supports its Cray EX systems with a choice between the microservice-based Cray System Management (CSM) and the more traditional HPE Performance Cluster Manager (HPCM) software for system administration.
HPCM (built from software previously known as CMU and Tempo) is a mature software solution with a long history of supporting SGI, HPE, and now Cray hardware. Oak Ridge National Laboratory (ORNL) evaluated HPCM for use on its Air Force Weather and Frontier systems and determined that it has the features and stability required to support a production workload. This evaluation also led ORNL to identify several opportunities for improvement and areas where HPCM could better integrate with its overall system administration strategies. ORNL extended its home-grown, open-source Phoenix system administration software to augment HPCM's power and firmware management and control, hardware discovery, image build, and configuration management. This presentation will discuss HPCM and Phoenix, as well as describe how both tools are used at ORNL to manage HPE Cray EX systems.

UAIs Come of Age: Hosting Multiple Custom Interactive Login Experiences Without Dedicated Hardware (Best Paper)
Eric Lund (Hewlett Packard Enterprise)
Abstract: On HPE Cray EX systems, End-User User Access Instances (UAIs) are Kubernetes-orchestrated, customizable, lightweight, on-demand, single-user User Access Node (UAN) equivalents. Broker UAIs mediate access to End-User UAIs, presenting a single SSH login and dynamic creation of tailored UAIs for each specific workflow. With UAIs, sites can offer multiple distinct custom interactive login experiences through discrete Broker UAI SSH servers without requiring additional dedicated hardware. The User Access Service (UAS) manages UAIs and their configuration. Through the UAS, sites register custom container images to use as UAIs, Kubernetes Volumes to connect UAIs to external data and storage, Kubernetes Resource Specifications to guide UAI scheduling, and UAI Classes to combine these and other configuration parameters into targeted UAI definitions. Broker UAIs present completely customizable SSH access to UAIs based on these definitions.

Technical Session 1B - G. Todd Gamblin (Presentation)
Parallel Programming with Standard C++ and Fortran
Jeff Larkin (NVIDIA)
Abstract: The HPC community has long led the way in parallel programming models out of a need to program leadership-class systems, but as multi-core CPUs and GPUs have become ubiquitous, parallel programming has likewise become more mainstream. Mainline programming languages like C++, Fortran, and Python have increasingly adopted features necessary for parallel programming without additional APIs. In this presentation I will show results using standard language parallelism in C++ and Fortran to program both multicore CPUs and GPUs. I will also demonstrate how to use these languages together with NVIDIA's HPC SDK to program for the NVIDIA platform, and I will present the status of the next versions of C++ and Fortran in relation to parallel programming.

Open Approaches to Heterogeneous Programming are Key for Surviving the New Golden Age of Computer Architecture
James Reinders, James Brodman, and John Pennycook (Intel Corporation)
Abstract: SYCL and oneAPI aim to help us with heterogeneous programming by emphasizing open, multivendor, and multiarchitecture solutions. We will start by discussing the need for heterogeneous computing and the challenges this poses for effective portable programming. An introduction to SYCL, and parts of the oneAPI initiative and tooling, will illustrate both the challenges to solve and potential solutions for now and into the future. An open ecosystem has never been more important, and we will explain why, along with opportunities to help shape the future together. We recommend reading "A New Golden Age for Computer Architecture" (https://tinyurl.com/HPgoldenage) by John Hennessy and David Patterson (CACM, Feb 2019, Vol 62, No 2, pp 48-60) as an excellent article that sums up why we think an open approach to heterogeneous programming is so important.

Technical Session 1C - Tina Declerck (Presentation)
Slingshot Launched into Network Space
Gregory Faanes and Marten Terpstra (HPE); Jesse Treger (HPE, HPC); and Duncan Roweth (HPE)
Abstract: HPE Slingshot networks are constructed from two components: a PCIe Gen4 NIC, Cassini, and a 64-port switch, Rosetta. Their links use standard Ethernet physical interfaces operating at 200 Gbps, which can be used to construct either dragonfly or fat-tree networks. Rosetta switches operate HPE Slingshot-specific adaptive routing and congestion management protocols on the fabric links that connect them together. Their edge ports, including those that connect to Cassini, support both optimized HPC and standard Ethernet protocols. The HPE Cray EX supercomputer uses them in a dragonfly network, as this provides cost-effective global bandwidth at scale. Clusters of HPE Apollo servers (HPE Inc, 2021) can use either dragonfly or fat-tree. HPE Slingshot networks are designed to support 64 to 250,000 or more endpoints; the largest system under construction has approximately 85,000 endpoints. Support for systems of this scale has a significant bearing on the design of the Rosetta and Cassini devices. This paper presents the key features and some early performance results of Rosetta and Cassini devices and systems.

Software Changes to Enable Slingshot Support on HPE Systems
Michael Raymond (Hewlett Packard)
Abstract: HPE's Slingshot fabric provides many new performance and security features. HPE's Programming Environment team has enhanced its application launcher middleware, process management software, and communication libraries to take advantage of these. This presentation will introduce some of the new Slingshot features, show how HPE has taken advantage of them, and provide insight so that others may integrate with this new environment. We will also show a proposed roadmap for further enhancements. Attendees will emerge with an understanding of what their system is doing to aid their user and administrative requirements.

Slingshot Fabric Manager Monitor
John Stile (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract: The Slingshot Fabric Manager offers an API but no automated system for checking health or recording trends. NERSC created the Fabric Manager Monitor application to collect state from the Fabric Manager API, translate the output into meaningful data, and post the results to off-cluster time-series storage. The application was written in Python, packaged into an OCI container image, and deployed in Kubernetes, where it monitors the Fabric Manager. This allows us to alert on switch reboots and flapping ports, and generates data for charts showing trends over time.

2:30pm-3:00pm  Coffee Break (sponsored by Altair)

3:00pm-4:30pm  Technical Session 2A - Chris Fuson (Presentation)
Adopting Standardized Container Runtimes in HPC
Aditi Gaur, Richard Shane Canon, Laurie Stephey, Douglas Jacobsen, and Daniel Fulton (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract: HPC has seen the rapid adoption of containers and is now undergoing a convergence with orchestration frameworks like Kubernetes. However, to achieve greater integration, HPC systems need to be able to directly use components from the broader container ecosystem in a secure and scalable manner. The ability to take advantage of user namespaces in Podman has addressed security concerns and has opened the door to a secure user experience. NERSC has recently evaluated and prototyped changes to Podman to enable it to be a fully scalable replacement for HPC container runtimes like Shifter, Singularity, and Charliecloud. One of the biggest remaining gaps is in scalable launch. We will provide an overview of prototyping Podman on Shasta systems, describe changes to the storage driver to enable scalable launch, and present performance results of these changes. We will also present how we envision these improvements fitting into a larger strategy to 1) allow users to securely build containers directly on NERSC systems, 2) eventually replace Shifter at NERSC with Podman as our runtime container solution, and 3) potentially integrate with orchestration frameworks. By addressing these gaps, we hope to help enable more widespread and productive use of containers at NERSC.

Performance-Aware Build System for HPC and AI Containers
Paulo Souza, Gallig Renaud, Jonathan Sparks, and Maxim Alt (HPE)
Abstract: Once we have a curated, high-quality container image from our container factory, it is easy to execute the containerized application on different platforms, from a Cray supercomputer to the public cloud, raising the question of where and how to run optimally. We propose to investigate extending our existing container factory CI/CD system to add a benchmark step at the end of the pipeline. The benchmark step will execute micro-benchmarks and the application embedded inside the container with pre-defined datasets and parameters on multiple HPE and public cloud clusters, recording the execution times of each run. For each cluster, the benchmark script might run the application multiple times with different configurations/binaries (CPE, icc, gcc, IntelMPI, perfboost), selecting the optimal configuration for each cluster. The optimal configurations can be embedded into the published final image by the next CI step or stored in a database. If one dataset does not represent the performance characteristics of the application, our system can execute with multiple datasets, tracking the performance of each one with tags for different workloads. In addition to finding the best cluster for a given workload, our system can perform a cost analysis, listing the clusters with the best performance per cost.
Evaluating Integration and Performance of Containerized Climate Applications on an HPE Cray System
Subil Abraham and Ryan Prout (Oak Ridge National Laboratory); Thomas Robinson, Christopher Blanton, and Luis Sal-bey (National Oceanic and Atmospheric Administration); and Matthew Davis (Oak Ridge National Laboratory)
Abstract: Containers have taken over large swaths of cloud computing as the most convenient way of packaging and deploying applications. The features that containers offer for packaging and deploying applications translate to High Performance Computing (HPC) as well. At the National Oceanic and Atmospheric Administration (NOAA), containers provide an easy way to build and distribute complex HPC applications, allowing faster collaboration, portability, and reproducibility of experiment computing environments amongst the scientific community. The challenge arises when applications rely on the Message Passing Interface (MPI). This necessitates investigation into how to properly run these applications, with their own unique requirements, and produce performance on par with native runs. We investigate MPI performance for benchmarks and containerized climate models across containers covering a selection of compiler and MPI library combinations from the Cray-provided Programming Environments on the Cray XC supercomputer GAEA. Performance from the benchmarks and the climate models shows that, for the most part, containerized applications perform on par with the natively built applications when the system-optimized Cray MPICH libraries are bound into the container, while the hybrid-model containers perform poorly in comparison. We also describe several challenges and our solutions in running these containers, particularly challenges with heterogeneous jobs for the containerized model runs.

Technical Session 2B - Stephen Leak (Presentation)
OpenFAM: Programming Disaggregated Memory
Sharad Singhal, Clarete Riana Crasta, Mashood K. Abdulla, Faizan Barmawer, Gautham Bhat, Ahobala Ramya Rao, Soumya P N, and Rishikesh Rajak (Hewlett Packard Enterprise)
Abstract: HPC clusters are increasingly handling workloads where working data sets cannot be easily partitioned or are too large to fit into local node memory. In order to enable HPC workloads to access memory external to the node, HPE has defined a programming API (OpenFAM) for developing applications that use large-scale disaggregated memory. In this paper we describe an open-source reference implementation of OpenFAM that can be used on scale-up machines and traditional HPC clusters, as well as on emerging disaggregated memory architectures. We demonstrate the efficiency of the implementation using micro-benchmarks.

Extending Chapel to Support Fabric Attached Memory
Amitha C, Bradford Chamberlain, Sharad Singhal, and Clarete Riana Crasta (Hewlett Packard Enterprise)
Abstract: Fabric Attached Memory (FAM) is of increasing interest in HPC clusters because it enables fast access to large datasets required in High Performance Data Analytics (HPDA) and Exploratory Data Analytics (EDA) [1]. Most approaches to handling FAM force programmers either to use low-level APIs, which are difficult to program, or to rely upon abstractions from file systems or key-value stores, which make accessing FAM less attractive than other levels in the memory model due to the overhead they bring. The Chapel language is designed to allow HPC programmers to use high-level programming constructs that are easy to use, while delegating the task of managing data and compute partitioning across the cluster to the Chapel compiler and runtime. In this abstract we describe an approach to integrating FAM access within the Chapel language, thereby simplifying the task of programming and using FAM across distributed Chapel tasks.

Improving a High Productivity Data Analytics Chapel Framework
Prashanth Pai (Rice University), Andrej Jakovljević (University of Belgrade), Zoran Budimlić (Rice University), and Costin Iancu (Lawrence Berkeley National Laboratory)
Abstract: Most state-of-the-art exploratory data analysis frameworks fall into one of two extremes: they focus either on the high-performance computational aspects or on the interactive aspects of the analysis. Arkouda is a framework that attempts to integrate these two approaches by using a client-server architecture, with a Python interpreter on the client side and a Chapel server for performing high-performance computations. The Python interpreter overloads the Python operators and transforms them into messages to the Chapel server, which performs the actual computation.

Technical Session 2C - Bilel Hadri (Presentation)
Crossroads - NNSA's Third Advanced Technology System
James W. Lujan (Los Alamos National Laboratory), James H. Laros III and Simond D. Hammond (Sandia National Laboratories), and Howard P. Pritchard Jr. (Los Alamos National Laboratory)
Abstract: Los Alamos National Laboratory and Sandia National Laboratories, as part of the New Mexico Alliance for Computing at Extreme Scale (ACES), have collaborated on the deployment of several petascale-class supercomputer systems for the Department of Energy's (DOE) National Nuclear Security Administration's (NNSA) Advanced Simulation and Computing (ASC) program. The most recent platform award, Crossroads, will further the capabilities of the NNSA to solve a wide range of critical challenges in support of the stockpile stewardship mission. This paper will discuss the architecture and deployment of Crossroads by ACES in partnership with HPE and Intel, the latest in the sequence of advanced technology system deployments for the NNSA complex. Crossroads will be the first system to deploy Xeon processors paired with High Bandwidth Memory (HBM). ACES anticipates that this pairing will greatly improve the performance and workflow efficiency of ASC mission codes. HPE's emerging Slingshot technology will be utilized as the high-speed network to connect the 6,144 nodes of Crossroads, delivering a highly balanced architecture. This paper will detail the deployment of Crossroads and the latest available data regarding performance characteristics of this combination of technologies.

Ookami - an Apollo 80 testbed system
Eva Siegmann and Robert Harrison (Stony Brook University)
Abstract: Stony Brook's computing technology testbed, Ookami, provides researchers worldwide with access to Fujitsu A64FX processors. This Cray Apollo 80 system just entered its second year of operations. In this presentation, we will share the experiences gained during this exciting first project period. This includes a project overview and details of processes such as onboarding users, account administration, user support and training, and outreach. The talk will also give technical details such as an overview of the compilers, which play a crucial role in achieving good performance. To help users use the system efficiently, we offer webinars and hands-on sessions, and we also try to sustain an active user community that enables exchange between the different research groups. In February 2022 the first Ookami user group meeting will take place, giving users the opportunity to share and discuss their findings with a wider audience. We will present the key findings and give an outlook on the next project year. This presentation is also an invitation to interested researchers to learn more about Ookami and eventually use it for their own research.

Liquid Cooling for HPC, Enterprise and Beyond: How HPE Thinks of Energy Efficiency Across the Portfolio
Wade Vinson, Jason Zeiler, and Matt Slaby (HPE)
Abstract: Increasing power levels and low Tcase requirements, alongside the demand for more performance, seem diametrically opposed to reducing carbon footprint. When you are procuring 5,000-node clusters, HPE Cray EX takes care of all of that. Facility power is lowest with the warmest water cooling and low site pumping power. Inside the IT equipment there is zero fan energy, even for the switches and power supplies, and power conversion losses are lowest with 3-phase PSUs delivering 380VDC to the load right where it is needed.

4:35pm-5:35pm  BoF 3A
HPE Performance Cluster Manager (HPCM) Update
Jeff Hanson (Hewlett Packard)
Abstract: This Birds of a Feather will provide a brief overview of HPCM and focus on new features that would be of interest to CUG attendees. The presentation will be followed by discussion between HPE engineers and customers.

BoF 3B
OpenACC Users Experience
Jeff Larkin (NVIDIA); Barbara Chapman (HPE, Stonybrook University); Will Sawyer (CSCS); and Jack Wells (NVIDIA)
Abstract: OpenACC is a widely used API for directives-based acceleration on heterogeneous architectures. Since its first release more than a decade ago it has garnered significant support among legacy applications running on GPU-based architectures. This user-friendly programming model has facilitated acceleration of over 200 applications, including FV3, COSMO, GTS, M3D-C1, E3SM, ADSCFD, and VASP, on multiple platforms, and is viewed as an entry-level programming model for top supercomputers such as Perlmutter, Summit, Sunway TaihuLight, and Piz Daint. The OpenACC organization also sponsors dozens of hackathons and training events each year. This BoF will include updates on support of the OpenACC specification from HPE and NVIDIA. Multiple users have also been invited to present their results using OpenACC. Finally, the BoF will conclude with an interactive Q&A session with the audience and presenters.

BoF 3C
The future of HPC: Data Movement and Workflows Orchestration
Sharda Krishna (Hewlett Packard Enterprise)
Abstract: In this session, we will focus on workflow orchestration driven by the convergence of HPC workloads with AI, ML/DL, and data analytics. We will cover topics such as data movement, data sovereignty, and the factors that influence orchestration choices: portability of runtime environments, adoption of containerization, and productivity vs. optimization. This session will be designed as a panel of HPE and customer technical leaders to discuss strategic considerations and technical challenges, share HPE's direction, and gather customer insight and feedback.

6:00pm-9:00pm  HPE/Cray Networking Event (Networking/Social Event)

Wednesday, May 4th

8:20am-10:00am  Plenary: CUG Elections, Keynote (CUG Business, Vendor, Invited Talk)
CUG Elections
Abstract: Candidate statements and official voting period.

Machine Learning for Fundamental Physics
Benjamin Nachman (Lawrence Berkeley National Laboratory)
Abstract: TBD

AMD advantage for advancing HPC performance
Siddhartha Karkare (AMD)
Abstract: Discussion covering the latest AMD CPU and GPU products, combined with the 3rd Gen Infinity Architecture for HPC, and how these industry-leading AMD technologies are helping to drive HPC workloads to new levels of performance.

10:00am-10:30am  Coffee Break (sponsored by Spectra)

10:30am-12:00pm  Plenary: HPE Update (Vendor, Invited Talk)

12:00pm-1:00pm  HPE/CUG Exec (closed); Lunch (sponsored by AMD)

2:00pm-2:30pm  Coffee Break (sponsored by Arm)

2:30pm-3:30pm  Plenary: Best Paper (Vendor, Presentation, Invited Talk)
Best Paper award and talk.

Storage Optimizations for the Research Data Lifecycle
Greg Mason (DDN)
Abstract: This presentation will discuss the changing landscape of requirements for research data storage and how AI and advanced analytics have added to or changed these requirements. Learn how a variety of research institutions are handling the wide variety of data types, retention and access requirements, and diverse sets of processing capabilities, along with recent features DDN has implemented to help accelerate research computing and simplify data management.

Slurm
Danny Auble (SchedMD)
Abstract: Slurm is an open-source workload manager used on many TOP500 systems that provides a rich set of features, including topology-aware optimized resource allocation, cloud bursting, hierarchical bank accounts with fair-share job prioritization, and many resource limits.

From Planning to Performance: OpenACC Roadmap
Jack Wells (NVIDIA, OpenACC)
Abstract: For more than 10 years, OpenACC has proven a user-friendly and accessible API for directives-based acceleration on heterogeneous architectures, including GPU-based systems. This talk will present updates on the OpenACC roadmap for both the specification and the organization as a whole, including alignment with standard language parallelism, resumed support from HPE for OpenACC, highlighted success stories from real-world applications, and future initiatives of the OpenACC organization to support the growing research and developer communities. (An illustrative OpenACC sketch follows this plenary block.)

BAE Systems New Site Talk
Scott Grabow (BAE Systems)
Abstract: Let me introduce you to BAE Systems, Inc., a new member. Both the United States (BAE Systems, Inc.) and British (BAE Systems plc) organizations have experience with HPC systems, internally and for our customers. Much of our technical staff have experience going back to the 1990s and earlier, working for both vendors and customer organizations. We advance our customers' missions by ensuring they access breakthrough technologies that solve their toughest problems.
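As a companion to the OpenACC sessions above (OpenACC Users Experience and the OpenACC Roadmap talk), here is a minimal, hedged sketch of the directive style they discuss. It is not taken from any talk; it assumes an OpenACC-capable compiler such as nvc++ with -acc and simply offloads a dot product with a reduction.

```cpp
// Hedged sketch, not taken from any of the OpenACC talks: a minimal
// directive-accelerated dot product. Assumes an OpenACC compiler, e.g.
//   nvc++ -acc dot.cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<double> x(n, 1.0), y(n, 2.0);
    double* xp = x.data();
    double* yp = y.data();
    double sum = 0.0;

    // Offload the loop; copyin moves the arrays to the device and the
    // reduction clause returns the scalar result to the host.
    #pragma acc parallel loop reduction(+:sum) copyin(xp[0:n], yp[0:n])
    for (int i = 0; i < n; ++i) {
        sum += xp[i] * yp[i];
    }

    std::printf("dot = %.1f\n", sum);  // expect 2 * n = 2097152.0
    return 0;
}
```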
3:30pm-4:00pm  Coffee Break

4:00pm-5:00pm  Technical Session 3A - Tina Declerck (Presentation)
Deploying Cray EX Systems with CSM at LANL
Alden Stradling, Steven Johnson, and Graham Van Heule (Los Alamos National Laboratory)
Abstract: Over the last year and a half, Los Alamos National Laboratory has deployed a pair of Cray Shasta machines: a development testbed named Guaje and a production machine named Chicoma, which will soon comprise the bulk of LANL's open science research computing portfolio.

Configuring and Managing The Perlmutter Supercomputer: Lessons Learned and Best Practices Developed During Deployment and Operations
Douglas Jacobsen, Aditi Gaur, Brian Friesen, Chris Samuel, David Fox, Eric Roman, James Botts, and John Stile (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Harold Longley (HPE)
Abstract: The Perlmutter supercomputer and test systems provide an early look at the Cray EX supercomputers as well as Cray System Management software. NERSC is leveraging this cloud-native software and Ethernet-based networking to enable tremendous flexibility in management policies and methods as well as user accessibility. Based on work performed using every released version of the Cray software stack for Cray EX, NERSC has developed, in close collaboration with HPE through the Perlmutter System Software COE, methodologies for efficiently managing this collection of Perlmutter-related systems. In this work we describe how we template and synchronize configurations and software between systems and orchestrate manipulations of the configuration of the managed system. Key to this is a secured external management VM that provides both a configuration origin for the system and an interactive management space. Leveraging this external management system, we simultaneously create a systems-development environment and secure key aspects of the Cray EX system. In addition, we review some of the specific techniques that NERSC has used to manage and control the HPE Slingshot network, as well as our work to date on continuous operations. These capabilities have enabled NERSC users to remain highly productive even as the system is being further extended and developed.

Technical Session 3B - Stephen Leak (Presentation)
HPC Molecular Simulation Tries Out a New GPU: Experiences on Early AMD Test Systems for the Frontier Supercomputer
Ada Sedova and Russell Davidson (Oak Ridge National Laboratory), Mathieu Taillefumier (Swiss National Supercomputing Centre), and Wael Elwasif (Oak Ridge National Laboratory)
Abstract: Molecular simulation is an important tool for numerous efforts in physics, chemistry, and the biological sciences. Simulating molecular dynamics requires extremely rapid calculations to enable sufficient sampling of simulated temporal molecular processes. The Hewlett Packard Enterprise (HPE) Cray EX Frontier supercomputer installed at the Oak Ridge Leadership Computing Facility (OLCF) will provide an exascale resource for open science and will feature graphics processing units (GPUs) from Advanced Micro Devices (AMD). The future LUMI supercomputer in Finland will be based on an HPE Cray EX platform as well. Here we test the ports of several widely used molecular dynamics packages that have each made substantial use of acceleration with NVIDIA GPUs, on Spock, the early Cray pre-Frontier testbed system at the OLCF, which employs AMD GPUs. These programs are used extensively in industry for pharmaceutical and materials research, as well as in academia, and are also frequently deployed on high-performance computing (HPC) systems, including national leadership HPC resources. We find that, in general, performance is competitive and installation is straightforward, even at these early stages in a new GPU ecosystem. Our experiences point to an expanding arena for GPU vendors in HPC for molecular simulation.

Technical Session 3C - Chris Fuson (Presentation)
Using Loki for Simplifying the Usage of Shasta Logs
Siqi Deng (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract: The Shasta system brings new challenges to monitoring system health as valuable metrics and logs become available, though in a more complicated framework compared to previous high-performance computing (HPC) systems.

Thursday, May 5th

8:30am-10:00am  Technical Session 4A - Ashley Barker (Presentation)
Fallout: System Stand-up Monitoring and Analysis Package
Jim Brandt (Sandia National Laboratories), Mike Showerman (National Center for Supercomputing Applications/University of Illinois), Eric Roman (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Tom Tucker (Open Grid Computing), Joe Greenseid (Hewlett Packard Enterprise), and Ann Gentile (Sandia National Laboratories)
Abstract: As systems are being deployed, it is imperative that monitoring capabilities be deployed early in the process. Quick insight, not detailed understanding, is needed to determine localized hotspots, outlier components, misconfigurations, etc., that affect basic functionality and delay acceptance.

Crossroads: Status on Design, Deployment, Acceptance, and Operation
Anthony Agelastos and Kevin Stroup (Sandia National Laboratories) and Jennifer Green (Los Alamos National Laboratory)
Abstract: The New Mexico Alliance for Computing at Extreme Scale (ACES) next-generation Advanced Technology System (ATS), Crossroads, designed by Los Alamos National Laboratory (LANL), Sandia National Laboratories (SNL), and HPE, will be deployed at LANL in support of DOE NNSA's Advanced Simulation and Computing (ASC) mission objectives. Unique among the DOE NNSA Complex's ATS-class systems, Crossroads' architecture presents opportunities and challenges in satisfying the computational requirements of mission simulation and modeling workloads. We highlight some of the interesting elements of the initial design and its revisions, deployment and acceptance planning, and operational issues observed and anticipated in this procurement. Additionally, we discuss the status of acceptance activities, including integration, setup, and preliminary results from the test suite, which comprises micro-benchmarks, mini-applications, and production applications. The DOE NNSA laboratories' (LANL, Lawrence Livermore National Laboratory (LLNL), and SNL) test suite and corresponding figure-of-merit thresholds ensure success of mission workloads by confirming that Crossroads' performance capabilities meet specifications. Challenges with the hardware and software stacks encountered as Crossroads matures to meet its performance and usability goals will also be discussed.

Approaching the Final Frontier: Lessons Learned from the Deployment of HPE/Cray EX Spock and Crusher supercomputers
Veronica G. Vergara Larrea, Reuben Budiardja, Matt Davis, Matt Ezell, Jesse Hanley, Christopher Zimmer, Michael Brim, and Wael Elwasif (Oak Ridge National Laboratory)
Abstract: In 2021, the Oak Ridge Leadership Computing Facility (OLCF) deployed Spock and Crusher, the first two user-facing HPE/Cray EX systems, as precursors to Frontier. Both systems were transitioned to operations in 2021 with the goal of providing users with platforms to begin porting scientific applications in preparation for Frontier's arrival. While Spock's architecture is one generation removed from Frontier's hardware, it exposed users earlier to the new HPE/Cray Programming Environment designed for AMD GPUs. Spock is a 36-node HPE/Cray EX supercomputer with one AMD EPYC 7662 processor and 4 AMD MI100 GPUs per node. Crusher, on the other hand, is a 192-node HPE/Cray EX supercomputer with one AMD EPYC 7A53 processor and 4 AMD MI250X GPUs per node, the same hardware as Frontier. In this paper, we present an overview of the challenges encountered and lessons learned during the deployment and transition to operations of both systems. These include issues identified with the programming environment, layout and process binding via SLURM, and providing access to the center-wide file systems. We also discuss settings added locally to improve the user experience, current workarounds in place, and the processes developed to capture the status of evolving issues in our user-facing documentation.

Technical Session 4B - Bilel Hadri (Presentation)
Accelerating X-Ray Tracing for Exascale Systems using Kokkos
Felix Wittwer (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Nicholaus Sauter, Derek Mendez, Billy Poon, Aaron Brewster, and James Holton (Lawrence Berkeley National Laboratory); Michael Wall (Los Alamos National Laboratory); William Hart (Sandia National Laboratories); and Deborah Bard and Johannes Blaschke (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract: The upcoming exascale computing systems Frontier and Aurora will draw much of their computing power from GPU accelerators. The hardware for these systems will be provided by AMD and Intel, respectively, each supporting its own GPU programming model. Applications that harness one of these exascale systems face the challenge of avoiding lock-in while preserving performance portability.

Performance Analysis and Tuning on A64FX
Alan Calder, Tony Curtis, Catherine Feldman, Robert Harrison, and Eva Siegmann (Stony Brook University)
Abstract: This paper analyzes and tunes the performance of applications on Fujitsu A64FX processors using the NSF-sponsored Ookami testbed at Stony Brook University. These are the same processors that power Fugaku, currently the world's fastest supercomputer. Instrumentation with PAPI enables analysis of applications and kernels using a roofline model, which guides subsequent optimization. Applications studied include FLASH, a component-based scientific simulation software package, and OpenFOAM, a computational fluid dynamics solver. Both applications have a wide user base and are therefore of common interest to the community. Comparison is made to performance on x86 architectures.

Technical Session 4C - Stephen Leak (Presentation)
Performance of Parallel IO on the 5860-node HPE Cray EX System ARCHER2
David Henty (EPCC, The University of Edinburgh)
Abstract: EPCC has recently started supporting the new UK national supercomputer service ARCHER2, a 5,860-node, 750,080-core HPE Cray EX system. In this paper we investigate the parallel IO performance that can be achieved on ARCHER2 and compare it to experiences on the previous system, ARCHER, a Cray XC30. The parallel IO libraries MPI-IO, HDF5, and NetCDF are benchmarked for collective writing to a single shared file, as well as the file-per-process approach for comparison. Results are obtained using a simple IO benchmark (https://github.com/davidhenty/benchio) which writes a large, regular, three-dimensional distributed dataset to file. We measure performance on two Lustre filesystems, one with spinning disks and the other using solid-state NVMe storage. We find that although we can saturate the IO bandwidth writing multiple files, parallel performance for a single shared file is well below the expected rate. Although this appears to be because the libraries are not optimally configured for a system where a single process cannot saturate the bandwidth of one storage unit, attempts to optimise this only lead to marginal improvements. (A minimal collective-write sketch follows this session block.)

Expanding data management services beyond traditional parallel file systems with HPE Data Management Framework
Kirill Malkin (HPE)
Abstract: Traditional parallel file systems have long been the high-performance storage option of choice for HPC systems and applications. Centralized data management systems like DMF use connectors and tools provided by Lustre and Spectrum Scale to perform hierarchical storage management, protection, scalable search, and other valuable services. But what about other types of storage, such as non-parallel file systems, Network Attached Storage, and object storage, that are proprietary and/or don't have the necessary connectors and tools for the central data management system? These storage systems create silos of data management, and every HPC center is looking for better ways to manage them.

HPE Cray ClusterStor E1000 Performance Improvements and Results for Various Protocols
John Fragalla (Hewlett Packard)
Abstract: ClusterStor E1000 was first introduced in 2019 and launched in 2020. It has gone through multiple performance enhancements since its introduction, while continuing support of InfiniBand and Ethernet and adding additional protocol support, the latest being klibfabric (KFI). In this presentation HPE will share the latest performance results with the recent v6.0 software release, showing MDTEST, IOR buffered and direct IO for fixed time and fixed data, single-stream results, and random IOPS for the various supported protocols.
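As a companion to the ARCHER2 parallel IO paper above, here is a minimal, hedged sketch of the access pattern it benchmarks: every rank writing its block of a single shared file through a collective MPI-IO call. This is not the benchio code, just an illustration of the MPI_File_write_at_all pattern; the file name and local buffer size are arbitrary.

```cpp
// Hedged sketch: collective MPI-IO write of a 1D decomposition to a single
// shared file, the access pattern benchmarked in the ARCHER2 parallel IO
// paper. Not the benchio code itself.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int nlocal = 1 << 20;                       // doubles per rank
    std::vector<double> buf(nlocal, static_cast<double>(rank));

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    // Each rank writes its contiguous block at a rank-dependent offset; the
    // collective (_all) variant lets the MPI-IO layer aggregate the writes.
    MPI_Offset offset = static_cast<MPI_Offset>(rank) * nlocal * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf.data(), nlocal, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```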
10:00am-10:30am  Coffee Break (sponsored by OpenACC); CUG Advisory Council Debrief (CUG Business, CUG Program Committee)

10:30am-11:30am  BoF 4C
Future of Containers on Compute Nodes
Ron Neyland (Hewlett Packard Enterprise)
Abstract: The ability to utilize containers and containerized software for executing HPC applications is a highly desired and requested feature in the supercomputing world. HPE is actively working to facilitate the support of containerized applications on compute nodes and has determined that this is an area ripe with opportunity to enable our customers. This session will delve into the current state of leveraging containers on compute nodes and where we are heading over the next two years. It will cover topics such as technologies that can be leveraged (K8S, Singularity, HPC, OCI, ...), security, and container orchestration and management. This session is intended to be an opportunity to learn about HPE's direction as well as to provide feedback to impact HPE's strategy.

10:30am-12:00pm  Technical Session 5A - Tina Declerck (Presentation)
Cluster Health Check Diagnostics Suite
Prasanth Kurian and Amarnath Chilumukuru (Hewlett Packard Enterprise)
Abstract: The Cluster Health Check suite is a comprehensive set of diagnostics covering all of the HPE HPC hardware platforms, including Cray EX; Apollo 9000, 2000, 6500, 6000, 70, 35, and 20; ProLiant DL servers; and SGI 8600. It offers a unified solution in terms of CLI, log collection, analysis, and diagnostics coverage on both the Shasta and HPCM cluster managers. Diagnostics are available to run periodic checks easily and with quick turnaround during the maintenance cycle. Cluster Health Check helps isolate faulty nodes and prevent jobs from being scheduled on those nodes, ensuring more successful runs. Administrators can use these diagnostics to triage issues prior to adding nodes to the production cluster. Cluster Health Check has wide diagnostic coverage, with diagnostics for verifying hardware and software in terms of both functionality and performance; the major components covered are CPU, memory, network, disk, GPUs, and fabric. The centralized logging infrastructure can capture a wide variety of information in a scalable environment to aid effective debugging, and its comprehensive analysis report summarizes the health problems reported while running diagnostics. These diagnostics have been tested extensively in scalable environments and are ready to run on exascale clusters.

Crayport to HPE DCE Migration: Bidirectional Incident Management for ServiceNow and HPE DCE
Daniel Gens, John Gann, and Elizabeth Bautista (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)
Abstract: In High Performance Computing (HPC), there is a demand to streamline incident management workflows to lower mean time to repair (MTTR), reduce operational costs, and increase staff efficiency. Incident management platforms supporting these workflows do not natively integrate. Automating these processes requires developing custom system integrations.

Technical Session 5B - Juan F R Herrera (Presentation)
Early experiences in supporting OpenSHMEM on HPE Slingshot NIC (Slingshot 11)
Naveen Namashivayam Ravichandrasekaran (HPE)
Abstract: OpenSHMEM is a Partitioned Global Address Space (PGAS) library interface specification. It is the culmination of a standardization effort among many implementers and users of the SHMEM programming model. Cray OpenSHMEMX is an HPE proprietary software implementation of the OpenSHMEM standards specification. It is the HPE vendor-supported OpenSHMEM implementation on various HPE systems, specifically on all HPE Cray EX supercomputer systems with the HPE Slingshot interconnect.

Performance of different routing protocols on HPE Cray EX: OpenFabrics and UCX
Michael Bareford, David Henty, William Lucas, and Andrew Turner (EPCC, The University of Edinburgh)
Abstract: In this presentation, we report on a comparison of performance between different routing protocols underlying the Cray MPICH library on an HPE Cray EX system, for a variety of application and synthetic benchmarks. ARCHER2, the UK National Supercomputing Service, is based around a very large, CPU-based HPE Cray EX system (750,080 cores, 5,860 nodes) with a Slingshot interconnect in a dragonfly topology. The system allows users to select, at runtime, between two different underlying routing protocols: OpenFabrics (OFI) and Mellanox UCX (UCX). As part of the commissioning work for the service, we have compared the performance of OFI and UCX for different applications from a variety of research areas (CASTEP, CP2K, GROMACS, NEMO, OpenSBLI, VASP) and the performance of the OSU MPI benchmarks. We find that the choice of routing protocol can have a profound effect on application performance and that the best choice depends on the number of nodes, the application, and the benchmark case used for the performance evaluation. This makes providing general advice to users challenging. We summarise the data we have gathered so far and the advice we provide to users, and give an overview of the future investigations we have planned.

Effective use of MPI+OpenMP on an HPE Cray EX supercomputer
Holly Judge (EPCC, The University of Edinburgh)
Abstract: ARCHER2 is an HPE Cray EX supercomputer based at the University of Edinburgh. It is the UK national supercomputing service for scientific research, containing a total of 5,860 nodes (750,080 cores). The Cray EX nodes consist of dual AMD EPYC 7742 processors, giving 128 cores per node. Commonly, scientific applications are run using MPI, where one process is spawned on each core. On this machine this results in a large number of processes being used, particularly when running on many nodes. Using this many processes has been shown to have a negative impact on the performance of MPI communications for some use cases. As an alternative, MPI+OpenMP may be used. This approach naturally results in fewer processes, which can reduce the overhead of communications and have other positive effects, such as lowering the memory requirements.
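The trade-off described in the MPI+OpenMP abstract above (fewer, fatter MPI ranks with OpenMP threads inside each) follows the standard hybrid pattern sketched below. This is an illustrative example only, not code from the talk; the 16 ranks x 8 threads figure in the comment is a hypothetical placement for a 128-core node.

```cpp
// Hedged sketch of the basic hybrid MPI+OpenMP pattern: one MPI rank spans
// several cores and OpenMP threads work inside it (e.g. a hypothetical
// 16 ranks x 8 threads on a 128-core node instead of 128 single-thread ranks).
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    // Request FUNNELED: only the master thread makes MPI calls.
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED) MPI_Abort(MPI_COMM_WORLD, 1);

    int rank = 0, nranks = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long n = 100000000L;
    double local = 0.0;

    // Threads share this rank's portion of the work; no per-thread MPI traffic.
    #pragma omp parallel for reduction(+:local)
    for (long i = rank; i < n; i += nranks) {
        local += 1.0 / (1.0 + static_cast<double>(i));
    }

    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        std::printf("sum = %.6f with %d ranks x %d threads\n",
                    total, nranks, omp_get_max_threads());
    }

    MPI_Finalize();
    return 0;
}
```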
12:00pm-1:00pm CUG board transition (closed) Lunch
Lunch (sponsored by Intel) Lunch
1:00pm-2:00pm Technical Session 6A Chris Fuson
Enabling Scientific AI at Scale on the Perlmutter System at NERSC
Wahid Bhimji, Steven Farrell, and Peter Harrington (Lawrence Berkeley National Laboratory)
Abstract: This presentation will cover lessons learned in enabling deep learning and AI workloads on the Perlmutter HPC system currently being deployed at NERSC. Perlmutter is an HPE/Cray Shasta system comprising over 6,000 NVIDIA A100 GPUs. We describe our experiences in deploying performant deep learning software and tools that operate at scale on this system. We will then cover AI performance and benchmarking on HPC systems, including measurements and analysis on Perlmutter Phase 1. Finally, we highlight some particular AI-for-Science applications that can now exploit this resource, using cutting-edge AI to drive new scientific insight in diverse disciplines including materials science, climate, cosmology, and particle physics.
Predicting batch queue job wait times for informed scheduling of urgent HPC workloads
Nick Brown (EPCC)
Abstract: There is increasing interest in the use of HPC machines for urgent workloads to help tackle disasters as they unfold. Whilst batch queue systems are not ideal for supporting such workloads, many of the disadvantages can be worked around by accurately predicting when a waiting job will start to run. However, there are numerous challenges in achieving such a prediction with high accuracy, not least because the queue's state can change rapidly and depends upon many factors. In this work we explore a novel machine learning approach for predicting queue wait times, hypothesising that such a model can capture the complex behaviour resulting from the queue policy and other interactions and so generate accurate job start times.
Presentation
Technical Session 6B Tina Declerck
Network Integration of Perlmutter at NERSC
Ershaad A. Basheer, Eric Roman, and Tavia Stone Gibbins (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Christopher Samuel and Lisa Gerhardt (Lawrence Berkeley National Laboratory); Douglas M. Jacobsen (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); and Ashwin Selvarajan, Damian Hazen, and Ronal Kumar (Lawrence Berkeley National Laboratory)
Abstract: In this paper we describe the integration of NERSC's HPE/Cray EX "Perlmutter" system into the NERSC datacenter. Perlmutter connects to NERSC via four networks. The Customer Access Network (CAN) provides administrative access to the API gateways, while users access the 40 login nodes over the high-speed network (HSN) via an internal two-tier load balancer. A second, external management network provides access to the first master node and to SMNet out-of-band switch ports. GPFS is IP-routed through 24 gateway nodes to a separate InfiniBand fabric. The Slingshot network connects to the site network via a 64-port LAG. Since every node in the cluster has a routable IP address, nodes communicate directly with other NERSC resources over the high-bandwidth Ethernet fabric.
We show that the resulting configuration is (1) secure, because it allows us to isolate the administrative networks via routing rules in the datacenter routers; (2) reliable, because administrators can access the NCNs and switches independently of the availability of the SMNet or HSN; and (3) fast, because all high-bandwidth traffic is confined to the HSN via edge routers. As a result, NERSC has been able to make Perlmutter a reliable, high-performance system for our users.
Automated service monitoring in the deployment of ARCHER2
Kieran Leach, Philip Cass, Steven Robson, and Eimantas Kazakevicius (EPCC/University of Edinburgh); Martin Lafferty (HPE); and Andrew Turner and Alan Simpson (EPCC/University of Edinburgh)
Abstract: The ARCHER2 service, a CPU-based HPE Cray EX system with 750,080 cores (5,860 nodes), was deployed throughout 2020 and 2021, going into full service in December 2021. A key part of the work during this deployment was the integration of ARCHER2 into our local monitoring systems. As ARCHER2 was one of the very first large-scale EX deployments, this involved close collaboration and development work with the HPE team through a global pandemic, during which collaboration and co-working were significantly more challenging than usual. The deployment included the creation of automated checks and visual representations of system status, which needed to be made available to external parties for diagnosis and interpretation. We will describe how these checks were deployed and how the data gathered played a key role in the deployment of ARCHER2, the commissioning of the plant infrastructure, the conduct of HPL runs for submission to the Top500, and the contractual monitoring of the availability of the ARCHER2 service during its commissioning and early life.
Presentation
2:00pm-2:30pm Coffee Break Break
2:30pm-3:00pm CUG 2022 Conference close
CUG Close
Abstract: CUG 2022 Closing Remarks
CUG Business