CUG2019 Proceedings

Birds of a Feather

BoF 3B

Chair: Bilel Hadri (KAUST Supercomputing Lab)

Programming Environments, Applications, and Documentation (PEAD) Special Interest Group meeting

Bilel Hadri (KAUST Supercomputing Lab)

Birds of a Feather

BoF 3C

Chair: Sadaf R. Alam (CSCS)

Tools and Utilities for Data Science Workloads and Workflows

Sadaf R. Alam (Swiss National Supercomputing Centre), Mike Ringenburg (Cray Inc.), and Maxime Martinasso (Swiss National Supercomputing Centre)

Birds of a Feather

BoF 12A

Chair: Sadaf R. Alam (CSCS)

Third Annual Meeting on Opportunities for containers in HPC ecosystems

Sadaf R. Alam (Swiss National Supercomputing Centre), Shane Canon (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Lucas Benedicic (Swiss National Supercomputing Centre), and Douglas Jacobsen (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)

Birds of a Feather

BoF 12B

Chair: Bilel Hadri (KAUST Supercomputing Lab)

Managing Effectively the User Software Ecosystem

Bilel Hadri (KAUST Supercomputing Lab); Peggy Sanchez (Cray Inc.); Guilherme Peretti-Pezzi (Swiss National Supercomputing Centre); Christopher Fuson, (Oak Ridge National Laboratory); and Yun He and Mario Melara (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)

Birds of a Feather

BoF 12C

Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory)

Large-scale System Acceptance Testing: Procedures, Tests, and Automation

Veronica G. Vergara Larrea and Reuben Budiardja (Oak Ridge National Laboratory (ORNL))

Birds of a Feather

BoF 24C

Chair: Colin McMurtrie (Swiss National Supercomputing Centre)

Open Discussion with CUG Board

Colin McMurtrie (Swiss National Supercomputing Centre)

New Sites

New Site

New Site 8

Chair: Brian Skjerven (Pawsey Supercomputing Centre)

New Site - Indian Institute of Science

Prasun Dhang (Indian Institute of Science)

New Site

New Site 17

Chair: Brian Skjerven (Pawsey Supercomputing Centre)

New Site - IBS Center for Climate Physics (ICCP)

Muyoung Heo (ICCP)

New Site

Site Talk 21

Chair: Trey Breckenridge (Mississippi State University)

ORNL’s Frontier - Direction of Discovery

Jim Rogers (ORNL)

New Site

New Site 27

Chair: Brian Skjerven (Pawsey Supercomputing Centre)

New Site - Paderborn University

Christian Plessl (Paderborn University)

Paper/Presentation

Technical Session 10A

Chair: Jim Rogers (Oak Ridge National Laboratory)

Shasta System Management Overview

Harold W. Longley (Cray Inc.)

Shasta System Monitoring Framework

Patricia Langer (Cray)

Significant Advances in Cray System Architecture for Diagnostics, Availability, Resiliency, and Health

Stephen Fisher and Christer Lundin (Cray Inc.)

Paper/Presentation

Technical Session 10B

Chair: Scott Michael (Indiana University)

Cray Performance Tools: New Functionality and Future Directions

Heidi Poxon (Cray Inc.)

Porting Quantum ESPRESSO Hybrid Functional DFT to GPUs Using CUDA Fortran

Thorsten Kurth (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); Joshua Romero, Everett Phillips, and Massimiliano Fatica (NVIDIA); and Brandon Cook, Rahul Gayatri, Zhengji Zhao, and Jack Deslippe (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)

Accelerating modern scientific simulations with FPGAs

Tobias Kenter, Paolo Gorlani, and Christian Plessl (Paderborn University)

Paper/Presentation

Technical Session 10C

Chair: Bilel Hadri (KAUST Supercomputing Lab)

Dynamically Provisioning Cray DataWarp Storage

François Tessier, Maxime Martinasso, Mark Klein, Matteo Chesi, and Miguel Gila (Swiss National Supercomputing Centre)

pdf, pdf

Exploring Lustre Overstriping For Shared File Performance on Disk and Flash

Michael Moore (Cray, Inc.) and Patrick Farrell (Whamcloud)

pdf, pdf

Designing an All-Flash Lustre File System for the 2020 NERSC Perlmutter System

Kirill Lozinskiy, Glenn K. Lockwood, Lisa Gerhardt, Ravi Cheema, Damian Hazen, and Nicholas J. Wright (Lawrence Berkeley National Laboratory)

pdf, pdf

Paper/Presentation

Technical Session 11A

Chair: Veronica G. Vergara Larrea (Oak Ridge National Laboratory)

Exploring New Monitoring and Analysis Capabilities on Cray’s Software Preview System

Jim Brandt and Ann Gentile (Sandia National Laboratories), Joe Greenseid (Cray Inc.), William Kramer (National Center for Supercomputing Applications/University of Illinois), Patti Langer and Aamir Rashad (Cray Inc.), and Michael Showerman (National Center for Supercomputing Applications)

Reimagining image management in the new Shasta environment

Harold W. Longley and Eric Cozzi (Cray Inc.)

Exploring the Mysterious Universe of Shasta Software for Perlmutter

James F. Botts and Douglas M. Jacobsen (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)

Paper/Presentation

Technical Session 11B

Chair: Abhinav S. Thota (Indiana University)

Challenges in Providing an Interactive Service with Jupyter on Large-Scale HPC Systems

Tim Robinson, Lucas Benedicic, Mark Klein, and Maxime Martinasso (Swiss National Supercomputing Centre)

Resource Management in a Heterogeneous Environment

Jonathan Sparks (Cray Inc.)

Cray Programming Environments within Containers on Cray XC Systems

Maxime Martinasso, Miguel Gila, William Sawyer, Rafael Sarmiento, Guilherme Peretti-Pezzi, and Vasileios Karakasis (ETH Zurich / CSCS)

pdf, pdf

Paper/Presentation

Technical Session 11C

Chair: David Hancock (Indiana University)

Scheduling Data Streams for Low Latency and High Throughput on a Cray XC40 Using Libfabric

Farouk Salem, Thorsten Schütt, Florian Schintke, and Alexander Reinefeld (Zuse Institute Berlin)

pdf, pdf

Characterizing Full-system Network Performance and Congestion Management Capabilities with Improved Network Benchmarks

Peter Mendygral, Nathan Wichmann, Duncan Roweth, Krishna Kandalla, and Kim McMahon (Cray Inc.)

New Lustre Features to Improve Lustre Metadata and Small-File Performance

John Fragalla; Bill Loewe, PhD.; and Torben Kling-Petersen, PhD. (Cray Inc.)

pdf, pdf

Paper/Presentation

Technical Session 19A

Chair: Brian Skjerven (Pawsey Supercomputing Centre)

I/O Performance Analysis of Science Applications Using HDF5 File-level Provenance

Tonglin Li and Quincey Koziol (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Houjun Tang (Lawrence Berkeley National Laboratory, Lawrence Berkeley National Lab); Jialin Liu (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); and Suren Byna (Lawrence Berkeley National Laboratory)

pdf, pdf

Roofline-based Performance Efficiency of HPC Benchmarks and Applications on Current Generation of Processor Architectures

JaeHyuk Kwack, Thomas Applencourt, Colleen Bertoni, Yasaman Ghadar, Huihuo Zheng, Christopher Knight, and Scott Parker (Argonne National Laboratory)

pdf, pdf

Experiences porting mini-applications to OpenACC and OpenMP on heterogeneous systems

Veronica G. Vergara Larrea and Reuben Budiardja (Oak Ridge National Laboratory), Rahulkumar Gayatri and Christopher Daley (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), and Oscar Hernandez and Wayne Joubert (Oak Ridge National Laboratory (ORNL))

pdf, pdf

Hierarchical Roofline Analysis for GPUs: Accelerating Performance Optimization for the NERSC-9 Perlmutter System

Charlene Yang and Thorsten Kurth (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) and Samuel Williams (Computational Research Division/Lawrence Berkeley National Laboratory)

pdf, pdf

Paper/Presentation

Technical Session 19B

Chair: David Hancock (Indiana University)

Uncovering Lustre Performance Issues in Operational Weather Forecasting at DMI with View for ClusterStor

Thomas Lorenzen (Danish Meteorological Institute DMI) and Torben Kling Petersen (Cray Inc.)

Analysis of parallel I/O use on the UK national supercomputing service, ARCHER using Cray LASSi and EPCC SAFE

Andrew Turner and Dominic Sloan-Murphy (EPCC, The University of Edinburgh); Karthee Sivalingam and Harvey Richardson (Cray Inc.); and Julian Kunkel (University of Reading)

Interfacing HDF5 with A Scalable Object-centric Storage System on Hierarchical Storage

Jingqing Mu and Jerome Soumagne (The HDF Group); Suren Byna, Quincey Koziol, and Houjun Tang (Lawrence Berkeley National Laboratory); and Richard Warren (The HDF Group)

pdf, pdf

Hybrid Flash/Disk Storage Systems with Lustre

Nathan Rutman (Cray)

Paper/Presentation

Technical Session 19C

Chair: Abhinav S. Thota (Indiana University)

Running Alchemist on Cray XC and CS Series Supercomputers: Dask and PySpark Interfaces, Deployment Options, and Data Transfer Times

Kai Rothauge (UC Berkeley); Haripriya Ayyalasomayajula, Kristyn J. Maschhoff, and Michael Ringenburg (Cray Inc.); and Michael W. Mahoney (UC Berkeley)

pdf, pdf

Machine learning on Crays to optimise petrophysical workflows in oil and gas exploration

Nick Brown (EPCC)

pdf, pdf

Scalable Reinforcement Learning on Cray Systems

Ananda Vardhan Kommaraju, Kristyn J. Maschhoff, Michael F. Ringenburg, and Benjamin Robbins (Cray Inc.)

pdf, pdf

Urika-GX Platform's Multi-Tenancy Support: Lessons Learned

Oleksandr Shcherbakov, Dennis Hoppe, Thomas Bönisch, and Michael Gienger (High Performance Computing Center Stuttgart) and Stefan Andersson, Juri Kuebler, and Nina Mujkanovic (Cray Inc.)

Paper/Presentation

Technical Session 28A

Chair: Tina Declerck (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)

Hardware Discovery and Maintenance Workflows in Shasta Systems

Steven Presser and Brent Shields (Cray Inc.)

pdf, pdf

The Beat Goes On... Cascade XC Release Schedule and Patching Stragegy

Kelly Mark (Cray Inc.)

The Art of Conversation with CrayPort (Bidirectional Record Management)

Daniel Gens, Owen James, and Elizabeth Bautista (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Melissa Abdelbaky (Lawrence Berkeley National Laboratory)

pdf, pdf

Paper/Presentation

Technical Session 28B

Chair: Ann Gentile (Sandia National Laboratories)

Continuous Deployment Automation in Supercomputer Operations: Techniques, Experiences and Challenges

Nicholas Cardo, Matteo Chesi, and Miguel Gila (Swiss National Supercomputing Centre)

pdf, pdf

The role of emerging orchestration and execution models in HPC Environments

Richard Canon (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Jonathan Sparks (Cray Inc.)

PBS Professional on Shasta

Lisa Endrjukaitis and Vincent Stumpf (Altair Engineering, Inc.)

Paper/Presentation

Technical Session 28C

Chair: Bilel Hadri (KAUST Supercomputing Lab)

FirecREST: RESTful API on Cray XC systems

Felipe A. Cruz and Maxime Martinasso (Swiss National Supercomputing Centre)

pdf, pdf

User-Friendly Data Management for Scientific Computing Users

Kirill Lozinskiy, Lisa Gerhardt, Annette Greiner, Ravi Cheema, Damian Hazen, Kristy Kallback-Rose, and Rei Lee (Lawrence Berkeley National Laboratory)

pdf, pdf

Implementation of a multi-purpose DataHub: Making the XC-40 more attractive to Data Scientists at KAUST

Samuel Kortas (KAUST) and Kristyn Maschhoff and Jim Maltby (Cray Inc.)

Paper/Presentation

Technical Session 29A

Chair: Kevin Buckley (Pawsey Supercomputing Centre)

Using Slurm to Balance the XC Equation

Douglas Jacobsen (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center), Brian Christiansen (SchedMD LLC), and Christopher Samuel (Lawr)

Optimisation of PBS Hooks on the Cray XC40

Sam Clarke (Met Office, UK)

pdf, pdf

Impact of Large Jobs and Reservations on Cray Systems Using Slurm

Yun (Helen) He (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory); Emily Zhang (University of California, Berkeley); and Woo-Sun Yang (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory)

PBS 18 Multiplatform Scheduling

Peter Schmid (General Electric)

Paper/Presentation

Technical Session 29B

Chair: Jim Rogers (Oak Ridge National Laboratory)

Statistical Analysis of Titan Reliability as it reaches End of Life

Jim Rogers (Oak Ridge National Laboratory)

Perlmutter: A 2020 pre-exascale system optimized for Science

Katie Antpyas, Jay Srinivasan, and Nicholas Wright (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)

Measuring and Mitigating Processor Performance Inconsistencies

Kevin D. Stroup (Los Alamos National Laboratory, Cray Inc.) and Paul Peltz (Oak Ridge National Laboratory)

Evaluating Compiler Vectorization Capabilities on Blue Waters

Celso L. Mendes, Gregory H. Bauer, Brett Bode, and William T. Kramer (National Center for Supercomputing Applications/University of Illinois)

pdf, pdf

Paper/Presentation

Technical Session 29C

Chair: Brian Skjerven (Pawsey Supercomputing Centre)

Modelling the earth’s geomagnetic environment on Cray machines using PETSc and SLEPc

Nick Brown (EPCC)

The British Geological Survey's global geomagnetic model, Model of the Earth's Magnetic Environment (MEME), is an important tool for calculating the earth's magnetic field, which is continually in flux. Whilst the ability to collect data from ground based observation sites and satellites has grown rapidly, the memory bound nature of the code has proved a significant limitation in modelling problem sizes required by modern science. In this paper we describe work done replacing the bespoke, sequential, eigen-solver with that of the SLEPc package for solving the system of normal equations. This work had a dual purpose, to break through the memory limit of the code, and thus support the modelling of much larger systems, by supporting execution on distributed machines, and to improve performance. But when adopting SLEPc it was not just the solving of the normal equations, but also fundamentally how we build and distribute the data structures. We describe an approach for building symmetric matrices in a way that provides good load balance and avoids the need for close co-ordination between the processes or replication of work. We also study the memory bound nature of the code from an irregular memory accesses perspective and combine detailed profiling with software cache prefetching to significantly optimise this. Performance and scaling characteristics are explored on ARCHER, a Cray XC30, where we achieved a speed up for the solver of 294 times by replacing the model's bespoke approach with SLEPc. This work also provided the ability to model much larger system sizes, up to 100,000 model coefficients, which is also demonstrated. Some of the challenges of modelling systems of this large scale are explored, and mitigations including hybrid MPI+OpenMP along with the use of iterative solvers are also considered. The result of this work is a modern MEME model that is not only capable of simulating problem sizes demanded by state of the art geomagnetism but also acts as further evidence to the utility of the SLEPc libary.

pdf, pdf

Sustained Petascale Direct Numerical Simulation on Cray XC40 Systems (Trinity, Shaheen2 and Cori)

Bilel Hadri (KAUST Supercomputing Lab), Matteo Parsani (King Abdullah University of Science and Technology), Maxwell Hutchinson (Citrine Informatics), Alexander Heinecke (Intel Corporation), and Lisandro Dalcin and David Keyes (King Abdullah University of Science and Technology)

pdf, pdf

Unravelling the Origin of Magnetic fields in Accretion Discs Through Numerical Simulations

Prasun Dhang and Prateek Sharma (Indian Institute of Science, Bangalore)

Massively Parallel SVD Solver on Cray Supercomputers

Hatem Ltaief and Dalal Sukkari (KAUST), Aniello Esposito (Cray Inc.), Yuji Nakatsukasa (University of Oxford), and David Keyes (KAUST)

Plenary

General Session 4

Chair: Colin McMurtrie (Swiss National Supercomputing Centre)

CUG Welcome

Colin McMurtrie (Swiss National Supercomputing Centre)

Keynote: Robust Deep Learning Inference with Limited Resources

Vincent Gripon (IMT Atlantique, Université de Montréal)

Plenary

General Session 7

Chair: Brian Skjerven (Pawsey Supercomputing Centre)

Scaling Results From the First Generation of Arm-based Supercomputers

Simon McIntosh-Smith, James Price, Andrei Poenaru, and Tom Deakin (University of Bristol)

pdf, pdf

Driving Innovation in HPC

Trish Damkroger (Intel Corporation)

Plenary

General Session 13

Chair: Colin McMurtrie (Swiss National Supercomputing Centre)

CUG Business

Colin McMurtrie (Swiss National Supercomputing Centre)

Fifty Years and Counting: The past and future of Numerical Weather Prediction at Environment & Climate Change Canada (ECCC

Richard Hogue (Meteorological Service of Canada)

Plenary

General Session 16

Chair: Brian Skjerven (Pawsey Supercomputing Centre)

Cray Corporate Update

Peter Ungaro (Cray Inc.)

Plenary

General Session 20

Chair: Brian Skjerven (Pawsey Supercomputing Centre)

1 on 100 or More

Peter Ungaro (Cray Inc.)

Plenary

General Session 25

Chair: Jim Rogers (Oak Ridge National Laboratory)

The HPC Processor Landscape: A Panel Discussion of Shasta Architecture Options

David Cownie (AMD), Brent Gorda (Arm), Jeff Watters (Intel), and Tom Reed (NVIDIA)

Plenary

General Session 30

Chair: Colin McMurtrie (Swiss National Supercomputing Centre)

CUG 2019 Conference Close

Colin McMurtrie (Swiss National Supercomputing Centre)

CUG2020 Site Presentation

TBD TBD (TBD)

Technical Workshop

Technical Workshop 1A

Chair: Wade Doll (Cray Inc.)

Shasta Hardware Technical Workshop

Wade Doll and Bob Alverson (Cray Inc.)

Technical Workshop

Technical Workshop 1A Continued

Chair: Wade Doll (Cray Inc.)

Shasta Hardware Technical Workshop

Wade Doll and Bob Alverson (Cray Inc.)

Technical Workshop

Technical Workshop 2A

Chair: Larry Kaplan (Cray Inc.)

Shasta Software Technical Workshop

Larry Kaplan, Matt Haines, Jason Rouault, Harold Longley, Bill Sparks, and John Fragalla (Cray Inc.) and Dave Poulsen (Cray)

Technical Workshop

Technical Workshop 2A Continued

Chair: Larry Kaplan (Cray Inc.)

Shasta Software Technical Workshop

Larry Kaplan, Matt Haines, Jason Rouault, Harold Longley, Bill Sparks, and John Fragalla (Cray Inc.) and Dave Poulsen (Cray)

Tutorials

Tutorial

Tutorial 1B

Chair: Michael Ringenburg (Cray, Inc)

Analytics and AI on Cray Systems

Michael Ringenburg and Kristyn Maschhoff (Cray Inc.) and Mustafa Mustafa, Thorsten Kurth, and Steven Farrell (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center)

pdf, pdf

Tutorial

Tutorial 1C

Chair: John Levesque (Cray Inc.)

Preparing an application for Hybrid Supercomputing using Cray's Tool Suite

John Levesque (Cray Inc.)