Monday, May 9th

8:30am-10:00am

Tutorial 1A, Bartholomew. Cray Management System for XC Systems with SMW 8.0/CLE 6.0. Harold Longley (Cray Inc.). Abstract: New versions of SMW 8.0 and CLE 6.0 have been developed that include a new Cray Management System (CMS) for Cray XC systems. This new CMS includes system management tools and processes which separate software and configuration for the Cray XC systems, while at the same time preserving the system reliability and scalability upon which you depend. The new CMS includes a new common installation process for SMW and CLE, and more tightly integrates external login nodes (eLogin) as part of the Cray XC system. It includes the Image Management and Provisioning System (IMPS), the Configuration Management Framework (CMF), and the Node Image Mapping Service (NIMS). Finally, it integrates with SUSE Linux Enterprise Server 12. Tutorial

Tutorial 1B, Harpley. Knights Landing and Your Application: Getting Everything from the Hardware. Andrew C. Mallinson (Intel Corporation). Abstract: Knights Landing, the 2nd-generation Intel® Xeon Phi™ processor, utilizes many breakthrough technologies to combine breakthroughs in power and performance with standard, portable, and familiar programming models. This presentation provides an in-depth tour of Knights Landing's features and focuses on how to ensure that your applications can get the most performance out of the hardware. Tutorial

Tutorial 1C, Beaumont. CUG 2016 Cray XC Power Monitoring and Management Tutorial. Steven Martin, David Rush, Matthew Kappel, and Joshua Williams (Cray Inc.). Abstract: This half-day (3-hour) tutorial will focus on the setup, usage, and use cases for Cray XC power monitoring and management features. The tutorial will cover power and energy monitoring and control from three perspectives: site and system administrators working from the SMW command line, users who run jobs on the system, and third-party software development partners integrating with Cray's RUR and CAPMC features. Tutorial

10:30am-12:00pm

Tutorial 1A Continued, Bartholomew. Cray Management System for XC Systems with SMW 8.0/CLE 6.0. Harold Longley (Cray Inc.). Tutorial

Tutorial 1B Continued, Harpley. Knights Landing and Your Application: Getting Everything from the Hardware. Andrew C. Mallinson (Intel Corporation). Tutorial

Tutorial 1C Continued, Beaumont. CUG 2016 Cray XC Power Monitoring and Management Tutorial. Steven Martin, David Rush, Matthew Kappel, and Joshua Williams (Cray Inc.). Tutorial

1:00pm-2:30pm

Tutorial 2A, Bartholomew. Cray Management System for XC Systems with SMW 8.0/CLE 6.0. Harold Longley (Cray Inc.). Tutorial

Tutorial 2B, Harpley. Getting the full potential of OpenMP on Many-core systems. John Levesque (Cray Inc.) and Jacob Poulsen (Danish Meteorological Institute). Abstract: With the advent of the Knights series of Phi processors, code developers who employed only MPI in their applications will be challenged to achieve good performance. The traditional methods of employing loop-level OpenMP are not suitable for larger legacy codes due to the risk of significant inefficiencies. Runtime overhead, NUMA effects, and load imbalance are the principal issues facing the code developer. This tutorial will suggest a higher-level approach that has shown promise of circumventing these inefficiencies and achieving good performance on many-core systems (a small illustrative sketch of this style appears at the end of this session block). Tutorial

Tutorial 2C, Beaumont. eLogin Made Easy - An Introduction and Tutorial on the new Cray External Login Node. Jeff Keopp and Blaine Ebeling (Cray Inc.). Abstract: The new eLogin product (external login node) is substantially different from its esLogin predecessor. Management is provided by the new OpenStack-based Cray System Management Software. Images are prescriptively built with the same technology used to build CLE images for Cray compute and service nodes, and the Cray Programming Environment is kept separate from the operational image. Tutorial
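The higher-level approach referred to in Tutorial 2B above can be pictured with a short sketch. The following code is a hypothetical illustration, not material from the tutorial: a single long-lived parallel region encloses the whole time-step loop, each thread repeatedly updates its own fixed block of the data, and first-touch initialization keeps that block local to the thread, so the fork/join and NUMA costs of loop-level OpenMP are largely avoided.

```c
/* Hypothetical sketch of the "high-level" OpenMP style: one long-lived
 * parallel region instead of a fork/join around every loop.  Each thread
 * owns a fixed block of the arrays, initializes it (first touch), and
 * updates it for every time step.                                        */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N      1000000L
#define NSTEPS 100

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);

    #pragma omp parallel                /* region spans all time steps */
    {
        int  nt  = omp_get_num_threads();
        int  tid = omp_get_thread_num();
        long lo  = tid * N / nt;        /* fixed block per thread */
        long hi  = (tid + 1) * N / nt;

        for (long i = lo; i < hi; i++) {    /* first-touch placement */
            a[i] = 1.0;
            b[i] = 2.0;
        }

        for (int step = 0; step < NSTEPS; step++)
            for (long i = lo; i < hi; i++)  /* no per-loop fork/join  */
                a[i] += 0.5 * b[i];
        /* barriers would be added only where steps exchange data */
    }

    printf("a[0] = %f\n", a[0]);
    free(a);
    free(b);
    return 0;
}
```

The contrast is with loop-level OpenMP, where a #pragma omp parallel for placed around every inner loop re-creates the thread team and re-partitions the iteration space on each call.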
3:00pm-4:30pm

Tutorial 2A Continued, Bartholomew. Cray Management System for XC Systems with SMW 8.0/CLE 6.0. Harold Longley (Cray Inc.). Tutorial

Tutorial 2B Continued, Harpley. Getting the full potential of OpenMP on Many-core systems. John Levesque (Cray Inc.) and Jacob Poulsen (Danish Meteorological Institute). Tutorial

Tutorial 2C Continued, Beaumont. eLogin Made Easy - An Introduction and Tutorial on the new Cray External Login Node. Jeff Keopp and Blaine Ebeling (Cray Inc.). Tutorial

4:45pm-6:00pm

Interactive 3A, Bartholomew. Chair: Matteo Chesi. Jobs I/O monitoring for Lustre at scale. Matteo Chesi (CSCS), Bilel Hadri (KAUST), Tina Declerck (NERSC), Sven Trautmann (Cray Inc.), Jason Hill (Oak Ridge National Laboratory), and Torben Kling Petersen (Seagate). Abstract: In this session different Lustre users will talk about the monitoring tools available for Lustre filesystems. The discussion will focus on tools for monitoring the I/O load of user jobs, in an attempt to evaluate their potential and limits with respect to HPC system scaling. Birds of a Feather

Interactive 3B, Harpley. Chair: Richard S. Canon. Containers for HPC. Richard S. Canon and Douglas M. Jacobsen (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Sadaf Alam (Swiss National Supercomputing Centre). Abstract: Container-based computing is an emerging model for developing and deploying applications and is now making inroads into the HPC community with the release of products like Shifter. The promise is large, but many challenges remain. What security concerns still need to be addressed? How do we train users to take advantage of this capability? What are best practices around creating and distributing images? How can we build up an ecosystem across the broader HPC community to promote reusability and increase productivity? This BOF will provide an opportunity to discuss these questions with experts from NERSC, CSCS, and Cray. Birds of a Feather

Interactive 3C, Beaumont. Chair: David Hancock. Open Discussion with CUG Board. David Hancock (Indiana University). Abstract: This session is designed as an open discussion with the CUG Board, but there are a few high-level topics that will also be on the agenda. The discussion will focus on corporation changes to achieve non-profit status (including bylaw changes), feedback on increasing CUG participation, and feedback on SIG structure and communication. An open floor question and answer period will follow these topics. Formal voting (on candidates and the bylaws) will open after this session, so any candidates or members with questions about the process are welcome to bring up those topics. Birds of a Feather

Tuesday, May 10th

7:30am-8:15am

Interactive 4A, Bartholomew. Chair: Derek Burke. Sonexion Collaboration (Invitation Only). Derek Burke (Seagate). Abstract: Sonexion Collaboration (participation is by invitation only). Birds of a Feather

Interactive 4B, Harpley. Chair: CJ Corbett. Cray and HPC in the Cloud. Steve Scott, CJ Corbett, Kunju Kothari, and Ryan Waite (Cray Inc.). Abstract: Supercomputing customers have a unique set of requirements, and some fit better than others with standard cloud offerings. Many Cray customers have asked for cloud enablement and co-existence of their applications in the cloud. Others provide de facto private and specialized cloud services to their own communities and customers. These sessions will drill down on use cases (current and anticipated), best practices, and the infrastructure (SW/HW) developments and enablement needed for you to successfully deploy Cray for HPC in the cloud. Participants will be asked to take part in two consecutive BoF sessions. Session 1 will include a facilitator-led exercise to determine and group participants' requirements for embracing cloud services in their operations; this session ends with an exercise. Session 2 collects the results of the exercise for focused discussion and prioritization of the findings. Results will be shared with the participants of this BoF. For continuity of discussion, participants are asked to commit to both sessions. Birds of a Feather

Interactive 4C, Beaumont. Birds of a Feather

8:30am-10:00am

General Session 5, Minories Suite. Chair: David Hancock.

CUG Welcome. David Hancock (Indiana University). Abstract: Welcome to CUG 2016.

The Strength of a Common Goal. Florence Rabier (ECMWF - European Centre for Medium-Range Weather Forecasts). Abstract: ECMWF is an intergovernmental organisation supported by 34 European States. It provides forecasts of global weather to 15 days ahead as well as monthly and seasonal forecasts. The National Meteorological Services of Member and Co-operating States use ECMWF's products for their own national duties, in particular to give early warning of potentially damaging severe weather.

Quo Vadis HPC? Isabella Weger (ECMWF - European Centre for Medium-Range Weather Forecasts). Abstract: Presentation from the Deputy Director of Computing.
Numerical weather prediction ready to embrace Exascale! Nils Wedi (ECMWF - European Centre for Medium-Range Weather Forecasts). Abstract: Presentation from the Head of Earth Modeling Section. Plenary

10:30am-12:00pm

General Session 6, Minories Suite. Chair: Andrew Winfer.

Cray Corporate Update. Peter Ungaro (Cray Inc.). Abstract: Cray Corporate Update.

Cray Products Update. Ryan Waite (Cray Inc.). Abstract: Cray Products Update.

Cray Future Directions. Steven Scott (Cray Inc.). Abstract: Cray Future Directions. Plenary

1:00pm-2:30pm

Technical Session 7A, Bartholomew. Chair: Helen He.

Performance on Trinity (a Cray XC40) with Acceptance-Applications and Benchmarks. Mahesh Rajan (Sandia National Laboratories); Nathan Wichmann, Cindy Nuss, Pierre Carrier, Ryan Olson, Sarah Anderson, and Mike Davis (Cray Inc.); Randy Baker (Los Alamos National Laboratory); Erik Draeger (Lawrence Livermore National Laboratory); and Stefan Domino and Anthony Agelastos (Sandia National Laboratories). Abstract: Trinity is NNSA's first ASC Advanced Technology System (ATS), targeted to support the largest, most demanding nuclear weapon simulations. Trinity Phase-1 (the focus of this paper) has 9436 dual-socket Haswell nodes, while Phase-2 will have close to 9500 KNL nodes. This paper documents the performance of applications and benchmarks used for Trinity acceptance. It discusses the early experiences of the Tri-Lab (LANL, SNL and LLNL) and Cray teams in meeting the challenges for optimal performance on this new architecture by taking advantage of the large number of cores on the node, wider SIMD/vector units, and the Cray Aries network. Application performance comparisons to our previous-generation large Cray capability systems show excellent scalability. The overall architecture is facilitating easy migration of our production simulations to this 11 PFLOPS system, while improved workflow through the use of Burst Buffer nodes is still under investigation.

Improving I/O Performance of the Weather Research and Forecast (WRF) Model. Tricia Balle and Peter Johnsen (Cray Inc.). Abstract: As HPC resources continue to increase in size and availability, the complexity of numerical weather prediction models also rises. This increases demands on HPC I/O subsystems, which continue to cause bottlenecks in efficient production weather forecasting.

Performance Evaluation of Apache Spark on Cray XC Systems. Nicholas Chaimov and Allen Malony (University of Oregon), Khaled Ibrahim and Costin Iancu (Lawrence Berkeley National Laboratory), and Shane Canon and Jay Srinivasan (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center). Abstract: We report our experiences in porting and tuning the Apache Spark data analytics framework on the Cray XC30 (Edison) and XC40 (Cori) systems installed at NERSC. Spark has been designed for cloud environments where local disk I/O is cheap and performance is constrained by network latency. In large HPC systems, diskless nodes are connected by fast networks: without careful tuning, Spark execution is dominated by I/O performance. In the default configuration, with a centralized storage system such as Lustre, metadata access latency becomes a major bottleneck that severely constrains scalability. We show how to mitigate this by using per-node loopback filesystems for temporary storage. With this technique, we reduce the communication (data shuffle) time by multiple orders of magnitude and improve the application scalability from O(100) to O(10,000) cores on Cori. With this configuration, Spark's execution again becomes network dominated, which is reflected in the performance comparison with a cluster with fast local SSDs specifically designed for data-intensive workloads. Due to a slightly faster processor and better network, Cori provides performance better by an average of 13.7% on the machine learning benchmark suite. This is the first such result where HPC systems outperform systems designed for data-intensive workloads. Overall, we believe this paper demonstrates that local disks are not necessary for good performance on data analytics workloads. Paper Applications & Programming Environments

Technical Session 7B, Harpley. Chair: Jason Hill.

Collective I/O Optimizations for Adaptive Mesh Refinement Data Writes on Lustre File System. Dharshi Devendran, Suren Byna, Bin Dong, Brian Van Straalen, Hans Johansen, and Noel Keen (Lawrence Berkeley National Laboratory) and Nagiza F. Samatova (North Carolina State University). Abstract: Adaptive mesh refinement (AMR) applications refine small regions of a physical space. As a result, when AMR data has to be stored in a file, writing data involves storing a large number of small blocks of data. Chombo is an AMR software library for solving partial differential equations over block-structured grids, and is used in large-scale climate and fluid dynamics simulations. Chombo's current implementation for writing data on an AMR hierarchy uses several independent write operations, causing low I/O performance. In this paper, we investigate collective I/O optimizations for Chombo's write function. We introduce Aggregated Collective Buffering (ACB) to reduce the number of small writes. We demonstrate that our approach outperforms the current implementation by 2X to 9.1X and the MPI-IO collective buffering by 1.5X to 3.4X on the Edison and Cori platforms at NERSC using the Chombo-IO benchmark. We also test ACB on the BISICLES Antarctica benchmark on Edison, and show that it outperforms the current implementation by 13.1X to 20X, and the MPI-IO collective buffering by 6.4X to 12.8X. Using the Darshan I/O characterization tool, we show that ACB makes larger contiguous writes than collective buffering at the POSIX level, and this difference gives ACB a significant performance benefit over collective buffering.
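For context on the baseline that the ACB work above is compared against, the sketch below shows the generic MPI-IO collective-write pattern: each rank writes its own block of a shared file through a collective call, with standard ROMIO hints requesting collective buffering and Lustre striping. It is a minimal, hypothetical example, not the Chombo or ACB code described in the paper.

```c
/* Minimal MPI-IO sketch: every rank writes one contiguous block of a shared
 * file with a collective call so the MPI library can aggregate the requests.
 * Illustrative only; this is not the Chombo/ACB implementation.            */
#include <mpi.h>
#include <stdlib.h>

#define NVALS 131072                    /* doubles per rank (1 MiB) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(NVALS * sizeof *buf);
    for (int i = 0; i < NVALS; i++)
        buf[i] = (double)rank;

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_write", "enable");  /* collective buffering */
    MPI_Info_set(info, "striping_factor", "8");      /* Lustre stripe count  */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "blocks.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* Rank-specific offset; the _all variant lets aggregator ranks coalesce
     * many small requests into a few large contiguous writes.              */
    MPI_Offset off = (MPI_Offset)rank * NVALS * sizeof(double);
    MPI_File_write_at_all(fh, off, buf, NVALS, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    free(buf);
    MPI_Finalize();
    return 0;
}
```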
Finally, A Way to Measure Frontend I/O Performance. Christopher Zimmer, Veronica Vergara Larrea, and Saurabh Gupta (Oak Ridge National Laboratory). Abstract: Identifying sources of variability in the Spider II file system on Titan is challenging because it spans multiple networks with layers of hardware performing various functions to fulfill the needs of the parallel file system. Several efforts have targeted file system monitoring but have focused only on metric logging associated with the storage side of the file system. In this work, we enhance that view by designing and deploying a low-impact network congestion monitor designed especially for the I/O routers that are deployed on service nodes within the Titan Cray XK7 Gemini network. To the best of our knowledge, this is the first tool that provides live monitoring of performance bottlenecks at the I/O router. Our studies show high correlation between I/O router congestion and I/O bandwidth. Ultimately, we plan on using this tool for I/O hotspot identification within Titan and guided scheduling for large I/O.

A Classification of Parallel I/O Toward Demystifying HPC I/O Best Practices. Robert Sisneros (National Center for Supercomputing Applications) and Kalyana Chadalavada (Intel Corporation). Abstract: The process of optimizing parallel I/O can quite easily become daunting. By the nature of its implementation there are many highly sensitive, tunable parameters, and a subtle change to any of these may have drastic or even completely counterintuitive results. There are many factors affecting performance: complex hardware configurations, significant yet unpredictable system loads, and system-level implementations that perform tasks in unexpected ways. A final compounding issue is that an optimization is very likely specific to only a single application. The state of the art is therefore usually a fuzzy mixture of expertise and trial-and-error testing. In this work we introduce a characterization of application I/O based on aggregation, which we define as a combination of job-level and filesystem-level aggregation. We will show how this characterization may be used to analyze parallel I/O performance, not only to validate I/O best practices but also to communicate their benefits in a user-centric way. Paper Filesystems & I/O

Technical Session 7C, Beaumont. Chair: Ashley Barker.

Unified Workload Management for the Cray XC30/40 System with Univa Grid Engine. Daniel Gruber and Fritz Ferstl (Univa Corporation). Abstract: Workload management (WLM) software provides batch queues and scheduling intelligence for efficient management of Cray systems. The widely used Univa Grid Engine (UGE) WLM software has been available and in commercial production use on Cray XC30/40 systems for several years. UGE allows a Cray system to be integrated seamlessly with the other computational resources of an organization to form one unified WLM domain. This paper describes the general structure and features of UGE, how those components map onto a Cray system, how UGE is integrated with the Cray infrastructure, and what job structure and features the user will see when using Univa Grid Engine. Special features of UGE are also highlighted.

Driving More Efficient Workload Management on Cray Systems with PBS Professional. Graham Russell and Jemellah Alhabashneh (Altair Engineering, Inc.). Abstract: The year 2015 continued to increase the adoption of key HPC technologies, from data analytics solutions to power-efficient scheduling. The HPC user landscape is changing, and it is now critical for workload management vendors to provide not only foundational scheduling functionality but also the adjacent capabilities that truly optimize system performance. In this presentation, Altair will provide a look at key advances in PBS Professional for improved performance on Cray systems. Topics include new Cray-specific features like Suspend/Resume, Xeon Phi support, Power-aware Scheduling, and DataWarp.

Broadening Moab for Expanding User Needs. Gary Brown (Adaptive Computing). Abstract: Adaptive Computing's Moab HPC scheduler, Nitro HTC scheduler, and the open-source TORQUE RM are broadening their reach to handle increasingly diverse users and their needs. This presentation will discuss the following examples. Paper User Services

3:00pm-5:00pm

Technical Session 8A, Bartholomew. Chair: Bilel Hadri.

Trinity: Architecture and Early Experience. Scott Hemmert (Sandia National Laboratories); Manuel Vigil and James Lujan (Los Alamos National Laboratory); Rob Hoekstra (Sandia National Laboratories); Daryl Grunau; David Morton; Hai Ah Nam; Paul Peltz, Jr.; Alfred Torrez; and Cornell Wright (Los Alamos National Laboratory); Shawn Dawson (Lawrence Livermore National Laboratory); and Simon Hammond and Michael Glass (Sandia National Laboratories). Abstract: The Trinity supercomputer is the first in a series of Advanced Technology Systems (ATS) that will be procured by the DOE's Advanced Simulation and Computing program over the next decade. The ATS systems serve the dual role of meeting immediate mission needs and helping to prepare for future system designs. Trinity meets this goal through a two-phase delivery. Phase 1 consists of Xeon-based compute nodes, while phase 2 adds Xeon Phi-based nodes.

Code Porting to Cray XC-40 Lesson Learned. James McClean and Raj Gautam (Petroleum Geo-Services). Abstract: We present a case study of porting seismic applications from a Beowulf cluster using an Ethernet network to the Cray XC40 cluster. The applications in question are Tilted Transverse Anisotropic Reverse Time Migration (TTI RTM), Kirchhoff Depth Migration (KDMIG), and Wave Equation Migration (WEM). The primary obstacle in this port was that TTI RTM and WEM use local scratch disk heavily and imaging is performed one shot per node; the Cray nodes do not have local scratch disks. The primary obstacle in KDMIG was its heavy I/O load on permanent disk due to the constant reading of travel time maps. We briefly explain how these algorithms were refactored so as not to be primarily dependent on scratch disk and to fully utilize the better networking in the Cray XC40. In the case of KDMIG, we explain how its I/O load was reduced via a memory pool concept.

Early Experiences Writing Performance Portable OpenMP 4 Codes. Verónica G. Vergara Larrea, Wayne Joubert, M. Graham Lopez, and Oscar Hernandez (Oak Ridge National Laboratory). Abstract: At least two major architectural trends are leading the way to Exascale: accelerator-based systems (e.g., Summit and Sierra) and self-hosted compute nodes (e.g., Aurora). Today, the ability to produce performance portable code is crucial to take full advantage of these different architectures. Directive-based programming APIs (e.g., OpenMP, OpenACC) have helped in this regard, and recently OpenMP added an accelerator programming model in addition to its shared memory programming model support. However, as of today, little is understood about how efficiently the accelerator programming model can be mapped onto different architectures, including self-hosted and traditional shared memory systems, and whether it can be used to generate performance portable code across architectures. In this paper, we parallelize a representative computational kernel using the two different OpenMP 4 styles (shared memory and accelerator models), and compare their performance on multiple architectures including OLCF's Titan supercomputer (a minimal sketch of the two styles follows this session listing).

A Reasoning And Hypothesis Generation Framework Based On Scalable Graph Analytics: Enabling Discoveries In Medicine Using Cray Urika-XA And Urika-GD. Sreenivas R. Sukumar, Larry W. Roberts, Jeffrey Graves, and Jim Rogers (Oak Ridge National Laboratory). Abstract: Finding actionable insights from data has always been difficult. As the scale and forms of data increase tremendously, the task of finding value becomes even more challenging. Data scientists at Oak Ridge National Laboratory are leveraging unique leadership infrastructure (e.g., the Urika-XA and Urika-GD appliances) to develop scalable algorithms for semantic, logical, and statistical reasoning with unstructured Big Data. We present the deployment of such a framework, called ORiGAMI (Oak Ridge Graph Analytics for Medical Innovations), on the National Library of Medicine's SEMANTIC Medline (an archive of medical knowledge since 1994). Medline contains over 70 million knowledge nuggets published in 23.5 million papers in the medical literature, with thousands more added daily. ORiGAMI is available as an open-science medical hypothesis generation tool, both as a web service and as an application programming interface (API), at http://hypothesis.ornl.gov . Paper Applications & Programming Environments
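As a rough illustration of the two OpenMP 4 styles compared in the Vergara Larrea et al. paper in Session 8A above, the sketch below applies both to a simple stand-in kernel (an assumption; the paper's representative kernel is not reproduced here). The first version uses the shared memory model, the second the accelerator (target offload) model.

```c
/* The same stand-in kernel written in the two OpenMP 4 styles: a host
 * shared-memory version and an accelerator (target offload) version.
 * Hypothetical example; build with an OpenMP-4-capable compiler.       */
#include <stdio.h>

#define N 1000000

void axpy_host(int n, double a, double *x, double *y)
{
    /* Shared-memory style: threads across the host cores */
    #pragma omp parallel for simd
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

void axpy_target(int n, double a, double *x, double *y)
{
    /* Accelerator style: teams of threads on the device (or on the
     * host itself for self-hosted many-core nodes)                   */
    #pragma omp target teams distribute parallel for simd \
                map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    static double x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    axpy_host(N, 0.5, x, y);     /* y becomes 2.5 */
    axpy_target(N, 0.5, x, y);   /* y becomes 3.0 */
    printf("y[0] = %f\n", y[0]);
    return 0;
}
```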
Technical Session 8B, Harpley. Chair: Jenett Tillotson.

Technical Publications New Portal. Peggy A. Sanchez (Cray Inc.). Abstract: Over the last year and a half, the Cray Publications department has gone through revolutionary changes. The new Cray Portal (not to be confused with CrayPort, the customer service portal) is a tool based on the results of these changes. There are Cray-internal benefits, but more important are the user benefits due to the new standard. The portal is innovative for technical publications and a sought-after example of new technology. While the portal is relatively simple to use, it is a rare, albeit excellent, model of what users can do within the same standards. Key benefits of this presentation will be for attendees to understand how the documentation portal brings opportunities to customize, rate, and respond to content, as well as to see the future of content delivery in a responsive design at Cray.

Improving User Notification on Frequently Changing HPC Environments. Chris Fuson, William Renaud, and James Wynne III (ORNL). Abstract: Today's HPC centers' user environments can be very complex. Centers often contain multiple large, complicated computational systems, each with its own user environment. Changes to a system's environment can be very impactful; however, a center's user environment is, in one way or another, frequently changing. Because of this, it is vital for centers to notify users of change. For users, untracked changes can be costly, resulting in unnecessary debug time as well as wasted compute allocations and research time. Communicating frequent change to diverse user communities is a common and ongoing task for HPC centers. This paper will cover the OLCF's current processes and methods used to communicate change to users of the center's large Cray systems and supporting resources. The paper will share lessons learned and goals, as well as practices, tools, and methods used to continually improve and reach members of the OLCF user community.

Slurm Overview and Road Map. Jacob Jenson (SchedMD LLC). Abstract: Slurm is an open source workload manager used on five of the world's top 10 most powerful computers and provides a rich set of features including topology-aware optimized resource allocation, the ability to expand and shrink jobs on demand, failure management support for applications, hierarchical bank accounts with fair-share job prioritization, job profiling, and a multitude of plugins for easy customization.

Interactive Visualization of Scheduler Usage Data. Daniel Gall (Engility/NOAA). Abstract: This paper describes the use of contemporary web client-based interactive visualization software to explore HPC job scheduler usage data for a large system. In particular, we offer a visualization web application to NOAA users that enables them to compare their experiences with their earlier experiences and those of other users. The application draws from a 30-day history of job data aggregated hourly by user, machine, QoS, and other factors. This transparency enables users to draw more informed conclusions about the behavior of the system. The technologies used are dc.js, d3.js, crossfilter, and bootstrap. This application differs from most visualizations of job data in that we eschewed the absolute time domain to focus on developing relevant charts in other domains, like relative time, job size, priority boost, and queue wait time. This enables us to more easily see repetitive trends in the data, like diurnal and weekly cycles of job submissions. Paper User Services

Technical Session 8C, Beaumont. Chair: Jean-Guillaume Piccinali.

Crossing the Rhine - Moving to CLE 6.0 System Management. Tina Butler and Tina Declerck (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center). Abstract: With the release of Cray Linux Environment 6.0, Cray has introduced a new paradigm for CLE system configuration and management. This major shift requires significant changes in formatting and practices on the System Management Workstation (SMW). Although Cray has committed to delivering migration tools for legacy systems, they will not be available until CLE 6.0 UP02, scheduled for July 2016 release. In the third quarter of 2016, NERSC will be taking delivery of the second phase of its Cori system, with Intel KNL processors. KNL requires CLE 6.0. In order to support phase 2, Cori will have to be upgraded to CLE 6.0 - the hard way. This paper will chronicle that effort.

Scaling Security in a Complex World. Wendy L. Palm (Cray Inc.). Abstract: Cray systems are becoming increasingly complex while security awareness has intensified. This presentation will address the changes in Cray's security processes that have resulted from this increased focus, as well as our new installation and update procedures.

The NERSC Data Collect Environment. Cary Whitney, Elizabeth Bautista, and Tom Davis (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center). Abstract: As computational facilities prepare for Exascale computing, there is a wider range of data that can be collected and analyzed, but existing infrastructures have not scaled to the magnitude of the data. Further, as systems grow, their environmental footprint has a wider impact, and data analysis should include answers on power consumption, its correlation to jobs processed, and power efficiency, as well as how jobs can be scheduled to leverage this data. At NERSC, we have created a new data collection methodology for the Cray system that goes beyond the system and extends into the computational center. This robust and scalable system can help us manage the center and the Cray, and ultimately help us learn how to scale our system and workload to the Exascale realm.

Making the jump to Light Speed with Cray's DataWarp - An Administrator's Perspective. Tina M. Declerck and David Paul (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center). Abstract: Cori, the first phase of NERSC's next-generation supercomputer, has 144 DataWarp nodes available to its users. Cray's DataWarp technology provides an intermediate storage capability, sitting between on-node memory and the parallel file system. It utilizes Cray's DVS to provide access from compute nodes on a per-request basis. In order for this to work, the workload manager interacts with Cray's DataWarp APIs to create the requested file system and make it available on the nodes requested by the job (or jobs). Some of the tools needed by an administrator are therefore included in the workload manager, SLURM at our site, while other information requires use of tools and commands provided by Cray. It is important to know what information is available, where to find it, and how to get it. Paper Systems Support

5:15pm-6:15pm

Interactive 9A, Bartholomew. Chair: Jason Hill. Systems Support SIG Meeting. Jason Hill (Oak Ridge National Laboratory). Abstract: Systems Support. Birds of a Feather

Interactive 9B, Harpley. Chair: Chris Fuson. Best Practices for Managing HPC User Documentation and Communication. Chris Fuson and Ashley Barker (ORNL), Richard Gerber (NERSC), Frank Indiviglio (GFDL), and Helen He (NERSC). Abstract: HPC centers provide large, complex, state-of-the-art computational and data resources to large user communities that span diverse science domains and contain members with varied experience levels. Effectively using these resources can pose a challenge to users, especially considering that each center often has site-specific configurations and procedures. Birds of a Feather

Interactive 9C, Beaumont. Chair: Peter Messmer. GPU accelerated Cray XC systems: Where HPC meets Big Data. Peter Messmer (NVIDIA), Chris Lindahl (Cray Inc.), and Sadaf Alam (Swiss National Supercomputing Centre). Abstract: We discuss accelerated compute, data analysis, and visualization capabilities of NVIDIA Tesla GPUs within the scalable and adaptable Cray XC series supercomputers. Historically, HPC and Big Data had their distinct challenges, algorithms, and computing hardware. Today, the amount of data produced by HPC applications turns their analysis into a Big Data challenge. On the other hand, the computational complexity of modern data analysis requires compute and messaging performance only found in HPC systems. The ideal supercomputer therefore unites compute, analysis, and visualization capabilities. Presenters from NVIDIA, Cray, and HPC sites will showcase features of the XC series that go beyond accelerated computing and demonstrate how heterogeneous systems are the ideal platform for converging high-end computing and data analysis. We will cover new features of NVIDIA drivers and libraries that allow applications to leverage the GPU's graphics capabilities, Cray's support for container technologies like Shifter, and an integrated yet decoupled programming and execution environment as a highly performant and flexible platform for a wide range of traditional HPC as well as emerging analytics and data science applications and workflows. Birds of a Feather

Wednesday, May 11th

7:30am-8:15am

Interactive 10A, Bartholomew. Chair: Michael Showerman. Addressing the challenges of "systems monitoring" data flows. Mike Showerman (NCSA at the University of Illinois) and Jim Brandt and Ann Gentile (Sandia National Laboratories). Abstract: As Cray systems have evolved in both scale and complexity, the volume of quality systems data has grown to levels that are challenging to process and store. This BOF is an opportunity to discuss some of the use cases for high-resolution power, interconnect, compute, and storage subsystem data. We hope to gain insights into the requirements sites have for future systems deployments, and how these data need to be integrated and processed. There will be presentations of known problems that cannot be addressed with the current infrastructure design, as well as directions Cray could go to meet the needs of sites. Birds of a Feather

Interactive 10B, Harpley. Chair: Peggy A. Sanchez. Technical Documentation and Users. Peggy A. Sanchez (Cray Inc.). Abstract: An open discussion about the direction of documentation for users as part of the continuing effort to improve the user experience and better understand needs outside of Cray. This is an opportunity to contribute to the direction of content within technical publications as well as to identify immediate needs. Birds of a Feather

Interactive 10C, Beaumont. Birds of a Feather

8:30am-10:00am

General Session 11, Minories Suite. Chair: David Hancock.

CUG Business. David Hancock (Indiana University). Abstract: CUG President's Report, CUG Treasurer's Report, SIG Reports, CUG Elections.

Weather and Climate Services and need of High Performance Computing (HPC) Resources. Petteri Taalas (WMO - World Meteorological Organization). Abstract: We are entering a new era in technological innovation and in the use and integration of different sources of information for the well-being of society and its ability to cope with multi-hazards through weather and climate services. New predictive tools that will detail weather conditions down to neighbourhood and street level, provide early warnings a month ahead, and deliver forecasts of everything from rainfall to energy consumption will be some of the main outcomes of research activities in weather science over the next decade. Plenary
10:30am-12:00pm

General Session 12, Minories Suite. Chair: Andrew Winfer.

Accelerating Science with the NERSC Burst Buffer Early User Program. Wahid Bhimji, Deborah Bard, Melissa Romanus AbdelBaky, David Paul, Andrey Ovsyannikov, and Brian Friesen (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Matt Bryson (Lawrence Berkeley National Laboratory); Joaquin Correa and Glenn K. Lockwood (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Vakho Tsulaia, Suren Byna, and Steve Farrell (Lawrence Berkeley National Laboratory); Doga Gursoy (Argonne National Laboratory); Chris Daley (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Vince Beckner, Brian Van Straalen, David Trebotich, Craig Tull, and Gunther H. Weber (Lawrence Berkeley National Laboratory); and Nicholas J. Wright, Katie Antypas, and Mr Prabhat (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center). Abstract: NVRAM-based Burst Buffers are an important part of the emerging HPC storage landscape. The National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory recently installed one of the first Burst Buffer systems as part of its new Cori supercomputer, collaborating with Cray on the development of the DataWarp software. NERSC has a diverse user base comprised of over 6500 users in 750 different projects spanning a wide variety of scientific applications, including climate modeling, combustion, fusion, astrophysics, computational biology, and many more. The potential applications of the Burst Buffer at NERSC are therefore also considerable and diverse. We describe here the Burst Buffer Early User Program at NERSC, which selected a number of research projects to gain early access to the Burst Buffer and exercise its different capabilities to enable new scientific advancements. We present details of the program, in-depth performance results, and lessons learned from highlighted projects.

Is Cloud A Passing Fancy? Rajeeb Hazra (Intel). Abstract: Enterprise businesses are rapidly moving to the cloud, transforming from bricks and mortar into cloud-based services. They increasingly rely on high-volume data collection enabling complex analysis and modelling as fundamental business capabilities. Plenary

1:00pm-1:45pm

General Session 13, Minories Suite. Chair: David Hancock.

1 on 100 or More... Peter Ungaro (Cray Inc.). Abstract: Open discussion with the Cray President and CEO. No other Cray employees or Cray partners are permitted during this session. Plenary

2:00pm-3:00pm

Technical Session 14A, Bartholomew. Chair: Chris Fuson.

Estimating the Performance Impact of the MCDRAM on KNL Using Dual-Socket Ivy Bridge nodes on Cray XC30. Zhengji Zhao (NERSC/LBNL) and Martijn Marsman (University of Vienna). Abstract: NERSC is preparing for its next petascale system, named Cori, a Cray XC system based on the Intel KNL MIC architecture. Each Cori node will have 72 cores (288 threads), 512-bit vector units, and a low-capacity (16GB), high-bandwidth (~5x DDR4) on-package memory (MCDRAM or HBM). To help applications get ready for Cori, NERSC has developed optimization strategies that focus on the MPI+OpenMP programming model, vectorization, and the HBM. While the optimization of MPI+OpenMP and vectorization can be carried out on today's multi-core architectures, optimizing for the HBM is difficult where no HBM is available. In this paper, we present our HBM performance analysis of the VASP code, a widely used materials science code, using Intel's development tools, Memkind and AutoHBW, and a dual-socket Ivy Bridge processor node on Edison, a Cray XC30, as a proxy for the HBM on KNL.
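The Memkind-based placement used as a proxy in the MCDRAM study above can be sketched in a few lines. The example assumes the hbwmalloc interface shipped with the memkind library (hbw_check_available, hbw_malloc, hbw_free) and is illustrative only; the AutoHBW tool mentioned in the abstract instead redirects heap allocations in a chosen size range to high-bandwidth memory without source changes.

```c
/* Illustrative sketch of the hbwmalloc interface from the memkind library:
 * place one bandwidth-critical array in high-bandwidth memory when it is
 * available and fall back to DDR otherwise.  Link with -lmemkind.          */
#include <hbwmalloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t n = 1 << 20;                      /* 1 Mi doubles, 8 MiB    */
    int have_hbw = (hbw_check_available() == 0);

    double *hot = have_hbw ? hbw_malloc(n * sizeof *hot)
                           : malloc(n * sizeof *hot);
    if (!hot)
        return 1;

    for (size_t i = 0; i < n; i++)           /* bandwidth-bound update */
        hot[i] = 2.0 * (double)i;
    printf("HBM used: %s, hot[42] = %f\n", have_hbw ? "yes" : "no", hot[42]);

    if (have_hbw) hbw_free(hot);
    else          free(hot);
    return 0;
}
```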
Cray Performance Tools Enhancements for Next Generation Systems. Heidi Poxon (Cray Inc.). Abstract: The Cray performance tools provide a complete solution covering instrumentation, measurement, analysis, and visualization of data. The focus of the tools is on whole-program analysis, providing insight into performance bottlenecks within programs that use many computing resources across many nodes. With two complementary interfaces, one for first-time users that provides a program profile at the end of program execution, and one for advanced users that provides in-depth performance investigation and tuning assistance, the tools enable users to quickly identify areas in their programs that most heavily impact performance or energy consumption. Recent development activity targets the new Intel KNL many-core processors, more assistance with adding OpenMP to MPI programs, improved tool usability, and enhanced application power and energy monitoring feedback. New CrayPat, Reveal, and Cray Apprentice2 functionality is presented that will offer additional insight into application performance on next-generation Cray systems. Paper Applications & Programming Environments

Technical Session 14B, Harpley. Chair: Ashley Barker.

Architecture and Design of Cray DataWarp. Dave Henseler, Benjamin R. Landsteiner, and Doug Petesch (Cray Inc.); Cornell Wright (Los Alamos National Laboratory); and Nicholas J. Wright (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center). Abstract: This paper describes the architecture, design, use, and performance of Cray DataWarp, an infrastructure that uses direct-attached solid state disk (SSD) storage to provide more cost-effective bandwidth than an external parallel file system (PFS), allowing DataWarp to be provisioned for bandwidth and the PFS to be provisioned for capacity and resiliency. Placing this storage between the application and the PFS allows application I/O to be decoupled from (and in some cases eliminates) PFS I/O. This reduces the time required for the application to do I/O, while also increasing the overlap of computation with PFS I/O and typically reducing application elapsed time. DataWarp allocates and configures SSD-backed storage for jobs and users on demand, providing many of the benefits of both software-defined storage and storage virtualization.

Exascale HPC Storage – A possibility or a pipe dream? Torben K. Petersen (Seagate). Abstract: The advances in flash and NV-RAM technologies promise exascale-level throughput; however, building and implementing full solutions continues to be expensive. With HDDs still increasing in capacity and speed, are these new drives good enough to fill these essential roles? Paper Filesystems & I/O

Technical Session 14C, Beaumont. Chair: Jim Rogers.

ACES and Cray Collaborate on Advanced Power Management for Trinity. James H. Laros III, Kevin Pedretti, Stephen Olivier, Ryan Grant, and Michael Levenhagen (Sandia National Laboratories); David Debonis (Hewlett Packard Enterprise); Scott Pakin (Los Alamos National Laboratory); and Paul Falde, Steve Martin, and Matthew Kappel (Cray Inc.). Abstract: The motivation for power and energy measurement and control capabilities for High Performance Computing (HPC) systems is now well accepted by the community. While technology providers have begun to deliver some features in this area, interfaces to expose these features are vendor specific. The need for a standard interface, now and in the future, is clear.

Cray XC Power Monitoring and Control for Knights Landing (KNL). Steven J. Martin, David Rush, Matthew Kappel, Michael Sandstedt, and Joshua Williams (Cray Inc.). Abstract: This paper details the Cray XC40 power monitoring and control capabilities for Intel Knights Landing (KNL) based systems. The Cray XC40 hardware blade design for Intel KNL processors is the first in the XC family to incorporate enhancements directly related to power monitoring feedback driven by customers and the HPC community. This paper focuses on power monitoring and control directly related to Cray blades with Intel KNL processors and the interfaces available to users, system administrators, and workload managers to access power management features. Paper Systems Support
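As a small companion to the power monitoring papers in Technical Session 14C above, the sketch below reads node-level power and energy counters from a compute node. The /sys/cray/pm_counters file names are an assumption based on Cray's published descriptions of the XC power monitoring interface; RUR, CAPMC, and the SMW tools are the supported system-level mechanisms covered in the papers.

```c
/* On-node sampling sketch for Cray XC power counters.  The sysfs paths are
 * an assumption based on Cray's published pm_counters descriptions; RUR,
 * CAPMC and the SMW remain the supported system-level interfaces.          */
#include <stdio.h>

static int read_counter(const char *path, char *buf, int len)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, len, f);           /* e.g. "245 W" or "123456 J" */
    fclose(f);
    return ok ? 0 : -1;
}

int main(void)
{
    char power[64], energy[64];

    if (read_counter("/sys/cray/pm_counters/power", power, sizeof power) == 0)
        printf("node power : %s", power);
    if (read_counter("/sys/cray/pm_counters/energy", energy, sizeof energy) == 0)
        printf("node energy: %s", energy);
    return 0;
}
```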
3:30pm-5:00pm

Technical Session 15A, Bartholomew. Chair: Zhengji Zhao.

Lonestar 5: Customizing the Cray XC40 Software Environment. Cyrus Proctor, David Gignac, Robert McLay, Si Liu, Doug James, Tommy Minyard, and Dan Stanzione (Texas Advanced Computing Center). Abstract: Lonestar 5, a 30,000-core, 1.2-petaflop Cray XC40, entered production at the Texas Advanced Computing Center (TACC) on January 12, 2016. Customized to meet the needs of TACC's diverse computational research community, Lonestar 5 provides each user a choice between two alternative, independent configurations. Each is robust, mature, and proven: Lonestar 5 hosts both the environment delivered by Cray and a second, customized environment that mirrors Stampede, Lonestar 4, and other TACC clusters.

The Cray Programming Environment: Current Status and Future Directions. Luiz DeRose (Cray Inc.). Abstract: In this talk I will present the recent activities, roadmap, and future directions of the Cray Programming Environment, which is being developed and deployed on Cray clusters and Cray supercomputers for scalable performance with high programmability. The presentation will discuss the programming environment's new functionality to help port and hybridize applications for systems with Intel KNL processors. This new functionality includes compiler directives to access high bandwidth memory, new features in the scoping tool Reveal to assist in parallelization of applications, and the Cray Comparative Debugger, which was designed and developed to help identify porting issues. In addition, I will present the recent activities in the Cray Scientific Libraries and the Cray Message Passing Toolkit, and will discuss the Cray Programming Environment strategy for accelerated computing with GPUs, as well as the Cray Compiling Environment's standards compliance plans for C++14, OpenMP 4.5, and OpenACC.

Making Scientific Software Installation Reproducible On Cray Systems Using EasyBuild. Petar Forai (Research Institute of Molecular Pathology (IMP)), Kenneth Hoste (Central IT Department of Ghent University), Guilherme Peretti Pezzi (Swiss National Supercomputing Centre), and Brett Bode (National Center for Supercomputing Applications/University of Illinois). Abstract: Cray provides a tuned and supported OS and programming environment (PE), including compilers and libraries integrated with the modules system. While the Cray PE is updated frequently, tools and libraries not in it quickly become outdated. In addition, the number of tools, libraries, and scientific applications that HPC user support teams are expected to support is increasing significantly. The uniformity of the software environment across Cray sites makes it an attractive target on which to share this ubiquitous burden and to collaborate on a common solution. Paper Applications & Programming Environments

Technical Session 15B, Harpley. Chair: Jason Hill.

The Evolution of Lustre Networking at Cray. Chris A. Horn (Cray Inc.). Abstract: Lustre Network (LNet) routers with more than one InfiniBand Host Channel Adapter (HCA) have been in use at Cray for some time. This type of LNet router configuration is necessary on Cray supercomputers in order to extract maximum performance out of a single LNet router node. This paper provides a look at the state of the art in this dual-HCA router configuration. Topics include avoiding ARP flux with proper subnet configuration, flat vs. fine-grained routing, and configuration emplacement. We'll also provide a look at how LNet will provide compatibility with InfiniBand HCAs requiring the latest mlx5 drivers, and what is needed to support a mix of mlx4 and mlx5 on the same fabric.

Extreme Scale Storage & IO. Eric Barton (Intel Corporation). Abstract: New technologies such as 3D XPoint and integrated high performance fabrics are set to revolutionize the storage landscape as we reach towards Exascale computing. Unfortunately, the latencies inherent in today's storage software mask the benefits of these technologies and their horizontal scaling. This talk will describe the work currently underway in the DOE-funded Extreme Scale Storage & IO project to prototype a storage stack capable of exploiting these new technologies to the full, designed to overcome the extreme scaling and resilience challenges presented by Exascale computing.

Managing your Digital Data Explosion. Matt Starr and Janice Kinnin (Spectra Logic). Abstract: Our society is currently undergoing an explosion in digital data. It is predicted that our digital universe will double every two years to reach more than 44 zettabytes (ZB) by 2020. The volume of data created each day has increased immensely and will continue to grow exponentially over time. This trend in data growth makes it clear that the data storage problems we struggle with today will soon seem very minor. Paper Filesystems & I/O

Technical Session 15C, Beaumont. Chair: Helen He.

SLURM. Our way. A tale of two XCs transitioning to SLURM. Douglas M. Jacobsen, James F. Botts, and Yun (Helen) He (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center). Abstract: NERSC recently transitioned its batch system and workload manager from an ALPS-based solution to SLURM running "natively" on our Cray XC systems. The driving motivation for making this change is to gain access to features NERSC has long implemented in alternate forms, such as a capacity for running large numbers of serial tasks, and to gain tight user-interface integration with new features of our systems, such as Burst Buffers, Shifter, and VTune, while still retaining access to a flexible batch system that delivers high utilization of our systems. While we have had successes in all these areas, perhaps the largest unexpected impact has been the change in how our staff interact with the system. Using SLURM as the native WLM has blurred the line between system management and operation. This has been greatly beneficial to the impact our staff have on system configuration and deployment of new features: a platform for innovation.

Early experiences configuring a Cray CS Storm for Mission Critical Workloads. Mark D. Klein and Marco Induni (Swiss National Supercomputing Centre). Abstract: MeteoSwiss is transitioning from a traditional Cray XE6 system to a very dense GPU configuration of the Cray CS Storm. This paper will discuss some of the system design choices and configuration decisions that have gone into the new setup in order to operate the mission-critical workloads of weather forecasting. This paper will share some of the modifications that have been made to enhance things such as CPU/GPU/HCA affinity in the job scheduler, describe the monitoring systems that have been set up to examine performance fluctuations, and discuss the design of the failover system. This paper will also share some challenges found with the CS Storm management software, as well as the current support situation for this product line.

Analysis of Gemini Interconnect Recovery Mechanisms: Methods and Observations. Saurabh Jha and Valerio Formicola (University of Illinois), Catello Di Martino (Nokia), Zbigniew Kalbarczyk (University of Illinois), William T. Kramer (National Center for Supercomputing Applications/University of Illinois), and Ravishankar K. Iyer (University of Illinois). Abstract: This paper presents methodology and tools to understand and characterize the recovery mechanisms of the Gemini interconnect system from raw system logs. The tools can assess the impact of these recovery mechanisms on the system and user workloads. The methodology is based on a topology-aware, state-machine-based clustering algorithm to coalesce the Gemini-related events (i.e., errors, failure, and recovery events) into groups. The presented methodology has been used to analyze more than two years of logs from Blue Waters, the 13.1-petaflop Cray hybrid supercomputer at the University of Illinois - National Center for Supercomputing Applications (NCSA). Paper Systems Support
Robinson Programming Environments, Applications and Documentation SIG Meeting Programming Environments, Applications and Documentation SIG Meeting Tim Robinson (Swiss National Supercomputing Centre) Abstract Programming Environments, Applications and Documentation Birds of a Feather Interactive 16B Harpley Cory Spitz Evolving parallel file systems in response to the changing storage and memory landscape Evolving parallel file systems in response to the changing storage and memory landscape Cory Spitz (Cray Inc.) Abstract Burst buffers and storage-memory hierarchies are disruptive technologies to parallel file systems (PFS), but there isn’t consensus among members of the HPC community on how PFSs should adapt to include their use, if at all. There also isn’t consensus on how HPC users should ultimately use and manage their data with these emerging technologies. In this BoF we will discuss how HPC and technical computing users want to interact with burst buffers or other storage-memory hierarchies and how their PFS should adapt. What do they expect? Will they want to continue to use POSIX-like semantics for access like with MPI-I/O or HDF5 containers? What do users expect for legacy codes? Generally, what do users, application developers, and systems engineers require? Will they accept exotic solutions or must a de facto industry standard emerge? Birds of a Feather Interactive 16C Beaumont Wendy L. Palm Security Issues on Cray Systems Security Issues on Cray Systems Wendy L. Palm (Cray Inc.) Abstract Cray sites and systems have a variety of functions, features and security requirements. This BoF is an opportunity for sites to contribute to the future plans, focus and direction of the Cray Security Team. Discussions of past security events will be used as examples. Birds of a Feather | Thursday, May 12th7:30am-8:15amInteractive 17A Bartholomew Bill Nitzberg PBS Professional: Welcome to the Open Source Community PBS Professional: Welcome to the Open Source Community Bill Nitzberg (Altair Engineering, Inc.) Abstract Altair will be releasing PBS Pro under an Open Source license in mid-2016. Birds of a Feather Interactive 17B Harpley CJ Corbett Cray and HPC in the Cloud: Discussion and Conclusion Cray and HPC in the Cloud: Discussion and Conclusion Steve Scott, Kunju Kothari, CJ Corbett, and Ryan Waite (Cray Inc.) Abstract The BoF session follows up on and concludes a previous BoF session dedicated to Cray and HPC in the Cloud. (For continuity of discussion, participants in this BoF are asked to commit to the previous BoF.) This BoF collects the results of the previous exercise for focused discussion. Cray's roadmap and related development efforts for HPC in the Cloud will be presented and positioned against those findings for affinity, ability ranking and prioritization. The session will conclude with a "next steps" action plan. Birds of a Feather Interactive 17C Beaumont Birds of a Feather 8:30am-10:00amTechnical Session 18A Bartholomew Tina Declerck Opportunities for container environments on Cray XC30 with GPU devices Opportunities for container environments on Cray XC30 with GPU devices Lucas Benedicic, Miguel Gila, and Sadaf Alam (Swiss National Supercomputing Centre) Abstract Thanks to the significant popularity gained lately by Docker, the HPC community has recently started exploring container technology and the potential benefits its use would bring to the users of supercomputing systems like the Cray XC series.
In this paper, we explore the feasibility of diverse, nontraditional data- and computing-oriented use cases with practically no overhead, thus achieving native execution performance. Working in close collaboration with NERSC and an engineering team at Nvidia, CSCS is working on extending the Shifter framework in order to enable GPU access to containers at scale. We also briefly discuss the implications of using containers within a shared HPC system from the security point of view to provide a service that does not compromise the stability of the system or the privacy of the users. Furthermore, we describe several valuable lessons learned through our analysis and share the challenges we encountered. Shifter: Containers for HPC Shifter: Containers for HPC Richard S. Canon and Douglas M. Jacobsen (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and David Henseleer (Cray Inc.) Abstract Container-based computing is rapidly changing the way software is developed, tested, and deployed. We will present a detailed overview of the design and implementation of Shifter, which, in partnership with Cray, has extended the early prototype concepts and is now in production at NERSC. Shifter enables end users to execute containers using images constructed with various methods, including the popular Docker-based ecosystem. We will discuss some of the improvements and implementation details. In addition, we will discuss lessons learned, performance results, and real-world use cases of Shifter in action and the potential role of containers in scientific and technical computing, including how they complement the scientific process. We will conclude with a discussion about the future directions of Shifter. Dynamic RDMA Credentials Dynamic RDMA Credentials James Shimek and James Swaro (Cray Inc.) Abstract Dynamic RDMA Credentials (DRC) is a new system service to allow shared network access between different user applications. DRC allows user applications to request managed network credentials, which can be shared with other users, groups or jobs. Access to a credential is governed by the application and DRC to provide authorized and protected sharing of network access between applications. DRC extends the existing protection domain functionality provided by ALPS without exposing application data to unauthorized applications. DRC can also be used with other batch systems such as SLURM, without any loss of functionality. Paper Applications & Programming Environments Technical Session 18B Harpley Jason Hill Characterizing the Performance of Analytics Workloads on the Cray XC40 Characterizing the Performance of Analytics Workloads on the Cray XC40 Michael F. Ringenburg, Shuxia Zhang, Kristyn Maschhoff, and Bill Sparks (Cray Inc.) and Evan Racah and Mr Prabhat (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract This paper describes an investigation of the performance characteristics of high performance data analytics (HPDA) workloads on the Cray XC40, with a focus on commonly-used open source analytics frameworks like Apache Spark. We look at two types of Spark workloads: the Spark benchmarks from the Intel HiBench 4.0 suite and a CX matrix decomposition algorithm. We study performance from both the bottom-up view (via system metrics) and the top-down view (via application log analysis), and show how these two views can help identify performance bottlenecks and system issues impacting data analytics workload performance.
Based on this study, we provide recommendations for improving the performance of analytics workloads on the XC40. Interactive Data Analysis using Spark on Cray Urika Appliance (WITHDRAWN) Interactive Data Analysis using Spark on Cray Urika Appliance (WITHDRAWN) Gaurav Kaul (Intel) and Xavier Tordoir and Andy Petrella (Data Fellas) Abstract In this talk, we discuss how data scientists can use the Intel Data Analytics Acceleration Library (DAAL) with Spark and Spark Notebook. Intel DAAL provides building blocks for analytics optimized for the x86 architecture. We have integrated DAAL with Spark Notebook. This provides Spark users with an interactive and optimized interface for running Spark jobs. We take real-world machine learning and graph analytics workloads in bioinformatics and run them on Spark using the Cray Urika-XA appliance. The benefits of a co-designed hardware and software stack become apparent in these examples, along with the added benefit of an interactive front end for users. Experiences Running Mixed Workloads On Cray Analytics Platform Experiences Running Mixed Workloads On Cray Analytics Platform Haripriya Ayyalasomayajula and Kristyn Maschhoff (Cray Inc.) Abstract The ability to run both HPC and big data frameworks together on the same machine is a principal design goal for future Cray analytics platforms. Hadoop provides a reasonable solution for parallel processing of batch workloads using the YARN resource manager. Spark is a general-purpose cluster-computing framework, which also provides parallel processing of batch workloads as well as in-memory data analytics capabilities; iterative, incremental algorithms; ad hoc queries; and stream processing. Spark can be run using YARN, Mesos, or its own standalone resource manager. The Cray Graph Engine (CGE) supports real-time analytics on the largest and most complex graph problems. CGE is a more traditional HPC application that runs under either Slurm or PBS. Traditionally, running workloads that require different resource managers requires static partitioning of the cluster. This can lead to underutilization of resources. Paper Applications & Programming Environments Technical Session 18C Beaumont Jim Rogers Network Performance Counter Monitoring and Analysis on the Cray XC Platform Network Performance Counter Monitoring and Analysis on the Cray XC Platform Jim Brandt (Sandia National Laboratories), Edwin Froese (Cray Inc.), Ann Gentile (Sandia National Laboratories), Larry Kaplan (Cray Inc.), and Benjamin Allan and Edward Walsh (Sandia National Laboratories) Abstract The instrumentation of Cray's Aries network ASIC, of which the XC platform's High Speed Network (HSN) is composed, offers unprecedented potential for better understanding and utilization of platform HSN resources. Monitoring the amount of data generated on a large-scale system presents challenges with respect to synchronization, data management, and analysis. There are over a thousand raw counter metrics per Aries router and interface, with functional combinations of these raw metrics required for insight into network state. Design and implementation of a scalable monitoring system for Trinity Design and implementation of a scalable monitoring system for Trinity Adam DeConinck, Amanda Bonnie, Kathleen Kelly, Samuel Sanchez, Cynthia Martin, and Michael Mason (Los Alamos National Laboratory); Jim Brandt, Ann Gentile, Benjamin Allan, and Anthony Agelastos (Sandia National Laboratories); and Michael Davis and Michael Berry (Cray Inc.)
Abstract The Trinity XC-40 system at Los Alamos National Laboratory presents unprecedented challenges to our system management capabilities, including increased scale, new and unfamiliar subsystems, and radical changes in the system software stack. These challenges have motivated the development of a next-generation monitoring system with new capabilities for collection and analysis of system and facilities data. Dynamic Model Specific Register (MSR) Data Collection as a System Service Dynamic Model Specific Register (MSR) Data Collection as a System Service Greg Bauer (National Center for Supercomputing Applications/University of Illinois), Jim Brandt and Ann Gentile (Sandia National Laboratories), and Andriy Kot and Michael Showerman (National Center for Supercomputing Applications) Abstract The typical use case for Model Specific Register (MSR) data is to provide application profiling tools with hardware performance counter data (e.g., cache misses, flops, instructions executed). This enables the user/developer to gain understanding about relative performance/efficiencies of the code overall as well as smaller code sections. Due to the overhead of collecting data at sufficient fidelity for the required resolution, these tools are typically only run while tuning a code. Paper Systems Support 10:30am-12:00pmTechnical Session 19A Bartholomew Zhengji Zhao Optimizing Cray MPI and SHMEM Software Stacks for Cray-XC Supercomputers based on Intel KNL Processors Optimizing Cray MPI and SHMEM Software Stacks for Cray-XC Supercomputers based on Intel KNL Processors Krishna Kandalla, Peter Mendygral, Nick Radcliffe, Bob Cernohous, Kim McMahon, and Mark Pagel (Cray) Abstract HPC applications commonly use Message Passing Interface (MPI) and SHMEM programming models to achieve high performance in a portable manner. With the advent of the Intel MIC processor technology, hybrid programming models that involve the use of MPI/SHMEM along with threading models (such as OpenMP) are gaining traction. However, most current generation MPI implementations are not poised to offer high performance communication in highly threaded environments. The latest MIC architecture, Intel Knights Landing (KNL), also offers High Bandwidth Memory - a new memory technology, along with complex NUMA topologies. This paper describes the current status of the Cray MPI and SHMEM implementations for optimizing application performance on Cray XC supercomputers that rely on KNL processors. A description of the evolution of WOMBAT (a high fidelity astrophysics code) to leverage thread-hot RMA in Cray MPICH is included. Finally, this paper also summarizes new optimizations in the Cray MPI and SHMEM implementations. What's new in Allinea's tools: from easy batch script integration and remote access to energy profiling. What's new in Allinea's tools: from easy batch script integration and remote access to energy profiling. Patrick Wohlschlegel (Allinea Software) Abstract We address application energy use and performance, productivity, and the future in this talk. The Allinea Forge debugging and profiling tools, DDT and MAP, are deployed on most Cray systems - we take this opportunity to highlight important recent innovations and share the future. 
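The thread-hot communication that the Cray MPI/SHMEM abstract above targets boils down to multiple threads of one rank driving the network concurrently. A minimal sketch of that pattern is shown below, using mpi4py for brevity rather than the C/Fortran used by the production codes discussed; the tag scheme, message contents, and thread count are illustrative assumptions, and the sketch presumes an MPI build that grants MPI_THREAD_MULTIPLE.

    # Sketch: several threads per rank issuing MPI traffic concurrently,
    # the pattern that "thread-hot" MPI implementations aim to make fast.
    import threading
    import mpi4py
    mpi4py.rc.thread_level = "multiple"   # request MPI_THREAD_MULTIPLE at init
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    assert MPI.Query_thread() >= MPI.THREAD_MULTIPLE  # needs a thread-safe MPI

    def exchange(tid):
        # Each thread talks to the neighbouring ranks with its own tag, so
        # threads can progress independently instead of serializing on a lock.
        sreq = comm.isend({"rank": rank, "thread": tid},
                          dest=(rank + 1) % size, tag=tid)
        rreq = comm.irecv(source=(rank - 1) % size, tag=tid)
        sreq.wait()
        print("rank", rank, "thread", tid, "received", rreq.wait())

    threads = [threading.Thread(target=exchange, args=(t,)) for t in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()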
Configuring and Customizing the Cray Programming Environment on CLE 6.0 Systems Configuring and Customizing the Cray Programming Environment on CLE 6.0 Systems Geir Johansen (Cray Inc.) Abstract The Cray CLE 6.0 system will provide a new installation model for the Cray Programming Environment. This paper will focus on the new processes for configuring and customizing the Cray Programming Environment to best meet the customer site's requirements. Topics will include configuring the login shell start-up scripts to load the appropriate modulefiles, creating specialized modulefiles to load specific versions of Cray Programming Environment components, and installing third-party programming tools and libraries not released by Cray. Directions will be provided on porting programming environment software to CLE 6.0 systems, including instructions on how to create modulefiles. The specific example of porting the Python MPI library (mpi4py) to CLE 6.0 will be included. Paper Applications & Programming Environments Technical Session 19B Harpley Richard Barrett FCP: A Fast and Scalable Data Copy Tool for High Performance Parallel File Systems FCP: A Fast and Scalable Data Copy Tool for High Performance Parallel File Systems Feiyi Wang, Veronica Vergara Larrea, Dustin Leverman, and Sarp Oral (Oak Ridge National Laboratory) Abstract The design of HPC file and storage systems has largely been driven by the requirements on capability, reliability, and capacity. However, the convergence of large-scale simulations with big data analytics has put data, its usability, and its management back in a front-and-center position. LIOProf: Exposing Lustre File System Behavior for I/O Middleware LIOProf: Exposing Lustre File System Behavior for I/O Middleware Cong Xu (Intel Corporation), Suren Byna (Lawrence Berkeley National Laboratory), Vishwanath Venkatesan (Intel Corporation), Robert Sisneros (National Center for Supercomputing Applications), Omkar Kulkarni (Intel Corporation), Mohamad Chaarawi (The HDF Group), and Kalyana Chadalavada (Intel Corporation) Abstract As the parallel I/O subsystem in large-scale supercomputers becomes more complex due to multiple levels of software libraries, hardware layers, and various I/O patterns, detecting performance bottlenecks is a critical requirement. While a few tools exist to characterize application I/O, robust analysis of file system behavior and the association of file-system feedback with application I/O patterns are largely missing. Toward filling this void, we introduce the Lustre IO Profiler, LIOProf, for monitoring I/O behavior and characterizing I/O activity statistics in the Lustre file system. In this paper, we use LIOProf both for uncovering pitfalls of MPI-IO’s collective read operation over the Lustre file system and for identifying HDF5 overhead. Based on LIOProf characterization, we have implemented a Lustre-specific MPI-IO collective read algorithm, enabled HDF5 collective metadata operations, and applied HDF5 dataset optimizations. Our evaluation results on two Cray systems (Cori at NERSC and Blue Waters at NCSA) demonstrate the efficiency of our optimization efforts.
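To make the MPI-IO behaviour that LIOProf instruments concrete, the sketch below issues a collective write to a shared file on Lustre with striping hints set through an MPI Info object (striping hints are honoured when the file is created); the analogous collective read call is noted in a comment. It uses mpi4py for brevity, and the file name, hint values, and block size are illustrative assumptions rather than values from the paper.

    # Sketch: collective MPI-IO over Lustre with striping hints (mpi4py).
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    info = MPI.Info.Create()
    info.Set("striping_factor", "8")          # stripe over 8 OSTs (illustrative)
    info.Set("striping_unit", str(1 << 20))   # 1 MiB stripe size (illustrative)

    # Hints take effect when the shared file is created.
    fh = MPI.File.Open(comm, "shared_data.bin",
                       MPI.MODE_CREATE | MPI.MODE_WRONLY, info)

    block = 1 << 20                           # bytes written per rank
    buf = np.full(block, rank % 256, dtype=np.uint8)
    fh.Write_at_all(rank * block, buf)        # collective write, per-rank offset
    fh.Close()
    info.Free()

    # The collective read path studied above is the mirror image, e.g.
    #   fh = MPI.File.Open(comm, "shared_data.bin", MPI.MODE_RDONLY)
    #   fh.Read_at_all(rank * block, buf)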
Psync - Parallel Synchronization Of Multi-Pebibyte File Systems Psync - Parallel Synchronization Of Multi-Pebibyte File Systems Andy Loftus (NCSA) Abstract When challenged to find a way to migrate an entire file system onto new hardware while maximizing availability and ensuring exact data and metadata duplication, NCSA found that existing file copy tools couldn’t fit the bill, so they set out to create a new tool: one that would scale to the limits of the file system and provide a robust interface to adjust to the dynamic needs of the cluster. The resulting tool, Psync, effectively manages many syncs running in parallel. It is dynamically scalable (can add and remove nodes on the fly) and robust (can start/stop/restart the sync). Psync has been run successfully on hundreds of nodes, each with multiple processes (yielding possibly thousands of parallel processes). This talk will present the overall design of Psync and its use as a general-purpose tool for copying lots of data as quickly as possible. Paper Filesystems & I/O Technical Session 19C Beaumont Tina Declerck The GNI Provider Layer for OFI libfabric The GNI Provider Layer for OFI libfabric Howard Pritchard and Evan Harvey (Los Alamos National Laboratory) and Sung-Eun Choi, James Swaro, and Zachary Tiffany (Cray Inc.) Abstract The Open Fabrics Interfaces (OFI) libfabric, a community-designed networking API, has gained increasing attention over the past two years as an API which promises both high performance and portability across a wide variety of network technologies. The code itself is being developed as Open Source Software with contributions from across government labs, industry and academia. In this paper, we present a libfabric provider implementation for the Cray XC system using the Generic Network Interface (GNI) library. The provider is especially targeted for highly multi-threaded applications requiring concurrent access to the Aries High Speed Network with minimal contention between threads. Big Data Analytics on Cray XC Series DataWarp using Hadoop, Spark and Flink Big Data Analytics on Cray XC Series DataWarp using Hadoop, Spark and Flink Robert Schmidtke, Guido Laubender, and Thomas Steinke (Zuse Institute Berlin) Abstract We are currently exploring the Big Data analytics capabilities of the Cray XC architectures to harness the computing power for increasingly common programming paradigms for handling large volumes of data. These include MapReduce and, more recently, in-memory data processing approaches such as Apache Spark and Apache Flink. We use our Cray XC Test and Development System (TDS) with 16 diskless compute nodes and eight DataWarp nodes. We use Hadoop, Spark and Flink implementations of select benchmarks from the Intel HiBench micro-benchmark suite to find suitable runtime configurations of these frameworks for the TDS hardware. Motivated by preliminary results in throughput per node in the popular Hadoop TeraSort benchmark, we conduct a detailed scaling study and investigate resource utilization. Furthermore, we identify scenarios where using DataWarp nodes is advantageous over using Lustre.
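One way to make the DataWarp experiments concrete is to point Spark's spill and shuffle space at a DataWarp mount, as in the sketch below. This is an illustrative configuration, not the authors' setup: the environment variable name follows the DataWarp convention for striped job instances but should be checked against the local installation, and the workload is a toy, shuffle-heavy sort in the spirit of TeraSort.

    # Sketch: directing Spark scratch (spill/shuffle) space to a DataWarp mount.
    import os
    from pyspark.sql import SparkSession

    # Assumed variable name; the real mount path is supplied by the batch system.
    dw_scratch = os.environ.get("DW_JOB_STRIPED", "/tmp")

    spark = (SparkSession.builder
             .appName("datawarp-sort-sketch")
             .config("spark.local.dir", dw_scratch)   # spill/shuffle on DataWarp
             .getOrCreate())

    # Toy TeraSort-flavoured workload: shuffle-heavy sort of synthetic records.
    records = (spark.sparkContext
               .parallelize(range(10**6), numSlices=64)
               .map(lambda i: ((i * 2654435761) % (1 << 32), i)))
    print("sorted records:", records.sortByKey().count())
    spark.stop()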
Performance Test of Parallel Linear Equation Solvers on Blue Waters - Cray XE6/XK7 system Performance Test of Parallel Linear Equation Solvers on Blue Waters - Cray XE6/XK7 system JaeHyuk Kwack, Gregory Bauer, and Seid Koric (National Center for Supercomputing Applications) Abstract Parallel linear equation solvers are one of the most important components determining the scalability and efficiency of many supercomputing applications. Several groups and companies are leading the development of linear system solver libraries for HPC applications. In this paper, we present an objective performance test study for the solvers available on a Cray XE6/XK7 supercomputer, named Blue Waters, at the National Center for Supercomputing Applications (NCSA). A series of non-symmetric matrices are created through mesh refinements of a CFD problem. PETSc, MUMPS, SuperLU, Cray LibSci, Intel PARDISO, IBM WSMP, ACML, GSL, NVIDIA cuSOLVER, and AmgX solvers are employed for the performance test. CPU-compatible libraries are tested on XE6 nodes, while GPU-compatible libraries are tested on XK7 nodes. We present scalability test results for each library on Blue Waters, and show how far and how fast the employed libraries can solve the series of matrices. Paper Applications & Programming Environments 1:00pm-2:30pmTechnical Session 20A Bartholomew Chris Fuson Scaling hybrid coarray/MPI miniapps on Archer Scaling hybrid coarray/MPI miniapps on Archer Luis Cebamanos (EPCC, The University of Edinburgh); Anton Shterenlikht (Mech Eng Dept, The University of Bristol); and David Arregui and Lee Margetts (School of Mech, Aero and Civil Engineering, The University of Manchester) Abstract We have developed miniapps from the MPI finite element library ParaFEM and the Fortran 2008 coarray cellular automata library CGPACK. The miniapps represent multi-scale fracture models of polycrystalline solids. The software from which these miniapps have been derived will improve predictive modelling in the automotive, aerospace, power generation, defense and manufacturing sectors. The libraries and miniapps are distributed under a BSD license, so these can be used by computer scientists and hardware vendors to test various tools, including compilers and performance monitoring applications. CrayPAT tools have been used for sampling and tracing analysis of the miniapps. Two routines with all-to-all communication structures have been identified as primary candidates for optimisation. New routines have been written implementing the nearest neighbour algorithm and using coarray collectives. The scaling limit for the miniapps has been increased by a factor of 3, from about 2k to over 7k cores. The miniapps uncovered several issues in CrayPAT and in the Cray implementation of Fortran coarrays. We are working with Cray engineers to resolve these. Hybrid coarray/MPI programming is uniquely enabled on Cray systems. This work is of particular interest to Cray developers, because it details real experiences of using hybrid Fortran coarray/MPI programming for scientific computing in an area of cutting-edge research.
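The miniapps above are Fortran coarray/MPI codes, but the optimisation they report, replacing all-to-all communication with a nearest-neighbour exchange, is easy to illustrate. The sketch below is a hypothetical mpi4py analogue of that pattern on a 1-D periodic decomposition, not the authors' code; array sizes and tags are arbitrary.

    # Sketch: nearest-neighbour boundary exchange on a 1-D periodic decomposition,
    # the kind of pattern used in place of an all-to-all exchange.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    left, right = (rank - 1) % size, (rank + 1) % size

    interior = np.full(8, float(rank))   # this rank's cells (illustrative)
    halo_left = np.empty(1)              # ghost cell from the left neighbour
    halo_right = np.empty(1)             # ghost cell from the right neighbour

    # Each rank exchanges only with its two neighbours, not with every rank.
    comm.Sendrecv(sendbuf=interior[-1:], dest=right, sendtag=0,
                  recvbuf=halo_left, source=left, recvtag=0)
    comm.Sendrecv(sendbuf=interior[:1], dest=left, sendtag=1,
                  recvbuf=halo_right, source=right, recvtag=1)

    print(rank, "halo values:", halo_left[0], halo_right[0])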
Enhancing Scalability of the Gyrokinetic Code GS2 by using MPI Shared Memory for FFTs Enhancing Scalability of the Gyrokinetic Code GS2 by using MPI Shared Memory for FFTs Lucian Anton (Cray UK), Ferdinand van Wyk and Edmund Highcock (University of Oxford), Colin Roach (CCFE Culham Science Centre), and Joseph Parker (STFC) Abstract GS2 (http://sourceforge.net/projects/gyrokinetics) is a 5-D initial value parallel code used to simulate low frequency electromagnetic turbulence in magnetically confined fusion plasmas. Feasible calculations routinely capture plasma turbulence at length scales close either to the electron or the ion Larmor radius. Self-consistently capturing the interaction between turbulence at ion scale and electron scale requires a huge increase in the scale of computation. Scalable Remote Memory Access Halo Exchange with Reduced Synchronization Cost Scalable Remote Memory Access Halo Exchange with Reduced Synchronization Cost Maciej Szpindler (ICM, University of Warsaw) Abstract Remote Memory Access (RMA) is a popular technique for data exchange in parallel processing. The Message Passing Interface (MPI), the ubiquitous environment for distributed memory programming, introduced an improved model for RMA in the recent version of the standard. While RMA provides direct access to low-level high performance hardware, MPI one-sided communication enables various synchronization regimes, including scalable group synchronization. This combination provides methods to improve the performance of commonly used communication schemes in parallel computing. This work evaluates a one-sided halo exchange implementation on the Cray XC40 system. A large numerical weather prediction code is studied. To address already identified overheads of RMA synchronization, the recently proposed Notified Access extension is considered. To reduce the cost of the most frequent message passing communication scheme, an alternative RMA implementation is proposed. Additionally, to identify more scalable approaches, the performance of general active target synchronization, of the Notified Access modes of RMA, and of the original message passing implementation is compared. Paper Applications & Programming Environments Technical Session 20B Harpley Bilel Hadri Directive-based Programming for Highly-scalable Nodes Directive-based Programming for Highly-scalable Nodes Douglas Miles and Michael Wolfe (PGI) Abstract High end supercomputers have increased in performance from about 4 TFLOPS to 33 PFLOPS in the past 15 years, a factor of about 10,000. Increased node count accounts for a factor of 10, and clock rate increases for another factor of 5. Most of the increase, a factor of about 200, is due to increases in single-node performance. We expect this trend to continue, with single-node performance increasing faster than node count. Building scalable applications for such targets means exploiting as much intra-node parallelism as possible. We discuss coming supercomputer node designs and how to abstract the differences to enable design of portable scalable applications, and the implications for HPC programming languages and models such as OpenACC and OpenMP. Balancing particle and Mesh Computation in a Particle-In-Cell Code Balancing particle and Mesh Computation in a Particle-In-Cell Code Patrick H. Worley and Eduardo F.
D'Azevedo (Oak Ridge National Laboratory), Robert Hager and Seung-Hoe Ku (Princeton Plasma Physics Laboratory), Eisung Yoon (Rensselaer Polytechnic Institute), and Choong-Seock Chang (Princeton Plasma Physics Laboratory) Abstract The XGC1 plasma microturbulence particle-in-cell simulation code has both particle-based and mesh-based computational kernels that dominate performance. Both of these are subject to load imbalances that can degrade performance and that evolve during a simulation. Each separately can be addressed adequately, but optimizing just for one can introduce significant load imbalances in the other, degrading overall performance. A technique based on Golden Section Search has been developed that minimizes wallclock time given prior information on wallclock time, the current particle distribution, and the mesh cost per cell, and that also adapts to evolving load imbalance in both particle and mesh work. In problems of interest, this doubled the performance of full-system runs on the XK7 at the Oak Ridge Leadership Computing Facility compared to load balancing only one of the kernels. Computational Efficiency Of The Aerosol Scheme In The Met Office Unified Model Computational Efficiency Of The Aerosol Scheme In The Met Office Unified Model Mark Richardson (University of Leeds); Fiona O'Connor (Met Office Hadley Centre, UK); Graham W. Mann (University of Leeds); and Paul Selwood (Met Office, UK) Abstract A new data structuring has been implemented in the Met Office Unified Model (MetUM) which improves the performance of the aerosol subsystem. Smaller amounts of atmospheric data, arranged as segments of atmospheric columns, are passed to the aerosol sub-processes. The number of columns that are in a segment can be changed at runtime and thus can be tuned to the hardware and science in operation. This revision alone has halved the time spent in some of the aerosol sections for the case under investigation. The new arrangement allows simpler implementation of OpenMP around the whole of the aerosol subsystem and is shown to give close to ideal speed-up. Whether applying a dynamic schedule or retaining a simpler static schedule for the OpenMP parallel loop performs better is shown to depend on the number of threads. The percentage of the run spent in the UKCA sections has been reduced from 30% to 24%, with a corresponding reduction in runtime of 11% for a single-threaded run. When the reference version uses 4 threads, the percentage of time spent in UKCA is higher, at 40%, but with the OpenMP and segmenting modifications this is reduced to 20%, with a corresponding reduction in run time of 17%. For 4 threads, the parallel speed-up for the reference code was 1.78, and after the modifications it is 1.91. Both these values indicate that there is still a significant amount of the run that is serial (within an MPI task), which is continually being addressed by the software development teams involved in MetUM. Paper Applications & Programming Environments Technical Session 20C Beaumont Andrew Winfer How to Automate and not Manage under Rhine/Redwood How to Automate and not Manage under Rhine/Redwood Paul L. Peltz Jr., Adam J. DeConinck, and Daryl W. Grunau (Los Alamos National Laboratory) Abstract Los Alamos National Laboratory and Sandia National Laboratories, under the Alliance for Computing at Extreme Scale (ACES), have partnered with Cray to deliver Trinity, the Department of Energy’s next supercomputer on the path to exascale.
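As an aside on the XGC1 abstract above: Golden Section Search is a standard one-dimensional minimisation technique, and the sketch below shows the bare algorithm applied to a toy cost model in which a single parameter trades particle work against mesh work. The cost function and bounds are invented for illustration and bear no relation to XGC1's actual timing model.

    # Sketch: golden-section search over a single load-balancing parameter x,
    # minimising a (toy, invented) wallclock cost model.
    import math

    def wallclock_cost(x):
        # Stand-in for "time of the slowest kernel" as work is shifted by x.
        particle_time = 1.0 + 2.0 * x
        mesh_time = 3.0 - 2.5 * x
        return max(particle_time, mesh_time)

    def golden_section_min(f, lo, hi, tol=1e-4):
        inv_phi = (math.sqrt(5.0) - 1.0) / 2.0    # ~0.618
        a, b = lo, hi
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        while (b - a) > tol:
            if f(c) < f(d):
                b, d = d, c                       # minimum lies in [a, old d]
                c = b - inv_phi * (b - a)
            else:
                a, c = c, d                       # minimum lies in [old c, b]
                d = a + inv_phi * (b - a)
        return (a + b) / 2.0

    best = golden_section_min(wallclock_cost, 0.0, 1.0)
    print("balance parameter ~ %.4f, cost ~ %.4f" % (best, wallclock_cost(best)))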
Trinity, which is an XC40, is an ambitious system for a number of reasons, one of which is the deployment of Cray's new Rhine/Redwood (CLE 6.0/SMW 8.0) system management stack. With this release came a much-needed update to the system management stack to provide scalability and a new philosophy on system management. However, this update required LANL to update its own system management philosophy, and presented a number of challenges in integrating the system into the larger computing infrastructure at Los Alamos. This paper will discuss the work the LANL team is doing to integrate Trinity, automate system management with the new Rhine/Redwood stack, and combine LANL's and Cray's new system management philosophies. The Intel® Omni-Path Architecture: Game-Changing Performance, Scalability, and Economics The Intel® Omni-Path Architecture: Game-Changing Performance, Scalability, and Economics Andrew Russell (Intel Corporation) Abstract The Intel® Omni-Path Architecture, Intel's next-generation fabric product line, is off to an extremely fast start since its launch in late 2015. With high-profile customer deployments being announced at a feverish pace, the performance, resiliency, scalability, and economics of Intel's innovative fabric product line are winning over customers across the HPC industry. Learn from an Intel Fabric Solution Architect how to maximize both the performance and economic benefits when deploying an Intel® OPA-based cluster, and how it delivers huge benefits to HPC applications over standard InfiniBand-based designs. The Hidden Cost of Large Jobs - Drain Time Analysis at Scale The Hidden Cost of Large Jobs - Drain Time Analysis at Scale Joseph 'Joshi' Fullop (National Center for Supercomputing Applications) Abstract At supercomputing centers where many users submit jobs of various sizes, scheduling efficiency is the key to maximizing system utilization. With the capability of running jobs on massive numbers of nodes being the hallmark of large clusters, draining sufficient nodes in order to launch those jobs can severely impact the throughput of these systems. While these principles apply to clusters of any size, the idle node-hours due to drain on the scale of today's systems warrant attention. In this paper we provide methods of accounting for system-wide drain time, as well as for attributing drain time to a specific job. Having data like this allows for real evaluation of scheduling policies and their effect on node occupancy. This type of measurement is also necessary to allow for backfill recovery analytics and enables other types of assessments. Paper Systems Support 3:00pm-5:00pmTechnical Session 21A Bartholomew Richard Barrett Stitching Threads into the Unified Model Stitching Threads into the Unified Model Matthew Glover, Paul Selwood, Andy Malcolm, and Michele Guidolin (Met Office, UK) Abstract The Met Office Unified Model (UM) uses a hybrid parallelization strategy: MPI and OpenMP. Because the UM is legacy code, OpenMP has been retrofitted in a piecemeal fashion over recent years. On Enhancing 3D-FFT Performance in VASP On Enhancing 3D-FFT Performance in VASP Florian Wende (Zuse Institute Berlin), Martijn Marsman (Universität Wien), and Thomas Steinke (Zuse Institute Berlin) Abstract We optimize the computation of 3D-FFT in VASP in order to prepare the code for efficient execution on multi- and many-core CPUs like Intel's Xeon Phi. Along with the transition from MPI to MPI+OpenMP, library calls need to adapt to threaded versions.
One of the most time-consuming components in VASP is 3D-FFT. Besides assessing the performance of multi-threaded calls to FFTW and Intel MKL, we investigate strategies to improve the performance of FFT in a general sense. We incorporate our insights and strategies for FFT computation into a library which encapsulates FFTW and Intel MKL specifics and implements the following features: reuse of FFT plans, composed FFTs, and the use of high bandwidth memory on Intel's KNL Xeon Phi. We will present results on a Cray-XC40 and a Cray-XC30 Xeon Phi system using synthetic benchmarks and with the library integrated into VASP. Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers Abhinav Sarje (Lawrence Berkeley National Laboratory), Douglas Jacobsen (LANL), Samuel Williams (LBNL), Todd Ringler (LANL), and Leonid Oliker (LBNL) Abstract Incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards implementation of thread parallelism, in addition to distributed memory parallelism, to deliver efficient high-performance codes. In this work we describe the implementation of threading and our experiences with it in a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers, and show the benefits of threading for run time performance and energy requirements with increasing concurrency. Cori - A System to Support Data-Intensive Computing Cori - A System to Support Data-Intensive Computing Katie Antypas, Deborah Bard, Wahid Bhimji, Tina M. Declerck, Yun (Helen) He, Douglas Jacobsen, Shreyas Cholia, Mr Prabhat, and Nicholas J. Wright (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Richard Shane Canon (Lawrence Berkeley National Laboratory) Abstract The first phase of Cori, NERSC’s next-generation supercomputer, a Cray XC40, has been configured to specifically support data-intensive computing. With increasing dataset sizes coming from experimental and observational facilities, including telescopes, sensors, detectors, microscopes, sequencers, and supercomputers, scientific users from the Department of Energy, Office of Science are increasingly relying on NERSC for extreme scale data analytics. This paper will discuss the Cori Phase 1 architecture and its installation into the new, energy-efficient CRT facility, and explain how the system will be combined with the larger Cori Phase 2 system based on the Intel Knights Landing processor. In addition, the paper will describe the unique features and configuration of the Cori system that allow it to support data-intensive science. Paper Applications & Programming Environments Technical Session 21B Harpley Frank M. Indiviglio H5Spark: Bridging the I/O Gap between Spark and Scientific Data Formats on HPC Systems H5Spark: Bridging the I/O Gap between Spark and Scientific Data Formats on HPC Systems Jialin Liu, Evan Racah, Quincey Koziol, and Richard Shane Canon (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Alex Gittens (University of California, Berkeley); Lisa Gerhardt (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center); Suren Byna (Lawrence Berkeley National Laboratory); Michael F.
Ringenburg (Cray Inc.); and Mr Prabhat (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Abstract Spark has been tremendously powerful in performing Big Data analytics in distributed data centers. However, using the Spark framework on HPC systems to analyze large-scale scientific data poses several challenges. For instance, the parallel file system is shared among all compute nodes, in contrast to shared-nothing architectures. Another challenge is in accessing data stored in scientific data formats, such as HDF5 and NetCDF, that are not natively supported in Spark. Our study focuses on improving the I/O performance of Spark on HPC systems for reading and writing scientific data arrays, e.g., HDF5/netCDF. We select several scientific use cases to drive the design of an efficient parallel I/O API for Spark on HPC systems, called H5Spark. We optimize the I/O performance, taking into account Lustre file system striping. We evaluate the performance of H5Spark on Cori, a Cray XC40 system, located at NERSC. The time is now. Unleash your CPU cores with Intel® SSDs The time is now. Unleash your CPU cores with Intel® SSDs Andrey O. Kudryavtsev (Intel Corporation) and Ken Furnanz (Intel) Abstract Andrey Kudryavtsev, HPC Solution Architect for the Intel® Non-Volatile Solutions Group (NSG), will discuss advancements in Intel SSD technology that are unleashing the power of the CPU. He will dive into how Intel® NVMe SSDs can greatly benefit HPC-specific performance with parallel file systems. He will also share the HPC solutions and performance benefits that Intel has already seen with their customers today, and how adoption of the current SSD technology sets the foundation for consumption of Intel's next generation of memory technology, 3D XPoint Intel® SSDs with Intel Optane™ Technology, in the High Performance Computing segment. Introducing a new IO tier for HPC Storage Introducing a new IO tier for HPC Storage James Coomer (DDN Storage) Abstract Tier 1 “performance” storage is becoming increasingly flanked by new, solid-state-based tiers and active archive tiers that improve the economics of both performance and capacity. The available implementations of solid-state tiers into parallel filesystems are typically based on a separate namespace and/or utilise existing filesystem technologies. Given the price/performance characteristics of SSDs today, huge value is gained by addressing both optimal data placement to the SSD tier and the comprehensive construction of this tier to accelerate the broadest spectrum of IO, rather than just small random-read IO. Paper Filesystems & I/O Technical Session 21C Beaumont David Hancock Maintaining Large Software Stacks in a Cray Ecosystem with Gentoo Portage Maintaining Large Software Stacks in a Cray Ecosystem with Gentoo Portage Colin A. MacLean (National Center for Supercomputing Applications/University of Illinois) Abstract Building and maintaining a large collection of software packages from source is difficult without powerful package management tools. This task is made more difficult in an environment where many libraries do not reside in standard paths and where loadable modules can drastically alter the build environment, such as on a Cray system. The need to maintain multiple Python interpreters with a large collection of Python modules is one such case of having a large and complicated software stack and is described in this paper.
To address limitations of current tools, Gentoo Prefix was ported to the Cray XE/XK system, Blue Waters, giving the ability to use the Portage package manager. This infrastructure allows for fine-grained dependency tracking, consistent build environments, multiple Python implementations, and customizable builds. This infrastructure is used to build and maintain over 400 packages for Python support on Blue Waters for use by its partners. Early Application Experiences on Trinity - the Next Generation of Supercomputing for the NNSA Early Application Experiences on Trinity - the Next Generation of Supercomputing for the NNSA Courtenay Vaughan, Dennis Dinge, Paul T. Lin, Kendall H. Pierson, Simon D. Hammond, J. Cook, Christian R. Trott, Anthony M. Agelastos, Douglas M. Pase, Robert E. Benner, Mahesh Rajan, and Robert J. Hoekstra (Sandia National Laboratories) Abstract Trinity, a Cray XC40 supercomputer, will be the flagship capability computing platform for the United States nuclear weapons stockpile stewardship program when the machine enters full production during 2016. In the first phase of the machine, almost 10,000 dual-socket Haswell processor nodes will be deployed, followed by a second phase utilizing Intel's next-generation Knights Landing processor. Executing dynamic heterogeneous workloads on Blue Waters with RADICAL-Pilot Executing dynamic heterogeneous workloads on Blue Waters with RADICAL-Pilot Mark Santcroos (Rutgers University); Ralph Castain (Intel Corporation); Andre Merzky (Rutgers University); Iain Bethune (EPCC, The University of Edinburgh); and Shantenu Jha (Rutgers University) Abstract Traditionally, HPC systems such as Crays have been designed to support mostly monolithic workloads. However, the workload of many important scientific applications is constructed out of spatially and temporally heterogeneous tasks that are often dynamically inter-related. These workloads can benefit from being executed at scale on HPC resources, but a tension exists between the workloads' resource utilization requirements and the capabilities of the HPC system software and usage policies. Pilot systems have the potential to relieve this tension. RADICAL-Pilot is a scalable and portable pilot system that enables the execution of such diverse workloads. In this paper we describe the design and characterize the performance of RADICAL-Pilot's scheduling and executing components on Crays, which are engineered for efficient resource utilization while maintaining the full generality of the Pilot abstraction. We will discuss four different implementations of support for RADICAL-Pilot on Cray systems and analyze and report on their performance. Evaluating Shifter for HPC Applications Evaluating Shifter for HPC Applications Donald M. Bahls (Cray Inc.) Abstract Shifter is a powerful tool that has the potential to expand the availability of HPC applications on Cray XC systems by allowing Docker-based containers to be run with little porting effort. In this paper, we explore the use of Shifter as a means of running HPC applications built for commodity Linux cluster environments on a Cray XC under Shifter. We compare developer productivity, application performance, and application scaling of stock applications compiled for commodity Linux clusters with both Cray XC-tuned Docker images and natively compiled applications not using the Shifter environment. We also discuss pitfalls and issues associated with running non-SLES-based Docker images in the Cray XC environment.
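The side-by-side comparison the Shifter evaluation above describes can be scripted in a few lines. The sketch below times the same benchmark natively and under Shifter from Python; the image name and binary are hypothetical placeholders, and the shifter --image=... invocation follows the documented NERSC/Cray Shifter command line, whose exact flags may differ by site and software version.

    # Sketch: timing the same benchmark natively and inside a Shifter container.
    import subprocess
    import time

    def timed_run(cmd):
        t0 = time.time()
        subprocess.run(cmd, check=True)   # raises if the command fails
        return time.time() - t0

    # Hypothetical binary and container image.
    native_s = timed_run(["./benchmark", "--small"])
    shifter_s = timed_run(["shifter", "--image=docker:example/benchmark:latest",
                           "./benchmark", "--small"])

    print("native:  %.2f s" % native_s)
    print("shifter: %.2f s (overhead %.1f%%)"
          % (shifter_s, 100.0 * (shifter_s - native_s) / native_s))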
Paper Applications & Programming Environments 5:05pm-5:35pmGeneral Session 22 Minories Suite David Hancock Conference Closing Session Conference Closing Session David Hancock (Indiana University) Abstract Closing Remarks Plenary |