CUG2015 Proceedings | Created 2015-5-13 |
Saturday, April 25th | Sunday, April 26th | Monday, April 27th 8:30am-10:00am Tutorial 1A Magnolia Ballroom Next Generation Cray Management System for XC Systems Next Generation Cray Management System for XC Systems Harold Longley, John Hesterberg, and John Navitsky (Cray Inc.) New major versions of CLE and SMW are being developed that include the next generation Cray Management System (CMS) for Cray XC systems. This next generation of CMS brings more common and easy-to-use system management tools and processes to the Cray XC systems, while at the same time preserving the system reliability and scalability upon which you depend. The next generation CMS includes a new common installation process for SMW and CLE, and more tightly integrates external Cray Development and Login (CDL) nodes as part of the Cray XC system. It includes the Image Management and Provisioning System (IMPS) that provides prescriptive image creation and centralized configuration. Finally, it integrates with the next major Linux distribution version from SUSE, SUSE Linux Enterprise Server 12. The tutorial will first cover an overview of the overall concepts of the next generation CMS, followed by examples of different system management activities. Tutorial Systems Tutorial 1B Willow Cray XC Power Monitoring and Control Cray XC Power Monitoring and Control Steven Martin, David Rush, and Matthew Kappel (Cray Inc.) This tutorial will focus on the setup, usage, and use cases for Cray XC power monitoring and management features. The tutorial will cover power and energy monitoring and control from three perspectives: site and system administrators working from the SMW command line, users who run jobs on the system, and third-party software development partners integrating with Cray’s RUR and CAPMC features. Tutorial Systems Tutorial 1C Sycamore Preparing for a smooth landing: Intel’s Knights Landing and Modern Applications Preparing for a smooth landing: Intel’s Knights Landing and Modern Applications Jason Sewall (Intel Corporation) Knights Landing, the 2nd generation Intel® Xeon Phi™ processor, utilizes many breakthrough technologies to combine breakthroughs in power performance with standard, portable, and familiar programming models. This presentation will provide an overview of new technologies delivered by the Knights Landing microarchitecture. Additionally, Dr. Sewall will provide studies of how applications have been developed using the first generation Intel® Xeon Phi™ coprocessor to be ready for Knights Landing. Tutorial PE & Applications, Systems 10:30am-12:00pm Tutorial 1A continued Magnolia Ballroom Next Generation Cray Management System for XC Systems Next Generation Cray Management System for XC Systems Harold Longley, John Hesterberg, and John Navitsky (Cray Inc.) New major versions of CLE and SMW are being developed that include the next generation Cray Management System (CMS) for Cray XC systems. This next generation of CMS brings more common and easy-to-use system management tools and processes to the Cray XC systems, while at the same time preserving the system reliability and scalability upon which you depend. The next generation CMS includes a new common installation process for SMW and CLE, and more tightly integrates external Cray Development and Login (CDL) nodes as part of the Cray XC system. It includes the Image Management and Provisioning System (IMPS) that provides prescriptive image creation and centralized configuration. 
Finally, it integrates with the next major Linux distribution version from SUSE, SUSE Linux Enterprise Server 12. The tutorial will first cover an overview of the overall concepts of the next generation CMS, followed by examples of different system management activities. Tutorial Systems Tutorial 1B continued Willow Cray XC Power Monitoring and Control Cray XC Power Monitoring and Control Steven Martin, David Rush, and Matthew Kappel (Cray Inc.) This tutorial will focus on the setup, usage, and use cases for Cray XC power monitoring and management features. The tutorial will cover power and energy monitoring and control from three perspectives: site and system administrators working from the SMW command line, users who run jobs on the system, and third-party software development partners integrating with Cray’s RUR and CAPMC features. Tutorial Filesystems & I/O Tutorial 1C continued Sycamore Preparing for a smooth landing: Intel’s Knights Landing and Modern Applications Preparing for a smooth landing: Intel’s Knights Landing and Modern Applications Jason Sewall (Intel Corporation) Knights Landing, the 2nd generation Intel® Xeon Phi™ processor, utilizes many breakthrough technologies to combine breakthroughs in power performance with standard, portable, and familiar programming models. This presentation will provide an overview of new technologies delivered by the Knights Landing microarchitecture. Additionally, Dr. Sewall will provide studies of how applications have been developed using the first generation Intel® Xeon Phi™ coprocessor to be ready for Knights Landing. Tutorial PE & Applications 1:00pm-2:30pm Tutorial 2A Magnolia Ballroom Next Generation Cray Management System for XC Systems Next Generation Cray Management System for XC Systems Harold Longley, John Hesterberg, and John Navitsky (Cray Inc.) New major versions of CLE and SMW are being developed that include the next generation Cray Management System (CMS) for Cray XC systems. This next generation of CMS brings more common and easy-to-use system management tools and processes to the Cray XC systems, while at the same time preserving the system reliability and scalability upon which you depend. The next generation CMS includes a new common installation process for SMW and CLE, and more tightly integrates external Cray Development and Login (CDL) nodes as part of the Cray XC system. It includes the Image Management and Provisioning System (IMPS) that provides prescriptive image creation and centralized configuration. Finally, it integrates with the next major Linux distribution version from SUSE, SUSE Linux Enterprise Server 12. The tutorial will first cover an overview of the overall concepts of the next generation CMS, followed by examples of different system management activities. Tutorial Systems Tutorial 2B Willow Job-Level Tracking with XALT: A Tutorial for System Administrators and Data Analysts Job-Level Tracking with XALT: A Tutorial for System Administrators and Data Analysts Mark Fahey (Argonne National Laboratory), Robert McLay (Texas Advanced Computing Center), and Reuben Budiardja (National Institute for Computational Sciences) Let’s talk real, no-kiddin’ supercomputer analytics, aimed at moving beyond monitoring the machine as a whole or even its individual hardware components. We’re interested in drilling down to the level of individual batch submissions, users, and binaries. 
And we’re not just targeting performance: we’re after ready answers to the "what, where, how, when and why" that stakeholders are clamoring for – everything from which libraries (or individual functions!) are in demand to preventing the problems that get in the way of successful research. This tutorial will show how to install and set up the XALT tool that can provide this type of job-level insight. The XALT tool can provide a wide range of metrics and measures of job-level activity. There are benefits to stakeholders beyond just end users: sponsoring institutions interested in strategic priorities and measurable impact; support organizations and development teams concerned about meeting users’ needs and expectations; and those seeking to study user activity to improve value and effectiveness. We will show how to install and configure XALT for a variety of machines and usage modes. The proposers have experience installing and running their tool on a variety of machines in production, from Crays to SGIs to clusters, each with different compilers, job launchers, and batch systems. We will also show how this tool provides high value to centers and their users, as it can provide documentation on how an application was built and when it was run, well beyond what is usually tracked by a center. Tutorial PE & Applications, Systems Tutorial 2C Sycamore Debugging, Profiling and Tuning Applications on Cray CS and XC Systems Debugging, Profiling and Tuning Applications on Cray CS and XC Systems Beau Paisley (Allinea Software) The debugger Allinea DDT and profiler Allinea MAP are widely available to users of Cray systems - this tutorial, aimed at scientists and developers who are involved in writing or maintaining code, will introduce debugging and profiling using the tools. No prior experience with the tools is required. Attendees can use the tutorial to apply the tools to their own bugs or performance problems, or to educate their colleagues on the use of the tools. We will show how to get started on Cray systems - using the tool suite, Allinea Forge, which combines DDT and MAP in a single interface. We start by debugging simple MPI problems - exploring tactics for application crashes and unexpected behaviour, and move through topics such as debugging beyond-bound array accesses or memory leaks, into extreme scale debugging. We present tips that will help when working in the most challenging environments. Moving on to performance, we show how to prepare applications for profiling and then how to interpret performance in Allinea MAP. The examples will show common performance issues and how to narrow down the cause and remove bottlenecks - including I/O, MPI communication, processor extensions, memory performance and OpenMP. Key outcomes: being able to debug, profile and tune code running on Cray systems. Tutorial PE & Applications, Systems 3:00pm-4:30pm Tutorial 2A continued Magnolia Ballroom Next Generation Cray Management System for XC Systems Next Generation Cray Management System for XC Systems Harold Longley, John Hesterberg, and John Navitsky (Cray Inc.) New major versions of CLE and SMW are being developed that include the next generation Cray Management System (CMS) for Cray XC systems. This next generation of CMS brings more common and easy-to-use system management tools and processes to the Cray XC systems, while at the same time preserving the system reliability and scalability upon which you depend. 
The next generation CMS includes a new common installation process for SMW and CLE, and more tightly integrates external Cray Development and Login (CDL) nodes as part of the Cray XC system. It includes the Image Management and Provisioning System (IMPS) that provides prescriptive image creation and centralized configuration. Finally, it integrates with the next major Linux distribution version from SUSE, SUSE Linux Enterprise Server 12. The tutorial will first cover an overview of the overall concepts of the next generation CMS, followed by examples of different system management activities. Tutorial Systems Tutorial 2B continued Willow Job-Level Tracking with XALT: A Tutorial for System Administrators and Data Analysts Job-Level Tracking with XALT: A Tutorial for System Administrators and Data Analysts Mark Fahey (Argonne National Laboratory), Robert McLay (Texas Advanced Computing Center), and Reuben Budiardja (National Institute for Computational Sciences) Let’s talk real, no-kiddin’ supercomputer analytics, aimed at moving beyond monitoring the machine as a whole or even its individual hardware components. We’re interested in drilling down to the level of individual batch submissions, users, and binaries. And we’re not just targeting performance: we’re after ready answers to the "what, where, how, when and why" that stakeholders are clamoring for – everything from which libraries (or individual functions!) are in demand to preventing the problems that get in the way of successful research. This tutorial will show how to install and set up the XALT tool that can provide this type of job-level insight. The XALT tool can provide a wide range of metrics and measures of job-level activity. There are benefits to stakeholders beyond just end users: sponsoring institutions interested in strategic priorities and measurable impact; support organizations and development teams concerned about meeting users’ needs and expectations; and those seeking to study user activity to improve value and effectiveness. We will show how to install and configure XALT for a variety of machines and usage modes. The proposers have experience installing and running their tool on a variety of machines in production, from Crays to SGIs to clusters, each with different compilers, job launchers, and batch systems. We will also show how this tool provides high value to centers and their users, as it can provide documentation on how an application was built and when it was run, well beyond what is usually tracked by a center. Tutorial PE & Applications, Systems Tutorial 2C continued Sycamore Debugging, Profiling and Tuning Applications on Cray CS and XC Systems Debugging, Profiling and Tuning Applications on Cray CS and XC Systems Beau Paisley (Allinea Software) The debugger Allinea DDT and profiler Allinea MAP are widely available to users of Cray systems - this tutorial, aimed at scientists and developers who are involved in writing or maintaining code, will introduce debugging and profiling using the tools. No prior experience with the tools is required. Attendees can use the tutorial to apply the tools to their own bugs or performance problems, or to educate their colleagues on the use of the tools. We will show how to get started on Cray systems - using the tool suite, Allinea Forge, which combines DDT and MAP in a single interface. 
We start by debugging simple MPI problems - exploring tactics for application crashes and unexpected behaviour, and move through topics such as debugging beyond-bound array accesses or memory leaks, into extreme scale debugging. We present tips that will help when working in the most challenging environments. Moving on to performance, we show how to prepare applications for profiling and then how to interpret performance in Allinea MAP. The examples will show common performance issues and how to narrow down the cause and remove bottlenecks - including I/O, MPI communication, processor extensions, memory performance and OpenMP. Key outcomes: being able to debug, profile and tune code running on Cray systems. Tutorial PE & Applications 4:45pm-6:00pm Interactive 3A Magnolia Ballroom Jason Hill Systems Support SIG Meeting Systems Support SIG Meeting Jason Hill (Oak Ridge National Laboratory) Special Interest Group Meeting. Birds of a Feather Systems Interactive 3B Willow Timothy W. Robinson Programming Environments, Applications and Documentation SIG Meeting Programming Environments, Applications and Documentation SIG Meeting Tim Robinson (Swiss National Supercomputing Centre) Special Interest Group Meeting. Birds of a Feather PE & Applications Interactive 3C Sycamore Birds of a Feather | Tuesday, April 28th 7:30am-8:15am Interactive 4A Magnolia Ballroom Ashley Barker System Testing and Resiliency in HPC System Testing and Resiliency in HPC Ashley D. Barker (Oak Ridge National Laboratory) As supercomputing system offerings from Cray become increasingly larger, more heterogeneous, and more tightly integrated with storage and data analytics, the verification of hardware and software components becomes an ever more difficult and important aspect of system management. Whether HPC resources are dedicated to local users or are shared with an international user community, regression testing is necessary to ensure that centers are providing usable and trustworthy resources for scientific discovery. However, unlike regression testing in software development projects, where there exists a range of well-established continuous integration tools, regression testing in HPC production environments is typically carried out in a more ad hoc fashion, using custom scripts or tools developed independently by individual HPC centers and with little or no collaboration between centers. The aim of this session is to bring together those with experience and interest in regression testing theory and practice, with the goal of fostering collaboration and coordination across CUG member sites. We will assess the state of the art in regression testing at member sites and determine the needs of the community moving forward. We will discuss the testing of components in terms of both functionality and performance, including best practices for operating system, driver and programming environment updates. The session will provide an open forum to share ideas and concerns in order to produce a more concerted effort towards the treatment of system testing and resilience across HPC centers. The session will be used to kick off a cross-site working group dedicated to sharing ideas and frameworks. 
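To make the regression-testing discussion above concrete, here is a minimal sketch of the kind of performance check such a cross-site harness might run; the baseline file format, benchmark names, and 10% tolerance are hypothetical, not any member site's actual framework.

// perf_regress.cpp - toy performance-regression check (illustrative only).
// Compares a measured benchmark time against a stored baseline and flags
// regressions larger than a relative tolerance.  Build: g++ -std=c++11 perf_regress.cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <string>

int main(int argc, char** argv) {
    if (argc != 4) {
        std::cerr << "usage: perf_regress <baseline_file> <benchmark> <seconds>\n";
        return 2;
    }
    const double tolerance = 0.10;            // allow a 10% slowdown before failing
    std::map<std::string, double> baseline;   // benchmark name -> reference seconds
    std::ifstream in(argv[1]);                // lines of "name seconds" (hypothetical format)
    std::string name;
    double secs;
    while (in >> name >> secs) baseline[name] = secs;

    const auto it = baseline.find(argv[2]);
    if (it == baseline.end()) {
        std::cerr << "no baseline recorded for " << argv[2] << "\n";
        return 2;
    }
    const double measured = std::atof(argv[3]);
    const double slowdown = (measured - it->second) / it->second;
    std::cout << argv[2] << ": baseline " << it->second << "s, measured " << measured
              << "s, slowdown " << slowdown * 100 << "%\n";
    return slowdown > tolerance ? 1 : 0;      // nonzero exit code marks a regression
}

In practice a harness wraps many such checks, covering both functionality and performance, and is rerun after operating system, driver, or programming environment updates.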
Birds of a Feather PE & Applications, Systems Interactive 4B Willow Birds of a Feather Interactive 4C Sycamore Birds of a Feather 8:30am-10:00am General Session 5 Magnolia Ballroom David Hancock CUG Welcome CUG Welcome David Hancock (Indiana University) Welcome from the CUG President Supercomputing in an Era of Big Data and Big Collaboration Supercomputing in an Era of Big Data and Big Collaboration Edward Seidel (National Center for Supercomputing Applications) Supercomputing has reached a level of maturity and capability where many areas of science and engineering are not only advancing rapidly due to computing power, they cannot progress without it. Detailed simulations of complex astrophysical phenomena, HIV, earthquake events, and industrial engineering processes are being done, leading to major scientific breakthroughs or new products that cannot be achieved any other way. These simulations typically require larger and larger teams, with more and more complex software environments to support them, as well as real world data. But as experiments and observation systems are now generating unprecedented amounts of data, which also must be analyzed via large-scale computation and compared with simulation, a new type of highly integrated environment must be developed where computing, experiment, and data services will need to be developed together. I will illustrate examples from NCSA's Blue Waters supercomputer, and from major data-intensive projects including the Large Synoptic Survey Telescope, and give thoughts on what will be needed going forward. Invited Talk Plenary 10:30am-12:00pm General Session 6 Magnolia Ballroom Nicholas Cardo Cray Corporate Update Cray Corporate Update Peter Ungaro (Cray Inc.) Cray Corporate Update Invited Talk Plenary 1:00pm-2:30pm Technical Session 7A Magnolia Ballroom Matthew A. Ezell Innovations for The Cray Innovations for The Cray David Beer and Gary Brown (Adaptive Computing) Moab and Torque have been efficiently managing the workload on Cray supercomputers for
years. Aside from the policy-rich scheduling Moab provides, several new advancements have
been made and are being developed specifically for Cray supercomputers. Recent releases include improvements to scheduling jobs at scale, additional power management controls, job-based energy accounting, and the ability to place jobs according to the topology of a 3D Torus network. Future releases will include NUMA-aware job task scheduling and placement, an administrator portal, and full integration with Data Warp technology. This session will discuss the challenges that have motivated these developments, the impacts that the released features have had in production, and the expected impacts of the features which are to be released. Slurm Road Map 15.08 Slurm Road Map 15.08 Jacob Jenson (SchedMD LLC) Slurm is an open source workload manager used on six of the world's top
10 most powerful computers and provides a rich set of features including
topology aware optimized resource allocation, the ability to expand and
shrink jobs on demand, failure management support for applications,
hierarchical bank accounts with fair-share job prioritization, job
profiling, and a multitude of plugins for easy customization. This presentation will provide a road map for Slurm 15.08. Topics will include: data warp (burst buffers), power management, improved job array scalability, support for different generic resource types, new API statistics from sdiag, database performance enhancements, expanded charging options and support for PMI Exascale (PMIx). Driving More Efficient Workload Management on Cray Systems with PBS Professional Driving More Efficient Workload Management on Cray Systems with PBS Professional Scott Suchyta (Altair Engineering, Inc.) The year 2014 brought an increase in adoption of key HPC technologies, from data analytics solutions to power-efficient scheduling. The HPC user landscape is changing, and it is now critical for workload management vendors to provide not only foundational scheduling functionality but also the adjacent capabilities that truly optimize system performance. In this presentation, Altair will provide a look at key advances in PBS Professional for improved performance on Cray systems. Topics include new Cray-specific features like Suspend/Resume, Xeon Phi support, HyperThreading, Power-aware Scheduling, and Exclusive/Non-exclusive ALPS reservations. The presentation will also preview the upcoming capabilities of cgroups and DataWarp integration. Paper Systems Technical Session 7B Willow Sharif Islam How Distributed Namespace Boosts Lustre Metadata Performance How Distributed Namespace Boosts Lustre Metadata Performance Andreas Dilger (Intel Corporation) The Lustre Distributed Namespace Environment (DNE) feature
allows Lustre metadata performance to scale upward with the addition of
metadata servers to a single file system. Under development by Intel and others for several years, DNE functionality is a vital part of the latest production
releases of Lustre software. During this technical session you’ll learn
how DNE works today, get an update on continued improvements, and see how DNE allows
Lustre metadata performance to scale to meet the demands of applications
having many thousands of threads. Toward Understanding Life-Long Performance of a Sonexion File System Toward Understanding Life-Long Performance of a Sonexion File System Mark S. Swan and Doug Petesch (Cray Inc.) Many of Cray’s customers will be using their systems for several years to come. The one resource that is most affected by long-term use is storage. Files, both big and small, both striped and unstriped, are continually created and deleted, leaving behind free space of different sizes and in different places on the spinning media. This paper will explore the effects of continual reuse of a Sonexion file system and a method of tuning the allocation parameters of the OSTs to minimize these effects. Paper Filesystems & I/O Technical Session 7C Sycamore Frank M. Indiviglio Porting the Urika-GD Graph Analytic Database to the XC30/40 Platform Porting the Urika-GD Graph Analytic Database to the XC30/40 Platform Kristyn J. Maschhoff, Rob Vesse, and James D. Maltby (Cray Inc.) The Urika-GD appliance is a state-of-the-art graph analytics database that provides high performance on complex SPARQL queries. This performance is due to a combination of custom multithreaded processors, a shared memory programming model and a unique network. We will present our work on porting the database and graph algorithms to the XC30/40 platform. Co-array C++ was used to provide a PGAS environment with a global address space, a Cray-developed soft threading library was used to provide additional concurrency for remote memory accesses, and the Aries network provides RDMA and synchronization features. We describe the changes necessary to refactor these algorithms and data structures for the new platform. Having this type of analytics database available on a general-purpose HPC platform enables new use cases, and several will be discussed. Finally, we will compare the performance of the new XC30/40 implementation to the Urika-GD appliance. Implementing a social-network analytics pipeline using Spark on Urika XA Implementing a social-network analytics pipeline using Spark on Urika XA Michael Hinchey (Cray Inc.) We intend to discuss and demonstrate the use of new generation analytic techniques to find communities of users that discuss certain topics (consumer electronics, sports) and identify key users that play a role in or between those communities (originators, rebroadcasters, connectors). The analytics execution is performed on a Cray Urika XA, a cluster of 48 nodes with 4T of RAM, 38T of SSD, and Lustre storage. The software framework used is Apache Spark and HDFS with the Java programming language. The Spark framework is similar to Hadoop/MapReduce, written in a functional style, allowing the engine to make efficient use of the full cluster, lazily, in parallel, with failure recovery, but without the user having to code for such complexity. The entire pipeline includes ETL, numerous aggregations and joins, and a graph algorithm. Spark-Streaming is used for complex event processing on real-time data, to identify patterns and changing trends. A Graph Mining "App-Store" for Urika-GD A Graph Mining "App-Store" for Urika-GD Sreenivas R. Sukumar, Sangkeun Lee, and Tyler C. Brown (Oak Ridge National Laboratory); Seokyong Hong (North Carolina State University); and Larry W. Roberts, Keela Ainsworth, and Seung-Hwan Lim (Oak Ridge National Laboratory) Researchers at Oak Ridge National Lab have created a suite of tools called EAGLE that will be made available for users of the Urika-GD installation. 
EAGLE is the acronym for “EAGLE 'Is A' Algorithmic Graph Library for Exploratory-Analysis” and includes an emulator environment for code development and testing, graph conversion and creation from heterogeneous data sources, and interactive visualization, along with implementations of traditional graph mining algorithms. We will present benchmark results of EAGLE on real world datasets across 5 seminal graph-theoretic algorithms (Degree distribution, PageRank, connected component analysis, node eccentricity, and triangle count). We compare EAGLE on Urika-GD with graph-mining on other architectures (e.g. distributed-memory GraphX, distributed-storage Pegasus) and programming models (Map-reduce, Pregel, SQL). We will conclude by demonstrating how EAGLE is serving as the building block of knowledge discovery using semantic reasoning and its application to biology, medicine and national security. Paper PE & Applications 3:00pm-5:00pm Technical Session 8A Magnolia Ballroom Tina Butler Realtime process monitoring on the Cray XC30 Realtime process monitoring on the Cray XC30 Douglas M. Jacobsen and Shane Canon (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Jay Srinivasan (National Energy Research Scientific Computing Center/Lawrence Berkeley National Laboratory) Increasingly complex workflows of data-intensive calculations are extremely challenging to characterize. In preparation for the increasing prevalence of this new style of workload, we describe a recent effort to implement the “procmon” system on the Cray XC30 system. The procmon system was developed to characterize data-intensive workflows of the mid-range clusters at NERSC, enabling efficient whole-system monitoring of all running processes on the system with live real-time analysis of the data. procmon's resource-conscious implementation results in a scalable monitoring system that is minimally disruptive to both user and system processes, thereby providing useful monitoring opportunities on the large-scale Cray systems deployed at NERSC. Use of AMQP messaging enables flexible and fault-tolerant delivery of messages, while HDF5 storage of data allows efficient analysis using standardized tools. This approach results in an open monitoring system that provides users and operators detailed, realtime feedback about the state of the system. Cray DataWarp: Administration & SLURM integration Cray DataWarp: Administration & SLURM integration Tina M. Declerck and Iwona Sakrejda (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) and Dave Henseler (Cray Inc.) The National Energy Research Scientific Computing Center (NERSC) is one of the Department of Energy’s (DOE) primary centers for the high performance computing needed for research. One area in which large compute centers have worked to find a solution is moving data efficiently to and from compute nodes. Cray is addressing this with its Data Warp technology. As new technologies are being developed and used, new tools are needed to address administration and troubleshooting. NERSC is collaborating with Cray to develop the capabilities needed by NERSC to provide functionality for our user base. In addition, integration into the workload manager will be needed to allow access for jobs. To address this, NERSC is collaborating with SchedMD to implement the key features needed to integrate Cray’s Data Warp solution into SLURM. 
This paper will concentrate on the administrative interface and the integration with SLURM. Bright Cluster Manager - Managing your cluster for HPC, Hadoop and OpenStack Bright Cluster Manager - Managing your cluster for HPC, Hadoop and OpenStack Craig Hunneyman (Bright Computing) Bright Cluster Manager has been provisioning, monitoring and managing HPC clusters for over a decade. Last year, an add-on for Hadoop clusters became generally available. From bare-metal servers to the application stack, a single instance of the Bright GUI or command-line interface delivers a converged administration experience for HPC, Big Data Analytics, and Private and Public Clouds. In practice, this means HPC admins can rapidly introduce Hadoop or OpenStack clusters alongside their existing HPC environments. And with Bright, HPC admins do not need to be experts on emerging technologies (e.g., Apache Hadoop including HDFS and YARN, Apache Spark and OpenStack) to deploy and maintain environments for pilot or production purposes. An introduction to Bright Cluster Manager for HPC, Hadoop, and OpenStack will be presented. A use case will be shared to illustrate how Bright Computing has used Bright OpenStack to create our own private cloud for developing and testing our own software product. Our private cloud is then used for remote demonstrations to show the power of Bright Cluster Manager. Customers have begun using Bright to accelerate the creation of Hadoop clusters for Big Data Analytics and OpenStack clusters for temporary and permanent private clouds alongside their existing HPC environments. Additional features have been added to Bright specifically for the Cray Management Nodes and will be listed for those who are interested. A live demo will show how easy it is for Bright Cluster Manager to monitor a site's custom metrics, such as FLEXlm licenses, and display graphical data with a simple click of the mouse. Cray System Snapshot Analyzer (SSA) Cray System Snapshot Analyzer (SSA) Richard J. Duckworth (Cray Inc.) The Cray System Snapshot Analyzer (SSA) represents a new support technology offering. SSA is a managed technology program designed to collect and analyze key customer system information. With SSA, we are targeting three areas of improvement. These areas are (1) reducing turn-around time for the collection of data in response to customer inquiries and issues, (2) improving detection of and resolution time for customer system issues, and (3) improving Cray’s knowledge of the product configurations in the field throughout their life-cycle. In this paper, we will first provide an overview of SSA. Next, we will discuss anticipated benefits to Cray and, most importantly, to our customers. We will then discuss the architecture, including measures to ensure transparency in the operation of SSA and its security features. Finally, we will discuss the anticipated release and feature schedules for SSA. Paper Systems Technical Session 8B Willow Jason Hill A Storm (Lake) is Coming to Fast Fabrics: The Next-Generation Intel® Omni-Path Architecture A Storm (Lake) is Coming to Fast Fabrics: The Next-Generation Intel® Omni-Path Architecture Barry Davis (Intel Corporation) The Intel® Omni-Path Architecture, Intel’s next-generation
fabric product line, is designed around industry-leading technologies
developed as a result of Intel’s multi-year fabric development program.
The Intel Omni-Path Architecture will deliver new levels of performance,
resiliency, and scalability, overcoming InfiniBand limitations and paving
the path to Exascale. Learn how the Intel Omni-Path Architecture will
deliver significant enhancements and optimization for HPC at both the
host and fabric levels, providing huge benefits to HPC applications over
standard InfiniBand-based designs. The Accelerated Road to Exascale The Accelerated Road to Exascale Dale Southard (NVIDIA) Why Moore's law started letting us down; what that means for accelerators and new ISAs; how GPU accelerators will maintain exponential gains in efficiency all the way to exascale. Data Transfer Study for HPSS Archiving Data Transfer Study for HPSS Archiving James R. Wynne, Suzanne T. Parete-Koon, Quinn D. Mitchell, Stanley White, and Tom Barron (Oak Ridge National Laboratory) The movement of large data produced by codes run in a High Performance Computing (HPC) environment can be a bottleneck for project workflows. To balance filesystem capacity and performance requirements, HPC centers enforce data management policies to purge old files to make room for new user project data. Users at the Oak Ridge Leadership Computing Facility (OLCF) and other HPC user facilities must archive data to avoid the purge; therefore, the time associated with data movement is something that all users must consider. This study observed the difference in transfer speed from the Lustre filesystem to the High Performance Storage System (HPSS) using a number of different transfer agents. The study tested files that spanned a variety of sizes and compositions that reflect OLCF user data. This will be used to help users of Titan plan their workflow and archival data transfers to increase their project's efficiency. Applying Advanced IO Architectures to Improve Efficiency in Single and Multi-Cluster Environments Applying Advanced IO Architectures to Improve Efficiency in Single and Multi-Cluster Environments Mike Vildibill (DataDirect Networks) For 15 years DDN has been working with the majority of leading supercomputing facilities, pushing the limits of storage IO to improve the productivity of the world's largest systems. Storage technology advancements toward Exascale have not progressed as quickly as computing technology. The gap cannot be bridged by improving today's technologies - drive interface speeds are not increasing fast enough, parallel file systems need optimization to accomplish Exascale concurrency, and scientists will always want to model more data than is financially reasonable to hold in memory. Discontinuous innovation is called for. In this talk, DDN will present the background collaboration, key technologies, and community work that has led to the development of an entirely new class of IO caching and new approaches to optimize file systems in order to divorce storage performance from storage capacity and allow for Exascale IO to transpire even on storage systems in use today. Paper Filesystems & I/O, Systems Technical Session 8C Sycamore Gregory Bauer Sorting at Scale on BlueWaters in a Cosmological Simulation Sorting at Scale on BlueWaters in a Cosmological Simulation Yu Feng (University of California, Berkeley); Mark Straka (National Center for Supercomputing Applications); and Tiziana Di Matteo and Rupert Croft (Carnegie Mellon University) We implement and investigate a parallel sorting algorithm (MP-sort) on Blue Waters. MP-sort sorts distributed array items with non-unique integer keys into a new distributed array. The sorting algorithm belongs to the family of partition sorting algorithms: the target storage space of a parallel computing unit is represented by a histogram bin whose edges are determined by partitioning the input keys, requiring exactly one global shuffling of the input data. 
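As a rough illustration of the single-shuffle partition sort described above, the sketch below buckets keys into fixed-width bins, one per rank, and exchanges them with a single MPI_Alltoallv; MP-sort instead derives its bin edges from the input keys, so this is a toy under simplifying assumptions, not the authors' implementation.

// bucket_sort.cpp - toy one-shuffle parallel sort (illustrative; not MP-sort itself).
// Each rank buckets its keys into fixed-width bins (one bin per rank), performs a
// single MPI_Alltoallv exchange, then sorts locally.
#include <mpi.h>
#include <algorithm>
#include <cstdlib>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int n = 1 << 16;                    // local keys per rank
    const int key_max = 1 << 20;              // keys drawn from [0, key_max)
    std::srand(1234 + rank);
    std::vector<int> keys(n);
    for (int& k : keys) k = std::rand() % key_max;

    // Bucket keys: rank r receives the range [r*width, (r+1)*width).  MP-sort
    // derives bin edges by partitioning the actual keys, which balances skewed data.
    const int width = (key_max + nranks - 1) / nranks;
    std::vector<std::vector<int>> buckets(nranks);
    for (int k : keys) buckets[std::min(k / width, nranks - 1)].push_back(k);

    std::vector<int> sendcounts(nranks), senddispls(nranks), sendbuf;
    for (int r = 0; r < nranks; ++r) {
        senddispls[r] = static_cast<int>(sendbuf.size());
        sendcounts[r] = static_cast<int>(buckets[r].size());
        sendbuf.insert(sendbuf.end(), buckets[r].begin(), buckets[r].end());
    }

    std::vector<int> recvcounts(nranks), recvdispls(nranks);
    MPI_Alltoall(sendcounts.data(), 1, MPI_INT, recvcounts.data(), 1, MPI_INT, MPI_COMM_WORLD);
    int total = 0;
    for (int r = 0; r < nranks; ++r) { recvdispls[r] = total; total += recvcounts[r]; }

    // The single global shuffle: every key crosses the network at most once.
    std::vector<int> recvbuf(total);
    MPI_Alltoallv(sendbuf.data(), sendcounts.data(), senddispls.data(), MPI_INT,
                  recvbuf.data(), recvcounts.data(), recvdispls.data(), MPI_INT, MPI_COMM_WORLD);

    std::sort(recvbuf.begin(), recvbuf.end());  // locally sorted; bins are ordered across ranks
    MPI_Finalize();
    return 0;
}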
The algorithm is used in a cosmology simulation (BlueTides) that utilizes 90% of the computing nodes of Blue Waters, the Cray XE6 supercomputer at the National Center for Supercomputing Applications. MP-sort is optimal in communication: any array item is exchanged over the network at most once. We analyze a series of tests on Blue Waters with up to 160,000 MPI ranks. At scale, the single global shuffling of items takes up to 90% of total sorting time, and overhead time added by other steps becomes negligible. MP-sort demonstrates expected performance on Blue Waters and served its purpose in the BlueTides simulation. We make the source code of MP-sort freely available to the public. Parallel Software usage on UK National HPC Facilities 2009-2015: How well have applications kept up with increasingly parallel hardware? Parallel Software usage on UK National HPC Facilities 2009-2015: How well have applications kept up with increasingly parallel hardware? Andrew Turner (EPCC, The University of Edinburgh) One of the largest challenges facing the HPC user community in moving from terascale, through petascale, towards exascale HPC is the ability of parallel software to meet the scaling demands placed on it by modern HPC architectures. In this paper we analyse the usage of parallel software across two UK national HPC facilities, HECToR and ARCHER, to understand how well applications have kept pace with hardware advances. These systems have spanned the rise of multicore architectures: from 2 to 24 cores per compute node. We analyse and comment on: trends in usage over time; trends in parallel programming models; trends in the calculation size; and changes in research areas on the systems. The in-house Python tool that is used to collect and analyse the application usage statistics is also described. We conclude by using this analysis to look forward to how particular parallel applications may fare on future HPC systems. Use of Continuous Integration Tools for Application Performance Monitoring Use of Continuous Integration Tools for Application Performance Monitoring Veronica G. Vergara Larrea, Wayne Joubert, and Chris Fuson (Oak Ridge National Laboratory) High performance computing systems are becoming increasingly complex, both in node architecture and in the multiple layers of software stack required to compile and run applications. As a consequence, the likelihood is increasing for application performance regressions to occur as a result of routine upgrades of system software components which interact in complex ways. The purpose of this study is to evaluate the effectiveness of continuous integration tools for application performance monitoring on HPC systems. In addition, this paper also describes a prototype system for application performance monitoring based on Jenkins, a Java-based continuous integration tool. The monitoring system described leverages several features in Jenkins to track application performance results over time. Preliminary results and lessons learned from monitoring applications on Cray systems at the Oak Ridge Leadership Computing Facility are presented. Paper PE & Applications 5:15pm-6:15pm Interactive 9A Magnolia Ballroom Michael Showerman Systems monitoring of Cray systems Systems monitoring of Cray systems Mike Showerman (National Center for Supercomputing Applications/University of Illinois) This session is intended to present some of the challenges and solutions to monitoring Cray systems. 
The range of topics includes data collection methods, application impact analysis for large-scale systems, data storage strategies, and visualization.
While solutions for monitoring compute node statistics are beginning to mature, there remain many challenges in integrating data across subsystems to produce the insights necessary for effective administration. We will seek to gather current best practices as well as approaches to produce cross-cutting data that incorporates job, system, and filesystem information to maximize the end-to-end performance of Cray systems. The session will include a few short presentations from sites and move to an open discussion format. The product will be a summary of the findings and a report made available to CUG sites. Birds of a Feather Systems Interactive 9B Willow Suzanne T. Parete-Koon Getting the Most Out of HPC User Groups Getting the Most Out of HPC User Groups Suzanne Parete-Koon (Oak Ridge National Laboratory) User groups can provide HPC user facilities with valuable feedback about current and future center resources, services, and policies. User groups serve as a hub that regularly allows users and HPC facility staff to connect and identify user needs for training, software, and hardware. They also provide a forum where facility staff and vendors, such as Cray, can make users aware of beneficial new resources and services. User groups can be formally organized with a charter, elections, and regular meetings, or informally organized by a simple mailing list. The main function is regular, effective communication between users and between users and HPC facility staff and vendors. This BoF will draw on the experiences of the Oak Ridge Leadership Computing Facility, the National Energy Research Scientific Computing Center, and the National Center for Supercomputing Applications in organizing and developing an effective and ongoing dialog with and between their users. We will discuss our best practices for organization and communication with short presentations from each of our centers and then invite the participants to share their experiences. The outcome of our discussion will be documented in an HPC User Group Best Practices guide. Birds of a Feather PE & Applications, Systems Interactive 9C Sycamore Duncan J. Poole Experiences with OpenACC Experiences with OpenACC Duncan Poole (NVIDIA) and Fernanda Foertter (Oak Ridge National Laboratory) The OpenACC API has been earning praise for leadership in directive-based programming models that accelerate code in a performance-portable manner. This BoF will discuss recent developer experiences with the latest OpenACC compilers available from Cray and PGI. Several teams composed of developers, compiler vendors, and other OpenACC supporters were brought together just prior to CUG in a week-long effort to make significant progress porting their code to use accelerators. Attendees of this BoF will get an opportunity to understand what obstacles were faced, how they were overcome, and what results could be achieved in short order with good support. Attendees will also come away with knowledge about the strengths and weaknesses of the approaches they took, of the current implementations, and what to expect in the future. 
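For readers new to the directive style this BoF covers, the sketch below shows a minimal OpenACC loop; it is illustrative only, not code from the hackathon teams, and compiler invocations vary (for example pgc++ -acc with the PGI compiler).

// saxpy_acc.cpp - minimal OpenACC example (illustrative only).
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const float a = 3.0f;
    float* xp = x.data();
    float* yp = y.data();

    // The directive asks the compiler to copy x to the accelerator, copy y both
    // ways, and run the loop there in parallel; when OpenACC is disabled the
    // same code still compiles and runs serially on the host.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];

    std::printf("y[0] = %f\n", yp[0]);   // expect 5.0
    return 0;
}

The appeal discussed in the BoF is that the directive is a hint layered on standard code, so the same source can target accelerators or plain CPUs in a performance-portable way.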
Birds of a Feather PE & Applications | Wednesday, April 29th 8:30am-10:00am Technical Sessions 10A Magnolia Ballroom Don Maxwell Experience with GPUs on the Titan Supercomputer from a Reliability, Performance and Power Perspective Experience with GPUs on the Titan Supercomputer from a Reliability, Performance and Power Perspective Devesh Tiwari, Saurabh Gupta, Jim Rogers, and Don Maxwell (Oak Ridge National Laboratory) The Titan supercomputer, currently the world's second-fastest supercomputer for open science, has more than 18,000 GPUs that domain scientists routinely use to perform scientific simulations. While GPUs have been shown to be performance-efficient, their reliability, utilization and energy-efficiency at scale have not been fully understood. In this work, we present a detailed study of GPU reliability characteristics, performance and energy efficiency. We share our experience with the 18,688 GPUs on the Titan supercomputer and our findings from operating these GPUs efficiently. We hope that our experience will be beneficial to other GPU-enabled HPC sites as well. Detecting and Managing GPU Failures Detecting and Managing GPU Failures Nicholas P. Cardo (Swiss National Supercomputing Centre) GPUs have been found to have a variety of failure modes. The easiest to detect and correct is a clear hardware failure of the device. However, there are a number of not-so-obvious failures that can be more difficult to detect. With the objective of providing a stable and reliable GPU computing platform, it is imperative to identify issues with the GPUs and remove them from service. At the Swiss National Supercomputing Centre (CSCS), a significant amount of effort has been invested in the detection and isolation of suspect GPUs. Techniques have been developed to identify suspect GPUs and automated testing has been put into practice, resulting in a more stable and reliable GPU computing platform. This paper will discuss these GPU failures and the techniques used to identify suspect nodes. Enabling Advanced Operational Analysis Through Multi-Subsystem Data Integration on Trinity Enabling Advanced Operational Analysis Through Multi-Subsystem Data Integration on Trinity Jim Brandt, David DeBonis, and Ann Gentile (Sandia National Laboratories); Jim Lujan and Cindy Martin (Los Alamos National Laboratory); Dave Martinez, Stephen Olivier, and Kevin Pedretti (Sandia National Laboratories); Narate Taerat (Open Grid Computing); and Ron Velarde (Los Alamos National Laboratory) Operations management of the New Mexico Alliance for Computing at Extreme Scale (ACES) (a collaboration between Los Alamos National Laboratory and Sandia National Laboratories) Trinity platform will rely on data from a variety of sources including System Environment Data Collections (SEDC); node level information, such as high speed network (HSN) performance counters and high
fidelity energy measurements; scheduler/resource manager; and plant environmental facilities. The water-cooled Cray XC platform requires a cohesive way to manage both the facility infrastructure and the platform due to several critical dependencies. We present preliminary results from analysis
of integrated data on the Trinity Application Readiness Testbed (ART) systems as they pertain to enabling advanced operational analysis through the understanding of operational
behaviors, relationships, and outliers. Paper Systems Technical Sessions 10B Willow Sharif Islam Tuning Parallel I/O on Blue Waters for Writing 10 Trillion Particles Tuning Parallel I/O on Blue Waters for Writing 10 Trillion Particles Suren Byna (Lawrence Berkeley Laboratory), Robert Sisneros and Kalyana Chadalavada (National Center for Supercomputing Applications), and Quincey Koziol (The HDF Group) Large-scale simulations running on hundreds of thousands of processors produce hundreds of terabytes of data that need to be written to files for analysis. One such application is VPIC code that simulates plasma behavior such as magnetic reconnection and turbulence in solar weather. The number of particles VPIC simulates is in the range of trillions and the size of data files to store is in the range of hundreds of terabytes. To test and optimize parallel I/O performance at this scale on Blue Waters, we used the I/O kernel extracted from a VPIC magnetic reconnection simulation. Blue Waters is a supercomputer at National Center for Supercomputing Applications (NCSA) that contains Cray XE6 and XK7 nodes with Lustre parallel file systems. In this paper, we will present optimizations used in tuning the VPIC-IO kernel to write a 5TB file with 5120 MPI processes and a 290TB file with ~300,000 MPI processes. Evaluation of Parallel I/O Performance and Energy Consumption with Frequency Scaling on Cray XC30 Evaluation of Parallel I/O Performance and Energy Consumption with Frequency Scaling on Cray XC30 Suren Byna (Lawrence Berkeley National Laboratory) and Brian Austin (Lawrence Berkeley National Laboratory/National Energy Research Scientific Computing Center) Large-scale simulations produce massive data that needs to be stored on parallel file systems. The simulations use parallel I/O to write data into file systems, such as Lustre. Since writing data to disks is often a synchronous operation, the application-level computing workload on CPU cores is minimal during I/O and hence we consider whether energy may be saved by keeping the cores in lower power states. To examine this postulation, we have conducted a thorough evaluation of energy consumption and performance of various I/O kernels from real simulations on a Cray XC30 supercomputer, named Edison, at the National Energy Research Supercomputing Center (NERSC). To adjust CPU power consumption, we use the frequency scaling capabilities provided by the Cray power management and monitoring tools. In this paper, we present our initial observations that when the I/O load is high enough to saturate the capability of the filesystem, down-scaling the CPU frequency on compute nodes reduces energy consumption without diminishing I/O performance. A More Realistic Way of Stressing the End-to-end I/O System A More Realistic Way of Stressing the End-to-end I/O System Veronica G. Vergara Larrea, Sarp Oral, and Dustin B. Leverman (Oak Ridge National Laboratory); Hai Ah Nam (Los Alamos National Laboratory); and Feiyi Wang and James Simmons (Oak Ridge National Laboratory) Synthetic I/O benchmarks and tests are insufficient by themselves in realistically stressing a complex end-to-end I/O path. Evaluations built solely around these benchmarks can help establish a high-level understanding of the system and save resources and time, however, they fail to identify subtle bugs and error conditions that can occur only when running at large-scale. 
The Oak Ridge Leadership Computing Facility recently started an effort to assess the I/O path more realistically and improve the evaluation methodology used for major and minor file system software upgrades. To this end, an I/O test harness was built using a combination of real-world scientific applications and synthetic benchmarks. The experience with the harness and the testing methodology introduced are presented in this paper. The more systematic testing performed with the harness resulted in a successful upgrade of Lustre on OLCF systems and a more stable computational and analysis environment. Paper Filesystems & I/O Technical Sessions 10C Sycamore Zhengji Zhao The Cray Programming Environment: Current Status and Future Directions The Cray Programming Environment: Current Status and Future Directions Luiz DeRose (Cray Inc.) In order to achieve high performance on large-scale systems, application developers need a programming environment that can address and hide the issues of scale and complexity of high-end HPC systems. In this talk, I will present the recent activities and future directions of the Cray Programming Environment, which are being developed and deployed on Cray Clusters and Cray Supercomputers for scalable performance with high programmability. I will discuss some of the new functionality in the Cray compilers, tools, and libraries, such as support for GNU intrinsics and our C++11 and OpenMP plans, and will highlight Cray’s activities to help port and hybridize applications for the emerging MIC architectures (Intel Phi), such as the scoping tool Reveal and the recently released Cray Comparative Debugger. Finally, I will discuss our roadmap for all areas of the Cray Programming Environment. Using Reveal to Automate Parallelization for Many-Core Systems Using Reveal to Automate Parallelization for Many-Core Systems Heidi Poxon (Cray Inc.) Reveal, an application parallelization assistant, helps users add deeper levels of parallelism to an MPI program by analyzing loops, identifying issues with parallelization, and automating tedious and error-prone tasks for the user. In preparation for Intel KNL many-core systems, Cray is extending Reveal with a new automatic parallelization mechanism that can be used in both "Build & Go" and "Tune & Go" user environments. With this functionality, the user follows a simple recipe to collect the performance data that is typically gathered prior to application tuning. Instead of the user analyzing the data to determine where parallelism should be applied, Reveal and the Cray compiling environment analyze the data and focus automated parallelization efforts on the best-candidate loops. With a single step that requires no source code modifications, Reveal and the Cray compiling environment parallelize select loops and rebuild the program with the applied parallelism. 
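To illustrate the kind of loop-level parallelism a tool like Reveal targets, here is a hand-written OpenMP example; it is not Reveal's generated output, and the g++ -fopenmp build line is just one common choice.

// stencil_omp.cpp - hand-written example of loop-level OpenMP parallelism
// (illustrative; not output generated by Reveal).  Build: g++ -fopenmp stencil_omp.cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 22;
    std::vector<double> in(n, 1.0), out(n, 0.0);

    // Each iteration writes a distinct out[i] and only reads in[], so the loop
    // can be split across threads; spotting such loops and scoping their
    // variables is the tedious analysis a parallelization assistant automates.
    #pragma omp parallel for
    for (int i = 1; i < n - 1; ++i)
        out[i] = 0.25 * in[i - 1] + 0.5 * in[i] + 0.25 * in[i + 1];

    std::printf("out[1] = %f\n", out[1]);
    return 0;
}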
An Investigation of Compiler Vectorization on Current and Next-generation Intel processors using Benchmarks and Sandia’s Sierra Application An Investigation of Compiler Vectorization on Current and Next-generation Intel processors using Benchmarks and Sandia’s Sierra Application Mahesh Rajan, Doug Doerfler, Mike Tupek, and Si Hammond (Sandia National Laboratories) Motivated by the need for effective vectorization in order to take full advantage of the dual AVX-512 vector units in Intel’s Knights Landing (KNL) processor, to be used in the NNSA’s Cray XC Trinity supercomputer, we carry out a systematic study of vectorization effectiveness using GNU, Intel and Cray compilers using current-generation Intel processors. The study analyzes micro-benchmarks, mini-applications and a set of kernel operations from Sandia’s SIERRA mechanics application suite. Performance is measured with and without vectorization/optimizations and the effectiveness of the compiler generated performance improvement is measured. We also present an approach using C++ templates, data structure layout modifications and the direct use of Intel vector intrinsics to systematically improve vector performance of important Sandia Sierra application kernels, like eigenvalue/eigenvector computations and nonlinear material model evaluations, for which the current generation of compilers cannot effectively auto-vectorize. Paper PE & Applications 10:30am-12:00pm General Session 11 Magnolia Ballroom Nicholas Cardo CUG 2015 Best Paper CUG 2015 Best Paper Nick Cardo (Swiss National Supercomputing Centre) CUG 2015 Best Paper Award & Talk Changing Needs/Solutions/Roles Changing Needs/Solutions/Roles Raj Hazra (Intel Corporation) Relentless focus on system performance continues to be the mantra for HPC, driving fundamental changes in memory, fabric, power efficiency and storage, and the need for new architectural frameworks for future HPC systems. Big data analytics coupled with HPC will enable accessing broad data sets for real-time simulation, further increasing demand for HPC and storage as well as Cloud based capabilities. Join Raj as he discusses significant trends in technology and how Intel is working with key partners to innovate in HPC system architecture. Invited Talk Plenary 1:00pm-1:45pm General Session 12 Magnolia Ballroom David Hancock 1 on 100 or More 1 on 100 or More Peter Ungaro (Cray Inc.) Open discussion with Cray President and CEO. No other Cray employees or Cray partners are permitted during this session. Invited Talk Plenary 2:00pm-3:30pm General Session 13 Magnolia Ballroom David Hancock CUG Business Meeting CUG Business Meeting David Hancock (Indiana University) CUG Annual Business Meeting and Elections Scalability Limits for Scientific Simulation Scalability Limits for Scientific Simulation Paul Fischer (University of Illinois) Current high-performance computing platforms feature millions of processing units, and it is anticipated that exascale architectures featuring billion-way concurrency will be in place in the early 2020s. The extreme levels of parallelism in these architectures influence many design choices in the development of next-generation algorithms and software for scientific simulation. This talk explores some of the challenges faced by the scientific computing community in the post-frequency-scaling era. To set the stage, we first describe our experiences in the development of scalable codes for computational fluid dynamics that have been deployed on over a million processors. 
We then explore fundamental computational complexity considerations that are technology drivers for the future of PDE-based simulation. We present performance data from leading-edge platforms over the past three decades and couple this with communication and work models to predict the performance of domain decomposition methods on model exascale architectures. We identify the key performance bottlenecks and expected performance limits at these scales and note a particular need for design considerations that will support strong scaling in the future. CUG Business Meeting CUG Business Meeting David Hancock (Indiana University) CUG Annual Business Meeting and Elections Invited Talk Plenary 3:45pm-5:15pm Technical Session 14A Magnolia Ballroom Matthew A. Ezell Cray XC System Node Level Diagnosability Cray XC System Node Level Diagnosability Jeffrey Schutkoske (Cray Inc.) Cray XC System node level diagnosability is not just about diagnostics. Diagnostics are just one aspect of the tool chain that includes BIOS, user commands, power and thermal data and event logs. There are component level tests that are used to checkout the individual components, but quite often issues do not appear until full scale is reached. From experience over the last few years, we have seen that no single tool or diagnostic can be used to identify problems, but rather multiple tools and multiple sources of data must be analyzed to provide proper identification, isolation, and notification of hardware and software problems. This paper provides detailed examples using the existing tool chain to diagnose node faults within the Cray XC system. Cray XC System Level Diagnosability Roadmap Update Cray XC System Level Diagnosability Roadmap Update Jeffrey Schutkoske (Cray Inc.) This paper highlights the current capabilities and the technical direction of Cray XC System level diagnosability. Cray has made a number of enhancements to existing diagnostics, commands, and utilities as well as providing new diagnostics, commands and utilities. This paper reviews the new capabilities that are available now, such as the Simple Event Correlator (SEC), Workload Test Suite (WTS), Node diagnostics and HSS diagnostic utilities. It will also look at what is planned for in the upcoming releases, including the initial integration of new technologies from the OpenStack projects. Lustre Resiliency: Understanding Lustre Message Loss and Tuning for Resiliency Lustre Resiliency: Understanding Lustre Message Loss and Tuning for Resiliency Chris A. Horn (Cray Inc.) Cray systems are engineered to withstand the loss of components, however, Lustre, historically, has not been as resilient in some cases. In this paper we discuss recent enhancements made to Lustre to improve resiliency and best practices for realizing Lustre RAS on Cray systems including how to tune timeouts and configure certain Lustre features for resiliency. Paper Systems Technical Session 14B Willow Tina Butler The time is now. Unleash your CPU cores with Intel® SSDs The time is now. Unleash your CPU cores with Intel® SSDs Andrey Kudryavtsev (Intel Corporation) When trying to solve humankind’s most difficult and important
challenges, time is critical. Whether it’s mapping population flows to
thwart the spread of Ebola, identifying potential terrorists in real time,
or analyzing big data to find a promising cure for cancer, data
scientists, government leaders, researchers, engineers, all of us can’t
wait. Yet today, most supercomputing platforms require users to do just
this. CPUs remain idle while waiting for data to arrive for analysis or
waiting for data to be written back. In this session, Bill Leszinske and
Andrey Kudryavtsev will discuss advancements in Intel SSD technology that
are unleashing the power of the CPU and Moore’s Law. They’ll dive into
NVMe, a new standard specification interface for SSDs that can greatly
benefit the HPC community, talk about the results early adopters are
experiencing, and how adoption sets the foundation for consumption of
disruptive NVM technology on the horizon. DataWarp: First Experiences DataWarp: First Experiences Stefan Andersson and Stephen Sachs (Cray Inc.) and Christian Tuma and Thorsten Schuett (Zuse Institute Berlin) In this paper, we’ll talk about our first experiences using the new Cray® XC™ DataWarp™ applications I/O accelerator technology on both I/O benchmarks and real-world applications. The Cray® XC™ series DataWarp™ applications I/O accelerator technology is based on Flash SSD I/O blades directly connected to the same Cray Aries interconnect as the compute nodes used by the user application. The DataWarp accelerator allocates storage dynamically in either private (dedicated) or shared modes. Storage performance quality of service can be provided to individual applications, based on the user’s policies. This work compares the performance of standard I/O benchmarks like IOR using the DataWarp file system against the same runs on Lustre. We also compare the performance of real applications like BQCD, a quantum chromodynamics program, and the Molpro and Turbomole quantum chemistry packages, as well as the OpenFOAM CFD solver. Paper Filesystems & I/O Technical Session 14C Sycamore Timothy W. Robinson Using Maali to Efficiently Recompile Software Post-CLE Updates on the Cray XC Systems Using Maali to Efficiently Recompile Software Post-CLE Updates on the Cray XC Systems Ralph C. Bording, Christopher Harris, and David Schibeci (iVEC) One of the main operational challenges of High Performance Computing centers is maintaining the numerous scientific applications needed to support a large and diverse user community. At the Pawsey Supercomputing Centre we have developed “Maali”, a lightweight automated system for managing a diverse set of optimized scientific libraries and applications on our HPC resources. Maali is a set of BASH scripts that reads a template file containing all the information necessary to download a specific version of an application or library and to configure and compile it. This paper will present how we recently used Maali after the latest CLE update and the hardware changes of Magnus, a Cray XC40, to recompile a large portion of our scientific software stack, including what changes to Maali were needed for both the CLE and hardware updates to differentiate between Magnus and our Cray XC30 system, Galaxy. PGI C++ with OpenACC PGI C++ with OpenACC Brent Leback, Mat Colgrove, Michael Wolfe, and Christian Trott (PGI) Over the last year, PGI has moved OpenACC for C++ from a place where it
could only offload code and data structures that looked like C, to
providing support for many C++-specific language features. Working
closely with Sandia National Labs, we continue to push into new areas of
the language. In this paper and talk, we will use examples to illustrate
accelerating code using class member functions, inheritance, templates,
containers, handling the implicit 'this' pointer, lambda functions,
private data and deep copies. OpenACC 2.0 features such as unstructured
data regions and the "routine" directive are highlighted, as well as a PGI
feature to auto-detect and generate class member functions which are
called from compute regions as “routine seq”. Results using the beta
Unified Memory functionality in PGI 15.x, which can simplify data
management, will also be presented. Finally, we'll discuss current
limitations and the future directions of OpenACC with respect to C++. Contain This, Unleashing Docker for HPC Contain This, Unleashing Docker for HPC Richard S. Canon, Larry Pezzaglia, Douglas M. Jacobsen, and Shreyas Cholia (Lawrence Berkeley National Laboratory) Container-based computing is revolutionizing the way applications are developed and deployed, and a new ecosystem has emerged around Docker to enable container-based computing. However, this revolution has yet to reach the HPC community. In this paper, we will provide an overview of container computing and the potential value to the HPC community. We describe early work in using Docker to support scientific computing workloads. We will also discuss investigations in how Docker could be deployed in large-scale HPC systems. Paper PE & Applications 5:30pm-6:15pm Interactive Session 15A Magnolia Ballroom Customer Support Modernisation Customer Support Modernisation Wayne Kugel (Cray Inc.) Open discussion on service modernisation. Birds of a Feather Interactive Session 15B Willow SIG Structure Discussion SIG Structure Discussion Bob Ballance (CUG Board) This BoF will discuss the current SIG structure and take input from members to see if realignment or new SIGs are desired. Birds of a Feather CUG Board, Filesystems & I/O, PE & Applications, Systems, XTreme Interactive Session 15C Sycamore Birds of a Feather | Thursday, April 30th 8:30am-10:00am General Session 16 Magnolia Ballroom Nicholas Cardo New Member Lightning Talk, Hong Kong Sanatorium & Hospital New Member Lightning Talk, Hong Kong Sanatorium & Hospital Louis Shun and Thomas Leung (Hong Kong Sanatorium & Hospital) Introduction to Hong Kong Sanatorium & Hospital. New Member Lightning Talk, Argonne National Laboratory New Member Lightning Talk, Argonne National Laboratory Mark Fahey (Argonne National Laboratory) Introduction to Argonne National Laboratory. New Member Lightning Talk, European Centre for Medium Range Weather Forecasts New Member Lightning Talk, European Centre for Medium Range Weather Forecasts Christian Weihrauch (European Centre for Medium-Range Weather Forecasts) Introduction to the European Centre for Medium-Range Weather Forecasts. Panel Discussion - Does Big Data Imply Big Compute? Panel Discussion - Does Big Data Imply Big Compute? Nick Cardo (Swiss National Supercomputing Centre) Moderated panel discussion. Invited Talk Plenary 10:30am-12:00pm Technical Session 17A Magnolia Ballroom Jim Rogers Overview of the KAUST’s Cray XC40 System – Shaheen II Overview of the KAUST’s Cray XC40 System – Shaheen II Bilel Hadri, Samuel Kortas, Saber Feki, Rooh Khurram, and Greg Newby (King Abdullah University of Science and Technology) In November 2014, King Abdullah University of Science and Technology (KAUST) acquired a Cray XC40 supercomputer along with DataWarp technology, a Cray Sonexion 2000 storage system, a Cray Tiered Adaptive Storage (TAS) system and a Cray Urika-GD graph analytics appliance. This new Cray XC40 system, installed in March 2015 and named Shaheen II, will deliver 25 times the sustained computing capability of KAUST’s current system. Shaheen II is composed of 6174 nodes representing a total of 197,568 processor cores tightly integrated with a richly layered memory hierarchy and a dragonfly interconnection network. Total storage space is 17 PB, with an additional 1.5 PB dedicated to burst buffer.
An overview of the system’s specifications, the challenges raised in terms of power capping, and the software ecosystem for monitoring usage will be presented and discussed. Resource Utilization Reporting Two Year Update Resource Utilization Reporting Two Year Update Andrew P. Barry (Cray Inc.) In the two years since CUG 2013, the Cray RUR feature has gone from
PowerPoint to its fourth software release, running on a variety of Cray systems.
The most basic features of RUR have proven the most interesting to the
widest range of users: CPU usage, memory usage, and energy usage are enduring
concerns for site planning. Functionality added since the first release of RUR has
largely focused on providing greater fidelity of measurement, and support
for a full range of hardware. This paper briefly reviews the architecture of the RUR software, describes new functionality added since the initial implementation, and solicits user input on future designs. Also included are a sampling of statistics gathered from Cray datacenter machines contrasted with production machines at Cray customer sites. Cray Advanced Platform Monitoring and Control (CAPMC) Cray Advanced Platform Monitoring and Control (CAPMC) Steven J. Martin, David Rush, and Matthew Kappel (Cray Inc.) With SMW 7.2.UP02 and CLE 5.2.UP02, Cray released its platform monitoring and management API called CAPMC (Cray Advanced Platform Monitoring and Control). This API is primarily directed toward workload manager vendors to enable power-aware scheduling and resource management on Cray XC-series systems and beyond. In this paper, we give an overview of CAPMC features, applets, and their driving use cases. We further describe the RESTful architecture of CAPMC, its security model, and discuss tradeoffs made in design and development. Finally, we preview future enhancements to CAPMC in support of in-band control and additional use cases. Paper Systems Technical Session 17B Willow Zhengji Zhao Optimizing Cray MPI and Cray SHMEM for Current and Next Generation Cray-XC Supercomputers Optimizing Cray MPI and Cray SHMEM for Current and Next Generation Cray-XC Supercomputers Krishna Kandalla, David Knaak, and Mark Pagel (Cray Inc.) Modern compute architectures such as the Intel Many Integrated Core (MIC) and the NVIDIA GPUs are shaping the landscape of supercomputing systems. Current generation interconnect technologies, such as the Cray Aries, are further fueling the design and development of extreme scale systems. Message Passing Interface (MPI) and SHMEM programming models are strongly entrenched in High Performance Computing. However, it is critical to carefully design and optimize communication libraries on emerging computing and networking architectures to facilitate the development of next generation science. In this talk, I will present the primary research and development thrust areas in Cray MPI and SHMEM software products targeting the current and next generation Cray XC series systems. Next, we will discuss some of the MPI-I/O enhancements and our experiences with optimizing I/O intensive applications on the Cray XC. Finally, we will discuss the design and development of MPI-4 Fault Tolerance capabilities for Cray XC systems. Illuminating and Electrifying OpenMP + MPI Performance Illuminating and Electrifying OpenMP + MPI Performance Beau Paisley (Allinea Software) The "one size fits all" MPI age has passed: ahead complex MPI and many-core OpenMP or MPI and GPUs. Increased core counts per CPU mean that performance will increasingly come from optimization within each node and this calls out for developer tools that point to the root causes of underwhelming performance or of bugs that prevent successful completion. With Allinea's debugging and profiling tools now on the majority of Cray systems and used regularly at extreme scale, we explore new capabilities for scientists and developers aiming to improve performance, scalability or correctness. We focus on the Allinea MAP profiler with its newly added OpenMP profiling support. We present our approach to the significant and important challenge of determining and exposing multi-threaded performance whilst maintaining low overhead measurement and extreme scalability. 
We present examples of common performance patterns and key optimization steps from MPI and OpenMP through to vectorization and I/O. Performance and Extension of a Particle Transport Code using Hybrid MPI/OpenMP Programming Models Performance and Extension of a Particle Transport Code using Hybrid MPI/OpenMP Programming Models Gavin Pringle (EPCC, The University of Edinburgh); Dave Barrett and David Turland (AWE plc); and Michele Weiland and Mark Parsons (EPCC, The University of Edinburgh) We describe AWE's HPC benchmark particle transport code, which employs a wavefront sweep algorithm. After almost 4 years collaboration between EPCC and AWE, we present Chimaera-2_3D: a Fortran90 and MPI/OpenMP code which scales well to thousands of cores for large problem sizes. Significant restructuring has increased the degrees of parallelism available to efficiently exploit future many-core exascale systems. For OpenMP, we have introduced slices through the cuboid mesh which present a set of cells which may be computed independently; and computation over the angles within each cell can also be parallelized using OpenMP. Previously, the initial form of Chimaera computed a coupled, inter-dependent iteration over 'Energy Groups'. Our new code now decouples these iterations which, whilst increasing the computational time, permits a new task level of efficient parallelism encoded using MPI. This paper will present results from the extensive benchmarking exercise using a Cray XT4/5 (HECToR) and a Cray XC30 (ARCHER). Paper PE & Applications Technical Session 17C Sycamore Suzanne T. Parete-Koon Application Performance on a Cray XC30 Evaluation System with Xeon Phi Coprocessors at HLRN-III Application Performance on a Cray XC30 Evaluation System with Xeon Phi Coprocessors at HLRN-III Florian Wende, Matthias Noack, and Thorsten Schütt (Zuse Institute Berlin); Stephen Sachs (Cray Inc.); and Thomas Steinke (Zuse Institute Berlin) We report experiences in using the Cray XC30 Test and Development System (TDS) at the HLRN-III site at ZIB for many-core computing on the Intel Xeon Phi coprocessors. The TDS comprises 16 compute nodes, each of which with one Intel Xeon Phi 5120D coprocessor installed. We present performance data for selected workloads including BQCD, VASP, GLAT, and Ising-Swendsen-Wang. For the GLAT application, we use the HAM-Offload framework (developed at ZIB) to offload computations to remote Xeon Phis using Heterogeneous Active Messages. By means of micro- benchmarks, we determined the characteristics of the different communication paths between the host(s) and the Xeon Phi(s) involving the Aries interconnect and the PCIe link(s), and compare the respective measurements against those taken on the InfiniBand cluster. Based on these results, we discuss their impact on the performance of the applications considered. Climate Science Performance, Data and Productivity on Titan Climate Science Performance, Data and Productivity on Titan Benjamin Mayer (Oak Ridge National Laboratory), Rafael F. da Silva (USC Information Science Institute), and Patrick Worley and Abigail Gaddis (Oak Ridge National Laboratory) Climate Science models are flagship codes for the largest of HPC resources both in visibility, with the newly launched DOE ACME effort, and in terms of significant fractions of system usage. The performance of the DOE ACME model is captured with application level timers and examined through a sizeable run archive. 
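As a generic illustration of the kind of application-level timing mentioned above, a minimal C++ sketch follows. The names (PhaseTimer, start, stop, report) are hypothetical and this is not the actual ACME instrumentation; it simply shows how per-phase wall-clock totals can be accumulated and later collected into a run archive.

    #include <chrono>
    #include <iostream>
    #include <map>
    #include <string>

    // Hypothetical sketch: accumulate wall-clock time per named model phase.
    class PhaseTimer {
        using Clock = std::chrono::steady_clock;
    public:
        void start(const std::string& phase) { starts_[phase] = Clock::now(); }
        void stop(const std::string& phase) {
            totals_[phase] += std::chrono::duration<double>(Clock::now() - starts_[phase]).count();
        }
        void report() const {
            for (const auto& entry : totals_)
                std::cout << entry.first << ": " << entry.second << " s\n";
        }
    private:
        std::map<std::string, Clock::time_point> starts_;
        std::map<std::string, double> totals_;
    };

    int main() {
        PhaseTimer timer;
        timer.start("atmosphere");   // bracket each model phase with start/stop calls
        // ... model work would run here ...
        timer.stop("atmosphere");
        timer.report();              // per-phase totals could then feed a run archive
        return 0;
    }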
Performance and variability of compute time, queue time, and ancillary services will be examined and discussed. As climate science advances in its use of HPC resources, there has been an increase in the human and data systems required to achieve our program's goals. A description of current workflow processes (hardware, software, human) and planned automation of the workflow, along with historical and projected data-in-motion and data-at-rest usage, will be detailed. The combination of these two topics will lead to a description of future system requirements for DOE Climate Modeling efforts. Memory Scalability and Efficiency Analysis of Parallel Codes Memory Scalability and Efficiency Analysis of Parallel Codes Tomislav Janjusic and Christos Kartsaklis (Oak Ridge National Laboratory) Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for high-performance systems are typically designed over the span of several, and in some instances 10+, years. As a result, optimization practices which were appropriate for earlier systems may no longer be valid and thus require careful reconsideration. Specifically, parallel codes whose memory footprint is a function of their scalability must be carefully considered for future exascale systems. In this paper we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology we evaluate an application's memory footprint as a function of scalability, which we have coined memory efficiency, and describe our results. In particular, using our in-house tools we can pinpoint the specific application components which contribute to the application's overall memory footprint (application data structures, libraries, etc.). Paper PE & Applications 12:00pm-1:00pm CUG Board Lunch (closed) Elm David Hancock CUG Board lunch to welcome new board members. CUG Board 1:00pm-2:30pm Technical Session 18A Magnolia Ballroom Chris Fuson Custom Product Integration and the Cray Programming Environment Custom Product Integration and the Cray Programming Environment Sean Byland and Ryan Ward (Cray Inc.) With Cray’s increasing customer base and product portfolio, a faster, more scalable, and more flexible software access solution for the Cray Programming Environment was required. The xt-asyncpe
product-offering required manual updates to add new product and platform support, took a significant amount of time to evaluate the environment when building applications, and didn’t harness useful standards used by the Linux community. CrayPE 2.x, by incorporating
the flexibility of modules, the power of pkg-config and a programmatic design, offers a stronger solution going forward with simplified extensibility, a more robust solution for adding products to a system, and a significant reduction in application build
time for users. This paper discusses the issues addressed and the improved functionality available to support Cray, customer, and 3rd-party software access. Cray Storm Programming Cray Storm Programming David Race (Cray Inc.) The Cray Cluster Storm is a dense but highly power-efficient computing platform for both current and next-generation scientific applications. This product combines the latest Intel processors (Haswell), eight NVIDIA K40s or K80s, and single/dual Mellanox IB connections into a hardware package that delivers performance to applications. The ability to access this computing capability relies on the different programming options available to the users and their applications. At the end of this presentation, the user will have a basic understanding of the programming options available on the Storm and some basic performance information for some of these options. The basic programming options covered will include compilers, OpenACC, MPI, and MPI+X. HPC Workforce Preparation HPC Workforce Preparation Scott Lathrop (National Center for Supercomputing Applications) Achieving the full potential of today’s HPC systems, with all of their advanced technology components, requires well-educated and knowledgeable computational scientists and engineers. Blue Waters is committed to working closely with the community to train and educate current and future generations of scientists and engineers to enable them to make effective use of the extraordinary capabilities provided by Blue Waters and other petascale computing systems. This session will provide a presentation on efforts to address the preparation of the HPC workforce, including: • Providing training webinars, workshops and summer schools, • Providing web-based graduate credit courses, • Graduate fellowships and internships focused on extreme scale computing, • Strategies for engaging women, minorities and people with disabilities, and • Providing a repository of quality-reviewed training and education materials. The session will include a discussion among the participants to foster sharing of information and provide groundwork for collaborations among the participants. Paper PE & Applications, Systems Technical Session 18B Willow Brett Bode Utilizing Unused Resources To Improve Checkpoint Performance Utilizing Unused Resources To Improve Checkpoint Performance Ross G. Miller and Scott Atchley (Oak Ridge National Laboratory) Titan, the Cray XK7 at Oak Ridge National Laboratory, has 18,688 compute nodes. Each node consists of a 16-core AMD CPU, an NVIDIA GPU, and 32 GB of RAM. In addition, there is another 6 GB of RAM on each GPU card. Not all the applications that run on Titan make use of all of a node’s resources. For applications that are not otherwise using the GPU, this paper discusses a technique for using the GPU’s RAM as a large write-back cache to improve the application’s file write performance. Sonexion - SW versions/roadmap Sonexion - SW versions/roadmap Stan Friesen (Cray Inc.) New Sonexion software releases for the Sonexion product line will include significant improvements, including changes to Reliability, Availability, and Serviceability as well as support for Lustre 2.5. The paper will explain the incremental changes, the planned timeline, and the targeted products (i.e. 900, 1600, 2000) for each software release.
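To make the write-back-cache idea in the ORNL checkpoint paper above more concrete, here is a minimal, hypothetical C++ sketch: it stages many small writes in a memory buffer and flushes them to the file system as one large sequential write. The paper stages the buffer in otherwise-idle GPU memory; this illustration uses host memory, and the names (WriteBackCache, append, flush) are invented purely for illustration.

    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>

    // Hypothetical sketch: buffer small checkpoint writes and flush them as one
    // large sequential write. The paper places this buffer in unused GPU memory.
    class WriteBackCache {
    public:
        WriteBackCache(const std::string& path, std::size_t capacity)
            : out_(path, std::ios::binary) {
            buffer_.reserve(capacity);
        }
        void append(const char* data, std::size_t n) {
            if (buffer_.size() + n > buffer_.capacity()) flush();  // spill when full
            buffer_.insert(buffer_.end(), data, data + n);
        }
        void flush() {
            if (!buffer_.empty()) {
                out_.write(buffer_.data(), static_cast<std::streamsize>(buffer_.size()));
                buffer_.clear();
            }
        }
        ~WriteBackCache() { flush(); }  // make sure nothing is left unwritten
    private:
        std::ofstream out_;
        std::vector<char> buffer_;
    };

    int main() {
        WriteBackCache cache("checkpoint.dat", 1 << 20);  // 1 MiB staging buffer
        const std::string chunk(4096, 'x');
        for (int i = 0; i < 512; ++i) cache.append(chunk.data(), chunk.size());
        return 0;  // destructor flushes any remaining data
    }

The staging buffer simply trades memory for fewer, larger writes; in the paper's setting that memory is GPU RAM the application is not otherwise using.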
Lustre Metadata DNE Performance on Seagate Lustre System Lustre Metadata DNE Performance on Seagate Lustre System John Fragalla (Seagate) Alongside the high demands of streaming bandwidth in High Performance Computing (HPC) storage, there is a growing need for increased metadata performance associated with various applications and workloads. The Lustre parallel filesystem provides a distributed namespace feature, which divided across multiple metadata servers, allows the metadata throughput to scale with increasing numbers of servers. This presentation explains how Seagate’s solution addresses DNE Phase 1 in terms of performance, scalability, and high availability, including details on the DNE configuration and MDTEST performance benchmark results. Paper Filesystems & I/O Technical Session 18C Sycamore Abhinav S. Thota Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms Lorenzo Pesce, Albert Wildeman, Jyothsna Suresh, and Tahra Eissa (The University of Chicago); Victor Eijkhout (Texas Advanced Computing Center); Mark Hereld (Argonne National Laboratory); and Wim Van Dongelen, Kazutaka Takahashi, and Karthikeyan Balasubramanian (The University of Chicago) Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex, thus, we have been developing and studying medium-large scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multi-scale simulations. The observed memory and computation-time scaling behavior for a distributed memory implementation was very good over the range studied, both in terms of network sizes and processor pool sizes. We believe that these simulations proved that modeling of epileptic seizures on networks with millions of cells should be feasible on Cray supercomputers. The Impact of High-Performance Computing Best Practice Applied to Next-Generation Sequencing Workflows The Impact of High-Performance Computing Best Practice Applied to Next-Generation Sequencing Workflows Carlos P. Sosa, Pierre Carrier, Bill Long, and Richard Walsh (Cray Inc.); Brian Haas and Timothy Tickle (Broad Institute of MIT and Harvard); and Thomas William (TU Dresden) Authors:
Pierre Carrier, Richard Walsh, Bill Long, and Jef Dawson, Cray Inc., Saint Paul, MN;
Carlos P. Sosa, Cray Inc. and University of Minnesota Rochester, Saint Paul, MN;
Brian Haas and Timothy Tickle, The Broad Institute of MIT & Harvard, Cambridge, MA;
Thomas William, Technische Universität Dresden, Dresden, Germany. High Performance Computing (HPC) Technology and Best Practice (performance analysis and optimization) are enabling scientists in many disciplines to achieve progressively more demanding and valuable results. In this talk we will illustrate how the same technology and best practice can be used to dramatically accelerate next-generation sequencing (NGS) workflows. We illustrate how the XC family of systems is well suited for NGS workflows. Parallelization of Whole Genome Analysis on a Cray XE6 Parallelization of Whole Genome Analysis on a Cray XE6 Megan Puckelwartz (Northwestern University), Lorenzo Pesce (The University of Chicago), Elizabeth McNally (Northwestern University), and Ian Foster (The University of Chicago) The declining cost of generating DNA sequence is promoting an increase in whole genome sequencing, especially as applied to the human genome. Whole genome analysis requires the alignment and comparison of raw sequence data. Given that the human genome is made of approximately 3 billion base pairs, each of which can be sequenced 30 to 50 times, this generates large amounts of data that have to be processed by complex, computationally expensive, and quickly evolving workflows. On the University of Chicago Cray XE6, Beagle, we implemented a scalable concurrent multiple genome analysis that also increased usable sequence per genome. Relying on publicly available software, the Cray XE6 has the capacity to align and call variants on hundreds of whole genomes in ∼50 h. The workflow displayed very good scalability and utilization of the computational resources when applied to 80 whole genomes. Multisample variant calling is also accelerated. Paper PE & Applications 3:00pm-4:30pm Technical Session 19A Magnolia Ballroom Jim Rogers Monitoring and Analyzing Job Performance Using Resource Utilization Reporting (RUR) on a Cray XE6 System Monitoring and Analyzing Job Performance Using Resource Utilization Reporting (RUR) on a Cray XE6 System Shiquan Su (National Institute for Computational Sciences) and Troy Baer, Gary Rogers, Stephen McNally, Robert Whitten, and Lonnie Crosby (University of Tennessee) This paper describes the collection and analysis of job performance metrics using the Cray Resource Utilization Reporting (RUR) software on Mars, a Cray XE6 system at the National Institute for Computational Sciences (NICS). Cray first offered RUR as a new feature in the second half of 2013. With RUR, we can collect an easily expanded set of utilization data about each user’s applications. The overhead and scalability of RUR will be measured using an assortment of benchmarks that covers a wide range of typical cases in a realistic user environment, including computation-bound, memory-bound, and communication-bound applications. A number of the Cray-supplied data and output RUR plugins will be investigated. Possible integration with the XSEDE Metrics on Demand (XDMoD) project will also be discussed. Molecular Modelling and the Cray XC30 Power Management Counters Molecular Modelling and the Cray XC30 Power Management Counters Michael R. Bareford (EPCC, The University of Edinburgh) This paper explores the usefulness of the data provided by the power management (PM) hardware counters available on the Cray XC30 platform. PM data are collected for two molecular modelling codes, DL POLY and CP2K, both of which are run over multiple compute nodes.
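The Cray XC power-management counters discussed above are typically exposed to compute-node software as small text files (commonly documented under /sys/cray/pm_counters, including a cumulative node energy counter in joules). Assuming that interface, a minimal, hypothetical C++ sketch of sampling energy around a code region might look like the following; it is an illustration only, not the instrumentation used in the paper.

    #include <fstream>
    #include <iostream>
    #include <string>

    // Hypothetical sketch: read the node-level energy counter (in joules) before
    // and after a code region. Path and format assume the commonly documented
    // /sys/cray/pm_counters interface on XC compute nodes.
    static long long read_energy_joules(const std::string& path = "/sys/cray/pm_counters/energy") {
        std::ifstream in(path);
        long long joules = -1;
        in >> joules;  // the counter value is the first field in the file
        return joules;
    }

    int main() {
        const long long before = read_energy_joules();
        // ... run the region of interest (e.g. one benchmark step) ...
        const long long after = read_energy_joules();
        if (before >= 0 && after >= 0)
            std::cout << "Node energy used: " << (after - before) << " J\n";
        return 0;
    }

Differencing the counter gives a per-node energy estimate; a job-level figure would require combining such readings from every node used.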
The first application is built for three programming environments (Cray, Intel and gnu): hence, the data collected is used to test the hypothesis that the choice of compiler should not impact energy use significantly. The second code, CP2K, is run in a mixed OpenMP/MPI mode, allowing us to explore the relationship between energy usage and thread count. The Cray-compiled DL POLY code had the lowest energy usage on average, 3-4% lower than the Intel and gnu results. In general, energy usage follows execution time. For the CP2K code, an energy-usage sweet spot of three threads per MPI process was revealed; for higher thread counts, execution times increase monotonically with energy usage. Implementing "Pliris-C/R" Resiliency Features Into the EIGER Application Implementing "Pliris-C/R" Resiliency Features Into the EIGER Application Mike Davis (Cray Inc.), Joseph D. Kotulski (Sandia National Laboratories), and William W. Tucker (Consultant) EIGER is a frequency-domain electromagnetics simulation code based on the
boundary element method. This results in a linear equation whose matrix is
complex-valued and dense. To solve this equation, the Pliris direct solver
package from the Trilinos library is used to factor and solve this matrix. This
code has been used on the Cielo XE6 platform to solve matrix equations of order
2 million requiring 5000 nodes for 24 hours. This paper describes recent work to implement “Pliris-C/R”, a set of checkpoint/restart and other resilience features for Pliris. These include: targeting multiple file systems in parallel; striping controls; checkpoint period controls; turnstiling; open-file-descriptor sharing across processes; checkpointing on imminent job termination; application relaunch within the job; and scripts to monitor application progress. Timing data for runs using Pliris-C/R will also be presented. Paper Systems Technical Session 19B Willow Veronica G. Vergara Larrea Experiences Running and Optimizing the Berkeley Data Analytics Stack on Cray Platforms Experiences Running and Optimizing the Berkeley Data Analytics Stack on Cray Platforms Kristyn J. Maschhoff and Michael F. Ringenburg (Cray Inc.) The Berkeley Data Analytics Stack (BDAS) is an emerging framework for big data analytics. It consists of the Spark analytics framework, the Tachyon in-memory filesystem, and the Mesos cluster manager. Spark was designed as an in-memory replacement for Hadoop that can in some cases improve performance by up to 100X. In this paper, we describe our experiences running BDAS on the new Cray Urika-XA extreme analytics platform, on Cray XC systems, and on a prototype Aries-based system with node-local SSDs. We discuss how we configured and optimized the BDAS stack, and describe the execution environment used on each platform. BDAS applications differ significantly from traditional HPC applications: they run in the Java Virtual Machine, and communicate via TCP/IP. We explore how Cray system capabilities, such as the Aries interconnect and SSDs, can be better leveraged to improve performance of these types of applications. Cyber-threat analytics using graph techniques Cyber-threat analytics using graph techniques Eric Dull (Cray Inc.) Computer network analysis can be very challenging due to the volumes and varieties of data. Organizations struggle with analyzing their network data, merging it against contextual information, and using that information. Graph analysis is an analytic approach that overcomes these challenges. Urika-GD powered graph analytics have been demonstrated at SC14 while Cray participated in the Network Security team on SCinet, the SC14 conference network. Cray participates on the network security team because of the scale of data (18 billion triples from 5 days of data), time-to-first solution (analytics need to be developed in minutes to an hour or two), and time-to-solution (answers need to be generated in seconds to minutes to be useful) requirements. This talk describe computer network information, computer network analysis problems, graph algorithm applications to these problems, and successes using Urika-GD to perform graph analytics during SC14. Staying Out of the Wind Tunnel with Virtual Aerodynamics Staying Out of the Wind Tunnel with Virtual Aerodynamics Greg Clifford (Cray Inc.) and Scott Suchyta (Altair Engineering, Inc.) In this presentation, Altair will present results from recent benchmark testing for both small and large simulations using HyperWorks Virtual Wind Tunnel on Cray XC30 systems. The tests focused on two problems of different sizes: a relatively small (22 million element) analysis of a benchmark car model used frequently in auto manufacturing, and a large (1 billion finite element cells) problem involving the drafting simulation of two Formula-1 cars. 
Result highlights included: virtually ideal efficiency when scaling to 300 cores; and for the larger problem, excellent performance up to 1600 cores with very good performance at 3000+ cores. Paper Filesystems & I/O Technical Session 19C Sycamore Gregory Bauer Reducing Cluster Compatibility Mode (CCM) Complexity Reducing Cluster Compatibility Mode (CCM) Complexity Marlys A. Kohnke and Andrew Barry (Cray Inc.) Cluster Compatibility Mode (CCM) provides a suitable environment for running out of the box ISV and third party MPI applications, serial workloads, X11, and doing compilation on Cray XE/XC compute nodes. At times, customers have experienced CCM issues related to setting up or tearing down that environment. The tight coupling of CCM to workload manager prologue and epilogue services has been a primary source of issues. A new configurable ALPS prologue and epilogue service specific to CCM will be provided. Removing this workload manager dependency will reduce the CCM complexity. Other problem areas have been identified, and solutions will be implemented to avoid or correct those issues. This paper will describe problems and the changes made to CCM to reduce the CCM complexity and provide a more robust, workload manager independent product. Preparation of codes for Trinity Preparation of codes for Trinity Courtenay T. Vaughan, Mahesh Rajan, Dennis C. Dinge, Clark R. Dohrmann, Micheal W. Glass, Kenneth J. Franko, Kendall H. Pierson, and Michael R. Tupek (Sandia National Laboratories) Sandia and Los Alamos National Laboratories are acquiring Trinity, a Cray
XC40, with half of the nodes having Haswell processors and the other half
having Knights Landing processors. As part of our Center of Excellence
with Cray, we are working on porting three codes, a Solid Mechanics code,
a Solid Dynamics code, and an Aero code, to effectively use this machine.
In this paper, we will detail the work that we have done in porting the
codes in preparation for receiving the machine. We have started by profiling
the codes using tools including CrayPat, which showed that a large portion
of the time is being spent in the solvers. We will describe the work we
are doing on the solvers, including ongoing work on Haswell processors and
Knight's Corner machines. Analyzing the Interplay of Failures and Workload on a Leadership-Class Supercomputer Analyzing the Interplay of Failures and Workload on a Leadership-Class Supercomputer Esteban Meneses (University of Pittsburgh), Xiang Ni (University of Illinois at Urbana-Champaign), and Terry Jones and Don Maxwell (Oak Ridge National Laboratory) The unprecedented computational power of current supercomputers now makes possible the exploration of complex problems in many scientific fields, from genomic analysis to computational fluid dynamics. Modern machines are powerful because they are massive: they assemble millions of cores and a huge quantity of disks, cards, routers, and other components. But it is precisely the size of these machines that glooms the future of supercomputing. A system that comprises many components has a high chance to fail, and fail often. In order to make the next generation of supercomputers usable, it is imperative to use some type of fault tolerance platform to run applications on large machines. Most fault tolerance strategies can be optimized for the peculiarities of each system and boost efficacy by keeping the system productive. In this paper, we aim to understand how failure characterization can improve resilience in several layers of the software stack: applications, runtime systems, and job schedulers. We examine the Titan supercomputer, one of the fastest systems in the world. We analyze a full year of Titan in production and distill the failure patterns of the machine. By looking into Titan's log files and using the criteria of experts, we provide a detailed description of the types of failures. In addition, we inspect the job submission files and describe how the system is used. Using those two sources, we cross correlate failures in the machine to executing jobs and provide a picture of how failures affect the user experience. We believe such characterization is fundamental in developing appropriate fault tolerance solutions for Cray systems similar to Titan. We also investigate how failures impact long-running jobs. We provide a series of recommendations for developing resilient software on supercomputers. Paper PE & Applications 4:45pm-5:15pm General Session 20 Magnolia Ballroom David Hancock Invited Talk Plenary | Friday, May 1st |