CUG2012 Final Proceedings

Monday 8:00 A.M. - 10:30 A.M.

Monday 8:00 A.M. - 10:30 A.M.
Köln

Tutorial (1C)

Chair: John Noe (Sandia National Laboratories)

Introduction to Debugging for the Cray Systems

David Lecomber (Allinea Software)

Monday 8:00 A.M. - 10:30 A.M.
Bonn

Tutorial (1B)

Chair: Jason Hill (Oak Ridge National Laboratory)

Lustre 2.x Architecture

Johann Lombardi (whamcloud, Inc.)

Monday 8:00 A.M. - 10:30 A.M.
Hamburg

Tutorial (1A)

Chair: Mark Fahey (National Institute for Computational Sciences)

Application development for the XK6

John Levesque and jeff Larkin (Cray Inc.)

Monday 10:30 A.M. - 11:00 A.M.

Monday 10:30 A.M. - 11:00 A.M.
Maritim Foyer

Break. Whamcloud, Sponsor

Dan Ferber (Whamcloud)

Monday 11:00 A.M. - 12:00 P.M.

Monday 11:00 A.M. - 12:00 P.M.
Köln / Bonn / Hamburg

Opening General Session (2)

Chair: Nick Cardo (National Energy Research Scientific Computing Center)

CUG Welcome

Nick Cardo (National Energy Research Scientific Computing Center)

The Future of HPC

Michael M. Resch (High Performance Computing Center Stuttgart)

Monday 12:00 P.M. - 1:00 P.M.

Monday 12:00 P.M. - 1:00 P.M.
Restaurant Rôtisserie

Lunch. Xyratex Technology Ltd, Sponsor

Monday 1:00 P.M. - 2:30 P.M.

Monday 1:00 P.M. - 2:30 P.M.
Köln

Technical Sessions (3C)

Chair: Liz Sim (EPCC, The University of Edinburgh)

Comparing One-Sided Communication With MPI, UPC and SHMEM

Christopher M. Maynard (University of Edinburgh)

pdf, pdf

Balancing shared memory and messaging interactions in UPC on the XE6

Ahmad Anbar, Olivier Serres, Asila Wati, Lubomir Riha and Tarek El-Ghazawi (The George Washington University)

Performance of Fortran Coarrays on the Cray XE6

David Henty (EPCC, The University of Edinburgh)

pdf, pdf

Monday 1:00 P.M. - 2:30 P.M.
Bonn

Technical Sessions (3B)

Chair: Rolf Rabenseifner (High Performance Computing Center Stuttgart)

Developing hybrid OpenMP/MPI parallelism for Fluidity-ICOM - next generation geophysical fluid modelling technology

Xiaohu Guo (Science and Technology Facilities Council), Gerard Gorman (Department of Earth Science and Engineering, Imperial College London, London SW7 2AZ, UK) and Andrew Sunderland and Mike Ashworth (Science and Technology Facilities Council)

pdf, pdf

Porting and Optimizing VERTEX-PROMETHEUS on the Cray XE6 at HLRS for Three-Dimensional Simulations of Core-Collapse Supernova Explosions of Massive Stars

Florian Hanke (Max-Planck-Institut fuer Astrophysik), Andreas Marek (Rechenzentrum Garching) and Bernhard Mueller and Hans-Thomas Janka (Max-Planck-Institut fuer Astrophysik)

Monday 1:00 P.M. - 2:30 P.M.
Hamburg

Technical Sessions (3A)

Chair: Liam Forbes (Arctic Region Supercomputing Center)

Cray OS Road Map

Charlie Carroll (Cray Inc.)

pdf, pdf

Reliability and Resiliency of XE6 and XK6 Systems: Trends, Observations, Challenges

Steven J. Johnson (Cray Inc.)

pdf, pdf

Online Diagnostics at Scale

Don Maxwell (Oak Ridge National Laboratory) and Jeff Becklehimer (Cray Inc.)

pdf, pdf

Monday 2:30 P.M. - 3:00 P.M.

Monday 2:30 P.M. - 3:00 P.M.
Maritim Foyer

Break. Bright Computing, Sponsor

Mark Blessing (Bright Computing)

Monday 3:00 P.M. - 4:00 P.M.

Monday 3:00 P.M. - 4:00 P.M.
Köln

Interactive Session (4C)

Chair: Tara Fly (Cray)

Getting Up and Running with Cluster Compatibility Mode

Tara Fly (Cray Inc.)

Monday 3:00 P.M. - 4:00 P.M.
Bonn

Interactive Session (4B)

Chair: Helen He (National Energy Research Scientific Computing Center)

Programming Environments, Applications and Documentation SIG

Helen He (National Energy Research Scientific Computing Center)

Monday 3:00 P.M. - 4:00 P.M.
Hamburg

Interactive Session (4A)

Chair: Nick Cardo (National Energy Research Scientific Computing Center)

Open discussion with CUG Board

Nick Cardo (National Energy Research Scientific Computing Center)

Monday 4:30 P.M. - 10:00 P.M.

Monday 4:30 P.M. - 10:00 P.M.
Cannstatter Wasen

Das Stuttgarter Frühlingsfest. DataDirect Networks, Sponsor

Tuesday 8:30 A.M. - 10:00 A.M.

Tuesday 8:30 A.M. - 10:00 A.M.
Köln / Bonn / Hamburg

General Session (5)

Chair: David Hancock (Indiana University)

Cray Corporate Update

Peter Ungaro (Cray Inc.)

HPC Systems

Peter Ungaro (Cray Inc.)

Storage and Data Management

Barry Bolding (Cray Inc.)

Tuesday 10:00 A.M. - 10:30 A.M.

Tuesday 10:00 A.M. - 10:30 A.M.
Maritim Foyer

Break. Altair Corporation, Sponsor

Mary Bass (Altair Engineering, Inc.)

Tuesday 10:30 A.M. - 12:00 P.M.

Tuesday 10:30 A.M. - 12:00 P.M.
Köln / Bonn / Hamburg

General Session (6)

Chair: David Hancock (Indiana University)

From PetaScale to ExaScale: How to Improve Sustained Performance?

Wolfgang E. Nagel (Technische Universität Dresden)

1 on 100 or More

Peter Ungaro (Cray Inc.)

Tuesday 12:00 P.M. - 1:00 P.M.

Tuesday 12:00 P.M. - 1:00 P.M.
Restaurant Rôtisserie

Lunch. Adaptive Computing, Sponsor

Starla Mehaffey (Adaptive Computing)

Adaptive Computing manages the world’s largest computing installations with its Moab® self-optimizing cloud management and HPC workload management solutions. The patented Moab multi-dimensional intelligence engine delivers policy-based governance, allowing customers to consolidate resources, allocate and manage services, optimize service levels and reduce operational costs. Our leadership in IT decision engine software has been recognized with over 45 patents and over a decade of battle-tested performance resulting in a solid Fortune 500 and Top500 supercomputing customer base.

The Moab intelligence engine is unique in its ability to accelerate and automate both complex IT decisions and processes through multi-dimensional policies. Only Moab can automate decisions and processes across business priorities and SLAs, current and future time horizons, and heterogeneous physical and virtual resources and management tools, as well as many other dimensions. Adaptive Computing’s mission is to bring higher levels of decision, control, and self-optimization to the challenges of deploying and managing large and complex IT environments so they accelerate business performance at a reduced cost.

Customers look to Adaptive Computing to solve today’s complex management problems so they can lower costs, improve efficiency and service levels, and accelerate the IT that powers their business. Adaptive Computing products offer solutions to key challenges including:

• Speeding the delivery of IT services to the business • Improving IT flexibility to meet SLA’s and priorities • Reducing capital costs by maximizing resource utilization • Reducing operating costs by eliminating manual management across heterogeneous IT • Managing IT service and resource usage cost transparency • Reducing instability and disruptive errors in IT services

Adaptive’s Moab products accelerate, automate, and self-optimize IT workloads. Built for high scale, Moab meets the challenges of today’s complex HPC and cloud computing environments. Moab acts as a brain on top of existing infrastructure, enabling computing systems to self-optimize and deliver higher return on investment. The Moab product family includes:

• Moab Cloud Suite for self-optimizing cloud management • Moab HPC Suite for self-optimizing HPC workload management

The company’s global headquarters is in Provo, Utah (USA), with European offices in the United Kingdom and Asia Pacific offices in Singapore. This enables Adaptive Computing to deliver products and solutions to customers around the globe with local region sales as well as consulting and support services to ensure success. The company currently has over 120 employees and has grown steadily every year since inception to meet growing customer demands and needs.

Tuesday 1:00 P.M. - 2:30 P.M.

Tuesday 1:00 P.M. - 2:30 P.M.
Köln

Technical Sessions (7C)

Chair: Mark Fahey (National Institute for Computational Sciences)

Case Studies in Deploying Cluster Compatibility Mode

Tara Fly, David Henseler and John Navitsky (Cray Inc.)

pdf, pdf

Cray Cluster Compatibility Mode on Hopper

Zhengji Zhao, Yun (Helen) He and Katie Antypas (Lawrence Berkeley National Laboratory)

pdf, pdf

My Cray can do that? Supporting Diverse workloads on the Cray XE-6.

Richard S. Canon, Jay Srinivasan and Lavanya Ramakrishnan (Lawrence Berkeley National Laboratory)

pdf, pdf

Tuesday 1:00 P.M. - 2:30 P.M.
Bonn

Technical Sessions (7B)

Chair: Ashley Barker (ORNL)

Accelerated Debugging: Bringing Allinea DDT to OpenACC on the Cray XK6 beyond Petascale

David Lecomber (Allinea Software)

Third Party Tools for Titan

Richard Graham, Oscar Hernandez, Christos Kartsaklis, Joshua Ladd and Jens Domke (Oak Ridge National Laboratory) and Jean-Charles Vasnier, Stephane Bihan and Georges-Emmanuel Moulard (CAPS Enterprise)

pdf, pdf

The Eclipse Parallel Tools Platform: Toward an Integrated Development Environment for Improved Software Engineering on Crays

Jay Alameda and Jeffrey L. Overbey (National Center for Supercomputing Applications/University of Illinois)

pdf, pdf

Tuesday 1:00 P.M. - 2:30 P.M.
Hamburg

Technical Sessions (7A)

Chair: Jason Hill (Oak Ridge National Laboratory)

Cray’s Lustre Support Model and Roadmap

Cory Spitz (Cray Inc.)

pdf, pdf

Lustre Roadmap and Releases

Dan Ferber (Whamcloud)

DDN Exascale Directions And Cray Product/Partnership Update

Keith Miller (DataDirect Networks)

Tuesday 2:30 P.M. - 3:00 P.M.

Tuesday 2:30 P.M. - 3:00 P.M.
Maritim Foyer

Break. Allinea Software, Sponsor

Tuesday 3:00 P.M. - 5:00 P.M.

Tuesday 3:00 P.M. - 5:00 P.M.
Köln

Technical Sessions (8C)

Chair: Liz Sim (EPCC, The University of Edinburgh)

Porting and optimisation of the Met Office Unified Model on PRACE architectures

Pier Luigi Vidale (NCAS-Climate, Dept. of Meteorology, Univ. of Reading. UK), Malcolm Roberts and Matthew Mizielinski (Met Office Hadley Centre, UK), Simon Wilson (Met Office, UK / NERC CMS), Grenville Lister (NERC CMS, Univ. of Reading), Oliver Darbyshire (Met Office, UK) and Tom Edwards (Cray Centre of Excellence for HECToR)

Adaptive and Dynamic Load Balancing for Weather Forecasting Models

Celso L. Mendes (University of Illinois), Eduardo R. Rodrigues (IBM-Research, Brazil), Jairo Panetta (CPTEC/INPE, Brazil) and Laxmikant V. Kale (University of Illinois)

pdf, pdf

Porting the Community Atmosphere Model - Spectral Element Code to Utilize GPU Accelerators

Matthew Norman (Oak Ridge National Laboratory), Jeffrey Larkin (Cray Inc.), Richard Archibald (Oak Ridge National Laboratory), Ilene Carpenter (National Renewable Energy Laboratory), Valentine Anantharaj (Oak Ridge National Laboratory), Paulius Micikevicius (NVIDIA) and Katherine Evans (Oak Ridge National Laboratory)

pdf, pdf

Performance Evaluation and Optimization of the ls1-MarDyn Molecular Dynamics code on the Cray XE6

Christoph Niethammer (High Performance Computing Center Stuttgart)

pdf, pdf

Tuesday 3:00 P.M. - 5:00 P.M.
Bonn

Technical Sessions (8B)

Chair: Rolf Rabenseifner (High Performance Computing Center Stuttgart)

The Cray Programming Environment: Current Status and Future Directions

Luiz DeRose (Cray Inc.)

Cray Performance Measurement and Analysis Tools for the Cray XK System

Heidi Poxon (Cray Inc.)

Cray Scientific Libraries : New Features and Advanced Usage

Adrian Tate (Cray Inc.)

Applying Automated Optimisation Techniques to HPC Applications

Thomas Edwards (Cray Inc.)

Porting and optimising applications to a new processor architecture, a different compiler or the introduction of new features in the software or hardware environment can generate a large number of new parameters that have the potential to affect application performance. Vendors attempt to provide sensible defaults that perform well in general, for example grouping compiler optimisations into flag groupings and setting the default value of environment variables, they are inevitably based on the experience gained or expected behaviour of a normal application. In many cases applications will exhibit some behaviour that differs from the norm, for example requiring identical floating point results when changing MPI decompositions, or sending or receiving messages of unusual or irregular sizes. Manually finding the combination of flags and environment variables that provide optimum performance whilst maintaining a set of application specific criteria can be time consuming and tedious. There are a wide variety of potential algorithms and techniques that can be employed, each with various merits and suitability to the problem of optimising an HPC application. This paper explores, evaluates and compares techniques for automated optimisation HPC application parameters within fixed numbers of iterations.uiring identical floating point results when hanging MPI decompositions, or sending or receiving messages of unusual or irregular sizes. In many cases programmers opt to automate the optimisation process, using the computer to find an optimal solution. There are, however, a wide variety of potential algorithms and techniques that can be employed to perform the search, each with various merits. This paper will explore, evaluate and compare a set of techniques for automated optimisation, focusing specifically properties of HPC applications. Drawing on the author's practical experience with real-world applications the cost in compute resources compared to the runtime improvements gained can be evaluated and considered.

pdf, pdf

Tuesday 3:00 P.M. - 5:00 P.M.
Hamburg

Technical Sessions (8A)

Chair: Tina Butler (National Energy Research Scientific Computing Center)

Xyratex ClusterStor Architecture

Torben Kling Petersen (Xyratex)

Minimizing Lustre ping effects at scale on Cray systems

Cory Spitz, Nic Henke, Doug Petesch and Joe Glenski (Cray Inc.)

pdf, pdf

Cray Sonexion

Hussein Harake (CSCS)

pdf, pdf

A Next-Generation Parallel File System Environment for the OLCF

Galen Shipman, David Dillow, Douglas Fuller, Raghul Gunasekaran, Jason Hill, Youngjae Kim, Sarp Oral, Doug Reitz, James Simmons and Feiyi Wang (Oak Ridge National Laboratory)

pdf, pdf

Tuesday 5:00 P.M. - 5:00 P.M.

Tuesday 5:00 P.M. - 5:00 P.M.
Maritim Foyer

Break

Tuesday 5:00 P.M. - 5:45 P.M.

Tuesday 5:00 P.M. - 5:45 P.M.
Köln

Interactive Session (9C)

Chair: Jim Rogers (Oak Ridge National Laboratory)

(Invitation Only) Methods for Assessing XK6 Accelerator Utilization

Jim Rogers (Oak Ridge National Laboratory)

Tuesday 5:00 P.M. - 5:45 P.M.
Bonn

Interactive Session (9B)

Chair: David Wallace (Cray, Inc.)

Removing Barriers to Application Performance

David Wallace (Cray Inc.)

Tuesday 5:00 P.M. - 5:45 P.M.
Hamburg

Interactive Session (9A)

Chair: Joni Virtanen (CSC - IT Center for Science Ltd)

System Support SIG

Cary Whitney (National Energy Research Scientific Computing Center)

Tuesday 6:30 P.M. - 9:30 P.M.

Cray Social

Wednesday 8:30 A.M. - 10:00 A.M.

Wednesday 8:30 A.M. - 10:00 A.M.
Köln / Bonn / Hamburg

General Session (10)

Chair: Nick Cardo (National Energy Research Scientific Computing Center)

CUG Business

Nick Cardo (National Energy Research Scientific Computing Center)

PRACE for Science and Industry

Richard Kenway (University of Edinburgh)

CUG Business

Nick Cardo (National Energy Research Scientific Computing Center)

Wednesday 10:00 A.M. - 10:30 A.M.

Wednesday 10:00 A.M. - 10:30 A.M.
Maritim Foyer

Break. The Portland Group, Sponsor

Pat Brooks (The Portland Group)

The Portland Group® (a.k.a. PGI®) is a premier supplier of software compilers and tools for parallel computing. PGI's goal is to provide the highest performance, production quality compilers and software development tools.

The Portland Group offers high performance scalar and parallel Fortran, C and C++ compilers and tools for workstations, servers and clusters based on:

• 64-bit x86 (x64) processors from Intel (Intel 64) and AMD (AMD64)

• NVIDIA CUDA-enabled GPGPUs

• Linux, MacOS and Windows operating systems

PGI offers native scalar and parallelizing compiler products for the following high-level languages:

• Fortran 2003, OpenMP 3.0 compliant, GPU-enabled

• ANSI C99 extensions, OpenMP 3.0 compliant, GPU-enabled

• ANSI/ISO C++ , OpenMP 3.0 compliant

PGI Unified Binary™ technology enables applications built with PGI compilers to execute efficiently and produce accurate results on either Intel or AMD CPU-based systems, and to dynamically detect and use NVIDIA GPU accelerators when available.

With uniform features and capabilities across operating systems, PGI products enable application development and optimization on platforms ranging from mobile laptops to the world’s fastest supercomputers.

GPU Programming

---------------

PGI is the only independent supplier of compilers to provide all of the following capabilities for performing optimized integrated native compilation for all x86+NVIDIA accelerator platforms:

• Global optimization, inter-procedural optimization, vectorization, shared-memory parallelization.

• Profile-feedback optimization and heterogeneous parallel code- generation capabilities.

• No external pre-processor dependence.

In addition, the PGI Fortran compiler includes support for CUDA Fortran extensions. Co-defined by NVIDIA and PGI, CUDA Fortran enables explicit GPU accelerator programming through direct control of all aspects of data movement and offloading of compute-intensive functions.

The PGI Fortran and C compilers also include support for the PGI Acclerator programming model, an implicit high-level model where offloading of compute-intensive code regions from a host CPU to an accelerator is accomplished using Fortran directives or C pragmas. The PGI Accelerator programming model includes support for the OpenACC 1.0 standard for directive-based GPU programming. Programs written using directives retain portability to other platforms and other compilers.

PGI Products

------------

• PGI Workstation™ – single-user node-locked license

• PGI Server™ – multi-user network-floating license

• PGI CDK® Cluster Development Kit® – multi-user network-floating license with scalable MPI debugger and profiler

• PGI Visual Fortran® – PGI Fortran integrated with Microsoft Visual Studio; available in single-user, multi-use, and as part of the PGI CDK for Windows.

PGI Tools

---------

In addition to the full suite of parallel language compilers, all PGI products contain the PGDBG ® OpenMP/MPI graphical parallel debugger and the PGPROF ® OpenMP/MPI/GPU performance profiler.

PGI offers the only multi-core x64 parallel compilers, debugger and profiler available with parallelization support integrated directly into the compilers, debugger and profiler. This enables faster development, higher performance and much higher reliability for the programmer.

Further Information

-------------------

PGI offers a unrestricted free trial license. Registration is required. Follow this link to get started now: https://www.pgroup.com/account/register.php.

Wednesday 10:30 A.M. - 12:00 P.M.

Wednesday 10:30 A.M. - 12:00 P.M.
Köln

Technical Sessions (11C)

Chair: Ashley Barker (ORNL)

The PGI Fortran and C99 OpenACC Compilers

Brent Leback, Michael Wolfe and Douglas Miles (The Portland Group)

pdf, pdf

Performance Studies of A Co-Array Fortran Application Versus MPI

Mike Ashworth (Science and Technology Facilities Council)

Tools for Benchmarking, Tracing, and Simulating SHMEM Applications

Mitesh R. Meswani, Laura Carrington and Allan Snavely (San Diego Supercomputer Center) and Stephen Poole (Oak Ridge National Laboratory)

Wednesday 10:30 A.M. - 12:00 P.M.
Bonn

Technical Sessions (11B)

Chair: Helen He (National Energy Research Scientific Computing Center)

Open MPI for Cray XE/XK Systems

Manjunath Gorentla Venkata and Richard L. Graham (Oak Ridge National Laboratory) and Nathan T. Hjelm and Samuel K. Gutierrez (Los Alamos National Laboratory)

pdf, pdf

Early Results from the ACES Interconnection Network Project

Scott Hemmert (Sandia National Laboratories), Duncan Roweth (Cray Inc.) and Richard Barrett (Sandia National Laboratories)

Analyses and Modeling of Applications Used to Demonstrate Sustained Petascale Performance on Blue Waters

Gregory H. Bauer (National Center for Supercomputing Applications), Torsten Hoefler (National Center for Supercomputing Applications/University of Illinois), William Kramer (National Center for Supercomputing Applications) and Robert A. Fiedler (Cray Inc.)

pdf, pdf

Wednesday 10:30 A.M. - 12:00 P.M.
Hamburg

Technical Sessions (11A)

Chair: Jason Hill (Oak Ridge National Laboratory)

Lustre at Petascale: Experiences in Troubleshooting and Upgrading

Matthew A. Ezell (Oak Ridge National Laboratory) and Richard F. Mohr, Ryan Braby and John Wynkoop (National Institute for Computational Sciences)

pdf, pdf

NetApp E-Series Storage Systems: The Lego Approach to HPC Storage

Didier Gava (NetApp, Inc.)

Integrated Simulation of Object-Based File System for High-Performance Computing

Hao Zhang (University of Tennessee) and Haihang You and Mark Fahey (National Institute for Computational Sciences)

Besides requiring significant computational power, a large-scale scientific computing application in high-performance computing (HPC) usually involves large quantity of data. An inappropriate I/O configuration might severely degrade the performance of an application, thereby decreasing the overall user productivity. Moreover, tuning I/O performance of an application on a real file system of a supercomputer can be dangerous, expensive and time-consuming. Even in the application level, an improper I/O configuration might hinder the entire supercomputer. Also, a tuning and testing process always takes a long time and uses considerable computation and storage resources.

In order to allow a user to evaluate the I/O performance of a job before its execution, an integrated simulator is developed in this work to simulate the object-based parallel file system, such as the Lustre file system, along with its workload. Our ultimate objective is to achieve automatically tuning of a job’s I/O configuration in the application level, by running a parameter optimization framework over the file system simulator, in order to provide specific information, such as the number of processors that operate I/O, to a user to improve the I/O performance of the job.

In this work, an integrated object-based parallel file system simulator is implemented, which integrates both an object-based parallel file system simulation (OBPFS) and a virtual client generator (VCG). The OBPFS is designed as a collection of abstract functional models, which work coordinately and concurrently to simulate important behaviors of a real object-based file system. The VCG is developed to continuously provide virtual clients to the OBPFS with a similar pattern to the real-world supercomputer workload. When developing the integrated simulator, we tried to balance realism and simplicity, which allows the simulator to simulate a massive parallel file system with millions of I/O operations from hundreds of clients concurrently, and to get an acceptable simulation result within an acceptable amount of time. We also tried to implement the simulator to be modular, extensible, scalable and portable, to make it not so hard to understand and adapt to simulate other similar systems. Although the proposed simulator is designed based on the architecture of the Lustre file system, it should be applicable to other file systems with similar properties. The experimental result using the proposed simulator is presented in this paper, which is compared with the actual testing result over the Kraken supercomputer, which is a Cray XT5 supercomputer with the Lustre file system.

Wednesday 12:00 P.M. - 1:00 P.M.

Wednesday 12:00 P.M. - 1:00 P.M.
Restaurant Rôtisserie

Lunch. ANSYS, Sponsor

Wim Slagter (ANSYS)

ANSYS brings clarity and insight to customers' most complex design challenges through fast, accurate and reliable engineering simulation. Our technology enables organizations ― no matter their industry ― to predict with confidence that their products will thrive in the real world. Customers trust our software to help ensure product integrity and drive business success through innovation. Founded in 1970, ANSYS employs more than 2,000 professionals, many of them expert in engineering fields such as finite element analysis, computational fluid dynamics, electronics and electromagnetics, and design optimization. ANSYS is passionate about pushing the limits of world-class technology, all so our customers can turn their design concepts into successful, innovative products. ANSYS users today scale their largest simulations across thousands of processing cores, conducting simulations with more than a billion cells. They create incredibly dense meshes, model complex geometries, and consider complicated multiphysics phenomena. ANSYS is committed to delivering HPC performance and capability to take customers to new heights of simulation fidelity, engineering insight and continuous innovation. ANSYS partners with key hardware vendors such as Cray to ensure customers can get the most accurate solution in the fastest amount of time. The collaboration helps customers in all industries navigate the rapidly changing high-performance computing (HPC) landscape. ANSYS HPC products support highly scalable use of HPC - providing virtually unlimited access to HPC capacity for high-fidelity simulation within a workgroup or across a distributed enterprise, using local workstations, department clusters, or enterprise servers, wherever resources and people are located. HPC solutions from ANSYS enable enhanced engineering productivity by accelerating simulation throughput, enabling customers to consider more design ideas and make efficient product development decisions based on enhanced understanding of performance tradeoffs. The ANSYS approach to HPC licensing is cross-physics, providing customers with a single solution that can be leveraged across disciplines. Customers can ‘buy once’ and ‘deploy once’, getting more value from their investment in ANSYS. Our leadership in HPC is a differentiator that will return significant value to customers. Over the years, our steady growth and financial strength reflect our commitment to innovation and R&D. We reinvest 15 percent of our revenues each year into research to continually refine our software. We are listed on the NASDAQ stock exchange. Headquartered south of Pittsburgh, U.S.A., ANSYS has more than 60 strategic sales locations throughout the world with a network of channel partners in 40+ countries. Visit www.ansys.com for more information.

Wednesday 1:00 P.M. - 2:30 P.M.

Wednesday 1:00 P.M. - 2:30 P.M.
Köln

Technical Sessions (12C)

Chair: John Noe (Sandia National Laboratories)

A Heat Re-Use System for the Cray XE6 and Future Systems at PDC, KTH

Gert Svensson (KTH/PDC) and Johan Söderberg (Hifab)

pdf, pdf

Analysis and Optimization of a Molecular Dynamics Code using PAPI and the Vampir Toolchain

Thomas William (Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)) and Robert Henschel and D. K. Berry (Indiana University)

pdf, pdf

Simulating Laser-Plasma Interactions in Experiments at the National Ignition Facility on a Cray XE6

Steven H. Langer, Abhinav Bhatele, G. Todd Gamblin, Charles H. Still, Denise E. Hinkel, Michael E. Kumbera, A. Bruce Langdon and Edward A. Williams (Lawrence Livermore National Laboratory)

pdf, pdf

Wednesday 1:00 P.M. - 2:30 P.M.
Bonn

Technical Sessions (12B)

Chair: Larry Kaplan (Cray Inc.)

The Impact of a Fault Tolerant MPI on Scalable Systems Services and Applications

Richard Graham, Joshua Hursey, Geoffroy Vallee, Thomas Naughton and Swen Bohm (Oak Ridge National Laboratory)

pdf, pdf

Leveraging the Cray Linux Environment Core Specialization Feature to Realize MPI Asynchronous Progress on the Cray XE

Howard Pritchard, Duncan Roweth, David Henseler and Paul Cassella (Cray Inc.)

pdf, pdf

Debugging and Optimizing Scalable Applications on the Cray

Chris Gottbrath (Rogue Wave Software)

pdf, pdf

Wednesday 1:00 P.M. - 2:30 P.M.
Hamburg

Technical Sessions (12A)

Chair: Liam Forbes (Arctic Region Supercomputing Center)

Blue Waters - A Super System for Super Challenges

William Kramer (National Center for Supercomputing Applications/University of Illinois)

Early experiences with the Cray XK6 hybrid CPU and GPU MPP platform

Sadaf Alam, Jeffrey Poznanovic, Ugo Varetto and Nicola Bianchi (Swiss National Supercomputing Centre), Antonio Penya (UJI) and Nina Suvanphim (Cray Inc.)

pdf, pdf

Titan: Early experience with the Cray XK6 at Oak Ridge National Laboratory

Arthur S. Bland, Jack C. Wells, Otis E. Messer, II, Oscar R. Hernandez and James H. Rogers (Oak Ridge National Laboratory)

pdf, pdf

Wednesday 2:30 P.M. - 3:00 P.M.

Wednesday 2:30 P.M. - 3:00 P.M.
Maritim Foyer

Break. Rogue Wave Software, Sponsor

Rogue Wave Software, Inc. is the largest independent provider of cross-platform software development tools and embedded components for the next generation of HPC applications. Rogue Wave products reduce the complexity of prototyping, developing, debugging, and optimizing multi-processor and data-intensive applications. Rogue Wave customers are industry leaders in the Global 2000, ISVs, OEMs, government laboratories and research institutions that leverage computationally-complex and data-intensive applications to enable innovation and outperform competitors. Developing parallel, data-intensive applications is hard. We make it easier.

Many of Cray’s customers utilize TotalView to debug software on their systems, as well as the IMSL Numerical Libraries to implement advanced mathematics and statistics capabilities. TotalView has been used on Cray systems for more than 20 years, has been certified on the latest Gemini™ Interconnect and we work closely to ensure compatibility with each major system revision. Cray’s latest offering, the Cray XK6™, brings the power of NVIDIA processors to bear, with TotalView fully supporting the use of CUDA in these systems. Rogue Wave’s products have demonstrated years of consistent reliability, have been thoroughly tested on Cray equipment, and are fully supported on a worldwide basis.

TotalView® is a highly scalable debugger that provides troubleshooting for a wide variety of applications including: serial, parallel, multi-threaded, multiprocess, and remote applications. A GUI-based source code defect analysis tool for C, C++ and Fortran applications, TotalView gives you unprecedented control over processes and thread execution and visibility into program state and variables. It allows you to debug one or many processes and/or threads with complete control over program execution. You can reproduce and troubleshoot difficult problems that can occur in concurrent programs that take advantage of threads, OpenMP, MPI, or GPUs. TotalView enables efficient debugging of memory errors and leaks and diagnosis of subtle problems like deadlocks and race conditions. It includes sophisticated memory debugging and analysis, reverse debugging and CUDA debugging capabilities.

The IMSL Numerical Libraries are a comprehensive set of mathematical and statistical functions that programmers can embed into their software applications. The libraries can be embedded into C, C# for .NET, Java™ and Fortran applications, and can be used in a broad range of applications -- including programs that help airplanes fly, predict the weather, enable innovative study of the human genome, predict stock market behavior and provide risk management and portfolio optimization.

Wednesday 3:15 P.M. - 10:00 P.M.

Wednesday 3:15 P.M. - 10:00 P.M.
Schloss Solitude

CUG Night Out

Thursday 8:30 A.M. - 10:00 A.M.

Thursday 8:30 A.M. - 10:00 A.M.
Köln

Technical Sessions (13C)

Chair: Liz Sim (EPCC, The University of Edinburgh)

A fully distributed CFD framework for massively parallel systems

Jens Zudrop, Harald Klimach, Manuel Hasert, Kannan Masilamani and Sabine Roller (Applied Supercomputing in Engineering, German Research School for Simulation Sciences GmbH and RWTH Aachen University)

pdf, pdf

Tuning And Understanding MILC Performance In Cray XK6 GPU Clusters

Guochun Shi (National Center for Supercomputing Applications), Steve Gottlieb (Indiana University) and Michael Showerman (National Center for Supercomputing Applications)

pdf, pdf

High-productivity Software Development for Accelerators

Thomas Bradley (NVIDIA)

Thursday 8:30 A.M. - 10:00 A.M.
Bonn

Technical Sessions (13B)

Chair: Tina Butler (National Energy Research Scientific Computing Center)

Expose, Compile, Analyze, Repeat: How to make effective use of Titan without programming in Cuda

Robert M. Whitten (Oak Ridge National Laboratory)

pdf, pdf

Software Usage on Cray Systems across Three Centers (NICS, ORNL and CSCS)

Bilel Hadri and Mark Fahey (National Institute for Computational Sciences), Timothy W. Robinson (Swiss National Supercomputing Centre) and William Renaud (Oak Ridge National Laboratory)

pdf, pdf

Running Large Scale Jobs on a Cray XE6 System

Yun (Helen) He and Katie Antypas (Lawrence Berkeley National Laboratory)

pdf, pdf

Thursday 8:30 A.M. - 10:00 A.M.
Hamburg

Technical Sessions (13A)

Chair: Liam Forbes (Arctic Region Supercomputing Center)

Application Workloads on the Jaguar Cray XT5 System

Wayne Joubert (Oak Ridge National Laboratory) and Shiquan Su (National Institute for Computational Sciences)

Understanding the effects of process placement on application performance on an AMD Interlagos processor

Kalyana Chadalavada and Manisha Gajbe (National Center for Supercomputing Applications/University of Illinois)

PBS Professional 11: A Walkthrough of Architecture Improvements for Cray Users and Administrators

Scott J. Suchyta and Lisa Endrjukaitis (Altair Engineering, Inc.) and Jason Coverston (Cray Inc.)

Thursday 10:00 A.M. - 10:30 A.M.

Thursday 10:00 A.M. - 10:30 A.M.
Maritim Foyer

Break. NVIDIA Corporation, Sponsor

Liza Gabrielson (NVIDIA)

Thursday 10:30 A.M. - 12:00 P.M.

Thursday 10:30 A.M. - 12:00 P.M.
Köln

Technical Sessions (14C)

Chair: Tina Butler (National Energy Research Scientific Computing Center)

Swift - a parallel scripting language for petascale many-task applications

Ketan Maheshwari (Argonne National Laboratory), Mihael Hategan and David Kelly (University of Chicago), Justin Wozniak (Argonne National Laboratory), Jon Monette, Lorenzo Pesce and Daniel Katz (University of Chicago), Michael Wilde (Argonne National Laboratory) and David Strenski and Duncan Roweth (Cray Inc.)

Shared Library Performance on Hopper

Zhengji Zhao (Lawrence Berkeley National Laboratory), Mike Davis (Cray Inc.) and Katie Antypas, Yushu Yao, Rei Lee and Tina Butler (Lawrence Berkeley National Laboratory)

pdf, pdf

The Effects of Compiler Optimizations on Materials Science and Chemistry Applications at NERSC

Megan Bowling, Zhengji Zhao and Jack Deslippe (Lawrence Berkeley National Laboratory)

pdf, pdf

Thursday 10:30 A.M. - 12:00 P.M.
Bonn

Technical Sessions (14B)

Chair: Larry Kaplan (Cray Inc.)

uRiKA: Graph Appliance for Relationship Analytics in Big Data

Amar Shan (Cray Inc.)

Blue Waters Testing Environment

Joseph Muggli, Brett Bode, Torsten Hoefler, William Kramer and Celso L. Mendes (National Center for Supercomputing Applications/University of Illinois)

Acceptance and performance testing are critical elements of providing and optimizing HPC systems for scientific users. This paper will present the design and implementation of the testing harness for the Blue Waters Cray XE6/XK6 being installed at NCSA/University of Illinois. The Blue Waters system will be a leading-edge system in terms of computational power, on- and off-line storage size and performance, external networking performance, and the breadth of software needed to support a diverse NSF user community. Such a large and broad environment must not only be fully validated for system acceptance, but also continually retested over time to avoid regressions in performance following new software installations or hardware failures. This frequency of testing demands an automated means for running the tests and validating the results as well as tracking the results over time.

The INCA testing package was selected as the main framework because it provides much of the desired functionality for a test harness. Some of INCA's featured abilities are the straightforward wrapping of individual tests by researchers who might not be familiar with the harness API, the ability to perform periodic regression testing for monitoring and checking software updates, version control of tests, the hierarchical grouping of individual tests, and a dashboard feature to provide a succinct overview of current acceptance and performance test results.

In addition to describing the testing framework, the paper will also present an overview of the set of software and hardware tests being implemented for Blue Waters. These tests range from core performance (CPU, network, and storage), to the functionality of software layers (standards compliance and interoperability of MPI, OpenMP, Co-array FORTRAN, UPC, etc.), to the functionality of external tools, such as Eclipse, within the user environment. Differing test versions will validate functionality, do full performance characterization, or be suitable for a regression test suite.

The regression test suite will ensure that Blue Waters not only satisfies all of the requirements for acceptance, but also maintains those characteristics throughout it’s production lifetime.

pdf, pdf

Optimizing HPC and IT Efficiency from an ISV Perspective

Wim Slagter (ANSYS)

Thursday 10:30 A.M. - 12:00 P.M.
Hamburg

Technical Sessions (14A)

Chair: Jason Hill (Oak Ridge National Laboratory)

NCRC Grid Allocation Management

Frank Indiviglio and Ron Bewtra (National Oceanic and Atmospheric Administration)

pdf, pdf

Speed Job Completion with Topology-Based Intelligent Scheduling

David Hill (Adaptive Computing)

Practical Support Solutions for a Workflow-Oriented Cray Environment

Adam G. Carlyle, Ross G. Miller, Dustin B. Leverman, William A. Renaud and Don E. Maxwell (Oak Ridge National Laboratory)

pdf, pdf

Thursday 12:00 P.M. - 1:00 P.M.

Thursday 12:00 P.M. - 1:00 P.M.
Restaurant Rôtisserie

Lunch. NetApp Inc, Sponsor

Dennis Watts (NetApp, Inc.)

Thursday 1:00 P.M. - 2:30 P.M.

Thursday 1:00 P.M. - 2:30 P.M.
Köln

Technical Sessions (15C)

Chair: Liam Forbes (Arctic Region Supercomputing Center)

The year in review (in Cray Security)

Wendy L. Palm (Cray Inc.)

Threat Management and Incident Coordination between National Data Centers for Scientific Computing

Urpo Kaila and Joni Virtanen (CSC - IT Center for Science Ltd)

Early Applications Experience on the Cray XK6 at the Oak Ridge Leadership Computing Facility

Arnold Tharrington, Hai Ah Nam, Wayne Joubert, W. Michael Brown and Valentine G. Anantharaj (Oak Ridge National Laboratory)

Thursday 1:00 P.M. - 2:30 P.M.
Bonn

Technical Sessions (15B)

Chair: John Noe (Sandia National Laboratories)

Early Application Experiences with the Intel MIC Architecture in a Cray CX1

R. Glenn Brook, Bilel Hadri, Vincent C. Betro, Ryan C. Hulguin and Ryan Braby (National Institute for Computational Sciences)

pdf, pdf

High-Performance Exact Diagonalization Techniques

Sergei Isakov (ETH Zurich), William Sawyer, Gilles Fourestey and Adrian Tineo (Swiss National Supercomputing Centre) and Matthias Troyer (ETH Zurich)

pdf, pdf

Developing Integrated Data Services for Cray Systems with a Gemini Interconnect

Ron Oldfield (Sandia National Laboratories), Todd Kordenbrock (Hewlett Packard) and Gerald Lofstead (Sandia National Laboratories)

pdf, pdf

Thursday 1:00 P.M. - 2:30 P.M.
Hamburg

Technical Sessions (15A)

Chair: Tina Butler (National Energy Research Scientific Computing Center)

A Single Pane of Glass: Bright Cluster Manager for Cray

Matthijs van Leeuwen and Martijn de Vries (Bright Computing, Inc.)

Real Time Analysis and Event Prediction Engine

Joseph 'Joshi' Fullop, Ana Gainaru and Joel Plutchak (National Center for Supercomputing Applications)

pdf, pdf

Node Health Checker

Kent J. Thomson (Cray Inc.)

pdf, pdf

Thursday 2:30 P.M. - 2:45 P.M.

Thursday 2:30 P.M. - 2:45 P.M.
Maritim Foyer

Break

Thursday 2:45 P.M. - 3:15 P.M.

Thursday 2:45 P.M. - 3:15 P.M.
Köln / Bonn / Hamburg

Closing General Session (16)

Chair: Nick Cardo (National Energy Research Scientific Computing Center)