CUG2012 Final Proceedings

Sunday, April 29th

6:00PM - 7:00PM

Welcome Reception

Maritim Lobby

Special Event

Monday, April 30th

8:00AM - 10:30AM

Tutorial (1A)

Hamburg

Mark Fahey

Application developmen...

Tutorial

Tutorial (1B)

Bonn

Jason Hill

Lustre 2.x Architecture

Tutorial

Tutorial (1C)

Köln

John Noe

Introduction to Debugg...

Tutorial

10:30AM - 11:00AM

Break. Whamcloud, Sponsor

Maritim Foyer

Break

11:00AM - 12:00PM

Opening General Session (2)

Köln / Bonn / Hamburg

Nick Cardo

CUG Welcome

The Future of HPC

Invited Talk

12:00PM - 1:00PM

Lunch. Xyratex Technology L...

Restaurant Rôtisserie

Lunch

1:00PM - 2:30PM

Technical Sessions (3A)

Hamburg

Liam Forbes

Cray OS Road Map

Reliability and Resili...

Online Diagnostics at...

Paper

Technical Sessions (3B)

Bonn

Rolf Rabenseifner

Developing hybrid Open...

Xiaohu Guo (Science and Technology Facilities Council), Gerard Gorman (Department of Earth Science and Engineering, Imperial College London, London SW7 2AZ, UK) and Andrew Sunderland and Mike Ashworth (Science and Technology Facilities Council)

Most modern high performance computing platforms can be described as clusters of multi-core compute nodes. The trend for compute nodes is towards greater numbers of lower power cores, with a decreasing memory to core ratio. This is imposing a strong evolutionary pressure on numerical algorithms and software to efficiently utilise the available memory and network bandwidth.

Unstructured finite elements codes have been effectively parallelised with domain decomposition methods by using libraries such as the Message Passing Interface for a long time. However, there are many algorithmic and implementation optimisation opportunities when threading is used for intra-node parallelisation for the latest multi-core/many-core platforms. For example, reduced memory requirements, cache sharing, reduced number of partitions and less MPI communication. While OpenMP is promoted as being easy to use and allows incremental parallelisation of codes, naive implementations frequently yield poor performance. In practice, as with MPI, equal care and attention should be exercised over algorithm and hardware details when programming with OpenMP.

In this paper, we report progress implementing hybrid OpenMP-MPI for finite element matrix assembly within the unstructured finite element application software named Fluidity. The OpenMP parallel algorithm uses graph colouring to identify independent sets of elements that can be assembled simultaneously with no race conditions. Unstructured finite element codes are well known to be memory bound, therefore, particular attention is paid to ccNUMA architectures where data locality is particularly important to achieve good intra-node scaling characteristics. The profiling and the benchmark results on the latest CRAY platforms show that the best performance can be achieved by pure OpenMP within a node.

Keywords: Fluidity; FEM; OpenMP; MPI; ccNUMA; Graph Colouring;

pdf, pdf

Porting and Optimizing...

Paper

Technical Sessions (3C)

Köln

Liz Sim

Comparing One-Sided Co...

Balancing shared memor...

Performance of Fortran...

Paper

2:30PM - 3:00PM

Break. Bright Computing, Sp...

Maritim Foyer

Break

3:00PM - 4:00PM

Interactive Session (4A)

Hamburg

Nick Cardo

Open discussion with C...

Birds of a Feather

Interactive Session (4B)

Bonn

Helen He

Programming Environmen...

Birds of a Feather

Interactive Session (4C)

Köln

Tara Fly

Getting Up and Running...

Birds of a Feather

4:30PM - 10:00PM

Das Stuttgarter Frühlingsfe...

Cannstatter Wasen

Special Event

Tuesday, May 1st

8:30AM - 10:00AM

General Session (5)

Köln / Bonn / Hamburg

David Hancock

Cray Corporate Update

HPC Systems

Storage and Data Manag...

Invited Talk

10:00AM - 10:30AM

Break. Altair Corporation,...

Maritim Foyer

Break

10:30AM - 12:00PM

General Session (6)

Köln / Bonn / Hamburg

David Hancock

From PetaScale to ExaS...

1 on 100 or More

Invited Talk

12:00PM - 1:00PM

Lunch. Adaptive Computing,...

Restaurant Rôtisserie

Starla Mehaffey (Adaptive Computing)

Adaptive Computing manages the world’s largest computing installations with its Moab® self-optimizing cloud management and HPC workload management solutions. The patented Moab multi-dimensional intelligence engine delivers policy-based governance, allowing customers to consolidate resources, allocate and manage services, optimize service levels and reduce operational costs. Our leadership in IT decision engine software has been recognized with over 45 patents and over a decade of battle-tested performance resulting in a solid Fortune 500 and Top500 supercomputing customer base.

The Moab intelligence engine is unique in its ability to accelerate and automate both complex IT decisions and processes through multi-dimensional policies. Only Moab can automate decisions and processes across business priorities and SLAs, current and future time horizons, and heterogeneous physical and virtual resources and management tools, as well as many other dimensions. Adaptive Computing’s mission is to bring higher levels of decision, control, and self-optimization to the challenges of deploying and managing large and complex IT environments so they accelerate business performance at a reduced cost.

Customers look to Adaptive Computing to solve today’s complex management problems so they can lower costs, improve efficiency and service levels, and accelerate the IT that powers their business. Adaptive Computing products offer solutions to key challenges including:

• Speeding the delivery of IT services to the business • Improving IT flexibility to meet SLA’s and priorities • Reducing capital costs by maximizing resource utilization • Reducing operating costs by eliminating manual management across heterogeneous IT • Managing IT service and resource usage cost transparency • Reducing instability and disruptive errors in IT services

Adaptive’s Moab products accelerate, automate, and self-optimize IT workloads. Built for high scale, Moab meets the challenges of today’s complex HPC and cloud computing environments. Moab acts as a brain on top of existing infrastructure, enabling computing systems to self-optimize and deliver higher return on investment. The Moab product family includes:

• Moab Cloud Suite for self-optimizing cloud management • Moab HPC Suite for self-optimizing HPC workload management

The company’s global headquarters is in Provo, Utah (USA), with European offices in the United Kingdom and Asia Pacific offices in Singapore. This enables Adaptive Computing to deliver products and solutions to customers around the globe with local region sales as well as consulting and support services to ensure success. The company currently has over 120 employees and has grown steadily every year since inception to meet growing customer demands and needs.

Lunch

1:00PM - 2:30PM

Technical Sessions (7A)

Hamburg

Jason Hill

Cray’s Lustre Support...

Lustre Roadmap and Rel...

DDN Exascale Direction...

Paper

Technical Sessions (7B)

Bonn

Ashley Barker

Accelerated Debugging:...

Third Party Tools for...

The Eclipse Parallel T...

Paper

Technical Sessions (7C)

Köln

Mark Fahey

Case Studies in Deploy...

Cray Cluster Compatibi...

My Cray can do that?...

Paper

2:30PM - 3:00PM

Break. Allinea Software, Sp...

Maritim Foyer

Break

3:00PM - 5:00PM

Technical Sessions (8A)

Hamburg

Tina Butler

Xyratex ClusterStor Ar...

Minimizing Lustre ping...

Cray Sonexion

A Next-Generation Para...

Paper

Technical Sessions (8B)

Bonn

Rolf Rabenseifner

The Cray Programming E...

Cray Performance Measu...

Cray Scientific Librar...

Applying Automated Opt...

Thomas Edwards (Cray Inc.)

Porting and optimising applications to a new processor architecture, a different compiler or the introduction of new features in the software or hardware environment can generate a large number of new parameters that have the potential to affect application performance. Vendors attempt to provide sensible defaults that perform well in general, for example grouping compiler optimisations into flag groupings and setting the default value of environment variables, they are inevitably based on the experience gained or expected behaviour of a normal application. In many cases applications will exhibit some behaviour that differs from the norm, for example requiring identical floating point results when changing MPI decompositions, or sending or receiving messages of unusual or irregular sizes. Manually finding the combination of flags and environment variables that provide optimum performance whilst maintaining a set of application specific criteria can be time consuming and tedious. There are a wide variety of potential algorithms and techniques that can be employed, each with various merits and suitability to the problem of optimising an HPC application. This paper explores, evaluates and compares techniques for automated optimisation HPC application parameters within fixed numbers of iterations.uiring identical floating point results when hanging MPI decompositions, or sending or receiving messages of unusual or irregular sizes. In many cases programmers opt to automate the optimisation process, using the computer to find an optimal solution. There are, however, a wide variety of potential algorithms and techniques that can be employed to perform the search, each with various merits. This paper will explore, evaluate and compare a set of techniques for automated optimisation, focusing specifically properties of HPC applications. Drawing on the author's practical experience with real-world applications the cost in compute resources compared to the runtime improvements gained can be evaluated and considered.

pdf, pdf

Paper

Technical Sessions (8C)

Köln

Liz Sim

Porting and optimisati...

Adaptive and Dynamic L...

Porting the Community...

Performance Evaluation...

Paper

5:00PM - 5:00PM

Break

Maritim Foyer

Break

5:00PM - 5:45PM

Interactive Session (9A)

Hamburg

Joni Virtanen

System Support SIG

Birds of a Feather

Interactive Session (9B)

Bonn

David Wallace

Removing Barriers to A...

Birds of a Feather

Interactive Session (9C)

Köln

Jim Rogers

(Invitation Only) Meth...

Birds of a Feather

6:30PM - 9:30PM

Cray Social

Special Event

Wednesday, May 2nd

8:30AM - 10:00AM

General Session (10)

Köln / Bonn / Hamburg

Nick Cardo

CUG Business

PRACE for Science and...

CUG Business

Invited Talk

10:00AM - 10:30AM

Break. The Portland Group,...

Maritim Foyer

Pat Brooks (The Portland Group)

The Portland Group® (a.k.a. PGI®) is a premier supplier of software compilers and tools for parallel computing. PGI's goal is to provide the highest performance, production quality compilers and software development tools.

The Portland Group offers high performance scalar and parallel Fortran, C and C++ compilers and tools for workstations, servers and clusters based on:

• 64-bit x86 (x64) processors from Intel (Intel 64) and AMD (AMD64)

• NVIDIA CUDA-enabled GPGPUs

• Linux, MacOS and Windows operating systems

PGI offers native scalar and parallelizing compiler products for the following high-level languages:

• Fortran 2003, OpenMP 3.0 compliant, GPU-enabled

• ANSI C99 extensions, OpenMP 3.0 compliant, GPU-enabled

• ANSI/ISO C++ , OpenMP 3.0 compliant

PGI Unified Binary™ technology enables applications built with PGI compilers to execute efficiently and produce accurate results on either Intel or AMD CPU-based systems, and to dynamically detect and use NVIDIA GPU accelerators when available.

With uniform features and capabilities across operating systems, PGI products enable application development and optimization on platforms ranging from mobile laptops to the world’s fastest supercomputers.

GPU Programming

---------------

PGI is the only independent supplier of compilers to provide all of the following capabilities for performing optimized integrated native compilation for all x86+NVIDIA accelerator platforms:

• Global optimization, inter-procedural optimization, vectorization, shared-memory parallelization.

• Profile-feedback optimization and heterogeneous parallel code- generation capabilities.

• No external pre-processor dependence.

In addition, the PGI Fortran compiler includes support for CUDA Fortran extensions. Co-defined by NVIDIA and PGI, CUDA Fortran enables explicit GPU accelerator programming through direct control of all aspects of data movement and offloading of compute-intensive functions.

The PGI Fortran and C compilers also include support for the PGI Acclerator programming model, an implicit high-level model where offloading of compute-intensive code regions from a host CPU to an accelerator is accomplished using Fortran directives or C pragmas. The PGI Accelerator programming model includes support for the OpenACC 1.0 standard for directive-based GPU programming. Programs written using directives retain portability to other platforms and other compilers.

PGI Products

------------

• PGI Workstation™ – single-user node-locked license

• PGI Server™ – multi-user network-floating license

• PGI CDK® Cluster Development Kit® – multi-user network-floating license with scalable MPI debugger and profiler

• PGI Visual Fortran® – PGI Fortran integrated with Microsoft Visual Studio; available in single-user, multi-use, and as part of the PGI CDK for Windows.

PGI Tools

---------

In addition to the full suite of parallel language compilers, all PGI products contain the PGDBG ® OpenMP/MPI graphical parallel debugger and the PGPROF ® OpenMP/MPI/GPU performance profiler.

PGI offers the only multi-core x64 parallel compilers, debugger and profiler available with parallelization support integrated directly into the compilers, debugger and profiler. This enables faster development, higher performance and much higher reliability for the programmer.

Further Information

-------------------

PGI offers a unrestricted free trial license. Registration is required. Follow this link to get started now: https://www.pgroup.com/account/register.php.

Break

10:30AM - 12:00PM

Technical Sessions (11A)

Hamburg

Jason Hill

Lustre at Petascale: E...

NetApp E-Series Storag...

Integrated Simulation...

Hao Zhang (University of Tennessee) and Haihang You and Mark Fahey (National Institute for Computational Sciences)

Besides requiring significant computational power, a large-scale scientific computing application in high-performance computing (HPC) usually involves large quantity of data. An inappropriate I/O configuration might severely degrade the performance of an application, thereby decreasing the overall user productivity. Moreover, tuning I/O performance of an application on a real file system of a supercomputer can be dangerous, expensive and time-consuming. Even in the application level, an improper I/O configuration might hinder the entire supercomputer. Also, a tuning and testing process always takes a long time and uses considerable computation and storage resources.

In order to allow a user to evaluate the I/O performance of a job before its execution, an integrated simulator is developed in this work to simulate the object-based parallel file system, such as the Lustre file system, along with its workload. Our ultimate objective is to achieve automatically tuning of a job’s I/O configuration in the application level, by running a parameter optimization framework over the file system simulator, in order to provide specific information, such as the number of processors that operate I/O, to a user to improve the I/O performance of the job.

In this work, an integrated object-based parallel file system simulator is implemented, which integrates both an object-based parallel file system simulation (OBPFS) and a virtual client generator (VCG). The OBPFS is designed as a collection of abstract functional models, which work coordinately and concurrently to simulate important behaviors of a real object-based file system. The VCG is developed to continuously provide virtual clients to the OBPFS with a similar pattern to the real-world supercomputer workload. When developing the integrated simulator, we tried to balance realism and simplicity, which allows the simulator to simulate a massive parallel file system with millions of I/O operations from hundreds of clients concurrently, and to get an acceptable simulation result within an acceptable amount of time. We also tried to implement the simulator to be modular, extensible, scalable and portable, to make it not so hard to understand and adapt to simulate other similar systems. Although the proposed simulator is designed based on the architecture of the Lustre file system, it should be applicable to other file systems with similar properties. The experimental result using the proposed simulator is presented in this paper, which is compared with the actual testing result over the Kraken supercomputer, which is a Cray XT5 supercomputer with the Lustre file system.

Paper

Technical Sessions (11B)

Bonn

Helen He

Open MPI for Cray XE/X...

Early Results from the...

Analyses and Modeling...

Paper

Technical Sessions (11C)

Köln

Ashley Barker

The PGI Fortran and C9...

Performance Studies of...

Tools for Benchmarking...

Paper

12:00PM - 1:00PM

Lunch. ANSYS, Sponsor

Restaurant Rôtisserie

Wim Slagter (ANSYS)

ANSYS brings clarity and insight to customers' most complex design challenges through fast, accurate and reliable engineering simulation. Our technology enables organizations ― no matter their industry ― to predict with confidence that their products will thrive in the real world. Customers trust our software to help ensure product integrity and drive business success through innovation. Founded in 1970, ANSYS employs more than 2,000 professionals, many of them expert in engineering fields such as finite element analysis, computational fluid dynamics, electronics and electromagnetics, and design optimization. ANSYS is passionate about pushing the limits of world-class technology, all so our customers can turn their design concepts into successful, innovative products. ANSYS users today scale their largest simulations across thousands of processing cores, conducting simulations with more than a billion cells. They create incredibly dense meshes, model complex geometries, and consider complicated multiphysics phenomena. ANSYS is committed to delivering HPC performance and capability to take customers to new heights of simulation fidelity, engineering insight and continuous innovation. ANSYS partners with key hardware vendors such as Cray to ensure customers can get the most accurate solution in the fastest amount of time. The collaboration helps customers in all industries navigate the rapidly changing high-performance computing (HPC) landscape. ANSYS HPC products support highly scalable use of HPC - providing virtually unlimited access to HPC capacity for high-fidelity simulation within a workgroup or across a distributed enterprise, using local workstations, department clusters, or enterprise servers, wherever resources and people are located. HPC solutions from ANSYS enable enhanced engineering productivity by accelerating simulation throughput, enabling customers to consider more design ideas and make efficient product development decisions based on enhanced understanding of performance tradeoffs. The ANSYS approach to HPC licensing is cross-physics, providing customers with a single solution that can be leveraged across disciplines. Customers can ‘buy once’ and ‘deploy once’, getting more value from their investment in ANSYS. Our leadership in HPC is a differentiator that will return significant value to customers. Over the years, our steady growth and financial strength reflect our commitment to innovation and R&D. We reinvest 15 percent of our revenues each year into research to continually refine our software. We are listed on the NASDAQ stock exchange. Headquartered south of Pittsburgh, U.S.A., ANSYS has more than 60 strategic sales locations throughout the world with a network of channel partners in 40+ countries. Visit www.ansys.com for more information.

Lunch

1:00PM - 2:30PM

Technical Sessions (12A)

Hamburg

Liam Forbes

Blue Waters - A Super...

Early experiences with...

Titan: Early experien...

Paper

Technical Sessions (12B)

Bonn

Larry Kaplan

The Impact of a Fault...

Leveraging the Cray Li...

Debugging and Optimizi...

Paper

Technical Sessions (12C)

Köln

John Noe

A Heat Re-Use System f...

Analysis and Optimizat...

Simulating Laser-Plasm...

Steven H. Langer, Abhinav Bhatele, G. Todd Gamblin, Charles H. Still, Denise E. Hinkel, Michael E. Kumbera, A. Bruce Langdon and Edward A. Williams (Lawrence Livermore National Laboratory)

The National Ignition Facility (NIF) [1] is a high energy density experimental facility run for the National Nu- clear Security Administration (NNSA) by Lawrence Livermore National Laboratory. NIF houses the world’s most powerful laser. The National Ignition Campaign (NIC) has a goal of using the NIF laser to ignite a fusion target by the end of FY12. Achieving fusion ignition in the laboratory will be a major step towards fusion energy.

NIC is currently considering several possible ignition target designs. The NIF laser can fire a limited number of shots, so simulations play a major roll in selecting the designs to be used in experiments.

The NIF lasers beams reach intensities of 1e15 W/cm**2 in spots. That is high enough that interactions between the laser beams and fluctuations in the density of the ions and electrons may scatter laser light away from the target.

pF3D is a laser-plasma interaction code used to assess proposed experimental designs for expected levels of scattering and to help understand measurements of scattered light in NIF experiments.

NIF experiments have shown that laser-plasma interactions transfer significant amounts of energy between beams and increase the amount of backscattered light relative to what would occur without energy transfer. pF3D has run several simulations with two interacting beams and is starting to run simulations with three interacting beams. These simulations require over 200 billion zones and run for several weeks. The pF3D simulations presented in this paper were run on Cielo, a Cray XE-6 at Los Alamos National Laboratory. These simulations help us understand key experiments currently being carried out with the NIF laser.

This paper reports on several modifications we have made to pF3D in the past year. These changes help pF3D run better on Cielo and they are also a step in preparing for future exascale computers.

pdf, pdf

Paper

2:30PM - 3:00PM

Break. Rogue Wave Software,...

Maritim Foyer

Rogue Wave Software, Inc. is the largest independent provider of cross-platform software development tools and embedded components for the next generation of HPC applications. Rogue Wave products reduce the complexity of prototyping, developing, debugging, and optimizing multi-processor and data-intensive applications. Rogue Wave customers are industry leaders in the Global 2000, ISVs, OEMs, government laboratories and research institutions that leverage computationally-complex and data-intensive applications to enable innovation and outperform competitors. Developing parallel, data-intensive applications is hard. We make it easier.

Many of Cray’s customers utilize TotalView to debug software on their systems, as well as the IMSL Numerical Libraries to implement advanced mathematics and statistics capabilities. TotalView has been used on Cray systems for more than 20 years, has been certified on the latest Gemini™ Interconnect and we work closely to ensure compatibility with each major system revision. Cray’s latest offering, the Cray XK6™, brings the power of NVIDIA processors to bear, with TotalView fully supporting the use of CUDA in these systems. Rogue Wave’s products have demonstrated years of consistent reliability, have been thoroughly tested on Cray equipment, and are fully supported on a worldwide basis.

TotalView® is a highly scalable debugger that provides troubleshooting for a wide variety of applications including: serial, parallel, multi-threaded, multiprocess, and remote applications. A GUI-based source code defect analysis tool for C, C++ and Fortran applications, TotalView gives you unprecedented control over processes and thread execution and visibility into program state and variables. It allows you to debug one or many processes and/or threads with complete control over program execution. You can reproduce and troubleshoot difficult problems that can occur in concurrent programs that take advantage of threads, OpenMP, MPI, or GPUs. TotalView enables efficient debugging of memory errors and leaks and diagnosis of subtle problems like deadlocks and race conditions. It includes sophisticated memory debugging and analysis, reverse debugging and CUDA debugging capabilities.

The IMSL Numerical Libraries are a comprehensive set of mathematical and statistical functions that programmers can embed into their software applications. The libraries can be embedded into C, C# for .NET, Java™ and Fortran applications, and can be used in a broad range of applications -- including programs that help airplanes fly, predict the weather, enable innovative study of the human genome, predict stock market behavior and provide risk management and portfolio optimization.

Break

3:15PM - 10:00PM

CUG Night Out

Schloss Solitude

Special Event

Thursday, May 3rd

8:30AM - 10:00AM

Technical Sessions (13A)

Hamburg

Liam Forbes

Application Workloads...

Understanding the effe...

PBS Professional 11: A...

Paper

Technical Sessions (13B)

Bonn

Tina Butler

Expose, Compile, Analy...

Software Usage on Cray...

Running Large Scale Jo...

Paper

Technical Sessions (13C)

Köln

Liz Sim

A fully distributed CF...

Tuning And Understandi...

High-productivity Soft...

Paper

10:00AM - 10:30AM

Break. NVIDIA Corporation,...

Maritim Foyer

Break

10:30AM - 12:00PM

Technical Sessions (14A)

Hamburg

Jason Hill

NCRC Grid Allocation M...

Speed Job Completion w...

Practical Support Solu...

Adam G. Carlyle, Ross G. Miller, Dustin B. Leverman, William A. Renaud and Don E. Maxwell (Oak Ridge National Laboratory)

The National Climate-Computing Research Center (NCRC), a joint computing center between Oak Ridge National Laboratory (ORNL) and the National Oceanic and Atmospheric Administration (NOAA), employs integrated workflow software and data storage resources to enable production climate simulations on the Cray XT6/XE6 named "Gaea". The use of highly specialized workflow software and a necessary premium on data integrity together create a support environment with unique challenges. This paper details recent support efforts to improve the NCRC end-user experience and to safeguard the corresponding scientific workflow.

Monitoring and reporting of disk usage on Lustre filesystems can be a resource-intensive task, and can affect meta-data performance if not done in a centralized and scalable way. LustreDU is a non-intrusive tool that was developed at ORNL to address this issue by providing an end-user utility that queries a database which is populated daily for reporting disk utilization on directories in the NCRC Lustre file systems.

The NCRC system is housed at ORNL, and has sets of geographically remote end-users at (3) separate sites, with a corresponding support staff team at each location. Conveying system status information to each remote center in a timely manner became important early into the project. The NCRC System Dashboard is a web interface and a set of corresponding system checks created by ORNL support staff to concisely and expediently inform those operational teams remote from the main data center of changes in system status.

Filesystem issues and outages cause disruption to the automated workflow employed by NCRC end-users. Lustre-aware Moab is our response to this issue. By integrating knowledge of the filesystem state into the system's job scheduler, the workflow can be paused when a file system issue is detected. When the issue is resolved, affected jobs can be rerun, effectively rolling back the workflow's progression to a valid state.

pdf, pdf

Paper

Technical Sessions (14B)

Bonn

Larry Kaplan

uRiKA: Graph Applian...

Blue Waters Testing En...

Joseph Muggli, Brett Bode, Torsten Hoefler, William Kramer and Celso L. Mendes (National Center for Supercomputing Applications/University of Illinois)

Acceptance and performance testing are critical elements of providing and optimizing HPC systems for scientific users. This paper will present the design and implementation of the testing harness for the Blue Waters Cray XE6/XK6 being installed at NCSA/University of Illinois. The Blue Waters system will be a leading-edge system in terms of computational power, on- and off-line storage size and performance, external networking performance, and the breadth of software needed to support a diverse NSF user community. Such a large and broad environment must not only be fully validated for system acceptance, but also continually retested over time to avoid regressions in performance following new software installations or hardware failures. This frequency of testing demands an automated means for running the tests and validating the results as well as tracking the results over time.

The INCA testing package was selected as the main framework because it provides much of the desired functionality for a test harness. Some of INCA's featured abilities are the straightforward wrapping of individual tests by researchers who might not be familiar with the harness API, the ability to perform periodic regression testing for monitoring and checking software updates, version control of tests, the hierarchical grouping of individual tests, and a dashboard feature to provide a succinct overview of current acceptance and performance test results.

In addition to describing the testing framework, the paper will also present an overview of the set of software and hardware tests being implemented for Blue Waters. These tests range from core performance (CPU, network, and storage), to the functionality of software layers (standards compliance and interoperability of MPI, OpenMP, Co-array FORTRAN, UPC, etc.), to the functionality of external tools, such as Eclipse, within the user environment. Differing test versions will validate functionality, do full performance characterization, or be suitable for a regression test suite.

The regression test suite will ensure that Blue Waters not only satisfies all of the requirements for acceptance, but also maintains those characteristics throughout it’s production lifetime.

pdf, pdf

Optimizing HPC and IT...

Paper

Technical Sessions (14C)

Köln

Tina Butler

Swift - a parallel scr...

Shared Library Perform...

The Effects of Compile...

Paper

12:00PM - 1:00PM

Lunch. NetApp Inc, Sponsor

Restaurant Rôtisserie

Lunch

1:00PM - 2:30PM

Technical Sessions (15A)

Hamburg

Tina Butler

A Single Pane of Glass...

Real Time Analysis and...

Node Health Checker

Paper

Technical Sessions (15B)

Bonn

John Noe

Early Application Expe...

High-Performance Exact...

Developing Integrated...

Paper

Technical Sessions (15C)

Köln

Liam Forbes

The year in review (in...

Threat Management and...

Early Applications Exp...

Paper

2:30PM - 2:45PM

Break

Maritim Foyer

Break

2:45PM - 3:15PM

Closing General Session (16)

Köln / Bonn / Hamburg

Nick Cardo

Invited Talk