Proceedings


Sorted by Day, Session Number and Time

(Revised Thu, Jul 1, 2004)

Day | Number | Time | Author(s) | Paper title | Abstract

Monday

1A

1:00

John Levesque and Jim Schwarzmeier, Cray Inc., and Patrick Worley (ORNL)

Cray X1 Architectural Overview and Optimization

The first half of this tutorial will provide an architectural background on the Cray X1/X1E, and programming in Co-Array Fortran. The second half of the tutorial will cover ORNL's performance tuning approach and scalability case studies including POP, GYRO and CAM.

Monday

1B

1:00

Frithjov Iversen, Cray Inc.

Cray X1 System Performance

This session will target tools and procedures available to X1 system administrators for measuring and improving X1 utilization and performance, based on experience in Cray Software Product Support (SPS). Topics may include monitoring utilities, tunable kernel parameters, workload scheduling issues, and I/O configuration.

Monday

2A

3:15

John Levesque and Jim Schwarzmeier, Cray Inc., and Patrick Worley (ORNL)

Cray X1 Optimization: A Customer's Perspective

The first half of this tutorial will provide an architectural background on the Cray X1/X1E, and programming in Co-Array Fortran. The second half of the tutorial will cover ORNL's performance tuning approach and scalability case studies including POP, GYRO and CAM.

Monday

2B

3:15

Richard Lagerstrom and Mike Karo, Cray Inc.; Bill Nitzberg, Altair/PBSPro
Cray White Paper

Scheduling psched and PBSPro

This tutorial will focus on placement scheduling, including oversubscription and the interaction between psched and PBS Pro. Now that customers have had hands-on experience with the Cray X1 and PBS Pro, this tutorial will discuss different ways of configuring both psched and PBS Pro to achieve the maximum throughput of the system.

Tuesday

3

8:40

Tom Sterling

Keynote Address

Tuesday

3

9:30

Jim Rottsolk, Cray Inc.

Corporate Update: "Jim's Top Ten"

Tuesday

3

10:00

Pete Ungaro, Cray Inc.

HPC Market Update

Tuesday

4

11:00

Dave Kiefer, Cray Inc.

Cray Product Roadmap: Hardware and Software (including X1E)

 

Tuesday

5A

2:00

Patrick Worley (ORNL) and John Levesque, Cray Inc.

The Performance Evolution of the Parallel Ocean Program on the Cray X1

Slides for this presentation.

We describe our experiences in repeated cycles of performance optimization, benchmarking, and performance analysis of the Parallel Ocean Program (POP) on the Cray X1 at Oak Ridge National Laboratory. We discuss the implementation and performance impact of Co-Array Fortran replacements for communication latency-sensitive routines. We also discuss the performance evolution of the system software during 2003-2004, and the impact that this had on POP performance.

Tuesday

5A

2:30

Manojkumar Krishnan, Jarek Nieplocha, and Vinod Tipparaju, PNNL

Exploiting Architectural Support for Shared Memory Communication on the Cray X1 to Optimize Bandwidth Intensive Computations

The Cray X1 supports an extensive set of communication protocols including message passing, shared memory, and remote memory access (RMA). This paper uses parallel dense matrix multiplication as an example of a bandwidth-intensive computational kernel and discusses the tradeoffs associated with optimizing the performance of this kernel using different communication protocols. We will show how the differences in the architectural support for shared memory communication on the SGI Altix and Cray X1 affect the performance of different strategies for the parallel matrix multiplication algorithm. For example, for a matrix size of 12000 on 128 (MSP) processors, a computational rate of 10 GFlop/s per processor was achieved.

Tuesday

5A

3:00

Jeffrey Vetter (ORNL), P.A. Agarwal, R.A. Alexander, A.S. Bland, E.F. D'Azevedo, J.B. Drake, T.H. Dunigan, M.R. Fahey, R.A. Fahey, A. Geist, R.J. Harrison, A. Mezzacappa, T.C. Schulthess, J.S. Vetter, J.B. White, III, P.H. Worley, T. Zacharia, T.H. Dunning, Jr., J.A. Nichols (ORNL); R. Kendall, Ames Laboratory; W.D. Gropp, D. Kaushik, R. Stevens, S. Balay, Argonne National Laboratory; J. Colgan, M.S. Pindzola, Auburn University; D. Keyes, Columbia University; T. Packwood, Cray Inc.; B. Gorda, L. Oliker, H. Simon, Lawrence Berkeley National Laboratory; E. Apra, M. Gordon, M. Krishnakumar, J. Nieplocha, T.L. Windus, Pacific Northwest National Laboratory; J. Dongarra, P. Luszczek, University of Tennessee

ORNL Cray X1 Evaluation Status Report

Over the past year, DOE has been evaluating the Cray X1 at ORNL's Center for Computational Sciences. DOE’s investigation included microbenchmarks, kernels, and applications from areas such as climate, fusion, materials, astrophysics, and biology. In this overview talk, we will share some of our performance results for and experiences with the Cray X1.

Tuesday

5B

2:00

Gail Alverson, Cray Inc.

Cray RS Programming Environment

Red Storm is a massively parallel supercomputer being developed for Sandia National Laboratories, with over 10,000 AMD Opteron processors connected by an innovative high-speed, high-bandwidth 3D mesh interconnect designed by Cray. Cray is developing a product based on the Red Storm architecture and software, called Cray RS. This talk describes the programming environment available on Cray Red Storm and Cray RS. It will outline the compilers, libraries, tools, module support, and launch mechanism that form the integrated environment for the applications developer.

Tuesday

5B

2:30

Janet Lebens and Bryan Hardy, Cray Inc.

UNICOS/mp Common Criteria Evaluation

Slides for this presentation

UNICOS/mp is undergoing a Common Criteria evaluation. The U.S. NIAP (National Information Assurance Partnership) will oversee and approve the certification. This paper will give some background of the Common Criteria and NIAP, and give a status of the UNICOS/mp evaluation.

Tuesday

5B

3:00

Suzanne Kelly (SNLA)

A Use Case Model for RAS in an MPP Environment

Slides for this presentation.

A use case model is an effective way of specifying how Reliability, Availability, and Serviceability (RAS) features would be employed in an operational Massively Parallel Processing (MPP) system. As part of a research project on RAS for MPPs, one such model was developed. A brief introduction to the use case technique will be followed by a discussion of the developed model.

Tuesday

6A-C

Special Interest Group (SIG) Meetings—open to all

These meetings provide a forum to become acquainted with other CUG sites that share your interests and to discuss important issues and areas of special interest. Liaisons from Cray Inc. will be available at each SIG to help address questions and to foster communication between the SIGs and Cray. You are encouraged to attend any of these open meetings.

Tuesday

6A

4:00

Virginia Bedford (ARSC) and Betty Day (DOD), Chair

Systems and Integration SIG

Tuesday

6A

4:45

Jim Glidewell (BCS)

Operating Systems SIG

Tuesday

6B

4:00

Mark Fahey (ORNL) and Chris Catherasoo (JPL), Chair

Applications SIG

Tuesday

6B

4:45

David Gigrich (BCS)

Programming Environment SIG

Tuesday

6C

4:00

Scott Kneller (DOD) and Kenneth Matney, Sr. (ORNL), Chair

Operations SIG

Tuesday

6C

4:45

Mike Pettipher (MCC) and Betty Day (DOD), Chair

User Services SIG

Wednesday

7

7:15

James Schwarzmeier, Cray Inc.

Cray Workshop: Cray X1 Basic Optimization Techniques

Wednesday

8

8:30

Thomas Zacharia (ORNL)

Keynote Address

Wednesday

8

9:15

Paul Terry, Cray Inc.

Cray XD1 Product (OctigaBay)

This talk will provide a brief overview of the new Cray XD1 product.

Wednesday

8

9:45

Nathan Wichmann, Cray Inc.

Corporate Comparisons using HPCC

This talk delves into details of the HPCC benchmark suite and the Cray X1 performance results.

Wednesday

9

11:00

Barbara Horner-Miller (ARSC)

CUG Business, Advisory Council Introduction, Elections, and Review of On Line Tech Forum

Wednesday

9

11:30

Jim Glidewell (BCS), Liam Forbes (ARSC), Arthur Bland (ORNL), and Michael Pettipher (MCC)

CSA (Cray Standard Accounting) Is No More - Panel Discussion

The resource usage tool, Cray Standard Accounting (CSA), that we've all come to know and love is not available on the Cray X1. Several CUG member sites (The Boeing Company, Arctic Region Supercomputing Center, Oak Ridge National Laboratory, etc.) have agreed to participate in this panel session and will discuss the various options available to them in replacing CSA. Overviews of the different resource usage tracking and billing methods utilized by these sites on the Cray X1 will be presented. Issues and problems encountered will also be discussed, along with future direction. If you would like to join this panel, please contact Jim Glidewell or David Gigrich prior to the day of the session.

Wednesday

10A

2:00

Edward Anderson, Maynard Brandt, and Chao Yang, Cray Inc.

LINPACK Benchmark Optimizations on a Virtual Processor Grid

Slides for this presentation.

The "massively parallel" LINPACK benchmark is a familiar measure of high-performance computing performance and a useful check of the scalability of multi-processor computing systems. The Cray X1 performs well on this benchmark, achieving in excess of 90% of peak performance per processor and near linear scalability. This talk will outline Cray's implementation of the LINPACK benchmark code and show how it leads to computational kernels that are easier to optimize than those of ScaLAPACK or the High Performance LINPACK code, HPL. Specific areas in which the algorithm has been tuned for the Cray X1 are in communicating across rows or down columns, in interchanging rows to implement partial pivoting, and in the matrix multiply kernel. The Cray LINPACK benchmark code has also been adapted to run on a virtual processor grid, that is, a p-by-q grid where p times q is a multiple of the number of processors. This feature improves the performance of row or column operations when the number of processors does not factorize neatly. The same technique could be applied to any 2-D grid decomposition in which there are many row and column blocks per processor.

Wednesday

10A

2:45

Nathan Wichmann, Cray Inc.

HPCC Optimizations and Results for the Cray X1

Slides for this presentation.

A new benchmark called HPCC has recently been proposed to evaluate high-performance systems. Cray will discuss the porting and optimization of this new benchmark and show results. The presentation will conclude by comparing the Cray X1 to the competition using HPCC.

Wednesday

10B

2:00

Amar Shan, Cray Inc.

The Cray XD1 Technical Overview

This talk will provide a detailed technical overview of the new Cray XD1 product.

Wednesday

10B

2:45

Peggy Gazzola, Cray Inc.

Cray X1 System Administration: UNICOS/mp 2.4 Update

This session will cover general Cray X1 administration. Discussion topics will include system configuration, system monitoring, and problem diagnosis.

Wednesday

11A

4:00

Joe Swartz (CSCF), Marianne Spurrier, LMCO, and Bruce Black, Cray Inc.

Solution of Out-of-Core Lower-Upper Decomposition for Complex Valued Matrices

Slides for this presentation.

Out-of-core lower-upper decomposition and solution software for complex-valued matrices is developed for the Cray X1 using CAF. Asynchronous I/O and two different disk storage schemes are evaluated in this work.

Wednesday

11A

4:30

Michael Pettipher (MCC)

PARAFEM—Performance of a Suite of Finite Element Analysis Codes on the Cray X1

Slides for this presentation

A suite of finite element analysis codes developed at the University of Manchester has been parallelized and is now being ported to the Cray X1. This talk will report on the performance of these codes and on any porting issues in moving from a scalar to a vector architecture.

Wednesday

11B

4:00

Jim Glidewell and Barry Sharp (BCS)

StorNext SAN on an X1

Slides for this presentation.

The Boeing Company is one of the first commercial sites to own and operate a Cray X1 computer. Once Boeing's Cray T916 and T932 computers are phased out by year end, the DMF system Boeing has come to depend on will no longer be available. This talk will cover the steps found necessary to put the ADIC/StorNext product into production for seven systems: the Cray X1, CPES, JPES (Java Programming Environment), an SGI Origin, and three Linux Networx PC clusters. The problems encountered and the resulting user interface changes will also be described. The advantages and shortcomings of ADIC/StorNext compared to DMF will be addressed, as well as its overall performance, reliability, and operations, along with a brief overview of transitioning the T90/DMF data to StorNext's management.

Wednesday

11B

4:30

Liam Forbes and Jan Julian (ARSC)

An Initial Foray Into Configuring Resources and Reporting Utilization on the Cray X1

Slides for this presentation.

This paper and presentation will cover ARSC's initial (and ongoing) experiences configuring the Cray X1 resource managers, psched and PBS. It will also share our efforts to report resource utilization based upon the accounting mechanisms used by those managers. Our goal is to share what we've already learned and to solicit ideas and information from other X1 sites.

Wednesday

12A

5:00

John Levesque and Nathan Wichmann, Cray Inc.

Workshop: Cray X1 Scaling and Performance

 

Wednesday

12B

5:00

 TBD

BOF

 Open for a BOF

Wednesday

12C

5:00

 TBD

BOF

 Open for a BOF

Thursday

13A

8:30

Stéphane Ethier (ORNL)

Performance Study of the 3D Particle-in-Cell Code GTC on the Cray X1 and Earth Simulator Computers

Slides for this presentation

The particle-in-cell algorithm has always been a challenge for vector computers because of its intrinsic gather-scatter operations. Having been written and optimized for super-scalar architecture, PPPL's Gyrokinetic Toroidal Code was recently ported to the Cray X1 and the Earth Simulator, requiring some code modifications that substantially increased its memory footprint. In this talk, optimization and performance studies will be presented, including some interesting observations related to the architectural differences between the X1 and the Earth Simulator.

Thursday

13A

9:15

Marek Niezgodka, L. Bolikowski, F. Rakowski, W. Rudnicki, and P. Bala (WARSAWU)

Optimization of the Selected Quantum Chemistry Codes on the Cray X1

Slides for this presentation.

In this paper we present the optimization of selected quantum chemistry applications on the Cray X1. In particular, we have optimized a combined density-functional molecular dynamics code used for the investigation of the properties of molecular and biomolecular systems. The density-functional part, namely the tight-binding density-functional code, has been successfully vectorized, as has the standard molecular dynamics part. As a result, good performance has been achieved, which allowed us to begin investigations of complex molecular systems. The GROMOS molecular dynamics code used in this study has also been parallelized using Co-Array Fortran extensions, and its performance will be reported.

Thursday

13A

10:00

Eric Stahlberg and Sankaraganesh Manikantan (OSC); Malali Gowda and Guo-liang Wang, Ohio State University; Jeff Doak, Cray, Inc.

High Performance Genome Scale Comparisons for the SAGE Method Utilizing Cray Bioinformatics Library (CBL) Primitives

Slides for this presentation.

SAGE analysis requires comparing very large numbers of relatively small sequences against many larger sequences varying greatly in size. Additionally, both exact and nearly exact comparisons are of significant interest. A general application has been developed and optimized on the Cray SV1 and X1 employing the CBL and PCBL libraries to enable cross-platform portability while maintaining development and platform execution efficiency. Analysis times have been successively reduced from multiple days to a matter of minutes with appropriate combinations of algorithm choice, method invocation, memory utilization and parallelization.

Thursday

13B

8:30

John Drake and Pat Worley (ORNL), and Matthew Cordery and Ilene Carpenter, Cray Inc.

Experience with the Full CCSM

Slides for this presentation.

We present our experiences and initial performance of the Community Climate System Model (CCSM3) on the Cray X1. This is the primary model for global climate simulation in the US and is being run on a variety of systems for the current IPCC project. The CCSM is the result of a community modeling effort coordinated by NCAR and has demonstrated performance portability across vector and cache based parallel architectures. The application is composed of five executables: a sea ice model (CSIM), a land model (CLM), an ocean model (POP), an atmospheric model (CAM) and a flux coupler (cpl6). Each model communicates with the coupler using MPI and is also parallelized with MPI.

Thursday

13B

9:15

Howard Pritchard, Jeff Nicholson, and Jim Schwarzmeier, Cray Inc.

Optimizing MPI Collectives for X1

Slides for this presentation

Traditionally MPI collective operations have been based on point-to-point messages, with possible optimizations for system topologies and multiple communication protocols. The Cray X1 scatter/gather hardware and shared memory mapping features allow for significantly different approaches to MPI collectives leading to substantial performance gains over standard methods, especially for short message lengths and higher process counts. This paper describes some of the algorithms used, implementation features, and relevant performance data.
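
For context, the "traditional" point-to-point style that such optimizations replace can be illustrated by a binomial-tree broadcast built from MPI_Send and MPI_Recv. The sketch below is a generic illustration, not the Cray implementation described in the paper:

/*
 * Minimal sketch of a broadcast built from point-to-point messages
 * (binomial tree).  The X1 work described above replaces this style of
 * algorithm with scatter/gather hardware and shared-memory mappings.
 */
#include <mpi.h>
#include <stdio.h>

static void p2p_bcast(int *buf, int count, int root, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int rel = (rank - root + size) % size;   /* rank relative to root */

    for (int mask = 1; mask < size; mask <<= 1) {
        if (rel < mask) {
            int dst = rel + mask;            /* already have data: send on */
            if (dst < size)
                MPI_Send(buf, count, MPI_INT, (dst + root) % size, 0, comm);
        } else if (rel < 2 * mask) {
            int src = rel - mask;            /* receive data this round */
            MPI_Recv(buf, count, MPI_INT, (src + root) % size, 0, comm,
                     MPI_STATUS_IGNORE);
        }
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = (rank == 0) ? 42 : -1;
    p2p_bcast(&value, 1, 0, MPI_COMM_WORLD);
    printf("rank %d received %d\n", rank, value);

    MPI_Finalize();
    return 0;
}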

Thursday

13B

10:00

Rolf Rabenseifner and Panagiotis Adamidis (HLRS)

Collective Reduction Operation on Cray X1 and Other Platforms

Slides for this presentation.

Five years of profiling in production mode has shown that more than 40% of the execution time of Message Passing Interface (MPI) routines was spent in the collective communication routines MPI_Allreduce and MPI_Reduce. Although MPI implementations have been available for about 10 years and all vendors are committed to this Message Passing Interface standard, the vendors’ and publicly available reduction algorithms could be accelerated with new algorithms by a factor of between 3 (IBM, sum) and 100 (Cray T3E, maxloc) for long vectors. This paper presents new algorithms optimized for different choices of vector size and number of processes. The focus is on bandwidth-dominated protocols for power-of-two and non-power-of-two numbers of processes, optimizing the load balance in communication and computation.
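
One way to see the bandwidth-oriented idea behind such algorithms is the classic decomposition of an allreduce into a reduce-scatter followed by an allgather. The C sketch below illustrates that decomposition with standard MPI calls; it is not the paper's implementation, and the vector length is an arbitrary choice.

/*
 * Sketch: compose an allreduce from MPI_Reduce_scatter (each rank ends
 * up with the reduced result for one block) followed by MPI_Allgather.
 * Illustrates the decomposition only, not the paper's algorithms.
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int block = 1024;            /* elements per process block */
    const int n = block * size;        /* total vector length        */

    double *in  = malloc(n * sizeof *in);
    double *out = malloc(n * sizeof *out);
    double *mid = malloc(block * sizeof *mid);
    int *counts = malloc(size * sizeof *counts);
    for (int i = 0; i < n; i++) in[i] = rank + i;
    for (int i = 0; i < size; i++) counts[i] = block;

    /* Step 1: reduce-scatter -- each rank gets the sum of its block. */
    MPI_Reduce_scatter(in, mid, counts, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    /* Step 2: allgather -- every rank collects all reduced blocks.   */
    MPI_Allgather(mid, block, MPI_DOUBLE, out, block, MPI_DOUBLE,
                  MPI_COMM_WORLD);

    /* "out" now matches what MPI_Allreduce(in, out, n, ...) computes. */
    if (rank == 0) printf("out[0] = %g\n", out[0]);

    free(in); free(out); free(mid); free(counts);
    MPI_Finalize();
    return 0;
}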

Thursday

14A

11:00

Vinod Tipparaju, Manojkumar Krishnan, Bruce Palmer, and Jarek Nieplocha, PNNL

Optimizing Performance of Global Arrays Toolkit on the Cray X1

Slides for this presentation.

The Cray X1 represents a shared-memory architecture; however, most programming models supported by Cray present it to application developers as a distributed-memory or global address space system. The Global Arrays (GA) toolkit supports a portable shared memory programming environment. With GA, the programmer can view a distributed data structure as a single object and access it as if it resided in shared memory. This approach helps raise the level of abstraction and simplify program composition as compared to programming models with a fragmented memory view (e.g., MPI, Co-Array Fortran, SHMEM, and UPC). In addition to other application areas, GA is a widely used programming model in computational chemistry. We will describe how GA is implemented on the Cray X1, and how the performance of its basic communication primitives was optimized based on the shared memory access capabilities of the hardware.

Thursday

14A

11:30

Adrian Tate (MCC)

Cray X1 ScaLAPACK Optimization

Slides for this presentation.

Cray Inc. and the University of Manchester continue to collaborate on the optimization and development of ScaLAPACK for use on the Cray X1 and its successors. This project has produced a partially optimized BLACS library that uses Co-array Fortran and Shmem instead of MPI. Whilst there are some immediate benefits in making these internal replacements, Cray plans to produce a full Co-array Fortran ScaLAPACK library in the future, for which there are considerable difficulties and concerns. In this paper I will discuss the implementation of the immediate changes to the BLACS, give an explanation of the longer-term strategy and give the theory behind some of those concerns.

Thursday

14A

12:00

Thursday

14B

11:00

Thomas Maier, James B. White III (Trey), and Thomas Schulthess (ORNL)

Towards Full Simulations of High-Temperature Superconductors

Slides for this presentation

The Cray X1 in the Center for Computational Sciences at Oak Ridge National Laboratory is enabling significant new science in the simulation of high-temperature "cuprate" superconductors. We will describe the method of dynamic cluster approximation with quantum Monte Carlo, along with its computational requirements. We will then show the unique capabilities of the X1 for supporting this method, porting experiences, performance, and the resulting new scientific results.

Thursday

14B

11:30

Mark Fahey (ORNL) and Jeff Candy, General Atomics

GYRO—Analyzing New Physics in Record Time

Slides for this presentation

GYRO solves the 5-dimensional gyrokinetic-Maxwell (GKM) equations in shaped plasma geometry, using either a local or global radial domain. It has been ported to a variety of modern MPP platforms including a number of commodity clusters, IBM SPs, and recently the Cray X1. We have been able to design and analyze new physics scenarios in record time: (i) transport barrier studies (submitted to Phys. Plasmas), (ii) a local versus global code comparison (submitted as a Phys. Plasmas letter), and (iii) kinetic electron and finite-beta generalizations of a community-wide benchmark case (currently underway). Current numerical work involves solving a precision problem that occurs only on the X1, re-coding a collision routine to vectorize across tridiagonal solves, and a new parallel implementation for the field solves, which currently replicate work rather than fully distribute it.

Thursday

14B

12:00

Michael Pindzola, F.J. Robicheaux, S.D. Loch, and U. Kleiman, Auburn University; D.R. Schultz, T. Minami, and T-G. Lee (ORNL); D.C. Griffin and C.P. Ballance, Rollins College; J.P. Colgan and C.J. Fontes (LANL); N.R. Badnell, A.D. Whiteford, M.G. O'Mullane, and H.P. Summers, University of Strathclyde; K. Berrington, Sheffield Hallam University

Computational Atomic and Molecular Physics

A consortium has been formed to study the many-body quantal dynamics of atoms and molecules utilizing the most advanced computing platforms. Advances in understanding are applied to many general science areas including: controlled fusion, atmospheric chemistry, cold atom condensate dynamics, laser interactions with matter, and observational astrophysics. Beginning Fall of 2003, several of our AMO research computer codes have been tested on the Cray X1 at the ORNL/CCS. The consortium is supported by grants from the US Department of Energy, the US National Science Foundation, the US National Aeronautics and Space Administration, the UK Engineering and Physical Sciences Research Council, and the UK Particle Physics and Astronomy Research Council. Our AMO research computer codes are run on massively parallel machines at DOE/NERSC, ORNL/CCS, Daresbury/CSE, NSF/SDSC, and LANL.

Thursday

14C

11:00

Thursday

15A

2:00

Scott Parker (UIUCNCSA), Sarah Anderson and Dave Strenski, Cray Inc.

Evaluation of the sPPM Benchmark on the Cray X1

Slides for this presentation.

The sPPM benchmark code solves a 3D gas dynamics problem on a uniform Cartesian mesh using a simplified version of the PPM (Piecewise Parabolic Method) code. It utilizes domain decomposition with message passing for distributed parallelism and may also simultaneously exploit threading for shared memory multiprocessing parallelism. This talk will demonstrate how sPPM was ported and optimized on the Cray X1, achieving 5.9 Gflops on a single MSP and 663 Gflops on 120 MSPs. We'll discuss the relative performance advantages involved in running sPPM and similar codes in possible combinations of the SSP, MSP, OpenMP, and/or MPI programming models, and make comparisons with other computer system platforms and architectures where possible.

Thursday

15A

2:30

James B. White III (Trey) (ORNL)

Dangerously Clever X1 Application Tricks

Slides for this presentation.

I describe some optimization techniques on the Cray X1 that are either profoundly unportable or counterintuitive. For example, you can use small, static co-arrays, Cray pointers, and the "volatile" attribute to pass arbitrary high-bandwidth, minimal-latency messages with no procedure-call overhead. Also, it may be advantageous to bring "if" statements inside "do" loops for vectorization. This talk will show how and why.
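
As a rough C analog of the "if inside the loop" point (the talk itself is Fortran-oriented, and this sketch is illustrative rather than taken from it), a per-element conditional kept inside one long loop can be compiled into masked vector operations, whereas a branchy per-element helper call is much harder to vectorize:

/*
 * Vectorization sketch (illustrative only).  scale_scalar() uses a
 * branchy helper call per element; scale_vector_friendly() keeps the
 * per-element conditional inside the loop, which a vectorizing compiler
 * can turn into a masked merge over full vectors.
 */
#include <stdio.h>
#include <stddef.h>

static double scale_one(double x)
{
    if (x > 0.0)
        return 2.0 * x;
    return -x;
}

/* Scalar habit: a call and a branch per element. */
void scale_scalar(const double *a, double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = scale_one(a[i]);
}

/* Vector-friendly form: the conditional lives inside the loop body. */
void scale_vector_friendly(const double *a, double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        b[i] = (a[i] > 0.0) ? 2.0 * a[i] : -a[i];
}

int main(void)
{
    double a[8] = { -1, 2, -3, 4, -5, 6, -7, 8 };
    double b[8], c[8];

    scale_scalar(a, b, 8);
    scale_vector_friendly(a, c, 8);
    for (int i = 0; i < 8; i++)
        printf("%g %g\n", b[i], c[i]);
    return 0;
}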

Thursday

15A

3:00

Peter Cebull (INEEL)

Experiences in the Performance Analysis and Optimization of a Deterministic Radiation Transport Code on the Cray SV1

Slides for this presentation.

A deterministic radiation transport code that solves the Boltzmann transport equation on three-dimensional tetrahedral meshes was ported to the Cray SV1. Cray's performance analysis tools were used to pinpoint the most significant areas of code, which were then modified to allow full vectorization and improve performance. Timing results are given, including those for some non-vector platforms, and parallel scalability of the OpenMP version of the code is also discussed.

Thursday

15B

2:00

Thomas Baring (ARSC)

Cray X1 Porting Experiences

Slides for this presentation.

ARSC staff and users have ported a number of codes, both parallel and serial, to a 128 processor Cray X1 that was put into full production at ARSC on November 4, 2003. This paper presents some of the challenges and rewards from these porting experiences, and will describe specific optimization techniques attempted and performance realized.

Thursday

15B

2:30

Paul Burton, CGAM and Bob Carruthers, Cray Inc.

Porting the UK Met Office's Unified Model to the Cray X1

Slides for this presentation.

The UK Met Office's Unified Model (UM) is one of the world's leading weather forecasting and climate prediction models. It is used by the Met Office's Hadley Centre and a large group of academic researchers both within and outside the UK for prediction and research into climate change. The Centre for Global Atmospheric Modeling (CGAM) carries out core-strategic research and provides support for the UK's academic community in large-scale atmospheric and coupled modeling. CGAM is a major user of the UK academic community's HPC systems, and as such is closely involved with the selection and evaluation of future systems. In this paper we will present a port of the latest generation of the UM to the Cray X1 and compare and contrast the performance with a number of other platforms, including the NEC SX6 as used by the Met Office's Hadley Centre and various HPC machines currently available to the UK academic community, such as the 1280 CPU IBM p690 system "HPCx."

Thursday

15B

3:00

Vince Wayland (ORNL)

Porting and Performance of PCM on the Cray X1
Slides for this presentation

PCM, the Parallel Climate Model, has been ported to and run extensively on most high performance computing platforms. This talk will describe the experience of porting PCM to the Cray X1 and will compare its performance on the Cray X1 to that on other platforms.

Thursday

16A

4:00

Sam Cable and Thomas Oppe (CSC-VBURG)

Optimization of LESlie3D for the Cray X1 Architecture

Slides for this presentation.

LESlie3D is a Computational Fluid Dynamics (CFD) code that solves the fully compressible Filtered Navier-Stokes equations, the energy equation, and the chemical species equations using an explicit finite-volume approach. A version of this code for structured meshes was included in the Department of Defense High Performance Computing Modernization Program’s (HPCMP) Technology Insertion 2003 (TI-03) benchmark suite used for performance evaluation of HPC platforms. This version was optimized by Cray applications specialists for the X1 platform. This paper presents the techniques used for the optimization, as well as a comparison of the optimized and unoptimized versions of LESlie3D made using counter data obtained from CrayPAT on the X1 and from the Performance Application Program Interface (PAPI) on an IBM Power4.

Thursday

16A

4:30

Alexander Akkerman (FORD); Dimitri Nicolopoulos, MCube; Hervé Chevanne and Dave Strenski, Cray Inc.

Performance Evaluation of Radioss-CFD on the Cray X1

Slides for this presentation.

Aero-acoustics is the study of noise generated by a moving fluid, such as air or water, often interacting with structures. MCube began working with Cray in 2000 to look at a key automotive problem, the noise created by side-view mirrors. Since then they have investigated more complex engineering problems, including exhaust pipes, intakes, HVAC systems and centrifugal and axial fans. This talk will present results from the latest efforts from MCube and Cray in porting and optimizing Radioss-CFD to the Cray X1.

Thursday

16B

4:00

Jim Glidewell (BCS)

Early X1 Experiences at Boeing

Slides for this presentation.

A description of the challenges, issues, and successes we have experienced during delivery and the first few months of production on a Cray X1 in a commercial datacenter environment.

Thursday

16B

4:30

Jan Julian and Liam Forbes (ARSC)

Backup and Recovery of the X1 Complex Internal Machines

Slides for this presentation.

A classic self-imposed destruction of the CPES primary disk led to a review of, and actions to provide, recovery processes for the CPES, CWS, and CNS support processors, which will be described in this paper. Lessons learned in the practice recovery of the CPES, CWS, and CNS disks will also be discussed. Brief conclusions will be drawn on the undesirability of maintaining the support processors as "black boxes."

Thursday

16C

4:00

James B. White III (Trey) and Forrest Hoffman (ORNL); Mariana Vertenstein, NCAR

Adventures in Vectorizing the Community Land Model

Slides for this presentation.

We describe the extensive efforts to modify the Community Land Model for vectorization, following the experimental results described at CUG 2003. We cover both the technical details of the old and new software structures and the sociological details of negotiating allowable modifications with application scientists. We present before and after results for various systems, including the Cray X1 and the Earth Simulator.

Thursday

16C

4:30

George L. Mesina and Peter Cebull (INEEL)

Extreme Vectorization in RELAP5-3D

Slides for this presentation.

RELAP5-3D is the world's most widely used software program for nuclear power plant safety analysis. Performance measurements on a Cray SV1 showed that good gains could be made by vectorizing two subroutines, each of which does most of its calculations in a loop of over 1400 lines of code. Moreover, inlining the many subroutines called from within these loops effectively increases the number of lines to approximately 10,000. Many separate techniques were used to vectorize these loops, and the Megaflop rate of these subroutines increased by up to a factor of 14.

Thursday

17A

5:00

Luiz DeRose, Cray Inc.

Workshop-Cray Tools: Current Status and Future Plans

 

Thursday

17B

5:00

 TBD

BOF

  Open for a BOF

Thursday

17C

5:00

 TBD

BOF

  Open for a BOF

Friday

18A

8:30

Bill Long, Cray Inc.

Fortran 2003

Slides for this presentation.

The final draft of Fortran 2003 will be completed just before CUG 2004. This major revision of the Fortran language includes support for interoperability with C, object-oriented programming, IEEE arithmetic management, and new I/O features. These new features will be summarized along with Cray's implementation plans for the compiler supporting the X1 and follow-on systems. Work on the next revision of Fortran, following Fortran 2003, has already started. Some of the features being discussed will be previewed.

Friday

18A

9:15

Luiz DeRose, Steve Kaufmann, and Bill Homer, Cray Inc.

CrayPat Update

The Cray Performance Analysis Tool, CrayPat, can be used to analyze and evaluate the performance of applications running on the Cray X1 system. By applying CrayPat to an application using a systematic approach, overall performance can be measured, and performance bottlenecks can be identified and isolated. Topics include examples of use, recently added features, and plans for future development, including ease of use.

Friday

18A

10:00

Phillip Merkey (MTU) and Dave Strenski, Cray Inc.

UPC—From Beowulfs to the X1

Slides for this presentation.

Unified Parallel C was developed on and for the T3E and is now available on the full spectrum of high performance machines, from Beowulf clusters to the X1. This talk will address the issues associated with developing a programming model that runs efficiently on platforms with widely varying characteristics. It will also address some of the payoffs, including the ability to use UPC in the classroom and provide students with hands-on experience in High Performance Computing.

Friday

18B

8:30

James Galbraith and Eric Greenwade (INEEL)

Visualization of a Deterministic Radiation Transport Model Using Standard Visualization Tools

Slides for this presentation.

Output from a deterministic radiation transport model running on a Cray SV1 was imported into a standard distributed, parallel, visualization tool for analysis. Standard output files, consisting of tetrahedral meshes, were imported into the visualization tool through the creation of a plug-in module specific to this application. Visualization samples are included, providing visualization of steady state results as well as animations of convergence. Different plot types, operators, and other features are utilized to enhance the analysis and assist in reporting the results of the analysis.

Friday

18B

9:15

Hongzhang Shan (NERSC) and Erich Strohmaier and Leonid Oliker, LBL

Optimizing Performance of Superscalar Codes For a Single Cray X1 MSP Processor

Slides for this presentation.

The scientific high-performance computing community currently has many applications optimized for execution on systems with superscalar processors. To achieve a high percentage of peak performance on the Cray X1 MSP vector processor however, codes have to be structured to enable both vectorized and multi-streamed execution. In this paper we investigate how codes optimized for superscalar processors perform on the Cray X1. We also determine the effort required to optimize their performance.

For this study we select four applications from the SPLASH2 application suite (1-D FFT, Radix, Ocean, and Nbody), two kernels from the NAS benchmark suite (3-D FFT and CG), and a matrix-matrix multiplication kernel. We find that most codes do not perform well on the X1 without optimizations. In some cases, using compiler directives significantly improves the achieved performance. However, most of the codes have to be restructured at the source code level to obtain longer vectors. Sometimes even the choice of algorithm used for the code has to be reconsidered. We have also found that memory bank conflicts can cause substantial performance loss. Bank conflicts arise quite often while increasing the vector length during optimization and have to be avoided explicitly. Finally, we are currently investigating how different strategies for using the cache (a unique feature of the Cray X1's vector architecture) affect performance.
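
As a generic illustration of the bank-conflict point (not code from the paper; the array sizes are arbitrary demo choices), striding through memory with a large power-of-two leading dimension can map successive accesses onto few memory banks, and padding the leading dimension is a common remedy:

/*
 * Bank-conflict sketch (generic, not X1-specific).  With LDA equal to a
 * large power of two, successive elements of a column are separated by a
 * power-of-two stride and can land in few distinct memory banks; padding
 * LDA by one element spreads the accesses across banks.
 */
#include <stdio.h>
#include <stddef.h>

#define N    1024
#define LDA  (N + 1)   /* padded leading dimension; try N to see the contrast */

static double a[N * LDA];

/* Sum one column: the access stride is LDA doubles per iteration. */
static double column_sum(int col)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += a[(size_t)i * LDA + col];
    return s;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[(size_t)i * LDA + j] = 1.0;
    printf("column 0 sum = %g\n", column_sum(0));
    return 0;
}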

Friday

18B

10:00

Wendell Anderson and Marco Lanzagorta (NRL)

Effect of Tripling the Memory Bandwidth on the Cray MTA-2

Slides for this presentation.

In October 2002, the Naval Research Laboratory installed and began operating a 40-processor Cray Multi-Threaded Architecture (MTA-2) computer. In September 2003, a top plane was added to the MTA, increasing the available memory bandwidth from 11 Gigabytes per second to almost 32 Gigabytes per second. At the same time, the basic clock rate was increased from 200 MHz to 220 MHz. To evaluate the actual improvement in solution times for scientific simulations obtained from the addition of the top plane, the three codes used during machine acceptance and a production code that had been run heavily during the first year of operation were rerun, and the running times were analyzed. No changes were made to any of the source codes. All of the codes ran at least 10% faster, with codes that had previously been memory-bandwidth limited executing as much as 3 times faster.

Friday

19A

11:00

Burton Smith, Cray Inc.

Cascade

The talk will provide an overview of the Cascade project, which is being funded by the Defense Advanced Research Projects Agency (DARPA). The project's goal is to provide a new generation of economically viable, scalable, high-productivity computing systems for national security and industrial user communities in the 2007 to 2010 timeframe. The project was originally composed of four selected contractors (Cray Inc., SGI, IBM, and Sun Microsystems Inc.) and structured into three phases.

Friday

19A

11:40

William J. Camp, James L. Tomkins, and Robert W. Leland (SNLA)

Cray Red Storm—a Decadal Architecture

We describe the design philosophy and detailed realization of the Red Storm massively parallel computer architecture. We also lay out its expected performance on major DOE applications. Finally we give a brief status report on the Red Storm development project.

Friday

19A

12:20

Neil Pundit (SNLA)

Next CUG