Day | Session | Time | Authors | Paper Title | Abstract |
---|---|---|---|---|---|
Monday | 1 | 8:30 and 11:00 | Harvey Wasserman, Darren J. Kerbyson, and Adolfy Hoisie (LANL) | Performance Analysis and Prediction for Large-Scale Scientific Applications | This tutorial provides a methodical, simplified approach to performance analysis and modeling of large-scale, parallel, scientific applications. The heart of the tutorial covers analytical modeling of application scalability using several real case studies. The case studies demonstrate how performance modeling can be used to estimate the performance that can be expected from a future computer system, diagnose system performance "glitches" in comparison with true application performance during system installation, accurately identify performance bottlenecks in existing systems, provide a tuning "roadmap" to application developers, and enable "point-design" studies for computer architects designing new systems. |
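By way of illustration only (this example is ours, not material from the tutorial): analytic scalability models of the kind described here typically predict runtime as per-processor computation plus a communication term that grows with processor count. A minimal sketch in C, with purely hypothetical coefficients:

```c
#include <stdio.h>
#include <math.h>

/* Hypothetical analytic model: T(p) = Tcomp/p + L*log2(p) + V/B.
 * All coefficients below are illustrative placeholders, not measurements. */
double predicted_time(double t_comp, double latency, double volume,
                      double bandwidth, int p)
{
    double compute = t_comp / p;               /* work divided across p CPUs  */
    double comm = latency * log2((double)p)    /* tree-structured collectives */
                + volume / bandwidth;          /* per-iteration data exchange */
    return compute + comm;
}

int main(void)
{
    for (int p = 1; p <= 1024; p *= 2)
        printf("p=%4d  T=%8.4f s\n", p, predicted_time(100.0, 1e-5, 1e6, 1e9, p));
    return 0;
}
```

Fitting the coefficients to measurements on a small machine lets the same formula extrapolate to a larger one, which is the essence of the "point-design" studies mentioned above.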
Monday | 4A | 2:00 and 4:00 | John Levesque, Jim Schwarzmeier, and Nathan Wichmann, Cray Inc. | Cray X1 Optimization | This tutorial will cover important issues of using the Cray X1 system, different coding examples relevant to HPC applications, and Cray X1 performance optimization. The talk will explain the Cray X1's tightly integrated vector MPP hardware and the various methods that can be used to employ the Multi-Streaming Processor (MSP). We will also discuss 32- and 64-bit operation, direct memory access across the entire system using SHMEM, Co-Array Fortran, or UPC, and the system's very high memory interconnect bandwidth. The second part of the talk investigates kernels to illustrate how the compiler optimizes input code and how to program the Cray X1 to get the most out of the MSPs. We will explain how the Cray X1 can be used as a shared memory vector processor, a distributed memory vector processor via MPI, Co-Array Fortran, or UPC, and a combined "hybrid" shared/distributed system. The last portion of the tutorial will cover the optimization of several applications from start to finish, illustrating how the Cray tools are employed to identify and eliminate performance bottlenecks. |
Monday | 4B | 4:00 | Jim Tennessen and Robert Hoehn, Cray Inc. | Cray X1 System Site Planning | This overview will detail the preparations required to ready a site for Cray X1 liquid-cooled and air-cooled systems. The presentation will cover site access, electrical power requirements, circuit requirements, environmental requirements, heat rejection to air, heat rejection to water, water piping requirements for the liquid-cooled chassis, and floor loading and floor cutouts for the various components that make up the Cray X1 system. |
Monday | 4B | 4:45 | Peggy Gazzola, Cray Inc. | Cray X1 System Administration and Configuration | This session will cover general Cray X1 administration and configuration topics. Differences between administration of the Cray X1 system and Cray PVP/T3E systems will be highlighted. |
Tuesday | 5 | 8:45 | Janet Bednarek, University of Dayton | Aeronautical Research in Ohio | Just as Ohio can in many ways claim to be the birthplace of aviation, so it can also claim to be the birthplace of modern aeronautical engineering research: Wilbur and Orville Wright invented not only the airplane but also modern aeronautical engineering research itself. Since the seminal work of the brothers from Dayton, the state of Ohio has been home to a rich variety of important aeronautical research activities. A full accounting of the accomplishments and contributions of aeronautical research in Ohio would take longer than the time allotted. For that reason, I will focus on events at two important nodes of activity: Dayton and Cleveland. |
Tuesday | 6 | 11:30 | Mark Fahey and James White (ORNL) | DOE Ultrascale Evaluation Plan of the Cray X1 | In November of 2002, Oak Ridge National Laboratory organized a workshop to develop an evaluation plan for the Cray X1 using applications of relevance to the Department of Energy (DOE) Office of Science. This workshop was followed by application-specific workshops in fusion science, climate modeling, and materials science, in February and March of 2003. We describe the findings of these workshops and the resulting plan to evaluate the Cray X1 for ultrascale simulation within the DOE. |
Tuesday | 7A | 2:00 | Steve Johnson, Cray Inc. and Paul Rutherford, ADIC | Storage Management and SAN Directions on the Cray X1 | The management of storage and data is a critical capability at all HPC sites. This talk will address the strategy and capability being developed for the Cray X1 system, and Cray's relationship with ADIC (Advanced Digital Information Corp.) in implementing this strategy. Also discussed are the challenges, trends, and technologies in this area of HPC. ADIC product technology and its capabilities will be reviewed as they apply to HPC and the Cray storage management strategy, including DMF migration. |
Tuesday | 7B | 2:00 | Helene Kulsrud (IDA) | Sorting on the Cray X1 | Sorting of 63-bit integers is an important benchmark for high-performance computation. This talk will describe how machine-portable sorting codes fared when ported to the Cray X1 and what was and is being done to improve performance. These codes reveal strengths and weaknesses in parallelization, memory bandwidth, and scalar performance. |
Tuesday | 7A | 2:45 | Jay Blakeborough and Wendy Palm, Cray Inc. | Cray Open Software (COS) | It's time to review how far we've come with the COS software package. The use of "we" is important, as the success of the project and its future continues to rely on the relationships that we've built with our customers and the open source community. This discussion will cover the history, goals, and future plans for the COS software package. |
Tuesday | 7B | 2:45 | Jay Boisseau, Kent Milfeld, and Chona Guiang (TACC) | Exploring the Effects of Hyper-Threading on Scientific Applications | The Texas Advanced Computing Center is replacing its 256-processor Cray T3E with a 512-processor Cray-Dell Linux cluster based on Intel Xeon processors. One of the performance-enhancing architectural features in the latest Xeon/Pentium 4 microarchitecture is Hyper-Threading (HT) technology. HT provides low-level parallelism (simultaneous multithreading) directly within the microarchitectural framework of the processor core. HT can potentially accomplish more work via simultaneous instruction execution and memory latency hiding when the instruction-level parallelism (ILP) of a single thread yields a low rate of instruction retirement per cycle. We will investigate the performance characteristics of this technology on a variety of scientific benchmarks and applications and discuss what kinds of applications might benefit from this technology. |
Tuesday | 8A | 4:00 | Andrew Johnson (NCS-MINN) | Computational Fluid Dynamics Applications on the Cray X1 Architecture: Experiences, Algorithms, and Performance Analysis | We present our experiences and performance results with our in-house computational fluid dynamics (CFD) and computational solid mechanics (CSM) codes on the new Cray X1 parallel/vector/multi-streaming architecture. These codes are fully parallel based on MPI, incorporate mesh partitioning strategies, and are based on finite element methods built for fully unstructured meshes. Our vectorization strategies will be presented along with a detailed performance analysis, including comparisons to other parallel architectures such as the Cray T3E-1200. |
Tuesday | 8B | 4:00 | Greg Fischer and Charlie Carroll, Cray Inc. | Parallel Programming Model Update | This talk will present the status of the various compiler-based parallel programming models available on the Cray X1: UPC, Co-Array Fortran, and OpenMP. Current functionality and performance will be presented, as well as plans for future work. |
Tuesday | 8A | 4:45 | Patrick Worley (ORNL) | Early Performance Evaluation of the Cray X1 at Oak Ridge National Laboratory | Baseline performance data will be presented for a number of kernels and applications on the Cray X1. Applications will be drawn from chemistry, climate, fusion, and materials. Performance diagnostics will be used to indicate where, and how, performance might be improved. |
Tuesday | 8B | 4:45 | Geir Johansen, Cray Inc. | C and C++ Programming for the Vector Processor | The performance of codes executing on the Cray X1 and other Cray PVP machines is maximized by writing code that vectorizes. There are methods of programming an algorithm in C and C++ that improve vector optimization of the code. Code sequences that run well on scalar-processor machines may benefit from restructuring to perform optimally on a vector-processor machine. This paper outlines a style of C and C++ programming that gives the Cray Standard C and C++ compilers a greater opportunity to generate vectorized code. |
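As an illustration in the spirit of the paper (the example below is ours, not the author's): one of the most common C-level obstacles to vectorization is potential pointer aliasing, which the C99 `restrict` qualifier removes.

```c
/* Version likely to inhibit vectorization: the compiler must assume
 * a, b, and c may overlap, so it cannot safely vectorize the loop. */
void scale_add(double *a, double *b, double *c, int n, double s)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}

/* Restructured version: C99 'restrict' asserts no aliasing, letting the
 * compiler generate vector code for the whole loop. */
void scale_add_vec(double *restrict a, const double *restrict b,
                   const double *restrict c, int n, double s)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + s * c[i];
}
```

Vectorization reports from the compiler make it easy to verify whether restructuring of this kind succeeded.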
Wednesday | 10A | 8:00 | Bradford Blasing (NCS-MINN) | Cray X1 Early Experience | The Cray X1 represents the first new supercomputer design from Cray Inc. in six years. Network Computing Services, as prime contractor for the Army High Performance Computing Research Center (AHPCRC), has acquired and installed two of the five early-production air-cooled Cray X1 systems and a liquid-cooled production Cray X1 system. The AHPCRC's interest and involvement in the Cray X1 project date back five years. The AHPCRC's early experience with the Cray X1 operating system will be presented and compared with the current production releases. Lessons learned in porting codes to the Cray X1, along with performance results, will be presented. Access to the Cray X1 is via the DoD High Performance Computing Modernization Program Office version of Kerberos, and experience in porting Kerberos to the Cray X1 will be presented. The AHPCRC's future plans for the Cray X1 in terms of shared file systems, scheduler enhancements, and code porting will also be discussed. |
Wednesday | 10B | 8:00 | Steve Kaufmann and Bill Homer, Cray Inc. | CrayPat: Cray X1 Performance Analysis Tool | To provide adequate performance analysis capabilities on the Cray X1, a new performance analysis tool, CrayPat, was developed. It enables a user to instrument a program and perform multiple experiments, resulting in detailed experiment data files. These files are accepted by a reporting mechanism that allows the user to aggregate, display, format, and export the collected performance data in a myriad of ways. |
Wednesday | 10A | 8:45 | James White (ORNL), Nathan Wichmann and Matthew Cordery, Cray Inc. | An Optimization Experiment with the Community Land Model on the Cray X1 | Version 2.1 of the Community Land Model (CLM) uses data and control structures that challenge the capabilities of the Cray Fortran compiler. We describe an optimization experiment where we modified the CLM data structure within a computationally expensive tree of subroutines and compared performance with the original code. The modifications produce a 20% increase in performance on an IBM p690 and 5.86 to 7.29 times this performance on a Cray X1. The modified code may also be as maintainable and extensible as the original code. |
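The paper's actual modifications are Fortran and specific to CLM; purely as a hedged sketch of the general technique such experiments often involve, here is an array-of-structures layout converted to a structure-of-arrays layout (hypothetical field names), which turns the hot loops into unit-stride, vector-friendly code:

```c
#define NCELLS 10000

/* Original layout: one structure per land cell (hypothetical fields).
 * A loop over a single field strides through memory in large steps. */
struct cell { double temp, moisture, albedo; };
struct cell cells_aos[NCELLS];

/* Restructured layout: one contiguous array per field, giving
 * unit-stride access that vectorizing compilers handle well. */
struct cell_fields {
    double temp[NCELLS];
    double moisture[NCELLS];
    double albedo[NCELLS];
} cells_soa;

void warm_soa(double dt)
{
    for (int i = 0; i < NCELLS; i++)   /* unit-stride, vectorizable */
        cells_soa.temp[i] += dt * (1.0 - cells_soa.albedo[i]);
}
```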
Wednesday | 10B | 8:45 | Bob Moench, Cray Inc. | Etnus TotalView on the Cray X1 | Cray has ported the Etnus TotalView debugger to the Cray X1 platform. This presentation will offer a demonstration of, and tutorial on, using TotalView on the Cray X1 system. A tour will be given of both the graphical user interface and the command-line interface. Extensions specific to the Cray X1, as well as any current limitations, will be covered. |
Wednesday | 13 | 2:00 | Paul Muzio and Richard Walsh (NCS-MINN) | Total Life Cycle Cost Comparison: Cray X1 and Pentium 4 Cluster | We present the results of a comparison of the cost to acquire, install, and operate an integrated system such as the Cray X1 as compared to a large commodity cluster-based system. Included in the analysis is an assessment of facility costs, power, and cooling; systems support costs; projected system utilization rates; sustained performance; and downtime. |
Wednesday | 13 | 2:45 | Paul Muzio (NCS-MINN), Arthur Bland (ORNL), Barbara Horner-Miller (ARSC), Suresh Shukla (BCS), and Per Nyberg, Cray Inc. | Total Cost of Operation (TCO) Panel | The initial cost of hardware is no longer the driving force behind the purchase of today's high-performance computing systems. Other factors associated with the hardware, such as maintenance contracts, licensing agreements, power consumption, staffing, floor space requirements, and so forth must also be considered. Several CUG member sites (ORNL, ARSC, Boeing, etc.) and Cray Inc. have agreed to participate in this panel session to discuss the various "Total Cost of Operation" issues which they believe to be important for systems at their sites. If you would like to join this panel, please contact Paul Muzio, Sally Haerer, or David Gigrich before the day of the session. |
Wednesday | 14A | 4:00 | Jay Boisseau, Kent Milfeld, and Chona Guiang (TACC) | Application Performance on Dual-Processor Cluster Nodes | The Texas Advanced Computing Center is installing a large Cray-Dell cluster based on dual-processor nodes with Intel Xeon processors. The use of dual-processor nodes is extremely common in large Intel-based clusters due to the relatively small incremental cost of adding the second processor. However, this cost is small precisely because the node architecture of dual-processor systems is essentially identical to that of single-processor systems: the pair of processors is forced to share the memory bandwidth in each node. In this presentation we will demonstrate the performance characteristics and cost benefits of dual-processor nodes compared to single-processor nodes on a variety of scientific benchmarks. We will then discuss the effects of memory contention on the performance of scientific applications in general. |
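A hedged sketch of the kind of microbenchmark that exposes this contention (the code is illustrative, not taken from the presentation): a STREAM-style triad whose measured bandwidth drops per process when two copies run on one node, since both share the node's memory bus.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 22)   /* arrays large enough to defeat the caches */

int main(void)
{
    double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b),
           *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (int i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    clock_t t0 = clock();
    for (int i = 0; i < N; i++)            /* STREAM triad kernel */
        a[i] = b[i] + 3.0 * c[i];
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* three arrays of 8-byte doubles moved per element */
    printf("triad bandwidth: %.2f MB/s\n", 3.0 * 8.0 * N / secs / 1e6);
    free(a); free(b); free(c);
    return 0;
}
```

Running one copy, then two copies simultaneously on a dual-processor node, and comparing the per-copy figures gives a direct measure of the shared-bandwidth penalty.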
Wednesday | 14B | 4:00 | Wendell Anderson, Marco Lanzagorta, and C. Stephen Hellberg (NRL) | Analyzing Quantum Systems Using the Cray MTA-2 | Analysis of the thermodynamic properties of many-body quantum systems requires the determination of the smallest eigenvalues of a very sparse matrix. Techniques developed at NRL to solve this problem use a Lanczos method that requires several multiplications of the matrix by a dense vector, an operation ideally suited to the architecture of the Cray MTA. |
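For readers unfamiliar with the kernel in question, a minimal sketch (ours, in C; the NRL codes and their storage format are not published here) of a sparse-matrix times dense-vector multiply in compressed sparse row form, the operation a Lanczos iteration calls repeatedly:

```c
/* y = A*x for a sparse matrix A in compressed sparse row (CSR) form:
 * val holds the nonzeros, col their column indices, and row_ptr[i]
 * the start of row i in val/col. */
void csr_matvec(int n, const int *row_ptr, const int *col,
                const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col[k]];   /* indirect, irregular access */
        y[i] = sum;
    }
}
```

The irregular, indirect access to x[col[k]] is what makes this kernel a good match for the MTA's latency-tolerant multithreading.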
Wednesday | 14A | 4:45 | Harvey Wasserman, Adolfy Hoisie, and Darren J. Kerbyson (LANL) | Performance Modeling the Earth Simulator and ASCI Q | This work gives a detailed analysis of the relative performance of the Earth Simulator and systems built using Alpha processors. Detailed performance models fully encapsulating the behavior of two codes representative of ASCI computations are used. One result of this analysis is the size of an Alpha-based machine that would be required to obtain the same performance as the Earth Simulator. |
Wednesday | 14B | 4:45 | Jonathan Gibson (MCC) | Finite-Element Analysis on the Cray MTA-2 | We present a comparison of the performance of finite-element codes on the Cray MTA-2 with that of more conventional HPC machines, explaining how its efficient memory access can give it a distinct advantage over other machines for such problems. |
Thursday | 16A | 8:00 | James Maltby, Cray Inc. | The Cray BioLib: A High Performance Library for Bioinformatics Applications | The new Cray Bioinformatics Library (BioLib) is designed to perform the genomic searching, sorting, and bit-manipulation operations useful in the analysis of nucleotide and amino acid sequence data. The library makes use of unique Cray vector hardware features and compressed data formats to speed throughput and minimize storage. The features of the library and its performance on sample scientific applications will be described. |
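BioLib's actual formats are proprietary, but the idea behind a compressed nucleotide representation is easy to sketch: with a four-letter alphabet, two bits per base pack four bases into each byte, quartering storage and letting wide registers process many bases per instruction. A hypothetical C sketch, not the BioLib format:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 2-bit encoding: A=0, C=1, G=2, T=3. */
static uint8_t base_code(char b)
{
    switch (b) { case 'C': return 1; case 'G': return 2;
                 case 'T': return 3; default: return 0; /* 'A' */ }
}

/* Pack 4 bases per byte; packed must hold at least (strlen(seq)+3)/4 bytes. */
void pack_sequence(const char *seq, uint8_t *packed)
{
    size_t n = strlen(seq);
    memset(packed, 0, (n + 3) / 4);
    for (size_t i = 0; i < n; i++)
        packed[i / 4] |= base_code(seq[i]) << (2 * (i % 4));
}
```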
Thursday | 16B | 8:00 | Tony Meys (NCS-MINN) | Modeling the Weather on a Cray X1 | The Army High Performance Computing Research Center (AHPCRC) installed two early-production Cray X1 computers in September 2002. This paper will discuss code migration, optimization, testing, and use of two numerical weather prediction codes on the Cray X1: the Fifth-Generation NCAR/Penn State Mesoscale Model (MM5) and the Weather Research and Forecasting (WRF) Model. Early experiences working with these models on the Cray X1 will be discussed along with a sampling of results. |
Thursday | 16A | 8:45 | James Long (ARSC) | The Portable Cray Bioinformatics Library | The Portable Cray Bioinformatics Library, a C implementation of the proprietary Cray version, is designed to compile and run on a variety of Unix platforms. Features of the portable version will be presented, along with basic performance data comparing the portable version against the Cray version on the ARSC SV1ex, in addition to timings on other platforms. |
Thursday | 16B | 8:45 | Alexander Akkerman (Ford) and Dave Strenski, Cray Inc. | Porting FCRASH to the Cray X1 Architecture | FCRASH is an explicit, nonlinear dynamics, finite element code for analyzing the transient dynamic response of three-dimensional solids and structures. The code is used as a safety research tool to simulate vehicle impact at both the full-vehicle and component levels. This paper will describe our experience porting FCRASH to the Cray X1 architecture and present performance results relative to the Cray T90 platform. |
Thursday | 16A | 9:45 | Quinn Li and Johnny C. H. Loke, Miami University of Ohio, Eric Stahlberg (OSC), and Dave Strenski, Cray Inc. | Genome-wide Compilation of mRNA Polyadenylation Signals in Arabidopsis Using Advanced Searching Technologies | New computing technologies have recently become available to search and rapidly analyze large numbers of sequences for critical information. Two technologies are of particular interest: general high-throughput sequencing libraries available on Cray systems such as the SV1, and field-programmable gate arrays (FPGAs) available as add-ons to systems such as the Sun Fire 6800. We will be applying and adapting these technologies to the study of polyadenylation signals in Arabidopsis. Messenger RNA 3'-end processing in eukaryotic cells involves polyadenylation, which is crucial for mRNA stability and for the translational regulation of genes. We are interested in studying the regulatory sequence elements (cis-elements) residing in the mRNAs that direct the processing during polyadenylation reactions. In plants, there is only very limited information about these cis-elements, which is an obstacle to accurate gene annotation in many genome projects. Technologies from both areas will be applied and adapted to characterize regions of significance to polyadenylation. To utilize the TimeLogic DeCypher FPGA system, consensus cis-elements from the cDNAs with known poly(A) sites will be located using a Hidden Markov Model (HMM). Based on the outcome, a model can be established for genome-wide, large-scale searches of the Arabidopsis genome using TimeLogic DeCypher systems. With a similar objective, the capabilities of the Cray SV1 bioinformatics libraries will also be exploited to look for these patterns, but with a modified approach. With these results, we will attempt to confirm the consensus polyadenylation signals in genomic sequences in Arabidopsis with molecular techniques. The availability of complete genomic sequences of many species will enable us to adapt this computational technique to the study of other plants, such as rice. Further results will be presented. |
Thursday | 16B | 9:45 | Rolf Rabenseifner (RUS) | Hybrid Parallel Programming: Performance Problems and Chances | Most HPC systems are clusters of shared memory nodes. Parallel programming must combine distributed memory parallelization across the node interconnect with shared memory parallelization inside each node. Various hybrid MPI+OpenMP programming models are compared with pure MPI. Benchmark results from several platforms are presented. |
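A minimal sketch of the hybrid model being compared (ours, not from the talk): MPI between nodes, OpenMP threads within a node, here with the common MPI_THREAD_FUNNELED arrangement in which only the master thread makes MPI calls.

```c
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    /* FUNNELED: threads exist, but only the master thread calls MPI */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local = 0.0;
    #pragma omp parallel for reduction(+:local)  /* shared-memory level */
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (i + 1.0);

    double global;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
               0, MPI_COMM_WORLD);               /* distributed level */
    if (rank == 0) printf("sum over all ranks: %f\n", global);
    MPI_Finalize();
    return 0;
}
```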
Thursday | 17A | 11:00 | Arthur Bland, Richard Alexander, Don Maxwell, and Ken Matney (ORNL) | Early Operations Experience with the Cray X1 at the Oak Ridge National Laboratory Center for Computational Sciences | Oak Ridge National Laboratory (ORNL) has a history of acquiring, evaluating, and hardening early high-performance computing systems. In the winter of 2003, ORNL acquired one of the first production Cray X1 systems. We describe our experiences with installing, integrating, and maintaining this system, along with plans for expansion and further system development. |
Thursday | 17B | 11:00 | Don Mason, Cray Inc. | Cray System Software Features for the Cray X1 System | This talk will give an overview of the current and planned major software features for the Cray X1. Information on how these features relate to Cray X1 configurations and previous Cray architectures will be presented. |
Thursday | 17A | 11:45 | Michael Pettipher (MCC) | Managing Supercomputing Resources at the University of Manchester | The University of Manchester has provided a supercomputing service for over 30 years. We also provide other national services and support a very large local user community. While the systems on which the services are provided have changed enormously during this period, the objective to provide the user community with the simplest and most flexible way to access such resources has not. The fact that we have been involved in such activities for a long period, and that our staff are very familiar with the requirements of a diverse range of users, has resulted in the desire to provide a single, consistent, flexible system to manage the wide range of services provided. The system now in use satisfies all of these objectives, taking into account specific requirements of the different services. A major objective has been to devolve resource management as far as possible to the end users, thus reducing central management and increasing control by the users themselves. The whole system is web based, thus helping to make it as accessible as possible to the whole user community. This talk will explain the rationale behind the system used, show how it is used both at a user and an administrator level, and also indicate how it will be further developed. |
Thursday | 17B | 11:45 | Tom Goozen, Cray Inc. | Cray X1 MPI Implementation | With the Cray X1 architecture, several algorithm changes were made to take advantage of the high bandwidth between processors. Several key routines were completely rewritten to exploit the Remote Translation Table (RTT) memory access feature in the hardware, providing a very fast and efficient means of moving large amounts of data between processors. The mechanism used for basic sends and receives was changed to provide a very fast "hot path" that reduces latencies. A unique algorithm to reduce barrier times was installed that rivals the Cray T3E hardware barrier. Other examples of algorithm changes will also be presented. |
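The rewritten internals are Cray-specific, but the user-level pattern they accelerate is ordinary point-to-point messaging. For context, a minimal MPI ping-pong in C, the canonical way to observe the latency and bandwidth that such "hot path" changes target:

```c
#include <mpi.h>
#include <stdio.h>

/* Ping-pong between ranks 0 and 1: the basic send/receive pattern
 * whose latency the rewritten MPI internals are meant to reduce. */
int main(int argc, char **argv)
{
    int rank;
    double buf[1024] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Send(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(buf, 1024, MPI_DOUBLE, 1, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(buf, 1024, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```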
Thursday | 18A | 2:00 | Gene McGill (ARSC) | The ARSC Storage Update | ARSC is currently in the process of migrating from a UNICOS/DMF storage model to a Sun Solaris/SAM-QFS model. The new systems support a supercomputing center comprising various computing architectures (Cray T3E, SV1ex, Cray X1, IBM p690 and SP3, SGI) and missions of high-performance computing and visualization support. This paper will describe details of this conversion, touching on user experience, SAM vs. DMF, and STK 9x40B tape drive experiences. |
Thursday | 18B | 2:00 | Mary Beth Hribar, Cray Inc. | Cray X1 Scientific Libraries | The first releases of the Cray X1 Scientific Libraries (libsci) contain BLAS, LAPACK, and FFT routines. This talk will explain the added interfaces for the Cray X1 and how to use the libsci routines. There will be a discussion of current performance and of future plans for the libsci library. |
Thursday | 18A | 2:45 | Jay Blakeborough, Cray Inc. | Cray Networking on Product Line Systems | This discussion will focus on the networking capabilities and plans for currently supported product line systems. The Cray Networking Subsystem (CNS) will be introduced along with current performance reports. Future directions and research plans for improving the scaling of networking on the Cray X1 will also be presented. |
Thursday | 18B | 2:45 | Adrian Tate and Kevin Roy (MCC) | Optimization of ScaLAPACK on the Cray X1 | Scalability of numerical parallel library routines is naturally inhibited by communications overhead, since the sharing of computation amongst processors becomes outweighed by the cost of communications between those processors. Hence, a fixed problem size will naturally reach saturation point in terms of scalability. In the context of ScaLAPACK, this saturation point is often too low (below 32 processors) for use in capability codes. The University of Manchester and Cray Inc. have begun a collaboration that will endeavor to address this issue by way of a complete overhaul of the existing communications layer to ScaLAPACK, thus decreasing the time spent in inter-process communications and increasing the scalability of the library routines. Cray's new X1 architecture is designed to make remote memory referencing much quicker, and specific features allow very quick message passing via Co-Array Fortran. With these features in mind, a suite of new communication subroutines is being developed to produce highly scalable parallel numerical library routines for the Cray X1. |
Thursday | 19A | 4:00 | Michael Karo, Cray Inc. | PBS Pro on the Cray X1 Platform | Resource management software has evolved tremendously over the past several years to address the challenges involved with effectively coordinating, monitoring, and scheduling compute resources in increasingly diverse environments. Cray has adopted PBS Pro from Altair Engineering in order to address these challenges, and provide its customers with a POSIX-compliant, feature-rich workload management suite. Integrated with PScheD on Cray X1 systems, PBS Pro works to provide a unified view of the underlying compute resources. By extending the scheduling and management capabilities of PScheD, users and administrators may utilize available resources with increased efficiency. |
Thursday | 19A | 4:45 | Richard Lagerstrom, Cray Inc. | Cray X1 Application Scheduling | Delivering the power of the Cray X1 to user applications in an efficient way is a scheduling challenge. PScheD for the Cray X1 builds on the scheduling foundations developed for the Cray T3E to offer a flexible and efficient application scheduler to both the system administrator and the users. PScheD, working directly with the hardware, and PBS Pro, working with the resources represented by the hardware, together help satisfy the needs of the users and the organization. |
Friday | 21A | 8:00 | Thomas Baring (ARSC) | SX-6 Compare and Contrast | The Arctic Region Supercomputing Center (ARSC) installed a single 8-CPU Cray SX-6 node and made it available last August to the broader U.S. HPC community for benchmarking and testing. Our experiences thus far suggest that, to those accustomed to traditional Cray PVP systems, the SX-6 architecture and user environments are simultaneously familiar and peculiar. We also note that the per-CPU performance sustained by vectorizable user codes has been gratifying, while parallel speedup has proven more elusive. |
Friday | 21B | 8:00 | Bill Long, Cray Inc. | Fortran 2000 | Fortran 2000 is a major revision of the Fortran standard due for final approval next year. A few of the many new features include improved data abstraction, standardized interoperability with C, enhanced I/O, and facilities for object-oriented programming. The presentation will include a summary of the new features, which of them are already available in the Cray ftn compiler, and our plans for completing the implementation of the new standard. |
Friday | 21A | 8:45 | Andrew Johnson (NCS-MINN) and Cory Quammen, AHPCRC | Large Scale Scientific Visualization on Cray MPP Architectures | We present various algorithms and strategies we use to visualize large scientific data sets in parallel on distributed-memory architectures such as the Cray T3E and Cray X1 systems. The data sets we visualize are usually the result of computational fluid dynamics (CFD) analysis based on unstructured meshes of any element type or mixed-element-type meshes. These algorithms and strategies are incorporated into the "Presto" visualization software developed at the Army HPC Research Center, which can visualize large remote data sets thanks to its client-server implementation framework. |
Friday | 21B | 8:45 | Bracy Elton, Cray Inc. | Getting the Most Out of the FFTs in the Cray X1 Scientific Libraries | We explain how to use the FFT/signal processing routines in the Cray X1 Scientific Libraries, introducing new features in the process. We then discuss performance issues ranging from stride selection to radix-specific considerations. Finally, we present current performance and discuss future directions. |
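The stride advice reflects a classic vector-memory rule of thumb: power-of-two leading dimensions concentrate successive accesses on a few memory banks. As a hedged, library-independent sketch of the usual remedy, padding a 2-D array's leading dimension:

```c
#define NX  1024         /* transform length (a power of two)         */
#define NY  512          /* number of transforms                      */
#define LDX (NX + 1)     /* padded leading dimension: an odd stride   */
                         /* spreads column accesses across banks      */

/* data[y][x]: each row is one transform. Walking down a column now
 * advances by LDX (odd) rather than NX (power-of-two) elements. */
static double data[NY][LDX];

/* Column access pattern typical of multi-dimensional FFT stages:
 * with LDX odd, successive elements map to different memory banks. */
double column_sum(int x)
{
    double s = 0.0;
    for (int y = 0; y < NY; y++)
        s += data[y][x];
    return s;
}
```

Whether padding helps, and by how much, depends on the bank structure of the particular machine; the talk's measured results are the authoritative guide for the X1.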
Friday | 22 | 9:45 | Carl Winstead (JPL) and Vincent McKoy, Caltech | Electron-Molecule Collision Calculations on Vector and MPP Systems | In this talk, we will describe the computational challenges in applying quantum mechanical calculations to the development of collision cross section sets needed to model the interaction of low-energy electrons with fluorocarbon gases used in plasma processing of semiconductors. While many of the required calculations can make effective use of MPP and cluster architectures, others are best suited to vector processors. We will outline the practical motivation for our work and the computational requirements imposed by the nature of the problem. We will then discuss use of the Cray SV-1 in our work, and we will compare performance of our MPP code on past and current platforms, including the Cray T3D and T3E, the Origin 2000, and various clusters. |
Friday | 23 | 11:00 | Brian Koblenz, Cray Inc. | Cray Red Storm | Cray is developing a large-scale MPP system for Sandia National Laboratories. This talk will describe the architecture, performance characteristics, and user environment of this system. |