CUG SUMMIT 2001 Abstracts

May 21-23, 2001

Indian Wells, California, USA

Final Program Abstracts

Cray Focused Program

Session Day Session Number Session Time Presentation Title Author(s) Abstract
Monday 3A 2:00 SV1e Performance of User Codes Tom Baring and Jeff McAllister (ARSC) The first SV1e processor upgrade at a user site was accomplished at the Arctic Region Supercomputing Center (ARSC) on April 11, 2001. In this talk, we will discuss performance data collected on several significant user codes. Of particular interest are issues related to the SV1/SV1e cache, memory, and MSPs.
Monday 3B 2:00 Architecture, Implementation, and Deployment of a High Performance, High Capacity Resilient Mass Storage Server (RMSS) Beata Sarnowska, Terry Jones, and John Kothe (GRUMMAN) and David Magee and Frank Lovato (NAVO) In late 1998, Logicon ISS began the process of re-engineering the mass storage system at the Naval Oceanographic Office DoD Major Shared Resource Center (NAVOCEANO MSRC). The purpose of this effort was to provide significantly improved file system availability, refresh the technology and the architectural design, and position NAVOCEANO MSRC for emerging technologies. The end configuration was a dual Sun E10000 server with 3 TB of switched Fibre Channel Disk Arrays and STK 9840 tape devices. The theoretical peak capacity of the system is in excess of 2 TB/day of data traffic, managing up to 1 petabyte of storage with total external network throughput of 220 MB/sec. This presentation discusses the technology and considerations used in the design of the NAVOCEANO MSRC Resilient Mass Storage Server (RMSS), the architecture of the RMSS, its implementation and integration, and the transition from the Cray DMF to the RMSS.
Monday 3C 2:00 Performance Debugging on the MTA Preston Briggs and Keith Shields, Cray Inc. One of the attractions of the MTA is ease of programming and a good deal of that ease is due to the software tools. I'll introduce the MTA's tool suite, concentrating on those commonly used for performance tuning.
Monday 3A 2:30 Optimization of SV1 Application Codes in a Production Environment David Turner and David Paul (NERSC) and Mike Stewart, and Bob Thurman, Cray Inc. NERSC's users have migrated PVP code from C90 to J90 to SV1 processors over the last couple of years. Most user codes have not yet realized the full performance potential of the cache-based SV1 processors. Furthermore, NERSC has not configured any MSP processors, out of concern for the effect it would have on overall system throughput. The purpose of this work is to investigate whether sustained high performance can coexist with high system throughput. We first detail the selection of user codes for optimization. These codes are then used to investigate techniques for single-processor optimization, multi-processor optimization, and finally MSP optimization. Included will be a consideration of the implementation of MSP processors in the NERSC production environment and an investigation of the effectiveness of scheduling work in this mixed MSP/SSP environment.
Monday 3B 2:30 SV2 I/O Direction John Badger, Cray Inc. Cray is moving to a new I/O architecture for future products including the SV2. This architecture allows us to quickly take advantage of new technologies in storage and I/O and provide our customers with solutions that integrate well into their overall computing environment. This talk describes the hardware and software components planned for SV2 I/O, and their purpose in the SV2 I/O architecture.
Monday 3C 2:30 An Abstract View of the MTA Architecture Preston Briggs and Keith Shields, Cray Inc. We attack the problem of determining potentially visible triangles from a set of several zillion. Given the MTA's flat shared memory, some problems disappear, leaving the central problem of managing the abundant parallelism.
Monday 3A 3:00 The Parallel Communication and I/O Bandwidth Benchmarks: b_eff and b_eff_io Rolf Rabenseifner (RUS) and Alice E. Koniges (LLNL) The effective communication bandwidth benchmark "b_eff" characterizes the parallel message passing performance of a system. The effective I/O bandwidth benchmark "b_eff_io" measures the parallel I/O capabilities. Both of these benchmarks were developed on a Cray T3E and have two goals: a) to get a detailed insight into the performance strengths and weaknesses of different parallel communication and I/O patterns, and b) to obtain a single bandwidth number that characterizes the average performance of the communication for b_eff, and the I/O subsystem for b_eff_io. Results of the two benchmarks are given for several systems including Cray T3E, IBM SPs, NEC SX-5, and Hitachi SR 8000.
Monday 3B 3:00 Future of Networking on Current Cray Products Jay Blakeborough, Cray Inc. This session will discuss the issues confronting us with networking on currently shipping Cray platforms. It will briefly note the directions that we investigated. The primary focus will be on the current status of the Cray L7R (Layer 7 Router) which we have licensed from Essential Communications for integration into our products. Planned interfaces will be outlined and available performance data on Gigabit Ethernet will be reported.
Tuesday 5A 8:30 TT&D Technical Training and Documentation Kathleen Kroll, Cray Inc. Training is key to making the best use of your Cray Inc. computer system. This presentation covers the technical courses and content offered by Cray Inc. It also provides a calendar of scheduled training sessions and a glimpse of the training facilities in Chippewa Falls, Wisconsin.
Tuesday 5B 8:30 Building a Linux Cluster Cary Whitney (NERSC) This will be a general overview of the PDSF cluster at LBNL and how it is being used in a production environment. In this presentation I will cover these areas: how it all started (PDSF and the LINUX choice); the growth of a cluster and what we have learned; new ground (things being worked on for our cluster); our users (who uses a LINUX cluster?); and where we need to go from here.
Tuesday 5C 8:30 LAPACK 3, LAPACK 95, and Fortran 90 Linkage Issues Edward Anderson (EPA) The Cray Scientific Library (libsci) implementation of the linear algebra package LAPACK differs from the standard netlib version in several important and occasionally annoying ways. The latest LAPACK releases, LAPACK 3 and LAPACK 95, include new subroutines not in libsci, new tests that better exercise the existing software, and new Fortran 90 interfaces that allow dynamic memory allocation of scratch space. In this talk, I will present highlights of a publicly available LAPACK supplement to libsci, with particular emphasis on some enhancements to the standard software. In addition, several open issues related to the linking of Fortran 90 modules will be presented for discussion.
Tuesday 5A 9:00 Software Publications Laurie Mertz, Cray Inc. Cray Inc. delivers documentation to our customers through various methods, from online to print. This session will give an overview of Cray Inc.'s current documentation and delivery mechanisms and thoughts on future documentation and delivery methods.
Tuesday 5B 9:00 Experiences with CRAY UNICOS-MLS and SGI IRIX/TRIX Mats Andersson (NSC-SAAB) The National Supercomputer Centre at Linköping University, Sweden, has been running UNICOS-MLS since 1992 and IRIX/TRIX since summer 2000. NSC serves Swedish academia, the Swedish Meteorological and Hydrological Institute, and SAAB Aircraft, and our users have both military and commercial security demands. The talk will discuss configuration and operational issues.
Tuesday 5C 9:00 Evaluation of C++ Compilers for the Cray T3E Hans-Hermann Frese, Detlef Reichardt, and Philipp Rohwetter (ZIB) The C++ compiler provided with the Cray Programming Environment 3.4 is investigated with respect to the conformance with the ISO/IEC 14882:1998 standard, the availability of the new Standard C++ library and the Standard Template Library (STL). Comparison is made with other C++ compilers, and performance issues are also addressed.
Tuesday 5A 9:30 Cray Inc.'s Programming Environment Release Mechanisms Barbara Mauzy, Cray Inc. This session will give an overview of how Cray Inc. delivers its Programming Environment (PE) software releases. This talk will describe how customers are notified about a PE release; how PE releases are delivered, including using web and ftp connections; and how PE documentation is provided.
Tuesday 5B 9:30 Memory Bandwidth Analysis of CRAY SV1 Beata Sarnowska, Terry Jones, and Dave Magee (GRUMMAN) This presentation reports the results of investigations into the memory bandwidth of CRAY SV1 systems. The relevant architectural issues are discussed to derive a theoretical peak bandwidth. The presentation continues with the results of memory bandwidth benchmark runs performed to develop a real-world performance profile of the SV1 memory subsystem. It concludes by applying the results to develop realistic configuration guidelines.
Tuesday 5C 9:30 An Update on Developments in Co-Array Fortran, Unified Parallel C, and Titanium Robert Numrich, Cray Inc. I will describe developments in three approaches for representing the one-sided explicit message-passing approach to parallel programming as simple extensions to sequential languages. Co-Array Fortran is an extension to Fortran 90, originating within Cray Research and implemented on the CRAY-T3E; Unified Parallel C is an extension to C, originating within the National Security Agency and implemented by Compaq; Titanium is an extension to Java, originating from Berkeley with no current vendor implementation. There is a proposal going forward to the Department of Energy to produce portable versions of all three of these languages, which will make them more attractive to programmers and application developers.
Tuesday 5A 10:00 Cray Inc. Central Service Operations / Problem Escalation Procedures Roger Dagitz, Cray Inc. Cray Inc. has a strong central service organization that provides support services to the World-Wide Service Field Organization in support of our customers. This paper describes the capabilities that exist in CSO to provide logistics services, hardware and software support, technical training, documentation for servicing Cray Inc. systems, and service tools for use throughout the world. This paper also defines the procedures that are being used to ensure timely escalation of Cray's customer problems.
Tuesday 5B 10:00 A Study of the System Memory Bandwidth on an SV1 Using an Automotive Production Workload Alex Akkerman and Chuck Schwab, Ford Motor Company and Joe Kaminski and Dave Strenski, Cray Inc. It is a common perception that the memory bandwidth on the SV1 is a bottleneck that limits the number of processors the system memory can support. For example, according to the STREAM SUM benchmark, the SV1 memory bandwidth saturates at about 20 processors. We questioned whether this saturation point was true for a production system running real automotive applications. These applications might not require the full memory bandwidth allowing us to effectively utilize more than 20 processors. A small scale experiment was performed comparing the system's performance between an SV1 with 20 and 24 processors. Since results from this experiment proved favorable, we expanded it to 32 processors. The design and results of these experiments are presented in this paper.
Tuesday 5C 10:00 High Performance Parallel Programming Models Margaret Cahir, Cray Inc. MPI and OpenMP are the dominant parallel programming models in high performance computing today. Though they have been used successfully, the desire for increased efficiency and ease-of-use still exists. This talk will cover the development efforts within Cray into new programming models which combine compiler support and distributed memory concepts. Issues of using hybrid methods will also be discussed.
Tuesday 6A 11:00 Dynamics in Dense Stellar Clusters: Binary Black Holes in Galactic Centre Marc Hemsendorf and Rainer Spurzem (Heidelberg) and David Merritt (Rutgers U.) We study the dynamics of a massive binary system in a galactic nucleus. For the numerical experiments described in this work, we apply a hybrid "self-consistent field" (SCF) and direct Aarseth N-body integrator (NBODY6), which combines the advantages of the direct force calculation with the efficiency of the field method. The code is intended for use on parallel architectures and is therefore applicable to collisional N-body integrations with extraordinarily large particle numbers (> 10^5). It opens the prospect of simulating the dynamics of globular clusters with realistic collisional relaxation, as well as stellar systems surrounding a supermassive black hole in galactic nuclei.
Tuesday 6B 11:00 From COTS Components to Supercomputers: A Single System Image through Distributed Resource Management Ian Lumb, Platform Computing Distributed resource management (DRM) solutions can deliver a single-system image that spans from COTS components to NUMA-based SMPs to supercomputers. This allows the price/performance compute capacity of the COTS components, and compute capability of the NUMA SMPs and supercomputers, to be realized in practice. By forming attribute associations that highlight the unique capacity/capability characteristics on a system-by-system basis, Platform Computing's Load Sharing Facility (LSF) is able to offer a very high level of transparency to the scientist or engineer. Compelling illustrations of this transparency will consider the submission, monitoring and control of parallel applications involving architecture-specific MPI, MPICH with heterogeneous system architectures, and a hybrid of MPI with OpenMP. A further example demonstrates 'grid computing' capabilities through co-scheduling an MPI parallel application between geographically separated LSF clusters. In an LSF-enabled environment, researchers can focus on their science and engineering, while transparently gaining maximal benefits from their entire computing infrastructure.
Tuesday 6C 11:00 Compiling and Running a Parallel Program on a First Generation Cray Supercluster System Frank Chism, Cray Inc. A brief overview of the tools used and steps required to compile, run, and monitor a parallel program on the Cray SuperCluster will be presented.
Tuesday 6A 11:30 Exact Diagonalization of Large Sparse Matrices: A Challenge for Modern Supercomputers Gerhard Wellein and H. Fehske, Uni-Bayreuth Exact diagonalization of very large sparse matrices is a numerical problem common to various fields in science and engineering. We present an advanced eigenvalue algorithm, the so-called Jacobi-Davidson algorithm, in combination with an efficient parallel matrix-vector multiplication based on the jagged diagonals storage (JDS) format. This JDS implementation allows the calculation of several specified eigenvalues with high accuracy on both vector and RISC processor based supercomputers. Using a 256 processor CRAY T3E-1200 we were able to address fundamental questions of the metal-insulator transition in polaronic systems. In this context we present an extensive performance study on modern supercomputers such as CRAY T3E, NEC SX-series, Fujitsu VPP700 and Hitachi SR8000.
Tuesday 6B 11:30 File Transfer Agent 1.1—Secure Data Movement Among Heterogeneous Systems Martin Tuori, Platform Computing Supercomputers typically operate in combination with other submission hosts (servers and workstations). These transactions often cross organizational boundaries. FTA (File Transfer Agent) has long been the file transfer tool of choice in these environments. In response to customer need, Platform Computing has improved FTA, in the areas of security (choice of NPPA and Kerberos authentication), and cross-platform availability (FTA 1.1 has been ported to all major UNIX and LINUX platforms). This talk will present those improvements in greater detail, and explore areas for future development.
Tuesday 6A 12:00 Shared-Memory Vector Systems Compared Robert Bell (CSIRO) and Guy Robinson (ARSC) The NEC SX-5 and the Cray SV1 are the only shared-memory vector computers currently being marketed. This compares with at least five models a few years ago (J90, T90, SX-4, Fujitsu and Hitachi), with IBM, Digital, Convex, CDC and others having fallen by the wayside in the early 1990s. In this presentation, the architectures of the two survivors, the SX-5 and the SV1, will be compared, and performance comparisons will be given on benchmark and application codes and in areas not usually covered in comparisons, e.g. file systems, network performance, gzip speeds, compilation speeds, scalability, and tools and libraries.
Tuesday 7A 2:00 Operating Systems SIG Open Meeting Chuck Keagle (BCS), Virginia Bedford (ARSC), Tina Butler (NERSC), John Mulholland (CSE), and Cheryl Wampler (LANL) This open meeting will focus on UNICOS, LINUX, and UNICOS Security. Following brief introductions of the Focus Chairs, we will discuss any unanswered questions members might have concerning UNICOS, SV1, SV2, SuperCluster, Cray-NEC alliance, and UNICOS Security. Cray Liaisons and technical experts will be available to comment on various issues. We will also discuss CUG issues such as the SUMMIT format, presentation content, and ways to make CUG better serve its members.
Tuesday 7A 2:45 Programming Environments SIG Open Meeting Hans-Hermann Frese (ZIB), David Gigrich (BCS), and Guy Robinson (ARSC) The Programming Environments SIG invites you to attend its Open Meeting on Tuesday afternoon. After a brief introduction of the SIG's business and Focus areas, we shall start off with a live discussion of various issues concerning Programming Environments, Compilers and Libraries, and Software Tools. Attendees will have the opportunity to discuss their concerns with the liaisons and technical experts from Cray Inc. Your feedback on the SIG's business and recommendations on future activities will be gratefully acknowledged.
Tuesday 7B 2:45 Communications & Data Management SIG Open Meeting Kevin Wohlever (OSC) and Paul Anderson (DOD) This will be a SIG meeting during the Cray portion of the conference. Normal SIG business will be done, including getting additional volunteers to assist with the SIG.
Tuesday 8A 4:00 Cray Q & A Panel Charlie Clark, Cray Inc. Charlie Clark will moderate a panel of representatives from the Cray Hardware Engineering, Software Development, and Service organizations. They will discuss issues and respond to questions on all aspects of Cray Service, Hardware and Software. The questions will come from:
- The CUG Survey
- Questions submitted to the Computer Services SIG chair by email (Leslie Southern, leslie@osc.edu) any time before the CUG meeting
- Questions placed in the OSC site folder during the CUG meeting. NOTE these need to be submitted by the end of day Monday 21st May
- Questions taken from the floor
This tends to be a lively session that addresses a large variety of issues.
Tuesday 8B 4:00 Tutorial: Introduction to UPC Tarek El-Ghazawi, Bill Carlson (IDA), Tom Page (NSA), and Greg Fisher, Cray Inc. The distributed shared memory (DSM) programming model has the promise of delivering the ease of programming of the shared memory model and the flexibility and control offered by message passing. UPC is a parallel extension of ANSI C which follows the DSM model and leverages the related experience of vendors and users, as well as academic research. UPC has been gaining the support of many vendors including Cray. This tutorial will cover the basic concepts and constructs of UPC with programming examples. It will also discuss the status of UPC, performance issues, how to get involved, and future plans with a focus on Cray. See hpc.gmu.edu/~upc for more details.
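
Note on the jagged diagonals storage (JDS) format named in the Tuesday 6A 11:30 abstract: the abstract does not describe the authors' implementation, so the following is only a minimal serial sketch of the general idea, written in Python for readability. Rows are sorted by their number of nonzeros, and the d-th nonzero of every row is stored contiguously as one "jagged diagonal", which gives long stride-one inner loops in the matrix-vector product. The function names to_jds and jds_matvec are hypothetical and chosen for illustration; this is not the parallel code used in the talk.

```python
def to_jds(dense):
    """Convert a dense matrix (list of rows) into JDS arrays (illustrative)."""
    # Compress each row to its nonzeros as (column, value) pairs.
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    # Order row indices by nonzero count, longest first (stable sort).
    perm = sorted(range(len(rows)), key=lambda i: -len(rows[i]))
    values, cols, jd_ptr = [], [], [0]
    max_len = len(rows[perm[0]]) if perm else 0
    for d in range(max_len):
        # Jagged diagonal d holds the d-th nonzero of every row that has one.
        for i in perm:
            if len(rows[i]) <= d:
                break  # rows are sorted, so all later rows are shorter too
            j, v = rows[i][d]
            cols.append(j)
            values.append(v)
        jd_ptr.append(len(values))
    return perm, jd_ptr, cols, values


def jds_matvec(perm, jd_ptr, cols, values, x):
    """Compute y = A @ x from the JDS arrays produced by to_jds."""
    y_perm = [0.0] * len(perm)
    for d in range(len(jd_ptr) - 1):
        start, end = jd_ptr[d], jd_ptr[d + 1]
        # Contiguous sweep over one jagged diagonal.
        for k in range(start, end):
            y_perm[k - start] += values[k] * x[cols[k]]
    # Undo the row permutation.
    y = [0.0] * len(perm)
    for pos, i in enumerate(perm):
        y[i] = y_perm[pos]
    return y


if __name__ == "__main__":
    A = [[1, 0, 2],
         [0, 3, 0],
         [4, 5, 6]]
    print(jds_matvec(*to_jds(A), [1, 2, 3]))
```

In this layout the innermost loop runs over as many rows as still have a d-th nonzero, so the loop length stays close to the matrix dimension for the first few diagonals, which is the property that suits vector processors.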
