CUG Proceedings

Applications and Algorithms

Porting LCPFCT to the Cray T3E Using HPF

Jay Boisseau, Bob Sinkovits, and Ken Steube
LCPFCT is an accurate, robust, easy-to-use, publicly-available library of routines for solving nonlinear, time-dependent continuity equations using the Flux-Corrected Transport (FCT) algorithm. LCPFCT has been used in a variety of applications, including fluid dynamics, plasma dynamics, magnetohydrodynamics, and reactive flows. LCPFCT has been optimized for many platforms, including CRAY PVPs. We present the first version of LCPFCT converted to HPF (LCPFCT-HPF) and optimized for the CRAY T3E. We discuss the issues involved in porting and optimizing LCPFCT and present performance data for up to 32 processors on the T3E.
Quantum Chemistry Code on Cray Massively Parallel Platforms

Richard Graham, Carlos Gonzalez, and Walter Thiel
A distributed memory version of the semi-empirical quantum chemistry program MNDO was created from the parallel vector version of this code. MNDO is authored by Professor Walter Thiel of the University of Zurich, and is marketed commercially by Oxford Molecular.

All order N**2 matrices were distributed across PEs to take advantage of the large amount of aggregate memory available on the T3D/E, with ScaLAPACK being used for the distributed linear algebra.
SHMEM, MPI, or PVM paradigms are used for communications and global reductions. All PEs are partitioned into groups, allowing each group to work on an independent part of the computation.
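
As a rough illustration of the two ideas above, the following Python sketch shows one way to split PEs into independent groups and to decide which PE owns a matrix element under a 2-D block-cyclic layout, the layout ScaLAPACK uses. The function names, the contiguous grouping policy, and the block and grid sizes are assumptions made for this example; the abstract does not state how MNDO actually groups PEs or sizes its blocks.

```python
def partition_pes(num_pes, num_groups):
    """Split PE ranks 0..num_pes-1 into contiguous groups of near-equal size.

    Hypothetical helper: the grouping policy is an assumption for illustration.
    """
    if not 1 <= num_groups <= num_pes:
        raise ValueError("need 1 <= num_groups <= num_pes")
    base, extra = divmod(num_pes, num_groups)
    groups = []
    start = 0
    for g in range(num_groups):
        # The first `extra` groups take one additional PE each.
        size = base + (1 if g < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


def block_cyclic_owner(i, j, block, grid_rows, grid_cols):
    """Return the (row, col) grid coordinates of the PE owning element (i, j).

    Assumes square blocks of side `block` dealt out cyclically over a
    grid_rows x grid_cols process grid, starting at grid position (0, 0).
    """
    return ((i // block) % grid_rows, (j // block) % grid_cols)


if __name__ == "__main__":
    print(partition_pes(8, 3))
    print(block_cyclic_owner(5, 2, block=2, grid_rows=2, grid_cols=2))
```

Dealing blocks out cyclically spreads each N**2 matrix evenly over the aggregate memory, while separate groups let unrelated parts of the calculation proceed without synchronizing with each other.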

Performance results based on simulations of a variety of large chemical problems will be presented. In addition, issues associated with the efficient parallelization of the code, including source code implications and scalability, will be discussed.
Speedup Comparison between FE and FD Parallel CFD Codes on C98

J.P. Gregoire and B. Thomas
Parallelism appears to be a promising answer to users' ever-increasing demands for maximal computation speed and maximal mesh size in industrial CFD computations. Two CFD codes that use the same time solver have therefore been parallelized at the Research Center of Electricité de France: N3S, a code based on FEM, and ESTET, a code based on FDM.

In this paper we compare the parallelization of each code step by step, present the two resulting speed-ups and analyse why they are so different.
Capability Prototyping

Michael A. Heroux, Richard D. Laroche, and Clayton D. Andreason
Capability Prototyping is a set of efforts in SGI/Cray Applications to identify, demonstrate and promote new applications capabilities which enhance the value of high-performance computing (HPC). In this paper we describe Capability Prototyping and discuss our strategies in these efforts. We also give a "snapshot" of one current project and a view of where we plan to go next.
Portability and Performance Comparison of the Message Passing Toolkit (MPT) on the Cray J90s and the Cray T3E

Andrea Hudson and Alex Wang
A representative subset of the HPCC/Earth and Space Sciences benchmark codes from NASA Goddard Space Flight Center has been implemented in either PVM or MPI and will be analyzed under the Message Passing Toolkit (MPT) on the Cray J90 systems and the Cray T3E. Portability across the systems will be discussed. Performance issues with respect to scaling and time to solution will be compared.
Q-Chem: A New-Generation Quantum Chemistry Program

Benny Johnson and Richard Graham
Q-Chem is a new quantum chemistry package designed from scratch for MPP architectures. This code uses state-of-the-art quantum-chemical numerical algorithms expressed in efficient and highly scalable computational algorithms. Some computational details will be presented, and performance and scalability data will be discussed based on calculations performed on Cray's T3E platform.
Parallel Computing Applications and Environment on the T3D

Alice E. Koniges and Morris A. Jette
As part of the Parallel Applications Technology Program of Cray Research, a 256-processor T3D was sited at Lawrence Livermore National Laboratory in 1994. Today, that machine has become a workhorse of unclassified supercomputing for the laboratory with utilization rates of 96% or better. This talk will cover the range of applications on the T3D, performance highlights, and information on how to use the MPP platform as a production computer for both industrial and academic applications.
Results for a Finite-Element MHD Code on the T3D and T3E

Alice E. Koniges, S.J. Plimpton, and X. Xueqiao
NIMROD is a finite-element based magnetic fusion energy code. The code achieves parallelism by spatially decomposing the problem into an unstructured collection of structured blocks. The algorithm yields excellent scaling on both the T3D and T3E with the latter yielding a performance improvement factor of approximately 4.5.
Dynamic Programming Algorithm for RNA-Folding on Cray T3E

Siamak Pazirandeh
We have parallelized and optimized Zuker's RNA secondary structure prediction program (MFOLD) on the Cray T3E. Our application is a systolic implementation of a dynamic programming algorithm. It performs nearest neighbor communications to increase message passing efficiency. Moreover, memory accesses have been optimized for the T3E to decrease cache misses. Our code appears to be the fastest available implementation of MFOLD, as well as permitting the prediction of structures for longer RNA molecules than any other current implementation (up to 20,000 bases).
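
To show why a dynamic programming algorithm of this kind suits a systolic, nearest-neighbor implementation, here is a minimal serial Python sketch. It is not Zuker's energy-minimization algorithm: it substitutes the much simpler Nussinov-style recurrence that maximizes the number of base pairs, and the function name, pairing rules, and minimum loop length are assumptions chosen for illustration. The table is filled one anti-diagonal at a time, and every cell on an anti-diagonal depends only on cells from shorter subsequences, so those cells could in principle be computed concurrently.

```python
# Allowed pairs (Watson-Crick plus G-U wobble); an assumption for this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Return the maximum number of nested base pairs in `seq`.

    Simplified stand-in for MFOLD's recurrence: counts pairs instead of
    minimizing free energy. `min_loop` is the minimum number of unpaired
    bases enclosed by any pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    # Sweep anti-diagonals: `span` is j - i, so each pass handles all
    # subsequences of one length. Cells within a pass are independent.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j is left unpaired
            for k in range(i, j - min_loop):  # case: base k pairs with base j
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

In a parallel version, each processor would own a slice of every anti-diagonal and exchange boundary cells with its neighbors between passes; the sketch above only demonstrates the traversal order that makes this possible.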
Performance of Parallel Gaussian 94 on the Cray T3E

C.P. Sosa, J. Ochterski, J. Carpenter, and M.J. Frisch
Gaussian 94 is the latest in the Gaussian series of electronic structure programs. It is an integrated system to model a broad range of molecular systems under a variety of conditions, performing its calculations from the basic laws of quantum chemistry. This new version includes methods and algorithms for scalable parallel systems such as the Cray T3E. Recent developments in parallel programming models combined with scalable parallel systems provide a new and challenging dimension for ab initio electronic structure calculations. On the other hand, scalable parallel systems provide the opportunity to easily add more processors and memory as chemists seek to solve larger problems and get faster solutions.

In this paper we discuss the Gaussian implementation using the Linda programming model on a Cray T3E, a scalable parallel system connected in a bidirectional 3-D torus by a high-bandwidth, low-latency network. In particular, we look at the degree of concurrency as a function of the number of processing elements.
The Model for Cray Research and Silicon Graphics Support of Application Developers

Jeff Zais
Cray Research and Silicon Graphics have in the past independently supported their key application vendors. Many vendors, especially in the fields of science and engineering, have been important to both Cray and Silicon Graphics. A new model for supporting all Cray application developers will be described. Important features include:

Focused technical support by performance engineers in the Cray Applications Division and other groups within Silicon Graphics.
Joint projects involving customers, developers, and the performance engineers, designed to demonstrate new capabilities and tangible value to the customer.
Marketing support from industry specific teams.
On-site hardware.
Access to systems located at the Cray corporate offices and other locations.
Silicon Graphics Developer's program, including the Developer's Forum.
Special considerations for overseas developers.
