Performance and Evaluation



Experiences in Data Parallel Programming on Cray MPP Machines

J-Y Berthou and L. Colombet
It is now well known that the data parallel programming model significantly reduces the development time of parallel code compared with the message passing programming model. It is equally well known, however, that data parallel compilers still have considerable progress to make before they are competitive with message passing libraries.

We have implemented, using Craft (the Cray MPP data parallel language) and HPF 1.1, four basic scientific codes and kernels: a two-dimensional Fourier transform, a tensor product, a Monte Carlo application, and a conjugate gradient applied to sparse matrices. The HPF 1.1 programs have been compiled with the xhpf 2.1 compiler from Applied Parallel Research and the pghpf 2.1 compiler from the Portland Group, Inc. We will present the performance obtained and discuss the pros and cons of the data parallel compilers we used.

Moreover, we propose some basic extensions to the HPF directives in order to improve the performance of codes based on irregular data structures, such as the conjugate gradient application.
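
As a minimal illustration (not code from the paper), the kernel at the heart of a conjugate gradient on sparse matrices is a sparse matrix-vector product; its indirect addressing is precisely what strains data parallel compilers and motivates directive extensions for irregular data structures. A C sketch in compressed sparse row form, with hypothetical names:

    /* Sparse matrix-vector product y = A*x in compressed sparse row
       (CSR) form.  Illustrative sketch only; the names row_ptr, col,
       and val are hypothetical, not from the paper. */
    #include <stdio.h>

    static void spmv_csr(int n, const int *row_ptr, const int *col,
                         const double *val, const double *x, double *y)
    {
        for (int i = 0; i < n; i++) {        /* one dot product per row */
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col[k]];   /* indirect access x[col[k]]
                                                defeats static distribution
                                                analysis */
            y[i] = sum;
        }
    }

    int main(void)
    {
        /* 3x3 example: [[4,1,0],[1,3,0],[0,0,2]] */
        int    row_ptr[] = {0, 2, 4, 5};
        int    col[]     = {0, 1, 0, 1, 2};
        double val[]     = {4, 1, 1, 3, 2};
        double x[]       = {1, 2, 3}, y[3];

        spmv_csr(3, row_ptr, col, val, x, y);
        for (int i = 0; i < 3; i++)
            printf("y[%d] = %g\n", i, y[i]);
        return 0;
    }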

Performance Evaluation of Communications and Computations on the Cray T3E

C. Calvin and L. Colombet
In this paper we present an evaluation of the performance of communication and numerical kernels on the Cray T3E. We have implemented all the main communication schemes (point-to-point and collective) using all the message passing libraries available. We have also implemented different basic numerical kernels (FFT and CG) in order to evaluate computational performance and the machine's ability to overlap communication with computation. All these benchmark results are compared with those obtained on the T3D.
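
As a hedged sketch of what a point-to-point scheme in such a benchmark typically looks like, the classic two-process ping-pong is shown below in C, assuming MPI (one of the message passing libraries available on the T3E); it is an illustration, not the authors' code, and must be run with at least two processes:

    /* Minimal MPI ping-pong bandwidth sketch (illustrative only). */
    #include <mpi.h>
    #include <stdio.h>

    #define NBYTES 1048576   /* message size: 1 MB */
    #define REPS   100       /* round trips to time */

    int main(int argc, char **argv)
    {
        static char buf[NBYTES];
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {          /* send, then wait for the echo */
                MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {   /* echo the message back */
                MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)   /* each round trip moves 2*NBYTES */
            printf("bandwidth: %.1f MB/s\n",
                   2.0 * NBYTES * REPS / (t1 - t0) / 1e6);

        MPI_Finalize();
        return 0;
    }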

MARQUISE: An Embedded HPC Demo

Candace S. Culhane, Paul J. Boudreaux, and Ken Sienski
The goal of the MARQUISE project is to demonstrate the impact of advanced packaging technologies on a commercial high performance computer architecture. Multi-Chip Modules (MCMs), diamond substrates, and phase-change spray cooling are used together to shrink a four-processor, 1-GByte-memory version of the Cray J90 supercomputer from a cabinet system down to a 6U VME form factor. Weight is reduced by 83.3% and volume by 80%. Code is currently executing on the JR94.512 test vehicle. The final SOLITAIRE prototype is scheduled for demonstration in the summer of 1997.

Should Users Run Workstation Jobs on Supercomputers?

Allen B. Downey and Victor Hazlewood
In some supercomputing environments, users are required to run editing, compiling, and data-cleaning tasks on a workstation and to use supercomputers only for jobs that require them. This restriction is intended to improve the performance of the supercomputer, but it imposes significant inconvenience on users. In this paper, we examine the workload submitted to the Cray C90 and the Cray T3D at the San Diego Supercomputer Center and observe that "workstation jobs" consume less than 10% of the cycles on these machines. We conclude that the cost of supporting these jobs is small in light of the productivity improvement it provides for users.

Comparing the Performance and Features of Cray T3E, Cray PVP and Scalable Node Systems

Kent Koeninger
Cray T3E, Cray PVP, and Cray Scalable Node systems will be supported into the next century, and each has unique advantages. This talk will compare the performance of various applications, classifying which platforms are optimal for which types of applications. The talk will also compare the features common to, and unique to, these platforms. Benchmark results on Cray T3E, Cray PVP, and Cray Origin 2000 platforms will be given, with projections for Cray J90++, faster Cray T3E, Cray T90-P, SN1, and possibly SN2 performance.

A Survey of Scalability Techniques: Examples of Various Approaches to High-Performance Computing Through Scalable Systems

Kent Koeninger
Scalability is key to high-performance computing. This talk will compare and contrast examples of the hardware, operating systems, languages, and IO used to achieve high performance on scalable systems. Scalable hardware examples will include Symmetric Multiprocessors (SMPs), Arrays (Clusters), Distributed Memory systems (MPPs), and Scalable Symmetric Multiprocessors (S2MP or CC-NUMA). Scalable operating system examples will include single-threaded kernels, multi-threaded kernels, arrayed (clustered) kernels, distributed kernels (Mach and Chorus), and cellular kernels. Scalable language examples will include symmetric-memory multitasking (multiprocessed and multithreaded), message passing (PVM and MPI), single-sided communication (MPI-2), explicit distributed-memory languages (Split-C and F--), and implicit distributed-memory languages (HPF, CRAFT, and others). Scalable IO examples will include IO techniques for SMP, array (cluster), MPP, and S2MP-based scalable systems, along with examples of highly parallel IO (multiple gigabytes per second), including parallel disks and parallel tapes. The examples will include application performance figures that illuminate the strengths and weaknesses of the various approaches.
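
As a minimal illustration of the single-sided style mentioned above, the sketch below uses the MPI-2 one-sided operations (MPI_Win_create, MPI_Put, MPI_Win_fence); it assumes at least two processes and is an example of the style, not code from the talk:

    /* Single-sided communication sketch: rank 0 writes directly into
       rank 1's memory; rank 1 posts no matching receive. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double local = 0.0;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Expose one double per process as a remotely accessible window. */
        MPI_Win_create(&local, sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);            /* open an access epoch */
        if (rank == 0) {
            double msg = 3.14;
            MPI_Put(&msg, 1, MPI_DOUBLE,  /* origin buffer */
                    1, 0, 1, MPI_DOUBLE,  /* target rank 1, displacement 0 */
                    win);
        }
        MPI_Win_fence(0, win);            /* complete the epoch */

        if (rank == 1)
            printf("rank 1 received %g via MPI_Put\n", local);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }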

Applications Performance on the Cray T3E and T3E-900

Nick Nystrom and David O'Neal
Performance of a variety of applications on the Cray T3E and T3E-900 will be presented. In particular, we will analyze the application characteristics that lead to performance higher or lower than would be expected from clock rates and the number of instruction pipes alone.
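
For context on that "expected from clock rates and pipes" baseline: the Alpha 21164 processors in the T3E and T3E-900 are clocked at 300 MHz and 450 MHz respectively and can retire results from two floating-point pipes per cycle, for nominal peaks of 600 and 900 Mflop/s. A trivial C sketch of this arithmetic (an illustration, not from the paper):

    /* Nominal peak = clock rate x floating-point results per cycle. */
    #include <stdio.h>

    int main(void)
    {
        const double t3e_mhz = 300.0, t3e900_mhz = 450.0;
        const int    fp_pipes = 2;   /* add pipe + multiply pipe (21164) */

        printf("T3E     nominal peak: %.0f Mflop/s\n", t3e_mhz    * fp_pipes);
        printf("T3E-900 nominal peak: %.0f Mflop/s\n", t3e900_mhz * fp_pipes);
        printf("nominal speedup: %.2fx\n", t3e900_mhz / t3e_mhz);
        return 0;
    }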

Performance Issues of Intercontinental Computing

Michael Resch and Bruce Loftis
We are conducting metacomputing experiments that connect large Cray T3Es at the Pittsburgh Supercomputing Center and the University of Stuttgart to solve a number of very large problems. The first application is from computational fluid dynamics. We will present our experiences, discuss the relevant performance issues, and offer early conclusions about the feasibility of large-scale intercontinental distributed computing.
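
A first-order feel for the relevant performance issues comes from the standard linear cost model T(n) = latency + n/bandwidth: over an intercontinental link, latency dominates for all but very large messages. The C sketch below uses purely illustrative numbers; the 50 ms latency and 2 MB/s sustained bandwidth are assumptions, not measurements from these experiments:

    /* Effective bandwidth n / T(n) under T(n) = L + n/B for a range of
       message sizes.  All constants are illustrative assumptions. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double latency_s  = 0.050;   /* assumed one-way WAN latency */
        const double bw_bytes_s = 2.0e6;   /* assumed sustained bandwidth */

        for (int e = 3; e <= 8; e++) {     /* 1 KB ... 100 MB messages */
            double n = pow(10.0, e);
            double t = latency_s + n / bw_bytes_s;
            printf("%10.0f bytes: %8.3f s, effective %6.3f MB/s\n",
                   n, t, n / t / 1e6);
        }
        return 0;
    }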

