Even during the initial intensive learning phase with the first CRAY T3E system in Germany, scientists from different research areas made heavy use of parallel applications. The 512 CRAY T3E nodes delivered in December 1996 and a growing number of users require automated and optimized scheduling of batch and interactive work. The talk will give a status update on the CRAY T3E at KFA, describe the implemented scheduling mechanisms, and identify where additional functions are needed.
The first portion of this talk will provide an update on the current plans for Unicos/mk on the T3E. It will include a discussion of progress toward completion of features related to Unicos compatibility and the status of production system features in the field, such as Political Scheduling and Checkpoint/Restart. Development efforts and the features that will assist in IRIX convergence will also be covered.
The second part of this talk will cover the status of performance and scalability from an operating system perspective. Unicos/mk was designed to be a high performance distributed system. Measurements of the T3E operating system will be presented and some comparisons made with other relevant systems.
Fair Share is the standard scheduling algorithm used for political resource control on large, multi-user UNIX systems. Promising equity, Fair Share has instead delivered frustration to its Los Alamos UNICOS users, who perceive misallocations of interactive response within a system of unreasonable complexity.
This paper reviews the design of the Kay/Lauder Fair Share system, as well as its Cray UNICOS implementation, and concludes that the underlying model is inappropriate for interactive control. A new resource manager, Opportunity Scheduling, is then presented. Salient features include: (1) direct management of interactivity (or computing opportunity), (2) job prioritization within resource groups, (3) cooperative memory scheduling, and (4) a simple, user-oriented interface.
The paper then contrasts Opportunity Scheduling with the batch system employed at Los Alamos, where throughput and cycle allocation, rather than computing opportunity, are paramount considerations. It concludes with anecdotal experiences under the system.
Richard Lagerstrom and Stephan Gipp
Large parallel processing environments present great administrative challenges if high utilization of the available resources is a goal. In many cases there is also the need to support critical or time-dependent applications at the same time as development and routine production work are going on.
To deliver high performance and utilization, a modular and highly configurable scheduler named PScheD has been developed. Scheduling as implemented by PScheD (the Political Scheduling Daemon) is based on the concept of scheduling domains. Each domain represents a region of the T3E system that is controlled by a common set of scheduling rules and consists of gang scheduling, load balancing, resource management and fair-share components. External interfaces for administrative and informational access are provided along with support for NQE and site-written scheduling extensions.
T90 Strategic Direction
The T90 session has two parts. The first is a series of short presentations by Marketing, S/W Development, T90 Engineering, and Customer Support to give the latest T90 and T90P product status, and an update on the direction for the T90 series transition to SN2. The second part of the session is an open Q&A for both the Cray participants and the T90 customers.
Operating Systems Update
This talk will address the strategic direction of operating system development and support. Included will be a presentation of release plans for current and future operating system products. A major portion of this presentation is devoted to reaffirming Cray's commitment to present and future customers using UNICOS and UNICOS/mk.