The UNICORE Project:

Uniform Access to Supercomputing over the Web

 

Jim Almond
European Centre for Medium Range Weather Forecasts (ECMWF), Reading, UK
e-mail: j.almond@ecmwf.int

Mathilde Romberg
Forschungszentrum Jülich GmbH, Zentralinstitut für Angewandte Mathematik
D-52425 Jülich

e-mail: m.romberg@fz-juelich.de

 

ABSTRACT:
Supercomputers are becoming more powerful, but are also being concentrated at fewer centres. To fully utilise the potential of such facilities, more uniform, secure, and user-friendly access via the Internet is needed. Beyond these general considerations, this talk will describe the UNICORE project, a large collaboration dedicated to implementing a prototype that addresses these goals.

Keywords:

seamless computing, uniform interface, abstract jobs, batch processing

Motivation

High Performance Computing facilities tend to be increasingly isolated by such deterrents as geographical remoteness, architectural individuality, and the non-uniform operational policies of autonomous centres. The future of such centralised Supercomputing facilities and large scale data resources may depend to a large extent on the development of interfaces for access to HPC resources from the user's desktop in a uniform and user-friendly manner; otherwise, High Performance Computing may fall short of its full potential, becoming increasingly specialised, and less competitive. In the most pessimistic scenario, the volume of the HPC market could fall below the threshold required for its economic survival in the free marketplace. The UNICORE project (Uniform Interface to Computing Resources) addresses these issues using the mechanisms of the World Wide Web (WWW).

Architecture

The Abstract Job Object

The Abstract Job Object (AJO) is the basis for the uniform, neutral specification of requests for computational and data resources - a conceptual representation of a "Job" as a sequence of possibly interdependent operations (AbstractTasks) to be carried out on various computational and data service platforms at collaborating sites. The object-oriented structure and syntax of the AJO and the AbstractTasks it contains are intended to support a specification largely independent of hardware architecture, system software interfaces, and site-specific operational rules. AbstractTasks are currently organised into ExecuteTasks, which carry out computations, and FileTasks, which accomplish data staging.
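The structure described above can be illustrated with a minimal Java sketch. It is not the project's actual class design: only the names AbstractTask, ExecuteTask, and FileTask come from the text, while every field, method, file path, and the site name "site-a" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Base type for one operation inside an abstract job. */
abstract class AbstractTask {
    private final String name;
    private final List<AbstractTask> predecessors = new ArrayList<>();

    AbstractTask(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    /** Records that this task may start only after {@code other} has finished. */
    void dependsOn(AbstractTask other) {
        predecessors.add(other);
    }

    List<AbstractTask> getPredecessors() {
        return Collections.unmodifiableList(predecessors);
    }
}

/** A computation, described without any platform-specific batch syntax. */
class ExecuteTask extends AbstractTask {
    final String program;   // hypothetical: logical name of the program to run
    final int cpuSeconds;   // hypothetical: requested CPU time

    ExecuteTask(String name, String program, int cpuSeconds) {
        super(name);
        this.program = program;
        this.cpuSeconds = cpuSeconds;
    }
}

/** A data staging step that copies a file into or out of the job's workspace. */
class FileTask extends AbstractTask {
    final String source;        // hypothetical location syntax
    final String destination;

    FileTask(String name, String source, String destination) {
        super(name);
        this.source = source;
        this.destination = destination;
    }
}

/** An abstract job: tasks for one target site plus child jobs for other sites. */
class AbstractJob {
    final String targetSite;
    private final List<AbstractTask> tasks = new ArrayList<>();
    private final List<AbstractJob> children = new ArrayList<>();

    AbstractJob(String targetSite) {
        this.targetSite = targetSite;
    }

    <T extends AbstractTask> T add(T task) {
        tasks.add(task);
        return task;
    }

    void addChild(AbstractJob child) {
        children.add(child);
    }

    List<AbstractTask> getTasks() {
        return Collections.unmodifiableList(tasks);
    }

    List<AbstractJob> getChildren() {
        return Collections.unmodifiableList(children);
    }
}

class AjoSketch {
    /** Builds a three-step job: stage input, compute, export the result. */
    static AbstractJob exampleJob() {
        AbstractJob job = new AbstractJob("site-a");
        FileTask stageIn = job.add(new FileTask("stage-in", "archive:/input.dat", "input.dat"));
        ExecuteTask run = job.add(new ExecuteTask("run", "model", 3600));
        FileTask export = job.add(new FileTask("export", "result.dat", "home:/result.dat"));
        run.dependsOn(stageIn);
        export.dependsOn(run);
        return job;
    }

    public static void main(String[] args) {
        for (AbstractTask t : exampleJob().getTasks()) {
            System.out.println(t.getName() + " waits for " + t.getPredecessors().size() + " task(s)");
        }
    }
}
```

Nothing in these classes names a batch system, a queue, or a local login, which is the property the AJO is meant to have: the same object can be handed to any site that understands the abstract description.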

The executable content specified by the AJO can be compared to the Java byte code executed on platforms supporting the Java Virtual Machine. In the UNICORE architecture, the Network Job Supervisor (NJS) described below is analogous to the Java Virtual Machine. An example AJO is shown conceptually in Figure 1.

 

The AJO is also intended to serve as an open interface between components (see Figure 2), allowing multiple, conforming implementations of both the Job Preparation Agent (JPA) and the NJS to be developed and to coexist among UNICORE sites. Within the UNICORE project itself, the AJO is used as the interface specification between the JPA and the NJS, which are being developed independently. The interoperability goes beyond independent development: a single NJS may accept AJOs from different JPAs, where each JPA is designed to support the needs of a specific user community. Likewise, the consignment of a child AJO from one site to another need not be between identical NJS implementations, as long as the AJO standard is supported.

 

Components of the UNICORE Architecture

As summarized in simplified form in Figure 2, the UNICORE architecture consists of three major components:

  1. The Job Preparation Agent (JPA), a Java Applet running within the user's Web browser, incorporates the GUI for the definition and construction of an Abstract Job Object. The Web page containing the JPA can be downloaded by any user who furnishes a valid UNICORE certificate. This certificate also serves to authenticate the user and to provide the user's identity (Ulogin) in the UNICORE domain. The applet itself is digitally signed in order to deter tampering. Using the GUI of the JPA, the user defines the AJO in an abstract manner independent of the native syntax of the target platform. The user will normally specify a target machine at a site where he or she is registered under a local Unix login name (Xlogin). To support this selection, the JPA can be asked to display the availability of resources at alternative UNICORE sites, using information in Resource Pages at candidate sites. Alternatively, if the target site offers multiple machines with common characteristics, the target platform can be specified as a class of required resources, allowing the most appropriate platform(s) to be selected by the Network Job Supervisor in step 3 below. When the AJO has been constructed, it is consigned to the NJS at the target site via the UNICORE Gateway running at the target site. The AJO is accompanied by the user's UNICORE certificate, which is automatically provided by the browser when running under the secure https protocol.
  2. The UNICORE Gateway consists of an https server and a Java Gateway Servlet invoked by the server. An instance of the Gateway is required at each participating site. At sites which run a security firewall, the UNICORE Gateway must be installed as an element of the firewall - essentially as a proxy - which can respect the local security policy. After authenticating the user's UNICORE certificate, the server passes the AJO to the servlet. Based on the user's Ulogin from the certificate, and on user information in a site-specific database, the Gateway Servlet checks the authorization of the user to use the requested resources, and maps the Ulogin to a valid local Unix Xlogin. Finally, the servlet sends the AJO to the NJS within the firewall.
  3. The Network Job Supervisor (NJS) receives the AJO and - subject to plausibility tests - accepts it for processing. At this point, an acknowledgement is returned to the JPA, and the user can exit, or use the JPA to construct further jobs. After setting up the Uspace for the AJO, as described in Section 3.2.3, the NJS translates the AbstractTask objects within the AJO to the syntax of the local target platform, and submits the result to the local Batch Subsystem (BSS). Any child AJO within the AJO being processed is relayed to the specified target site via the UNICORE Gateway. When the actions specified by the AJO have been carried out, the NJS sends a notification to the user, together with collected implicit output (stdout, stderr, and log files), and deletes the Uspace. Any files exported from the Uspace by explicit ExportTasks within the AJO remain available at the location specified by the user.
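The authorization and login mapping performed by the Gateway Servlet in step 2 can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's servlet code: certificate authentication by the https server is taken as already done, the Ulogin is represented as a plain string, and the site-specific database is an in-memory map. The class names, method names, and the example logins are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** One row of the (hypothetical) site-specific user database. */
class SiteUserEntry {
    final String xlogin;                  // local Unix login at this site
    final Set<String> allowedResources;   // resources this user may request

    SiteUserEntry(String xlogin, Set<String> allowedResources) {
        this.xlogin = xlogin;
        this.allowedResources = allowedResources;
    }
}

/** Stands in for the decision logic of the Gateway Servlet. */
class GatewaySketch {
    private final Map<String, SiteUserEntry> userDb = new HashMap<>();

    /** Registers a UNICORE identity together with its local login and permitted resources. */
    void register(String ulogin, String xlogin, String... resources) {
        userDb.put(ulogin, new SiteUserEntry(xlogin, new HashSet<>(Arrays.asList(resources))));
    }

    /**
     * Returns the local Xlogin if the already-authenticated Ulogin is known at
     * this site and may use the requested resource; otherwise returns empty.
     */
    Optional<String> authorize(String ulogin, String resource) {
        SiteUserEntry entry = userDb.get(ulogin);
        if (entry == null || !entry.allowedResources.contains(resource)) {
            return Optional.empty();
        }
        return Optional.of(entry.xlogin);
    }

    public static void main(String[] args) {
        GatewaySketch gateway = new GatewaySketch();
        gateway.register("Jane Doe", "jdoe01", "T3E");
        System.out.println(gateway.authorize("Jane Doe", "T3E").orElse("rejected"));
        System.out.println(gateway.authorize("Jane Doe", "T90").orElse("rejected"));
    }
}
```

Only a request that yields an Xlogin would be forwarded to the NJS behind the firewall; a rejected request never leaves the Gateway.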

Project Organisation

The UNICORE project is funded by the German ministry of education, science, research, and technology (bmb+f) for a period of two years starting in fall 1997. The project partners come from universities, research laboratories, software companies, and computer manufacturers. Five institutions represent the core partners: the German Weather Centre (DWD), the Research Centre Jülich, the computer centre of the University of Stuttgart (RUS), and the software companies Genias and Pallas. Associated partners are the European Centre for Medium Range Weather Forecasts (ECMWF), the Leibniz Computer Centre (LRZ), the Paderborn Centre for Parallel Computing (PC2), the computer centre of the University of Karlsruhe (RUKA), the Konrad-Zuse-Zentrum für Informationstechnik (ZIB), and five computer vendors (Fujitsu (Fecit), IBM, NEC, SGI/Cray, SNI), who have committed to support the development and adapt it to their systems.

The two-year time frame for the development of the UNICORE prototype is structured into three phases:

Phase I (3.Q '98) will develop a first prototype which allows a user to create a simple job through the uniform interface. The job can be executed on one particular target supercomputer. Input data must already be available on the target system, and output data can be stored there. In this phase, Netscape Communicator will be the supported Web browser.

In Phase II (1.Q '99) support is still limited to a single target site per job, but data staging is accomplished automatically within the domain of the target platform. This phase will also provide a GUI for the user to monitor and control jobs in the UNICORE domain.

Phase III (3.Q '99) will support AJOs containing interdependent components to be executed on various platforms at distributed sites, and the automatic staging of required data from remote locations and archives. The example in Figure 1 would require such support. Additional Web browsers will also be supported.


A User Scenario

Example: the EURAD Distributed Computational Model

The use of UNICORE facilities for distributed jobs was introduced conceptually in Figure 1. In this section, we demonstrate these mechanisms for a distributed job suite from the real world.

Figure 3 shows the steps necessary for running the model for the prediction of the distribution of atmospheric pollutants in the EURAD project (Europäisches Ausbreitungs- und Depositionsmodell). This work is being carried out at the University of Cologne, using both SMP machines (the Cray T90) and MPI machines (the Cray T3E), whichever is better suited to each step in the sequence. In addition, data are routinely required from remote sites.

We assume the user has access to Cray T3E systems at remote sites, and to a single local Cray T90.

Traditional Procedures for Running the EURAD Model

Current procedures for running the model involve interactions with the operational environments of at least two platforms, and the transfer of data between platforms as necessary. These steps must be coordinated by the user using individual operations.

First of all, the necessary input data must be transferred to the file space of the (local) T90 target system, after which the first job can be submitted. When this is finished and the exit status checked, the resulting data can be used by the next step. The result from this step must be transferred to the (remote) T3E target system for use in the MPMM step. Results from this step are returned to the local file space for use by the PCTM step, and for storage in the local archive. The use of each platform normally involves unique resource limits, path names, user login values, and scheduling policies. There is no uniform means of transferring jobs and data in a secure fashion. Firewalls at remote sites may dictate the use of special security procedures and authentication mechanisms.

Running the EURAD Model using UNICORE

Using UNICORE, the user specifies all steps and their dependencies in a single session using the GUI of the JPA. The resulting AJO is consigned to a target platform and site specified by the user. Data dependencies specified at this time include both the required pre-existing data and intermediate data produced by ExecuteTasks within the AJO. User identification and authentication at all involved sites is established uniformly by the UNICORE Gateway on the basis of the user's UNICORE certificate; no site-specific login or password is needed. The resolution of interdependencies among AbstractTasks is accomplished automatically by the NJS at involved sites. The user may optionally monitor the progress of the distributed Abstract Job using a single Job Monitor and Controller interface via the local browser. Upon completion, the user is notified by selectable mechanisms, including electronic mail.
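How interdependencies among tasks might be resolved into a valid execution order can be sketched with a standard topological sort (Kahn's algorithm). This is an assumption about one possible technique, not a description of the NJS implementation, and it runs in a single process rather than across sites. The class and method names are hypothetical; "step1" and "step2" are placeholders for the two unnamed T90 steps, while MPMM and PCTM are the step names used in the text.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DependencyResolverSketch {
    // Task name -> names of the tasks that must finish first.
    private final Map<String, List<String>> prerequisites = new LinkedHashMap<>();

    void addTask(String task, String... mustFinishFirst) {
        prerequisites.put(task, List.of(mustFinishFirst));
    }

    /** Returns the tasks in an order that respects every declared dependency. */
    List<String> executionOrder() {
        Map<String, Integer> pending = new LinkedHashMap<>();       // unmet prerequisite counts
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : prerequisites.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String p : e.getValue()) {
                if (!prerequisites.containsKey(p)) {
                    throw new IllegalArgumentException("Unknown prerequisite: " + p);
                }
                dependents.computeIfAbsent(p, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Start with every task that has nothing to wait for.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String task = ready.remove();
            order.add(task);
            // Finishing this task may release the tasks that were waiting on it.
            for (String d : dependents.getOrDefault(task, List.of())) {
                if (pending.merge(d, -1, Integer::sum) == 0) {
                    ready.add(d);
                }
            }
        }
        if (order.size() != prerequisites.size()) {
            throw new IllegalStateException("Cyclic dependency among tasks");
        }
        return order;
    }

    /** The EURAD sequence as described in the text, with placeholder names for unnamed steps. */
    static DependencyResolverSketch euradExample() {
        DependencyResolverSketch r = new DependencyResolverSketch();
        r.addTask("step1");
        r.addTask("step2", "step1");
        r.addTask("MPMM", "step2");
        r.addTask("PCTM", "MPMM");
        r.addTask("archive", "MPMM");
        return r;
    }

    public static void main(String[] args) {
        System.out.println(euradExample().executionOrder());
    }
}
```

In this sketch the PCTM step and the archiving of results both become ready once MPMM finishes, which matches the text: the MPMM results are returned both for use by PCTM and for storage in the local archive.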

Figure 3. Simplified Depiction of the EURAD Distributed Model

 
