Providing Distributed Services in a Heterogeneous Computing Environment

Peter W. Morreale
National Center For Atmospheric Research
P.O. Box 3000, Boulder, Colorado, USA
morreale@ncar.ucar.edu

John Clyne
National Center For Atmospheric Research
P.O. Box 3000, Boulder, Colorado, USA
clyne@ncar.ucar.edu

Craig Ruff
National Center For Atmospheric Research
P.O. Box 3000, Boulder, Colorado, USA
cruff@ncar.ucar.edu

Mark Uris
National Center For Atmospheric Research
P.O. Box 3000, Boulder, Colorado, USA
uris@ncar.ucar.edu

Don Middleton
National Center For Atmospheric Research
P.O. Box 3000, Boulder, Colorado, USA
don@ncar.ucar.edu
ABSTRACT:
The National Center for Atmospheric Research (NCAR) Distributed Computing Services (DCS) project is a five-year effort aimed at providing NCAR supercomputing services on local users' desktops. The initial effort has focused on providing NCAR Mass Storage System (MSS) services to major compute servers, including Cray, SGI, IBM, and Sun systems. The DCS system is designed around OSF's DCE software. DCS makes liberal use of the DCE Remote Procedure Call (RPC) mechanism, as well as the Cell Directory Service (CDS). This paper discusses the design of the NCAR DCS system as currently implemented, as well as future directions.

KEYWORDS:
Mass Storage Systems, DCE, heterogeneous distributed services, RPC

The Distributed Services Project

The Distributed Computing Services (DCS) project was originally designed to provide a number of computing services across the entire network here at the National Center for Atmospheric Research (NCAR). To accomplish this, a flexible and extensible infrastructure capable of handling these services, as well as potential new services, was designed and implemented. This paper discusses the design and implementation of this infrastructure and shares the lessons learned along the way.

A bit of history might be useful here. Like many supercomputing centers and laboratories, NCAR made its computing services available to the end user through a variety of "home-grown" interfaces. Many of these interfaces had little in common either with each other or with those found in today's open systems. In addition, these interfaces were often incomplete, difficult for a new user to understand, and available only on certain machines (due to hardware constraints). Because of the uniqueness of the NCAR computing environment, few commercial technologies were available to meet NCAR's computing needs. Thus we were faced with providing yet another "home-grown" solution.

The DCS plan was to provide a common interface for these services that bears a strong resemblance to existing open environments. This allows users to leverage their existing experience and shortens the learning curve needed to use these services. Another major goal was to make this interface available on a large number of machines, including users' desktops where possible. The original project scope included batch-job submittal to compute servers, print services, access to the NCAR Mass Storage System (MSS), and Data Interchange services.

There is an old adage that says "most problems go away if you wait long enough," and it has proven true here. During the design and development process, a number of new technologies emerged that pared down the original scope of the project. The availability of the SGI/Cray Network Queuing Environment (NQE) eliminated the need for a home-grown job-submittal service. Print services were (essentially) eliminated by the growth of the web; we no longer provide hardcopy services. So of the original four basic services, we are left with MSS access and Data Interchange. These two services have become the focus of the DCS development project.

There are two aspects to providing MSS access: metadata operations and file transfer. Metadata operations include the direct manipulation of those attributes that are unique to files stored on the NCAR MSS. Some of the more common attributes are described in the following section. Users must have access to these attributes to maintain their MSS holdings. Users must also be able to transfer files between the various compute servers and the MSS.

The NCAR Mass Storage System

The NCAR Mass Storage System is a combination of hardware and locally developed software that provides file archive services to the NCAR computing facility. This system currently has 130 terabytes of data spread across some 4.5 million files. The data stored on the MSS increases at approximately 3 terabytes a month. We currently transfer some 18 terabytes a month between the MSS and the compute servers.

The current NCAR Mass Storage System, known as MSS-III, was deployed in 1986 as a replacement for the then-current Ampex Terabit Memory System. Then, as now, the movement of data and metadata operations were separated onto two different paths. These paths consisted of NSC HYPERchannel networking hardware and required custom software to drive the host-system interfaces, as well as custom network protocols to control the movement of data across the HYPERchannel trunks. Connection to the HYPERchannel trunks was costly and distance limited; essentially, only hosts physically close to the computer room were able to attach.

The metadata operations (and some low volume archive data) were moved over a separate network of HYPERchannel hardware and trunks, known as the Mainframe And Server network (MASnet). MASnet, a batch store and forward system, was also used to submit compute jobs to the large systems, and to return job output to the user.

The MSS metadata commands provided a pseudo-synchronous interface on top of the underlying MASnet transport. These commands were originally test programs written to exercise the MSS, and did not have a consistent user interface. However, as often happens, no replacements were written, and these testing commands became the production metadata interface to the MSS.

The MSS has a variety of non-UNIX like features. For example, MSS path names consist of up to 128 characters, and directories are implied (but do not actually exist) by user-supplied slash characters in the path name. When the children of the directory are removed, the directory is also removed.

File security is controlled through the use of read and write passwords, rather than permission bits. Users can specify passwords at file creation time, or later via a MSS metadata command.

All MSS files have a retention period associated with them of 1 to 32767 days. The expiration date of the file is the date the file was last referenced (read, written, or "touched") plus the retention period. Thus if a file is not referenced during the retention period, the file would expire and eventually be deleted from the system.
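
As a concrete illustration, the expiration date is simply the last-reference time plus the retention period. The following minimal C sketch shows that arithmetic; the function and variable names are illustrative, not taken from the MSS code:

    #include <time.h>

    /* Illustrative only: compute an MSS-style expiration date from the
     * last-reference time and a retention period given in days. */
    time_t expiration_date(time_t last_referenced, int retention_days)
    {
        return last_referenced + (time_t)retention_days * 24 * 60 * 60;
    }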

A major design goal of DCS was to present, as much as possible, a POSIX view of the MSS. To that end a number of metadata commands were developed that mimic their UNIX file system counterparts. This approach has several distinct advantages, not the least of which is leveraging user familiarity with UNIX-style filesystems. We have found that today's users are much more knowledgeable about basic UNIX commands such as ls, du, find, etc. Presenting the MSS in this light allows the user to quickly gain confidence on how to use the MSS, and avoid or lessen the learning curve associated with the system.

Implementation

The current implementation of DCS includes a number of POSIX-style clients, several others used to access additional MSS attributes, and a few clients used for DCS management functions. The entire system was written in ANSI C, with a small number of supporting Perl scripts.

The POSIX type clients include:

        msls        - List MSS files.
        msmv        - Rename MSS files.
        msrm        - Remove MSS files.
        mstouch     - Change file access and modification times.
        mspwd       - Print the current working directory.
        msdu        - List MSS file space used.
        msfind      - Locate files based on expressions.
        msrcp       - Transfer files to and from the MSS.

Non-POSIX clients include:

        msallinfo   - List all attributes of MSS files.
        mscdsetup   - Initiate an "mscd"-type command.
        mspasswd    - Change MSS file passwords.
        mschgproj   - Change the project number of an MSS file.
        mscomment   - Modify the comment field of an MSS file.
        msretention - Change the retention period of an MSS file.
        msstage     - Migrate an MSS file between MSS storage devices.
        msrecover   - Recover an MSS file.

DCS management clients include the following:

        dcsq        - List the current requests in the system.
        dcsjlog     - Return the log for DCS requests.
        qmadm       - Administrative access to the queue manager.
        dcsrm       - Remove a DCS request.
        dcswait     - Wait for a specific DCS request to complete.

In the near future a couple of additional clients will be added to the suite. The msexp and msimp clients will provide access to the MSS Data Interchange service; they are currently in the design phase.

Despite the length of this list, there are only three binaries associated with the above command suite: msmetadata, msrcp, and dcsq. All metadata commands are symbolic links to msmetadata, and the DCS management commands are symbolic links to the dcsq binary.

Overall Design of DCS

The Distributed Computing Services system consists of a set of clients making Remote Procedure Calls (RPCs) to DCS servers. At the core of the DCS infrastructure is the Distributed Computing Environment (DCE) from the Open Software Foundation. We make liberal use of the DCE RPC mechanism throughout the DCS system. In addition, DCS also makes use of the DCE pthreads package.

DCS Clients submit requests to the DCS Resource Server, which passes the request along to a set of low-level applications that communicate directly with the MSS to perform the requested operation. The DCS Resource Server consists of a number of different servers, each with a specific purpose. In general a Resource Server consists of a listener, queue manager, request handlers, and the low-level MSS interface. The following diagram shows the relationship and flow of a request through the system.

DCS Clients

By design, DCS clients are kept quite simple. Most DCS clients do not know their own options, nor what operation the request will perform. There were a number of reasons for this design, most important of which was to decrease the amount of maintenance associated with deploying binaries over a large number of heterogeneous machines. By keeping the clients as simple as possible, clients need not be updated when things like options for the command change. In fact, there is only a single executable for each DCS service. The various commands associated with each service are merely symbolic links to the appropriate executable.
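
A common way to implement this single-binary, many-names scheme is to dispatch on the name under which the program was invoked. The sketch below illustrates the idea; it is not the actual msmetadata source, and the command handling shown is a placeholder:

    #include <stdio.h>
    #include <string.h>

    /* Illustrative argv[0] dispatch: one executable, many symbolic links. */
    int main(int argc, char **argv)
    {
        const char *name = strrchr(argv[0], '/');
        name = (name != NULL) ? name + 1 : argv[0];

        if (strcmp(name, "msls") == 0)
            printf("would package an 'ls'-style request\n");
        else if (strcmp(name, "msrm") == 0)
            printf("would package an 'rm'-style request\n");
        else
            printf("unknown command name: %s\n", name);
        return 0;
    }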

Typically, clients package up their arguments and submit the request to the listener. At this point the typical client will become a DCE RPC server itself, waiting for the results of the request from the request handler. The client (in this case, now a server) exports a "Resource Client Interface," which contains procedures for communication through the process's standard output (stdout) and standard error (stderr) devices, as well as for terminating the client. Other clients may export other interfaces as well, depending on the service they provide.

The DCS Resource Server

As previously mentioned, the DCS Resource Server is a combination of different servers, each of which performs specific actions on a request. This approach allows a flexible, extensible system that can easily grow as future needs dictate. In addition, this separation allows us to update the different parts of the Resource Server with minimal impact on the other servers. There are four distinct parts to a DCS Resource Server. These parts and their major functions are outlined below:

The Listener

This server handles all incoming requests from the clients. The major functions of the listener are to perform some simple option validations, and queue the request with the queue manager. If the validations fail, the request is rejected immediately and no further processing is involved. This helps prevent any resource from being consumed for simple, easily detectable errors.

The listener assigns a queue and priority to the request on the basis of the command name and options specified in the command line. Commands assumed to require large amounts of resource to complete are assigned a lower priority than those assumed to consume fewer resources.
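
One simple way to express such a policy is a static table mapping command names to a queue complex and a default priority. The table below is a hypothetical sketch; the listener's actual queue names and priority assignments may differ:

    /* Hypothetical command-to-priority table; illustrative only. */
    struct cmd_policy {
        const char *command;
        const char *queue;      /* queue complex name */
        const char *priority;   /* high, reg, or low */
    };

    static const struct cmd_policy policies[] = {
        { "mspwd",  "metadata", "high" },  /* trivial, inexpensive request */
        { "msls",   "metadata", "reg"  },
        { "msfind", "metadata", "low"  },  /* may walk large trees */
        { "msrcp",  "xfer",     "reg"  },
    };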

Once the listener submits the request to the queue manager, its interaction with the request is complete.

The Queue Manager

The queue manager provides three distinct functions in the system. First, it provides a set of queue complexes, one for each service DCS provides. It also provides persistence for requests in the event the Resource Server host dies or crashes. Finally, it maintains a user-accessible log of each request, which is updated as the request is processed.

Each queue complex consists of a set of priority sub-queues. The listener submits the request to a particular complex with an associated priority. The priorities are named high, reg, low, and super; the super sub-queue is reserved for administrative purposes. Each sub-queue has a limit on the number of concurrently active requests it may have. When the limit is reached, the selection process blocks until notified that an outstanding request has completed. This scheme prevents, for example, a series of low-priority requests from consuming all available MSS resources and causing high-priority requests to wait.
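
The blocking behavior can be pictured as a counting limit guarded by a condition variable. The fragment below is a conceptual sketch using POSIX threads, not the queue manager's actual code:

    #include <pthread.h>

    /* Conceptual sketch of a per-sub-queue active-request limit. */
    struct subqueue {
        pthread_mutex_t lock;
        pthread_cond_t  slot_freed;
        int             active;    /* requests currently running */
        int             limit;     /* maximum concurrent requests */
    };

    void subqueue_acquire(struct subqueue *q)
    {
        pthread_mutex_lock(&q->lock);
        while (q->active >= q->limit)          /* block until a slot opens */
            pthread_cond_wait(&q->slot_freed, &q->lock);
        q->active++;
        pthread_mutex_unlock(&q->lock);
    }

    void subqueue_release(struct subqueue *q)
    {
        pthread_mutex_lock(&q->lock);
        q->active--;
        pthread_cond_signal(&q->slot_freed);   /* wake a waiting selector */
        pthread_mutex_unlock(&q->lock);
    }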

The queue manager also maintains a log of the request as it passes through the system. The queue manager interface includes procedures that allow request handlers to update this log with things like command error messages, timestamps, and other pieces of information.

The Request Handlers

Request handlers provide the actual processing of the request. There are currently three distinct types of request handlers: one for metadata, another for file transfer, and a third for DCS management/query functions. A fourth request handler, for the Data Interchange service, is in the design phase as of this writing.

Request handlers currently enforce an active-user limit, that is, a limit on the number of concurrently active requests owned by a single user. When that limit is reached, the request handler sends the request back to the queue manager for later processing. This prevents a single user from consuming all the MSS resources. Ideally, this limit should be the responsibility of the queue manager, and a planned upgrade will move it there.

For metadata operations, the metadata request handler interacts with the low-level MSS metadata command suite. In general, the metadata request handler dequeues a request from the queue manager, exec's the low-level metadata command, reads the stdout/stderr pipes from the low-level command, and transmits the output back to the original client via the Resource Client Interface. When the low-level command terminates, the request handler calls a resource interface procedure to terminate the client. Recall that after the client submits the request to the listener, the client becomes a server itself, waiting both for the output from the request and for termination.
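
The pipe-and-exec portion of that sequence looks roughly like the sketch below. The RPC call that relays output to the client is represented by a placeholder function, relay_to_client(), which is not a real API:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    /* relay_to_client() stands in for the Resource Client Interface RPC
     * that writes output back to the remote client; it is not a real API. */
    extern void relay_to_client(const char *buf, ssize_t len);

    void run_metadata_command(char *const argv[])
    {
        int pfd[2];
        pid_t pid;
        char buf[4096];
        ssize_t n;

        if (pipe(pfd) != 0)
            return;
        pid = fork();
        if (pid == 0) {                        /* child: the low-level command */
            dup2(pfd[1], STDOUT_FILENO);
            dup2(pfd[1], STDERR_FILENO);
            close(pfd[0]);
            close(pfd[1]);
            execvp(argv[0], argv);
            _exit(127);
        }
        close(pfd[1]);                         /* parent: read and relay output */
        while ((n = read(pfd[0], buf, sizeof(buf))) > 0)
            relay_to_client(buf, n);
        close(pfd[0]);
        waitpid(pid, NULL, 0);
    }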

The queue manager request handler processes requests for queue manager status/management functions. The handler interacts with the queue manager to obtain the required data, formats the data and forwards this output back to the client via the Resource Client interface.

The File Transfer request handler handles requests from msrcp clients. This handler performs local and MSS path expansion and forwards requests to one or more File Mover servers. This interaction is fully described in the File Transfer section below.

Request handlers are also responsible for monitoring the client to ensure the request is still valid. To accomplish this the request handler creates a "cancellation" thread at initialization. As the handler receives requests, the request is added to a list of active requests within the handler. The cancellation thread polls both the queue manager and the client to ensure that the request is still valid. If the client is unreachable for any reason, the handler halts further processing of the request.

Clients also create a cancellation thread during initialization. This thread waits until the associated request handler first pings the client, then sets a timeout value within its loop. The timeout value is reset every time the request handler invokes the Resource Client Interface ping procedure. If the timeout period is reached, the client automatically terminates itself. This functionality is necessary because, at this point, the client is a server waiting for a response from the request handler; should the request handler die for any reason, this mechanism prevents clients from living indefinitely.
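
The client-side watchdog can be thought of as a loop of the following shape. This is a conceptual POSIX-threads sketch rather than the DCS client code; the names and the timeout value are illustrative assumptions:

    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <time.h>

    /* last_ping would be updated by the Resource Client Interface ping
     * procedure; names and the 300-second timeout are illustrative. */
    static time_t last_ping = 0;
    static pthread_mutex_t ping_lock = PTHREAD_MUTEX_INITIALIZER;

    void note_ping(void)                       /* called when a ping arrives */
    {
        pthread_mutex_lock(&ping_lock);
        last_ping = time(NULL);
        pthread_mutex_unlock(&ping_lock);
    }

    void *watchdog(void *arg)
    {
        const int timeout = 300;               /* seconds without a ping */
        (void)arg;
        for (;;) {
            time_t last;
            sleep(60);
            pthread_mutex_lock(&ping_lock);
            last = last_ping;
            pthread_mutex_unlock(&ping_lock);
            if (last != 0 && time(NULL) - last > timeout)
                exit(1);                       /* request handler presumed dead */
        }
        return NULL;
    }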

The low-level metadata commands

The low-level metadata command suite interacts directly with the MSS. It is the responsibility of the low-level metadata commands to present the illusion of a (mostly) POSIX-compliant file system with extensions. The MSS does not implement a POSIX-compliant file system. Directories pop into existence as needed, and disappear when all children are removed or purged. There are no permission bits. Extras such as file comments, a retention period, and read and write passwords are present. Absolute path names are always required, and are limited to 128 characters.

A single program, mssif, is the gatekeeper to the MSS metadata operations. It is called by the metadata request handler with the user's command line. mssif is responsible for checking the syntax of the options and for the expansion of shell patterns (globbing) to absolute MSS path names. mssif internally implements the simpler metadata operations and msmv. It farms out msdu, msfind, msls, msrawinfo, and fminfo (a support program for msrcp) to other subsidiary programs for execution. The msfind command may, in turn, call mssif for further processing if the user has given the '-exec' option with another of the metadata commands.

This separation between the metadata request handler and the low-level interface allows the metadata commands to be updated independently, and the internal implementation to be changed without affecting the upper-level DCS commands. The low-level metadata commands can also be used by two other pre-DCS remote access systems that provide their own queuing and request management.

To make things easy, we decided to write an interface (msslib) that emulates POSIX-style system calls, with versions of stat, opendir/readdir/closedir, and the other metadata operations. This allows us to adapt existing POSIX software. The library code for globbing and directory-tree walking, as well as the commands msdu, msfind, and msls, are modified versions of their NetBSD equivalents. The MSS "current working directory" must be faked with the help of the user's shell, an environment variable, and the msmetadata command.
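
The faked current working directory amounts to prepending the value of an environment variable to relative MSS paths. The sketch below illustrates the idea; the variable name MSS_CWD and the function name are assumptions, not necessarily what DCS uses:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Illustrative only: resolve an MSS path against a faked current
     * working directory kept in an environment variable. */
    int ms_resolve(const char *path, char *out, size_t outlen)
    {
        if (path[0] == '/') {                  /* already absolute */
            if (strlen(path) >= outlen)
                return -1;
            strcpy(out, path);
            return 0;
        } else {
            const char *cwd = getenv("MSS_CWD");   /* hypothetical variable */
            if (cwd == NULL)
                return -1;                     /* no working directory set */
            if (strlen(cwd) + 1 + strlen(path) >= outlen)
                return -1;
            sprintf(out, "%s/%s", cwd, path);
            return 0;
        }
    }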

mssif and the other commands are linked against msslib, which is made up of support routines and client RPC stubs that talk via DCE RPCs to the request concentrator, mssrv. Some state is maintained by msslib while talking to mssrv for the readdir and similar routines. mssrv serializes the requests and converts them to ONC RPCs for shipment to the metadata server (MSSMETA) that resides on the MSCP running MVS. The mssrv and MSSMETA programs can be run in parallel to improve aggregate throughput.

DCS File Transfer

File transfer to and from the MSS is, without a doubt, the most complex portion of the entire system. Historically, users were able to transfer files to and from the MSS via a small number of NCAR hosts that supported a direct file transfer connection (currently a Hippi based network) to the MSS. In addition, users could only transfer a single file with a single command invocation. The major goal of this system was to allow users to transfer files to and from the MSS not only from the NCAR compute servers, but from their desktops as well.

To accomplish this, and in keeping with the major design goal of mimicking existing standards, DCS file transfer revolves around an rcp-style implementation. The DCS file transfer client, msrcp, is based upon the UNIX-style rcp command. Users can not only specify multiple files with a single command, but can also recursively traverse both local and MSS directories in a single command invocation.

Since it is not reasonable to configure every local user's desktop with a direct Hippi connection to the MSS, a mechanism must be provided to enable those transfers. Thus the system distinguishes between systems that have a direct file transfer connection to the MSS and those that do not. Accordingly, client requests originating from directly connected hosts use the Hippi connection, and requests from other hosts use a staging file mover host. Throughout this discussion, the term direct-host refers to a host with a Hippi connection to the MSS, and the term indirect-host refers to a host without one.

Another important design consideration was the ability to detect files that reside on NFS filesystems. This allows us to bypass the NFS protocol by transferring the file directly from/to the NFS server even though the request originated from an NFS client. This may avoid transferring the file over the network at all since in many cases the NFS server is a direct-host. The optimization is also used on compute servers that are directly-connected to the MSS. In the special case where the NFS server is not included in the DCE cell, the file transfer will occur through NFS.

Finally, users at NCAR need to be able to transfer files asynchronously between the local host and the MSS. While this could be accomplished by placing the msrcp command in the background, users would then have no easy way of determining when the request completed. Thus, support for asynchronous file transfers was another design consideration.

File Transfer interfaces

In order to provide this service, a number of RPC interfaces were developed; we describe them here to facilitate later discussion. First, a Remote File I/O (rio) interface was developed. This is a distributed interface to basic POSIX filesystem operations such as open(2), read(2), write(2), and stat(2), and it allows a server on one host to read and write files on another. In addition to these standard I/O system calls, a few other procedures were added to the interface. The ExpandPaths procedure recursively expands a given set of arguments into their respective local absolute path names. The ResolvePaths procedure determines whether the paths in a list are local to that system or reside on an NFS-mounted filesystem. In the latter case, the procedure converts each local path name into the NFS server path name and provides the host name of the NFS server. With this information, the request handler can potentially direct the transfer request to the NFS server.
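
Viewed from C, the rio interface amounts to remote analogues of the familiar system calls plus the two path procedures. The prototypes below are an illustrative sketch of that shape; the type and function names are assumptions, not the actual interface definition:

    #include <sys/stat.h>

    /* Illustrative C-level view of the Remote File I/O (rio) interface.
     * All names and signatures here are assumptions for exposition. */
    typedef int rio_handle_t;                  /* remote file descriptor */

    rio_handle_t rio_open(const char *host, const char *path, int flags, int mode);
    long         rio_read(rio_handle_t h, void *buf, long nbytes);
    long         rio_write(rio_handle_t h, const void *buf, long nbytes);
    int          rio_stat(const char *host, const char *path, struct stat *st);
    int          rio_close(rio_handle_t h);

    /* Recursively expand a set of arguments to absolute local path names. */
    int rio_expand_paths(const char *host, char **args, int nargs,
                         char ***paths, int *npaths);

    /* Decide whether paths are local or NFS-mounted; for NFS paths, return
     * the NFS server host name and the server-side path names. */
    int rio_resolve_paths(const char *host, char **paths, int npaths,
                          char **server_host, char ***server_paths);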

Next, a File Mover interface was developed. This interface provides the transfer between the local host and the MSS. There are three distinct versions of this interface. A staging version is used for requests from indirect-hosts. The staging version is located on an MSS direct-host and maintains a large spooling area. This version uses the RIO interface to transfer files from the remote client to its spooling area.

A version of the File Mover interface is also included in msrcp itself. This version is used only if the invoking host is a direct-host and one or more of the files implied in the request resides on local disk.

The third version is a standalone server used for completing asynchronous requests. This version along with other supporting servers are described later in a section dealing with asynchronous requests.

The File Transfer Request Handler

The File transfer request handler is similar in function and duties to the previously described request handlers. This handler attaches to the queue manager waiting for incoming file transfer requests. When a request is received, the handler parses the request, and creates one or more file mover requests. The handler then contacts the appropriate file mover(s) and submits the requests to the file mover(s).

Among other things, this request handler maintains a list of direct-hosts. This list is used to determine whether the msrcp client host is directly attached to the MSS.

When a request is received, the handler performs appropriate error checks on the argument list and determines the direction of the transfer. MSS-side arguments must be prefixed with the string "mss:" to indicate which arguments refer to MSS files. Arguments without the prefix are assumed to refer to local entities. Third-party transfers, or those involving a host remote to the client, are not supported.
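
For example, an invocation of the following form copies a local file to the MSS, and reversing the arguments copies it back; the file and directory names are purely illustrative:

        msrcp model_output.dat mss:/USERNAME/results/model_output.dat
        msrcp mss:/USERNAME/results/model_output.dat model_output.dat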

On the basis of the specified arguments, the handler then begins creating one or more File Mover requests. A File Mover request basically consists of a list of local path names and the corresponding MSS path names. Additional information such as the name of the client host, file sizes, and some resource tokens used to control the transfer are also included. In the event the files must be staged, the host information is used to contact the RIO interface. The resource tokens are used to provide limits on the transfer within the file mover.

If the target host is a direct-host, and the request is synchronous, then the file mover request will be directed back to the file mover interface in msrcp. In this case, the RIO interface is not used, since the requesting host can transfer the files between itself and the MSS directly.

The interesting case is when the request involves NFS-mounted filesystems. In this case the handler detects this and routes the request to the NFS server host, provided that the NFS server is included in the DCE cell. If the NFS server host is a direct-host, then the standalone File Mover interface server is contacted and completes the request. In the event the NFS server is not included in the DCE cell, the file I/O is performed via NFS.

Should a request include both NFS files and local files, the request handler will create separate requests and contact the appropriate File Movers to complete the request. This implies that a single request could result in multiple concurrent transfers between the MSS and the associated hosts.

The request handler also contacts the MSS to perform MSS path expansion and verify MSS arguments. A special low-level metadata client, fminfo, was created to facilitate this functionality. This client takes a list of MSS path names and returns certain characteristics of each path: the full path name of each argument, the entity type (file, directory, or non-existent), and, if the path refers to an existing file, the file's size and a code indicating the MSS storage device on which the file resides. The MSS file sizes and storage device codes are used by the File Mover.
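
The information returned for each path can be pictured as a record like the following. The structure is an illustrative sketch of the description above, not the actual fminfo output format:

    /* Illustrative sketch of the per-path information fminfo returns. */
    enum mss_entity { MSS_FILE, MSS_DIRECTORY, MSS_NONEXISTENT };

    struct fminfo_entry {
        char            path[129];     /* full MSS path name (<= 128 chars) */
        enum mss_entity type;          /* file, directory, or non-existent */
        long            size;          /* file size, if the path is a file */
        int             device_code;   /* MSS storage device holding the file */
    };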

Since empty directories cannot exist on the MSS, some assumptions about the intent of the transfer are made. For example, if the request implies a multiple-file write and the MSS target path does not already exist, then the request handler assumes that the MSS target refers to a directory and the MSS path names are built accordingly. If the request is a single-file write and the MSS target does not exist, then the target is assumed to be the MSS file path name. Although this deviates from the behavior of rcp, it is necessary because users cannot create empty directories prior to populating them.

The DCS File Mover

The File mover interface transfers files to and from the MSS. This interface can only be used on a direct-host. As previously stated, it is the responsibility of the request handler to contact an appropriate file mover.

Transfers to and from the MSS are multi-threaded. That is, a multi-file request to a file mover results in multiple concurrent transfers to and from the MSS. An underlying set of existing MSS file transfer software is used to transfer the files. In a multi-file request, the File Mover creates a number of transfer threads, each of which execs this software to perform an individual transfer. This feature has two potential benefits: first, wall-clock time for the entire transfer is dramatically reduced; second, if the files reside on the same MSS storage media, the MSS may detect this and maintain the mount point until all of the files have been transferred.

The staging file mover maintains a large spooling space for indirect-host transfers. The required space is always preallocated at the start of the request. The amount of this allocation depends on the number of files in the request and the number of concurrent transfers the mover can use to complete the request. Recall that the File Mover request includes the file sizes along with the MSS and local path names. The File Mover sorts the file list and calculates the minimum amount of spooling space needed to satisfy the request. The staging file mover also uses the RIO interface to read or write files between the spooling area and the remote host.
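
One plausible way to bound the spooling requirement, assuming at most K files are staged concurrently, is to reserve the sum of the K largest file sizes. The sketch below illustrates that calculation; it is an assumption about the approach, not the File Mover's actual algorithm:

    #include <stdlib.h>

    static int cmp_desc(const void *a, const void *b)
    {
        long x = *(const long *)a, y = *(const long *)b;
        return (x < y) ? 1 : (x > y) ? -1 : 0;
    }

    /* Reserve enough spool space for the K largest files, since at most
     * K files are staged at any one time.  Illustrative only. */
    long spool_space_needed(long *sizes, int nfiles, int max_concurrent)
    {
        long total = 0;
        int i, k = (max_concurrent < nfiles) ? max_concurrent : nfiles;

        qsort(sizes, nfiles, sizeof(long), cmp_desc);
        for (i = 0; i < k; i++)
            total += sizes[i];
        return total;
    }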

For direct-hosts, the file transfer request has the local path names and the MSS pathnames. In this case the File Mover Interface merely invokes the host's low-level MSS file transfer commands with these paths.

An additional optimization is used by the file mover during multi-file read requests. The file mover sorts the list of files according to the MSS storage device codes included in the file list. The MSS consists of multiple storage devices, each with different (relative) access times. The multi-file list is sorted such that files located on fast-access MSS devices are intermixed with files on slow MSS storage devices. This optimization has two benefits: first, wall-clock time for completion is potentially reduced, since transfers from the fast devices complete more quickly and another transfer thread can be started sooner; second, for the staging file mover, file returns back through the network can be intermixed with MSS transfer threads, reducing the load on that particular network segment.
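
The interleaving can be sketched as partitioning the list by device speed and then merging the two partitions alternately. The code below is a conceptual illustration, not the file mover's implementation; is_fast_device() is a placeholder for whatever test maps a storage device code to a relative access speed:

    #include <stdlib.h>

    /* Conceptual sketch: interleave files on fast MSS devices with files
     * on slow devices so fast transfers free threads for slow ones. */
    struct xfer { const char *mss_path; int device_code; };

    extern int is_fast_device(int device_code);   /* placeholder */

    void interleave_by_device(const struct xfer *in, int n, struct xfer *out)
    {
        int i, nf = 0, ns = 0, f = 0, s = 0, k = 0;
        struct xfer *fast = malloc(n * sizeof(*in));
        struct xfer *slow = malloc(n * sizeof(*in));

        if (fast == NULL || slow == NULL) {
            free(fast);
            free(slow);
            return;
        }
        for (i = 0; i < n; i++) {
            if (is_fast_device(in[i].device_code))
                fast[nf++] = in[i];
            else
                slow[ns++] = in[i];
        }
        while (f < nf || s < ns) {             /* alternate fast and slow */
            if (f < nf) out[k++] = fast[f++];
            if (s < ns) out[k++] = slow[s++];
        }
        free(fast);
        free(slow);
    }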

The following diagram illustrates the control and data flows between invocations of msrcp on direct-host and indirect-hosts.

The "indirect" path shows how the staging file mover host is used for the MSS transfer, along with the RIO server (dpriod) for the transfer between the staging host and the client host. The "direct" path invocation utilizes the file mover interface included in msrcp to directly transfer the file(s) to and from the MSS. Both cases have a dotted line to indicate standalone RIO and File Mover servers used for asynchronous requests.

Asynchronous File Transfer Requests

As noted previously, msrcp needed to support asynchronous file transfers as well as synchronous file transfers. This requirement added new dimensions to completing a request, and an additional set of software was implemented to handle these types of requests.

An asynchronous file transfer implies that msrcp merely submits the request to the listener and terminates. However, depending on the submitting host's configuration, the request handler would still need the services of the RIO package and a File Mover to complete the request. DCS handles this by creating standalone RIO and File Mover servers. These instances are started by a server known as motherd, which is started on every DCS host in the cell at boot time. Motherd reads a configuration file containing the list of supported services along with the paths to the child servers.

When the request handler receives an asynchronous file transfer request, the request handler contacts motherd to start both an RIO server and a File Mover server. These instances are started under the user's identity (setuid) for security purposes. The request handler then uses these instances to complete the request.

In order to provide a mechanism for users to synchronize with an asynchronous msrcp request, a client named dcswait was created. When the asynchronous option is specified on an msrcp invocation, msrcp writes a DCS job identifier to stdout. This identifier can be passed to dcswait, which blocks until the identifier has been deleted from the queue manager, indicating that the request has completed its path through the DCS Resource Server. The command does not indicate whether the request completed successfully, only that it has completed. In general, users might issue an asynchronous msrcp near the beginning of a batch job, continue processing of the job, and, prior to accessing the files, perform a dcswait to ensure that the msrcp has completed.
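
In a batch job this might look like the following fragment, where the file names are purely illustrative and the asynchronous option is written as a hypothetical -async flag (the actual option spelling is not shown in this paper):

        # submit the transfer asynchronously; msrcp prints a DCS job identifier
        jobid=`msrcp -async mss:/USERNAME/input/data.tar data.tar`
        # ... other processing that does not yet need data.tar ...
        dcswait $jobid          # block until the transfer request has completed
        tar xf data.tar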

The Data Interchange Service

The Data Interchange services provide for the importing and exporting of data to and from the NCAR MSS via removable media. The service is used primarily to import bit files representing foreign data - data not created on NCAR computers - into the NCAR Mass Storage System, and to export locally archived data to remote sites (primarily universities). In the age of the Internet and the World Wide Web, transferring data between geographically remote sites by way of magnetic tape may seem somewhat archaic. However, the volumes of data typical of a single user request, often tens and even hundreds of gigabytes, make Internet transfer on even the most advanced wide area networks completely impractical.

The service is presently defined by two text-based user commands, one for imports and one for exports. The turnaround time for a single request may be hours and possibly days, necessitating that the commands operate asynchronously. Progress of a request may be tracked via the DCS management commands. In the future we envision supporting a web-based interface, conceivably permitting a remote user equipped with a web browser to quickly have a tape dispatched to them in the mail. As computing power on the floor increases and archival space becomes more scarce, we expect that the Data Interchange service will become a more and more important SCD resource.

Some final thoughts...

The system has proven its reliability by processing over a million requests to date. The major problems encountered have mostly been isolated to DCE problems with individual hosts. As DCE implementations have matured, the majority of these problems have disappeared.

Although the system has been deployed on more than 22 hosts, we have not yet seen the wide-scale deployment originally envisioned. Part of the problem has been the need for NCAR-wide DCE system administration, as well as licensing and cost issues related to unbundled DCE packages. However, interest in MSS access is leading various NCAR divisions to investigate these DCE issues, and some hosts have already been added to the current DCE cell.

A prototype web interface has been developed, but it is not yet mature enough for general use. This simple prototype merely provides a shell-like web interface to the DCS command suite. This, and possible Java interfaces, might extend the DCS system sufficiently that further DCE deployment will be unnecessary. We are currently investigating these avenues.

We have noticed two areas for improvement: the current load-limits implementation and the priority mechanism. Right now, various limits are scattered between the queue manager and the request handlers, and this approach is proving troublesome. We will be moving all load limits to the queue manager in the near future. The current priority implementation does not allow the low-level metadata interface to differentiate between requests of differing priorities. This will be corrected by passing priority information to the low-level interface, allowing it to distribute its resources more fairly among the active requests.

The DCS commands have given the NCAR MSS users new ways to manipulate their MSS holdings. The new DCS ability to treat MSS directories as a unit will allow users to modify their habits and use more hierarchical directory structures to manage their archive data. DCS has succeeded in making the NCAR MSS more accessible and easier to use.
