

Link to slides used for this paper as presented at the conference. They are in Acrobat PDF format.

Cray Product Installation and Configuration

Scott E. Grabow
Silicon Graphics
655F Lone Oak Drive
Eagan, MN 55121
grabowc@cray.com
ABSTRACT:
The Common Installation Tool (CIT) has had a large impact upon the installation process for Cray software. The migration to CIT for installation has yielded a large number of questions about the methodology for installing new software via CIT. This document discusses the installation process with CIT and how CIT's introduction has brought about changes to our installation process and documentation.

KEYWORDS:
UNICOS, UNICOS/mk, Common Installation Tool, CIT, Installation Process, Manuals



Copyright © 1998. Silicon Graphics Company. All rights reserved.

1 Introduction

The Common Installation Tool (CIT) has made a dramatic change to the processes and manuals for installation of UNICOS and asynchronous products. This document is divided into four major sections dealing with the changes and the impacts of the installation process.

2 Changes Made Between UNICOS 9.0 and UNICOS 10.0/UNICOS/mk

With the release of UNICOS 9.2 and UNICOS/mk, the packaging and installation of Cray software has undergone major changes when compared to the UNICOS 9.0 packaging and installation process.

There are four major reasons for the changes to the packages, installation processes, and installation tool used. Some of the changes were requested by customers, while others were necessary to improve the quality of the packages provided. The changes should allow better installation support when sites encounter problems and allow for a common packaging and installation process for both UNICOS and UNICOS/mk.

2.1 Changes Requested by Customers

Over the years customers have requested that Cray:

Some customers have said that the UNICOS 9.0 installation process is hard to understand initially because of the incremental software approach, the number of packages that need to be loaded, and the timing of when these packages need to be installed. Part of the problem was that the UNICOS Installation Guide[1] did not present a clear path as to what should be performed during a UNICOS installation.

The new process provides all of the above features as part of the installation process.

2.2 Changes to Improve Package Quality and Support

While the above changes made for customers were implemented, additional changes were needed to support in-house processes. The new process reduces the amount of manual work needed to produce a release while also reducing the complexity of a package. The end result of the reductions is that the installation testing of each operating system release now takes less time than before while allowing for a more consistent installation process than previously.

The new packages being produced are less complex than the UNICOS 9.0 packages and ensure that all sites which are at a certain level have the same base from which changes can be made. This helps to reduce the number of variances on the systems when sites experience problems, and allows for more accurate investigation of what could be causing problems when they occur.

The UNICOS and UNICOS/mk packages installed by CIT are fully populated root and usr file systems. These packages are also designed to be complete replacement packages, so installations are not dependent upon what you may have loaded previously. Having packages that are complete replacements means there are only two types of installs, initial and upgrade, versus the four UNICOS 9.0 install types: initial, upgrade, revision, and update.

Also, with the two package types, all software shipped for an initial install or upgrade install is the same, which allows for quicker validation of the loaded software and ensures that the relocatable and source code will always match right after an install.

The reduced number of software packages means that the installation instructions have fewer options, and are simplified. The simplification also ensures that installation testing done in-house closely matches what sites will be doing.

With CIT we are now able to do upgrade install tests on all systems that a release supports and perform initial install tests on supported systems in a couple of hours. Typical install tests now take two to three hours, where UNICOS 9.0 install tests can take up to nine hours to perform.

2.3 Changes Made for a Common Packaging and Installation Process

With the new packaging scheme, we have unified the package types available for UNICOS between the CRAY J90 series systems and other systems. As of UNICOS 10.0 there are three package types available for all systems that run UNICOS:

For the CRAY T3E there are only two packages:

To provide sites with information about changes that have been made between the initial 10.0 release and a following release, /etc/conv/release_modinfo is a command that can be run at a site with source code to provide the mod information for that release.

As a result of the changeover from an incremental software installation process to the complete replacement process, sites will need to remove local mods before upgrading. Once the upgrade has been completed, sites will need to re-apply local mods. The advantage of this approach and the complete replacement process is that all mods are applied in Eagan, which results in consistent USMID lines in all modules during installation.

In the past the removal of local mods has been recommended, but with the new process the removal is now a requirement in order to ensure successful application of the local mods after the installation process completes.

The installation process has been separated into two distinct processes: Installation of software and configuration/building of software. The separation was made since there were two installation processes for UNICOS 9.0, which were different due to the differences in support between the CRAY J90 and the other mainline systems.

In the CRAY J90 world, the installation process was just the installation of software, whereas for the other mainline systems the installation process covered installation of software, making necessary configuration changes, and rebuilding a new UNICOS kernel.

Installation of UNICOS 10.0 is more like the CRAY J90 software installation process. Once that has been completed sites can then go ahead to UNICOS System Configuration using ICMS[2] which covers how to use ICMS to configure a system, recommendations, and the rebuilding of a UNICOS kernel.

The above changes were made to both packaging and installation to ensure the new installation process would address requests that were made over time with the Install Tool, install(8), and J90 Install Utility installation process.

3 Common Installation Tool (CIT)

The Common Installation Tool was created to satisfy the needs of customers who have been using the Install Tool and the J90 Install Utility. CIT was created from the comments gathered from customers over time and is both a GUI and curses based tool used to install products on remote machines. CIT currently is used for installing:

CIT has now become the only installation tool to be used for installation of software on systems running either UNICOS 10.0 or UNICOS/mk. All CD-ROMs shipped will now contain CIT and the necessary binaries and libraries so CIT can be used on the supported platforms for installation.

CIT replaces both the `Release Media Management' functionality that was included in install(8) and the J90 Install Utility which was used by CRAY J90 Model V based systems for UNICOS releases prior to UNICOS 9.2.

One of the biggest improvements in CIT over previous installation routines is that CIT supports compressed packages. The largest data transfer bottleneck is the speed of the CD-ROM drive. We have been seeing a 70% compression rate on the UNICOS/mk and UNICOS packages. With only about 1/3 of the data to transfer, installation takes about 1/3 of the time. As a result, a UNICOS 9.0 source upgrade that takes 9 hours can now be performed as a UNICOS 10.0 source upgrade in about 3 hours.
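The arithmetic behind this estimate can be sketched as follows. This is a back-of-the-envelope calculation only: it assumes that install time scales linearly with the number of bytes read from the CD-ROM, and the function name is illustrative rather than part of CIT.

```python
def estimated_install_hours(uncompressed_hours: float, compression_rate: float) -> float:
    """Estimate install time when reading the CD-ROM is the bottleneck.

    compression_rate is the fraction of data removed by compression
    (0.70 means the compressed package is 30% of the original size).
    Assumption: time is proportional to the bytes transferred.
    """
    if not 0.0 <= compression_rate < 1.0:
        raise ValueError("compression_rate must be in [0, 1)")
    remaining_fraction = 1.0 - compression_rate
    return uncompressed_hours * remaining_fraction


hours = estimated_install_hours(9.0, 0.70)
print(f"{hours:.1f} hours")  # prints "2.7 hours"
```

Under these assumptions a 9 hour install drops to roughly 2.7 hours, which is consistent with the figure of about 3 hours quoted above.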

CIT runs on the OWS, J90 Console, and the SWS. CIT is also being used to perform installation of Async Packages on other workstations such as DEC, HP, IBM and SGI workstations.

3.1 Basics

CIT is a task based installation tool, where a wrapper runs on the workstation and reads a product information file (*.pif) which contains a list of tasks to be performed on both the local and remote systems. A task performed during the package installation can include installation of new software and performing actions locally on the remote system.

With CIT using a wrapper to monitor task output, people should be aware that problems are highlighted by CIT. A problem encountered during installation may be due to a problem with CIT itself, a problem with the package being installed, a problem with the remote environment on the mainframe, or a problem with the process.

The GUI for CIT shows which remote system the installation will take place on, which packages are available for installation, the order in which packages will be installed, and which packages have been installed during the current invocation of CIT only.

The interactive or curses mode shows which remote system the installation will take place on, which packages are available for installation, and which packages have been installed during the current invocation only.

During an install CIT's wrapper parses the entire installation output and searches for install errors. Errors are detected in the following manners:

If an error does not follow either of the two processes, CIT may miss the problem and continue. Thus monitoring CIT's installation logs during the install is recommended, since it provides an opportunity to stop the install and resolve problems before an error missed by CIT gets out of control. Additional information about how to monitor installs with CIT, where the installation logs are kept, and what to do about problems is covered in `Where to look if Problems occur during and After Installation', section 4.5.

3.2 What is Present on the CD-ROM with CIT?

The CD-ROM that software products are distributed upon contains all the software needed to perform an install upon a remote system or the local workstation. The top level of the CD-ROM contains the following files and directories:

For the SWS-ION release, there may be additional directories that are present at the top level of the CD-ROM. The SWS-ION installation process uses these directories and their contents.

While CIT does receive software updates or fixes over time, it is recommended that sites use the version of CIT included on the CD-ROM when installing software. By doing so, sites will ensure they are using the latest version that has been tested with the packages being loaded. Using the version of CIT on the CD-ROM will help in determining and resolving problems that sites may encounter during installation, since the capabilities and performance of that version of CIT are known.

3.3 Things Not Recommended with CIT

While CIT has been around for nearly two years, there are some features or things that a site can do that are not recommended. These features aren't commonly used, nor are they supported by most packages that CIT is capable of installing.

The usage of "Root Prefix" in CIT's GUI or rootdev in CIT's interactive mode is not recommended. Usage of this option when not mentioned in the installation documentation for a package will produce additional problems and unexpected results.

As part of the installation, Verification Information files (*.vif) are copied from the CD-ROM to the system where the installation is taking place. These files are used to validate the installation, and to validate past installations. The vif files should not be changed or removed once an installation has been completed. If the vif files are removed or changed by hand, this will create future installation problems and configuration issues.

Problems that could be seen by a site that makes changes to the vif files can include, but are not limited to, a premature failure of the installation process due to a "corrupted" vif file and removal of files which are necessary to perform future work.

4 Similarities between the Installation of UNICOS and UNICOS/mk

CIT created an opportunity to make installation of UNICOS and UNICOS/mk software as common as possible. The installation processes need to follow a similar type of high level process to ensure that the installation, verification, and relinking can occur without problems.

By having both OS installations follow the same high level process, the documentation is organized in a similar fashion so that people who install CRAY J90 software can also install CRAY T3E software with a high degree of confidence.

Besides the documentation's organization being similar, portions of the text and tasks performed during an installation are the same between the various manuals. The emphasis to have similar text between the manuals was done to ensure that people who are installing software are developing a common feeling for how the new installation process occurs.

To continue this reinforcement, the software being shipped is similar in packaging formats, naming, and contents.

4.1 High Level Installation Process for Initial and Upgrade Installations

The high level process for installation contains three major steps. Each step is designed to build on the previous one, and to make sure that if problems arise they are isolated to a problem with the current task or the previous task.

The first high level step is preparation. During this step the system administrator gets superuser, security and network privileges set correctly, and allocates disk space for the new upgrade partitions. The administrator will also enter system information into the sysinfo file, create network communication paths via .rhosts, verify the network communication paths are setup correctly, and prepare the upgrade partitions for the installation of software.

Once the preparation task has been completed, the system administrator is ready to install the new software. This step uses CIT to perform any additional pre-installation tasks such as checking the network communication paths again, and doing additional tasks to support initial install.

Once CIT has completed its pre-installation tasks, the actual installation of software on the remote system will take place. During the installation of each part of a package, a verification step will take place to perform a checksum on the loaded files. In addition to the checksum, each file in the package will be checked to make sure the permissions, file specification, owner, and group ownership match the file attributes necessary for the file.
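The kind of per-file verification described above can be sketched as follows. This is an illustration only and not CIT's implementation: the ExpectedFile record is a hypothetical stand-in for whatever the .vif files actually contain, SHA-256 is an assumed checksum algorithm, and "file specification" is interpreted here simply as "is a regular file".

```python
import hashlib
import os
import stat
import tempfile
from dataclasses import dataclass
from typing import List


@dataclass
class ExpectedFile:
    """Hypothetical manifest record; not the actual .vif layout."""
    path: str
    sha256: str
    mode: int   # permission bits, e.g. 0o644
    uid: int
    gid: int


def verify_file(expected: ExpectedFile) -> List[str]:
    """Return a list of mismatches; an empty list means the file passed."""
    try:
        st = os.lstat(expected.path)
    except FileNotFoundError:
        return ["missing"]
    if not stat.S_ISREG(st.st_mode):
        return ["not a regular file"]
    problems = []
    if stat.S_IMODE(st.st_mode) != expected.mode:
        problems.append("permissions differ")
    if st.st_uid != expected.uid:
        problems.append("owner differs")
    if st.st_gid != expected.gid:
        problems.append("group differs")
    with open(expected.path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    if digest != expected.sha256:
        problems.append("checksum differs")
    return problems


def record_for(path: str) -> ExpectedFile:
    """Build a record from a file's current state (used by the demo only)."""
    st = os.lstat(path)
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    return ExpectedFile(path, digest, stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.bin")
    with open(target, "wb") as fh:
        fh.write(b"package contents")
    record = record_for(target)
    print(verify_file(record))   # no mismatches expected
    with open(target, "wb") as fh:
        fh.write(b"modified contents")
    print(verify_file(record))   # the checksum mismatch is reported
```

Returning a list of mismatches rather than a single pass/fail flag makes it possible to report every attribute that is wrong for a file in one pass.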

After the installation of all the packages, CIT will perform post-installation tasks. These tasks can include converting configuration files between the old OS and the new OS, making sure any new system configuration files are in their proper location, re-linking a kernel, and transferring back to the OWS/J90 Console/SWS the new UNICOS kernel for the system.

Once CIT has finished, the site can then proceed to change the system configuration to take advantage of the new OS. This can include updating system configuration files, following new recommendations, and performing a complete system build if desired.

Both the initial installation and upgrade installation processes follow this high level process though additional tasks may be performed in any stage to ensure that once the installation and configuration is completed, the system is available for testing.

4.2 Initial Installation Process

The initial installation process follows the basic process already outlined, however, during each of the high level steps there are additional tasks that must be performed to ensure the system will be bootable into multi-user mode on site as soon as possible.

During the preparation task the system administrator enters system information into the sysinfo file, creates the network communications paths via .rhosts on the SWS, and verifies the network node name of the system is not an alias in the /etc/hosts file. The administrator will also create SWS configuration files for GigaRing based systems, build an initial param file for the system, and start the mainframe with a RAM file system and a generic kernel.
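The check that a node name is not an alias can be sketched as below. The sketch relies only on the standard hosts file layout, in which each line holds an address, then the canonical name, then any aliases. The host names and addresses are made-up examples, and the function name is illustrative.

```python
from typing import Optional


def classify_hostname(hosts_text: str, name: str) -> Optional[str]:
    """Return 'canonical', 'alias', or None if the name is not listed.

    Each hosts line is: address, canonical name, then optional aliases.
    Anything after '#' is a comment. Names are compared case-insensitively.
    """
    wanted = name.lower()
    found = None
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].lower().split()
        if len(fields) < 2:
            continue  # blank line, comment, or address with no name
        canonical, aliases = fields[1], fields[2:]
        if wanted == canonical:
            return "canonical"
        if wanted in aliases:
            found = "alias"
    return found


# Hypothetical hosts entries, for illustration only.
sample = """\
127.0.0.1    localhost
10.0.0.5     cray-main.example.com  cray   # mainframe
10.0.0.6     sws.example.com        sws
"""
print(classify_hostname(sample, "cray-main.example.com"))  # prints "canonical"
print(classify_hostname(sample, "cray"))                   # prints "alias"
```

A result of "alias" for the system's node name would indicate the situation the preparation step is meant to catch.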

Now that the system is up and running in single user mode, CIT can be used to start the initial installation process. CIT will perform the following tasks prior to loading any new software:

Once CIT has successfully completed these tasks CIT will start to install software on the mainframe and verify the software was loaded correctly.

Once all the software installation has completed, CIT will perform the following tasks to prepare the system to go to multi-user:

Once successfully completed, the system should be re-booted with the newly built UNICOS kernel and param file. With the system up and running the site should be able to proceed to the creating of their system configuration.

System configuration entails updating the necessary system configuration files on the system, reviewing the recommendations made and deciding which of them the site wants to follow, and finally rebuilding a UNICOS kernel if significant changes were made to the kernel configuration.

Once the new kernel has been made, the system should be re-booted again to use the formal system configuration.

4.3 Upgrade Installation Process

The upgrade installation process follows the basic process already outlined but has fewer tasks to be performed during each step, since the system should already have been configured. The upgrade process will propagate the system configuration forward to the software being installed. This was done to ensure the system will be bootable into multi-user mode on site as soon as possible and will look like the previous system configuration.

During the preparation task the system administrator needs to get superuser, security, and network privileges. The system must be running in multi-user mode with a PRIV_SU kernel which has the NETW_RCMD_COMPAT bit set in the SECURE_NET_OPTIONS bitmask, and with networks up. This allows remote shell commands to be executed between the OWS/J90 Console/SWS and the Cray mainframe. For sites that have a heightened level of security, we recommend that they perform these installations during dedicated time and restrict access to the system.

In addition to setting up the necessary privileges, the system administrator will need to allocate disk space for the new file systems. Once the disk space has been allocated, upgrade.setup will mkfs, labelit, fsck, mount, dump and restore the old file system onto the new file system partitions as necessary. At the very end of upgrade.setup additional cleanup will be done to ensure that the Nmakefile.mo and Nmakefile.ms files and other extraneous files are cleaned up to ensure a smooth upgrade.

The last task the system administrator will need to perform prior to starting the upgrade process is the backup and removal of local mods applied to source on the Cray mainframe. This must be done to ensure that once the software installation has successfully completed, the site will be able to re-apply the local mods to their system without creating conflicts or additional administrative problems.

Now that the system has been prepared for the installation and with network communications paths up and running, CIT can be used to start the upgrade installation process. CIT will start to install software on the mainframe and verify the software was loaded correctly.

Once all the software installation has completed, CIT will perform the following tasks to prepare the system to go to multi-user:

Once CIT has successfully completed all of its tasks, sites with source may want to run the /etc/conv/release_modinfo script, which generates a list of mods that have gone into the current and previous releases since UNICOS 10.0.

If the site has any local mods, they should be applied to the source at this time and dependencies should be addressed before continuing with the system configuration task.

System configuration entails updating the necessary system configuration files on the system, reviewing the recommendations made with this release and deciding which of them to follow, and finally rebuilding a UNICOS kernel if significant changes were made to the kernel.

Once the new kernel has been made, the system should be re-booted again to use the formal system configuration. Once the system has been re-booted, the system security configuration should be verified, time critical files such as the UDB should be copied from one root to another, the system should be taken to multi-user mode, and NQE checkpointed jobs or processes should be restarted.

4.4 Issues with New Process

With the new installation process and CIT there are several changes that have occurred with the installation process since UNICOS 9.0. Sites need to be aware of the issues associated with these changes and they should also be aware of our recommendations on how to proceed with an installation.

The first major point is that the installation process now requires a PRIV_SU kernel with the NETW_RCMD_COMPAT bit set in the SECURE_NET_OPTIONS bitmask. This will allow CIT to use remote shell commands to perform tasks between the SWS and the Cray mainframe. Common tasks that are performed across the two systems include the accessing of the CD-ROM drive for files, and the transferring of files between the two systems to perform the installation.

With the usage of the remote shell commands the usage of .rhosts on the OWS/J90 Console/SWS and the Cray mainframe became necessary. This was done since some of the Model E and CRAY J90 Model V communication paths that were used in the past are no longer present on GigaRing based systems. Also the relationship between the workstation and the mainframe changed when going to GigaRing. These two changes along with the desire for a common installation process across all systems meant that the .rhosts facility was the lowest common denominator present across all the architectures.

The need for PRIV_SU has come about because security configurations vary widely from site to site. To ensure that the installation process can be accurately tested in-house, and that the process followed at a site is as close as possible to what was tested in-house, the need for PRIV_SU became clear. PRIV_SU allows everyone to upgrade using a similar process, and then gives sites that have unique security needs an opportunity to make the necessary security changes so that those needs are met.

In the past, sites with security concerns could install software during batch time. With CIT, if a site has security concerns, we recommend that the site perform the installation during dedicated time. This will allow the system administrator to take the necessary steps to ensure that only the installation process is taking place on the system, while providing the administrator the ability to monitor the installation process as much as desired.

A network connection is now required between the OWS/J90 Console/SWS and the Cray mainframe. The Model E and GigaRing based systems have a private network connection between the two systems, while the CRAY J90 Model V based systems historically did not. Since CIT relies very heavily upon network connections between the two systems, the J90 Console will need to be added to a network that also connects to the Cray mainframe.[4]

Ideally, the connection between the two systems would be a private network. If a private network cannot be made, Cray recommends that the Cray J90 be removed from the public network during upgrade installations. The site should then create a private network between the J90 Console and the Cray mainframe for the period of the installation. This will assure people who have security concerns that neither the J90 Console nor the Cray mainframe will be accessed during the installation.

Software being installed via CIT can't be directly placed on or reside on a DFS file system, since the DCE/DFS user authentication process does not take place when remote shell commands are executed on a system. Thus installation will need to occur on a local file system, and if that file system needs to become a DFS file system, the system administrator will have to take care of that after the installation process has been successfully completed.

A second source partition is needed for the UNICOS 9.0 to UNICOS 10.0 upgrade. Once the upgrade to UNICOS 10.0 has been completed, sites can perform a single source upgrade from UNICOS 10.0 to a newer version of UNICOS 10.0 in the future. Two source partitions are required during the UNICOS 9.0 to UNICOS 10.0 upgrade to ensure that the UNICOS 10.0 source partition is clean when initially installing UNICOS 10.0. The usage of two source partitions will ensure that the new packaging process will not leave miscellaneous files lying around between future upgrades.

4.5 Where to look if Problems occur during and After Installation

During the installation, you can watch what is happening by clicking on the `Display install logs' button of CIT's Install Process Window. A new window will appear that contains all the output of the tasks that are taking place locally on the workstation and on the remote system.

If multiple packages are being loaded, CIT will keep each package's installation information in a separate installation log file and in separate windows. This allows the installation process to be reviewed in the context of the package being installed.

CIT does create multiple installation logs for itself and for each package being loaded. For each invocation of CIT, a new set of installation files will be kept in /tmp/cit.<username>. If there already exists a directory with that name, CIT will move the directory to /tmp/cit.<username>.0 and recreate the CIT installation log directory.
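The log directory handling described above can be sketched as follows. This is not CIT's code: the function name is illustrative, a temporary directory stands in for /tmp, and the paper does not say what happens when a .0 directory already exists, so replacing it is an assumption of this sketch.

```python
import os
import shutil
import tempfile


def prepare_log_dir(base_dir: str, username: str) -> str:
    """Create a fresh cit.<username> directory under base_dir.

    If the directory already exists, it is first moved to cit.<username>.0.
    Assumption: an older .0 directory, if present, is replaced.
    """
    log_dir = os.path.join(base_dir, f"cit.{username}")
    backup = log_dir + ".0"
    if os.path.isdir(log_dir):
        if os.path.isdir(backup):
            shutil.rmtree(backup)  # assumption: only one backup is kept
        os.rename(log_dir, backup)
    os.mkdir(log_dir)
    return log_dir


with tempfile.TemporaryDirectory() as tmp:  # stand-in for /tmp
    first = prepare_log_dir(tmp, "admin")
    open(os.path.join(first, "cit.log"), "w").close()
    second = prepare_log_dir(tmp, "admin")
    print(sorted(os.listdir(tmp)))  # prints ['cit.admin', 'cit.admin.0']
    print(os.listdir(second))       # prints []
```

The practical consequence is that logs from the previous invocation are found in the .0 directory, so they should be copied elsewhere before CIT is started a third time if they are still needed.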

Inside each of the CIT installation log directories are the following files which can be reviewed if problems are experienced during an installation:

If a problem is encountered with CIT during an installation, the first place to look is the package log file. While the package log file will contain errors associated with a specific package's installation, it would also be wise to review the cit.log file to determine whether the problems were package related or CIT related.

4.6 If Problems are Experienced and an SPR is filed

As with any product or process, there are bound to be some problems over the course of the product's life. To facilitate a quick response to a problem, it has been found that providing information about the installation experience when the SPR is filed will, most of the time, supply what is needed to research and analyze the problem and to help develop a fix.

The following files and information have been found to be necessary to resolve site installation problems:

With the new processes and CIT it is hoped that sites will become more familiar with the underlying process that takes place during a software upgrade, and that opportunities are provided to allow sites to use the latest software released as quickly as possible.

5 Why the Manuals were Changed With the New Installation Process

Up to this point, it has been highlighted that the changes to the installation manuals were the result of having CIT available and the desire to create a similar installation process for UNICOS and UNICOS/mk. There are several reasons for the changes that have been made to the manuals.

The UNICOS 9.0 installation guide had the installation and configuration information intermixed and spread out across the entire manual. While the installation and configuration information was still needed, there were certain recommendations that people would encounter only when installing a revision, or when installing an update, even though the recommendations were valid for all upgrade installations.

In reviewing the manual, it became clear that the intermixing of the two processes in one manual created more confusion for readers. Readers were unsure if they had finished the software installation and had progressed to configuration.

In general the installation guide was poorly organized, or contained far too much information for the installation process. Readers also mentioned they wanted to follow a task-based manual that clearly separated the installation process from the configuration process. These comments led to the new installation manuals and a single configuration manual. These manuals are task-based and don't have people flipping back and forth in the manual to verify information or to find necessary information.

Another change was to have each chapter be an installation process. In the past some people were unsure of where to begin, since the UNICOS installation manual used unique terms like upgrade, revision, or update. People unfamiliar with this terminology more often than not made a wrong decision about where they should start in the installation guide and would need to perform the installation again to fix the problems.

The end result is four UNICOS installation guides, one UNICOS system configuration guide, and one UNICOS/mk installation and configuration guide[5]. The UNICOS installation guides [6], [7], [8], [9] are divided primarily by I/O model: Model E, Model V, and GigaRing. The GigaRing based systems are further separated where necessary to account for some very specific hardware differences when performing initial installations.

5.1 Comparison of Old Installation Manual Organization with New Installation and Configuration Manuals

The old UNICOS® 9.0 Installation Guide, publication SG-2112, is organized as follows:

The new UNICOS® 10.0 Installation Manuals, publications SG-5271, SG-5296, SG-5297 and SG-5298 are organized as follows:

While the new UNICOS® System Configuration Using ICMS Guide, publication SG-2412 is organized as follows:

The new manuals and their new organization should help the infrequent user to eliminate the confusion as to where the starting point is for an installation and where to continue after performing an OS installation.

6 Final remarks

It is hoped that the new installation process, packages, and manuals will resolve some issues that have been present for years with the old UNICOS installation process and provide sites with a highly reliable method for installing software and confidence the installation occurred as tested.


References

[1] UNICOS® Installation Guide, publication SG-2112.
[2] UNICOS® System Configuration Using ICMS, publication SG-2412.
[3] Common Installation Tool Reference Card, publication SQ-2218.
[4] CRAY J90™ IOS-V and CRAY EL™ UNICOS® Issues, Cray Research Service Bulletin, November 1997.
[5] UNICOS/mk™ Installation Guide for CRAY T3E™ Series Systems, publication SG-2610.
[6] UNICOS® Installation Guide for CRAY J90™ and CRAY J90se™ Model V based systems, publication SG-5271.
[7] UNICOS® Installation Guide for CRAY J90™ and CRAY J90se™ GigaRing based systems, publication SG-5296.
[8] UNICOS® Installation Guide for CRAY C90™, CRAY T90™ and CRAY T90 IEEE™ Model E based systems, publication SG-5297.
[9] UNICOS® Installation Guide for CRAY T90™ and CRAY T90 IEEE™ GigaRing based systems, publication SG-5298.

