Scott E. Grabow
grabowc@cray.com
The Common Installation Tool (CIT) has brought about dramatic changes to the processes and manuals for installing UNICOS and asynchronous products. This document is divided into four major sections that describe these changes and their impact on the installation process.
With the release of UNICOS 9.2 and UNICOS/mk, the packaging and installation of Cray software have undergone major changes compared with the UNICOS 9.0 packaging and installation process.
There are four major reasons for the changes to the packages, the installation processes, and the installation tool. Some of the changes were requested by customers, while others were necessary to improve the quality of the packages provided. The changes should allow better installation support when sites encounter problems, and they allow for a common packaging and installation process for both UNICOS and UNICOS/mk.
Over the years customers have requested that Cray:
Some customers have said that the UNICOS 9.0 installation process is hard to understand at first because of the incremental software approach, the number of packages that must be loaded, and the uncertainty about when each package must be installed. Part of the problem was that the UNICOS Installation Guide [1] did not present a clear path through what should be performed during a UNICOS installation.
The new process provides all of the above features as part of the installation process.
While the changes requested by customers were being implemented, additional changes were needed to support in-house processes. The new process reduces the amount of manual work needed to produce a release and also reduces the complexity of each package. As a result, installation testing of each operating system release now takes less time than before, and the installation process is more consistent than it was previously.
The new packages are less complex than the UNICOS 9.0 packages and ensure that all sites at a given release level have the same base from which changes can be made. This helps to reduce the number of variations among systems when sites experience problems, and allows for more accurate investigation of the causes of problems when they occur.
The UNICOS and UNICOS/mk packages installed by CIT are fully populated root and usr file systems. These packages are also designed to be complete replacement packages, so installations do not depend on what you may have loaded previously. Because the packages are complete replacements, there are only two types of install, initial and upgrade, compared with the four UNICOS 9.0 install types: initial, upgrade, revision, and update.
Also, with only two install types, all software shipped for an initial install or an upgrade install is the same. This allows quicker validation of the loaded software and ensures that the relocatable and source code always match immediately after an install.
The reduced number of software packages means that the installation instructions have fewer options, and are simplified. The simplification also ensures that installation testing done in-house closely matches what sites will be doing.
With CIT we are now able to run upgrade install tests on all systems that a release supports, and initial install tests on supported systems, in a couple of hours. Typical install tests now take two to three hours, whereas UNICOS 9.0 install tests can take up to nine hours.
With the new packaging scheme, we have unified the package types available for UNICOS between the CRAY J90 series systems and other systems. As of UNICOS 10.0 there are three package types available for all systems that run UNICOS:
For the CRAY T3E there are only two packages:
To provide sites with information about the changes made between the initial UNICOS 10.0 release and a later release, the /etc/conv/release_modinfo command is provided. Sites with source code can run it to obtain the mod information for that release.
Because of the change from an incremental software installation process to a complete replacement process, sites will need to remove local mods before upgrading and re-apply them once the upgrade has been completed. The advantage of this approach is that all mods are applied in Eagan, which results in consistent USMID lines in all modules during installation.
In the past, removing local mods before an upgrade was recommended; with the new process it is a requirement, to ensure that the local mods can be applied successfully after the installation process completes.
The installation process has been separated into two distinct processes: installation of software, and configuration and building of software. The separation was made because UNICOS 9.0 had two different installation processes, reflecting the differences in support between the CRAY J90 and the other mainline systems.
In the CRAY J90 world, the installation process was just the installation of software, whereas for the other mainline systems the installation process covered installing software, making the necessary configuration changes, and rebuilding a new UNICOS kernel.
Installation of UNICOS 10.0 is more like the CRAY J90 software installation process. Once it has been completed, sites can proceed to UNICOS System Configuration Using ICMS [2], which covers how to use ICMS to configure a system, configuration recommendations, and rebuilding a UNICOS kernel.
The above changes were made to both packaging and installation to ensure the new installation process would address requests that were made over time with the Install Tool, install(8), and J90 Install Utility installation process.
The Common Installation Tool was created to satisfy the needs of customers who have been using the Install Tool and the J90 Install Utility. CIT was designed from comments gathered from customers over time; it has both a GUI and a curses-based interface and is used to install products on remote machines. CIT is currently used for installing:
CIT has now become the only installation tool to be used for installation of software on systems running either UNICOS 10.0 or UNICOS/mk. All CD-ROMs shipped will now contain CIT and the necessary binaries and libraries so CIT can be used on the supported platforms for installation.
CIT replaces both the `Release Media Management' functionality that was included in install(8) and the J90 Install Utility which was used by CRAY J90 Model V based systems for UNICOS releases prior to UNICOS 9.2.
One of the biggest improvements in CIT over previous installation routines is its support for compressed packages. The largest bottleneck in data transfer is the speed of the CD-ROM drive. We have seen compression rates of about 70% on the UNICOS/mk and UNICOS packages. With only about one third of the data to transfer, installation takes about one third of the time. As a result, a UNICOS 9.0 source upgrade that takes 9 hours can now be performed as a UNICOS 10.0 source upgrade that takes about 3 hours.
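The time savings follow from simple arithmetic if, as a simplifying assumption, install time is proportional to the amount of data read from the CD-ROM. The sketch below (the function name `estimated_install_hours` is hypothetical) ignores decompression cost and any work that is not data transfer, so it gives an optimistic estimate rather than a prediction.

```python
def estimated_install_hours(baseline_hours: float, compression_rate: float) -> float:
    """Estimate install time when the CD-ROM read speed is the bottleneck.

    Simplifying assumption: time scales linearly with the bytes read, so a
    compression rate r leaves (1 - r) of the original data and time.
    Decompression cost and non-transfer work are ignored.
    """
    if not 0.0 <= compression_rate < 1.0:
        raise ValueError("compression_rate must be in [0, 1)")
    return baseline_hours * (1.0 - compression_rate)


# A 9-hour UNICOS 9.0 source upgrade with about 70% compression:
print(f"{estimated_install_hours(9.0, 0.70):.1f} hours")  # 2.7 hours
```

The result, about 2.7 hours, is in line with the roughly 3 hours quoted above, since 70% compression leaves a little less than one third of the data to read.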
CIT runs on the OWS, J90 Console, and the SWS. CIT is also being used to perform installation of Async Packages on other workstations such as DEC, HP, IBM and SGI workstations.
CIT is a task-based installation tool: a wrapper runs on the workstation and reads a product information file (*.pif), which contains a list of tasks to be performed on the local and remote systems. A task performed during package installation can install new software or perform actions locally or on the remote system.
Because CIT uses a wrapper to monitor task output, users should be aware that CIT highlights problems regardless of their source. A problem encountered during installation may be due to CIT itself, the package being installed, the remote environment on the mainframe, or the installation process.
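The task-based design can be illustrated with a small wrapper that reads a task list and runs each task locally or on the remote system, stopping at the first failure. This is a sketch under loud assumptions: the real pif format is not described in this paper, so the `name | where | command` layout, the `Task` fields, and the use of `rsh` for remote tasks are all hypothetical stand-ins.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    where: str           # "local" or "remote" (hypothetical field)
    command: list[str]   # argv for the task


def parse_pif(text: str) -> list[Task]:
    """Parse a HYPOTHETICAL pif layout: one 'name | where | command' per line.

    Blank lines and lines starting with '#' are skipped in this sketch.
    """
    tasks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, where, command = (field.strip() for field in line.split("|", 2))
        if where not in ("local", "remote"):
            raise ValueError(f"task {name!r}: unknown location {where!r}")
        tasks.append(Task(name, where, command.split()))
    return tasks


def build_argv(task: Task, remote_host: str) -> list[str]:
    """Local tasks run as-is; remote tasks are wrapped in rsh (assumption)."""
    if task.where == "remote":
        return ["rsh", remote_host, *task.command]
    return list(task.command)


def run_tasks(tasks: list[Task], remote_host: str, runner=subprocess.run):
    """Run tasks in order; return the name of the first failing task, or None."""
    for task in tasks:
        result = runner(build_argv(task, remote_host))
        if result.returncode != 0:
            return task.name
    return None
```

Passing a stand-in `runner`, for example one that returns `subprocess.CompletedProcess(argv, 0)`, lets a task list be exercised without a mainframe.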
The GUI for CIT shows on which remote system the installation will take place, which packages are available for installation, what order packages will be installed, and which packages have been installed for the current invocation of CIT only.
The interactive (curses) mode shows on which remote system the installation will take place, which packages are available for installation, and which packages have already been installed on the system, for the current invocation only.
During an install, CIT's wrapper parses the entire installation output and searches for install errors. Errors are detected in the following ways:
If an error is not reported in either of these two ways, CIT may miss the problem and continue. Monitoring CIT's installation logs during the install is therefore recommended, since it provides an opportunity to stop the install and resolve problems before an error missed by CIT gets out of control. How to monitor installs with CIT, where the installation logs are kept, and additional information about problems are covered in `Where to look when Problems occur during and After Installation', section 4.5.
The CD-ROM on which software products are distributed contains all the software needed to perform an install on a remote system or on the local workstation. The top level of the CD-ROM contains the following files and directories:
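Because CIT can miss errors that are not reported in the ways it expects, a site might scan a log for suspicious lines while the install runs. The sketch below is a hypothetical helper, not part of CIT; the regular expressions are illustrative assumptions and not CIT's own detection rules, so they should be tuned to the messages the packages actually emit.

```python
import re
from typing import Iterable, Iterator

# HYPOTHETICAL patterns for illustration only; they are not CIT's own
# detection rules. Tune them to the messages your packages actually emit.
ERROR_PATTERNS = [
    re.compile(r"\berror\b", re.IGNORECASE),
    re.compile(r"\bcannot\b|\bcan't\b", re.IGNORECASE),
    re.compile(r"no space left on device", re.IGNORECASE),
]


def suspicious_lines(lines: Iterable[str]) -> Iterator[tuple[int, str]]:
    """Yield (line number, text) for each log line matching any pattern."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(pattern.search(text) for pattern in ERROR_PATTERNS):
            yield number, text
```

An open log file can be passed directly, since iterating over a file yields its lines; rerunning the scan periodically on a package log gives a rough second opinion alongside CIT's own checks.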
For the SWS-ION release, there may be additional directories that are present at the top level of the CD-ROM. The SWS-ION installation process uses these directories and their contents.
While CIT does receive software updates and fixes over time, sites are advised to use the version of CIT included on the CD-ROM when installing software. By doing so, sites ensure that they are using the latest version, which has been tested with the packages being loaded. Using the version of CIT on the CD-ROM also helps in determining and resolving problems that sites may encounter during installation, since the capabilities and performance of that version of CIT are known.
While CIT has been around for nearly two years, there are some features that sites are advised not to use. These features are not commonly used, nor are they supported by most of the packages that CIT is capable of installing.
Using "Root Prefix" in CIT's GUI, or rootdev in CIT's interactive mode, is not recommended. Using this option when it is not mentioned in a package's installation documentation will cause additional problems and unexpected results.
As part of the installation, Verification Information files (*.vif) are copied from the CD-ROM to the system where the installation is taking place. These files are used to validate the current installation and past installations. The vif files should not be changed or removed once an installation has been completed; removing or editing them by hand will create future installation problems and configuration issues.
Problems that a site may see after changing the vif files include, but are not limited to, a premature failure of the installation process due to a "corrupted" vif file, and the removal of files that are necessary to perform future work.
CIT created an opportunity to make installation of UNICOS and UNICOS/mk software as common as possible. The installation processes need to follow a similar high-level process to ensure that installation, verification, and relinking can occur without problems.
Because both OS installations follow the same high-level process, the documentation is organized in a similar fashion, so people who install CRAY J90 software can also install CRAY T3E software with a high degree of confidence.
Besides the similar organization of the documentation, portions of the text and the tasks performed during an installation are the same across the various manuals. Similar text was used in the manuals to help people who install software develop a common understanding of how the new installation process works.
To continue this reinforcement, the software being shipped is similar in packaging formats, naming, and contents.
The high-level process for installation contains three major steps. Each step is designed to build on the previous one, so that if problems arise they are isolated to the current task or the previous task.
The first high-level step is preparation. During this step the system administrator sets superuser, security, and network privileges correctly, and allocates disk space for the new upgrade partitions. The administrator will also enter system information into the sysinfo file, create network communication paths via .rhosts, verify that the network communication paths are set up correctly, and prepare the upgrade partitions for the installation of software.
Once the preparation task has been completed, the system administrator is ready to install the new software. This step uses CIT to perform any additional pre-installation tasks, such as checking the network communication paths again and performing the additional tasks needed for an initial install.
Once CIT has completed its pre-installation tasks, the actual installation of software on the remote system takes place. During the installation of each part of a package, a verification step computes a checksum of the loaded files. In addition to the checksum, each file in the package is checked to make sure its permissions, file specification, owner, and group match the attributes required for that file.
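One part of verifying the communication paths is confirming that the .rhosts file on the target system grants access to the right host and user. In the standard .rhosts format each line names a host and, optionally, a user; a line with only a host name applies to a remote user with the same name as the local account. The function below is a hypothetical, simplified check of file contents only (it ignores `+` wildcards, netgroups, and `-` negations, and compares names literally); the definitive test remains actually running a remote shell command.

```python
def rhosts_allows(rhosts_text: str, remote_host: str, remote_user: str,
                  local_user: str) -> bool:
    """Return True if .rhosts text lets remote_user@remote_host log in as local_user.

    Simplified sketch: handles 'host' and 'host user' lines only, with no
    wildcard, netgroup, or negation support and no DNS lookups, so a short
    name versus a fully qualified name is reported as not allowed.
    """
    for line in rhosts_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0]
        # A missing user field means "same name as the local account".
        user = fields[1] if len(fields) > 1 else local_user
        if host == remote_host and user == remote_user:
            return True
    return False
```

For example, with the hypothetical workstation name `sws1`, a line `sws1 root` in root's .rhosts on the mainframe would satisfy `rhosts_allows(text, "sws1", "root", "root")`.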
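The kind of per-file check described above can be sketched as comparing each installed file against an expected record. The record layout (`Expected`), the use of SHA-256, and the mapping of "file specification" to a simple file-type field are all assumptions for illustration; the actual vif format and the checksum CIT computes are not given in this paper.

```python
import hashlib
import os
import stat
from dataclasses import dataclass


@dataclass
class Expected:
    """One record of a HYPOTHETICAL manifest (not the real vif format)."""
    path: str
    sha256: str      # illustrative checksum; the algorithm CIT uses is not stated
    mode: int        # permission bits, e.g. 0o644
    file_type: str   # "file" or "dir", standing in for the file specification
    uid: int
    gid: int


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(entry: Expected) -> list[str]:
    """Return a list of mismatches for one file; an empty list means it passed."""
    try:
        st = os.lstat(entry.path)
    except FileNotFoundError:
        return ["missing"]
    if stat.S_ISREG(st.st_mode):
        actual_type = "file"
    elif stat.S_ISDIR(st.st_mode):
        actual_type = "dir"
    else:
        actual_type = "other"
    problems = []
    if actual_type != entry.file_type:
        problems.append(f"type {actual_type} != {entry.file_type}")
    actual_mode = stat.S_IMODE(st.st_mode)
    if actual_mode != entry.mode:
        problems.append(f"mode {oct(actual_mode)} != {oct(entry.mode)}")
    if st.st_uid != entry.uid:
        problems.append(f"owner {st.st_uid} != {entry.uid}")
    if st.st_gid != entry.gid:
        problems.append(f"group {st.st_gid} != {entry.gid}")
    if actual_type == "file" and sha256_of(entry.path) != entry.sha256:
        problems.append("checksum mismatch")
    return problems
```

Reporting every mismatch for a file, rather than stopping at the first, makes it easier to tell a permissions problem from corrupted data.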
After the installation of all the packages, CIT will perform post-installation tasks. These tasks can include converting configuration files between the old OS and the new OS, making sure any new system configuration files are in their proper location, re-linking a kernel, and transferring back to the OWS/J90 Console/SWS the new UNICOS kernel for the system.
Once CIT has finished, the site can then proceed to change the system configuration to take advantage of the new OS. This can include updating system configuration files, following new recommendations, and performing a complete system build if desired.
Both the initial installation and upgrade installation processes follow this high level process though additional tasks may be performed in any stage to ensure that once the installation and configuration is completed, the system is available for testing.
The initial installation process follows the basic process already outlined; however, during each of the high-level steps there are additional tasks that must be performed to ensure that the system will be bootable into multi-user mode on site as soon as possible.
During the preparation task the system administrator enters system information into the sysinfo file, creates the network communication paths via .rhosts on the SWS, and verifies that the network node name of the system is not an alias in the /etc/hosts file. The administrator will also create SWS configuration files for GigaRing based systems, build an initial param file for the system, and start the mainframe with a RAM file system and a generic kernel.
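The alias check can be made concrete using the standard /etc/hosts layout, where each entry lists an address, then the canonical host name, then optional aliases. The helper below is a hypothetical sketch (its name and return values are illustrative); it compares names literally, so case differences are treated as different names.

```python
def classify_hostname(hosts_text: str, name: str) -> str:
    """Report how `name` appears in text laid out like /etc/hosts.

    Each entry is: address, canonical host name, then optional aliases.
    Returns "canonical" if `name` is the first host name on some entry,
    "alias" if it appears only after a canonical name, else "absent".
    Text after '#' is treated as a comment; names are compared literally.
    """
    seen_as_alias = False
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # comment, blank line, or address with no names
        canonical, aliases = fields[1], fields[2:]
        if canonical == name:
            return "canonical"
        if name in aliases:
            seen_as_alias = True
    return "alias" if seen_as_alias else "absent"
```

Given the text of /etc/hosts and the system's node name, a result of `"alias"` indicates exactly the condition the preparation step asks the administrator to rule out.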
Now that the system is up and running in single user mode, CIT can be used to start the initial installation process. CIT will perform the following tasks prior to loading any new software:
Once CIT has successfully completed these tasks CIT will start to install software on the mainframe and verify the software was loaded correctly.
Once all the software installation has completed, CIT will perform the following tasks to prepare the system to go to multi-user:
/skl
/etc/host and /etc/config/interfaces file from the RAM file system to the new file system partition
/etc/config/config.mh
/etc/privcmd to set up the initial PALs for security
Once these tasks have completed successfully, the system should be re-booted with the newly built UNICOS kernel and param file. With the system up and running, the site should be able to proceed to creating its system configuration.
System configuration entails updating the necessary system configuration files on the system, reviewing the recommendations made and deciding which of them the site wants to follow, and finally re-building a UNICOS kernel if significant changes were made to the kernel configuration.
Once the new kernel has been made, the system should be re-booted again to use the formal system configuration.
The upgrade installation process follows the basic process already outlined, but has fewer tasks to be performed during each step, since the system should already have been configured. The upgrade process propagates the system configuration forward to the software being installed. This was done to ensure that the system will be bootable into multi-user mode on site as soon as possible, and that it will look like the previous system configuration.
During the preparation task the system administrator needs to obtain superuser, security, and network privileges. The system must be running in multi-user mode with a PRIV_SU kernel that has the NETW_RCMD_COMPAT bit set in the SECURE_NET_OPTIONS bitmask, and with networking up. This allows remote shell commands to be executed between the OWS/J90 Console/SWS and the Cray mainframe. For sites that have a heightened level of security, we recommend performing these installations during dedicated time and restricting access to the system.
In addition to setting up the necessary privileges, the system administrator will need to allocate disk space for the new file systems. Once the disk space has been allocated, upgrade.setup will mkfs, labelit, fsck, mount, dump, and restore the old file system onto the new file system partitions as necessary. At the very end, upgrade.setup cleans up the Nmakefile.mo and Nmakefile.ms files and other extraneous files to ensure a smooth upgrade.
The last task the system administrator will need to perform prior to starting the upgrade process is the backup and removal of local mods applied to source on the Cray mainframe. This must be done to ensure that once the software installation has successfully completed, the site will be able to re-apply the local mods to their system without creating conflicts or additional administrative problems.
Now that the system has been prepared for the installation and with network communications paths up and running, CIT can be used to start the upgrade installation process. CIT will start to install software on the mainframe and verify the software was loaded correctly.
Once all the software installation has completed, CIT will perform the following tasks to prepare the system to go to multi-user:
/skl that are new or are missing
/etc/config/config.mh
/etc/privcmd to set up the initial PALs for security
Once CIT has successfully completed all of its tasks, source sites may want to run the /etc/conv/release_modinfo script, which generates a list of the mods that have gone into the current and previous releases since UNICOS 10.0.
If the site has any local mods, they should be applied to the source at this time and dependencies should be addressed before continuing with the system configuration task.
System configuration entails updating the necessary system configuration files on the system, reviewing the recommendations made with this release and deciding which to follow, and finally re-building a UNICOS kernel if significant changes were made to the kernel.
Once the new kernel has been made, the system should be re-booted again to use the formal system configuration. After the reboot, the site should verify the system security configuration, copy time-critical files such as the UDB from one root to the other, go to multi-user mode, and restart NQE checkpointed jobs or processes.
With the new installation process and CIT there are several changes that have occurred with the installation process since UNICOS 9.0. Sites need to be aware of the issues associated with these changes and they should also be aware of our recommendations on how to proceed with an installation.
The first major point is that the installation process now requires a PRIV_SU kernel with the NETW_RCMD_COMPAT bit set in the SECURE_NET_OPTIONS bitmask. This allows CIT to use remote shell commands to perform tasks between the SWS and the Cray mainframe. Common tasks performed across the two systems include accessing the CD-ROM drive for files and transferring files between the two systems to perform the installation.
With the use of remote shell commands, .rhosts files on the OWS/J90 Console/SWS and the Cray mainframe became necessary. The remote shell approach was adopted because some of the Model E and CRAY J90 Model V communication paths used in the past are no longer present on GigaRing based systems. The relationship between the workstation and the mainframe also changed with GigaRing. These two changes, along with the desire for a common installation process across all systems, meant that the .rhosts facility was the lowest common denominator present across all the architectures.
The need for PRIV_SU arose because security configurations vary as widely as the sites themselves. To ensure that the installation process can be tested accurately in-house, and that the process sites run is as close as possible to what was tested in-house, PRIV_SU became necessary. PRIV_SU allows every site to upgrade using the same process, and then gives sites with unique security needs the opportunity to make the necessary security changes to meet those needs.
In the past, sites with security concerns could install software during batch time. With CIT, we recommend that sites with security concerns perform installations during dedicated time. This allows the system administrator to ensure that only the installation process is running on the system, and to monitor the installation process as closely as desired.
A network connection is now required between the OWS/J90 Console/SWS and the Cray mainframe. The Model E and GigaRing based systems have a private network connection between the two systems, while the CRAY J90 Model V based systems historically did not. Since CIT relies very heavily upon network connections between the two systems, the J90 Console will need to be added to a network that also connects to the Cray mainframe.[4]
Ideally, the connection between the two systems would be a private network. If a private network cannot be set up, Cray recommends that the CRAY J90 be removed from the public network during upgrade installations, and that the site create a private network between the J90 Console and the Cray mainframe for the duration of the installation. This assures sites with security concerns that neither the J90 Console nor the Cray mainframe will be accessed during the installation.
Software installed via CIT cannot be placed directly on, or reside on, a DFS file system, since DCE/DFS user authentication does not take place when remote shell commands are executed on a system. The installation must therefore be done on a local file system; if that file system needs to become a DFS file system, the system administrator must take care of that after the installation process has completed successfully.
A second source partition is needed for the UNICOS 9.0 to UNICOS 10.0 upgrade. Once the upgrade to UNICOS 10.0 has been completed, sites can in the future perform a single source upgrade from UNICOS 10.0 to a newer version of UNICOS 10.0. Two source partitions are needed during the UNICOS 9.0 to UNICOS 10.0 upgrade to ensure that the UNICOS 10.0 source partition is clean when UNICOS 10.0 is initially installed, so that the new packaging process does not leave miscellaneous files lying around between future upgrades.
During the installation, you can watch what is happening by clicking on the `Display install logs' button of CIT's Install Process Window. A new window will appear that contains all the output of the tasks that are taking place locally on the workstation and on the remote system.
If multiple packages are being loaded, CIT will keep each package's installation information in a separate installation log file and in separate windows. This allows the installation process to be reviewed in the context of the package being installed.
CIT creates multiple installation logs, for itself and for each package being loaded. For each invocation of CIT, a new set of installation log files is kept in /tmp/cit.<username>. If a directory with that name already exists, CIT moves it to /tmp/cit.<username>.0 and recreates the CIT installation log directory.
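The directory rotation described above can be sketched with the standard library. What CIT does when a `.0` directory already exists is not stated, so this sketch assumes only one older generation is kept and discards the previous `.0`; the function name and the `base` parameter (added so the sketch can be exercised outside /tmp) are hypothetical.

```python
import shutil
from pathlib import Path


def prepare_log_dir(username: str, base: Path = Path("/tmp")) -> Path:
    """Rotate and recreate a per-user log directory as described for CIT.

    If <base>/cit.<username> exists it is renamed to <base>/cit.<username>.0
    and a fresh, empty directory is created in its place.
    """
    current = base / f"cit.{username}"
    previous = base / f"cit.{username}.0"
    if current.exists():
        if previous.exists():
            # Assumption: only one older generation is kept.
            shutil.rmtree(previous)
        current.rename(previous)
    current.mkdir(parents=True)
    return current
```

One practical consequence of this scheme is that logs from a failed run survive exactly one further invocation, so they should be copied elsewhere before CIT is started a second time if they may be needed for an SPR.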
Inside each of the CIT installation log directories are the following files which can be reviewed if problems are experienced during an installation:
.log
is the log file for the PackageName package. This log file contains the stdout and stderr of actions performed locally and on the remote system.
cit.log
is CIT's general process log. Information that goes into this log includes the parsing of pif files, which packages were selected for installation, error messages about unsatisfied package dependencies, CIT's output while installing each selected package, and errors from the package's GUI or interactive interfaces, to name a few.
config.new
is an environment file used by CIT to setup the local and remote environments to be used during an install. This file shows how the environment will be set up and the specific values for the environment.
cit.pid
is the process ID of the CIT process that is currently running.
cit.misc.log
is a log of unexpected errors encountered by CIT and should be reviewed when all the other installation logs don't seem to provide information about what problems were encountered.
If a problem is encountered with CIT during an installation, the first place to look is the package log file. While the package log file contains errors associated with a specific package's installation, it is also wise to review the cit.log file to determine whether a problem is package related or CIT related.
As with any product or process, some problems are bound to occur over the course of the product's life. To facilitate a quick response, we have found that providing information about the installation experience when the SPR is filed usually supplies what is needed to research and analyze the problem and develop a fix.
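The suggested review order (package logs first, then cit.log, with cit.misc.log as a last resort) can be captured in a small helper when gathering logs. This is a hypothetical sketch; treating every other `*.log` file in the directory as a package log is an assumption, since the exact package log names are not spelled out here.

```python
from pathlib import Path


def review_order(log_dir: Path) -> list[Path]:
    """Return existing CIT log files in a suggested review order.

    Package logs first, then cit.log, then cit.misc.log (the last resort
    when the other logs do not explain a problem). Treating every other
    *.log file as a package log is an assumption of this sketch.
    """
    general = ("cit.log", "cit.misc.log")
    package_logs = sorted(p for p in log_dir.glob("*.log") if p.name not in general)
    ordered = package_logs + [log_dir / name for name in general]
    return [p for p in ordered if p.exists()]
```

Non-log files such as config.new and cit.pid are left out of the list, although config.new can still be useful for checking the environment CIT set up.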
The following files and information have been found to be necessary to resolve site installation problems:
sysinfo file
With the new processes and CIT it is hoped that sites will become more familiar with the underlying process that takes place during a software upgrade, and that opportunities are provided to allow sites to use the latest software released as quickly as possible.
Up to this point, it has been highlighted that the changes to the installation manuals were the result of having CIT available and the desire to create a similar installation process for UNICOS and UNICOS/mk. There are several reasons for the changes that have been made to the manuals.
The UNICOS 9.0 installation guide had the installation and configuration information intermixed and spread across the entire manual. While the installation and configuration information was still needed, some recommendations appeared only in the procedures for installing a revision or an update, even though they were valid for all upgrade installations.
In reviewing the manual, it became clear that the intermixing of the two processes in one manual created more confusion for readers. Readers were unsure if they had finished the software installation and had progressed to configuration.
In general, the installation guide was poorly organized, or contained far too much information for the installation process. Readers also said they wanted a task-based manual that clearly separated the installation process from the configuration process. These comments led to the new installation manuals and a single configuration manual. These manuals are task based and do not have people flipping back and forth to verify information or to find necessary information.
Another change was to make each chapter a complete installation process. In the past some people were unsure of where to begin, since the UNICOS installation manual used unique terms such as upgrade, revision, and update. People unfamiliar with this terminology more often than not made a wrong decision about where to start in the installation guide, and then needed to perform the installation again to fix the problems.
The end result is four UNICOS installation guides, one UNICOS system configuration guide, and one UNICOS/mk installation and configuration guide [5]. The UNICOS installation guides [6], [7], [8], [9] are divided primarily by I/O model: Model E, Model V, and GigaRing. The GigaRing I/O based systems are separated where necessary to address some very specific hardware differences in performing initial installations.
The old UNICOS® 9.0 Installation Guide, publication SG-2112, is organized as follows:
The new UNICOS® 10.0 Installation Manuals, publications SG-5271, SG-5296, SG-5297 and SG-5298 are organized as follows:
The new UNICOS® System Configuration Using ICMS Guide, publication SG-2412, is organized as follows:
The new manuals and their new organization should help the infrequent user to eliminate the confusion as to where the starting point is for an installation and where to continue after performing an OS installation.
It is hoped that the new installation process, packages, and manuals will resolve some issues that have been present for years with the old UNICOS installation process, and provide sites with a highly reliable method for installing software and confidence that the installation occurred as tested.