PBS on the Origin 3800

Overview of PBS

The Portable Batch System (PBS) is a system for managing computing resources and batch jobs. Very similar to NQS in overall philosophy, it offers a number of resource management features that NQS lacked, including better job limit controls, improved handling of parallel jobs, support for "interactive batch" jobs, a job dependency feature, and ongoing support for new operating-system-specific features.

PBS commands

The PBS command set for general users is very similar to that provided by NQS. The two most commonly used commands, qsub and qstat, accept similar command line parameters, though the output of PBS's qstat differs radically from that of NQS's qstat.

The most commonly used PBS commands include:

Command    Description                                      Reference
qsub       submits a batch job                              qsub man page
qstat      reports on the status of batch jobs              qstat man page
qdel       deletes a PBS batch job                          qdel man page
xpbs       X-Windows job submission and status interface    xpbs man page
nqs2pbs    converts existing NQS job scripts to PBS         nqs2pbs man page

Structure of a PBS job script

A PBS job script is very similar in structure to an NQS job script: an initial block of #PBS directives specifies the requested job attributes (such as job name, number of CPUs requested, memory required, and time limit), and is followed by a standard shell script.

An example PBS job script:

#PBS -q small
#PBS -S /bin/sh
#PBS -l pmem=5mb,cput=1:00:00
./bprog 10000 1 0
ls -l

The above PBS job requests that the job be placed in the "small" queue, executed using the Bourne shell, with a physical memory limit of 5 megabytes per process, and a total of one hour of CPU time for the job.

All PBS directives are optional; any limit that is not specified takes the site-defined default value.

As with NQS, these parameters can also be specified on the qsub command line. Command line parameters override any corresponding #PBS directive within the job script.

Common PBS qsub parameters

Option                Meaning
-A account_string     specify account string (cwa_project)
-e path               destination for stderr file
-I                    run as an "interactive batch" job
-j oe                 write stdout and stderr to the standard output file
-j eo                 write stdout and stderr to the standard error file
-l resource_list      specify required resources (see table below)
-m e                  send email when the job finishes
-N name               specify job name
-o path               destination for stdout file
-p priority           set the job's priority (an integer from -1024 to +1023)
-q destination_queue  request a specific batch queue
-r y|n                specify whether the job is rerunnable
-S shell_path         shell to be used
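As an illustration, the sketch below combines several of these options in the directive block of a job script. The job name, queue, CWA_project, and resource values are example values taken from elsewhere on this page, not required settings. The cput value is the total across all CPUs, so 16:00:00 corresponds to 8 CPUs busy for about two wall-clock hours. PBS sets PBS_O_WORKDIR to the directory qsub was run from, which lets the job start where it was submitted.

```shell
#!/bin/sh
#PBS -S /bin/sh
# Job name and priority (routing) queue
#PBS -N my_sample_job
#PBS -q LO
# CWA and project to charge (only needed if you have more than one)
#PBS -A aernew_project1
# 8 CPUs, 5 GB of physical memory, 16 CPU-hours in total (8 CPUs x 2 hours)
#PBS -l ncpus=8,mem=5gb,cput=16:00:00
# Merge stderr into the stdout file and send mail when the job ends
#PBS -j oe
#PBS -m e

# Start in the directory qsub was run from (PBS sets PBS_O_WORKDIR);
# fall back to the current directory when run outside PBS.
cd "${PBS_O_WORKDIR:-.}" || exit 1
ls -l
```

Any of these directives can still be overridden at submission time, for example with qsub -q CR on the command line.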

Common resources specified by the qsub -l parameter

Resource                Meaning
cput=time               job CPU time limit (hh:mm:ss or seconds)
pcput=time              process CPU time limit (hh:mm:ss or seconds)
mem=size                job physical memory size
pmem=size               process physical memory size
ncpus=number_of_cpus    number of CPUs required

The Boeing Origin 3800 PBS Environment

Resource limits

Just as NQS offers a large and somewhat confusing set of resource limits, so does PBS. In general, jobs are scheduled based on a very limited number of specified resources. On the Boeing Origin 3800, we suggest that you specify the following resources:

Resource                    PBS parameter    Examples                   Default value
Number of CPUs              #PBS -l ncpus=   #PBS -l ncpus=8            1 CPU
Total job physical memory   #PBS -l mem=     #PBS -l mem=5gb            1 gigabyte
                                             #PBS -l mem=50mb
Total job CPU time          #PBS -l cput=    #PBS -l cput=3600          1 CPU-hour
                                             #PBS -l cput=12:00:00

For those users with multiple CWAs and projects defined:

CWA and project             #PBS -A          #PBS -A aernew_project1    user's default cwa & project

Notes on Resource Limits
Notes on Resource Limits

The "number of CPUs" and "total job CPU time" limits map directly from NQS to PBS. Note that the total CPU time limit is an aggregate over all CPUs, so a job that runs for one wall-clock hour on ten CPUs simultaneously needs a #PBS -l cput= limit of 10 CPU-hours.
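Because the limit is an aggregate, it can be sized by multiplying the expected wall-clock time by the number of CPUs. The shell function below is a minimal sketch of that arithmetic; the name cput_limit is hypothetical and not part of PBS. It accepts the wall-clock time either as seconds or as hh:mm:ss, the two formats listed for cput above, and prints the result as hh:mm:ss.

```shell
# Hypothetical helper (not part of PBS): compute a "#PBS -l cput=" value
# for a job that keeps NCPUS processors busy for WALLTIME.
# Usage: cput_limit NCPUS WALLTIME   (WALLTIME as seconds or hh:mm:ss)
cput_limit() {
    echo "$2" | awk -F: -v ncpus="$1" '{
        secs = 0
        for (i = 1; i <= NF; i++) secs = secs * 60 + $i   # fold hh:mm:ss into seconds
        total = ncpus * secs                             # aggregate over all CPUs
        printf "%d:%02d:%02d\n", int(total / 3600), int(total % 3600 / 60), total % 60
    }'
}
```

For example, cput_limit 10 1:00:00 prints 10:00:00, matching the ten-CPU case above, and cput_limit 8 2:00:00 prints 16:00:00.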

The most notable difference in resource limits between NQS and PBS is that PBS can schedule based on physical memory (called the resident set size, or RSS, on IRIX) rather than virtual memory, which was the only memory limit NQS supported. The #PBS -l mem= parameter specifies physical memory (RSS), which for many applications is a small fraction of the size of the virtual memory space. This is especially true of parallel applications that use MPI.

The nqs2pbs tool (see below) will convert #QSUB -lM virtual memory limits into #PBS -l mem= physical memory limits, but this is at best a crude approximation. Users whose jobs specify a large memory limit (anything larger than 20gb) under NQS should use jcost output (or similar) to determine what the actual physical memory requirement is, and adjust the job's #PBS -l mem= parameter as appropriate.

Here is a sample jcost output:

Job CSA Accounting - Summary Report
====================================
 
Job Accounting File Name         : /wrk/tmp/jtmp.1783169/.jacct2a19000000005fb9
Operating System                 : IRIX64 origin 6.5 04131233 IP35
User Name (ID)                   : tecjbg1 (125)
Group Name (ID)                  : esptec (3446)
Project Name (ID)                : esptec (3431)
Array Session Handle             : 0x2a19ffff00006052
Job ID                           : 0x2a19000000005fb9
Report Starts                    : 10/26/01 15:20:53
Report Ends                      : 10/26/01 15:20:53
Elapsed Time                     :            0      Seconds
User CPU Time                    :            0.1811 Seconds
System CPU Time                  :            0.0611 Seconds
Run Queue Wait Time              :            0.0115 Seconds
Block I/O Wait Time              :            0.0037 Seconds
CPU Time Core Memory Integral    :            0.0298 Mbyte-seconds
CPU Time Virtual Memory Integral :            0.5234 Mbyte-seconds
Maximum Core Memory Used         :            0.4531 Mbytes
Maximum Virtual Memory Used      :            3.8906 Mbytes
Characters Read                  :            0.0338 Mbytes
Characters Written               :            0.0021 Mbytes
Blocks Read                      :            1
Blocks Written                   :            1
Logical I/O Read Requests        :           34
Logical I/O Write Requests       :           12
Number of Commands               :            7
 
Units                      Number          CRUs     Percent
CPU Seconds                 0.242         0.700       99.4%
Memory (Mword-seconds)      0.004         0.000        0.0%
Megabytes transferred       0.036         0.004        0.6%
Total CRUs                                0.704
 
Job Maximum Core Memory Used     :      1.469 Megabytes
Job Maximum Virtual Memory Used  :      9.578 Megabytes

Note that the last two lines report the core (physical) and virtual memory maximums for the job. Under NQS, the virtual memory limit was the size being enforced, while under PBS, the physical memory is used.
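Reading these figures by hand is easy for one job; for many jobs, a small filter can pull the core-memory peak out of saved jcost output and turn it into a starting value for #PBS -l mem=. The function below is a hypothetical sketch: it assumes the "Job Maximum Core Memory Used" line format and the megabyte unit shown in the sample above, and the 25% headroom it adds is an arbitrary choice, not a site recommendation.

```shell
# Hypothetical helper: read jcost output on standard input and suggest a
# "#PBS -l mem=" value from the "Job Maximum Core Memory Used" line.
# Assumes the value is reported in megabytes, as in the sample report.
suggest_pbs_mem() {
    awk -F: '/^Job Maximum Core Memory Used/ {
        split($2, f, " ")          # f[1] = number, f[2] = unit ("Megabytes")
        mb = f[1] * 1.25           # add 25% headroom above the observed peak
        n = int(mb)
        if (n < mb) n++            # round up to a whole megabyte
        printf "%dmb\n", n
    }'
}
```

For the sample report above (1.469 megabytes) it prints 2mb. If the jcost output of an NQS job has been saved to a file (the name jcost.out is only an example), suggest_pbs_mem < jcost.out prints the suggested value.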

Priorities

Users can specify the desired priority just as they do on NQS, by submitting to a routing queue. The same three priorities that were provided under NQS are supported in the local PBS configuration.

Priority    Queue    PBS parameter
Critical    CR       #PBS -q CR
Medium      ME       #PBS -q ME
Low         LO       #PBS -q LO

Like all PBS parameters, any parameter specified in the job with a #PBS directive can be overridden by specifying that parameter on the qsub command line. This includes the resource limits, the CWA_project, and the priority queue.
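Since priority is selected by queue, a command-line override of the priority is simply qsub -q with one of the three queue names. The function below is a hypothetical convenience wrapper (its name and the DRY_RUN variable are illustrative, not part of PBS) that maps a priority name to its routing queue and passes any remaining arguments, such as limits or -A, straight to qsub.

```shell
# Hypothetical helper (not part of PBS): submit a job script at one of the
# three local priorities by choosing the matching routing queue.
# Remaining arguments go to qsub unchanged, so limits or the CWA_project
# can be overridden in the same command.
# Set DRY_RUN=1 to print the qsub command instead of running it.
submit_at_priority() {
    case "$1" in
        critical) queue=CR ;;
        medium)   queue=ME ;;
        low)      queue=LO ;;
        *)        echo "unknown priority: $1" >&2; return 1 ;;
    esac
    shift
    if [ -n "${DRY_RUN:-}" ]; then
        echo qsub -q "$queue" "$@"
    else
        qsub -q "$queue" "$@"
    fi
}
```

For example, submit_at_priority medium -A aernew_project1 my_job_script runs qsub -q ME -A aernew_project1 my_job_script, overriding any #PBS -q and #PBS -A directives in the script.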

$TMPDIR

Though the automatic creation of a job scratch directory is not a feature provided directly by PBS, such a directory will be provided to PBS batch jobs on the Origin 3800. The $TMPDIR environment variable contains the pathname of this temporary directory, which is unique to each job. Jobs that do a substantial amount of file I/O will benefit from placing those files in $TMPDIR, because $TMPDIR is created on a file system tuned for maximum disk performance.

Note that unlike /tmp or other such temporary directories, $TMPDIR is removed immediately at the end of the job. Any files created in $TMPDIR that need to be preserved should be copied to their desired location within the job script itself.
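Because $TMPDIR disappears when the job ends, the copy-back step belongs at the end of the job script. The function below is a minimal sketch of that step; its name and the destination directory in the usage comment are hypothetical.

```shell
# Hypothetical helper for use inside a PBS job script: copy result files
# out of the per-job $TMPDIR before the job ends and $TMPDIR is removed.
# Usage: save_from_tmpdir DEST_DIR FILE...
save_from_tmpdir() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        # Stop at the first failed copy so the job can report the problem.
        cp "$TMPDIR/$f" "$dest/" || return 1
    done
}

# Typical use in the body of a job script (program name from the samplejob example):
#   cd "$TMPDIR" || exit 1
#   big_program -x -y -z > results.out
#   save_from_tmpdir "$HOME/run1" results.out
```

Returning a nonzero status on a failed copy lets the job script detect that its results were not saved before the directory is removed.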

Large files

Though not strictly a batch or PBS issue, the policies for handling large files are worth mentioning here. Users who create large files (roughly one gigabyte or greater) are strongly encouraged to place such files in their /big directory. This benefits the user, because large DMF-migrated files can be recalled quickly when required, and it benefits the user community as a whole, because keeping large files on a separate file system makes it less likely that small files in home directories will be migrated to tape to make room for a large file.

Each user has a /big/$HOME directory corresponding to their home directory on the /big filesystem. Thus user joeuser has a home directory of /acct/joeuser and a directory in /big of /big/acct/joeuser/. If your application generates large permanent files, it is best if you create them directly in /big, or create them in $TMPDIR and copy them to /big at the end of the job.
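Since the /big directory simply mirrors the home directory path under /big, a job script can compute it rather than hard-coding it. The function below is a hypothetical sketch of that mapping; treating a relative argument as relative to $HOME is an assumption made for convenience.

```shell
# Hypothetical helper: print the /big location that corresponds to a path
# in the user's home directory (e.g. /acct/joeuser/run1 -> /big/acct/joeuser/run1).
# Absolute paths are mapped directly; relative paths are taken relative to $HOME.
big_path() {
    case "$1" in
        /*) printf '/big%s\n' "$1" ;;
        *)  printf '/big%s/%s\n' "$HOME" "$1" ;;
    esac
}
```

For user joeuser, big_path /acct/joeuser/run1 prints /big/acct/joeuser/run1, so a job could create that directory with mkdir -p "$(big_path run1)" and then copy its results from $TMPDIR into it.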

If files which exceed the "large file" threshold are created under the user's home directory, those files will be automatically moved to a corresponding directory in /big, with a symbolic link to the new location left in its place.

The Interactive-Batch Facility

Some user tasks are not amenable to running as a true batch job, yet require more resources than an interactive session allows. To accommodate such tasks, PBS provides a mechanism referred to as interactive batch. Interactive batch is invoked by issuing a qsub -I command, which then waits until resources are available. When they are, control returns to the terminal, and the user's standard input and output are connected to the batch job. The user thus gets larger resource limits than would normally be available to an interactive session, while PBS still controls the overall utilization of machine resources.

This is a new feature, and we are not yet sure how useful it will be at this site. One note of caution: PBS reserves the requested resources for as long as the interactive batch job is active, so please terminate such jobs when you are not actively using them. Refer to the qsub man page for further details on using PBS interactive jobs.

Conversion of existing NQE/NQS scripts

PBS provides a tool called nqs2pbs that converts NQS/NQE job scripts for use with PBS. The resulting job script is usable by both PBS and NQS. nqs2pbs is useful both for modifying existing job scripts for use under PBS and for learning the corresponding PBS options, limits, and syntax.

Example of nqs2pbs usage:

hydra 25% nqs2pbs samplejob
Converting NQS script "samplejob" into "samplejob.new"
 
In converting the script, 0 errors and 0 warnings occurred
Script conversion complete, new PBS script in samplejob.new
hydra 26%

Original script - samplejob

#! /bin/sh
#QSUB -eo
#QSUB -lM 5000Gb
#QSUB -lT 60000
#QSUB -o /acct/tecjbg1/samplejob_output
#QSUB -q LO
#QSUB -r my_sample_job
#QSUB -s /bin/sh
#QSUB -l mpp_p=22
#
ls
big_program -x -y -z
jcost

Converted script - samplejob.new

#! /bin/sh
# This script converted on Fri Oct 12 20:23:36 PDT 2001
#PBS -j oe
#QSUB -eo
#PBS -l mem=5000Gb
#QSUB -lM 5000Gb
#PBS -l cput=60000
#QSUB -lT 60000
#PBS -o /acct/tecjbg1/samplejob_output
#QSUB -o /acct/tecjbg1/samplejob_output
#PBS -q LO
#QSUB -q LO
#PBS -N my_sample_job
#QSUB -r my_sample_job
#PBS -S /bin/sh
#QSUB -s /bin/sh
#PBS -l "ncpus=22"
#QSUB -l mpp_p=22
#
ls
big_program -x -y -z
jcost

Note that in the above example each #QSUB directive is preceded by the corresponding #PBS directive. Each system treats the other's directives as ordinary comments, and both ignore comments interspersed with their directives at the start of the job, so the resulting samplejob.new script can be used with either NQS or PBS.
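To see the directive mapping that nqs2pbs applied in this example at a glance, the function below is a rough sketch that rewrites the same eight #QSUB forms with sed. It is a hypothetical illustration, not a replacement for nqs2pbs: it handles only the forms shown above, and unlike nqs2pbs it replaces each #QSUB line instead of keeping both.

```shell
# Hypothetical sketch: translate the #QSUB directives used in samplejob into
# the #PBS equivalents nqs2pbs produced. Other lines pass through unchanged.
qsub_to_pbs() {
    sed -e 's/^#QSUB -eo$/#PBS -j oe/' \
        -e 's/^#QSUB -lM \(.*\)$/#PBS -l mem=\1/' \
        -e 's/^#QSUB -lT \(.*\)$/#PBS -l cput=\1/' \
        -e 's/^#QSUB -o \(.*\)$/#PBS -o \1/' \
        -e 's/^#QSUB -q \(.*\)$/#PBS -q \1/' \
        -e 's/^#QSUB -r \(.*\)$/#PBS -N \1/' \
        -e 's/^#QSUB -s \(.*\)$/#PBS -S \1/' \
        -e 's/^#QSUB -l mpp_p=\(.*\)$/#PBS -l ncpus=\1/'
}
```

For example, qsub_to_pbs < samplejob prints the script with #PBS lines in place of the #QSUB lines, leaving the original file unchanged. Note the renamings it makes explicit: the NQS request name (-r) becomes the PBS job name (-N), and mpp_p becomes ncpus.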

Conversion Plan

To ease the user transition to PBS, the migration will be handled using a phased approach. During the transition period, both NQS and PBS will be running simultaneously on the Origin 3800. Initially NQS will be the default batch queuing system. After a transition period (roughly one month), PBS will become the default queuing system. As soon as the user community has made the transition to PBS, the NQS subsystem will be disabled (all queues turned off, no jobs initiated) and then, shortly thereafter, removed.

Date        Phase  Activity                                Default batch system  Duration
11/11/2001  1      Install PBS as secondary batch system   NQS                   1 month
12/09/2001  2      Switch default to PBS                   PBS                   1 week
12/16/2001  3      Stop NQS queues                         PBS                   2 weeks
01/01/2002  4      Remove NQS subsystem                    PBS                   n/a

Since both NQS and PBS use commands such as qsub and qstat, users need a way to choose which batch queuing system a command applies to. Two "wrapper" scripts will be provided for this: prefixing a batch command with "pbs" or "nqs" selects the desired batch queuing system.

qsub my_job_script        submits to the current default batch subsystem (NQS or PBS)
nqs qsub my_job_script    will force the use of NQS, regardless of the default
pbs qsub my_job_script    will force the use of PBS, regardless of the default

After the transition period is completed, the pbs and nqs commands will no longer be needed and will be removed.
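How the site's pbs and nqs wrappers are implemented is not described here. One plausible approach is sketched below: each wrapper runs the given command with its batch system's bin directory placed first in PATH, so that qsub, qstat, and qdel resolve to that system's versions. The function bodies and the directories /usr/pbs/bin and /usr/nqs/bin are hypothetical placeholders, not the actual Origin 3800 install locations.

```shell
# Hypothetical sketch of "pbs"/"nqs" style wrappers. The directories are
# placeholders; override PBS_BIN / NQS_BIN to point at the real locations.
PBS_BIN=${PBS_BIN:-/usr/pbs/bin}
NQS_BIN=${NQS_BIN:-/usr/nqs/bin}

pbs() {
    # Run in a subshell so the PATH change applies to this one command only.
    ( PATH="$PBS_BIN:$PATH"; export PATH; "$@" )
}

nqs() {
    ( PATH="$NQS_BIN:$PATH"; export PATH; "$@" )
}
```

With definitions like these, pbs qstat would run PBS's qstat, while a plain qsub still resolves through the unchanged PATH to the current default.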

For further assistance...

If you have further questions or comments about PBS or this transition plan, please contact our User Support folks at xxx-xxx-xxxx or contact the author directly.

Jim Glidewell - Cray/Origin Technology & User Support - xxx-xxx-xxxx

original 10/17/2001 - last revision 11/02/2001