Using the SGI Altix Systems at JPL
Introduction
The User's Guide for the SGI Altix Supercomputer is
intended to provide the minimum amount of information needed by a new
user of these systems.
As such, it assumes that the user is familiar with many of the standard
aspects of supercomputing, such as the Fortran and
C programming languages and various standard libraries (BLAS, LAPACK,
MPI, etc.).
The JPL Supercomputing facility is funded by JPL
and is available to users at JPL.
The computer system is located in building 600,
and is supported by JPL's Supercomputing and Visualization Systems Group.
Getting an account
A user account for this machine can be obtained
by completing an application at the
Account Applications page.
Getting help
User questions and support are handled online by sending e-mail to:
scconsult
The Altix Hardware at JPL
The system is composed of a front end Altix (gemini) with 64 Itanium 2
processors, and two backend Altix systems (castor, pollux) with 256
Itanium 2 processors each.
Interactive editing, compiling, and very simple debugging are
done on the front end, gemini. Production computing is done
on castor and pollux using Altair's Portable Batch System Professional
(PBS Pro) v10.0. PBS Pro supports both interactive and background batch jobs. See the
"Batch Scheduling" discussion below.
Operating System
The operating system is SuSE Linux 10 with SGI ProPack 5.
We assume that the user is familiar with Linux; if not,
many web pages are available online to help a new user get started
with Linux.
Disk Details
Home directories are on gemini and NFS mounted on castor and pollux. Each user
has a /home quota of 1 GB of disk space. Home directories
are backed up nightly.
Gemini:
* 91 GB home
* 22 TB /workg
* 2 TB dynamic scratch
Castor:
* 44 TB /workc
* 1 TB dynamic scratch
Pollux:
* 44 TB /workp
* 1 TB dynamic scratch
Environment
We are using modules to switch between compilers.
Here are some basic module commands:
- module list
lists currently loaded modules
- module avail
lists modules available to load
- module help <name>
tells what the module is/does/loads
- module unload <name>
unloads the specified module
- module load <name>
loads the specified module
- module switch <oldname> <newname>
loads <newname> as the compiler and unloads <oldname>
When users first log in, the module "latest_intel91"
is loaded automatically.
Compiling
- To compile your MPI applications, use the following commands:
| icc <filename.c> -lmpi | for Intel's C/C++ compiler |
| ifort <filename.f90> -lmpi | for Intel's Fortran 90 compiler |
Batch Scheduling
As with any supercomputer, the fair and efficient use of CPU time
is an important concern for users, and a batch queue system is meant
to address it. We use Altair's Portable Batch System
Professional (PBS Pro) version 10.0 as our batch queuing system. Jobs MUST
be run using the PBS Pro batch system.
PBS Pro commands
There are many commands associated with PBS Pro. Man pages are
available for most of them. The most important commands for
a new user to learn are:
qstat
Display status of PBS batch jobs, queues, or servers.
qstat -q
Lists the queues on the system along with their resource limits.
qstat -Q
Gives status information for one or more queues.
qstat -a
Gives the status of all jobs on the system.
qstat -n
Lists nodes allocated to a running job in addition to basic information.
qstat -f <PBS_JOBID>
Gives detailed information on a particular job.
qdel <PBS_JOBID>
This deletes one or more unfinished batch jobs.
qsub
This command submits a job for execution. Please see below for details on
qsub usage.
PBS Pro places batch jobs according to the availability of a variety
of resources. The most important resources are the number of processors
and the wallclock time. In addition, fairshare scheduling is
used to determine the order in which jobs are executed.
Currently, the following queues are set up:
- debug
This queue allows the use of up to 50 processors for up to
60 minutes, and is available at all times on gemini.
- shortg, shortc, and shortp
These queues allow the use of up to 128 processors for up to
3 hours, and are available at all times.
- longg, longc, and longp
These queues allow the use of up to 128 processors for up to
12 hours.
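The limits above lend themselves to a quick check before a job is submitted. The following is a minimal sketch, assuming a POSIX sh environment with awk (the job scripts in this guide use csh). The function names walltime_to_seconds, queue_limits, and check_request are hypothetical helpers, not part of PBS Pro, and the limits are copied from the queue list above, so they would need updating if the queues change.

```sh
#!/bin/sh
# Hypothetical pre-submission check; not part of PBS Pro.

# Convert hh:mm:ss to seconds. awk reads fields such as "08" as decimal.
walltime_to_seconds() {
    echo "$1" | awk -F: '{ t = 0; for (i = 1; i <= NF; i++) t = t * 60 + $i; print t }'
}

# Print "max_cpus max_seconds" for a queue named in this guide.
queue_limits() {
    case "$1" in
        debug)                 echo "50 3600" ;;
        shortg|shortc|shortp)  echo "128 10800" ;;
        longg|longc|longp)     echo "128 43200" ;;
        *)                     return 1 ;;
    esac
}

# usage: check_request QUEUE NCPUS WALLTIME
# Prints "ok" and returns 0 if the request fits the queue, else explains why not.
check_request() {
    queue=$1
    ncpus=$2
    walltime=$3
    limits=$(queue_limits "$queue") || { echo "unknown queue: $queue"; return 1; }
    max_cpus=${limits% *}
    max_secs=${limits#* }
    secs=$(walltime_to_seconds "$walltime")
    if [ "$ncpus" -gt "$max_cpus" ]; then
        echo "$queue allows at most $max_cpus processors"
        return 1
    fi
    if [ "$secs" -gt "$max_secs" ]; then
        echo "$queue allows at most $max_secs seconds of wallclock time"
        return 1
    fi
    echo "ok"
}

check_request shortp 32 2:00:00          # prints "ok"
check_request debug 64 0:30:00 || true   # prints "debug allows at most 50 processors"
```

PBS Pro applies the real limits itself when a job is submitted; a check like this only catches an oversized request earlier.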
Submitting a Job
qsub: How to submit a batch job
qsub is the basic command for submitting a batch job to PBS Pro.
Although qsub has a multitude of options (see the man page),
the average user will commonly need only a few:
- -l ncpus=number-of-cpus
This option sets the number of processors to be used on this job
and MUST BE USED on EVERY qsub command.
- -l walltime=hh:mm:ss
This option sets the maximum amount of wallclock time to be used by this
job and MUST BE USED on EVERY qsub command.
- -q queue
This option names the queue to submit to, which determines whether the
job runs on gemini, castor, or pollux.
- -o outfile
This option reroutes the standard output to outfile. If -o is used
without -e, the standard error of the job is stored in outfile. -o and -I are
mutually exclusive.
- -e errfile
This option reroutes the standard error output to errfile.
- -I
This option specifies that this job is to be an interactive batch
job. Standard error, input, and output will be connected to your
terminal. This is most useful when doing interactive debugging in
the debug queue.
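The options above can also be given directly on the qsub command line. The sketch below is a dry run: it only prints the command lines, so it can be tried without submitting anything. The helper name qsub_command is hypothetical, POSIX sh is assumed (the job scripts in this guide use csh), and the queue, processor, and time values are illustrative.

```sh
#!/bin/sh
# Hypothetical dry-run helper: prints a qsub command line, does not submit.
# usage: qsub_command QUEUE NCPUS WALLTIME [further qsub arguments...]
qsub_command() {
    if [ "$#" -lt 3 ]; then
        echo "usage: qsub_command QUEUE NCPUS WALLTIME [args...]" >&2
        return 1
    fi
    queue=$1
    ncpus=$2
    walltime=$3
    shift 3
    # ncpus and walltime are always present, as required for every job
    set -- qsub -q "$queue" -l "ncpus=$ncpus" -l "walltime=$walltime" "$@"
    echo "$*"
}

# Interactive session in the debug queue (-I, so no -o)
qsub_command debug 4 0:30:00 -I
# Batch job with separate output and error files
qsub_command shortp 32 2:00:00 -o out32 -e err32 scriptfile
```

Once a printed line looks right, it can be pasted at the prompt on gemini to submit the job for real.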
Here is an example of a qsub scriptfile for MPI:
#!/bin/csh
#PBS -l walltime=2:00:00
#PBS -l ncpus=32
#PBS -o out32
#PBS -e err32
#PBS -q shortp
#
# NOTE: PBS Pro starts in the current working directory by default.
cd /home/user-name
mpirun -np 32 ./MPI-job
- Launch this scriptfile as follows:
qsub scriptfile
Here is an example of a qsub scriptfile for OpenMP:
#!/bin/csh
#PBS -l walltime=0:50:00
#PBS -l ncpus=32
#PBS -o out32
#PBS -e err32
#PBS -q shortc
#
setenv OMP_NUM_THREADS 32
#
cd /home/user-name
./OMP-job
- Launch this scriptfile as follows:
qsub scriptfile
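In the OpenMP example, OMP_NUM_THREADS matches the ncpus request. One way to keep the two in step is to generate the scriptfile from a single value. The following is a sketch only, assuming POSIX sh; make_omp_script is a hypothetical helper, and its arguments (queue, processor count, walltime, working directory, program) are illustrative.

```sh
#!/bin/sh
# Hypothetical generator: prints a csh scriptfile for an OpenMP job.
# usage: make_omp_script QUEUE NCPUS WALLTIME WORKDIR PROGRAM
make_omp_script() {
    queue=$1
    ncpus=$2
    walltime=$3
    workdir=$4
    program=$5
    # Unquoted EOF: the shell variables above are expanded in the text below
    cat <<EOF
#!/bin/csh
#PBS -l walltime=$walltime
#PBS -l ncpus=$ncpus
#PBS -o out$ncpus
#PBS -e err$ncpus
#PBS -q $queue
#
# Thread count taken from the same value as the ncpus request
setenv OMP_NUM_THREADS $ncpus
#
cd $workdir
$program
EOF
}

make_omp_script shortc 32 0:50:00 /home/user-name ./OMP-job
```

The output can be redirected to a file and then submitted with qsub in the usual way.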