Qsub (Queue Submission)
Use the qsub command to submit a job script to PBS. A job script consists of PBS directives, comments and executable statements. It is important to remember that all the commands in the job script execute serially on the node that runs your script and this node does not necessarily have to be one of the compute nodes allocated to your job. The executable specified with mpirun is the only program that runs in parallel on your allocated compute nodes.
A sample job script:
#PBS -S /bin/csh
#PBS -l walltime=5:00:00
#PBS -l ncpus=8 -l nodes=4
#PBS -e stderr.txt
#PBS -o stdout.txt
cd $PBS_O_WORKDIR
# execute program
mpirun -np 8 ./a.out
Here, the first line identifies which shell should be used. The next four lines are PBS directives.
-l walltime: The first directive requests a wall clock time limit of 5 hours. Specify the time in the format HH:MM:SS. Only two digits can be used for minutes and seconds. The default wall clock time is 1 hour.
-l ncpus=8 -l nodes=4: The second PBS directive requests 8 processes on 4 nodes. When your job runs, it will be allocated the next 4 free, contiguous nodes. Jobs will not share nodes. By default (block allocation), processes are distributed across nodes in this way: for n processes on N nodes, the first n/N processes are allocated to the first node, the second n/N processes to the second node, and so on. In this example, the first two processes are allocated to the first node, processes three and four are allocated to the second node, and so on, until the eighth process is allocated to the fourth node.
If you submit a job with an incorrect value for the nodes or processors requested, it will be kept in the queue. The number of processors you request cannot be less than the number of nodes, nor can it exceed the number of nodes times 2. For example, if you swap the values for nodes and processors in the sample script (requesting ncpus=4 and nodes=8), your job will usually just sit in the queue.
-e stderr.txt: This directive directs your stderr output into one file, in this case stderr.txt. This can make your program easier to debug if there is a problem.
-o stdout.txt: This directive directs your stdout output into one file, in this case stdout.txt.
Comment lines: The other lines in the sample script that begin with '#' are comments. The '#' for comments and PBS directives must be in column one of your script file. The remaining lines in the sample script are executable commands.
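The block allocation rule described above for -l ncpus and -l nodes can be sketched as a small shell function. This is only an illustration of the arithmetic, not something PBS provides: the function name block_layout is hypothetical, and the sketch assumes the number of processes is an exact multiple of the number of nodes, as in the sample script.

```shell
# Sketch of block allocation: print the node assigned to each process.
# Assumes n is an exact multiple of N; other cases are not covered here.
block_layout() {
    n=$1    # total number of processes (ncpus)
    N=$2    # number of nodes
    if [ "$N" -le 0 ] || [ $((n % N)) -ne 0 ]; then
        echo "block_layout: n must be a multiple of N" >&2
        return 1
    fi
    per_node=$((n / N))    # processes placed on each node
    p=1
    while [ "$p" -le "$n" ]; do
        # Processes 1..per_node go to node 1, the next per_node to node 2, ...
        node=$(( (p - 1) / per_node + 1 ))
        echo "process $p -> node $node"
        p=$((p + 1))
    done
}

# The sample script's request: 8 processes on 4 nodes.
block_layout 8 4
```

For the sample request this lists processes 1 and 2 on node 1, processes 3 and 4 on node 2, and so on up to process 8 on node 4.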
Submitting your script for execution
After you create your script you must make it executable with the chmod command:
chmod 755 yourscript
Then you can submit it to PBS with the qsub command:
qsub yourscript
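Before submitting, it can help to confirm that every PBS directive starts in column one, as required above. The following is a minimal sketch using grep; the function name check_pbs_columns is hypothetical and is not part of PBS.

```shell
# Report any "#PBS" line that is preceded by spaces or tabs.
# Returns 0 if no indented directive is found, 1 otherwise.
check_pbs_columns() {
    script=$1
    # grep -n prints each matching line together with its line number.
    if grep -n '^[[:space:]][[:space:]]*#PBS' "$script"; then
        echo "warning: the #PBS lines listed above do not start in column one" >&2
        return 1
    fi
    return 0
}
```

One possible use is to submit only when the check passes: check_pbs_columns yourscript && qsub yourscript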
Batch output (your job's stdout and stderr output) is returned to the directory from which you issued the qsub command when your job finishes. You can also specify PBS directives as command options to qsub. Thus, you could omit the PBS directives in the sample script above and submit the script with
qsub -l walltime=5:00:00 -l nodes=4 -l ncpus=8 yourscript
Command line options override PBS directives included in your script.
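Because a request with inconsistent node and processor counts stays in the queue rather than failing, a quick check before running qsub may save time. The sketch below encodes the rule stated earlier (the number of processors must be at least the number of nodes and at most twice the number of nodes). The function name check_request is hypothetical, and the factor of 2 is specific to the system described here.

```shell
# Check the rule: nodes <= ncpus <= 2 * nodes.
# Returns 0 if the request is consistent, 1 otherwise.
check_request() {
    nodes=$1
    ncpus=$2
    if [ "$ncpus" -lt "$nodes" ]; then
        echo "ncpus ($ncpus) is less than nodes ($nodes)" >&2
        return 1
    fi
    if [ "$ncpus" -gt $((nodes * 2)) ]; then
        echo "ncpus ($ncpus) exceeds 2 x nodes ($nodes)" >&2
        return 1
    fi
    return 0
}
```

For example, check_request 4 8 succeeds, while check_request 8 4 (the values swapped) reports an error.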
Other qsub options
There are several other qsub options which you might find useful.
-I
provides a form of interactive access to the processor. Standard input, output, and error are connected to the terminal session through which qsub is running. See the section Interactive Access with QSUB -I below.
-j
combines -e and -o. If the option is "eo", stderr and stdout are intermingled as stderr. If the option is "oe", the two streams are intermingled as stdout. If the option is not given, stderr and stdout are two separate files.
-l file=size (-l is lowercase "L")
specifies the maximum size of any file that your job can create.
-m option
specifies if and when mail is sent about job execution. If the option is:
a, mail is sent when the job is aborted by the system
b, mail is sent when the job begins execution
e, mail is sent when the job terminates
n, no mail is sent.
Multiple uses of the -l option can be combined, but no spaces can appear between the options, as the following command illustrates:
-l walltime=1:00:00,file=10mb
All of these options can also be specified as PBS directives inside your script.
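The comma-separated form of the -l option shown above can be built from individual resource requests. This is a small sketch only; join_resources is a hypothetical helper name, and it does nothing more than join its arguments with commas and no spaces.

```shell
# Join resource requests into a single "-l a=b,c=d" argument.
join_resources() {
    list=""
    for r in "$@"; do
        if [ -z "$list" ]; then
            list=$r              # first item: no leading comma
        else
            list="$list,$r"      # later items: comma, no spaces
        fi
    done
    printf '%s %s\n' "-l" "$list"
}

join_resources walltime=1:00:00 file=10mb
# prints: -l walltime=1:00:00,file=10mb
```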
Interactive Access With the Qsub -I Option
A form of interactive access is available on the processor via the qsub -I command. This is useful for small debugging or test runs. There are two avenues to get interactive time via qsub -I. The maximum walltime for a debug queue job is 60 minutes, and the maximum number of nodes is 8. To use it, type:
qsub -I -l nodes=N -l ncpus=n -l walltime=mm:ss
See the discussion of the format for qsub -I below for additional information. You should use this only for short, interactive runs. If there are no nodes free, the qsub command will wait until they become available. This can be a long wait, even hours, depending on the mix of running and queued jobs. Please check the system to be sure that there are available nodes before issuing qsub -I. You can determine if there are free nodes by using the monitor.
Format of the qsub -I command
The format for qsub -I is:
qsub -I -l nodes=2 -l ncpus=4 -l walltime=30:00 myscript
This requests interactive access to 4 processors on 2 nodes for thirty minutes. Change the number of nodes and processors and the time to suit your needs. The default is one node and one processor. The system will respond:
qsub: waiting for job 349.eady to start
It can take up to 30 seconds to receive a further response from the system telling you that the nodes have been allocated to you:
qsub: job 349.eady ready
pbsmom: Successful (0) in mom_set_limits
pbsmom: Successful (0) in mom_set_limits
If more than 30 seconds elapse with no response, there are probably not enough free nodes to allocate to you. Type ^C (control-c) to exit qsub -I and double check on the availability of free nodes.
The -m b option to qsub can be used with qsub -I to have the system send you email when your job starts and you have access to your nodes. The -M option is used to specify the email addresses to which the email will be sent. See the above discussion of options to qsub for more details on using the -m b and -M options.
Once nodes are allocated to you, you will receive a command prompt. Even though you are running interactively, you must use an mpirun command to run your executable on your compute nodes. This can either be in your script as specified above, or on the command line, as in:
mpirun -np 2 ./myrunscript
Enter the actual values for the -np option. Stdout and stderr will go to your terminal. Use input redirection to get stdin input to your mpirun executable. When you are finished with your interactive session type ^D (CTRL-D).
^D
qsub: job 349.eady completed
The default wall clock time for jobs is 1 hour, and this includes jobs submitted using qsub -I. When you use qsub -I, you hold your processors whether you compute or not. Thus, as soon as you are done with your mpirun commands, you should type ^D to end your interactive job. If you submit an interactive job and do not specify a wall clock time, you will hold your processors for 1 hour or until you type ^D.
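Walltime values appear in this document in two forms, such as 5:00:00 and 30:00. When comparing a request against a limit (for example, the 1 hour default), it can be convenient to convert both to seconds. The sketch below assumes only those two forms, hours:minutes:seconds and minutes:seconds; the function name walltime_to_seconds is hypothetical, and other formats that PBS may accept are not handled.

```shell
# Convert a walltime string to seconds.
# Accepts H:MM:SS or MM:SS (an assumption based on the examples above);
# any other number of fields makes the function return a non-zero status.
walltime_to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next }
        NF == 2 { print $1 * 60 + $2; next }
        { exit 1 }'
}

walltime_to_seconds 5:00:00    # prints 18000
walltime_to_seconds 30:00      # prints 1800
```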
Monitoring and Killing Jobs
Qstat
The qstat command is used to display the status of the PBS queue. It includes running and queued jobs. The -f and -a options to qstat provide you with more extensive status listings. If your job is in a special state, the comment field, which is visible when you use the -f option, will often display important information about the state of your job. See man qstat for more details.
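To pick out just the comment field from a full status listing, the output of qstat -f can be filtered. The sketch below assumes the field is printed on a line of the form "comment = text", possibly indented; check the output on your system, since the exact layout may differ. The function name job_comment is hypothetical.

```shell
# Read "qstat -f" style output on standard input and print the text
# of the comment field, if one is present.
# Assumes a line of the form "    comment = some text".
job_comment() {
    sed -n 's/^[[:space:]]*comment = //p'
}
```

A typical use would be: qstat -f jobid | job_comment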
Qdel
The qdel command is used to kill queued and running jobs.
qdel jobid