Scripting qsub
Now that we know how to monitor jobs, let’s return to qsub
and discuss how to submit multiple jobs simultaneously. One way is to script a loop that calls qsub
many times (unfortunately, aliases aren’t available in scripts, so we have to specify all the options):
batch-sim
#!/bin/bash
for ((i = 1; i <= 10; i++))
do
qsub -cwd -e ~/err -o ~/out -q BIOSTAT sim $i
done
Now we can run
which will make 10 requests to the batch scheduler, all of which should run and finish more or less simultaneously (unless there are less than 10 available processors on BIOSTAT). After submitting this command, try monitoring the job with qstat
.
WARNING: Writing scripts like the above is not ideal. For reasons that are discussed below, it is better to use arrays (below) or qbatch
(next page).
qsub arrays
A slightly less flexible, but far more convenient alternative, qsub
itself supports basic looping at the command line with what it calls array jobs. Essentially, qsub
itself sets up a loop like the one above; you can access the loop index through the environment variable $SGE_TASK_ID
. So, let’s rewrite the sim
executable:
sim
#!/bin/bash
R CMD BATCH --no-save --no-restore "--args $SGE_TASK_ID" sim.R .sim.Rout
Now, we can run
This will accomplish the same thing as the scripted loop above, but without the need to write a batch-sim
file. This is very convenient as it lets you change the number of jobs on the fly from the command line.
IMPORTANT: In addition to being convenient, array jobs are also very important for limiting overhead with respect to the grid scheduler. Running qsub
ten separate times, as in the first example, creates 10 times more work for the batch scheduler than a single array job. This becomes more and more of a problem when a large number of jobs are submitted, particularly if multiple people do this at the same time. On several occasions, this has caused a great deal of gridlock and essentially frozen the scheduler. If at all possible, please submit your work as a single array job as opposed to a large number of separate jobs.
The next page discusses a useful tool, qbatch
, that offers another easy way to submit array jobs.