qsub

OK, the preliminaries are out of the way and we are now ready to use the batch scheduler for its main purpose: submitting jobs to be run on the compute nodes. The command for this is qsub, and we will illustrate its use with the sim.R file from earlier. Let’s create an executable script that runs sim.R in batch mode:

sim
  
#!/bin/bash
R CMD BATCH --no-save --no-restore "--args $1" sim.R .sim.Rout

Note that this version of sim allows you to pass an argument to sim.R. Thus:

$

./sim 5

creates a file called sim5.RData and so on. At this point, let’s log out of our BIOSTAT node back into a login node (if you ran the above command from a login node, don’t worry about it – this simulation isn’t particularly computer-intensive. But in general it is very important to avoid running simulations on the login nodes).

If this didn’t work, keep in mind that Windows and Linux have different file endings; see here for further details.

Running this simulation on a compute node is (almost) as easy as submitting qsub sim 5. However, before we do this, we need to mention two important arguments to qsub. One is -cwd, which says that we want to run the command in our current working directory (I can’t imagine why you would ever not want to do this); the other is -q BIOSTAT, which specifies the queue (again, you could also submit to UI or all.q). So, let’s run the simulation on a compute node:

$

qsub -cwd -q BIOSTAT sim 3

After a little while, a file sim3.RData will appear. You may also notice that two other files appeared: sim.e1002669 and sim.o1002669 (your numbers will be different). These contain a log of the errors (stderr) and output (stdout) from your submission. Typically, they are not interesting (in this case, they should both be empty), but they are certainly valuable if you are trying to debug things or if you have a command that prints out important information when it finishes.

Personally, I like to keep the error and output streams in separate folders to avoid clutter. Also, I always specify -cwd, and almost always specify -q BIOSTAT. Rather than type all these out every time, I placed the following alias in my .bash_profile file (... indicates that this is not the entire .bash_profile file; there are more lines above and below this one):

~/.bash_profile
  
...
alias q_sub="qsub -cwd -e ~/err -o ~/out -q BIOSTAT"
...

The commands in .bash_profile are run at each login, but we can force them to run now with:

$

source ~/.bash_profile

You don’t ever have to do this again; it will happen automatically from now on when you log in.

Now, running our simulation on a compute node really is as easy as submitting:

$

q_sub sim 5

The downside to the use of aliases is that it prevents us from changing the queue with -q. There are many ways around this, such as leaving the -q option out of the alias, having separate aliases for each queue, or writing shell scripts. By all means, use what works best for you, but in what follows, I’ll use this version of q_sub with the defaults modified to avoid clutter. Keep in mind, though, that if you want to use the original qsub, you’ll have to type in all the -cwd, etc. arguments.

To specify that the job requires multiple cores, or that it runs on specific cores, use the -pe and -l flags as discussed on the previous page.