Queues
When we say that a compute node “belongs” to the Department of Biostatistics, what does that mean? The basic idea is that the node is there for us to use whenever we want, although if we’re not using it, other people are allowed to. This works both ways: if there are compute nodes not being used, we can use them in addition to our own node(s).
All this is managed through a batch queuing system – you place your computing request with the scheduler, and it decides whose jobs run on which node. As part of the request, you specify a queue (essentially, you declare which line you want to wait in). There are three important queues to know about:
BIOSTAT
: This is the name of the general-purpose Department of Biostatistics queue. It currently has 192 slots (cores), spread out over several nodes. Submission to this queue requires approval from one of the queue managers.
UI
: This is the name of the general-purpose University of Iowa queue. As a member of the University of Iowa, these compute nodes belong to you just as much as the BIOSTAT
queue does, although there is a limit of 5 jobs you can run at the same time on the UI
queue (as with all queues, more than this limit can be submitted, though only 5 will run simultaneously).
all.q
: Submitting jobs to this queue means that they will run on any compute node that is available. In particular, it means that your job could run on a node belonging to someone else, and if they want it back, your job would be immediately terminated.
There are also some other queues that could be worth knowing about if you have specific resource requests, such as high memory or GPU cards. See the HPC’s Queues and Policies page for additional details on campus queues and their limits, including guidelines for selecting a queue.
More on the BIOSTAT queue
The BIOSTAT queue consists of three machines (“nodes”), two of which are older and one of which is newer:
- 56 cores, 128G memory
- 56 cores, 128G memory
- 80 cores, 384G memory
Typically, it doesn’t make a great deal of difference which machine your program runs on, but note that one node has more memory. The next page discusses how to force your program to run on one type of machine or the other.
The Sun Grid Engine
There are a variety of batch schedulers; the one used by the Argon cluster is the Sun Grid Engine (SGE). The next several pages discuss the SGE commands for submitting, controlling, and monitoring jobs submitted to the compute nodes.