Compiling R

The R modules available on the HPC machines are usually fairly up to date, so you likely don’t need to use the information on this page, but if you want the absolutely latest version of R, or a development version, you can compile it yourself.

Setting up dependencies

Compiling software can often be an exercise in frustration for new Linux users, but it doesn’t have to be particularly difficult.

There are generally four steps, depending on how you count:

  1. Obtain the dependencies, via a system package manager (on your own workstation) or via something like the modules infrastructure (HPC)
  2. Prepare/configure the compilation script. This generally takes the form of running a configure script (optionally with custom arguments) or using a tool like cmake
  3. Compile the software, most often by calling the make program
  4. Configure your environment, so that the newly compiled binary files are available to your session. This most often involves editing your ~/.bash_profile or ~/.bashrc file.

On HPC, we have one additional consideration: we want the compilation to go quickly, so compiling in parallel is good. However, we don’t want to hog resources, so we will generally want to compile the software on a compute node.

To simplify things, we have created several scripts in the Argon Examples git repository. To save time, you can simply run the following commands:

$ $
$
$
git clone https://github.com/IowaBiostat/ArgonExamples.git
cd ./ArgonExamples/Scripts
./build_r_on_node.sh
./add_R_to_bash_profile.sh

Once the compilation job completes and you either log back into Argon or run

$
source ~/.bash_profile

you should have access to the latest R.

Compiling R by hand

The first thing we need to do on the HPC infrastructure is load the modules which R depends on:

$ $
$
$
$
module load stack/2021.1
module load gcc/9.3.0
module load python/3.8.8_gcc-9.3.0
module load pcre2
module load curl

R doesn’t actually need python, of course, but it’s a concise way to load in several other dependencies.

NOTE: Please compile R on a compute node, not a login node; i.e., run q_login first.

Next, we’ll need to download a copy of R. You may consider placing this in a subfolder of your home directory:

$
mkdir ~/buildR && cd ~/buildR

R can be downloaded using the wget program, and you can find the link to the latest source distribution at the official CRAN website.

For example, suppose we want to compile R-4.1.1. In the code below, we’ll set an environment variable to this version - that’s the only part that will change if we want to do this again for a future version of R.

$ $
$
$
#!/bin/bash
RVERS=R-4.1.1
# Download the current version of R
wget https://cran.r-project.org/src/base/R-4/$RVERS.tar.gz

Once you’ve downloaded the file, you can extract it using the easy to remember

$
tar -xzvf $RVERS.tar.gz

Assuming that you have extracted the archive successfully, we next need to configure it.

$ $
cd $RVERS 
./configure

Pay attention to any error messages, particularly at the end of the long stream of output. This will indicate if any functionality R is expecting to find on the system was not found.

If the configure script executed without issues, you’re now ready to compile the code. It’s possible to use multiple cores to compile code (the following example uses two cores), but please don’t do so on the login node.

$
make -j 2

The number following the j argument is the number of cores to use for compilation. Assuming you requested a larger chunk of slots with qlogin, you can increase this number to compile R faster.

Once this command executes successfully, you’re ready to start using R!

You can either alias R to point to this compiled version of R, or you can add the appropriate folder to your PATH. Assuming that you are in the folder containing the downloaded R installation (you may need to go up one level after compiling: cd ..), you should be able to run the following commands:

You can make things easier by aliasing R to point to this compiled R package:

$ $
$
$
$
$
$
$
$
$
$
$
RVERS=R-4.1.1
CWD=$(pwd)
# Ensure bash profile exists
touch ~/.bash_profile
# Update path
echo "PATH=\$PATH:$CWD/$RVERS/bin" >>  ~/.bash_profile
echo 'module load stack/2021.1' >> ~/.bash_profile
echo 'module load gcc/9.3.0' >> ~/.bash_profile
echo 'module load python/3.8.8_gcc-9.3.0' >> ~/.bash_profile
echo 'module load pcre2' >> ~/.bash_profile
echo 'module load curl' >> ~/.bash_profile
echo 'export PATH' >> ~/.bash_profile

This should append the relevant lines to your ~/.bash_profile, though you should double check and remove any redundancies.