Warning

This is a Lab Notebook entry, which describes how to solve a specific problem at a specific time. Please keep this in mind as you read and use the content, and pay close attention to the date, version information, and other details.

Lab Notebook --- Using EasyBuild to install a PostgreSQL compatible with R (2023-05-18)

Problem setup

A user wanted to use PostgreSQL with the R/4.2.2 module. However, the versions of PostgreSQL installed

$ module spider PostgreSQL
...
Versions:
   PostgreSQL/9.6.2-Python-2.7.12
   PostgreSQL/11.3-Python-3.7.2
...

require specific versions of the GCC module to be loaded:

$ module spider PostgreSQL/11.3-Python-3.7.2
...
You will need to load all module(s) on any one of the lines below before the "PostgreSQL/11.3-Python-3.7.2" module is available to load.

      Core/GCCcore/8.2.0
      GCCcore/8.2.0
...

This conflicts with the version GCC/11.2.0 necessary to load R/4.2.2:

$ module spider R/4.2.2
...
    You will need to load all module(s) on any one of the lines below before the "R/4.2.2" module is available to load.

      Compiler/GCC/11.2.0/OpenMPI/4.1.1
      Core/GCC/11.2.0  OpenMPI/4.1.1
      GCC/11.2.0  OpenMPI/4.1.1
...

So we need a version of PostgreSQL that is compatible with GCC/11.2.0.

Solution

We often use EasyBuild to install software on the HPCC. One of the nice things about EasyBuild is that other users can contribute EasyConfigs, which are recipes for building and installing different types of software.

Loading EasyBuild

To get started, we load the EasyBuild module:

$ module purge
$ module load EasyBuild

We now have access to the eb command and two aliases defined by MSU HPCC staff: ebF to find EasyConfigs and ebS to install software. We can first check our global EasyBuild configuration using

$ eb --show-config
#
# Current EasyBuild configuration
# (C: command line argument, D: default value, E: environment variable, F: configuration file)
#
buildpath            (E) = /tmp/grosscra/EASYBUILD
containerpath        (D) = /mnt/home/grosscra/.local/easybuild/containers
installpath          (E) = /opt
installpath-modules  (E) = /opt/modules
installpath-software (E) = /opt/software
module-naming-scheme (E) = MigrateFromEBToHMNS
optarch              (E) = GENERIC
repositorypath       (E) = /mnt/research/helpdesk/EB_Files_4
robot-paths          (E) = /mnt/research/helpdesk/EB_Files_4, /opt/software/EasyBuild/4.7.1/easybuild/easyconfigs, /mnt/research/helpdesk/ebfiles
sourcepath           (E) = /mnt/research/helpdesk/src

Since I (grosscra) am part of the helpdesk group used by HPCC staff, my --show-config output will look different from other users' output. In particular, I am set up to install software under the root directory /opt, with modules going into /opt/modules and the actual software going into /opt/software.

A quick digression on local modules

If you are not in the helpdesk group, you will instead have directories in your $HOME directory (e.g., software in $HOME/software and modules in $HOME/modules). Thus, using EasyBuild, you can build your own software. If you add your module directory to your module path using

$ module use $HOME/modules
$ echo $MODULEPATH
/mnt/home/grosscra/modules:/opt/software/hpcc/modules:/opt/modules/Core

you can then load modules that you install using the exact same commands you use on the HPCC (e.g., module load PostgreSQL).
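
As a rough sketch of what such a personal setup could look like, the settings reported by eb --show-config can be overridden with EASYBUILD_* environment variables. The software and module directories below follow the $HOME example above; the build and source locations are assumptions chosen for illustration, not necessarily what the HPCC configures for you.

```shell
# Sketch of a personal EasyBuild setup. All paths are illustrative assumptions.
export EASYBUILD_INSTALLPATH_SOFTWARE="$HOME/software"
export EASYBUILD_INSTALLPATH_MODULES="$HOME/modules"
export EASYBUILD_BUILDPATH="/tmp/${USER:-$(id -un)}/EASYBUILD"  # mirrors the buildpath pattern shown earlier
export EASYBUILD_SOURCEPATH="$HOME/easybuild/src"               # hypothetical location for downloaded sources

# Make the personal module tree visible, if a module command is available.
if command -v module >/dev/null 2>&1; then
    module use "$EASYBUILD_INSTALLPATH_MODULES"
fi

# Confirm what EasyBuild will actually use, if eb is available.
if command -v eb >/dev/null 2>&1; then
    eb --show-config
fi
```

Values set this way should show up in eb --show-config marked with (E), just like the staff configuration above.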

Finding our EasyConfig

So now that we're happy with and (mostly) understand our eb --show-config results, we can try finding the EasyConfig for PostgreSQL we'd like to use. Our first step is to use the ebF alias:

$ ebF PostgreSQL

ebF_PATH=/opt/software/EasyBuild/4.7.1/easybuild/easyconfigs

====== $ebF_PATH/__archive__/p/PostgreSQL/
PostgreSQL-9.3.5-intel-2014b.eb

====== $ebF_PATH/p/PostgreSQL/
PostgreSQL-10.2-intel-2018a-Python-2.7.14.eb
PostgreSQL-10.3-foss-2017b-Python-2.7.14.eb
PostgreSQL-10.3-foss-2018a-Python-2.7.14.eb
PostgreSQL-10.3-foss-2018b.eb
PostgreSQL-10.3-intel-2017b-Python-2.7.14.eb
PostgreSQL-10.3-intel-2018a-Python-2.7.14.eb
PostgreSQL-11.3-GCCcore-8.2.0-Python-2.7.15.eb
PostgreSQL-11.3-GCCcore-8.2.0-Python-3.7.2.eb
PostgreSQL-12.4-GCCcore-9.3.0.eb
PostgreSQL-13.2-GCCcore-10.2.0.eb
PostgreSQL-13.3-GCCcore-10.3.0.eb
PostgreSQL-13.4-GCCcore-11.2.0.eb
PostgreSQL-14.4-GCCcore-11.3.0.eb
PostgreSQL-9.4.7-intel-2016a-Python-2.7.11.eb
PostgreSQL-9.5.2-intel-2016a-Python-2.7.11.eb
PostgreSQL-9.6.0-intel-2016b-Python-2.7.12.eb
PostgreSQL-9.6.2-foss-2016b-Python-2.7.12.eb
PostgreSQL-9.6.2-intel-2016b-Python-2.7.12.eb

This tells us that there are many EasyConfigs available to help us install different versions of PostgreSQL under different toolchains.

What is a toolchain?

A toolchain is a set of software dependencies used to install new software. Most often, this is a compiler like GCC or a compiler/MPI pair like GCC and OpenMPI. The most basic toolchains are just single compilers and are labeled using their software version (like GCCcore-11.2.0).

EasyBuild organizes installed modules by toolchain. For example, if you look for the R/4.2.2 module file, it's under /opt/modules/MPI/GCC/11.2.0/OpenMPI/4.1.1/R/4.2.2.lua because it was built using a GCC/OpenMPI toolchain.

Some of these are so commonly used that EasyBuild groups dependency software into larger toolchains like "foss" and "intel" that contain a compiler/MPI pair and a number of other common dependencies. These are labeled by their year and an a or b for the first or second half of the year. You can check what's in them by searching for their EasyConfig and showing it with eb --show-ec:

$ ebF foss
...
====== $ebF_PATH/f/foss/
foss-2016.04.eb  foss-2016b.eb    foss-2018b.eb  foss-2021a.eb    foss-2022b.eb
foss-2016.06.eb  foss-2017a.eb    foss-2019a.eb  foss-2021b.eb
foss-2016.07.eb  foss-2017b.eb    foss-2019b.eb  foss-2022.05.eb
foss-2016.09.eb  foss-2018.08.eb  foss-2020a.eb  foss-2022.10.eb
foss-2016a.eb    foss-2018a.eb    foss-2020b.eb  foss-2022a.eb
...
$ eb --show-ec foss-2022a.eb
easyblock = 'Toolchain'

name = 'foss'
version = '2022a'

homepage = 'https://easybuild.readthedocs.io/en/master/Common-toolchains.html#foss-toolchain'
description = """GNU Compiler Collection (GCC) based compiler toolchain, including
 OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK."""

toolchain = SYSTEM

local_gccver = '11.3.0'

# toolchain used to build foss dependencies
local_comp_mpi_tc = ('gompi', version)

# we need GCC and OpenMPI as explicit dependencies instead of gompi toolchain
# because of toolchain preparation functions
dependencies = [
    ('GCC', local_gccver),
    ('OpenMPI', '4.1.4', '', ('GCC', local_gccver)),
    ('FlexiBLAS', '3.2.0', '', ('GCC', local_gccver)),
    ('FFTW', '3.3.10', '', ('GCC', local_gccver)),
    ('FFTW.MPI', '3.3.10', '', local_comp_mpi_tc),
    ('ScaLAPACK', '2.2.0', '-fb', local_comp_mpi_tc),
]

moduleclass = 'toolchain'

We can see that foss includes GCC, OpenMPI, FlexiBLAS, FFTW, FFTW.MPI, and ScaLAPACK.

Since we know that R/4.2.2 requires GCC/11.2.0 to load, we look for a PostgreSQL EasyConfig with a compatible toolchain. In this case, we see PostgreSQL-13.4-GCCcore-11.2.0.eb. If a version of PostgreSQL other than 13.4 were required, we would probably need to generate a new EasyConfig, but this version was suitable for the user.
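
With a long listing, picking out the right toolchain by eye gets error-prone. A minimal sketch of filtering EasyConfig file names by toolchain label follows; filter_by_toolchain is a hypothetical helper (not part of EasyBuild or the HPCC aliases), and the input is a handful of names copied from the ebF output above.

```shell
# Keep only EasyConfig file names (one per line on stdin) whose toolchain
# label matches $1, optionally followed by a version suffix such as -Python-3.7.2.
filter_by_toolchain() {
    local label
    # Escape dots so "11.2.0" is matched literally rather than as a regex.
    label=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -E -e "-${label}(-.*)?\.eb\$"
}

matches=$(filter_by_toolchain "GCCcore-11.2.0" <<'EOF'
PostgreSQL-11.3-GCCcore-8.2.0-Python-3.7.2.eb
PostgreSQL-13.3-GCCcore-10.3.0.eb
PostgreSQL-13.4-GCCcore-11.2.0.eb
PostgreSQL-14.4-GCCcore-11.3.0.eb
EOF
)
echo "$matches"   # prints PostgreSQL-13.4-GCCcore-11.2.0.eb
```

On the cluster, the same function could be fed from a directory listing of the PostgreSQL folder that ebF reported instead of a hard-coded list.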

Fixing dependency resolution

Now let's try to see how the installation will go. Since I'm writing this after having installed PostgreSQL/13.4, the output for that EasyConfig no longer looks the way it did before the install. Instead, I'll use PostgreSQL-14.4-GCCcore-11.3.0.eb, which as of now is still not installed.

We can check to see if we're missing any of this install's dependencies on the system using eb -M:

$ eb -M PostgreSQL-14.4-GCCcore-11.3.0.eb
...
47 out of 47 required modules missing:

* Core | M4/1.4.19 (M4-1.4.19.eb)
* Core | Bison/3.8.2 (Bison-3.8.2.eb)
* Core | OpenSSL/1.1 (OpenSSL-1.1.eb)
* Core | zlib/1.2.12 (zlib-1.2.12.eb)
* Core | help2man/1.47.4 (help2man-1.47.4.eb)
* Core | M4/1.4.17 (M4-1.4.17.eb)
* Core | Bison/3.0.4 (Bison-3.0.4.eb)
* Core | M4/1.4.18 (M4-1.4.18.eb)
* Core | flex/2.6.4 (flex-2.6.4.eb)
* Core | binutils/2.38 (binutils-2.38.eb)
* Core | GCCcore/11.3.0 (GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | M4/1.4.19 (M4-1.4.19-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | help2man/1.49.2 (help2man-1.49.2-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | zlib/1.2.12 (zlib-1.2.12-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | Bison/3.8.2 (Bison-3.8.2-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | flex/2.6.4 (flex-2.6.4-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | binutils/2.38 (binutils-2.38-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | groff/1.22.4 (groff-1.22.4-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | expat/2.4.8 (expat-2.4.8-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | ncurses/6.3 (ncurses-6.3-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | bzip2/1.0.8 (bzip2-1.0.8-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | DB/18.1.40 (DB-18.1.40-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | pkgconf/1.8.0 (pkgconf-1.8.0-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | libreadline/8.1.2 (libreadline-8.1.2-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | UnZip/6.0 (UnZip-6.0-GCCcore-11.3.0.eb)
* Compiler/GCCcore/11.3.0 | Perl/5.34.1 (Perl-5.34.1-GCC
...

So we're missing everything we need... But this doesn't seem right! It even says we're missing GCCcore/11.3.0, even though GCC/11.3.0 (which depends on it) is definitely installed:

$ module spider GCC/11.3.0

----------------------------------------------------------------------------
  GCC: GCC/11.3.0
----------------------------------------------------------------------------
    Description:
      The GNU Compiler Collection includes front ends for C, C++,
      Objective-C, Fortran, Java, and Ada, as well as libraries for these
      languages (libstdc++, libgcj,...).


    This module can be loaded directly: module load GCC/11.3.0
...

What's happening is that the way modules are searched for on the HPCC is different from the way EasyBuild searches for them. EasyBuild wants to include "Core/" in front of the core module names (i.e., those that aren't installed under a toolchain). But if we look at where modules are searched for,

$ echo $MODULEPATH
/opt/software/hpcc/modules:/opt/modules/Core

the "Core" part is already included in the path. This makes it so you don't need to use module load Core/GCC/11.3.0 and can get right to the software name you need.

To make things work correctly with EasyBuild's expectations, we can add /opt/modules to our module path and try again:

$ module use /opt/modules
$ echo $MODULEPATH
/opt/modules:/opt/software/hpcc/modules:/opt/modules/Core
$ eb -M PostgreSQL-14.4-GCCcore-11.3.0.eb
...
1 out of 47 required modules missing:

* Compiler/GCCcore/11.3.0 | PostgreSQL/14.4 (PostgreSQL-14.4-GCCcore-11.3.0.eb)
...

Much better! Now we're only missing the module we want to install.
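
To see why the extra path entry matters, here is a toy model of the lookup. It is only a sketch: the real module system does far more than this, and the resolves function and the temporary directory tree are made up for illustration. It treats a module name as found when NAME.lua exists under any directory in a colon-separated search path.

```shell
# Toy lookup: succeed if "$2.lua" exists under any directory in the
# colon-separated search path "$1". (Illustration only, not how Lmod is implemented.)
resolves() {
    local IFS=: dir
    for dir in $1; do
        [ -f "$dir/$2.lua" ] && return 0
    done
    return 1
}

# Throwaway tree standing in for /opt/modules/Core/GCCcore/11.3.0.lua
root=$(mktemp -d)
mkdir -p "$root/Core/GCCcore"
touch "$root/Core/GCCcore/11.3.0.lua"

short_path="$root/Core"        # like the default entry /opt/modules/Core
long_path="$root:$root/Core"   # like the path after: module use /opt/modules

resolves "$short_path" "GCCcore/11.3.0"      && echo "short path: GCCcore/11.3.0 found"
resolves "$short_path" "Core/GCCcore/11.3.0" || echo "short path: Core/GCCcore/11.3.0 missing"
resolves "$long_path"  "Core/GCCcore/11.3.0" && echo "long path: Core/GCCcore/11.3.0 found"
```

The second check fails for the same reason eb -M reported everything as missing: the name with the "Core/" prefix only exists relative to the parent directory. The temporary directory in $root can be removed once you are done.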

In the case where we would actually be missing dependencies, EasyBuild would install those for us, so long as we use the --robot option when installing. This is included in the ebS alias by default.

Installing

Now we're ready to install. We just use the ebS alias with our EasyConfig, and hope things go well!

$ ebS PostgreSQL-14.4-GCCcore-11.3.0.eb

... Good luck! ...
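
Since the definition of the ebS alias is not shown in this notebook, here is a hedged sketch of the same workflow using plain eb commands: add the module path, check for missing modules, then install with --robot. The run wrapper and the DRY_RUN variable are hypothetical conveniences; by default the script only prints what it would do.

```shell
# Sketch: check for missing modules, then install with dependency resolution.
# With DRY_RUN=1 (the default here) commands are printed instead of executed.
ec="PostgreSQL-14.4-GCCcore-11.3.0.eb"

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run module use /opt/modules   # let EasyBuild see the Core/ module names
run eb -M "$ec"               # report which required modules are missing
run eb --robot "$ec"          # build, resolving dependencies via the robot paths
```

Setting DRY_RUN=0 before running it would execute the commands for real. ebS may add further options beyond --robot, so treat this as an approximation rather than a replacement for the alias.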

Checking the installation

Now that we have installed a version of PostgreSQL compatible with the same GCC that R needs, we can try to load them all:

$ module purge
$ module load GCC/11.2.0 OpenMPI/4.1.1 R PostgreSQL

We get a small warning about OpenMPI/4.1.1 being incompatible with intel14 nodes, but other than that, everything loads correctly.
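
Beyond the modules loading without errors, a quick sanity check is to confirm that the expected commands actually landed on PATH. The sketch below uses a hypothetical helper, check_commands. It assumes the PostgreSQL module provides psql and pg_config and the R module provides Rscript, which is typical for these packages but was not verified in this notebook.

```shell
# Report whether each named command is on PATH; return non-zero if any is missing.
check_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok:      $cmd -> $(command -v "$cmd")"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Run after the module load above.
check_commands Rscript psql pg_config || echo "some tools are not on PATH"
```

If all three are reported as ok, R and the PostgreSQL client tools are available in the same environment.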