Conda and Reproducible Installations
Conda is a popular package manager for binary installations of a large variety of packages. Although binary packages do not deliver optimal performance on HPC systems, they can be an acceptable solution when workflow reproducibility and portability are crucial (similar to containers). Mamba is a faster alternative to Conda, and we recommend using it alongside Conda or in place of it: the two can work together, or Mamba can serve as a drop-in replacement. If you know how to use Conda, you know how to use Mamba. Mamba is much faster at downloading packages and creates fewer files during the installation process.
This page documents how to improve reproducibility of your environment by saving the details of Conda-installed packages as a YAML file.
Whatever commands you'd use in conda, you can use with mamba. They are interchangeable, and you can mix and match. For example:
# Example: mixing conda and mamba
$ conda create -n test_env && conda activate test_env && mamba install -y astropy
# The example above is equivalent to the commands below
$ mamba create -n test_env && mamba activate test_env && conda install -y astropy
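Because the commands are interchangeable, a script can use whichever tool is installed. A minimal sketch, assuming a POSIX shell; the variable PKG_MGR is hypothetical and is not defined by conda or mamba:

```shell
# Pick whichever package manager is available: mamba if it is on PATH,
# otherwise conda. PKG_MGR is a hypothetical variable name for this sketch.
if command -v mamba >/dev/null 2>&1; then
    PKG_MGR=mamba
else
    PKG_MGR=conda
fi

# Print the command that would be run; in a real script, run it directly, e.g.
#   "$PKG_MGR" install -y astropy
echo "Would run: $PKG_MGR install -y astropy"
```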
Reproducible installations with conda and mamba
Suppose we've used conda to install the package astropy in a newly created environment:
(base) $ conda create -y -n astropy
[..]
(base) $ conda activate astropy
[..]
(astropy) $ conda install -y astropy=4.2.1
[..]
(astropy) $

# Equivalent process with mamba
(base) $ mamba create -y -n astropy
[..]
(base) $ mamba activate astropy
[..]
(astropy) $ mamba install -y astropy=4.2.1
[..]
(astropy) $
We can generate a list of packages that are installed in the active conda environment, and their versions, using conda env export:
(astropy) $ conda env export >environment.yaml
(astropy) $ cat environment.yaml
name: astropy
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - astropy=4.2.1=py39h6323ea4_1
  - blas=1.0=mkl
  - ca-certificates=2021.5.25=h06a4308_1
  - certifi=2021.5.30=py39h06a4308_0
  - intel-openmp=2021.2.0=h06a4308_610
  - ld_impl_linux-64=2.33.1=h53a641e_7
  - libffi=3.3=he6710b0_2
  - libgcc-ng=9.1.0=hdf63c60_0
  - libstdcxx-ng=9.1.0=hdf63c60_0
  - mkl=2021.2.0=h06a4308_296
  - mkl-service=2.3.0=py39h27cfd23_1
  - mkl_fft=1.3.0=py39h42c9631_2
  - mkl_random=1.2.1=py39ha9443f7_2
  - ncurses=6.2=he6710b0_1
  - numpy=1.20.2=py39h2d18471_0
  - numpy-base=1.20.2=py39hfae3a4d_0
  - openssl=1.1.1k=h27cfd23_0
  - pip=21.1.1=py39h06a4308_0
  - pyerfa=2.0.0=py39h27cfd23_0
  - python=3.9.5=hdb3f193_3
  - readline=8.1=h27cfd23_0
  - setuptools=52.0.0=py39h06a4308_0
  - six=1.15.0=py39h06a4308_0
  - sqlite=3.35.4=hdfb4753_0
  - tk=8.6.10=hbc83047_0
  - tzdata=2020f=h52ac0ba_0
  - wheel=0.36.2=pyhd3eb1b0_0
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7b6447c_3
prefix: /group/pawsey0001/mdelapierre/PSS/conda-test/miniconda3/envs/astropy

# Equivalent with mamba (the output differs only in the prefix line)
(astropy) $ mamba env export >environment.yaml
(astropy) $ cat environment.yaml
name: astropy
channels:
  - defaults
dependencies:
  [..]
prefix: /group/pawsey0001/mdelapierre/PSS/mamba-test/mamba/envs/astropy
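Each entry in the dependencies list has the form name=version=build, where the last field is the build string identifying one specific binary package. As a minimal illustration, the hypothetical shell function below (not part of conda or mamba) splits one such entry into its three fields using only POSIX parameter expansion:

```shell
# Hypothetical helper: split a conda spec of the form name=version=build
# and print the three fields separated by single spaces.
split_spec() {
    spec=$1
    name=${spec%%=*}       # text before the first '='
    rest=${spec#*=}        # text after the first '='
    version=${rest%%=*}    # text before the second '='
    build=${rest#*=}       # text after the second '='
    printf '%s %s %s\n' "$name" "$version" "$build"
}

split_spec numpy=1.20.2=py39h2d18471_0   # prints: numpy 1.20.2 py39h2d18471_0
split_spec tzdata=2020f=h52ac0ba_0       # prints: tzdata 2020f h52ac0ba_0
```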
With a few text substitutions, this YAML file can be turned into a plain list of package specifications, one per line, which conda accepts as an input file (via --file) for installing packages into an environment:
(astropy) $ cp environment.yaml requirements.yaml
# Keep only the lines from "dependencies" to "prefix"
(astropy) $ sed -i -n '/dependencies/,/prefix/p' requirements.yaml
# Delete the "dependencies:" and "prefix:" lines themselves
(astropy) $ sed -i -e '/dependencies:/d' -e '/prefix:/d' requirements.yaml
# Strip the leading "  - " list markers
(astropy) $ sed -i 's/ *- //g' requirements.yaml
(astropy) $ cat requirements.yaml
_libgcc_mutex=0.1=main
astropy=4.2.1=py39h6323ea4_1
blas=1.0=mkl
ca-certificates=2021.5.25=h06a4308_1
certifi=2021.5.30=py39h06a4308_0
intel-openmp=2021.2.0=h06a4308_610
ld_impl_linux-64=2.33.1=h53a641e_7
libffi=3.3=he6710b0_2
libgcc-ng=9.1.0=hdf63c60_0
libstdcxx-ng=9.1.0=hdf63c60_0
mkl=2021.2.0=h06a4308_296
mkl-service=2.3.0=py39h27cfd23_1
mkl_fft=1.3.0=py39h42c9631_2
mkl_random=1.2.1=py39ha9443f7_2
ncurses=6.2=he6710b0_1
numpy=1.20.2=py39h2d18471_0
numpy-base=1.20.2=py39hfae3a4d_0
openssl=1.1.1k=h27cfd23_0
pip=21.1.1=py39h06a4308_0
pyerfa=2.0.0=py39h27cfd23_0
python=3.9.5=hdb3f193_3
readline=8.1=h27cfd23_0
setuptools=52.0.0=py39h06a4308_0
six=1.15.0=py39h06a4308_0
sqlite=3.35.4=hdfb4753_0
tk=8.6.10=hbc83047_0
tzdata=2020f=h52ac0ba_0
wheel=0.36.2=pyhd3eb1b0_0
xz=5.2.5=h7b6447c_0
zlib=1.2.11=h7b6447c_3
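The three in-place sed edits above can also be chained into one pipeline that reads environment.yaml and writes requirements.yaml without modifying the original, and without relying on the -i option (whose syntax differs between GNU and BSD sed). A minimal sketch, run on a trimmed sample file in a scratch directory; the three dependencies come from the export above, while the prefix path is a placeholder:

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Trimmed sample of a conda env export file; the prefix is a placeholder.
cat > environment.yaml <<'EOF'
name: astropy
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - astropy=4.2.1=py39h6323ea4_1
  - numpy=1.20.2=py39h2d18471_0
prefix: /path/to/envs/astropy
EOF

# First sed: keep only the block from 'dependencies' to 'prefix'.
# Second sed: drop those two header lines and strip the leading '  - ' markers.
sed -n '/dependencies/,/prefix/p' environment.yaml |
    sed -e '/dependencies:/d' -e '/prefix:/d' -e 's/ *- //g' > requirements.yaml

cat requirements.yaml
```

The scratch directory can be removed afterwards with rm -r "$workdir".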
If we need to re-install exactly the same environment later on, we can use this requirements file:
(base) $ conda create -y -n astropy-bis
[..]
(base) $ conda activate astropy-bis
[..]
(astropy-bis) $ conda install -y --no-deps --file requirements.yaml
[..]
(astropy-bis) $

# Equivalent with mamba
(base) $ mamba create -y -n astropy-bis
[..]
(base) $ mamba activate astropy-bis
[..]
(astropy-bis) $ mamba install -y --no-deps --file requirements.yaml
[..]
(astropy-bis) $
Note that we now run conda with the option --no-deps, which instructs it to install only the packages in the requirements list, without resolving or adding any dependencies. In general this is risky and can lead to broken environments, but here it is safe: the list was exported from a real, working environment, so it already contains every dependency.
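To check that the rebuilt environment really matches the original, one option is to export it, apply the same sed conversion, and compare the two package lists. The function below is a hypothetical helper (same_packages is not a conda or mamba command); it sorts both files and compares them with diff, so line order does not matter:

```shell
# Hypothetical helper: compare two requirements files regardless of line order.
# Returns 0 (printing nothing) when both list the same specs, prints the
# differing lines as reported by diff and returns 1 when they differ,
# and returns 2 if either file cannot be read.
same_packages() {
    if [ ! -r "$1" ] || [ ! -r "$2" ]; then
        echo "same_packages: cannot read input file" >&2
        return 2
    fi
    sorted_a=$(mktemp)
    sorted_b=$(mktemp)
    sort "$1" > "$sorted_a"
    sort "$2" > "$sorted_b"
    if diff "$sorted_a" "$sorted_b"; then
        status=0
    else
        status=1
    fi
    rm -f "$sorted_a" "$sorted_b"
    return "$status"
}
```

For example, after exporting astropy-bis to a file such as environment-bis.yaml and converting it to requirements-bis.yaml (both hypothetical names), same_packages requirements.yaml requirements-bis.yaml prints nothing and returns 0 if the two environments contain the same packages.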
Related pages
To use containers for packaging and deploying reproducible workflows, see the Containers page.
- How to Configure Conda to Avoid Quota Issues