Installation

In this tutorial, we start from descriptor files to train ML dipole models of isolated methanol.

Installation overview

The package is composed of three part

command line interface to process DFT/AIMD data (written in python)
module to train ML models (written in python)
module to infer dipole moment using ML models (written in C++)

You can install the first two via python package installer pip, while we need cmake to install the last one.

Download

You can download the whole package via git

git clone git@github.com:dirac6582/MLWC.git
cd MLWC
git checkout develop

Please be sure to use develop branch. We define the root_dir as the root directory as

root_dir=`pwd`

for later convenience.

Install python packages

One may create a virtual environment through conda or virtualenv. Here, we show how to create a virtual environment named your_env using conda. Although we use conda for the virtual environment, we use pip for the package installation.

conda create -n your_env python==3.10
conda activate your_env
conda install pip
pip install --upgrade pip

Goint to the root directory of the package, you can install the package using pip.

cd $root_dir
pip install .

If the installation succeeds, you can execute various commands without additional path settings.

CPextract.py --help
CPtrain.py --help

These lines will print the help information.

Install C++ packages

Requirements

To install C++ packages, the following packages/commands are required.

Eigen (https://eigen.tuxfamily.org/index.php?title=Main_Page)
libtorch (https://pytorch.org/cppdocs/installing.html)
RDKit (https://github.com/rdkit/rdkit)
Boost (https://github.com/boostorg)
cmake >= 3.27 (https://cmake.org/download/)
c++ compiler
openMP library

Boost is required by RDKit. Among them, libtorch, RDKit, and Boost should be automatically installed in the previous section with pip. Alternatively, you can build them from the source.

Check libtorch

If you successfully installed pytorch via pip under the virtual environment provided by conda, it is installed to

ls /path/to/your/conda/virtual/environment/lib/python3.10/site-packages/torch/

The directory depends on your python version. The exact path can be checked by executing the following python command.

python -c "from distutils.sysconfig import get_python_lib;print(get_python_lib())"

Libtorch libraries, headers, and CMake settings are in

# pytorch root directory (depends on your system)
pytorch_root=${CONDA_PREFIX}/lib/python3.10/site-packages/torch/

# shared libraries
ls ${pytorch_root}/lib

# header files
ls ${pytorch_root}/include

# CMake settings
ls ${pytorch_root}/share/cmake

Basically, ${CONDA_PREFIX} points to the root directory of the virtual environment.

Install Eigen

Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms. It is a header-only library, so you only need to download and include the header files in your project. You can download Eigen from gitlab as follows.

cd /path/to/where/you/want/to/install/eigen
git clone --depth 1 https://gitlab.com/libeigen/eigen -b 3.4.0 eigen-3.4.0

Or you can download the tarball from the official website (https://eigen.tuxfamily.org/index.php?title=Main_Page).

cd /path/to/where/you/want/to/install/eigen
curl -O https://gitlab.com/libeigen/eigen/-/archive/3.4.0/eigen-3.4.0.tar.gz
tar xzf eigen-3.4.0.tar.gz

Install MLWC C++ packages

After preparing all the required packages, we can build MLWC C++ packages through cmake. Now go to the source code directory and make build directory.

cd ${root_dir}/src/cpp
mkdir build
cd build

Then, we may execute cmake like

cmake ../ -DCMAKE_PREFIX_PATH="path/to/eigen;path/to/libtorch" -DCMAKE_MODULE_PATH=path/to/eigen/cmake -DBOOST_ROOT=${CONDA_PREFIX} -DBoost_NO_BOOST_CMAKE=ON -DBoost_NO_SYSTEM_PATHS=ON

Please be sure to replace path/to/eigen and path/to/libtorch with the actual path to the Eigen and libtorch directories. We have to quote your path list with " if using multiple paths. If you use libtorch in conda environment, /path/to/libtorch is pytorch_root defined above. We also need to specify the CMAKE_MODULE_PATH to the Eigen3 cmake directory to activate the Module mode in cmake, because we did not build Eigen3.

If the CMake has been executed successfully, then run the following make commands to build the package:

make

If everything works fine, you will have the executable named MLWC in ${root_dir}/src/src/cpp/build/. when you run the executable without any argument, you will see the following message.

$ ${root_dir}/notebook/c++/src/build/MLWC
 +-----------------------------------------------------------------+
 +                         Program MLWC                       +
 +-----------------------------------------------------------------+
     PROGRAM MLWC STARTED AT = Thu Jan  1 09:00:00 1970


 ERROR in main  MESSAGE: Error: incorrect inputs. Usage:: MLWC inpfile