Getting-started tutorial No. 3: Liquid water : dielectric spectra

In this tutorial, we are going to calculate the dielectric constant & function of liquid water.

Tutorial data

The reference files of this tutorial are given in examples/tutorial/3_liquidmethanol/ directory.

Prepare training data

The acquisition of the training data requires two steps: generation of the structure and calculation of the Wannier centers. In the case of liquid structures, the acquisition of the structure is done with classical MD and only the Wannier center calculation is done with DFT due to computational cost considerations.

GROMACS calculation for structure sampling

Here we show the workflow using GROMACS, although other packages are possible.

Convert smiles to topology files

To begin with, we build an initial structure for classical molecular dynamics.

$cat methanol.csv
Smiles,Name,density
CO,METHANOL,0.791

Unlike DFT calculations, you need to generate force field parameters for classical MD. The procedure depends on which software you use. Here is an example for GROMACS calculations. We use openbabel and acpype package.

$obabel -ismi methanol.csv -O methanol.mol --gen3D --conformer --nconf 5000 --weighted
$acpype -s 86400 -i methanol.mol -c bcc -n 0 -m 1 -a gaff2 -f -o gmx -k "qm_theory='AM1', grms_tol=0.05, scfconv=1.d-10, ndiis_attempts=700, "
$obabel -i gro methanol.acpype/methanol_GMX.gro -o mol -O methanol_GMX.mol
$obabel -i gro methanol.acpype/methanol_GMX.gro -o pdb -O methanol_GMX.pdb

Generated input_GMX.itp is the topology file.

Note

For mac OS users, openbabel and acpype are installed as

$brew install open-babel
$conda install -c conda-forge acpype

Generate liquid structure

There are several methods to make liquid structures from a single molecule structure. Here we use packmol to achieve this. We use the experimental density of 0.791[g/cm^3], resulting in the cell parameter of 12.91[nm] with 32 molecules.

$packmol < packmol.inp

Note

For mac OS users, packmol is installed via homebrew as

$brew install packmol

Finally, we add the cell information to the generated methanol_liquid.pdb file using

$gmx editconf -f methanol_liquid.pdb  -box 1.291 1.291 1.291 -o init.gro

We note that GROMACS uses nm instead of Angstrom for the length unit.

../_images/methanol_liquid.png

Prepare input parameter files

Let us go to the cmd/ directory. The input file em.mdp for energy minimization is given below.

; VARIOUS PREPROCESSING OPTIONS
;title                    = Yo
;cpp                      = /usr/bin/cpp
include                  =
define                   =

; RUN CONTROL PARAMETERS
integrator               = steep
nsteps                   = 1000000
emtol                    = 10
emstep                   = 0.1
nstlist                  = 1
cutoff-scheme            = verlet
vdw-type                 = cut-off
rlist                    = 0.6
rvdw                     = 0.6
rcoulomb                 = 0.6

The input file run.mdp for production run is given below. We calculate 10,000,000 steps and sample every 1,000 steps to minimize correlation between structures, obtaining a total of 10,000 structures.

; VARIOUS PREPROCESSING OPTIONS
;title                    = Yo
;cpp                      = /usr/bin/cpp
include                  =
define                   =

; RUN CONTROL PARAMETERS
constraints              = none
integrator               = md
nsteps                   = 10000000
dt                       = 0.001
nstlist                  = 1
rlist                    = 0.6
rvdw                     = 0.6
rcoulomb                 = 0.6
coulombtype              = pme
cutoff-scheme            = verlet
vdw-type                 = cut-off
tc-grps                  = system
tau-t                    = 0.1
gen-vel                  = yes
gen-temp                 = 298.15
ref-t                    = 298.15
Pcoupl                   = no
Tcoupl                    = v-rescale
nstenergy                = 1000
nstxout                  = 1000
nstfout                  = 1000
DispCorr                 = EnerPres

Run GROMACS

Let us run classical molecular dynamics simulations.

#grompp for making input file
mpiexec -n 1 gmx_mpi grompp -f em.mdp -p system.top -c  ../make_itp/init.gro -o em.tpr -maxwarn 10
#mdrun for equilibration
mpiexec -n 8 gmx_mpi mdrun -s em.tpr -o em.trr -e em.edr -c em.gro -nb cp
#grompp for making input file
mpiexec -n 1 gmx_mpi grompp -f run.mdp -p system.top -c em.gro -o eq.tpr -maxwarn 1
#mdrun for production
mpiexec -n 8 gmx_mpi mdrun -s eq.tpr -o eq.trr -e eq.edr -c eq.gro -nb cpu
#making final output
mkdir ./inputs/
echo "System" > ./inputs/anal.txt
mpiexec -n 1 gmx_mpi trjconv -s eq.tpr -f eq.trr -dump 0 -o eq.pdb < ./inputs/anal.txt
mpiexec -n 1 gmx_mpi trjconv -s eq.tpr -f eq.trr -pbc mol -force -o eq_pbc.trr < ./inputs/anal.txt

CPMD calculation for Wannier centers

Let us go to the cpmd/ directory.

Prepare input for CPMD


After finishing GROMACS calculations, we will use the following script to make 10,000 input files for CPMD.

make_bulkjobinput_from_gromacs.py
import ase
import mdtraj
import os
import cpmd.converter_cpmd
# load GROMACS trajectory
traj=mdtraj.load("eq_pbc.trr", top="eq.pdb")

num_config = len(traj)
print("The number of configurations :: {0}".format(num_config))
assert num_config == 10001 # check if gromacs completely finish

os.system("mkdir bulkjob")
for i in range(num_config):
    os.system("mkdir bulkjob/struc_{}".format(str(i)))
    traj[i].save_gro("bulkjob/struc_{}/final_structure.gro".format(str(i)))
    ase_atoms=ase.io.read("bulkjob/struc_{}/final_structure.gro".format(str(i)))
    makeinput=cpmd.converter_cpmd.make_cpmdinput(ase_atoms)
    makeinput.make_bomd_oneshot(type="sorted")
    os.system("mv bomd-oneshot.inp bulkjob/struc_{}/bomd-oneshot.inp".format(str(i)))
    os.system("mv sort_index.txt   bulkjob/struc_{}/sort_index.txt".format(str(i)))
    os.system("mkdir bulkjob/struc_{}/tmp".format(str(i)))

print(" -------------------- ")
print("finish making bulkjob inputs !! ")

Note

Generated inputs are just samples. You should tune parameters for serious calculations.

We create tmp/ and pseudo/ directories in each bulkjob/struc_* directory to stock outputs and pseudo potentials, respectively. You also have to prepare C_MT_GIA_BLYP, O_MT_GIA_BLYP, and H_MT_BLYP.psp from CPMD pseudo potential directories and store them in pseudo/ directory.

Run CPMD

We execute 10000 scf calculations as follows.

for i in {1..10000};
do
    cd bulkjob/struc_${i}
    mpirun cpmd.x bomd-oneshot.inp >> bomd-oneshot.out
    cd ../../
done;

After the calculation, we gather IONS+CENTERS.xyz in bulkjob/struc_* directories into a single IONS+CENTERS_merge.xyz file.

NUM=10000
for i in `eval echo {0..$NUM}`;
do
    cat bulkjob/struc_${i}/IONS+CENTERS.xyz >> IONS+CENTERS_merge.xyz

Postprocess CPMD data

IONS+CENTERS_merge.xyz does not include the lattice information, which we need to add manually. We can use CPextract.py to do this.

$CPextract.py cpmd addlattice -i IONS+CENTERS_merge.xyz -s bulkjob/struc_1/bomd-wan-restart.out IONS+CENTERS_merge_cell.xyz

Second, we will re-sort the atomic orders in the file.

$CPextract.py cpmd sort -i IONS+CENTERS_merge_cell.xyz -s bulkjob/struc_1/sort_index.txt IONS+CENTERS_merge_cell_sorted.xyz

Train models

Let us go to the train/ directory.

Train models

The previously prepared IONS+CENTERS_merge_cell_sorted.xyz and methanol.mol are used for training ML models. As methanol has CH, CO, OH bonds and O lone pair, we have to train four independent ML models. The input file for CPtrain.py is given in yaml format. The input file for the CH bond is as follows.

model:
modelname: model_ch  # specify name
nfeature:  288       # length of descriptor
M:         20        # M  (embedding matrix size)
Mb:        6         # Mb (embedding matrix size, smaller than M)

learning_rate:
type: fix

loss:
type: mse        # mean square error

data:
type: xyz
file:
    - "../cpmd/IONS+CENTERS+cell_sorted_merge.xyz"
itp_file: ../make_itp/methanol_GMX.mol
bond_type: CH # CH, CO, OH, O

traininig:
device:     cpu # Torch device (cpu/mps/cuda)
batch_size: 32  # batch size for training
validation_vatch_size: 32 # batch size for validation
max_epochs: 50
learnint_rate: 1e-2 # starting learning rate
n_train:   9000    # the number of training data
n_val:     1000    # the number of validation data
modeldir:  model_ch # directory to save models
restart:   False    # If restart training

For gas systems, we can reduce the model size without losing accuracy.

We can train the CH bond model using CPtrain.py train command as follows.

$CPtrain.py train -i input.yaml

After the training, RMSE should be about 0.01[D] to 0.05[D] for liquid systems.

Next, you can change modelname, bond_type, and modeldir to corresponding bonds, and re-run CPtrain.py to train other 4 models.

Test a model

We can check the quality of the trained model as follows.

$CPtrain.py test -m model_ch/model_ch_python.pt -x ../cpmd/IONS+CENTERS_cell.xyz -i ../make_itp/methanol_GMX.mol

Calculate dielectric constant

Let us go to the pred/ directory.

Run long classical molecular dynamics

As an example of a real-world application, we will calculate the dielectric constant of liquid methanol. The dielectric constant requires extremely long trajectories to converge. Here we prepare a 50ps trajectory containing 500 molecules using GROMACS. Input files prepared in the directory.

$tree
├── em.mdp
├── init.gro
├── run.mdp
└── system.top
└── make_xyz.py

We run the simulation using following commands. It takes long time to finish.

#grompp for making input file
mpiexec -n 1 gmx_mpi grompp -f em.mdp -p system.top -c  init.gro -o em.tpr -maxwarn 10
#mdrun for equilibration
mpiexec -n 8 gmx_mpi mdrun -s em.tpr -o em.trr -e em.edr -c em.gro -nb cp
#grompp for making input file
mpiexec -n 1 gmx_mpi grompp -f run.mdp -p system.top -c em.gro -o eq.tpr -maxwarn 1
#mdrun for production
mpiexec -n 8 gmx_mpi mdrun -s eq.tpr -o eq.trr -e eq.edr -c eq.gro -nb cpu
#making final output
mkdir ./inputs/
echo "System" > ./inputs/anal.txt
mpiexec -n 1 gmx_mpi trjconv -s eq.tpr -f eq.trr -dump 0 -o eq.pdb < ./inputs/anal.txt
mpiexec -n 1 gmx_mpi trjconv -s eq.tpr -f eq.trr -pbc mol -force -o eq_pbc.trr < ./inputs/anal.txt

After the calculations, we have to change the format of output binary file (eq_pbc.trr) into the xyz format.

make_xyz.py
def make_ase(symbols,positions,cell):
    from ase import Atoms
    mols = Atoms(symbols=symbols,
                positions=positions*10,  # nm(gro) to ang (ase)
                cell= cell*10,
                pbc=[1, 1, 1])
    return mols

import mdtraj
traj=mdtraj.load("eq_pbc.trr", top="eq.pdb")
print("traj :: ", len(traj))

# get cell
UNITCELL_VECTORS=traj[-1].unitcell_vectors.reshape([3,3])

# get symbols
table, bonds =traj[-1].topology.to_dataframe()
symbols=table['element']

# get xyz
traj_coords = traj.xyz

import ase.io
import ase

# make list[atoms]
answer_atomslist=[]
for positions in traj_coords:
    aseatom=make_ase(symbols,positions,UNITCELL_VECTORS)
    answer_atomslist.append(aseatom)
# save
ase.io.write("gromacs_trajectory_cell.xyz",answer_atomslist)

print("==========")
print(" end ")

Predict dipole moment along the trajectory

Finally, we predict the dipole moment along the MD trajectory. The input file for C++ interface is as follows.

config.yaml
general:
    itpfilename: methanol.acpype/input_GMX.mol
    bondfilename: ../make_itp/methanol.acpype/methanol_GMX.mol
    savedir: dipole_50ns/
    temperature: 298
descriptor:
    calc: 1
    directory: ./
    xyzfilename: gromacs_trajectory_cell.xyz
    savedir: dipole_50ns/
    descmode: 2
    desctype: allinone
    haswannier: 1
    interval: 1
    desc_coh: 0
predict:
    calc: 1
    desc_dir: dipole_50ns/
    model_dir:
    modelmode: rotate
    bondspecies: 4
    save_truey: 0
$mkdir dipole_50ns/
$dieltools config.yaml

We can check the dielectric constant in the DIELCONST file.

$cat dipole_50ns/DIELCONST
calculated mean dipole & dielectric constants
WARNING eps^inf is fixed to 1.0, and we only support orthorhombic lattice.
 temperature                           298
 mean_absolute_mol_dipole          2.49343
 stderr_absolute_mol_dipole         0.266285
           mean_M2(<M^2>)          10367.5
           mean_M(<M>^2)           3.06974
           eps^0                   32.1915

The calculated dielectric constant is 32.19, which agrees well with the experimental value of 32.66.