Getting-started tutorial No. 3: Liquid water : dielectric spectra
In this tutorial, we are going to calculate the dielectric constant & function of liquid water.
Tutorial data
The reference files of this tutorial are given in examples/tutorial/3_liquidmethanol/
directory.
Prepare training data
The acquisition of the training data requires two steps: generation of the structure and calculation of the Wannier centers. In the case of liquid structures, the acquisition of the structure is done with classical MD and only the Wannier center calculation is done with DFT due to computational cost considerations.
GROMACS calculation for structure sampling
Here we show the workflow using GROMACS, although other packages are possible.
Convert smiles to topology files
To begin with, we build an initial structure for classical molecular dynamics.
$cat methanol.csv
Smiles,Name,density
CO,METHANOL,0.791
Unlike DFT calculations, you need to generate force field parameters for classical MD. The procedure depends on which software you use.
Here is an example for GROMACS
calculations. We use openbabel
and acpype
package.
$obabel -ismi methanol.csv -O methanol.mol --gen3D --conformer --nconf 5000 --weighted
$acpype -s 86400 -i methanol.mol -c bcc -n 0 -m 1 -a gaff2 -f -o gmx -k "qm_theory='AM1', grms_tol=0.05, scfconv=1.d-10, ndiis_attempts=700, "
$obabel -i gro methanol.acpype/methanol_GMX.gro -o mol -O methanol_GMX.mol
$obabel -i gro methanol.acpype/methanol_GMX.gro -o pdb -O methanol_GMX.pdb
Generated input_GMX.itp
is the topology file.
Note
For mac OS users, openbabel
and acpype
are installed as
$brew install open-babel
$conda install -c conda-forge acpype
Generate liquid structure
There are several methods to make liquid structures from a single molecule structure. Here we use packmol
to achieve this.
We use the experimental density of 0.791[g/cm^3]
, resulting in the cell parameter of 12.91[nm]
with 32
molecules.
$packmol < packmol.inp
Note
For mac OS users, packmol
is installed via homebrew
as
$brew install packmol
Finally, we add the cell information to the generated methanol_liquid.pdb
file using
$gmx editconf -f methanol_liquid.pdb -box 1.291 1.291 1.291 -o init.gro
We note that GROMACS
uses nm
instead of Angstrom
for the length unit.
Prepare input parameter files
Let us go to the cmd/
directory. The input file em.mdp
for energy minimization is given below.
; VARIOUS PREPROCESSING OPTIONS
;title = Yo
;cpp = /usr/bin/cpp
include =
define =
; RUN CONTROL PARAMETERS
integrator = steep
nsteps = 1000000
emtol = 10
emstep = 0.1
nstlist = 1
cutoff-scheme = verlet
vdw-type = cut-off
rlist = 0.6
rvdw = 0.6
rcoulomb = 0.6
The input file run.mdp
for production run is given below. We calculate 10,000,000
steps and sample every 1,000
steps to minimize correlation between structures, obtaining a total of 10,000
structures.
; VARIOUS PREPROCESSING OPTIONS
;title = Yo
;cpp = /usr/bin/cpp
include =
define =
; RUN CONTROL PARAMETERS
constraints = none
integrator = md
nsteps = 10000000
dt = 0.001
nstlist = 1
rlist = 0.6
rvdw = 0.6
rcoulomb = 0.6
coulombtype = pme
cutoff-scheme = verlet
vdw-type = cut-off
tc-grps = system
tau-t = 0.1
gen-vel = yes
gen-temp = 298.15
ref-t = 298.15
Pcoupl = no
Tcoupl = v-rescale
nstenergy = 1000
nstxout = 1000
nstfout = 1000
DispCorr = EnerPres
Run GROMACS
Let us run classical molecular dynamics simulations.
#grompp for making input file
mpiexec -n 1 gmx_mpi grompp -f em.mdp -p system.top -c ../make_itp/init.gro -o em.tpr -maxwarn 10
#mdrun for equilibration
mpiexec -n 8 gmx_mpi mdrun -s em.tpr -o em.trr -e em.edr -c em.gro -nb cp
#grompp for making input file
mpiexec -n 1 gmx_mpi grompp -f run.mdp -p system.top -c em.gro -o eq.tpr -maxwarn 1
#mdrun for production
mpiexec -n 8 gmx_mpi mdrun -s eq.tpr -o eq.trr -e eq.edr -c eq.gro -nb cpu
#making final output
mkdir ./inputs/
echo "System" > ./inputs/anal.txt
mpiexec -n 1 gmx_mpi trjconv -s eq.tpr -f eq.trr -dump 0 -o eq.pdb < ./inputs/anal.txt
mpiexec -n 1 gmx_mpi trjconv -s eq.tpr -f eq.trr -pbc mol -force -o eq_pbc.trr < ./inputs/anal.txt
CPMD calculation for Wannier centers
Let us go to the cpmd/
directory.
Prepare input for CPMD
After finishing GROMACS calculations, we will use the following script to make 10,000
input files for CPMD.
import ase
import mdtraj
import os
import cpmd.converter_cpmd
# load GROMACS trajectory
traj=mdtraj.load("eq_pbc.trr", top="eq.pdb")
num_config = len(traj)
print("The number of configurations :: {0}".format(num_config))
assert num_config == 10001 # check if gromacs completely finish
os.system("mkdir bulkjob")
for i in range(num_config):
os.system("mkdir bulkjob/struc_{}".format(str(i)))
traj[i].save_gro("bulkjob/struc_{}/final_structure.gro".format(str(i)))
ase_atoms=ase.io.read("bulkjob/struc_{}/final_structure.gro".format(str(i)))
makeinput=cpmd.converter_cpmd.make_cpmdinput(ase_atoms)
makeinput.make_bomd_oneshot(type="sorted")
os.system("mv bomd-oneshot.inp bulkjob/struc_{}/bomd-oneshot.inp".format(str(i)))
os.system("mv sort_index.txt bulkjob/struc_{}/sort_index.txt".format(str(i)))
os.system("mkdir bulkjob/struc_{}/tmp".format(str(i)))
print(" -------------------- ")
print("finish making bulkjob inputs !! ")
Note
Generated inputs are just samples. You should tune parameters for serious calculations.
We create tmp/
and pseudo/
directories in each bulkjob/struc_*
directory to stock outputs and pseudo potentials, respectively. You also have to prepare C_MT_GIA_BLYP
, O_MT_GIA_BLYP
, and H_MT_BLYP.psp
from CPMD pseudo potential directories and store them in pseudo/
directory.
Run CPMD
We execute 10000
scf calculations as follows.
for i in {1..10000};
do
cd bulkjob/struc_${i}
mpirun cpmd.x bomd-oneshot.inp >> bomd-oneshot.out
cd ../../
done;
After the calculation, we gather IONS+CENTERS.xyz
in bulkjob/struc_*
directories into a single IONS+CENTERS_merge.xyz
file.
NUM=10000
for i in `eval echo {0..$NUM}`;
do
cat bulkjob/struc_${i}/IONS+CENTERS.xyz >> IONS+CENTERS_merge.xyz
Postprocess CPMD data
IONS+CENTERS_merge.xyz
does not include the lattice information, which we need to add manually. We can use CPextract.py
to do this.
$CPextract.py cpmd addlattice -i IONS+CENTERS_merge.xyz -s bulkjob/struc_1/bomd-wan-restart.out IONS+CENTERS_merge_cell.xyz
Second, we will re-sort the atomic orders in the file.
$CPextract.py cpmd sort -i IONS+CENTERS_merge_cell.xyz -s bulkjob/struc_1/sort_index.txt IONS+CENTERS_merge_cell_sorted.xyz
Train models
Let us go to the train/
directory.
Train models
The previously prepared IONS+CENTERS_merge_cell_sorted.xyz
and methanol.mol
are used for training ML models. As methanol has CH
, CO
, OH
bonds and O
lone pair, we have to train four independent ML models. The input file for CPtrain.py
is given in yaml
format.
The input file for the CH bond is as follows.
model:
modelname: model_ch # specify name
nfeature: 288 # length of descriptor
M: 20 # M (embedding matrix size)
Mb: 6 # Mb (embedding matrix size, smaller than M)
learning_rate:
type: fix
loss:
type: mse # mean square error
data:
type: xyz
file:
- "../cpmd/IONS+CENTERS+cell_sorted_merge.xyz"
itp_file: ../make_itp/methanol_GMX.mol
bond_type: CH # CH, CO, OH, O
traininig:
device: cpu # Torch device (cpu/mps/cuda)
batch_size: 32 # batch size for training
validation_vatch_size: 32 # batch size for validation
max_epochs: 50
learnint_rate: 1e-2 # starting learning rate
n_train: 9000 # the number of training data
n_val: 1000 # the number of validation data
modeldir: model_ch # directory to save models
restart: False # If restart training
For gas systems, we can reduce the model size without losing accuracy.
We can train the CH bond model using CPtrain.py train
command as follows.
$CPtrain.py train -i input.yaml
After the training, RMSE should be about 0.01[D]
to 0.05[D]
for liquid systems.
Next, you can change modelname
, bond_type
, and modeldir
to corresponding bonds, and re-run CPtrain.py
to train other 4 models.
Test a model
We can check the quality of the trained model as follows.
$CPtrain.py test -m model_ch/model_ch_python.pt -x ../cpmd/IONS+CENTERS_cell.xyz -i ../make_itp/methanol_GMX.mol
Calculate dielectric constant
Let us go to the pred/
directory.
Run long classical molecular dynamics
As an example of a real-world application, we will calculate the dielectric constant of liquid methanol. The dielectric constant requires extremely long trajectories to converge.
Here we prepare a 50ps
trajectory containing 500
molecules using GROMACS
. Input files prepared in the directory.
$tree
├── em.mdp
├── init.gro
├── run.mdp
└── system.top
└── make_xyz.py
We run the simulation using following commands. It takes long time to finish.
#grompp for making input file
mpiexec -n 1 gmx_mpi grompp -f em.mdp -p system.top -c init.gro -o em.tpr -maxwarn 10
#mdrun for equilibration
mpiexec -n 8 gmx_mpi mdrun -s em.tpr -o em.trr -e em.edr -c em.gro -nb cp
#grompp for making input file
mpiexec -n 1 gmx_mpi grompp -f run.mdp -p system.top -c em.gro -o eq.tpr -maxwarn 1
#mdrun for production
mpiexec -n 8 gmx_mpi mdrun -s eq.tpr -o eq.trr -e eq.edr -c eq.gro -nb cpu
#making final output
mkdir ./inputs/
echo "System" > ./inputs/anal.txt
mpiexec -n 1 gmx_mpi trjconv -s eq.tpr -f eq.trr -dump 0 -o eq.pdb < ./inputs/anal.txt
mpiexec -n 1 gmx_mpi trjconv -s eq.tpr -f eq.trr -pbc mol -force -o eq_pbc.trr < ./inputs/anal.txt
After the calculations, we have to change the format of output binary file (eq_pbc.trr
) into the xyz
format.
def make_ase(symbols,positions,cell):
from ase import Atoms
mols = Atoms(symbols=symbols,
positions=positions*10, # nm(gro) to ang (ase)
cell= cell*10,
pbc=[1, 1, 1])
return mols
import mdtraj
traj=mdtraj.load("eq_pbc.trr", top="eq.pdb")
print("traj :: ", len(traj))
# get cell
UNITCELL_VECTORS=traj[-1].unitcell_vectors.reshape([3,3])
# get symbols
table, bonds =traj[-1].topology.to_dataframe()
symbols=table['element']
# get xyz
traj_coords = traj.xyz
import ase.io
import ase
# make list[atoms]
answer_atomslist=[]
for positions in traj_coords:
aseatom=make_ase(symbols,positions,UNITCELL_VECTORS)
answer_atomslist.append(aseatom)
# save
ase.io.write("gromacs_trajectory_cell.xyz",answer_atomslist)
print("==========")
print(" end ")
Predict dipole moment along the trajectory
Finally, we predict the dipole moment along the MD trajectory. The input file for C++ interface is as follows.
general:
itpfilename: methanol.acpype/input_GMX.mol
bondfilename: ../make_itp/methanol.acpype/methanol_GMX.mol
savedir: dipole_50ns/
temperature: 298
descriptor:
calc: 1
directory: ./
xyzfilename: gromacs_trajectory_cell.xyz
savedir: dipole_50ns/
descmode: 2
desctype: allinone
haswannier: 1
interval: 1
desc_coh: 0
predict:
calc: 1
desc_dir: dipole_50ns/
model_dir:
modelmode: rotate
bondspecies: 4
save_truey: 0
$mkdir dipole_50ns/
$dieltools config.yaml
We can check the dielectric constant in the DIELCONST
file.
$cat dipole_50ns/DIELCONST
calculated mean dipole & dielectric constants
WARNING eps^inf is fixed to 1.0, and we only support orthorhombic lattice.
temperature 298
mean_absolute_mol_dipole 2.49343
stderr_absolute_mol_dipole 0.266285
mean_M2(<M^2>) 10367.5
mean_M(<M>^2) 3.06974
eps^0 32.1915
The calculated dielectric constant is 32.19
, which agrees well with the experimental value of 32.66
.