################################################################### Getting-started tutorial No. 1: Liquid methanol ################################################################### In this tutorial, we train ML dipole models of liquid methanol. ************************************* Required data for calculations ************************************* To train ML models for dipole moment, we only need two files: * atomic coordinates with Wannier centers * molecular structure (chemical bond information) The first file is assumed to be the ``extended xyz`` format via ``ase`` package, which also contains the supercell information. The second file should be a standard ``mol`` file to be processed using ``rdkit``. We prepared a simple example using the isolated methanol system for this tutorial. Necessary files are included in this repository. Let's go to `tutorial/tutorial1/`. .. code-block:: bash $cd examples/tutorial/tutorial1/ $tree ├── IONS+CENTERS+cell_sorted_merge.xyz -> ../../CPtrain/cptrain_train/IONS+CENTERS+cell_sorted_merge.xyz ├── config.yaml ├── methanol.mol └── train.yaml xyz format atomic structures for training data --------------------------------------------------- The order of atoms should satisfy three things * The atomic order must be molecule-by-molecule. * The atomic order in each molecule should be the same as the ``*.mol`` file. * The WCs should come last. If you see the first 8 lines of ``methanol.xyz``, you can find ``C``, four ``H``, and ``O``. The Wannier centers (WC) are represented as `X`. There are the atoms and WCs included in a single MD step. .. code-block:: bash $ cat methanol.xyz 416 Lattice="12.911606431431386 0.0 0.0 0.0 12.911606431431386 0.0 0.0 0.0 12.911606431431386" Properties=species:S:1:pos:R:3 pbc="T T T" C 10.70302493 10.85955260 8.68064224 O 9.72875172 10.09124538 9.38880112 H 10.70478669 10.62022413 7.61094586 H 11.69788485 10.69544339 9.10099246 H 10.46045016 11.94347292 8.76860487 H 9.43087091 10.68001118 10.11893151 They are visualized using `nglview` package via jupyter notebook as follows. .. code-block:: python %pip install nglview %pip install ase import nglview as nv import ase.io # read all the trajectory. # If you want to extract a single step instead, try like ase.io.read("filename", index=1) aseatoms = ase.io.read("mol_wan.xyz",index=":") # This if for list of ase.atoms. If you want to see single ase.atom, use nv.show_ase. w = nv.show_asetraj(aseatoms,gui=True) w.clear_representations() w.add_label(radius=0.2,color="black",label_type="atom") w.add_ball_and_stick("_He",color="green",radius=0.004,aspectRatio=50) w.add_ball_and_stick("_Ne",color="cyan",radius=0.004,aspectRatio=50) w.add_ball_and_stick("_Ar",color="green",radius=0.004,aspectRatio=50) #w.add_ball_and_stick("_Li",color="cyan",radius=0.1) #w.add_ball_and_stick("_Be",color="blue",radius=0.1) w.add_ball_and_stick("_H") w.add_ball_and_stick("_C") w.add_ball_and_stick("_O") w.add_ball_and_stick("_N") w.add_unitcell() w.update_unitcell() w We have ``10000`` MD steps in the file, which will be used for both training and validation data for ML. .. code-block:: python # see how many steps in the aseatoms print(len(aseatoms)) Mol file for bond information --------------------------------------- Next, we dig into the ``*.mol`` file, which contains molecular structures including atomic and bonding information. .. code-block:: bash $ cat methanol.mol 6 5 0 0 0 0 0 0 0 0999 V2000 0.9400 0.0200 -0.0900 C 0 0 0 0 0 0 0 0 0 0 0 0 0.4700 0.2700 -1.4000 O 0 0 0 0 0 0 0 0 0 0 0 0 0.5800 -0.9500 0.2400 H 0 0 0 0 0 0 0 0 0 0 0 0 0.5700 0.8000 0.5800 H 0 0 0 0 0 0 0 0 0 0 0 0 2.0400 0.0200 -0.0900 H 0 0 0 0 0 0 0 0 0 0 0 0 0.8100 1.1400 -1.6700 H 0 0 0 0 0 0 0 0 0 0 0 0 1 5 1 0 0 0 0 1 3 1 0 0 0 0 1 4 1 0 0 0 0 2 1 1 0 0 0 0 6 2 1 0 0 0 0 M END The second to seventh lines are called atom block, which contain atomic coordinates and species in a single molecule. We only use atomic species for training. The following data is called atom block, representing bonding information. .. code-block:: bash 1 5 1 0 0 0 0 For example, the above line means the first and fifth atom (C and H) have a chemical bond. In other words, the atoms with first two numbers have a chemical bond. The ``*.mol`` format is a standard format for molecular structures, and you can easily find information on it. ************************************* Model training ************************************* Prepare input parameters ------------------------------ To train models, we implemented ``CPtrain.py`` command written in python. The command requires a ``yaml`` format file to specify parameters. Here is an example: .. code-block:: yaml model: modelname: test # specify name nfeature: 288 # length of descriptor M: 20 # M (embedding matrix size) Mb: 6 # Mb (embedding matrix size, smaller than M) learning_rate: type: fix loss: type: mse # mean square error data: type: xyz # or xyz file: - "IONS+CENTERS+cell_sorted_merge.xyz" itp_file: methanol.mol bondtype: ch training: device: cpu # Torchのdevice batch_size: 32 # batch size for training validation_batch_size: 32 # batch size for validation max_epochs: 40 learning_rate: 1e-2 # starting learning rate n_train: 900 # the number of training data n_val: 100 # the number of validation data modeldir: model_test # directory to save models restart: False # If restart training Parameters written above are basically necessary values (not optional). The input file consists of four parts: +----------------+------------------------+ | part name | explanation | +================+========================+ | model | ML model parameters | +----------------+------------------------+ | learning_rate | learning rate | +----------------+------------------------+ | loss | loss function | +----------------+------------------------+ | data | training data | +----------------+------------------------+ | training | training parameters | +----------------+------------------------+ As Basic explanations are given above, we only add some important notes. * model * Model parameters (``nfeature``, ``M``, ``Mb``) given above are basically enough for simple gas/liquid molecules. Although the detailed meanings of the parameters will be given later, we emphasize that ``Mb`` should be smaller than ``M`` by definition, and that `nfeature` should be a multiple of ``4``. * ``modelname`` is just used for file names, so you can use any word as you like. * learning_rate * Currently, we only support fixed learning rate. * loss * Currently, We only support Mean Squared Error (MSE) as a loss function. * data * Training data should be ``descriptor`` or ``xyz``. In this tutorial, we use ``xyz`` type. * If training data type is ``descriptor``, the descriptor file name should be :code:`*_descs.npy`, and the true file name should be :code:`*_true.npy`. * ``bondtype`` defines which bond to be trained. The value is one of ``CH``, ``CO``, ``OH``, ``CC``, or ``O``. * training * ``device`` is the same as ``pytorch``'s device for model training. You can use `cpu`, `cuda`, or `mps`. * ``modeldir`` specifies the directory to which model files will be saved. Train a model ---------------------- After the training script is prepared, we can start the training by simply running .. code-block:: bash CPtrain.py train -i input.yaml The code generates ``stdout`` like .. code-block:: bash your python version is ... 3 11 ***************************************************************** CPtrain.py Version. 0.0.1 ***************************************************************** 2024-05-27 23:21:32,907 root mltrain [INFO]: Start logging {'model': {'modelname': 'test', 'nfeature': 288, 'M': 20, 'Mb': 6}, 'learning_rate': {'type': 'fix'}, 'loss': {'type': 'mse'}, 'data': {'type': 'xyz', 'file': ['IONS+CENTERS+cell_sorted_merge.xyz'], 'itp_file': 'methanol.mol'}, 'training': {'device': 'cpu', 'batch_size': 32, 'validation_batch_size': 32, 'max_epochs': 40, 'learning_rate': '1e-2', 'n_train': 900, 'n_val': 100, 'modeldir': 'model_test', 'restart': False}} model NET :: nfeatures :: 288 nfeatures_enet :: 72 nfeatures_fnet :: 120 ================================================================= Layer (type:depth-idx) Param # ================================================================= NET_withoutBN -- ├─Linear: 1-1 3,650 ├─Linear: 1-2 2,550 ├─Linear: 1-3 73,440 ├─Linear: 1-4 6,050 ├─Linear: 1-5 2,550 ├─Linear: 1-6 1,020 ================================================================= Total params: 89,260 Trainable params: 89,260 Non-trainable params: 0 ================================================================= 2024-05-27 23:21:32,927 root mltrain [INFO]: -------------------------------------- data type :: xyz ----- ml.read_mol :: parse results... ------- bonds_list :: [[0, 4], [0, 2], [0, 3], [1, 0], [5, 1]] counter :: 6 atom_list :: ['C', 'O', 'H', 'H', 'H', 'H'] ----------------------------------------------- ================ CH bonds... [[0, 4], [0, 2], [0, 3]] CO bonds... [[1, 0]] OH bonds... [[5, 1]] OO bonds... [] CC bonds... [] CC ring bonds... [] ================== ring_bond_index [] ch_bond_index [0, 1, 2] oh_bond_index [4] co_bond_index [3] cc_bond_index [] ================ O atoms (lonepair)... [1] N atoms (lonepair)... [] C atoms ... [0] H atoms ... [2, 3, 4, 5] C 0.94 0.02 -0.09 O 0.47 0.27 -1.4 ----- ml.read_mol :: parse results... ------- representative_atom_index :: 1 ----------------------------------------------- ================ coh_index/coc_index :: [oの番号, {coボンドの番号(co_bond_indexの0から数えていくつか),ohボンドの番号}] TODO :: もしかしたらbond_indexを使った方が全体的にやりやすいかもしれない coh_index :: [[0, {'CO': 0, 'OH': 0}]] coc_index :: [] Loading xyz file :: ['IONS+CENTERS+cell_sorted_merge.xyz'] len xyz == 1 2024-05-27 23:21:34,561 root mltrain [INFO]: ----------------------------------------------------------------- 2024-05-27 23:21:34,561 root mltrain [INFO]: ---Summary of DataSystem: training ---------------------------------- 2024-05-27 23:21:34,561 root mltrain [INFO]: found 1 system(s): 2024-05-27 23:21:34,561 root mltrain [INFO]: system natoms bch_sz n_bch 2024-05-27 23:21:34,561 root mltrain [INFO]: IONS+CENTERS+cell_sorted_merge.xyz 1000 32 31 2024-05-27 23:21:34,562 root mltrain [INFO]: -------------------------------------------------------------------------------------- splitting atoms into atoms and WCs Assigning Wannier Centers Finish Assigning Wannier Centers 2024-05-27 23:22:28,891 Trainer __init__ [INFO]: model data will be saved to model_test ================================================================= Layer (type:depth-idx) Param # ================================================================= NET_withoutBN -- ├─Linear: 1-1 3,650 ├─Linear: 1-2 2,550 ├─Linear: 1-3 73,440 ├─Linear: 1-4 6,050 ├─Linear: 1-5 2,550 ├─Linear: 1-6 1,020 ================================================================= Total params: 89,260 Trainable params: 89,260 Non-trainable params: 0 ================================================================= 2024-05-27 23:22:28,892 Trainer init_model [INFO]: Torch device (cpu or cuda gpu or m1 mac gpu): cpu 2024-05-27 23:22:29,384 numexpr.utils _init_num_threads [INFO]: Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. 2024-05-27 23:22:29,384 numexpr.utils _init_num_threads [INFO]: NumExpr defaulting to 8 threads. 2024-05-27 23:22:29,556 Trainer set_dataset [INFO]: n_traing ( number of training data): 900 2024-05-27 23:22:29,556 Trainer set_dataset [INFO]: n_val ( number of validatin data): 100 ^@2024-05-27 23:23:05,110 Trainer epoch_step [INFO]: epoch= 1 : time= 35.553799867630005 [s] : loss(train)= 0.0030520600183600827 : loss(valid)= 0.0028606270595143237 : RMSE[D](train)= 0.0551564542987999 : RMSE[D](valid)= 0.05347409362915805 model is saved to model_test_tmp1.pt at model_test 2024-05-27 23:23:41,283 Trainer epoch_step [INFO]: epoch= 2 : time= 36.05292296409607 [s] : loss(train)= 0.002742534619756043 : loss(valid)= 0.002425628947094083 : RMSE[D](train)= 0.05222889056766713 : RMSE[D](valid)= 0.04924118021059818 model is saved to model_test_tmp2.pt at model_test 2024-05-27 23:24:17,498 Trainer epoch_step [INFO]: epoch= 3 : time= 36.17855882644653 [s] : loss(train)= 0.0027552477217146327 : loss(valid)= 0.0023257903133829436 : RMSE[D](train)= 0.05240615377016262 : RMSE[D](valid)= 0.04821778239842708 model is saved to model_test_tmp3.pt at model_test 2024-05-27 23:24:52,931 Trainer epoch_step [INFO]: epoch= 4 : time= 35.389596939086914 [s] : loss(train)= 0.002701992983929813 : loss(valid)= 0.002566932700574398 : RMSE[D](train)= 0.051863416078856424 : RMSE[D](valid)= 0.0505809825728987 model is saved to model_test_tmp4.pt at model_test 2024-05-27 23:25:28,712 Trainer epoch_step [INFO]: epoch= 5 : time= 35.72677397727966 [s] : loss(train)= 0.0026170784757206483 : loss(valid)= 0.0026297084211061397 : RMSE[D](train)= 0.05109348549317842 : RMSE[D](valid)= 0.05118794610165333 model is saved to model_test_tmp5.pt at model_test 2024-05-27 23:26:02,858 Trainer epoch_step [INFO]: epoch= 6 : time= 34.10452723503113 [s] : loss(train)= 0.002519122832122126 : loss(valid)= 0.002710235926012198 : RMSE[D](train)= 0.05010292158956631 : RMSE[D](valid)= 0.05182654546680134 model is saved to model_test_tmp6.pt at model_test 2024-05-27 23:26:36,344 Trainer epoch_step [INFO]: epoch= 7 : time= 33.43118190765381 [s] : loss(train)= 0.002488897938746959 : loss(valid)= 0.002438249376912912 : RMSE[D](train)= 0.04980324078916386 : RMSE[D](valid)= 0.04936132032714493 model is saved to model_test_tmp7.pt at model_test 2024-05-27 23:27:09,524 Trainer epoch_step [INFO]: epoch= 8 : time= 33.13404870033264 [s] : loss(train)= 0.0024281473555934747 : loss(valid)= 0.0024121985770761967 : RMSE[D](train)= 0.049172273966610065 : RMSE[D](valid)= 0.04911379181779723 model is saved to model_test_tmp8.pt at model_test 2024-05-27 23:27:42,988 Trainer epoch_step [INFO]: epoch= 9 : time= 33.424768924713135 [s] : loss(train)= 0.0024187416752933393 : loss(valid)= 0.00232696447831889 : RMSE[D](train)= 0.04911207242387859 : RMSE[D](valid)= 0.04820846097177897 model is saved to model_test_tmp9.pt at model_test 2024-05-27 23:28:17,375 Trainer epoch_step [INFO]: epoch= 10 : time= 34.339770793914795 [s] : loss(train)= 0.002027994516538456 : loss(valid)= 0.0020500118068108955 : RMSE[D](train)= 0.04497044101939407 : RMSE[D](valid)= 0.045273193022861896 model is saved to model_test_tmp10.pt at model_test 2024-05-27 23:28:50,388 Trainer epoch_step [INFO]: epoch= 11 : time= 32.96853709220886 [s] : loss(train)= 0.0018060718056014074 : loss(valid)= 0.0015387694584205747 : RMSE[D](train)= 0.042402016109021445 : RMSE[D](valid)= 0.0391866800126098 model is saved to model_test_tmp11.pt at model_test 2024-05-27 23:29:23,494 Trainer epoch_step [INFO]: epoch= 12 : time= 33.06202292442322 [s] : loss(train)= 0.0015744604378206922 : loss(valid)= 0.001602462415272991 : RMSE[D](train)= 0.03961410282709073 : RMSE[D](valid)= 0.03996957523043282 model is saved to model_test_tmp12.pt at model_test 2024-05-27 23:29:56,833 Trainer epoch_step [INFO]: epoch= 13 : time= 33.296122789382935 [s] : loss(train)= 0.0015049170885634208 : loss(valid)= 0.0015406373422592878 : RMSE[D](train)= 0.03872750117829672 : RMSE[D](valid)= 0.039230152528586505 model is saved to model_test_tmp13.pt at model_test 2024-05-27 23:30:30,009 Trainer epoch_step [INFO]: epoch= 14 : time= 33.13048601150513 [s] : loss(train)= 0.0014625149363252734 : loss(valid)= 0.0013872180522109072 : RMSE[D](train)= 0.038211363274244604 : RMSE[D](valid)= 0.03717074105946483 model is saved to model_test_tmp14.pt at model_test 2024-05-27 23:31:03,023 Trainer epoch_step [INFO]: epoch= 15 : time= 32.970470905303955 [s] : loss(train)= 0.0012952481338288635 : loss(valid)= 0.0011708201685299475 : RMSE[D](train)= 0.03594431862161766 : RMSE[D](valid)= 0.034191540839132055 model is saved to model_test_tmp15.pt at model_test 2024-05-27 23:31:37,101 Trainer epoch_step [INFO]: epoch= 16 : time= 34.034749031066895 [s] : loss(train)= 0.0012781490970935141 : loss(valid)= 0.0012487516505643725 : RMSE[D](train)= 0.03567568660112371 : RMSE[D](valid)= 0.03530874235402812 model is saved to model_test_tmp16.pt at model_test 2024-05-27 23:32:10,323 Trainer epoch_step [INFO]: epoch= 17 : time= 33.178393840789795 [s] : loss(train)= 0.0012396411870473198 : loss(valid)= 0.0012016038332755368 : RMSE[D](train)= 0.0351610294889353 : RMSE[D](valid)= 0.0346635038703976 model is saved to model_test_tmp17.pt at model_test 2024-05-27 23:32:43,535 Trainer epoch_step [INFO]: epoch= 18 : time= 33.16394877433777 [s] : loss(train)= 0.0012235958503359662 : loss(valid)= 0.0012918139109387994 : RMSE[D](train)= 0.03492190754501672 : RMSE[D](valid)= 0.03592647573961546 model is saved to model_test_tmp18.pt at model_test 2024-05-27 23:33:16,961 Trainer epoch_step [INFO]: epoch= 19 : time= 33.37561392784119 [s] : loss(train)= 0.0012278797990542703 : loss(valid)= 0.0012458103786533077 : RMSE[D](train)= 0.03499613492118348 : RMSE[D](valid)= 0.035192720360356027 model is saved to model_test_tmp19.pt at model_test 2024-05-27 23:33:50,577 Trainer epoch_step [INFO]: epoch= 20 : time= 33.57127404212952 [s] : loss(train)= 0.0012240493794836635 : loss(valid)= 0.001295727367202441 : RMSE[D](train)= 0.03495633493656875 : RMSE[D](valid)= 0.03591999197134574 model is saved to model_test_tmp20.pt at model_test 2024-05-27 23:34:23,622 Trainer epoch_step [INFO]: epoch= 21 : time= 32.997060775756836 [s] : loss(train)= 0.0012256120015600963 : loss(valid)= 0.0011909629683941603 : RMSE[D](train)= 0.03495001219865892 : RMSE[D](valid)= 0.03448448842303 model is saved to model_test_tmp21.pt at model_test 2024-05-27 23:34:56,853 Trainer epoch_step [INFO]: epoch= 22 : time= 33.18896174430847 [s] : loss(train)= 0.0012167344518404985 : loss(valid)= 0.0012835346860811114 : RMSE[D](train)= 0.034833311084896894 : RMSE[D](valid)= 0.03577993540008407 model is saved to model_test_tmp22.pt at model_test 2024-05-27 23:35:29,807 Trainer epoch_step [INFO]: epoch= 23 : time= 32.91186189651489 [s] : loss(train)= 0.0011236058489885181 : loss(valid)= 0.0011108355053390067 : RMSE[D](train)= 0.03343205283435263 : RMSE[D](valid)= 0.03328191643945063 model is saved to model_test_tmp23.pt at model_test 2024-05-27 23:36:03,022 Trainer epoch_step [INFO]: epoch= 24 : time= 33.170867919921875 [s] : loss(train)= 0.001198199895692856 : loss(valid)= 0.0011316734598949552 : RMSE[D](train)= 0.03454902092721321 : RMSE[D](valid)= 0.03361872941420203 model is saved to model_test_tmp24.pt at model_test 2024-05-27 23:36:36,357 Trainer epoch_step [INFO]: epoch= 25 : time= 33.290594816207886 [s] : loss(train)= 0.0011569774401972868 : loss(valid)= 0.00125562238584583 : RMSE[D](train)= 0.03398148767577144 : RMSE[D](valid)= 0.03526287444169499 model is saved to model_test_tmp25.pt at model_test 2024-05-27 23:37:09,678 Trainer epoch_step [INFO]: epoch= 26 : time= 33.2756552696228 [s] : loss(train)= 0.0010826434694796003 : loss(valid)= 0.0012303667608648539 : RMSE[D](train)= 0.03285225680652831 : RMSE[D](valid)= 0.0350027180085837 model is saved to model_test_tmp26.pt at model_test 2024-05-27 23:37:43,053 Trainer epoch_step [INFO]: epoch= 27 : time= 33.32958698272705 [s] : loss(train)= 0.001173686479367981 : loss(valid)= 0.0010828875432101388 : RMSE[D](train)= 0.034169700120643916 : RMSE[D](valid)= 0.03287867522535803 model is saved to model_test_tmp27.pt at model_test 2024-05-27 23:38:19,747 Trainer epoch_step [INFO]: epoch= 28 : time= 36.647231101989746 [s] : loss(train)= 0.001116036980030393 : loss(valid)= 0.0012196329965566595 : RMSE[D](train)= 0.03330824263660591 : RMSE[D](valid)= 0.03489144527681723 model is saved to model_test_tmp28.pt at model_test 2024-05-27 23:38:59,670 Trainer epoch_step [INFO]: epoch= 29 : time= 39.88249492645264 [s] : loss(train)= 0.0011180073737965099 : loss(valid)= 0.0012258108084400494 : RMSE[D](train)= 0.03336500989034002 : RMSE[D](valid)= 0.03500258438073604 model is saved to model_test_tmp29.pt at model_test 2024-05-27 23:39:49,398 Trainer epoch_step [INFO]: epoch= 30 : time= 49.675516843795776 [s] : loss(train)= 0.0010629503813106567 : loss(valid)= 0.0010745280305854976 : RMSE[D](train)= 0.03253391709756107 : RMSE[D](valid)= 0.03270917398237465 model is saved to model_test_tmp30.pt at model_test 2024-05-27 23:40:26,195 Trainer epoch_step [INFO]: epoch= 31 : time= 36.73529386520386 [s] : loss(train)= 0.0010963863585077757 : loss(valid)= 0.001109493294886003 : RMSE[D](train)= 0.03305228076125506 : RMSE[D](valid)= 0.03320700290457949 model is saved to model_test_tmp31.pt at model_test 2024-05-27 23:41:06,045 Trainer epoch_step [INFO]: epoch= 32 : time= 39.80792784690857 [s] : loss(train)= 0.001064531773278889 : loss(valid)= 0.0011376045683088403 : RMSE[D](train)= 0.032547290130018745 : RMSE[D](valid)= 0.03362633110102043 model is saved to model_test_tmp32.pt at model_test 2024-05-27 23:41:46,186 Trainer epoch_step [INFO]: epoch= 33 : time= 40.091362953186035 [s] : loss(train)= 0.0010355500500216813 : loss(valid)= 0.0009315957043630382 : RMSE[D](train)= 0.03212837784581807 : RMSE[D](valid)= 0.030487922940679202 model is saved to model_test_tmp33.pt at model_test 2024-05-27 23:42:21,759 Trainer epoch_step [INFO]: epoch= 34 : time= 35.53277611732483 [s] : loss(train)= 0.0009523128766366946 : loss(valid)= 0.0008897289323310057 : RMSE[D](train)= 0.030749915142596514 : RMSE[D](valid)= 0.029788555875735753 model is saved to model_test_tmp34.pt at model_test 2024-05-27 23:42:57,443 Trainer epoch_step [INFO]: epoch= 35 : time= 35.59277606010437 [s] : loss(train)= 0.0009194710645325748 : loss(valid)= 0.0008036431002741059 : RMSE[D](train)= 0.030231966427741203 : RMSE[D](valid)= 0.028328095215842314 model is saved to model_test_tmp35.pt at model_test 2024-05-27 23:43:34,729 Trainer epoch_step [INFO]: epoch= 36 : time= 37.2429301738739 [s] : loss(train)= 0.0008694644493516535 : loss(valid)= 0.0008589115265446404 : RMSE[D](train)= 0.02939380869049454 : RMSE[D](valid)= 0.029268487767904347 model is saved to model_test_tmp36.pt at model_test 2024-05-27 23:44:10,015 Trainer epoch_step [INFO]: epoch= 37 : time= 35.23839807510376 [s] : loss(train)= 0.0008100573946389236 : loss(valid)= 0.0007349376101046801 : RMSE[D](train)= 0.02839690322675918 : RMSE[D](valid)= 0.0270813759830895 model is saved to model_test_tmp37.pt at model_test 2024-05-27 23:44:46,935 Trainer epoch_step [INFO]: epoch= 38 : time= 36.876343965530396 [s] : loss(train)= 0.0008246830173967672 : loss(valid)= 0.0007781900543098649 : RMSE[D](train)= 0.028661926251427983 : RMSE[D](valid)= 0.027891913603732336 model is saved to model_test_tmp38.pt at model_test 2024-05-27 23:45:27,108 Trainer epoch_step [INFO]: epoch= 39 : time= 40.12124514579773 [s] : loss(train)= 0.0008413328593763124 : loss(valid)= 0.0008210245287045836 : RMSE[D](train)= 0.028899934585797222 : RMSE[D](valid)= 0.02863102027700547 model is saved to model_test_tmp39.pt at model_test 2024-05-27 23:46:17,073 Trainer epoch_step [INFO]: epoch= 40 : time= 49.92154312133789 [s] : loss(train)= 0.0008217944268835708 : loss(valid)= 0.0008133725496008992 : RMSE[D](train)= 0.028557669288744897 : RMSE[D](valid)= 0.028495248694122847 model is saved to model_test_tmp40.pt at model_test model is saved to model_test_weight.pth at model_test model is saved to model_test_all.pth at model_test model is saved to model_test.pt at model_test To train models for all the chemical bond species, We iteratively run the command with modifying the input of ``bondtype``. Test a model ---------------------- We can check the quality of the trained model using a `yaml` structure file. .. code-block:: bash CPtrain.py test -m chmodel_test/model_ch_python.pt -x IONS+CENTERS+cell_sorted_merge.xyz -m methanol.mol It takes a few minutes to complete the calculation. The code generates two figures and two text files. The figures are the correlation between the predicted and true dipole moments (and the absolute value of the dipole moment). The text files named ``pred_list.txt`` and ``true_list.txt`` contain the predicted dipole moments and the true dipole moments, and they are visualized in ``pred_true_norm.png`` and ``pred_true_density.png``. .. image:: ../image/pred_true_norm.png :width: 400 :align: center ****************************************** Calculate dipoles along MD trajectories ****************************************** After constructing four dipole moment models (``CH``, ``CO``, ``OH``, and ``O``) and validating our trained model works well, we try our model on molecular dynamics trajectories using C++ interface. Let us go to the example directory .. code-block:: bash cd examples/dieltools/methanol The input file for the C++ code is given in ``yaml`` format and is as follows. .. code-block:: yaml :caption: input.yaml general: itpfilename: methanol.acpype/input_GMX.mol bondfilename: methanol.mol savedir: dipole_10ps/ temperature: 300 timestep: 0.242 descriptor: calc: 1 directory: ./ xyzfilename: IONS+CENTERS+cell_sorted_merge.xyz savedir: dipole_10ps/ descmode: 2 desctype: allinone haswannier: 1 # if WCs are in xyz, set 1 interval: 1 desc_coh: 0 predict: calc: 1 desc_dir: dipole_10ps/ model_dir: model_rotate_methanol/ modelmode: rotate bondspecies: 4 save_truey: 0 The input composes of three part, ``general``, ``descripter``, and ``predict``. The details of the parameters are given bellow. * general * itpfilename[required]: ``mol`` file for the molecule. (methanol in our case) * savedir[required]: The directory to which all the outputs will be saved. * temperature[optional]: We can optionally set the temperature to calculate dielectric properties. The default is 300 [Kelvin] * timestep[optional]: We can optionally set the MD timestep to calculate dynamical dielectric properties. * descripter * calc[required]: 1 for doing calculation, 0 for skip calculation. * directory[required]: The directory in which the input xyz is stored. * xyzfilename[required]: The input xyz filename. * desctype[required]: The type of descriptor. Currently we have ``allinone`` and ``old``. * predict: * calc: 1 * desc_dir: dipole_10ps/ * model_dir: /home/k0151/k015124/c++/20231025_model_rotate_methanol/ * modelmode: rotate * bondspecies: 4 * save_truey: 0 You can perform C++ calculations with enabling OpenMP. For example, you can set the number of threads to 12 by running .. code-block:: bash export OMP_NUM_THREADS=12 dieltools config.yaml After the calculation, the following result files are saved in the directory specified by ``savedir``. * ``total_dipole.txt``: system total dipole. * ``mol_wan.xyz``: atomic and predicted WCs configurations in ``xyz`` format. * ``DIELCONST``: dielectric constant and average molecular dipole. We can visualize the system dipole moment along the MD trajectory using ``total_dipole.txt`` to see if our calculation success. .. code-block:: python CPextract.py diel total -F dipole_10ps/total_dipole.txt .. image:: ../image/total_dipole.txt_time_dipole.png :width: 400 :align: center Finally, we perform Fourier transformation of the total dipole moments to calculate the dielectric function via ``CPextract.py`` command. You must specify the high-frequency dielectric constant with ``-E`` option .. code-block:: bash CPextract.py diel spectra -F total_dipole.txt -E 1.76624 -s 0 -w 1 The above command generate three files: * ``total_dipole.txt_diel.csv``: real and imaginary parts of the dielectric function. * ``total_dipole.txt_refractive.csv``: real and imaginary parts of the complex refractive index. * ``total_dipole.txt_alphan.csv``: absorption spectra ``alpha(\omega)*n(\omega)``. Here we visualize the imaginary part of the dielectric function using the following python script. .. code-block:: python import matplotlib as mpl import matplotlib.pyplot as plt import numpy as np import pandas as pd # load data df = pd.read_csv("dipole_10ps/total_dipole.txt_diel.csv") # figure instantce fig, ax = plt.subplots(figsize=(8,5),tight_layout=True) ax.plot(df["freq_kayser"], df["imag_diel"],label="imag_diel",lw=3) ax.set_xlim(0,3500) # xticklabels = ax.get_xticklabels() yticklabels = ax.get_yticklabels() xlabel="Frequency [cm-1]" ylabel="Epsilon" # ax.set_xlabel(xlabel,fontsize=22) ax.set_ylabel(ylabel,fontsize=22) ax.grid() ax.tick_params(axis='x', labelsize=20 ) ax.tick_params(axis='y', labelsize=20 ) lgnd=ax.legend(loc="upper left",fontsize=20) # lgnd.legendHandles[0]._sizes = [30] # lgnd.legendHandles[0]._alpha = [1.0] fig.savefig("imag_diel.png") .. image:: ../image/imag_diel.png :width: 400 :align: center As the MD trajectory is too short, we can not get meaningful spectra. We will acquire better one in the following tutorials.