.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "examples/7-explore-advanced.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note :ref:`Go to the end ` to download the full example code. .. rst-class:: sphx-glr-example-title .. _sphx_glr_examples_7-explore-advanced.py: Advanced dataset exploration ============================ The :ref:`previous example ` introduced :py:func:`chemiscope.explore` and how to use it for automatic exploration of dataset. In this example, we'll show some more representations and give additional featurizers you can use in your own code. .. GENERATED FROM PYTHON SOURCE LINES 11-29 .. code-block:: Python import os import ase.io import numpy as np import requests import chemiscope def fetch_dataset(filename, base_url="https://zenodo.org/records/12748925/files/"): """Helper function to load the pre-computed examples""" local_path = "data/" + filename if not os.path.isfile(local_path): response = requests.get(base_url + filename) with open(local_path, "wb") as file: file.write(response.content) .. GENERATED FROM PYTHON SOURCE LINES 30-44 Example with MACE-OFF and t-SNE +++++++++++++++++++++++++++++++++++++ In this part, we are going to define another ``featurize`` function that runs calculation of desciptors with `MACE-OFF `_ and uses `t-SNE `_ for the dimensionality reduction. The dependencies for this example can be installed with the following command: .. code:: bash pip install mace-torch scikit-learn .. GENERATED FROM PYTHON SOURCE LINES 47-48 Let's import the necessary libraries. .. GENERATED FROM PYTHON SOURCE LINES 49-54 .. code-block:: Python from dscribe.descriptors import SOAP # noqa from mace.calculators import mace_off # noqa from sklearn.manifold import TSNE # noqa .. rst-class:: sphx-glr-script-out .. code-block:: none /home/runner/work/chemiscope/chemiscope/.tox/docs/lib/python3.11/site-packages/e3nn/o3/_wigner.py:10: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature. _Jd, _W3j_flat, _W3j_indices = torch.load(os.path.join(os.path.dirname(__file__), 'constants.pt')) .. GENERATED FROM PYTHON SOURCE LINES 55-56 Load the dataset, in our example we are reading the organic molecules. .. GENERATED FROM PYTHON SOURCE LINES 57-60 .. code-block:: Python qm9_frames = ase.io.read("data/explore_qm9.xyz", ":") .. GENERATED FROM PYTHON SOURCE LINES 61-63 Now, we are defining a ``featurize`` function. As on the previous example, it should return the reduced data. .. GENERATED FROM PYTHON SOURCE LINES 64-94 .. code-block:: Python def mace_off_tsne(frames, environments): if environments is not None: raise ValueError("'environments' are not supported by this featurizer") # At first, we initialize a mace_off calculator: descriptor_opt = {"model": "small", "device": "cpu", "default_dtype": "float64"} calculator = mace_off(**descriptor_opt) # Calculate MACE features for each frame descriptors = [] for frame in frames: structure_avg = np.mean( # Only use invariant descriptors (no rotational components) (calculator.get_descriptors(frame, invariants_only=True)), axis=0, # Average the descriptors over all atoms in the frame ) descriptors.append(structure_avg) descriptors = np.array(descriptors) # Get number of jobs for parallelisation n_jobs = min(len(frames), os.cpu_count()) # Apply t-SNE perplexity = min(30, descriptors.shape[0] - 1) reducer = TSNE(n_components=2, perplexity=perplexity, n_jobs=n_jobs) return reducer.fit_transform(descriptors) .. GENERATED FROM PYTHON SOURCE LINES 95-96 We can also extract the additional properties, for example, dipole moment. .. GENERATED FROM PYTHON SOURCE LINES 97-100 .. code-block:: Python properties = chemiscope.extract_properties(qm9_frames, only=["mu"]) .. GENERATED FROM PYTHON SOURCE LINES 101-102 Provide the created featurizer and the properties to :py:func:`chemiscope.explore`. .. GENERATED FROM PYTHON SOURCE LINES 103-106 .. code-block:: Python cs = chemiscope.explore(qm9_frames, featurize=mace_off_tsne, properties=properties) .. rst-class:: sphx-glr-script-out .. code-block:: none Downloading MACE model from 'https://github.com/ACEsuit/mace-off/blob/main/mace_off23/MACE-OFF23_small.model?raw=true' The model is distributed under the Academic Software License (ASL) license, see https://github.com/gabor1/ASL To use the model you accept the terms of the license. ASL is based on the Gnu Public License, but does not permit commercial use Cached MACE model to /home/runner/.cache/mace/MACE-OFF23_small.model Using MACE-OFF23 MODEL for MACECalculator with /home/runner/.cache/mace/MACE-OFF23_small.model Using float64 for MACECalculator, which is slower but more accurate. Recommended for geometry optimization. /home/runner/work/chemiscope/chemiscope/.tox/docs/lib/python3.11/site-packages/mace/calculators/mace.py:130: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature. torch.load(f=model_path, map_location=device) .. GENERATED FROM PYTHON SOURCE LINES 107-114 Here we display the visualisation of the pre-computed data using the described function for 6k structures taken from the `QM9 `_ dataset. The map is zoomed in to highlight a cluster of zwitterions grouped together by application of the previously defined function with `MACE-OFF `_ and `t-SNE `_. .. GENERATED FROM PYTHON SOURCE LINES 115-119 .. code-block:: Python fetch_dataset("mace-off-tsne-qm9.json.gz") chemiscope.show_input("data/mace-off-tsne-qm9.json.gz") .. chemiscope:: _datasets/fig_7-explore-advanced_010.json.gz :mode: default .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 120-128 Example with MACE-MP0 and t-SNE ++++++++++++++++++++++++++++++++++++++ We will define another ``featurize`` function that uses MACE-MP0 to calculate the descriptors and t-SNE for the dimensionality reduction. Firstly, import mace library. .. GENERATED FROM PYTHON SOURCE LINES 129-132 .. code-block:: Python from mace.calculators import mace_mp # noqa .. GENERATED FROM PYTHON SOURCE LINES 133-135 Load the frames. In this example we are loading the M3CD dataset with the reduced number of stuctures. .. GENERATED FROM PYTHON SOURCE LINES 136-140 .. code-block:: Python m3cd_frames = ase.io.read("data/explore_m3cd.xyz", ":") .. GENERATED FROM PYTHON SOURCE LINES 141-145 We are defining a function used in :py:func:`chemiscope.explore` as a featurizer that computes the descriptors using MACE-MP0 and then applies t-SNE. Basically, we repeat the steps done in the previous example but using different mace calculator. .. GENERATED FROM PYTHON SOURCE LINES 146-174 .. code-block:: Python def mace_mp0_tsne(frames, environments): if environments is not None: raise ValueError("'environments' are not supported by this featurizer") # Initialise a mace-mp0 calculator descriptor_opt = {"model": "small", "device": "cpu", "default_dtype": "float64"} calculator = mace_mp(**descriptor_opt) # Calculate the features descriptors = [] for frame in frames: structure_avg = np.mean( (calculator.get_descriptors(frame, invariants_only=True)), axis=0, ) descriptors.append(structure_avg) descriptors = np.array(descriptors) n_jobs = min(len(frames), os.cpu_count()) # Apply t-SNE perplexity = min(30, descriptors.shape[0] - 1) reducer = TSNE(n_components=2, perplexity=perplexity, n_jobs=n_jobs) return reducer.fit_transform(descriptors) .. GENERATED FROM PYTHON SOURCE LINES 175-176 Provide the created function to :py:func:`chemiscope.explore`. .. GENERATED FROM PYTHON SOURCE LINES 177-180 .. code-block:: Python cs = chemiscope.explore(m3cd_frames, featurize=mace_mp0_tsne) .. rst-class:: sphx-glr-script-out .. code-block:: none Downloading MACE model from 'https://github.com/ACEsuit/mace-mp/releases/download/mace_mp_0/2023-12-10-mace-128-L0_energy_epoch-249.model' Cached MACE model to /home/runner/.cache/mace/20231210mace128L0_energy_epoch249model Using Materials Project MACE for MACECalculator with /home/runner/.cache/mace/20231210mace128L0_energy_epoch249model Using float64 for MACECalculator, which is slower but more accurate. Recommended for geometry optimization. /home/runner/work/chemiscope/chemiscope/.tox/docs/lib/python3.11/site-packages/mace/calculators/mace.py:130: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature. torch.load(f=model_path, map_location=device) /home/runner/work/chemiscope/chemiscope/.tox/docs/lib/python3.11/site-packages/chemiscope/structures/__init__.py:76: UserWarning: the following atomic properties are only defined for a subset of frames: ['tags']; they will be ignored return _ase_list_atom_properties(frames) .. GENERATED FROM PYTHON SOURCE LINES 181-183 To show case the result, we are loading pre-computed data using the ``mace_mp0_tsne`` function for 1k structures. .. GENERATED FROM PYTHON SOURCE LINES 184-190 .. code-block:: Python fetch_dataset("mace-mp-tsne-m3cd.json.gz") chemiscope.show_input("data/mace-mp-tsne-m3cd.json.gz") .. chemiscope:: _datasets/fig_7-explore-advanced_011.json.gz :mode: default .. raw:: html


.. GENERATED FROM PYTHON SOURCE LINES 191-199 Example with SOAP, t-SNE and environments +++++++++++++++++++++++++++++++++++++++++++++++++ This example demonstrates how to compute descriptors using the SOAP and t-SNE with ``environments`` parameter specifying which atoms in the frames are used for calculating the descriptors. We are defining a custom featurizer that takes frames and environments. .. GENERATED FROM PYTHON SOURCE LINES 200-250 .. code-block:: Python def soap_tnse_with_environments(frames, environments): if environments is None: raise ValueError("'environments' must be provided") grouped_envs = {} unique_structures = set() # Get atom-centered indices from environments for [env_index, atom_index, _cutoff] in environments: if env_index not in grouped_envs: grouped_envs[env_index] = [] grouped_envs[env_index].append(atom_index) unique_structures.add(env_index) centers = list(grouped_envs.values()) # only include frames that are present in the environments if len(unique_structures) != len(frames): frames = [frames[index] for index in sorted(unique_structures)] # Get global species species = set() for frame in frames: species.update(frame.get_chemical_symbols()) species = list(species) # Initialize calculator soap = SOAP( species=species, r_cut=4.5, n_max=8, l_max=6, sigma=0.2, rbf="gto", average="outer", periodic=True, weighting={"function": "pow", "c": 1, "m": 5, "d": 1, "r0": 3.5}, compression={"mode": "mu1nu1"}, ) # Calculate descriptors feats = soap.create(frames, centers=centers) # Compute tsne perplexity = min(30, feats.shape[0] - 1) reducer = TSNE(n_components=2, perplexity=perplexity) return reducer.fit_transform(feats) .. GENERATED FROM PYTHON SOURCE LINES 251-254 Provide a created function and environments to :py:func:`chemiscope.explore`. The environments are manually defined following the format ``[index of structure, index of atom, cutoff]``. .. GENERATED FROM PYTHON SOURCE LINES 255-261 .. code-block:: Python chemiscope.explore( frames=m3cd_frames, featurize=soap_tnse_with_environments, environments=[(1, 2, 3.5), (2, 0, 3.5)], ) .. chemiscope:: _datasets/fig_7-explore-advanced_012.json.gz :mode: default .. rst-class:: sphx-glr-script-out .. code-block:: none /home/runner/work/chemiscope/chemiscope/.tox/docs/lib/python3.11/site-packages/chemiscope/structures/__init__.py:76: UserWarning: the following atomic properties are only defined for a subset of frames: ['tags']; they will be ignored return _ase_list_atom_properties(frames) .. raw:: html


.. rst-class:: sphx-glr-timing **Total running time of the script:** (0 minutes 14.425 seconds) .. _sphx_glr_download_examples_7-explore-advanced.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: 7-explore-advanced.ipynb <7-explore-advanced.ipynb>` .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: 7-explore-advanced.py <7-explore-advanced.py>` .. container:: sphx-glr-download sphx-glr-download-zip :download:`Download zipped: 7-explore-advanced.zip <7-explore-advanced.zip>` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_