.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples/6-explore.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_6-explore.py:

.. _explore-example:

Exploring dataset with chemiscope
=================================

The :py:func:`chemiscope.explore` function provides a streamlined way to
visualize datasets by automatically computing a representation of the structures
and applying dimensionality reduction. It gives a quick overview of the
composition and structure of the data without the need to manually implement and
fine-tune the representation. This is particularly useful when the specific
choice of hyperparameters does not significantly impact the resulting 2D map.

Passing a list of `ase.Atoms `_ objects (or similar structures from other
libraries) to :py:func:`chemiscope.explore` generates a chemiscope widget with
an immediate, interactive visualization of the dataset. In addition,
:py:func:`chemiscope.explore` accepts a custom function for representation and
dimensionality reduction, which offers flexibility for more advanced usage.

This function requires some additional dependencies. You can install them with
the following command:

.. code:: bash

    pip install chemiscope[explore]

In this example, we explore several use cases, from basic applications to more
customized scenarios. First, let's import the packages used throughout the
examples.

.. GENERATED FROM PYTHON SOURCE LINES 38-55

.. code-block:: Python

    import os

    import ase.io
    import requests

    import chemiscope


    def fetch_dataset(filename, base_url="https://zenodo.org/records/12748925/files/"):
        """Helper function to load the pre-computed examples"""
        local_path = "data/" + filename
        if not os.path.isfile(local_path):
            # Make sure the target directory exists before writing to it
            os.makedirs("data", exist_ok=True)
            response = requests.get(base_url + filename)
            # Fail early instead of saving an HTTP error page as the dataset
            response.raise_for_status()
            with open(local_path, "wb") as file:
                file.write(response.content)

.. GENERATED FROM PYTHON SOURCE LINES 56-63

Basic example
+++++++++++++

This example shows the basic usage of :py:func:`chemiscope.explore`. First, read
or load the structures from the dataset. Here we use the `ASE package `_ to read
the structures from a file, which gives the frames as `ase.Atoms `_ objects.

.. GENERATED FROM PYTHON SOURCE LINES 64-68

.. code-block:: Python

    frames = ase.io.read("data/explore_c-gap-20u.xyz", ":")

.. GENERATED FROM PYTHON SOURCE LINES 69-71

Provide the frames to :py:func:`chemiscope.explore`. It will generate an
interactive chemiscope widget showing the data in reduced dimensionality.

.. GENERATED FROM PYTHON SOURCE LINES 72-74

.. code-block:: Python

    chemiscope.explore(frames)

.. chemiscope:: _datasets/fig_6-explore_007.json.gz
    :mode: default


.. GENERATED FROM PYTHON SOURCE LINES 75-84

In this basic case, no featurizer function is provided, so
:py:func:`chemiscope.explore` uses a default method that applies
`SOAP (Smooth Overlap of Atomic Positions) `_ to compute atomic structure
descriptors and then performs `PCA (Principal Component Analysis) `_ for
dimensionality reduction. The resulting components are then added to the
properties used in the visualization.

.. GENERATED FROM PYTHON SOURCE LINES 88-96

It is also possible to run the dimensionality reduction algorithm on specific
atom-centered environments and display them. They can be defined manually by
specifying a list of tuples in the format
``(structure_index, atom_index, cutoff)``, as shown in this example.
Alternatively, the environments can be extracted from the frames using the
function :py:func:`all_atomic_environments`. We also demonstrate a way to
provide properties for visualization. Only the frames and properties
corresponding to the indexes in ``environments`` will be extracted.

.. GENERATED FROM PYTHON SOURCE LINES 97-102

.. code-block:: Python

    properties = chemiscope.extract_properties(frames, only=["energy"])
    environments = [(0, 0, 3.5), (1, 0, 3.5), (2, 1, 3.5)]
    chemiscope.explore(frames, environments=environments, properties=properties)

.. chemiscope:: _datasets/fig_6-explore_008.json.gz
    :mode: default
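
The ``(structure_index, atom_index, cutoff)`` format described above can be
illustrated with a small sketch in plain Python. The helper
``list_environments`` below is hypothetical (it is not part of chemiscope) and
only assumes that the number of atoms in each structure is known. For real use,
prefer :py:func:`all_atomic_environments`.

.. code-block:: Python

    def list_environments(atom_counts, cutoff=3.5):
        """Build one (structure_index, atom_index, cutoff) tuple per atom.

        ``atom_counts`` holds the number of atoms in each structure. This is a
        hypothetical helper written for illustration, not a chemiscope function.
        """
        environments = []
        for structure_index, n_atoms in enumerate(atom_counts):
            for atom_index in range(n_atoms):
                environments.append((structure_index, atom_index, cutoff))
        return environments


    # Two structures with 2 and 3 atoms give 5 environments in total
    environments = list_environments([2, 3], cutoff=3.5)
    print(len(environments))

With ASE frames, ``atom_counts`` could be obtained as
``[len(frame) for frame in frames]``, and the resulting list passed through the
``environments`` argument as in the example above.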


.. GENERATED FROM PYTHON SOURCE LINES 103-114

Example with custom featurizer and custom properties
++++++++++++++++++++++++++++++++++++++++++++++++++++

This part illustrates how to pass a custom function for dimensionality reduction
as the ``featurize`` argument of :py:func:`chemiscope.explore`. Inside this
function, we compute descriptors using `SOAP `_ and then reduce the
dimensionality with `Kernel PCA `_.

First, let's import the necessary packages.

.. GENERATED FROM PYTHON SOURCE LINES 115-118

.. code-block:: Python

    from dscribe.descriptors import SOAP  # noqa
    from sklearn.decomposition import KernelPCA  # noqa

.. GENERATED FROM PYTHON SOURCE LINES 119-124

Define the function ``soap_kpca_featurize``, which takes two arguments:
``frames``, the structures provided to :py:func:`chemiscope.explore` and passed
internally to the ``featurize`` function; and ``environments``, an optional
argument with the atom-centered environments, if they were provided to
:py:func:`chemiscope.explore`.

.. GENERATED FROM PYTHON SOURCE LINES 125-155

.. code-block:: Python

    def soap_kpca_featurize(frames, environments):
        if environments is not None:
            raise ValueError("'environments' are not supported by this featurizer")

        # Initialise soap calculator. The detailed explanation of the provided
        # hyperparameters can be checked in the documentation of the library (``dscribe``).
        soap = SOAP(
            # the dataset used in the example contains only carbon
            species=["C"],
            r_cut=4.5,
            n_max=8,
            l_max=6,
            sigma=0.2,
            rbf="gto",
            average="outer",
            periodic=True,
            weighting={"function": "pow", "c": 1, "m": 5, "d": 1, "r0": 3.5},
        )

        # Compute features
        descriptors = soap.create(frames)

        # Apply KPCA. Note that ``gamma`` only affects the rbf, poly and sigmoid
        # kernels and is ignored by the default linear kernel.
        transformer = KernelPCA(n_components=2, gamma=0.05)

        # Return a 2D array of reduced features
        return transformer.fit_transform(descriptors)

.. GENERATED FROM PYTHON SOURCE LINES 156-157

Provide the created function to :py:func:`chemiscope.explore`.

.. GENERATED FROM PYTHON SOURCE LINES 158-161

.. code-block:: Python

    cs = chemiscope.explore(frames, featurize=soap_kpca_featurize)

.. GENERATED FROM PYTHON SOURCE LINES 162-164

We can also provide additional properties. For example, let's extract the energy
from the frames using :py:func:`chemiscope.extract_properties`.

.. GENERATED FROM PYTHON SOURCE LINES 165-169

.. code-block:: Python

    properties = chemiscope.extract_properties(frames, only=["energy"])
    cs = chemiscope.explore(frames, featurize=soap_kpca_featurize, properties=properties)

.. GENERATED FROM PYTHON SOURCE LINES 170-178

Note: It is possible to parallelize the computation of the SOAP descriptors and
the dimensionality reduction with KernelPCA by providing the ``n_jobs``
parameter. This allows the computation to use multiple CPU cores for faster
processing. An example of how to include ``n_jobs`` is shown below on this page.

To showcase the results of the ``soap_kpca_featurize`` function, we have
pre-computed it for the 6k structures from the `C-GAP-20U `_ dataset:

.. GENERATED FROM PYTHON SOURCE LINES 179-182

.. code-block:: Python

    fetch_dataset("soap_kpca_c-gap-20u.json.gz")
    chemiscope.show_input("data/soap_kpca_c-gap-20u.json.gz")

.. chemiscope:: _datasets/fig_6-explore_009.json.gz
    :mode: default
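
The note above about ``n_jobs`` can be sketched as follows. This is a minimal
variant of ``soap_kpca_featurize``, assuming that the installed versions of
dscribe and scikit-learn accept ``n_jobs`` in ``SOAP.create`` and in the
``KernelPCA`` constructor. The name ``soap_kpca_featurize_parallel`` and the
default of two jobs are illustrative choices, not part of chemiscope.

.. code-block:: Python

    def soap_kpca_featurize_parallel(frames, environments, n_jobs=2):
        if environments is not None:
            raise ValueError("'environments' are not supported by this featurizer")

        # Imported here (an illustrative choice) so that only calling the
        # function, not defining it, needs the optional dependencies
        from dscribe.descriptors import SOAP
        from sklearn.decomposition import KernelPCA

        # Same hyperparameters as in ``soap_kpca_featurize``
        soap = SOAP(
            species=["C"],
            r_cut=4.5,
            n_max=8,
            l_max=6,
            sigma=0.2,
            rbf="gto",
            average="outer",
            periodic=True,
            weighting={"function": "pow", "c": 1, "m": 5, "d": 1, "r0": 3.5},
        )

        # Spread the descriptor calculation over ``n_jobs`` parallel jobs
        descriptors = soap.create(frames, n_jobs=n_jobs)

        # Let KernelPCA use ``n_jobs`` parallel jobs as well
        transformer = KernelPCA(n_components=2, gamma=0.05, n_jobs=n_jobs)

        # Return a 2D array of reduced features
        return transformer.fit_transform(descriptors)

Since :py:func:`chemiscope.explore` calls the featurizer with ``frames`` and
``environments``, a different number of jobs can be bound beforehand, for
example with ``functools.partial(soap_kpca_featurize_parallel, n_jobs=4)``.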


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 7.807 seconds)


.. _sphx_glr_download_examples_6-explore.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: 6-explore.ipynb <6-explore.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: 6-explore.py <6-explore.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: 6-explore.zip <6-explore.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_