Exploring dataset with chemiscope¶

The chemiscope.explore() function provides a streamlined way to visualize datasets as the low dimensional maps. This approach provides a quick and interactive overview of dataset composition and structure without the need to manually implement and configure the representation processes. This is particularly useful when the specific choice of hyperparameters does not significantly impact the resulting low-dimensionality map.

By passing a list of ase.Atoms objects (or similar structures from other libraries) to chemiscope.explore(), it is possible to generate a chemiscope widget, providing an immediate and intuitive visualization of the dataset.

By default, the method uses the PETMADFeaturizer, which computes representations as the PET-MAD features and maps them a low-dimensional MAD dataset latent space.

For more advanced use cases, chemiscope.explore() allows to provide a custom function for representation and dimensionality reduction.

To use this function, some additional dependencies are required. You can install them with the following command:

pip install chemiscope[explore]

In this example, we will explore basic and advanced use cases, from simple dataset visualization to custom featurization.

First, let’s import the necessary packages that will be used throughout the examples.

import ase.io

import chemiscope

Basic example¶

This example shows the basic usage of chemiscope.explore(). First, load a dataset of structures as ase.Atoms objects. Here, we use the samples from the M3CD dataset:

frames = ase.io.read("data/explore_m3cd.xyz", ":")

Next, pass the frames to chemiscope.explore() to generate an interactive Chemiscope. In this basic case, we provide the featurizer version to be used:

chemiscope.explore(frames, featurizer="pet-mad-1.0")

  0%|          | 0/5 [00:00<?, ?it/s]
 40%|████      | 2/5 [00:00<00:00, 13.29it/s]
100%|██████████| 5/5 [00:00<00:00, 15.99it/s]
100%|██████████| 5/5 [00:00<00:00, 15.60it/s]
/home/runner/work/chemiscope/chemiscope/.tox/docs/lib/python3.11/site-packages/chemiscope/structures/_ase.py:121: UserWarning: the following atomic properties are only defined for a subset of frames: ['tags']; they will be ignored
  all_properties = _ase_get_atom_properties(frames)

We can also save the visualization to send it to the colloborators or reopen separatelly with chemiscope.read_input():

chemiscope.explore(frames, featurizer="pet-mad-1.0", write_input="m3cd.chemiscope.json")

  0%|          | 0/5 [00:00<?, ?it/s]
 60%|██████    | 3/5 [00:00<00:00, 28.50it/s]
100%|██████████| 5/5 [00:00<00:00, 24.74it/s]
/home/runner/work/chemiscope/chemiscope/.tox/docs/lib/python3.11/site-packages/chemiscope/structures/_ase.py:121: UserWarning: the following atomic properties are only defined for a subset of frames: ['tags']; they will be ignored
  all_properties = _ase_get_atom_properties(frames)

Besides this, it is possible to specify atom-centered environments and properties. Environments can be manually defined as a list of tuples in the format (structure_index, atom_index, cutoff) or extracted automatically using chemiscope.all_atomic_environments(). We can also configure visualisation settings, such as axis and color properties.

properties = chemiscope.extract_properties(frames, only=["energy"])
environments = [(0, 0, 3.5), (1, 0, 3.5), (2, 1, 3.5)]
settings = chemiscope.quick_settings(x="features[1]", y="features[2]", color="energy")
chemiscope.explore(
    frames,
    featurizer="pet-mad-1.0",
    environments=environments,
    properties=properties,
    settings=settings,
)

/home/runner/work/chemiscope/chemiscope/.tox/docs/lib/python3.11/site-packages/chemiscope/input.py:643: UserWarning: 'color' property is deprecated and replaced with 'map_color'
  warnings.warn(

  0%|          | 0/5 [00:00<?, ?it/s]
 60%|██████    | 3/5 [00:00<00:00, 21.77it/s]
100%|██████████| 5/5 [00:00<00:00, 21.12it/s]
/home/runner/work/chemiscope/chemiscope/.tox/docs/lib/python3.11/site-packages/chemiscope/structures/_ase.py:121: UserWarning: the following atomic properties are only defined for a subset of frames: ['tags']; they will be ignored
  all_properties = _ase_get_atom_properties(frames)

Example with custom featurizer¶

For advanced use cases, you can define a custom featurization function. For example, we can describe structures based on their chemical compositions. The function must take two arguments: frames (the input structures) and environments (optional argument for the atom-centered environments). Below, we create a function to calculate fractional composition vectors and apply PCA for dimensionality reduction:

import numpy as np  # noqa
from sklearn.decomposition import PCA  # noqa


def fractional_composition_featurize(frames, environments):
    if environments is not None:
        raise ValueError("'environments' are not supported by this featurizer")

    dimentionality = 100

    features = []

    for frame in frames:
        unique, counts = np.unique(frame.numbers, return_counts=True)
        fractions = counts / len(frame.numbers)

        feature_vector = np.zeros(dimentionality)
        for element_number, franction in zip(unique, fractions):
            feature_vector[element_number - 1] = franction

        features.append(feature_vector)

    pca = PCA(n_components=3)
    return pca.fit_transform(features)

Pass the custom featurizer to chemiscope.explore():

settings = chemiscope.quick_settings(x="features[1]", y="features[2]")
chemiscope.explore(
    frames,
    featurizer=fractional_composition_featurize,
    settings=settings,
)

/home/runner/work/chemiscope/chemiscope/.tox/docs/lib/python3.11/site-packages/chemiscope/structures/_ase.py:121: UserWarning: the following atomic properties are only defined for a subset of frames: ['tags']; they will be ignored
  all_properties = _ase_get_atom_properties(frames)

For more advanced examples, see the next tutorial.

Total running time of the script: (0 minutes 4.449 seconds)

Gallery generated by Sphinx-Gallery