chemiscope functions reference

chemiscope.write_input(path, frames, meta=None, properties=None, environments=None, shapes=None, settings=None, parameters=None)

Create the input JSON file used by the default chemiscope visualizer, and save it to the given path.

Parameters:
  • path (str) – name of the file in which to save the JSON data. If it ends with ‘.gz’, a gzip-compressed file will be written

  • frames (list) – list of atomic structures. For now, only ase.Atoms objects are supported

  • meta (dict) – optional metadata of the dataset

  • properties (dict) – optional dictionary of additional properties

  • environments (list) – optional list of (structure id, atom id, cutoff) specifying which atoms have properties attached and how far out atom-centered environments should be drawn by default.

  • shapes (dict) – optional dictionary of shapes to have available for display. See create_input() for more information on how to define shapes.

  • settings (dict) – optional dictionary of settings to use when displaying the data. Possible entries for the settings dictionary are documented in the chemiscope input file reference.

  • parameters (dict) – optional dictionary of parameters of multidimensional properties
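To illustrate the path argument, the following sketch writes a dictionary as JSON and switches to gzip compression when the file name ends with '.gz'. The helpers save_json and load_json are hypothetical and rely only on the Python standard library; they are not part of chemiscope.

```python
import gzip
import json


def save_json(data, path):
    """Hypothetical helper: write `data` as JSON, gzip-compressed if `path` ends with '.gz'."""
    if path.endswith(".gz"):
        # text mode ("wt") lets json.dump write str directly into the gzip stream
        with gzip.open(path, "wt", encoding="utf-8") as file:
            json.dump(data, file)
    else:
        with open(path, "w", encoding="utf-8") as file:
            json.dump(data, file)


def load_json(path):
    """Hypothetical helper: read back a file written by save_json."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as file:
        return json.load(file)
```

Because only the file extension decides the compression, the same data can be written with or without gzip by changing the path alone.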

This function uses create_input() to generate the input data; see the documentation of that function for more information.

Here is a quick example of generating a chemiscope input by reading the structures from a file that ase can read, and performing PCA using sklearn on a descriptor computed with another package.

import ase
from ase import io
import numpy as np

import sklearn
from sklearn import decomposition

import chemiscope

frames = ase.io.read("trajectory.xyz", ":")

# example property 1: list containing the energy of each structure,
# from calculations performed beforehand
energies = np.loadtxt("energies.txt")

# example property 2: PCA projection computed using sklearn.
# X contains a multi-dimensional per-atom descriptor (one row per atom in
# the dataset), matching the `target: "atom"` used for "PCA" below
X = np.array(...)
pca = sklearn.decomposition.PCA(n_components=3).fit_transform(X)

# if the ASE frames also contain additional data, they can be easily
# extracted as a dictionary using a simple utility function
frame_properties = chemiscope.extract_properties(
    frames,
    only=["temperature", "classification"],
)

# alternatively, properties can also be defined manually
properties = {
    "PCA": {
        "target": "atom",
        "values": pca,
        "description": "PCA of per-atom representation of the structures",
    },
    "energies": {
        "target": "structure",
        "values": energies,
        "units": "kcal/mol",
    },
}

# additional multidimensional properties to be plotted
dos = np.loadtxt(...)  # load the 2D data
dos_energy_grid = np.loadtxt(...)
multidimensional_properties = {
    "DOS": {
        "target": "structure",
        "values": dos,
        "parameters": ["energy"],
    }
}

multidimensional_parameters = {
    "energy": {
        "values": dos_energy_grid,
        "units": "eV",
    }
}

# merge all properties together (properties is a dict, so use update)
properties.update(frame_properties)
properties.update(multidimensional_properties)

chemiscope.write_input(
    path="chemiscope.json.gz",
    frames=frames,
    properties=properties,
    # This is required to display properties with `target: "atom"`
    environments=chemiscope.all_atomic_environments(frames),
    # this is necessary to plot the multidimensional data
    parameters=multidimensional_parameters,
)

chemiscope.create_input(frames=None, meta=None, properties=None, environments=None, settings=None, shapes=None, parameters=None)

Create a dictionary that can be saved to JSON using the format used by the default chemiscope visualizer.

Parameters:
  • frames (list) – list of atomic structures. For now, only ase.Atoms objects are supported

  • meta (dict) – optional metadata of the dataset, see below

  • properties (dict) – optional dictionary of properties, see below

  • environments (list) – optional list of (structure id, atom id, cutoff) specifying which atoms have properties attached and how far out atom-centered environments should be drawn by default. Functions like all_atomic_environments() can be used to generate the list of environments in simple cases.

  • shapes (dict) – optional dictionary of shapes to have available for display, see below.

  • settings (dict) – optional dictionary of settings to use when displaying the data. Possible entries for the settings dictionary are documented in the chemiscope input file reference.

  • parameters (dict) – optional dictionary of parameters for multidimensional properties, see below

Dataset metadata

The dataset metadata should be given in the meta dictionary; the possible keys are:

meta = {
    # str, dataset name
    "name": "...",
    # str, dataset description
    "description": "...",
    # list of str, dataset authors, OPTIONAL
    "authors": [
        "...",
    ],
    # list of str, references for this dataset, OPTIONAL
    "references": [
        "...",
    ],
}

Dataset properties

Properties can be added with the properties parameter. This parameter should be a dictionary containing one entry for each property. Properties can be extracted from structures with extract_properties(), or manually defined by the user.

Each entry in the properties dictionary contains a target attribute ('atom' or 'structure') and a set of values. values can be a Python list of floats or strings, a 1D numpy array of numeric values, or a 2D numpy array of numeric values. In the latter case, multiple properties will be generated along the second axis. For example, passing

properties = {
    "cheese": {
        "target": "atom",
        "values": np.zeros((300, 4)),
        # optional: property unit
        "units": "random / fs",
        # optional: property description
        "description": "a random property for example",
    }
}

will generate four properties named cheese[1], cheese[2], cheese[3], and cheese[4], each containing 300 values.
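The column-splitting rule can be sketched in plain Python. split_columns is a hypothetical helper written for this illustration, assuming the values are given as a list of rows; chemiscope performs this step itself when it receives a 2D array.

```python
def split_columns(name, values):
    """Hypothetical helper: split 2D values (n_samples x n_columns) into 1D properties.

    Columns are numbered from 1, giving name[1], name[2], ...
    """
    n_columns = len(values[0])
    if any(len(row) != n_columns for row in values):
        raise ValueError("all rows must have the same number of columns")
    return {
        f"{name}[{j + 1}]": [row[j] for row in values]
        for j in range(n_columns)
    }


# 300 samples with 4 columns, like np.zeros((300, 4)) in the example above
split = split_columns("cheese", [[0.0] * 4 for _ in range(300)])
print(list(split))  # ['cheese[1]', 'cheese[2]', 'cheese[3]', 'cheese[4]']
```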

It is also possible to pass a shortened representation of the properties, for instance:

properties = {
    "cheese": np.zeros((300, 4)),
}

In this case, the type of property (structure or atom) would be deduced by comparing the number of atoms and structures in the dataset to the length of the provided list/np.ndarray.
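A minimal sketch of this length comparison, assuming the dataset is summarised by its number of structures and its total number of atoms. guess_target is hypothetical, and raising an error when both counts coincide is a choice made for this sketch, not documented chemiscope behaviour.

```python
def guess_target(values, n_structures, n_atoms):
    """Hypothetical sketch: deduce 'structure' or 'atom' from the number of values."""
    n_values = len(values)
    if n_structures == n_atoms and n_values == n_structures:
        # e.g. one atom per structure: the length alone cannot decide
        raise ValueError("ambiguous length, set 'target' explicitly")
    if n_values == n_structures:
        return "structure"
    if n_values == n_atoms:
        return "atom"
    raise ValueError(
        f"{n_values} values match neither {n_structures} structures nor {n_atoms} atoms"
    )
```

Passing the full dictionary form with an explicit "target" avoids relying on this deduction.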

Multi-dimensional properties

One can give 2D properties to be displayed as curves in the info panel by setting a parameters entry in the property, and passing the corresponding parameter values to this function. The previous example becomes:

properties = {
    "cheese": {
        "target": "atom",
        "values": np.zeros((300, 4)),
        # optional: property unit
        "units": "random / fs",
        # optional: property description
        "description": "a random property for example",
        "parameters": ["origin"],
    }
}

This input describes a 2D property cheese with 300 samples and 4 values taken by the origin parameter. We also need to provide the parameter values to this function:

parameters = {
    "origin": {
        # an array of numbers containing the values of the parameter
        # the size should correspond to the second dimension
        # of the corresponding multidimensional property
        "values": [0, 1, 2, 3],
        # optional free-form description of the parameter as a string
        "name": "a short description of this parameter",
        # optional units of the values in the values array
        "units": "eV",
    }
}
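To catch size mismatches before building the input, a check along these lines can help. check_parameter_sizes is a hypothetical helper; it assumes each multidimensional property references a single parameter, as in the example above, and that values is a list of rows.

```python
def check_parameter_sizes(properties, parameters):
    """Hypothetical check: each row of a multidimensional property must have
    one entry per value of the parameter it references."""
    for name, prop in properties.items():
        if not isinstance(prop, dict) or "parameters" not in prop:
            continue  # plain 1D properties have nothing to check
        (param_name,) = prop["parameters"]  # this sketch handles one parameter only
        if param_name not in parameters:
            raise KeyError(f"property {name!r} uses unknown parameter {param_name!r}")
        expected = len(parameters[param_name]["values"])
        for i, row in enumerate(prop["values"]):
            if len(row) != expected:
                raise ValueError(
                    f"row {i} of {name!r} has {len(row)} values, "
                    f"but {param_name!r} has {expected}"
                )
```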

Custom shapes

The shapes option should have the format {"<name>": shape_definition}, where each shape is defined as a dictionary containing the kind of shape and its parameters:

shapes = {
    "shape name": {
        "kind": "sphere",
        "parameters": shape_parameters,
    }
}

Each parameters block defines global-, structure- and atom-level parameters:

parameters = {
    "global": global_parameters,
    "structure": [structure_1, structure_2, ...],
    "atom": [atom_1, atom_2, ...],
}

Each of these can contain some or all of the parameters associated with each shape, and the parameters for each shape are obtained by combining the parameters from the most general to the most specific, i.e., if there is a duplicate key in the global and atom fields, the value within the atom field will supersede the global field for that atom. The parameters for atom k that is part of structure j are obtained as

# later dictionaries take precedence on duplicate keys
parameters_k = {**global_parameters, **structure_j, **atom_k}

If given, the structure parameters list should contain one entry per structure, and the atom parameters list should be a flat list corresponding to the atoms of each consecutive structure. All shapes accept a few general parameters, and some shape-specific ones:

# general parameters
{
    # centering (defaults to origin for structure, atom position for atom)
    "position": [float, float, float],
    # scaling of the size of the shape
    "scale": float,
    # optional, given as quaternion in (x, y, z, w) format
    "orientation": [float, float, float, float],
    "color": string | hex_code,  # e.g. 0xFF0000
}

# "kind" : "sphere"
{
    "radius": float,
}

# "kind" : "ellipsoid"
{
    "semiaxes": [float, float, float],
}

# "kind" : "cylinder"
{
    # "orientation" is redundant and hence ignored
    "vector": [float, float, float],  # orientation and shape of the cylinder
    # the tip of the cylinder is at the end of the segment.
    "radius": float,
}

# "kind" : "arrow"
{
    # "orientation" is redundant and hence ignored
    "vector": [float, float, float],  # orientation and shape of the arrow
    "baseRadius": float,
    "headRadius": float,
    # the tip of the arrow is at the end of the segment.
    # It will extend past the base point if the arrow is not long enough
    "headLength": float,
}

# "kind" : "custom"
{
    "vertices": [  # list of vertices
        [float, float, float],
        ...,
    ],
    # mesh triangulation (optional); computed via convex triangulation
    # where omitted
    "simplices": [
        [int, int, int],  # indices refer to the list of vertices
        ...,
    ],
}
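The merging rule and the flat atom list described above can be sketched as follows. resolve_atom_parameters is a hypothetical helper that takes the number of atoms in each structure; it only illustrates how the global, structure and atom levels combine and is not chemiscope's code.

```python
def resolve_atom_parameters(shape_parameters, atoms_per_structure):
    """Hypothetical sketch: effective parameters of every atom, grouped by structure.

    Global values are overridden by structure values, which are overridden by
    atom values. The "atom" list is flat over consecutive structures.
    """
    global_params = shape_parameters.get("global", {})
    structure_params = shape_parameters.get("structure")
    atom_params = shape_parameters.get("atom")
    if atom_params is not None and len(atom_params) != sum(atoms_per_structure):
        raise ValueError("the 'atom' list needs one entry per atom of every structure")

    resolved = []
    flat_index = 0
    for j, n_atoms in enumerate(atoms_per_structure):
        structure_j = structure_params[j] if structure_params is not None else {}
        per_structure = []
        for _ in range(n_atoms):
            atom_k = atom_params[flat_index] if atom_params is not None else {}
            per_structure.append({**global_params, **structure_j, **atom_k})
            flat_index += 1
        resolved.append(per_structure)
    return resolved


shape_parameters = {
    "global": {"radius": 0.2, "color": "red"},
    "structure": [{"radius": 0.3}, {}],
    "atom": [{}, {"color": "blue"}, {}],  # 2 atoms in structure 0, 1 in structure 1
}
shapes = {"spheres": {"kind": "sphere", "parameters": shape_parameters}}
resolved = resolve_atom_parameters(shape_parameters, [2, 1])
```

Here the second atom of the first structure ends up with radius 0.3 (from the structure level) and color "blue" (from the atom level).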

chemiscope.quick_settings(x='', y='', z='', color='', size='', symbol='', trajectory=False, map_settings=None, structure_settings=None)

A utility function to return a settings dictionary with the most basic options for a chemiscope viewer (e.g. what to show on the axes).

Parameters:
  • x (str) – The property to show on the x axis of the map.

  • y (str) – The property to show on the y axis of the map.

  • z (str) – The property to show on the z axis of the map.

  • color (str) – The property to use to color the map.

  • size (str) – The property to use to determine data point size.

  • symbol (str) – The (categorical) property to use to determine point markers.

  • trajectory (bool) – A boolean flag that sets some default options suitable to view trajectory data: fixing the viewpoint for the structure, reducing the delay when cycling between structures and adding a line joining the points in the map.

  • map_settings (dict) – Additional settings for the map (following the chemiscope settings schema).

  • structure_settings (dict) – Additional settings for the structure viewer (following the chemiscope settings schema).

chemiscope.extract_properties(frames, only=None, environments=None)

Extract properties defined in the frames in a chemiscope-compatible format.

Parameters:
  • frames – iterable over structures (typically a list of frames)

  • only – optional, list of strings. If not None, only properties with a name from this list are included in the output.

  • environments – optional, list of environments (described as (structure id, center id, cutoff)) to include when extracting the atomic properties.
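A simplified sketch of the idea behind extract_properties(), restricted to structure-level data. extract_structure_properties is hypothetical: it reads plain dictionaries standing in for ase.Atoms.info, and ignores per-atom arrays and the environments argument.

```python
def extract_structure_properties(infos, only=None):
    """Hypothetical, simplified sketch: collect per-structure values into a
    chemiscope-style properties dictionary."""
    names = set()
    for info in infos:
        names.update(info)
    if only is not None:
        names &= set(only)

    properties = {}
    for name in sorted(names):
        # keep only keys present in every structure, so each property
        # has exactly one value per structure
        if all(name in info for info in infos):
            properties[name] = {
                "target": "structure",
                "values": [info[name] for info in infos],
            }
    return properties
```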

chemiscope.all_atomic_environments(frames, cutoff=3.5)

Generate a list of environments containing all the atoms in the given frames. The optional spherical cutoff radius is used to display the environments in chemiscope.

Parameters:
  • frames – iterable over structures (typically a list of frames)

  • cutoff (float) – spherical cutoff radius used when displaying the environments
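The environment list is a list of (structure id, atom id, cutoff) tuples. This sketch builds it from the number of atoms in each structure, assuming atom ids are indices within each structure; all_environments is a hypothetical stand-in for all_atomic_environments(), which takes the frames themselves.

```python
def all_environments(atoms_per_structure, cutoff=3.5):
    """Hypothetical sketch: one (structure id, atom id, cutoff) tuple per atom."""
    return [
        (structure_id, atom_id, cutoff)
        for structure_id, n_atoms in enumerate(atoms_per_structure)
        for atom_id in range(n_atoms)
    ]


print(all_environments([2, 1], cutoff=4.0))
# [(0, 0, 4.0), (0, 1, 4.0), (1, 0, 4.0)]
```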

chemiscope.ellipsoid_from_tensor(tensor, scale=1.0, force_positive=False)

Returns an ellipsoid (semiaxes + quaternion) representation of a positive definite tensor (e.g. a polarizability), in the form required by the chemiscope input.

Parameters:
  • tensor – a positive-definite tensor (3x3 or a 6-array [xx,yy,zz,xy,xz,yz])

  • scale – conversion from the units of the tensor to the units of the atomic positions (usually Å)

  • force_positive – takes the absolute value of eigenvalues, to handle non-positive tensors
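One way to obtain such a representation is an eigendecomposition: the eigenvectors give the orientation and the eigenvalues the axis lengths. The sketch below is hypothetical (ellipsoid_sketch and its helpers are not chemiscope functions), uses numpy, and takes the semiaxes proportional to the eigenvalues; chemiscope's own ellipsoid_from_tensor() may use a different length convention.

```python
import math

import numpy as np


def tensor_to_matrix(tensor):
    """Symmetric 3x3 array from a 3x3 input or a 6-array [xx, yy, zz, xy, xz, yz]."""
    t = np.asarray(tensor, dtype=float)
    if t.shape == (3, 3):
        return t
    if t.shape == (6,):
        xx, yy, zz, xy, xz, yz = t
        return np.array([[xx, xy, xz], [xy, yy, yz], [xz, yz, zz]])
    raise ValueError("expected a 3x3 tensor or a 6-array")


def matrix_to_quaternion(m):
    """Proper rotation matrix -> unit quaternion in (x, y, z, w) order."""
    trace = m[0, 0] + m[1, 1] + m[2, 2]
    if trace > 0:
        s = 2.0 * math.sqrt(trace + 1.0)
        q = [(m[2, 1] - m[1, 2]) / s, (m[0, 2] - m[2, 0]) / s,
             (m[1, 0] - m[0, 1]) / s, 0.25 * s]
    elif m[0, 0] > m[1, 1] and m[0, 0] > m[2, 2]:
        s = 2.0 * math.sqrt(1.0 + m[0, 0] - m[1, 1] - m[2, 2])
        q = [0.25 * s, (m[0, 1] + m[1, 0]) / s,
             (m[0, 2] + m[2, 0]) / s, (m[2, 1] - m[1, 2]) / s]
    elif m[1, 1] > m[2, 2]:
        s = 2.0 * math.sqrt(1.0 + m[1, 1] - m[0, 0] - m[2, 2])
        q = [(m[0, 1] + m[1, 0]) / s, 0.25 * s,
             (m[1, 2] + m[2, 1]) / s, (m[0, 2] - m[2, 0]) / s]
    else:
        s = 2.0 * math.sqrt(1.0 + m[2, 2] - m[0, 0] - m[1, 1])
        q = [(m[0, 2] + m[2, 0]) / s, (m[1, 2] + m[2, 1]) / s,
             0.25 * s, (m[1, 0] - m[0, 1]) / s]
    return [float(c) for c in q]


def ellipsoid_sketch(tensor, scale=1.0, force_positive=False):
    """Hypothetical sketch: (semiaxes, quaternion) for a symmetric tensor."""
    eigenvalues, eigenvectors = np.linalg.eigh(tensor_to_matrix(tensor))
    if force_positive:
        eigenvalues = np.abs(eigenvalues)
    elif np.any(eigenvalues <= 0):
        raise ValueError("tensor is not positive definite")
    # eigenvectors are the columns; negating all three flips the determinant
    # sign, which turns a reflection into a proper rotation (det = +1)
    if np.linalg.det(eigenvectors) < 0:
        eigenvectors = -eigenvectors
    semiaxes = (scale * eigenvalues).tolist()
    return semiaxes, matrix_to_quaternion(eigenvectors)
```

With scale=1, rotating diag(semiaxes) by the returned quaternion gives back the original tensor, which is a convenient self-check.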

chemiscope.arrow_from_vector(vec, scale=1.0, radius=0.1, head_radius_scale=1.75, head_length_scale=2.0)

Draws an arrow from the origin to the specified 3D position. Returns a custom shape in the form required by the chemiscope input. Use None for the arrow shape parameters to leave them undefined (so that they can be specified in the global parameters).

Parameters:
  • vec – the 3D vector to draw, i.e. the position of the arrow tip relative to the origin

  • scale – conversion from the units of the vector to the units of the atomic positions (usually Å)

  • radius – radius of the stem of the arrow (same units as the atomic positions, typically Å)

  • head_radius_scale – radius of the arrow tip, relative to the stem radius

  • head_length_scale – length of the arrow tip, relative to the stem radius
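The style arguments map naturally onto the parameters of the "arrow" shape kind listed in the create_input() shapes reference. arrow_sketch is a hypothetical helper, assuming the head dimensions are the stem radius multiplied by the corresponding scale; it builds arrow-kind parameters and does not reproduce what arrow_from_vector() actually returns.

```python
def arrow_sketch(vec, scale=1.0, radius=0.1, head_radius_scale=1.75, head_length_scale=2.0):
    """Hypothetical sketch: "arrow" parameters for a vector drawn from the origin.

    A style argument set to None leaves the corresponding key out, so it can
    be provided in the global parameters instead.
    """
    params = {"vector": [scale * float(c) for c in vec]}
    if radius is not None:
        params["baseRadius"] = radius
        if head_radius_scale is not None:
            params["headRadius"] = radius * head_radius_scale
        if head_length_scale is not None:
            params["headLength"] = radius * head_length_scale
    return params


arrow = arrow_sketch([1, 0, 0], scale=2.0)
# vector [2.0, 0.0, 0.0], baseRadius 0.1, headRadius ~0.175, headLength 0.2
```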

chemiscope.ase_vectors_to_arrows(frames, key='forces', target=None, **kwargs)

Extract a vectorial atom property from a list of ase.Atoms objects, and return a list of arrow shapes. Besides the parameters below, it also accepts the same keyword arguments as arrow_from_vector(), which are used to define the style of the arrows.

Parameters:
  • frames – list of ASE Atoms objects

  • key – name of the ASE atom property. Should contain three components corresponding to x,y,z

  • target – whether the properties should be associated with the entire structure or with each atom (structure or atom). Defaults to autodetection.
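Given one vector per atom, the resulting shape can be assembled with the flat atom-level parameter list described in the create_input() shapes section. vectors_to_arrow_shape is a hypothetical sketch that takes plain (x, y, z) lists rather than ase.Atoms objects; the default head proportions (1.75 and 2.0 times the stem radius) mirror the arrow_from_vector() defaults.

```python
def vectors_to_arrow_shape(per_frame_vectors, scale=1.0, radius=0.1):
    """Hypothetical sketch: one "arrow" shape with one entry per atom.

    per_frame_vectors holds, for each frame, a list of (x, y, z) vectors
    (e.g. one force per atom); entries are flattened across frames.
    """
    atom_parameters = [
        {"vector": [scale * float(c) for c in vector]}
        for vectors in per_frame_vectors
        for vector in vectors
    ]
    return {
        "kind": "arrow",
        "parameters": {
            "global": {
                "baseRadius": radius,
                "headRadius": 1.75 * radius,
                "headLength": 2.0 * radius,
            },
            "atom": atom_parameters,
        },
    }
```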

chemiscope.ase_tensors_to_ellipsoids(frames, key, target=None, **kwargs)

Extract a 2-tensor atom property from a list of ase.Atoms objects, and return a list of ellipsoid shapes. Besides the parameters below, it also accepts the same keyword arguments as ellipsoid_from_tensor(), which are used to draw the shapes.

Parameters:
  • frames – list of ASE Atoms objects

  • key – name of the ASE atom property. Should contain nine components corresponding to xx,xy,xz,yx,yy,yz,zx,zy,zz or six components corresponding to xx,yy,zz,xy,xz,yz

  • target – whether the properties should be associated with the entire structure or with each atom (structure or atom). Defaults to autodetection.
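When a tensor is stored with nine components, a symmetric tensor can be reduced to the six-component [xx,yy,zz,xy,xz,yz] form. nine_to_six is a hypothetical helper that assumes (and checks) symmetry; it is not a chemiscope function.

```python
def nine_to_six(components, tol=1e-8):
    """Hypothetical helper: xx,xy,xz,yx,yy,yz,zx,zy,zz -> xx,yy,zz,xy,xz,yz."""
    xx, xy, xz, yx, yy, yz, zx, zy, zz = (float(c) for c in components)
    # the six-component form only describes symmetric tensors
    if abs(xy - yx) > tol or abs(xz - zx) > tol or abs(yz - zy) > tol:
        raise ValueError("tensor is not symmetric")
    return [xx, yy, zz, xy, xz, yz]
```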

chemiscope.explore(frames, featurize=None, properties=None, environments=None, settings=None, mode='default')

Automatically explore a dataset containing all structures in frames.

This function computes some low-dimensional representation of the frames, and uses chemiscope to visualize both the resulting embedding and the structures simultaneously. featurize can be used to specify a custom representation and/or dimensionality reduction method.

If no function is provided as a featurize argument, a default SOAP and PCA based featurizer is used. SOAP parameters (e.g., cutoff radius, number of radial and angular functions, etc.) are predefined. The SOAP computation uses all available CPU cores for parallelization. PCA reduces the dimensionality to two components by default.

Parameters:
  • frames (list) – list of frames

  • featurize (callable) – optional. Function to compute features and perform dimensionality reduction on the frames. The function should take frames and environments as input and return a 2D array of reduced features. If None, a default SOAP and PCA based featurizer is used.

  • properties (dict) – optional. Additional properties to be included in the visualization. This dictionary can contain any other relevant data associated with the atomic structures. Properties can be extracted from frames with extract_properties() or manually defined by the user.

  • environments – optional. List of environments (described as (structure id, center id, cutoff)) to include when extracting the atomic properties. Can be generated from frames with all_atomic_environments() or manually defined.

  • settings (dict) – optional dictionary of settings to use when displaying the data. Possible entries for the settings dictionary are documented in the chemiscope input file reference.

  • mode (str) – optional. Visualization mode for the chemiscope widget. Can be one of “default”, “structure”, or “map”. The default mode is “default”.

Returns:

a chemiscope widget for interactive visualization
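Any callable following the same convention as soap_kpca_featurize in the example further down (frames and environments in, a 2D array out) can be passed as featurize. The sketch below is hypothetical: composition_pca_featurize uses per-structure element counts as features and a plain numpy SVD in place of sklearn's PCA, and only requires frames to provide get_atomic_numbers(), as ase.Atoms does.

```python
import numpy as np


def composition_pca_featurize(frames, environments):
    """Hypothetical featurizer: element counts per structure, projected to 2D by PCA."""
    if environments is not None:
        raise ValueError("'environments' are not supported by this featurizer")
    numbers = [np.asarray(frame.get_atomic_numbers()) for frame in frames]
    elements = sorted(set(np.concatenate(numbers).tolist()))
    # one row per structure, one column per element: number of atoms of that element
    counts = np.array(
        [[np.count_nonzero(z == e) for e in elements] for z in numbers], dtype=float
    )
    centered = counts - counts.mean(axis=0)
    # rows of vt are principal directions, sorted by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projection = centered @ vt[:2].T
    # pad with zeros if there are fewer than two principal directions
    if projection.shape[1] < 2:
        padding = np.zeros((len(frames), 2 - projection.shape[1]))
        projection = np.hstack([projection, padding])
    return projection
```

Such a function would be used as chemiscope.explore(frames, featurize=composition_pca_featurize); element counts are a very coarse representation, chosen here only to keep the sketch free of extra dependencies.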

To use this function, additional dependencies are required, specifically the dscribe and sklearn libraries used by the default featurizer. They can be installed with the following command:

pip install chemiscope[explore]

Here is an example using this function with and without a custom featurizer. The frames are read from a file that ase can read; the custom featurizer computes SOAP descriptors with the dscribe library and reduces them with Kernel PCA from sklearn.

import chemiscope
import ase.io
import dscribe.descriptors
import sklearn.decomposition

# Read the structures from the dataset
frames = ase.io.read("trajectory.xyz", ":")

# 1) Basic usage with the default featurizer (SOAP + PCA)
chemiscope.explore(frames)


# Define a function for dimensionality reduction
def soap_kpca_featurize(frames, environments):
    if environments is not None:
        raise ValueError("'environments' are not supported by this featurizer")
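    # Compute one SOAP vector per structure (average="inner" averages over
    # all atoms), so that descriptors is a 2D (n_structures, n_features) array
    soap = dscribe.descriptors.SOAP(
        species=["C"],
        r_cut=4.5,
        n_max=8,
        l_max=6,
        periodic=True,
        average="inner",
    )
    descriptors = soap.create(frames)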

    # Apply KPCA
    kpca = sklearn.decomposition.KernelPCA(n_components=2, gamma=0.05)

    # Return a 2D array of reduced features
    return kpca.fit_transform(descriptors)


# 2) Example with a custom featurizer function
chemiscope.explore(frames, featurize=soap_kpca_featurize)

For more examples, see the related documentation.

chemiscope.metatensor_featurizer(model, extensions_directory=None, check_consistency=False, device=None)

Create a featurizer function using a metatensor model to obtain the features from structures. The model must be able to create a "feature" output.

Parameters:
  • model – model to use for the calculation. It can be a file path, a Python instance of metatensor.torch.atomistic.MetatensorAtomisticModel, or the output of torch.jit.script() on metatensor.torch.atomistic.MetatensorAtomisticModel.

  • extensions_directory – a directory where model extensions are located

  • check_consistency – whether to check the model for consistency when running; defaults to False.

  • device – a torch device to use for the calculation. If None, the function will use the options in the model’s supported_devices attribute.

Returns:

a function that takes a list of frames and returns the features.

To use this function, additional dependencies are required. They can be installed with the following command:

pip install chemiscope[metatensor]

Here is an example using a pre-trained metatensor model, stored as a model.pt file with the compiled extensions stored in the extensions/ directory. For details on how to obtain it, see the metatensor tutorial. The frames are obtained by reading structures from a file that ase can read.

import chemiscope
import ase.io

# Read the structures from the dataset
frames = ase.io.read("data/explore_c-gap-20u.xyz", ":")

# Provide model file ("model.pt") to `metatensor_featurizer`
featurizer = chemiscope.metatensor_featurizer(
    "model.pt", extensions_directory="extensions"
)

chemiscope.explore(frames, featurize=featurizer)

For more examples, see the related documentation.

chemiscope.convert_stk_bonds_as_shapes(frames: List[Molecule], bond_color: str, bond_radius: float) -> Dict[str, Dict]

Convert connections between atom ids in each structure to shapes.

Parameters:
  • frames (List[Molecule]) – list of Molecule objects, each of which is a structure in chemiscope

  • bond_color (str) – how to color the added bonds

  • bond_radius (float) – radius of the added bonds
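The conversion can be sketched for a single structure using the "cylinder" shape parameters from the create_input() reference, assuming each cylinder starts at "position" and extends along "vector". bonds_to_cylinder_shapes and its bond_<k> shape names are hypothetical; the real convert_stk_bonds_as_shapes() works on stk Molecule objects and may organise its output differently.

```python
def bonds_to_cylinder_shapes(positions, bonds, bond_color, bond_radius):
    """Hypothetical single-structure sketch: (i, j) atom pairs -> cylinder shapes."""
    shapes = {}
    for k, (i, j) in enumerate(bonds):
        start, end = positions[i], positions[j]
        shapes[f"bond_{k}"] = {
            "kind": "cylinder",
            "parameters": {
                # style shared by every structure
                "global": {"radius": bond_radius, "color": bond_color},
                # one entry per structure; a single structure here
                "structure": [
                    {
                        "position": list(start),
                        "vector": [b - a for a, b in zip(start, end)],
                    }
                ],
            },
        }
    return shapes
```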