Creating chemiscope input files

When using the default chemiscope interface, all the structures and properties in a dataset are loaded from a single JSON file. This section describes how to generate such a JSON file, either by using an existing Python script that does most of the work for you, or by writing the JSON file directly. Since the resulting JSON file can be quite large and thus harder to share with collaborators, the default chemiscope interface can also load JSON files compressed with gzip.
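
As a rough illustration of the gzip option, the sketch below writes a dictionary to a gzip-compressed JSON file and reads it back using only the Python standard library. The dataset dictionary and the file name are hypothetical placeholders, and the dictionary is not a complete chemiscope input. In practice, chemiscope.write_input() (documented below) writes a compressed file for you when the path ends with '.gz'.

```python
import gzip
import json
import os
import tempfile

# Hypothetical placeholder data: NOT a complete chemiscope input
dataset = {"meta": {"name": "example dataset"}}

path = os.path.join(tempfile.mkdtemp(), "example.json.gz")

# "wt" opens the gzip stream in text mode, so json.dump can write str
with gzip.open(path, "wt", encoding="utf-8") as fd:
    json.dump(dataset, fd)

# reading the file back follows the same pattern
with gzip.open(path, "rt", encoding="utf-8") as fd:
    loaded = json.load(fd)
```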

tl;dr: if you would like to generate a simple chemiscope input for your dataset, we have a Google Colab notebook that can help!

Tools able to create chemiscope input

chemiscope Python module

The easiest way to create a JSON input file is to use the chemiscope Python module. Install the package with pip install chemiscope, and use chemiscope.write_input() or chemiscope.create_input() in your own script to generate the JSON file.

If all the properties you want to include in chemiscope are already stored in a file that ase can read, you can instead use the chemiscope-input command line script, which is installed together with the chemiscope Python package.

Note that chemiscope does not compute structural representations or dimensionality reduction, and you need to do this yourself or use another package such as ASAP.

ASAP

The ASAP structural analysis package is another tool that can directly generate an output in chemiscope format.

chemiscope functions reference

chemiscope.write_input(path, frames, meta=None, properties=None, environments=None, shapes=None, settings=None, parameters=None)

Create the input JSON file used by the default chemiscope visualizer, and save it to the given path.

Parameters:
  • path (str) – name of the file to use to save the json data. If it ends with ‘.gz’, a gzip compressed file will be written

  • frames (list) – list of atomic structures. For now, only ase.Atoms objects are supported

  • meta (dict) – optional metadata of the dataset

  • properties (dict) – optional dictionary of additional properties

  • environments (list) – optional list of (structure id, atom id, cutoff) specifying which atoms have properties attached and how far out atom-centered environments should be drawn by default.

  • shapes (dict) – optional dictionary of shapes to have available for display. See create_input() for more information on how to define shapes.

  • settings (dict) – optional dictionary of settings to use when displaying the data. Possible entries for the settings dictionary are documented in the chemiscope input file reference.

  • parameters (dict) – optional dictionary of parameters of multidimensional properties

This function uses create_input() to generate the input data. See the documentation of that function for more information.

Here is a quick example of generating a chemiscope input reading the structures from a file that ase can read, and performing PCA using sklearn on a descriptor computed with another package.

import ase
from ase import io
import numpy as np

import sklearn
from sklearn import decomposition

import chemiscope

frames = ase.io.read('trajectory.xyz', ':')

# example property 1: list containing the energy of each structure,
# from calculations performed beforehand
energies = np.loadtxt('energies.txt')

# example property 2: PCA projection computed using sklearn.
# X contains a multi-dimensional descriptor of the structure
X = np.array( ... )
pca = sklearn.decomposition.PCA(n_components=3).fit_transform(X)

properties = {
    "PCA": {
        "target": "atom",
        "values": pca,
        "description": "PCA of per-atom representation of the structures",
    },
    "energies": {
        "target": "structure",
        "values": energies,
        "units": "kcal/mol",
    },
}

# additional properties coming from the trajectory
frame_properties = chemiscope.extract_properties(
    frames,
    only=["temperature", "classification"]
)

# additional multidimensional properties to be plotted
dos = np.loadtxt(...) # load the 2D data
dos_energy_grid = np.loadtxt(...)
multidimensional_properties = {
    "DOS": {
        "target": "structure",
        "values": dos,
        "parameters": ["energy"],
    }
}

multidimensional_parameters = {
    "energy": {
        "values": dos_energy_grid,
        "units": "eV",
    }
}

# merge all properties together (dict.update, since dictionaries
# have no extend method)
properties.update(frame_properties)
properties.update(multidimensional_properties)

chemiscope.write_input(
    path="chemiscope.json.gz",
    frames=frames,
    properties=properties,
    # This is required to display properties with `target: "atom"`
    environments=chemiscope.all_atomic_environments(frames),
    # this is necessary to plot the multidimensional data
    parameters=multidimensional_parameters,
)

chemiscope.create_input(frames=None, meta=None, properties=None, environments=None, settings=None, shapes=None, parameters=None)

Create a dictionary that can be saved to JSON using the format used by the default chemiscope visualizer.

Parameters:
  • frames (list) – list of atomic structures. For now, only ase.Atoms objects are supported

  • meta (dict) – optional metadata of the dataset, see below

  • properties (dict) – optional dictionary of properties, see below

  • environments (list) – optional list of (structure id, atom id, cutoff) specifying which atoms have properties attached and how far out atom-centered environments should be drawn by default. Functions like all_atomic_environments() or librascal_atomic_environments() can be used to generate the list of environments in simple cases.

  • shapes (dict) – optional dictionary of shapes to have available for display, see below. extract_lammps_shapes_from_ase() can automatically extract shapes from a LAMMPS simulation.

  • settings (dict) – optional dictionary of settings to use when displaying the data. Possible entries for the settings dictionary are documented in the chemiscope input file reference.

  • parameters (dict) – optional dictionary of parameters for multidimensional properties, see below

Dataset metadata

The dataset metadata should be given in the meta dictionary. The possible keys are:

meta = {
    'name': '...',         # str, dataset name
    'description': '...',  # str, dataset description
    'authors': [           # list of str, dataset authors, OPTIONAL
        '...',
    ],
    'references': [        # list of str, references for this dataset,
        '...',             # OPTIONAL
    ],
}

Dataset properties

Properties can be added with the properties parameter. This parameter should be a dictionary containing one entry for each property. Properties can be extracted from structures with extract_properties() or composition_properties(), or manually defined by the user.

Each entry in the properties dictionary contains a target attribute ('atom' or 'structure') and a set of values. values can be a Python list of floats or strings, a 1D numpy array of numeric values, or a 2D numpy array of numeric values. In the latter case, multiple properties will be generated along the second axis. For example, passing

properties = {
    'cheese': {
        'target': 'atom',
        'values': np.zeros((300, 4)),
        # optional: property unit
        'units': 'random / fs',
        # optional: property description
        'description': 'a random property for example',
    }
}

will generate four properties named cheese[1], cheese[2], cheese[3], and cheese[4], each containing 300 values.
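
To make the naming convention concrete, here is a minimal sketch of how a 2D set of values maps to one-indexed property names. The expand_columns helper is hypothetical and only mimics the behavior described above using plain lists; it is not chemiscope's own implementation.

```python
def expand_columns(name, values):
    """Split 2D `values` (one row per sample) into named 1D properties."""
    n_columns = len(values[0])
    if any(len(row) != n_columns for row in values):
        raise ValueError("all rows must have the same number of columns")

    return {
        # names are one-indexed: name[1], name[2], ...
        f"{name}[{i + 1}]": [row[i] for row in values]
        for i in range(n_columns)
    }


# 300 samples with 4 values each, matching the shape used above
values = [[0.0] * 4 for _ in range(300)]
expanded = expand_columns("cheese", values)
print(list(expanded.keys()))
```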

It is also possible to pass a shortened representation of the properties, for instance:

properties = {
    'cheese':  np.zeros((300, 4)),
}

In this case, the type of property (structure or atom) is deduced by comparing the number of atoms and the number of structures in the dataset to the length of the provided list or np.ndarray.
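
The deduction can be pictured with the sketch below. The guess_target function is hypothetical, and the way it handles the ambiguous case (as many atoms as structures) is an assumption made for this example; chemiscope itself may behave differently there.

```python
def guess_target(n_values, n_structures, n_atoms):
    """Guess whether a property is per-structure or per-atom from its length."""
    matches_structures = n_values == n_structures
    matches_atoms = n_values == n_atoms

    if matches_structures and matches_atoms:
        # Assumption: refuse to guess, e.g. when every structure has one atom
        raise ValueError("ambiguous target: same number of atoms and structures")
    if matches_structures:
        return "structure"
    if matches_atoms:
        return "atom"

    raise ValueError(
        f"got {n_values} values, expected {n_structures} (structures) "
        f"or {n_atoms} (atoms)"
    )


# a dataset with 10 structures and 300 atoms in total
print(guess_target(300, n_structures=10, n_atoms=300))
print(guess_target(10, n_structures=10, n_atoms=300))
```

Giving the target explicitly, as in the longer form above, avoids relying on this kind of guess.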

Multi-dimensional properties

One can give 2D properties to be displayed as curves in the info panel by setting a parameters entry in the property and passing the corresponding parameter values to this function. The previous example becomes:

properties = {
    'cheese': {
        'target': 'atom',
        'values': np.zeros((300, 4)),
        # optional: property unit
        'units': 'random / fs',
        # optional: property description
        'description': 'a random property for example',
        'parameters': ['origin'],
    }
}

This input describes a 2D property cheese with 300 samples and 4 values taken by the origin parameter. We also need to provide the parameter values to this function:

parameters = {
    'origin' : {
        # an array of numbers containing the values of the parameter
        # the size should correspond to the second dimension
        # of the corresponding multidimensional property
        'values': [0, 1, 2, 3],
        # optional free-form description of the parameter as a string
        'name': 'a short description of this parameter',
        # optional units of the values in the values array
        'units': 'eV',
    }
}
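
Since the number of parameter values has to match the second dimension of the property, it can help to check this before creating the input. The check_parameters helper below is a hypothetical sketch, not part of chemiscope. It uses plain nested lists and assumes each property names a single parameter, as in the example above.

```python
def check_parameters(properties, parameters):
    """Check that each parameter has one value per column of the
    multidimensional properties referring to it."""
    for name, prop in properties.items():
        for parameter in prop.get("parameters", []):
            if parameter not in parameters:
                raise ValueError(f"missing parameter '{parameter}' for '{name}'")

            n_columns = len(prop["values"][0])
            n_values = len(parameters[parameter]["values"])
            if n_columns != n_values:
                raise ValueError(
                    f"'{name}' has {n_columns} columns, but parameter "
                    f"'{parameter}' has {n_values} values"
                )


properties = {
    "cheese": {
        "target": "atom",
        "values": [[0.0] * 4 for _ in range(300)],
        "parameters": ["origin"],
    }
}
parameters = {"origin": {"values": [0, 1, 2, 3], "units": "eV"}}

# passes silently: 4 columns and 4 parameter values
check_parameters(properties, parameters)
```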

Custom shapes

The shapes parameter should have the format {"<name>": list of list of shapes}, where the list of lists contains one list for each structure, itself containing one shape dictionary for each atom/site.

shapes = {
    "shape name": [
        [{"kind": "sphere", "radius": 0.3} for atom in frame]
        for frame in frames
    ]
}

The shape dictionary can have any of the following forms:

# Ellipsoid shape
shape = {
    "kind": "ellipsoid",
    "semiaxes": [float, float, float],
    "orientation": [float, float, float, float], # optional
}

# Spherical shape
shape = {
    "kind": "sphere",
    "radius": float,
}

# Fully custom shape
shape = {
    "kind": "custom",
    "vertices": [
        [float, float, float],
        ...
    ],
    # `simplices` is optional
    "simplices": [
        [int, int, int],
        ...
    ],
    # `orientation` is optional
    "orientation": [float, float, float, float],
}

where orientation is an optional parameter corresponding to a quaternion in x, y, z, w format. For custom shapes, simplices, referring to the indices of the facets, is also optional, and will be determined by convex triangulation when not provided.
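
As a concrete instance of the nested layout, the sketch below builds two sets of shapes for a hypothetical dataset described only by its number of atoms per structure. The shape names and sizes are arbitrary values chosen for illustration. The identity rotation is written as the quaternion [0, 0, 0, 1], following the x, y, z, w order mentioned above.

```python
# Hypothetical dataset: two structures containing 3 and 2 atoms
atoms_per_structure = [3, 2]

# identity rotation as a quaternion in (x, y, z, w) order
IDENTITY = [0.0, 0.0, 0.0, 1.0]

shapes = {
    # one list per structure, one shape dictionary per atom
    "ellipsoids": [
        [
            {
                "kind": "ellipsoid",
                "semiaxes": [0.5, 0.3, 0.3],
                "orientation": IDENTITY,
            }
            for _ in range(n_atoms)
        ]
        for n_atoms in atoms_per_structure
    ],
    "spheres": [
        [{"kind": "sphere", "radius": 0.3} for _ in range(n_atoms)]
        for n_atoms in atoms_per_structure
    ],
}
```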

chemiscope.extract_properties(frames, only=None, environments=None)

Extract properties defined in the frames in a chemiscope-compatible format.

Parameters:
  • frames – iterable over structures (typically a list of frames)

  • only – optional, list of strings. If not None, only properties with a name from this list are included in the output.

  • environments – optional, list of environments (described as (structure id, center id, cutoff)) to include when extracting the atomic properties.

chemiscope.composition_properties(frames, environments=None)

Generate properties containing the chemical composition of the given frames.

This creates two atomic properties: symbol (string) and number (int); and multiple structure properties: composition and n_{element} for each element in the dataset. The properties are then returned in chemiscope format.

Parameters:
  • frames – iterable over structures (typically a list of frames)

  • environments – optional, list of environments (described as (structure id, center id, cutoff)) to include when generating the atomic properties.
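
To give an idea of what the n_{element} structure properties contain, the sketch below counts elements in structures given as plain lists of chemical symbols. The element_counts helper is hypothetical: it does not reproduce the composition property or the exact output format of composition_properties().

```python
from collections import Counter


def element_counts(structures):
    """Count atoms of each element, for each structure in the dataset."""
    # every element present anywhere in the dataset gets its own property
    elements = sorted({symbol for symbols in structures for symbol in symbols})
    counts = [Counter(symbols) for symbols in structures]

    return {
        f"n_{element}": [count[element] for count in counts]
        for element in elements
    }


# Hypothetical dataset: a water molecule and a CO2 molecule
structures = [["O", "H", "H"], ["C", "O", "O"]]
counts = element_counts(structures)
```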

chemiscope.all_atomic_environments(frames, cutoff=3.5)

Generate a list of environments containing all the atoms in the given frames. The optional spherical cutoff radius is used to display the environments in chemiscope.

Parameters:
  • frames – iterable over structures (typically a list of frames)

  • cutoff (float) – spherical cutoff radius used when displaying the environments
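
The resulting list has the (structure id, atom id, cutoff) layout described for the environments parameter. The sketch below mimics it for structures described only by their number of atoms. The all_environments helper is hypothetical, and the zero-based indices are an assumption of this example.

```python
def all_environments(atoms_per_structure, cutoff=3.5):
    """List (structure id, atom id, cutoff) for every atom of every structure."""
    return [
        (structure, atom, cutoff)
        for structure, n_atoms in enumerate(atoms_per_structure)
        for atom in range(n_atoms)
    ]


# Hypothetical dataset: two structures containing 3 and 2 atoms
environments = all_environments([3, 2])
```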

chemiscope.librascal_atomic_environments(frames, cutoff=3.5)

Generate the list of environments for the given frames, matching the behavior used by librascal when computing descriptors for only a subset of the atomic centers. The optional spherical cutoff radius is used to display the environments in chemiscope.

Only ase.Atoms are supported for the frames since that’s what librascal uses.

Parameters:
  • frames – iterable over ase.Atoms

  • cutoff (float) – spherical cutoff radius used when displaying the environments

chemiscope.extract_lammps_shapes_from_ase(frames, key='shape')

Extract shapes from a LAMMPS data file read by ASE.

Parameters:
  • frames – list of ASE Atoms objects

  • key – name of the ASE property where the shape is stored

chemiscope.ellipsoid_from_tensor(tensor, scale=1.0, force_positive=False)

Returns an ellipsoid (semiaxes + quaternion) representation of a positive definite tensor (e.g. a polarizability), in the form required by the chemiscope input.

Parameters:
  • tensor – a positive-definite tensor (3x3 or a 6-array [xx,yy,zz,xy,xz,yz])

  • scale – conversion from the units of the tensor to the units of the atomic positions (usually Å)

  • force_positive – takes the absolute value of eigenvalues, to handle non-positive tensors
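
The two accepted tensor layouts carry the same information, since a symmetric 3x3 tensor has only six independent components. The sketch below shows the correspondence for the [xx,yy,zz,xy,xz,yz] ordering given above. The tensor_from_six helper is hypothetical and is not part of chemiscope.

```python
def tensor_from_six(components):
    """Expand [xx, yy, zz, xy, xz, yz] into a symmetric 3x3 nested list."""
    xx, yy, zz, xy, xz, yz = components
    return [
        [xx, xy, xz],
        [xy, yy, yz],
        [xz, yz, zz],
    ]


tensor = tensor_from_six([1.0, 2.0, 3.0, 0.1, 0.2, 0.3])
```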

chemiscope.arrow_from_vector(vec, scale=1.0, radius=0.1, head_radius_scale=1.75, head_length_scale=4, n_points=16)

Draws an arrow from the origin to the specified 3D position. Returns a custom shape in the form required by the chemiscope input.

Parameters:
  • scale – conversion from the units of the vector to the units of the atomic positions (usually Å)

  • radius – radius of the stem of the arrow (same units as the atomic positions, typically Å)

  • head_radius_scale – radius of the arrow tip, relative to the stem radius

  • head_length_scale – length of the arrow tip, relative to the stem radius

  • n_points – resolution of the discretization of the shape

chemiscope-input command line interface

Command-line utility to generate an input for chemiscope — the interactive structure-property explorer. Parses an input file containing atomic structures using the ASE I/O module, and converts it into a JSON file that can be loaded in chemiscope. Frame and environment properties must be written in the same file containing atomic structures: we recommend the extended xyz format, which is flexible and simple. In all cases, this utility will simply write to the JSON file anything that is readable by ASE.

chemiscope-input [-h] [-o OUTPUT] [--properties PROPERTIES]
                 [--only-atoms | --only-structures] [--cutoff CUTOFF] [--name NAME]
                 [--description DESCRIPTION] [--authors [AUTHORS ...]]
                 [--references [REFERENCES ...]] [--settings SETTINGS]
                 input

positional arguments

  • input - input file containing the structures and properties (default: None)

options

  • -h, --help - show this help message and exit

  • -o OUTPUT, --output OUTPUT - chemiscope output file in JSON format (default: None)

  • --properties PROPERTIES - comma-separated list of properties that should be extracted. defaults to all

  • --only-atoms - only use per-atom properties from the input file

  • --only-structures - only use per-structure properties from the input file (default)

  • --cutoff CUTOFF - spherical cutoff radius that should be visualized around environments (default: 3.5)

  • --name NAME - name of the dataset (default: )

  • --description DESCRIPTION - description of the dataset (default: )

  • --authors AUTHORS - list of dataset authors (default: [])

  • --references REFERENCES - list of references for the dataset (default: [])

  • --settings SETTINGS - visualization settings, as a JSON string, following the chemiscope format (default: )
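
As an example of combining these options, the sketch below assembles a command line in Python and prints it. The file names, property names, and author names are hypothetical. Because --authors accepts several values, another option is placed after it here so that the positional input file is not read as one more author.

```python
import shlex

command = [
    "chemiscope-input",
    "--output", "dataset.json",
    # comma-separated list of properties to extract
    "--properties", "energy,temperature",
    "--name", "My dataset",
    "--authors", "First Author", "Second Author",
    "--cutoff", "3.5",
    # positional argument: file with structures and properties
    "structures.xyz",
]

# shell-quoted version of the command, for copy-pasting in a terminal
print(shlex.join(command))

# To run it from Python instead (requires chemiscope to be installed):
# import subprocess
# subprocess.run(command, check=True)
```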