User tutorial

This tutorial shows how to use the default chemiscope visualizer with your own database. It covers the different panels and their settings, as well as how to create an input file for the visualizer.

Introduction to structural properties

Before we get started, we introduce a few concepts that underlie the design and usage of chemiscope. Chemiscope is designed to help navigate structure-property maps, i.e. 2D or 3D representations of a set of atomic-scale entities that reflect how structure influences materials properties.

Chemiscope can work with two kinds of entities: full structures, or atom-centred environments. A structure consists of a set of atoms, possibly representing the periodic repeat unit of an infinite structure. An environment consists of a set of atoms that surround a central atom. In both cases, these entities are fully defined by the position and nature of the atoms present in the structure, or in the neighborhood of the environment center.

For each structure or environment, one may have computed properties, e.g. the cohesive energy of a molecule or the NMR chemical shielding of a nucleus, or structural representations, i.e. functions of the spatial arrangement of the atoms that incorporate some fundamental symmetries to achieve a description of the structure that is as complete as possible, yet concise. Examples of such representations are atom density representations or Behler-Parrinello symmetry functions. These representations are usually high-dimensional vectors, hard to visualize and interpret. For this reason, one usually applies a dimensionality reduction algorithm, such as PCA, sketch-map, etc. The interpretation of the resulting map will differ depending on both the descriptor used to represent the structures or environments and the dimensionality reduction algorithm applied.

Chemiscope simplifies visualizing the correlations between structural representations and properties associated with structures and environments. It does so by representing these atomic-scale entities interactively as points on a map, and by associating these points with an explicit 3D visualization of the structure of the material or molecule.

_images/mol-to-map.svg

Illustration of the process used to create structural properties from a molecule.

Chemiscope is completely agnostic with respect to how properties and structural representations are generated, and does not provide any facilities to generate them. In the rest of this document, we will refer to properties describing the structure of an environment or structure as structural properties, and to other associated properties (such as energy, density, …) as physical properties.

Different panels and settings

The default chemiscope visualizer is organized in three main panels: the map, the structure viewer and the environment information display. Additionally, clicking on the dataset title (on top of the map) will display some metadata about the dataset (description, authors, references). This section will present each one, as well as the main settings accessible to customize the display.

The map is a 2D or 3D scatter plot showing properties for all the environments in the dataset. You can set which properties (structural or physical) should be used as the x, y, and potentially z axis, as well as for the color and size of the points. Additionally, properties which have string values (and not numeric values) can be used as category data to set the symbols used for the points. To open the settings modal window, click on the hamburger menu (the ☰ symbol) on the left of the dataset title.

_images/map.png

The map panel in 2D mode and the related settings

The structure panel is a 3D molecular viewer based on Jmol. The settings are accessible through the hamburger menu (☰) on the right of the viewer. The settings are grouped into representation (how the molecule is rendered); supercell (how many copies of the unit cell to display); environments (how atom-centered environments are displayed); camera (reset the camera along one of the given axes); and trajectory (playback-related settings).

_images/structure.png

The structure panel and related settings

Finally, the environment information panel features sliders and text inputs to allow for an easy selection of the environment of interest. The play button on the left of the sliders activates the trajectory playback, looping over the structures in the dataset or the atoms in a structure. By clicking on the labels at the top (structure XXX and atom XXX), one can hide or show the full property tables. These tables show all properties in the dataset for the currently selected environment.

_images/info.png

The environment information panel fully expanded

Input file format for chemiscope

When using the default chemiscope interface, all the structures and properties in a dataset are loaded from a single JSON file. This section describes how to generate such a JSON file, either using a pre-existing Python script that does most of the work for you, or by writing the JSON file directly. Since the resulting JSON file can be quite large and thus harder to share with collaborators, the default chemiscope interface can also load JSON files compressed with gzip.

Creating an input file

The easiest way to create a JSON input file is to use the chemiscope_input Python 3 script that lives inside chemiscope’s GitHub repository. Download the script and place it somewhere it can be imported by Python. Then, in your own script, call the write_chemiscope_input function to generate the JSON file. This script assumes you use the ase Python module to read the structures.

If all the properties you want to include in chemiscope are already stored in an ase-readable file, you can also use the chemiscope_input script from the command line.

chemiscope_input.write_chemiscope_input(filename, frames, meta=None, extra=None, cutoff=None)

Write the json file expected by the default chemiscope visualizer at filename.

Parameters
  • filename (str) – name of the file to use to save the json data. If it ends with ‘.gz’, a gzip compressed file will be written

  • frames (list) – list of ase.Atoms objects containing all the structures

  • meta (dict) – optional metadata of the dataset, see below

  • extra (dict) – optional dictionary of additional properties, see below

  • cutoff (float) – optional. If present, will be used to generate atom-centered environments

The dataset metadata should be given in the meta dictionary. The possible keys are:

meta = {
    'name': '...',         # str, dataset name
    'description': '...',  # str, dataset description
    'authors': [           # list of str, dataset authors, OPTIONAL
        '...',
    ],
    'references': [        # list of str, references for this dataset,
        '...',             # OPTIONAL
    ],
}

The written JSON file will contain all the properties defined on the ase.Atoms objects. Values in ase.Atoms.arrays are mapped to target = "atom" properties; while values in ase.Atoms.info are mapped to target = "structure" properties. The only exception is ase.Atoms.arrays["numbers"], which is always ignored. If you want to have the atomic numbers as a property, you should add it to extra manually.
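As a minimal sketch of the workflow described above, the function below reads structures with ase and passes them to write_chemiscope_input. It assumes that ase is installed and that the chemiscope_input script is importable; the META contents, the default output name, and the export_dataset helper itself are hypothetical and only serve as an illustration.

```python
# Hypothetical metadata; replace with values describing your own dataset.
META = {
    "name": "my dataset",
    "description": "Structures read from an ase-readable file",
    "authors": ["Jane Doe"],
}


def export_dataset(structure_file, output="my-dataset.json.gz", cutoff=None):
    """Read all frames from `structure_file` and write a chemiscope input."""
    # Imported lazily so that this sketch can be loaded even when ase or the
    # chemiscope_input script are not available.
    import ase.io
    import chemiscope_input

    # index=":" asks ase to return every frame in the file as a list
    frames = ase.io.read(structure_file, index=":")

    # Values stored in frame.info and frame.arrays are exported by the script
    # as described above; a '.gz' suffix requests a gzip compressed file.
    chemiscope_input.write_chemiscope_input(
        output, frames, meta=META, cutoff=cutoff
    )
```

For example, calling export_dataset("trajectory.xyz", cutoff=3.5) on a hypothetical trajectory.xyz file would also request atom-centered environments with a 3.5 Å cutoff.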

Additional properties can be added with the extra parameter. This parameter should be a dictionary containing one entry for each property. Each entry contains a target attribute ('atom' or 'structure') and a set of values. values can be a Python list of float or string; a 1D numpy array of numeric values; or a 2D numpy array of numeric values. In the latter case, multiple properties will be generated along the second axis. For example, passing

extra = {
    'cheese': {
        'target': 'atom',
        'values': np.zeros((300, 4))
    }
}

will generate four properties named cheese[1], cheese[2], cheese[3], and cheese[4], each containing 300 values.
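The naming convention for 2D values can be illustrated with a small pure-Python sketch. This is not the script's actual implementation: expand_property is a hypothetical helper that works on nested lists instead of numpy arrays, and it only mimics the column naming described above.

```python
def expand_property(name, values):
    """Split 2D `values` into one named 1D property per column.

    1D values are returned unchanged under the original name.
    """
    first = values[0]
    if not isinstance(first, (list, tuple)):
        return {name: list(values)}

    n_columns = len(first)
    # Columns are numbered starting from 1, as in cheese[1] ... cheese[4]
    return {
        f"{name}[{i + 1}]": [row[i] for row in values]
        for i in range(n_columns)
    }


# 300 rows and 4 columns, the same shape as np.zeros((300, 4))
columns = expand_property("cheese", [[0.0] * 4 for _ in range(300)])
print(list(columns))
```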

Input file structure

If you cannot or do not want to use the script mentioned above, you can also directly write the JSON file conforming to the schema described here. The input file closely follows the Dataset TypeScript interface used in the library. Using a pseudo-JSON format, the file should contain the following fields and values:

{
    // metadata of the dataset. `description`, `authors` and `references`
    // will be rendered as markdown.
    "meta": {
        // the name of the dataset
        "name": "this is my name",
        // description of the dataset, OPTIONAL
        "description": "This contains data from ...",
        // authors of the dataset, OPTIONAL
        "authors": ["John Doe", "Mr Green, green@example.com"],
        // references for the dataset, OPTIONAL
        "references": [
            "'A new molecular construction', Journal of Random Words 19 (1923) pp 3333, DOI: 10.0000/0001100",
            "'nice website' http://example.com",
        ],
    },

    // list of properties in this dataset
    "properties": {
        // each property has a name, a target and some values
        <name>: {
            // the property target: is it defined per atom or for the full
            // structures
            "target": "atom" | "structure",
            // values of the properties can either be numbers or strings.
            // string properties are assumed to represent categories of
            // data.
            "values": [1, 2, 3, ...] | ["first", "second", "first", ...]
        }
    },

    // list of structures in this dataset
    "structures": [
        {
            // number of atoms in the structure
            "size": 42,
            // names of the atoms in the structure
            "names": ["H", "O", "C", "C", ...],
            // x cartesian coordinate of all the atoms, in Angstroms
            "x": [0, 1.5, 5.2, ...],
            // y cartesian coordinate of all the atoms, in Angstroms
            "y": [5.7, 7, -2.4, ...],
            // z cartesian coordinate of all the atoms, in Angstroms
            "z": [8.1, 2.9, -1.3, ...],
            // OPTIONAL: unit cell of the system, if any.
            //
            // This should be given as [ax ay az bx by bz cx cy cz], where
            // a, b, and c are the unit cell vectors. All values are
            // expressed in Angstroms.
            "cell": [10, 0, 0, 0, 10, 0, 0, 0, 10],
        },
        // other structures as needed
        ...
    ],

    // OPTIONAL: atom-centered environments descriptions
    //
    // If present, there should be one environment for each atom in each
    // structure.
    "environments": [
        {
            // index of the structure in the above structures list
            "structure": 0,
            // index of the central atom in structures
            "center": 8,
            // spherical cutoff radius, expressed in Angstroms
            "cutoff": 3.5,
        },
        // more environments
        ...
    ]
}
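The schema above can be followed with nothing more than the Python standard library. The sketch below builds a one-structure dataset, generates one environment per atom as the schema requires, and writes the result as plain or gzip compressed JSON. The water geometry is approximate, the property values are placeholders, and make_environments and write_dataset are hypothetical helper names, not part of chemiscope.

```python
import gzip
import json


def make_environments(structures, cutoff):
    """Create one environment for each atom in each structure."""
    return [
        {"structure": s, "center": atom, "cutoff": cutoff}
        for s, structure in enumerate(structures)
        for atom in range(structure["size"])
    ]


def write_dataset(dataset, path):
    """Write `dataset` as JSON, gzip compressed if `path` ends with '.gz'."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "wt", encoding="utf-8") as fd:
        json.dump(dataset, fd)


# A single water molecule, approximate coordinates in Angstroms
structures = [
    {
        "size": 3,
        "names": ["O", "H", "H"],
        "x": [0.0, 0.757, -0.757],
        "y": [0.0, 0.586, 0.586],
        "z": [0.0, 0.0, 0.0],
    },
]

dataset = {
    "meta": {"name": "single water molecule"},
    "properties": {
        # placeholder value: one entry per structure
        "energy": {"target": "structure", "values": [0.0]},
        # string values are treated as categories: one entry per atom
        "element": {"target": "atom", "values": ["O", "H", "H"]},
    },
    "structures": structures,
    "environments": make_environments(structures, cutoff=3.5),
}
```

Calling write_dataset(dataset, "water.json.gz") would then produce a compressed file, while a name ending in ".json" would produce plain JSON.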

Using the standalone visualizer

The default chemiscope interface lives online, at https://chemiscope.org/. In some cases, however, you may not want to rely on an online tool for your own dataset, for example when the visualization is part of the supplementary information of a scientific article. For these use cases, a standalone, mostly offline visualizer exists that uses the same input file format as the default interface.

To create a standalone visualizer with your own dataset, please follow the steps below:

git clone https://github.com/cosmo-epfl/chemiscope
cd chemiscope
npm install
npm run build
python3 ./utils/generate_standalone.py

This will create a standalone.html file containing all the required HTML and JavaScript. You can then add your own dataset by appending the corresponding JSON file to the end of the standalone.html file.

cat standalone.html my-dataset.json > my-dataset.html
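On systems where cat is not available, the same concatenation can be done from Python. This is only a sketch of an equivalent to the command above: append_dataset is a hypothetical helper, and it simply writes the bytes of the HTML file followed by the bytes of the JSON file.

```python
from pathlib import Path


def append_dataset(standalone, dataset, output):
    """Write `standalone` followed by `dataset` to `output`, like `cat`."""
    html = Path(standalone).read_bytes()
    data = Path(dataset).read_bytes()
    Path(output).write_bytes(html + data)
```

For example, append_dataset("standalone.html", "my-dataset.json", "my-dataset.html") mirrors the shell command shown above.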