Creating chemiscope input files
When using the default chemiscope interface, all the structures and properties in a dataset are loaded from a single JSON file. This section describes how to generate such a JSON file, either using a pre-existing Python script that does most of the work for you, or by writing the JSON file directly. Since the resulting JSON file can be quite large and thus harder to share with collaborators, the default chemiscope interface can also load JSON files compressed with gzip.
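As a minimal sketch of the gzip option, the snippet below compresses an already written JSON file using only the Python standard library and reads it back. The file names and the placeholder dataset content are assumptions for illustration; a real chemiscope file contains more than this.

```python
import gzip
import json
import os
import tempfile

# Placeholder content, NOT a complete chemiscope dataset: only the
# compression round trip is illustrated here.
dataset = {"meta": {"name": "example dataset"}}

# Hypothetical file names inside a temporary directory
workdir = tempfile.mkdtemp()
plain_path = os.path.join(workdir, "dataset.json")
gzip_path = plain_path + ".gz"

# Write the uncompressed JSON file first
with open(plain_path, "w", encoding="utf-8") as fd:
    json.dump(dataset, fd)

# Compress the existing JSON file with gzip
with open(plain_path, "rb") as source, gzip.open(gzip_path, "wb") as target:
    target.write(source.read())

# Reading the compressed file back gives the same data
with gzip.open(gzip_path, "rt", encoding="utf-8") as fd:
    restored = json.load(fd)
```

When the file is produced with chemiscope.write_input(), this manual step should not be needed, since a path ending in '.gz' already produces a gzip-compressed file.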
Tools able to create chemiscope input
chemiscope Python module
The easiest way to create a JSON input file is to use the chemiscope Python module. Install the package with pip install chemiscope, and use chemiscope.write_input() or chemiscope.create_input() in your own script to generate the JSON file.
If all the properties you want to include in chemiscope are already stored in a file that ase can read, you can use the chemiscope-input command line script, which the chemiscope Python package also installs.
Note that chemiscope does not compute structural representations or dimensionality reduction, and you need to do this yourself or use another package such as ASAP.
ASAP
The ASAP structural analysis package is another tool that can directly generate an output in chemiscope format.
chemiscope functions reference
- chemiscope.write_input(path, frames, meta=None, properties=None, environments=None, settings=None, composition=False)
Create the input JSON file used by the default chemiscope visualizer, and save it to the given path.
- Parameters:
path (str) – name of the file to use to save the JSON data. If it ends with '.gz', a gzip-compressed file will be written
frames (list) – list of atomic structures. For now, only ase.Atoms objects are supported
meta (dict) – optional metadata of the dataset
properties (dict) – optional dictionary of additional properties
environments (list) – optional list of (structure id, atom id, cutoff) specifying which atoms have properties attached and how far out atom-centered environments should be drawn by default. Functions like all_atomic_environments() or librascal_atomic_environments() can be used to generate the list of environments in simple cases.
settings (dict) – optional dictionary of settings to use when displaying the data. Possible entries for the settings dictionary are documented in the chemiscope input file reference.
composition (bool) – optional, False by default. If True, information about the chemical composition will be added to the structure and atom properties
This function uses create_input() to generate the input data; see the documentation of that function for more information.
Here is a quick example of generating a chemiscope input, reading the structures from a file that ase can read and performing PCA using sklearn on a descriptor computed with another package.
    import ase
    from ase import io
    import numpy as np
    import sklearn
    from sklearn import decomposition
    import chemiscope

    frames = ase.io.read('trajectory.xyz', ':')

    # example property 1: list containing the energy of each structure,
    # from calculations performed beforehand
    energies = [ ... ]

    # example property 2: PCA projection computed using sklearn.
    # X contains a multi-dimensional descriptor of the structure
    X = np.array( ... )
    pca = sklearn.decomposition.PCA(n_components=3).fit_transform(X)

    properties = {
        "PCA": {
            "target": "atom",
            "values": pca,
            "description": "PCA of per-atom representation of the structures",
        },
        "energies": {
            "target": "structure",
            "values": energies,
            "units": "kcal/mol",
        },
    }

    chemiscope.write_input(
        path="chemiscope.json.gz",
        frames=frames,
        properties=properties,
        # This is required to display properties with `target: "atom"`
        environments=chemiscope.all_atomic_environments(frames),
    )
- chemiscope.create_input(frames=None, meta=None, properties=None, environments=None, settings=None, composition=False)
Create a dictionary that can be saved to JSON using the format used by the default chemiscope visualizer.
- Parameters:
frames (list) – list of atomic structures. For now, only ase.Atoms objects are supported
meta (dict) – optional metadata of the dataset, see below
properties (dict) – optional dictionary of additional properties, see below
environments (list) – optional list of (structure id, atom id, cutoff) specifying which atoms have properties attached and how far out atom-centered environments should be drawn by default. Functions like all_atomic_environments() or librascal_atomic_environments() can be used to generate the list of environments in simple cases.
settings (dict) – optional dictionary of settings to use when displaying the data. Possible entries for the settings dictionary are documented in the chemiscope input file reference.
composition (bool) – optional, False by default. If True, structure and atom properties containing information about the chemical composition will be added
The dataset metadata should be given in the meta dictionary; the possible keys are:

    meta = {
        'name': '...',         # str, dataset name
        'description': '...',  # str, dataset description
        'authors': [           # list of str, dataset authors, OPTIONAL
            '...',
        ],
        'references': [        # list of str, references for this dataset,
            '...',             # OPTIONAL
        ],
    }
The returned dictionary will contain all the properties defined on the ase.Atoms objects. Values in ase.Atoms.arrays are mapped to target = "atom" properties, while values in ase.Atoms.info are mapped to target = "structure" properties. The only exception is ase.Atoms.arrays["numbers"], which is always ignored. If you want to have the atomic numbers as a property, you should add it to properties manually.
Additional properties can be added with the properties parameter. This parameter should be a dictionary containing one entry for each property. Each entry contains a target attribute ('atom' or 'structure') and a set of values. values can be a Python list of float or string; a 1D numpy array of numeric values; or a 2D numpy array of numeric values. In the latter case, multiple properties will be generated along the second axis. For example, passing

    properties = {
        'cheese': {
            'target': 'atom',
            'values': np.zeros((300, 4)),
            # optional: property unit
            'units': 'random / fs',
            # optional: property description
            'description': 'a random property for example',
        }
    }
will generate four properties named cheese[1], cheese[2], cheese[3], and cheese[4], each containing 300 values.
It is also possible to pass a shortened representation of the properties, for instance:
    properties = {
        'cheese': np.zeros((300, 4)),
    }
In this case, the type of property (structure or atom) is deduced by comparing the number of atoms and the number of structures in the dataset to the length of the provided list/np.ndarray.
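The two conventions described above (one property per column of a 2D array, and deducing the target from the number of values) can be pictured with a small pure-Python sketch. expand_property is a hypothetical helper written for this illustration and is not part of chemiscope; nested lists stand in for numpy arrays, and raising an error when the count is ambiguous or matches nothing is a choice of this sketch, so the library's own handling of such cases may differ.

```python
def expand_property(name, values, n_structures, n_atoms):
    """Hypothetical helper: guess the target and split 2D values by column."""
    if n_structures == n_atoms:
        # Assumption of this sketch: refuse to guess when both counts agree
        raise ValueError("cannot tell 'atom' from 'structure' for " + name)

    # Deduce the target from the number of values
    if len(values) == n_atoms:
        target = "atom"
    elif len(values) == n_structures:
        target = "structure"
    else:
        raise ValueError("unexpected number of values for " + name)

    # 2D case: one property per column, numbered from 1
    if values and isinstance(values[0], (list, tuple)):
        n_columns = len(values[0])
        return {
            f"{name}[{i + 1}]": {
                "target": target,
                "values": [row[i] for row in values],
            }
            for i in range(n_columns)
        }

    # 1D case: a single property under the original name
    return {name: {"target": target, "values": list(values)}}


# 300 rows of 4 values, in a dataset assumed to hold 10 structures
# and 300 atoms in total
cheese = [[0.0] * 4 for _ in range(300)]
expanded = expand_property("cheese", cheese, n_structures=10, n_atoms=300)
```

Here expanded holds four entries, cheese[1] to cheese[4], each with target 'atom' and 300 values, which mirrors the naming described above.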
- chemiscope.all_atomic_environments(frames, cutoff=3.5)
Generate a list of environments containing all the atoms in the given frames. The optional spherical cutoff radius is used to display the environments in chemiscope.
- Parameters:
frames – iterable over structures (typically a list of frames)
cutoff (float) – spherical cutoff radius used when displaying the environments
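To make the shape of such an environment list concrete, here is a rough pure-Python sketch that works from the number of atoms in each structure instead of real frames. sketch_all_environments is a hypothetical stand-in, not the chemiscope function; zero-based indices, atom ids counted within each structure, and plain tuples are assumptions of this sketch.

```python
def sketch_all_environments(atom_counts, cutoff=3.5):
    """Hypothetical stand-in: one (structure id, atom id, cutoff) per atom.

    atom_counts gives the number of atoms in each structure.
    """
    return [
        (structure, atom, cutoff)
        for structure, n_atoms in enumerate(atom_counts)
        for atom in range(n_atoms)
    ]


# Two structures, with 3 and 2 atoms respectively
environments = sketch_all_environments([3, 2])
```

With these inputs the list has five entries, one per atom, all sharing the default cutoff of 3.5.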
- chemiscope.librascal_atomic_environments(frames, cutoff=3.5)
Generate the list of environments for the given frames, matching the behavior used by librascal when computing descriptors for only a subset of the atomic centers. The optional spherical cutoff radius is used to display the environments in chemiscope.
Only ase.Atoms are supported for the frames since that's what librascal uses.
- Parameters:
frames – iterable over ase.Atoms
cutoff (float) – spherical cutoff radius used when displaying the environments
chemiscope-input command line interface
Command-line utility to generate an input for chemiscope, the interactive structure-property explorer. It parses an input file containing atomic structures using the ASE I/O module and converts it into a JSON file that can be loaded in chemiscope. Frame and environment properties must be written in the same file as the atomic structures: we recommend the extended xyz format, which is flexible and simple. In all cases, this utility simply writes to the JSON file anything that is readable by ASE.
chemiscope-input [-h] [-o OUTPUT] [--only-atoms | --only-structures] [--name NAME]
[--description DESCRIPTION] [--authors [AUTHORS ...]]
[--references [REFERENCES ...]]
input
positional arguments
input
- input file containing the structures and properties (default: None)
options
-o OUTPUT, --output OUTPUT
- chemiscope output file in JSON format (default: None)
--only-atoms
- only use per-atom properties from the input file
--only-structures
- only use per-structure properties from the input file
--name NAME
- name of the dataset (default: )
--description DESCRIPTION
- description of the dataset (default: )
--references REFERENCES
- list of references for the dataset (default: [])
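As an illustration of the options listed above, the following sketch assembles a chemiscope-input call from Python. The file names, dataset name, and description are hypothetical, and the command is only run when the script is on the PATH and the input file exists, so the snippet is harmless on a machine without chemiscope installed.

```python
import os
import shutil
import subprocess

# Hypothetical file names and metadata, chosen for illustration only
command = [
    "chemiscope-input",
    "structures.xyz",                 # positional input file
    "-o", "dataset.json",             # output JSON file
    "--name", "Example dataset",
    "--description", "Structures and properties read with ASE",
]

# Run the tool only if it is installed and the input file is present
can_run = shutil.which(command[0]) is not None and os.path.exists(command[1])
if can_run:
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting issues for values that contain spaces, such as the dataset name.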