MLCF¶
elf¶
ElF - allows for the creation of electronic descriptors (ElFs: ELectronic Fingerprints) out of real space electron densities
-
class
mlc_func.elf.ElF(value, angles, basis, species, unitcell)[source]¶ Class defining the electronic descriptors used by MLCF. ElF stands for ELectronic Fingerprint
- Parameters
value (dict or np.ndarray) – value of descriptor can either be a complex (dict) or real (np.ndarray) tensor.
angles (np.ndarray (3)) – angles by which ElF was rotated into local coordinate system (used to rotate forces into same CS).
basis (dict) – basis for elf representation
species (str,) – atomic species (element symbol)
unitcell (np.ndarray (3,3)) – unitcell of the system (used by fold_back_coords during alignment)
density¶
-
class
mlc_func.elf.density.Density(rho, unitcell, grid)[source]¶ Class defining the density on a real space grid
- Parameters
rho (np.ndarray) – 3-dim real space density.
unitcell (np.ndarray (3,3)) – unitcell in Angstrom.
grid (np.ndarray (3,)) – grid points.
-
mesh_3d(rmin=[0, 0, 0], rmax=0, scaled=False, pbc=True, indexing='xy')[source]¶ Returns a 3d mesh taking into account periodic boundary conditions
- Parameters
rmin, rmax – lower and upper cutoff in every euclidean direction.
scaled (boolean) – scale the meshes with unitcell size?
pbc (boolean) – assume periodic boundary conditions?
indexing ('xy' or 'ij') – indexing scheme used by np.meshgrid.
- Returns
X, Y, Z – defines mesh.
- Return type
tuple of np.ndarray
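The effect of the pbc flag can be illustrated with a minimal pure-Python sketch that does not use mlc_func. The helper periodic_window is hypothetical and works on one axis only; it assumes integer grid indices that wrap around modulo the number of grid points, which may differ from how mesh_3d is implemented.

```python
def periodic_window(center, half_width, n_points, pbc=True):
    """Grid indices from center - half_width to center + half_width along one axis.

    With pbc=True, indices that leave the cell are wrapped back (modulo n_points);
    with pbc=False, they are discarded. (Hypothetical helper, not part of mlc_func.)
    """
    raw = range(center - half_width, center + half_width + 1)
    if pbc:
        return [i % n_points for i in raw]
    return [i for i in raw if 0 <= i < n_points]


# A window of half-width 2 around index 0 on an axis with 10 grid points
wrapped = periodic_window(0, 2, 10, pbc=True)
clipped = periodic_window(0, 2, 10, pbc=False)
```

Regarding the indexing argument: for three input axes of lengths M, N and P, np.meshgrid returns arrays of shape (N, M, P) with 'xy' indexing and (M, N, P) with 'ij' indexing.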
real_space¶
-
mlc_func.elf.real_space.S(r_i, r_o, nmax, gamma)[source]¶ Overlap matrix between radial basis functions
- Parameters
r_i (float) – inner radial cutoff
r_o (float) – outer radial cutoff
nmax (int) – max. number of radial functions
gamma (float) – damping parameter
- Returns
Overlap matrix
- Return type
np.ndarray (nmax, nmax)
-
mlc_func.elf.real_space.atomic_elf(pos, density, basis, chem_symbol)[source]¶ Given an input density and an atomic position decompose the surrounding charge density into an ELF
- Parameters
pos ((,3) np.ndarray) – atomic position
density (Density) – stores charge density rho, unitcell, and grid (see density.py)
basis (dict) – specifies the basis set used for the ELF decomposition for each chem. element
chem_symbol (str) – chemical element symbol
- Returns
dictionary containing the real ELF
- Return type
dict
-
mlc_func.elf.real_space.box_around(pos, radius, density)[source]¶ Return dictionary containing box around an atom at position pos with given radius. Dictionary contains box in mesh, euclidean and spherical coordinates
- Parameters
pos (np.ndarray (3,) or (1,3),) – coordinates for box center
radius (float) – box radius
density (Density) – only needed for Density.unitcell and Density.grid
- Returns
{‘mesh’,’real’,’radial’}, box in mesh, euclidean and spherical coordinates
- Return type
dict
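A minimal sketch of what a box in euclidean and spherical coordinates can look like, written in plain Python. The function names, the polar-angle convention (theta measured from the z-axis) and the reading of radius as half the box edge length are assumptions for illustration; only the dictionary keys ‘real’ and ‘radial’ are taken from the description above.

```python
import math


def to_spherical(x, y, z):
    """Convert Cartesian coordinates to (r, theta, phi).

    theta is the polar angle measured from the z-axis (0..pi),
    phi the azimuthal angle in the xy-plane (-pi..pi).
    """
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


def box_offsets(radius, spacing):
    """Displacements of a cubic grid spanning [-radius, radius] in every direction."""
    n = int(radius // spacing)
    steps = [k * spacing for k in range(-n, n + 1)]
    return [(dx, dy, dz) for dx in steps for dy in steps for dz in steps]


offsets = box_offsets(1.0, 0.5)
box = {
    'real': offsets,                                  # euclidean coordinates
    'radial': [to_spherical(*p) for p in offsets],    # spherical coordinates
}
```

Note that the corners of such a box lie farther from the center than radius, so a radial cutoff still has to be applied when integrating over it.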
-
mlc_func.elf.real_space.decompose(rho, box, n_rad, n_l, r_i, r_o, gamma, V_cell=1)[source]¶ Project the real space density rho onto a set of basis functions
- Parameters
rho (np.ndarray) – electron charge density on grid
box (dict) – contains the mesh in spherical and euclidean coordinates, can be obtained with box_around()
n_rad (int) – number of radial functions
n_l (int) – number of spherical harmonics
r_i (float) – inner radial cutoff in Angstrom
r_o (float) – outer radial cutoff in Angstrom
gamma (float) – exponential damping
V_cell (float) – volume of one grid cell
- Returns
dictionary containing the complex ELF
- Return type
dict
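The projection itself amounts to a discretized overlap integral. The following sketch shows the idea for arbitrary basis functions sampled on the same grid points as the density; the function name project, the toy basis and the choice to conjugate the basis functions are assumptions, not the library's implementation.

```python
def project(rho, basis_values, v_cell=1.0):
    """Approximate c_k = integral conj(psi_k(r)) * rho(r) d^3r by a Riemann sum.

    rho          : density values at the grid points (list of float)
    basis_values : one list per basis function, holding its (complex) values
                   at the same grid points
    v_cell       : volume of one grid cell, used as integration weight
    """
    return [
        sum(complex(psi).conjugate() * value for psi, value in zip(psi_k, rho)) * v_cell
        for psi_k in basis_values
    ]


# Toy example: four grid points and two basis functions that are orthonormal on them
psi_0 = [0.5, 0.5, 0.5, 0.5]
psi_1 = [0.5, 0.5j, -0.5, -0.5j]
coefficients = project([1.0, 1.0, 1.0, 1.0], [psi_0, psi_1])
```

A constant density only overlaps with the constant basis function, so the second coefficient vanishes.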
-
mlc_func.elf.real_space.g(r, r_i, r_c, a, gamma)[source]¶ Non-orthogonalized radial functions
- Parameters
r (float) – radius
r_i (float) – inner radial cutoff
r_c (float) – outer radial cutoff
a (int) – exponent (equiv. to radial index n)
gamma (float) – damping parameter
- Returns
value of radial function at radius r
- Return type
float
-
mlc_func.elf.real_space.get_W(r_i, r_o, n, gamma)[source]¶ Get matrix to orthonormalize radial basis functions
- Parameters
r_i (float) – inner radial cutoff
r_o (float) – outer radial cutoff
n (int) – max. number of radial functions
gamma (float) – damping parameter
- Returns
W, orthogonalization matrix
- Return type
np.ndarray
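How an overlap matrix S leads to an orthonormalization matrix W can be sketched without reference to the actual radial functions. The example below uses the toy basis r^a on [0, 1] with unit weight (an assumption chosen because its overlap is known analytically) and a Cholesky factorization; the library may use a different basis, integration weight, or orthogonalization scheme (for instance the symmetric S^(-1/2)).

```python
import math


def overlap(n):
    """Overlap S_ab = integral_0^1 r^a r^b dr = 1 / (a + b + 1) of the toy basis r^a."""
    return [[1.0 / (a + b + 1) for b in range(n)] for a in range(n)]


def cholesky(s):
    """Lower-triangular L with L L^T = s (s symmetric positive definite)."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def invert_lower(low):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(low)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for row in range(col, n):
            rhs = 1.0 if row == col else 0.0
            acc = rhs - sum(low[row][k] * inv[k][col] for k in range(col, row))
            inv[row][col] = acc / low[row][row]
    return inv


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


S = overlap(3)
W = invert_lower(cholesky(S))            # orthonormalization matrix
W_T = [list(col) for col in zip(*W)]
check = matmul(matmul(W, S), W_T)        # overlap of the new functions
```

The new functions are the linear combinations sum_a W[n][a] r^a; their overlap matrix W S W^T is the identity up to rounding.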
-
mlc_func.elf.real_space.get_elf_thread(pos, density, basis, chem_symbol, i, all_positions, mode)[source]¶ Method intended for parallel execution. One thread/process computes and orients the ElF for a single atom inside a system
-
mlc_func.elf.real_space.get_elfs(atoms, density, basis, view=<mlc_func.elf.serial_view.serial_view object>, orient_mode='none')[source]¶ Given an input density and an ASE Atoms object decompose the complete charge density into atomic ELFs
- Parameters
atoms (ase.Atoms) –
density (Density) – stores charge density rho, unitcell, and grid (see density.py)
basis (dict) – specifies the basis set used for the ELF decomposition for each chem. element
view (ipyparallel.balanced_view) – for parallel execution through sync map
orient_mode (str) – {‘none’: do not orient and return complex tensor, ‘elf’/’nn’: orient using the elf or nn algorithm and return real tensor}
- Returns
list containing the complex/real atomic ELFs
- Return type
list
-
mlc_func.elf.real_space.get_elfs_oriented(atoms, density, basis, mode, view=<mlc_func.elf.serial_view.serial_view object>)[source]¶ Outdated, use get_elfs() with orient_mode=’elf’/’nn’ instead. Like get_elfs, but returns real, oriented ElFs. mode = {‘elf’: use the ElF algorithm to orient the fingerprint,
‘nn’: use the nearest neighbor algorithm}
-
mlc_func.elf.real_space.mesh_around(pos, radius, density, unit='A')[source]¶ Similar to box_around but only returns mesh
-
mlc_func.elf.real_space.orient_elf(i, elf, all_pos, mode)[source]¶ Takes an ElF and orients it according to the rule specified in mode.
- Parameters
i (int) – Index of the atom in all_pos
elf (ElF) – ElF to orient
all_pos (numpy.ndarray) – positions of all atoms in system (including the one with index i)
mode (str,) – {‘elf’: use the ElF algorithm to orient the fingerprint, ‘nn’: use the nearest neighbor algorithm, ‘water’: molecular alignment (can only be used for neat water), ‘neutral’: keep alignment unchanged}
- Returns
oriented version of elf
- Return type
-
mlc_func.elf.real_space.orient_elfs(elfs, atoms, mode)[source]¶ Convenience function that applies orient_elf to a list of elfs. (Exists for compatibility reasons)
-
mlc_func.elf.real_space.radials(r, r_i, r_o, W, gamma)[source]¶ Get orthonormal radial basis functions
- Parameters
r (float) – radius
r_i (float) – inner radial cutoff
r_o (float) – outer radial cutoff
W (np.ndarray) – orthogonalization matrix
gamma (float) – damping parameter
- Returns
radial functions
- Return type
np.ndarray
geom¶
Module that provides algebraic operations on SO(3) tensors
-
mlc_func.elf.geom.fold_back_coords(i, coords, unitcell)[source]¶ Return the periodic images of coords in a unit-cell that are closest to coords[i]
- Parameters
i (int) – central atom
coords (np.ndarray (?, 3)) – all atomic positions in given system
unitcell (np.ndarray (3,3)) – unitcell in angstrom
- Returns
periodic images of coords
- Return type
np.ndarray (?, 3)
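The idea of folding coordinates back to the images closest to a central atom can be sketched as follows. This is a pure-Python illustration under stated assumptions, not the library's code: the cell is given as three lattice vectors (rows), and the wrap is done in fractional coordinates, which yields the nearest image for orthorhombic and mildly skewed cells only.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fold_back(i, coords, cell):
    """Shift every position by lattice vectors so that it lies close to coords[i]."""
    a, b, c = cell
    volume = dot(a, cross(b, c))
    # Rows of the inverse cell: fractional coordinate k of r is dot(r, recip[k])
    recip = [
        tuple(x / volume for x in cross(b, c)),
        tuple(x / volume for x in cross(c, a)),
        tuple(x / volume for x in cross(a, b)),
    ]
    center = coords[i]
    folded = []
    for pos in coords:
        delta = [p - q for p, q in zip(pos, center)]
        frac = [dot(delta, g) for g in recip]
        frac = [f - round(f) for f in frac]          # wrap into [-0.5, 0.5]
        shifted = [sum(frac[k] * cell[k][d] for k in range(3)) for d in range(3)]
        folded.append(tuple(q + s for q, s in zip(center, shifted)))
    return folded


cell = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
coords = [(0.5, 0.5, 0.5), (9.5, 0.5, 0.5), (5.0, 5.0, 5.0)]
images = fold_back(0, coords, cell)
```

In this example the second atom, 9 Å away inside the cell, is replaced by its image 1 Å away on the other side of the central atom.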
-
mlc_func.elf.geom.get_casimir(tensor)[source]¶ Get the casimir element (equiv. to L_2 norm) of a complex tensor
- Parameters
tensor (dict) – dictionary containing tensor in its complex form
- Returns
Casimir element {‘n,l’}
- Return type
dict
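As a sketch of the quantity described above, the following function sums the squared magnitudes over the projection m for every (n, l) channel. The key formats ‘n,l,m’ and ‘n,l’ follow this documentation; whether the library takes the square root, as done here, is an assumption.

```python
import math


def casimir(tensor):
    """Rotation-invariant norm per (n, l) channel: sqrt(sum_m |c_nlm|^2).

    tensor maps keys 'n,l,m' to complex coefficients.
    """
    sums = {}
    for key, value in tensor.items():
        n, ell, _m = key.split(',')
        channel = f'{n},{ell}'
        sums[channel] = sums.get(channel, 0.0) + abs(value) ** 2
    return {channel: math.sqrt(total) for channel, total in sums.items()}


toy = {'0,0,0': 1 + 0j, '0,1,-1': 3j, '0,1,0': 0j, '0,1,1': 4 + 0j}
norms = casimir(toy)
```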
-
mlc_func.elf.geom.get_elfcs_angles(i, coords, tensor)[source]¶ Get angles relating global coordinate system to local coordinate system (LCS) defined by electronic structure
- Parameters
i (int) – LCS around atom i
coords (np.ndarray (?, 3)) – all atomic positions in given system
tensor (dict) – complex tensor (electronic descriptor) to use for alignment
- Returns
Euler angles alpha, beta, gamma
- Return type
list of floats
-
mlc_func.elf.geom.get_euler_angles(co)[source]¶ Given a coordinate system co, return the euler angles that relate this CS to the standard CS
- Parameters
co (np.ndarray (3,3)) – coordinates of the body-fixed axes in the global coordinate system
- Returns
euler angles
- Return type
tuple of floats
-
mlc_func.elf.geom.get_max(tensor)[source]¶ Get the maximum radial index and maximum ang. momentum in tensor
-
mlc_func.elf.geom.get_nncs_angles(i, coords, tensor=None)[source]¶ Get angles relating global coordinate system to local coordinate system (LCS) defined by nearest neighbors
- Parameters
i (int) – LCS around atom i
coords (np.ndarray (?, 3)) – all atomic positions in given system
tensor (None) – placeholder
- Returns
Euler angles alpha, beta, gamma
- Return type
list of floats
-
mlc_func.elf.geom.make_complex(tensor_array, n_rad, n_l)[source]¶ Take real tensors provided as a np.ndarray and convert them into complex tensors represented as a dictionary
- Parameters
tensor_array (np.ndarray) – real tensor (ordering: radial ang.momentum projection like: s1 ppp1 ddddd1 s2 etc.)
n_rad (int,) – number of radials
n_l (int,) – maximum angular momentum
- Returns
dictionary containing complex tensor elements (keys: {‘n,l,m’})
- Return type
dict
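The ordering "s1 ppp1 ddddd1 s2" can be made explicit by enumerating the dictionary keys in that order. The sketch below assumes zero-based indices, angular momenta l = 0 .. n_l - 1 and projections running from -l to l; these conventions are guesses and should be checked against the library before relying on them.

```python
def tensor_keys(n_rad, n_l):
    """Keys 'n,l,m' with the radial index outermost, then l, then m = -l..l."""
    return [
        f'{n},{ell},{m}'
        for n in range(n_rad)
        for ell in range(n_l)
        for m in range(-ell, ell + 1)
    ]


keys = tensor_keys(2, 3)
```

Under these assumptions every radial function contributes n_l squared entries (1 + 3 + 5 for n_l = 3), so the flat array has n_rad * n_l**2 elements.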
-
mlc_func.elf.geom.make_real(tensor)[source]¶ Take complex tensors provided as a dict and convert them into real tensors
-
mlc_func.elf.geom.rotate_tensor(tensor, angles, inverse=False)[source]¶ Rotate a complex tensor.
- Parameters
tensor (dict) – complex rank-2 tensor to rotate; the tensor is expected to be complete i.e. no entries should be missing
angles (np.ndarray (3,)) – euler angles: alpha, beta, gamma
inverse (bool,) – {False: rotate vector, True: rotate CS}
- Returns
Rotated version of tensor
- Return type
dict
Remember that in nncs and elfcs alignment, inverse = True should be used
-
mlc_func.elf.geom.rotate_vector(vec, angles, inverse=False)[source]¶ Rotate a real vector (euclidean order: xyz) with euler angles
- Parameters
vec (np.ndarray (?, 3)) – vector(s) to rotate, note that if more than one vector is provided, every vector is rotated by the same angles.
angles (np.ndarray (3)) – euler angles: alpha, beta, gamma.
inverse (bool) – {False: rotate vector, True: rotate coordinate system}
- Returns
rotated vector(s)
- Return type
np.ndarray
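A minimal sketch of rotating a vector with Euler angles, including the inverse option. The z-y-z convention R = Rz(alpha) Ry(beta) Rz(gamma) is an assumption; the documentation does not state which convention rotate_vector uses. Rotating the coordinate system instead of the vector corresponds to applying the inverse rotation, which for a rotation matrix is its transpose.

```python
import math


def rot_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def euler_matrix(alpha, beta, gamma):
    """Rotation matrix Rz(alpha) Ry(beta) Rz(gamma) (z-y-z convention, assumed)."""
    return matmul(matmul(rot_z(alpha), rot_y(beta)), rot_z(gamma))


def rotate(vec, angles, inverse=False):
    """Apply the rotation to vec, or its inverse (the transpose) if inverse=True."""
    rot = euler_matrix(*angles)
    if inverse:
        rot = [list(col) for col in zip(*rot)]
    return [sum(r * v for r, v in zip(row, vec)) for row in rot]


angles = (0.3, 1.1, -0.7)
v = [1.0, 2.0, 3.0]
round_trip = rotate(rotate(v, angles), angles, inverse=True)
quarter_turn = rotate([1.0, 0.0, 0.0], (math.pi / 2, 0.0, 0.0))
```

Applying the rotation and then its inverse recovers the original vector, and a quarter turn about z maps the x-axis onto the y-axis.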
water¶
-
mlc_func.elf.water.get_water_angles(i, coords, tensor=None)[source]¶ Get euler angles to rotate to the water molecule centered CS for coords[i]. (Assumes that ordering in coords is OHHOHH…)
siesta¶
Utility functions for real-space grid properties
-
mlc_func.elf.siesta.get_atoms(path, n_atoms=-1)[source]¶ find atomic data in siesta .out file for first n_atoms atoms and return as ase Atoms object
-
mlc_func.elf.siesta.get_density(file_path)[source]¶ Import data from RHO file (or similar real-space grid files). Data is saved in global variables.
Structure of RHO file: the first three lines give the unit cell vectors, the fourth line gives the grid dimensions, and the subsequent lines give the density on the grid.
- Parameters
file_path (string) – path to RHO (or RHOXC) file from which density is read
- Returns
- Return type
-
mlc_func.elf.siesta.get_density_bin(file_path)[source]¶ Same as get_density, but for binary (unformatted) files
ml¶
This module contains everything regarding machine learning
energy_network¶
Module that implements Energy_Network, the machine learned correcting functional (MLCF) for energies
-
class
mlc_func.ml.network.Dataset(data, species)¶ -
property
data¶ Alias for field number 0
-
property
species¶ Alias for field number 1
-
class
mlc_func.ml.network.Energy_Network(subnets)[source]¶ Machine learned correcting functional (MLCF) for energies
- Parameters
subnets (list of Subnetwork) – each subnetwork belongs to a single atom inside the system and computes the atomic contribution to the total energy
-
get_cost()[source]¶ Build the tensorflow node that defines the cost function
- Returns
cost_list – list of costs for subnets. subnets whose outputs are added together share cost functions
- Return type
[tensorflow.placeholder]
-
get_energies(summarize=True, which='train')[source]¶ Uses trained model on training or test sets
- Parameters
which (str) – {‘train’,’test’} which set logits are computed for
- Returns
resulting energies grouped by independent subnet datasets
- Return type
list of numpy.ndarray
-
get_feed(which='train', train_valid_split=0.8, seed=42)[source]¶ Return a dictionary that can be used as a feed_dict in tensorflow
- Parameters
which ({'train','test'}) – which part of the dataset is used
train_valid_split (float) – ratio of train and validation set size
seed (int) – seed parameter for the random shuffle algorithm, make
- Returns
either (training feed dictionary, validation feed dict.) or (testing feed dictionary, None)
- Return type
(dictionary, dictionary)
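The roles of train_valid_split and seed can be illustrated with a seeded shuffle followed by a cut. This is a generic sketch using the standard library; the function name split_train_valid is hypothetical and the actual shuffle used by get_feed may differ.

```python
import random


def split_train_valid(samples, ratio=0.8, seed=42):
    """Shuffle a copy of samples with a fixed seed and cut it at the given ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(ratio * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


train, valid = split_train_valid(range(10))
train_again, valid_again = split_train_valid(range(10))
```

Using the same seed reproduces the same split, which keeps validation results comparable between runs.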
-
load_all(net_dir)[source]¶ Loads the model in net_dir including all subnets and datasets using pickle
-
predict(features, species, use_masks=False, return_gradient=False)[source]¶ Get predicted energies
- Parameters
features (np.ndarray) – input features
species (str) – predict atomic contribution to energy for this species
use_masks (bool) – whether masks should be applied to the provided features
return_gradient (bool) – instead of returning energies, return gradient of network w.r.t. input features
- Returns
predicted energies or gradient
- Return type
np.ndarray
-
save_all(net_dir, override=False)[source]¶ Saves the model including all subnets and datasets using pickle to directory net_dir, if directory exists only save if override = True
-
train(step_size=0.01, max_steps=50001, b_=0, verbose=True, optimizer=None, adaptive_rate=False, multiplier=1.0)[source]¶ Train the master neural network
- Parameters
step_size (float) – step size for gradient descent
max_steps (int) – number of training epochs
b_ (float) – regularization parameter
verbose (boolean) – print cost for intermediate training epochs
optimizer (tf.train.GradientDescentOptimizer, tf.train.AdamOptimizer, ..) –
adaptive_rate (boolean) – whether to adjust step_size if cost increases; not recommended for AdamOptimizer
multiplier (list of float) – multipliers that allow giving some datasets more weight than others
- Returns
- Return type
None
-
class
mlc_func.ml.network.Subnet[source]¶ Subnetwork that is associated with one Atom
-
add_dataset(dataset, targets, test_size=0.2, target_filter=None, scale=True)[source]¶ Adds dataset to the subnetwork.
- Parameters
dataset (dataset) – contains datasets that will be associated with subnetwork for training and evaluation
targets (np.ndarray) – target values for training and evaluation
- Returns
- Return type
None
-
get_feed(which, train_valid_split=0.8, seed=None)[source]¶ Return a dictionary that can be used as a feed_dict in tensorflow
- Parameters
which (str,) – {‘train’, ‘valid’, ‘test’} which part of the dataset is used
train_valid_split (float) – ratio of train and validation set size
seed (int) – seed parameter for the random shuffle algorithm
- Returns
- Return type
dict
-
-
mlc_func.ml.network.build_energy_mlcf(feature_src, target_src, masks={}, automask_std=0, filters=[], autofilt_percent=0, test_size=0.2)[source]¶ Return a trainable energy MLCF (neural network)
- Parameters
feature_src (list) – list of paths to the hdf5 containing the features
target_src (list) – list of paths to the csv files containing the target energies; entries in target_src and feature_src correspond to each other
masks (dict,) – containing lists of booleans; can be used to select which features to use. keys specify the atomic species. default: use all features
automask_std (float,) – if mask not set exclude all features whose stdev across dataset is smaller than this value
filters (list,) – containing list of booleans; can be used to exclude datapoints in sets (e.g. outliers)
autofilt_percent (float,) – exclude this percentile of extreme datapoints from set (only if filters not set)
test_size (float,) – relative size of hold_out (test) set
- Returns
- Return type
-
mlc_func.ml.network.get_energy_filters(target_src, autofilt_percent=0)[source]¶ For a given energy target dataset return filters that cut off the upper and lower percentile specified in autofilt_percent
- Parameters
target_src (str) – path of csv file containing energy targets
autofilt_percent (float) – percentile to cut off
- Returns
filters
- Return type
list of bool
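A percentile filter of this kind can be sketched in a few lines. The example assumes that autofilt_percent is cut from each tail of the distribution and that percentiles are linearly interpolated; both are assumptions, and the helper names are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0..100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def energy_filters(energies, autofilt_percent=0):
    """True for every entry that lies between the lower and upper cutoff percentile."""
    low = percentile(energies, autofilt_percent)
    high = percentile(energies, 100 - autofilt_percent)
    return [low <= e <= high for e in energies]


energies = [float(i) for i in range(11)]          # 0.0, 1.0, ..., 10.0
filters = energy_filters(energies, autofilt_percent=10)
```

Here the smallest and the largest energy are excluded, while autofilt_percent=0 keeps every data point.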
force_network¶
Module that implements Force_Network, the machine learned correcting functional (MLCF) for forces
-
class
mlc_func.ml.force_network.Force_Network(species, scaler, basis, datasets={}, mask=[], n_layers=3, nodes_per_layer=8, b=0)[source]¶ MLCF for force prediction
- Parameters
species (str) – chemical element symbol
scaler (sklearn Scaler) –
basis (dict) – basis that was used to create electronic descriptors
datasets (dict) – datasets provided as {‘X_train’: np.ndarray, ‘X_test’: etc…}
mask (list of bool) – used to mask the features and filter out features with low variance
n_layers (int) – number of hidden layers, default = 3
nodes_per_layer (int) – nodes for each hidden layer, default = 8
b (float) – l2-regularization strength, default = 0
-
evaluate(plot=False, on='test')[source]¶ Evaluate model performance
- Parameters
plot (bool) – plot correlation plots
on (str) – {‘test’,’train’,’valid’} which set to evaluate on
- Returns
containing rmse, mae and max. abs. error
- Return type
dict
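The three error measures named above are defined as follows. The sketch is independent of the network; the dictionary keys are hypothetical and need not match those returned by evaluate.

```python
import math


def error_metrics(predicted, target):
    """Root mean squared error, mean absolute error and maximum absolute error."""
    errors = [p - t for p, t in zip(predicted, target)]
    return {
        'rmse': math.sqrt(sum(e * e for e in errors) / len(errors)),
        'mae': sum(abs(e) for e in errors) / len(errors),
        'max': max(abs(e) for e in errors),
    }


metrics = error_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 8.0])
```

A single outlier affects the maximum error most, the RMSE less, and the MAE least, which is why reporting all three is informative.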
-
learning_curve(steps=5)[source]¶ Create a learning curve by varying the training set size
- Parameters
steps (int) – how many different training set sizes to use
- Returns
{‘N’: training set size,’train’: training loss, ‘valid’: validation loss}
- Return type
dict
-
load_all(net_dir)[source]¶ Load force MLCF from net_dir
- Parameters
net_dir (str) – path to directory containing MLCF
-
predict(feat, processed=False)[source]¶ Get predicted forces
- Parameters
feat (np.ndarray) – input features
processed (bool) – are features processed (scaled, masked)?
- Returns
predicted forces
- Return type
np.ndarray
-
predict_from_hdf5(path)[source]¶ Get force prediction but instead of providing features, give source path where features are found
- Parameters
path (str) – path to .hdf5 file containing features
- Returns
force prediction
- Return type
np.ndarray
-
save_all(net_dir, override=False)[source]¶ Save force MLCF
- Parameters
net_dir (str) – directory to save mlcf to
override (bool) – if net_dir already contains a model, allow overriding it? default = False
- Returns
- Return type
None
-
train(step_size=0.001, max_epochs=50001, b=0, early_stopping=False, batch_size=500, epochs_per_output=500, restart=False, tol_train=0, tol_valid=0)[source]¶ Train the model
- Parameters
step_size (float) – step size to take during gradient descent, default=0.001
max_epochs (int) – max. number of epochs to train, default=50001
b (float) – l2-regularization
early_stopping (bool) – use early stopping (interrupt training once valid loss increases), default=False
batch_size (int) – number of samples per batch, default=500
epochs_per_output (int) – only print overview every epochs_per_output steps, default=500
restart (bool) – restart training from beginning (reset network), default=False
tol_train (float) – stop training if relative value of training loss decreases by less than this value
tol_valid (float) – stop training if relative value of validation loss decreases by less than this value
- Returns
- Return type
None
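The stopping criteria described by early_stopping, tol_train and tol_valid can be sketched as a check on the loss histories. This is one possible reading of the parameter descriptions, not the library's implementation; in particular, treating a tolerance of 0 as "disabled" and comparing only the last two recorded losses are assumptions.

```python
def relative_decrease(history):
    """Relative decrease of the loss between the last two recorded values."""
    previous, current = history[-2], history[-1]
    if previous == 0:
        return 0.0
    return (previous - current) / abs(previous)


def should_stop(train_loss, valid_loss, tol_train=0.0, tol_valid=0.0, early_stopping=False):
    """Decide whether training should stop, given the loss histories so far."""
    if len(train_loss) < 2 or len(valid_loss) < 2:
        return False
    if early_stopping and valid_loss[-1] > valid_loss[-2]:
        return True                      # validation loss started to increase
    if tol_train > 0 and relative_decrease(train_loss) < tol_train:
        return True                      # training loss has stalled
    if tol_valid > 0 and relative_decrease(valid_loss) < tol_valid:
        return True                      # validation loss has stalled
    return False
```

For example, should_stop([1.0, 0.999], [1.0, 0.9], tol_train=0.01) signals a stop because the training loss improved by only 0.1 %.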
-
mlc_func.ml.force_network.build_force_mlcf(feature_src, target_src, traj_src, species, mask=[], filters=[], automask_std=0, autofilt_percent=0, test_size=0.2, random_state=42)[source]¶ Return a trainable force MLCF (neural network)
- Parameters
feature_src (list) – list of paths to the hdf5 containing the features
target_src (list) – list of paths to the csv files containing the target forces; entries in target_src and feature_src correspond to each other
traj_src (list) – list of paths to the .traj/.xyz files (needed to determine species of each atom)
species (string) – containing the species that model should be fitted for
mask (list) – containing booleans; can be used to select which features to use. default: use all features
filters (list) – containing list of booleans; can be used to exclude datapoints in sets (e.g. outliers)
automask_std (float) – if mask not set exclude all features whose stdev across dataset is smaller than this value
autofilt_percent (float) – exclude this percentile of extreme datapoints from set (only if filters not set)
test_size (float) – relative size of hold_out (test) set
random_state (int) – state used to perform shuffle before splitting dataset
- Returns
- Return type
ensemble_network¶
-
class
mlc_func.ml.ensemble_network.Ensemble_Network(network, n=3)[source]¶ Ensemble Network to obtain confidence intervals for predictions
- Parameters
network (Network) – root network from which to create ensemble
n (int) – ensemble size
-
predict(feat, processed=False)[source]¶ Get mean prediction across ensemble
- Parameters
feat (np.ndarray) – input features
processed (bool) – are features processed (scaled, masked)?
- Returns
mean prediction
- Return type
np.ndarray
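How an ensemble yields a mean prediction together with a measure of confidence can be sketched with plain callables standing in for the member networks. The function name ensemble_predict and the use of the standard deviation as the spread are assumptions for illustration; the documented predict method returns only the mean.

```python
import statistics


def ensemble_predict(models, features):
    """Mean and standard deviation of the member predictions for every sample."""
    per_model = [model(features) for model in models]      # one list per member
    per_sample = list(zip(*per_model))                     # one tuple per sample
    mean = [statistics.mean(p) for p in per_sample]
    spread = [statistics.pstdev(p) for p in per_sample]
    return mean, spread


# Three toy "networks": linear models with slightly different slopes
models = [lambda xs, a=a: [a * x for x in xs] for a in (0.9, 1.0, 1.1)]
mean, spread = ensemble_predict(models, [1.0, 2.0])
```

The spread grows where the members disagree, which is the basis for the confidence intervals mentioned above.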
-
predict_from_hdf5(path)[source]¶ Get mean prediction across ensemble but instead of providing features, give source path where features are found
- Parameters
path (str) – path to .hdf5 file containing features
- Returns
mean prediction
- Return type
np.ndarray
-
save(net_dir, override=False)[source]¶ Save ensemble Network
- Parameters
net_dir (str) – directory where to save network
override (bool) – if network already exists, allow overriding it? default = False
md¶
Defines ASE calculators that combine a baseline method (e.g. SIESTA) and the Machine learned correcting functional (MLCF)
calculator¶
-
class
mlc_func.md.calculator.MLCF_Calculator(base_calculator=None, feature_getter=None, log_accuracy=True)[source]¶ MLCF_Calculator that consists of a baseline calculator (base_calculator), e.g. Siesta, and a Machine learned correcting functional (MLCF).
- Parameters
base_calculator (ase.Calculator) – baseline method to get first approximation to forces
feature_getter (FeatureGetter) – reads the electron density and transforms it into features (electronic fingerprints)
log_accuracy (bool) – whether to log energies, forces and features during MD simulation, default: True
-
set_base_calculator(base_calculator)[source]¶ Sets the baseline calculator
- Parameters
base_calculator (ase.Calculator) – any ASE calculator can be used (only tested for SiestaCalculator)
-
set_feature_getter(feature_getter)[source]¶ Sets the feature_getter
- Parameters
feature_getter (FeatureGetter) –
-
class
mlc_func.md.calculator.Siesta_Calculator(basis='qz', xc='BH')[source]¶ Provides default options for the Siesta calculator, such as pre-defined custom basis sets and functionals.
- Parameters
basis (str) – {‘dz_custom’,’dz’,’qz_custom’,’sz’,’uf’,’szp’}, basis set to be used
xc (str) – {‘PBE’,’BH’,’PW92’,’revPBE’, …}, exchange-correlation functional
-
mlc_func.md.calculator.load_from_file(input_file)[source]¶ Read input_file that defines baseline calculator and MLCF model and return MLCF_Calculator
-
mlc_func.md.calculator.load_mlcf(model_path, client=None)[source]¶ Given a directory model_path, load and return a calculator that uses the MLCF contained in that directory. For parallel computing an ipyparallel client can be provided. The baseline calculator of the returned instance still has to be set before usage.
feature_io¶
-
class
mlc_func.md.feature_io.DescriptorGetter(basis, client=None, rhopath='./H2O.RHOXC')[source]¶ Reads the real space electron density and returns electronic descriptors
- Parameters
basis (dict) – dictionary defining the basis
client (ipyparallel.client) – for parallel processing
rhopath (str) – path under which the electron density can be found after every MD step
-
get_features(atoms)[source]¶ Return the electronic descriptors for a set of atoms, electron density is read from file specified in self.rhopath
- Parameters
atoms (ase.Atoms) –
listintegrator¶
mixer¶
-
class
mlc_func.md.mixer.Mixer(fast_calculator, slow_calculator, n, correct_species='')[source]¶ ASE Calculator that uses the time step mixing method defined in E. Anglada, J. Junquera, and J. M. Soler, Physical Review E 68, 055701 (2003)
- Parameters
fast_calculator (ase.Calculator) – fast, “quick and dirty” method
slow_calculator (ase.Calculator) – slow accurate method
n (int,) – Mixing parameter, correct with slow calculator after n steps
correct_species (str) – only apply corrections to elements specified in this string. Example: ‘oh’ only corrects oxygen and hydrogen
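The idea of correcting a fast method with a slow one every n steps can be sketched as below. The sketch assumes that the correction is the force difference scaled by n, as in impulse-type multiple-time-step integrators, and that the correction is applied on steps divisible by n; whether Mixer does exactly this is not stated here. The function mixed_forces and its callable arguments are hypothetical and not part of the ASE or mlc_func API.

```python
def mixed_forces(step, fast_forces, slow_forces, n):
    """Forces for MD step number `step` (counted from 0).

    fast_forces, slow_forces : callables returning a list of force components.
    Every n-th step the fast forces are corrected by n * (slow - fast);
    on all other steps only the fast method is evaluated.
    """
    fast = fast_forces()
    if step % n != 0:
        return fast
    slow = slow_forces()
    return [f + n * (s - f) for f, s in zip(fast, slow)]


# Toy forces: the fast method underestimates a constant force of 1.5
trajectory = [mixed_forces(step, lambda: [1.0], lambda: [1.5], n=4) for step in range(4)]
average = sum(f[0] for f in trajectory) / len(trajectory)
```

In this toy case the force averaged over one cycle of n steps equals the slow, accurate force, while the slow method is evaluated only once per cycle.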