The MBAR Reweighting Layer
The ReweightingLayer is a calculation layer which employs the Multistate Bennett Acceptance Ratio (MBAR) method to calculate observables at states which have not been previously simulated, but for which simulations have been previously run at similar states and their data cached. It inherits from the WorkflowCalculationLayer base layer, and primarily makes use of the built-in workflow engine to perform the required calculations.
Because MBAR reprocesses existing simulation data rather than running new simulations, it is typically several-fold faster than the simulation layer, provided that cached simulation data (made accessible via a storage backend) is available. Any properties for which the required data (see Calculation Schema) is not available will be skipped.
Theory
The theory behind applying MBAR to reweighting observables from a simulated state to an unsimulated state is covered in detail in the publication Configuration-Sampling-Based Surrogate Models for Rapid Parameterization of Non-Bonded Interactions.
Calculation Schema
The reweighting layer will be provided with one ReweightingSchema per type of property that it is being requested to estimate. This schema builds on the base WorkflowCalculationSchema, providing an additional storage_queries attribute.
The storage_queries attribute will contain a dictionary of SimulationDataQuery objects which will be used by the layer to access the data required for each property from the storage backend. Each key in this dictionary will correspond to the key of a piece of metadata made available to the property workflows.
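The relationship between the schema and its queries can be sketched with plain Python dataclasses. This is only an illustration of the data structure described above: DataQuery, BaseWorkflowSchema, ReweightingSchemaSketch, the workflow_name field and the "cached_data" key are all hypothetical stand-ins, not the library's own classes or names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DataQuery:
    """Hypothetical stand-in for a simulation data query."""

    substance: Optional[str] = None


@dataclass
class BaseWorkflowSchema:
    """Hypothetical stand-in for the base workflow schema."""

    workflow_name: str = ""


@dataclass
class ReweightingSchemaSketch(BaseWorkflowSchema):
    """Extends the base schema with one extra attribute."""

    # Maps a metadata key to the query used to fetch the data stored under it.
    storage_queries: Dict[str, DataQuery] = field(default_factory=dict)


schema = ReweightingSchemaSketch(
    workflow_name="density_reweighting",  # illustrative name
    storage_queries={"cached_data": DataQuery()},  # illustrative key
)

# Each dictionary key becomes the key of a piece of workflow metadata.
metadata_keys = sorted(schema.storage_queries)
```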
Default Metadata
The reweighting layer makes available the default metadata provided by the parent workflow layer in addition to any cached data retrieved via the schema's storage_queries.
When building the metadata for each property, a copy of the query will be made and any of the supported attributes (currently only substance) whose values are set as PlaceholderValue objects will have their values updated using values directly from the property. This query will then be passed to the storage backend to retrieve any matching data.
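The placeholder substitution described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the layer's actual implementation: Placeholder, Query, Property, the phase field and resolve_query are hypothetical stand-ins written in plain Python.

```python
import copy
from dataclasses import dataclass
from typing import Any


class Placeholder:
    """Hypothetical marker meaning 'fill this in from the property'."""


@dataclass
class Query:
    """Hypothetical stand-in for a storage query."""

    substance: Any = None
    phase: Any = None  # illustrative extra field, left untouched below


@dataclass
class Property:
    """Hypothetical stand-in for the property being estimated."""

    substance: str


# Attributes whose placeholders may be filled in (only substance here).
SUPPORTED_ATTRIBUTES = ("substance",)


def resolve_query(template: Query, target: Property) -> Query:
    """Copy the template and replace placeholders with property values."""
    query = copy.deepcopy(template)  # the schema's own query is not modified

    for name in SUPPORTED_ATTRIBUTES:
        if isinstance(getattr(query, name), Placeholder):
            setattr(query, name, getattr(target, name))

    return query


template = Query(substance=Placeholder(), phase="liquid")
resolved = resolve_query(template, Property(substance="CO"))
```

Working on a copy means the same template query can be reused for every property of a given type, with each property supplying its own substance.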
The matching data will be stored as a list of tuples of the form:
(object_path, data_directory, force_field_path)
where object_path is the file path to the stored dataclass, data_directory is the file path to the ancillary data directory, and force_field_path is the file path to the force field parameters which were originally used to generate the data.
This list of tuples will be made available as metadata under the key that was associated with the query.
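As a rough sketch of the resulting structure, the snippet below merges query results into a metadata dictionary and then unpacks the tuples. The build_metadata helper, the "cached_data" key, the "target_temperature" entry and all file paths are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# (object_path, data_directory, force_field_path)
StoredData = Tuple[str, str, str]


def build_metadata(
    default_metadata: Dict[str, object],
    query_results: Dict[str, List[StoredData]],
) -> Dict[str, object]:
    """Combine the default metadata with the data matched by each query."""
    metadata = dict(default_metadata)

    for key, matches in query_results.items():
        # The matches are stored under the key associated with the query.
        metadata[key] = list(matches)

    return metadata


default_metadata = {"target_temperature": "298.15 K"}  # illustrative entry
query_results = {
    "cached_data": [
        ("stored/data_0.json", "stored/data_0", "stored/force_field_0.json"),
        ("stored/data_1.json", "stored/data_1", "stored/force_field_1.json"),
    ]
}

metadata = build_metadata(default_metadata, query_results)

# A workflow could then unpack each tuple, for example to collect the
# force field files used to generate the cached data.
force_field_paths = [
    force_field_path for _, _, force_field_path in metadata["cached_data"]
]
```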