The MBAR Reweighting Layer

The ReweightingLayer is a calculation layer that employs the Multistate Bennett Acceptance Ratio (MBAR) method to estimate observables at states which have not themselves been simulated, but which are similar to states whose simulation data has previously been generated and cached. It inherits from the WorkflowCalculationLayer base layer and primarily uses the built-in workflow engine to perform the required calculations.

Because MBAR reprocesses existing simulation data rather than running new simulations, it is typically several-fold faster than the simulation layer, provided that cached simulation data is available through a storage backend. Any property for which the required data (see Calculation Schema) is not available will be skipped.

Theory

The theory behind applying MBAR to reweighting observables from a simulated state to an unsimulated state is covered in detail in the publication Configuration-Sampling-Based Surrogate Models for Rapid Parameterization of Non-Bonded Interactions.
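To make the idea concrete, the following is a minimal NumPy sketch of MBAR reweighting: it solves the MBAR self-consistent equations for the dimensionless free energies of the sampled states, then weights every pooled sample to estimate an observable at an unsampled target state. It assumes reduced potentials (energies divided by k_B T) have already been evaluated for every sample at every sampled state and at the target state. The function names are illustrative only, and a production implementation would normally rely on a dedicated package such as pymbar rather than this simple fixed-point iteration.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.sum(np.exp(a - a_max), axis=axis))


def solve_mbar(u_kn, n_k, tolerance=1e-10, max_iterations=10000):
    """Solve the MBAR equations for the free energies f_k of the sampled states.

    u_kn: (K, N) reduced potential of each of the N pooled samples at each state k.
    n_k:  (K,) number of samples drawn from each sampled state.
    """
    u_kn = np.asarray(u_kn, dtype=float)
    n_k = np.asarray(n_k, dtype=float)
    if int(round(n_k.sum())) != u_kn.shape[1]:
        raise ValueError("sum(n_k) must equal the number of pooled samples")

    log_n_k = np.log(n_k)
    f_k = np.zeros(len(n_k))
    for _ in range(max_iterations):
        # log of the mixture denominator, sum_k N_k exp(f_k - u_kn), per sample.
        log_denominator = _logsumexp(log_n_k[:, None] + f_k[:, None] - u_kn, axis=0)
        # Self-consistent update: f_i = -ln sum_n exp(-u_in) / denominator_n.
        new_f_k = -_logsumexp(-u_kn - log_denominator[None, :], axis=1)
        new_f_k -= new_f_k[0]  # free energies are only defined up to a constant
        if np.max(np.abs(new_f_k - f_k)) < tolerance:
            return new_f_k
        f_k = new_f_k
    raise RuntimeError("MBAR iteration did not converge")


def reweight_observable(u_kn, n_k, u_target_n, observable_n):
    """Estimate <A> at an unsampled target state from the pooled samples."""
    u_kn = np.asarray(u_kn, dtype=float)
    n_k = np.asarray(n_k, dtype=float)
    f_k = solve_mbar(u_kn, n_k)
    log_denominator = _logsumexp(np.log(n_k)[:, None] + f_k[:, None] - u_kn, axis=0)
    # Unnormalised log-weight of each sample at the target state.
    log_w = -np.asarray(u_target_n, dtype=float) - log_denominator
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    return float(np.sum(w * np.asarray(observable_n, dtype=float)))
```

With a single sampled state the weights reduce to exponential (Zwanzig) reweighting, the simplest special case of the same formula; pooling several nearby sampled states is what lets MBAR reach target states that no single simulation samples well.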

Calculation Schema

The reweighting layer is provided with one ReweightingSchema per type of property that it is requested to estimate. This schema builds on the base WorkflowCalculationSchema, adding a storage_queries attribute.

The storage_queries attribute contains a dictionary of SimulationDataQuery objects, which the layer uses to retrieve the data required for each property from the storage backend. Each key in this dictionary is used as the key of a piece of metadata made available to the property workflows.

Default Metadata

In addition to the default metadata provided by the parent workflow layer, the reweighting layer makes available any cached data retrieved via the schema's storage_queries.

When building the metadata for each property, the layer makes a copy of each query and replaces the value of any supported attribute (currently only substance) that is set to a PlaceholderValue object with the corresponding value taken directly from the property. The resulting query is then passed to the storage backend to retrieve any matching data.
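The query resolution step can be sketched in plain Python as below. Every class here (PlaceholderValue, SimulationDataQuery, PhysicalProperty) is a hypothetical stand-in for the real types, reduced to the single supported attribute, substance; only the copy-then-substitute logic is meant to mirror the description above.

```python
import copy
from dataclasses import dataclass
from typing import Any


class PlaceholderValue:
    """Hypothetical marker meaning 'fill this value in from the property'."""


@dataclass
class SimulationDataQuery:
    """Hypothetical stand-in for a storage query."""
    substance: Any = None


@dataclass
class PhysicalProperty:
    """Hypothetical stand-in for the property being estimated."""
    substance: Any


# Attributes that may be filled in from the property (currently only substance).
SUPPORTED_ATTRIBUTES = ("substance",)


def resolve_query(query: SimulationDataQuery, physical_property: PhysicalProperty) -> SimulationDataQuery:
    """Return a copy of `query` with placeholder attributes taken from the property."""
    resolved = copy.deepcopy(query)  # the schema's own query is never modified
    for attribute in SUPPORTED_ATTRIBUTES:
        if isinstance(getattr(resolved, attribute), PlaceholderValue):
            setattr(resolved, attribute, getattr(physical_property, attribute))
    return resolved
```

Copying before substituting matters because one schema, and therefore one set of queries, is shared by every property of the same type, while each property needs a query bound to its own substance.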

The matching data will be stored as a list of tuples of the form:

(object_path, data_directory, force_field_path)

where object_path is the file path to the stored dataclass, data_directory is the file path to the directory of ancillary data, and force_field_path is the file path to the force field parameters originally used to generate the data.

This list of tuples is made available as metadata under the key associated with the query.
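A rough sketch of how the per-property metadata might be assembled follows. The queries are assumed to have already been resolved against the property, the storage backend is modelled as a hypothetical callable mapping a query to a list of (object_path, data_directory, force_field_path) tuples, and the rule that a property is skipped as soon as any query matches nothing is an assumption about how "required data not available" is decided.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# One cached entry: (object_path, data_directory, force_field_path).
StoredDataTuple = Tuple[str, str, str]


def build_property_metadata(
    base_metadata: Dict[str, Any],
    storage_queries: Dict[str, Any],
    query_backend: Callable[[Any], List[StoredDataTuple]],
) -> Optional[Dict[str, Any]]:
    """Merge the default metadata with the cached data found for each query.

    Returns None, meaning the property should be skipped, if any query
    matches no cached data (ASSUMPTION: illustrative skip rule).
    """
    metadata = dict(base_metadata)  # copy so the default metadata is not mutated
    for key, query in storage_queries.items():
        matches = query_backend(query)  # hypothetical storage backend lookup
        if not matches:
            return None
        # The query's key becomes the metadata key holding the list of tuples.
        metadata[key] = list(matches)
    return metadata
```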