Tutorial 03 - Analysing Data Sets

In this tutorial we will be analysing the results of the calculations which we performed in the second tutorial. The tutorial will cover:

  • comparing the estimated data set with the experimental data set.

  • plotting the two data sets.

Note: If you are running this tutorial in Google Colab, you will need to run a setup script instead of following the installation instructions:

[ ]:
# !wget https://raw.githubusercontent.com/openforcefield/openff-evaluator/main/docs/tutorials/colab_setup.ipynb
# %run colab_setup.ipynb

For the sake of clarity all warnings will be disabled in this tutorial:

[ ]:
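import logging
import warnings

# Silence Python warnings and any log messages below the ERROR level.
warnings.filterwarnings("ignore")
logging.getLogger().setLevel(logging.ERROR)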
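
Disabling every warning globally keeps the output short, but it can also hide genuine problems. As a minimal sketch of a narrower alternative (not part of the original tutorial), the standard library's `warnings.catch_warnings` context manager limits the suppression to a single block. Here `noisy_calculation` is a hypothetical stand-in for any call which emits warnings:

```python
import warnings


def noisy_calculation():
    """Hypothetical stand-in for a library call which emits a warning."""
    warnings.warn("this value is only approximate", UserWarning)
    return 42.0


# Inside this context manager all warnings are ignored...
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    quiet_result = noisy_calculation()

# ...and the previous filters are restored on exit. Here the warnings are
# recorded rather than printed so that they can be inspected afterwards.
with warnings.catch_warnings(record=True) as caught_warnings:
    warnings.simplefilter("always")
    loud_result = noisy_calculation()

print(quiet_result, loud_result, len(caught_warnings))
```

The filters changed inside each `with` block are undone when the block ends, so warnings raised elsewhere in the notebook remain visible.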


Loading the Data Sets

We will begin by loading both the experimental data set and the estimated data set:

[ ]:
import pathlib

from openff.evaluator.datasets import PhysicalPropertyDataSet

experimental_data_set_path = "filtered_data_set.json"
estimated_data_set_path = "estimated_data_set.json"

# If you have not yet completed the previous tutorials or do not have the data set files
# available, this tutorial will use copies provided by the framework

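if not (
    pathlib.Path(experimental_data_set_path).exists()
    and pathlib.Path(estimated_data_set_path).exists()
):
    from openff.evaluator.utils import get_data_filename

    # Each `...` stands for the relative path of the packaged data set file.
    experimental_data_set_path = get_data_filename(...)
    estimated_data_set_path = get_data_filename(...)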

experimental_data_set = PhysicalPropertyDataSet.from_json(experimental_data_set_path)
estimated_data_set = PhysicalPropertyDataSet.from_json(estimated_data_set_path)

If everything went well in the previous tutorials, these data sets will contain the density and enthalpy of vaporization (\(H_{vap}\)) of ethanol and isopropanol:

[ ]:
experimental_data_set.to_pandas().head()

[ ]:
estimated_data_set.to_pandas().head()

Extracting the Results

We will now compare how the value of each property estimated by simulation deviates from the experimental measurement.

To do this we will build, for each property type, a list of pairs of experimental and estimated properties. We can easily match properties based on the unique ids which were automatically assigned to them on their creation:

[ ]:
properties_by_type = {"Density": [], "EnthalpyOfVaporization": []}

for experimental_property in experimental_data_set:
    # Find the estimated property which has the same id as the
    # experimental property.
    estimated_property = next(
        x for x in estimated_data_set if x.id == experimental_property.id
    )

    # Add this pair of properties to the list of pairs
    property_type = experimental_property.__class__.__name__
    properties_by_type[property_type].append(
        (experimental_property, estimated_property)
    )
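
The `next(...)` search scans the estimated data set once for every experimental property. For larger data sets it may be cheaper to build a dictionary keyed by id first. The sketch below illustrates the idea with a hypothetical `FakeProperty` stand-in class (it is not part of openff-evaluator) and made-up values, so that it runs without the framework. The same pattern should apply to any object which exposes an `id` attribute:

```python
from dataclasses import dataclass


@dataclass
class FakeProperty:
    """Hypothetical stand-in for a physical property with a unique id."""

    id: str
    value: float


# Made-up values, purely for illustration.
experimental = [FakeProperty("a", 789.3), FakeProperty("b", 42.3)]
estimated = [FakeProperty("b", 44.1), FakeProperty("a", 795.0)]

# Index the estimated properties by id once, so each lookup is a dictionary access.
estimated_by_id = {x.id: x for x in estimated}

pairs = []

for experimental_property in experimental:
    # .get returns None rather than raising if no estimate exists for this id.
    estimated_property = estimated_by_id.get(experimental_property.id)

    if estimated_property is None:
        continue

    pairs.append((experimental_property, estimated_property))

print([(x.value, y.value) for x, y in pairs])
```

Unlike a bare `next(...)`, which raises `StopIteration` when no match is found, this version silently skips experimental properties that have no estimated counterpart. Whether skipping or failing loudly is preferable depends on how complete the estimated data set is expected to be.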

Plotting the Results

We will now compare the experimental results to the estimated ones by plotting them using matplotlib:

[ ]:
from matplotlib import pyplot

# Create the figure we will plot to.
figure, axes = pyplot.subplots(nrows=1, ncols=2, figsize=(8.0, 4.0))

# Set the axis labels and titles
axes[0].set_xlabel("OpenFF 1.0.0")
axes[0].set_ylabel("Experimental")
axes[0].set_title("Density $kg m^{-3}$")

axes[1].set_xlabel("OpenFF 1.0.0")
axes[1].set_ylabel("Experimental")
axes[1].set_title("$H_{vap}$ $kJ mol^{-1}$")

# Define the preferred units of the properties
from openff.units import unit

preferred_units = {
    "Density": unit.kilogram / unit.meter**3,
    "EnthalpyOfVaporization": unit.kilojoule / unit.mole,
}

for index, property_type in enumerate(properties_by_type):
    experimental_values = []
    estimated_values = []

    preferred_unit = preferred_units[property_type]

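    # Convert the values of our properties to the preferred units.
    for experimental_property, estimated_property in properties_by_type[property_type]:
        experimental_values.append(
            experimental_property.value.to(preferred_unit).magnitude
        )
        estimated_values.append(
            estimated_property.value.to(preferred_unit).magnitude
        )

    # Plot the estimated values (x) against the experimental values (y).
    axes[index].plot(
        estimated_values, experimental_values, marker="x", linestyle="None"
    )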
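
A plot gives a qualitative picture, and a couple of summary statistics can make the comparison quantitative. As a rough sketch (not part of the original tutorial), the helper below computes the mean signed error and the root-mean-square error from two equal-length lists of plain numbers, such as the `estimated_values` and `experimental_values` lists. The function name `summarise_errors` and the numbers in the usage example are made up for illustration:

```python
import math


def summarise_errors(estimated, experimental):
    """Return the mean signed error and RMSE of estimated vs. experimental values."""
    if len(estimated) != len(experimental) or len(estimated) == 0:
        raise ValueError("both lists must be non-empty and of equal length")

    # A positive error means that the estimate is larger than the measurement.
    errors = [x - y for x, y in zip(estimated, experimental)]

    mean_signed_error = sum(errors) / len(errors)
    rmse = math.sqrt(sum(error**2 for error in errors) / len(errors))

    return mean_signed_error, rmse


# Made-up values, purely to show the call signature.
mean_signed_error, rmse = summarise_errors([3.0, 5.0], [1.0, 1.0])
print(mean_signed_error, rmse)
```

Both statistics carry the units of the input values, so they are only meaningful when the two lists have already been converted to the same preferred unit.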


And that concludes the third tutorial!

If you have any questions or feedback, please open an issue on the GitHub issue tracker.