Physical Property Data Sets

Warning

This text is now out of date, but will be updated in future to reflect the latest version of the framework.

A PhysicalPropertyDataset is a collection of MeasuredPhysicalProperty objects that are related in some way.

dataset = PhysicalPropertyDataset([measurement1, measurement2])

The dataset is iterable:

dataset = PhysicalPropertyDataset([measurement1, measurement2])

for measurement in dataset:
    print(measurement.value)

and has accessors to retrieve DOIs and references associated with measurements in the dataset:

# Print the DOIs associated with this dataset
print(dataset.DOIs)

# Print the references associated with this dataset
print(dataset.references)
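
To make the interface described above concrete, here is a minimal, self-contained sketch of how such a container could behave. SketchMeasurement and SketchDataset are hypothetical stand-ins written for this example, not the framework's classes, and the DOIs and references are placeholder strings. Only the names value, DOIs and references mirror the text above.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class SketchMeasurement:
    """Hypothetical stand-in for a measured property (not the framework class)."""
    value: float
    doi: str
    reference: str


class SketchDataset:
    """Hypothetical stand-in: an iterable collection with DOI/reference accessors."""

    def __init__(self, measurements: List[SketchMeasurement]):
        self._measurements = list(measurements)

    def __iter__(self) -> Iterator[SketchMeasurement]:
        # Iterating over the dataset yields its measurements in order.
        return iter(self._measurements)

    def __len__(self) -> int:
        return len(self._measurements)

    @property
    def DOIs(self) -> List[str]:
        # Unique DOIs across all measurements, sorted for a stable order.
        return sorted({m.doi for m in self._measurements})

    @property
    def references(self) -> List[str]:
        # Unique references across all measurements, sorted for a stable order.
        return sorted({m.reference for m in self._measurements})


# Placeholder values and DOIs, for illustration only.
sketch = SketchDataset([
    SketchMeasurement(0.997, "10.1000/example.a", "Example reference A"),
    SketchMeasurement(1.012, "10.1000/example.b", "Example reference B"),
    SketchMeasurement(0.985, "10.1000/example.a", "Example reference A"),
])
values = [m.value for m in sketch]
```

In this sketch the accessors deduplicate, so a DOI shared by several measurements is reported once. Whether the real accessors deduplicate or preserve insertion order is not stated here.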

For convenience, you can retrieve the dataset as a pandas DataFrame:

dataset.to_pandas()

ThermoML datasets

A ThermoMLDataset object represents a physical property dataset stored in ThermoML, the IUPAC-standard XML format for specifying thermodynamic properties. ThermoMLDataset is a subclass of PhysicalPropertyDataset and provides the same API, in addition to some ThermoML-specific methods.

Physical property measurements in this format can be retrieved directly from the NIST ThermoML Archive, which is hosted in the NIST TRC repository.

For example, to retrieve the ThermoML dataset that accompanies this paper, we can use its DOI, 10.1016/j.jct.2005.03.012, as a key for creating a ThermoMLDataset from the ThermoML Archive:

dataset = ThermoMLDataset(doi='10.1016/j.jct.2005.03.012')

You can also specify multiple ThermoML Archive keys to create a dataset from multiple ThermoML files:

thermoml_keys = ['10.1021/acs.jced.5b00365', '10.1021/acs.jced.5b00474']
dataset = ThermoMLDataset(doi=thermoml_keys)

It is also possible to specify ThermoML datasets housed at other locations, such as

dataset = ThermoMLDataset(url='http://openforcefieldgroup.org/thermoml-datasets')

or

dataset = ThermoMLDataset(url='file:///Users/choderaj/thermoml')

or

dataset = ThermoMLDataset(doi=['10.1021/acs.jced.5b00365', '10.1021/acs.jced.5b00474'],
                          url='http://openforcefieldgroup.org/thermoml-datasets')

or to start from the ThermoML Archive and then retrieve additional datasets from a different URL:

dataset = ThermoMLDataset(doi=thermoml_keys)
dataset.retrieve(doi=local_keys, url='http://openforcefieldgroup.org/thermoml-datasets')
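
The calls above accept either a single key or a list of keys, and DOIs can be combined with a URL. As a rough illustration of how such arguments might be normalized into one list of sources, here is a small sketch. The helper names _as_list and collect_sources are hypothetical and are not part of the framework. The sketch does not download anything.

```python
from typing import List, Optional, Sequence, Tuple, Union

KeyArg = Optional[Union[str, Sequence[str]]]


def _as_list(arg: KeyArg) -> List[str]:
    """Normalize None, a single string, or a sequence of strings to a list."""
    if arg is None:
        return []
    if isinstance(arg, str):
        return [arg]
    return list(arg)


def collect_sources(doi: KeyArg = None, url: KeyArg = None) -> List[Tuple[str, str]]:
    """Return (kind, key) pairs in the order given, skipping duplicates.

    Hypothetical helper: it only records where data would come from.
    """
    sources: List[Tuple[str, str]] = []
    for kind, keys in (("doi", _as_list(doi)), ("url", _as_list(url))):
        for key in keys:
            if (kind, key) not in sources:
                sources.append((kind, key))
    return sources


sources = collect_sources(
    doi=["10.1021/acs.jced.5b00365", "10.1021/acs.jced.5b00474"],
    url="http://openforcefieldgroup.org/thermoml-datasets",
)
for kind, key in sources:
    print(kind, key)
```

Keeping the source kind next to each key is one way a dataset could later report which DOIs contributed to it.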

You can see which DOIs contribute to the current ThermoMLDataset with the DOIs accessor:

print(dataset.DOIs)

NIST has compiled a JSON frame of corrections to uncertainties.

These can be used to update or correct data uncertainties and discard outliers using apply_nist_uncertainties():

# Modify uncertainties according to NIST evaluation
dataset.apply_nist_uncertainties(nist_uncertainties, adjust_uncertainties=True, discard_outliers=True)
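
The two keyword arguments suggest two independent steps: replacing uncertainties and dropping flagged measurements. The sketch below shows that logic on plain dictionaries. The correction format used here (a mapping from a measurement id to an optional "uncertainty" and an optional "outlier" flag) is an assumption made for illustration and is not the layout of the NIST JSON data, and apply_corrections is a hypothetical function, not the framework method.

```python
from typing import Dict, List


def apply_corrections(measurements: List[dict], corrections: Dict[str, dict],
                      adjust_uncertainties: bool = True,
                      discard_outliers: bool = True) -> List[dict]:
    """Return a new list of measurements with corrections applied.

    Hypothetical helper operating on an assumed correction format.
    """
    result = []
    for measurement in measurements:
        correction = corrections.get(measurement["id"], {})
        if discard_outliers and correction.get("outlier", False):
            continue  # drop measurements flagged as outliers
        updated = dict(measurement)  # copy so the input is not modified
        if adjust_uncertainties and "uncertainty" in correction:
            updated["uncertainty"] = correction["uncertainty"]
        result.append(updated)
    return result


# Placeholder data, for illustration only.
measurements = [
    {"id": "m1", "value": 0.997, "uncertainty": 0.001},
    {"id": "m2", "value": 1.250, "uncertainty": 0.001},
    {"id": "m3", "value": 1.003, "uncertainty": 0.002},
]
corrections = {
    "m1": {"uncertainty": 0.005},  # revised uncertainty
    "m2": {"outlier": True},       # flagged for removal
}
corrected = apply_corrections(measurements, corrections)
```

Returning a new list rather than editing in place keeps the original measurements available for comparison. The method shown above is called without using a return value, so it may well modify the dataset in place instead.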

Todo

  • We should merge any other useful parts of the ThermoPyL API in here.

Other datasets

In future, we will add interfaces to other online datasets, such as