Physical Property Data Sets
Warning
This text is now out of date, but will be updated in future to reflect the latest version of the framework.
A PhysicalPropertyDataset is a collection of MeasuredPhysicalProperty objects that are related in some way.
dataset = PhysicalPropertyDataset([measurement1, measurement2])
The dataset is iterable:
dataset = PhysicalPropertyDataset([measurement1, measurement2])
for measurement in dataset:
    print(measurement.value)
and has accessors to retrieve DOIs and references associated with measurements in the dataset:
# Print the DOIs associated with this dataset
print(dataset.DOIs)
# Print the references associated with this dataset
print(dataset.references)
For convenience, you can retrieve the dataset as a pandas DataFrame:
dataset.to_pandas()
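The container behaviour described above (iteration plus aggregated DOI and reference accessors) can be illustrated with a small stand-in. This is a sketch only, not the framework's implementation: the names Measurement, SimpleDataset and to_records, and the fields they carry, are hypothetical and chosen for illustration, and the values are placeholders rather than real measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Hypothetical stand-in for a measured property (not the framework class)."""
    value: float
    doi: str
    reference: str


class SimpleDataset:
    """Hypothetical stand-in for an iterable dataset with DOI/reference accessors."""

    def __init__(self, measurements):
        self._measurements = list(measurements)

    def __iter__(self):
        # Iterating over the dataset yields its measurements in insertion order.
        return iter(self._measurements)

    def __len__(self):
        return len(self._measurements)

    @property
    def DOIs(self):
        # Unique DOIs across all measurements, sorted for a stable order.
        return sorted({m.doi for m in self._measurements})

    @property
    def references(self):
        # Unique references across all measurements, sorted for a stable order.
        return sorted({m.reference for m in self._measurements})

    def to_records(self):
        # A plain list of dicts, one per measurement.
        return [
            {"value": m.value, "doi": m.doi, "reference": m.reference}
            for m in self._measurements
        ]


# Placeholder values only; these are not real measurements.
demo = SimpleDataset([
    Measurement(1.0, "10.1021/acs.jced.5b00365", "reference A"),
    Measurement(2.0, "10.1021/acs.jced.5b00474", "reference B"),
])

for measurement in demo:
    print(measurement.value)
print(demo.DOIs)
```

A list of dicts such as the one returned by to_records() is one of the inputs accepted by the pandas.DataFrame constructor, which is one plausible route to a tabular view.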
ThermoML datasets
A ThermoMLDataset object represents a physical property dataset stored in ThermoML, the IUPAC-standard XML format for specifying thermodynamic properties.
ThermoMLDataset is a subclass of PhysicalPropertyDataset and provides the same API (in addition to some ThermoML-specific methods).
Physical property measurements in this format can be retrieved directly from the NIST TRC repository through the NIST ThermoML Archive.
For example, to retrieve the ThermoML dataset that accompanies the paper with DOI 10.1016/j.jct.2005.03.012, use that DOI as the key for creating a ThermoMLDataset from the ThermoML Archive:
dataset = ThermoMLDataset(doi='10.1016/j.jct.2005.03.012')
You can also specify multiple ThermoML Archive keys to create a dataset from multiple ThermoML files:
thermoml_keys = ['10.1021/acs.jced.5b00365', '10.1021/acs.jced.5b00474']
dataset = ThermoMLDataset(doi=thermoml_keys)
It is also possible to specify ThermoML datasets housed at other locations, such as
dataset = ThermoMLDataset(url='http://openforcefieldgroup.org/thermoml-datasets')
or
dataset = ThermoMLDataset(url='file:///Users/choderaj/thermoml')
or
dataset = ThermoMLDataset(doi=['10.1021/acs.jced.5b00365', '10.1021/acs.jced.5b00474'],
                          url='http://openforcefieldgroup.org/thermoml-datasets')
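or by first loading from the ThermoML Archive and then retrieving further keys (here local_keys, a list of keys defined elsewhere) from a different URL: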
dataset = ThermoMLDataset(doi=thermoml_keys)
dataset.retrieve(doi=local_keys, url='http://openforcefieldgroup.org/thermoml-datasets')
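The calls above accept either a single DOI or a list of DOIs, optionally paired with a base URL. How keys are mapped to files is not specified here, so the following is only a sketch under a loudly stated assumption: that each dataset lives at '<base_url>/<key>.xml'. The function name candidate_urls and that file layout are hypothetical and are not part of the framework.

```python
def candidate_urls(keys, base_url):
    """Map each key to '<base_url>/<key>.xml'.

    The '<key>.xml' layout is an assumption made for illustration only.
    """
    # Accept a single key as well as a list of keys.
    if isinstance(keys, str):
        keys = [keys]

    base = base_url.rstrip("/")
    seen = set()
    urls = []
    for key in keys:
        # Skip repeated keys so each file is listed once, in first-seen order.
        if key in seen:
            continue
        seen.add(key)
        urls.append(f"{base}/{key}.xml")
    return urls


thermoml_keys = ["10.1021/acs.jced.5b00365", "10.1021/acs.jced.5b00474"]
for url in candidate_urls(thermoml_keys, "http://openforcefieldgroup.org/thermoml-datasets"):
    print(url)
```

Normalising a single string to a one-element list keeps the rest of the function identical for both call styles.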
You can see which DOIs contribute to the current ThermoMLDataset with the DOIs accessor:
print(dataset.DOIs)
NIST has compiled a JSON frame of corrections to uncertainties.
These can be used to update or correct data uncertainties and discard outliers using apply_nist_uncertainties():
# Modify uncertainties according to NIST evaluation
dataset.apply_nist_uncertainties(nist_uncertainties, adjust_uncertainties=True, discard_outliers=True)
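The effect of the two flags can be sketched with plain dictionaries. The layout of the NIST corrections is not described here, so the JSON structure below (a measurement id mapped to an optional corrected "uncertainty" and an optional "outlier" flag) is purely an assumption, as are the function name apply_corrections and the placeholder numbers. It is not the framework's implementation.

```python
import json


def apply_corrections(measurements, corrections,
                      adjust_uncertainties=True, discard_outliers=True):
    """Return a corrected copy of `measurements` (hypothetical data layout)."""
    result = []
    for measurement in measurements:
        fix = corrections.get(measurement["id"], {})

        # Drop measurements flagged as outliers, if requested.
        if discard_outliers and fix.get("outlier", False):
            continue

        updated = dict(measurement)  # copy so the input is left untouched
        # Overwrite the uncertainty when a corrected value is supplied.
        if adjust_uncertainties and "uncertainty" in fix:
            updated["uncertainty"] = fix["uncertainty"]
        result.append(updated)
    return result


# Placeholder values only; these are not real measurements or NIST corrections.
measurements = [
    {"id": "m1", "value": 1.0, "uncertainty": 0.1},
    {"id": "m2", "value": 2.0, "uncertainty": 0.2},
    {"id": "m3", "value": 3.0, "uncertainty": 0.3},
]
corrections = json.loads('{"m1": {"uncertainty": 0.5}, "m2": {"outlier": true}}')

corrected = apply_corrections(measurements, corrections)
print(corrected)
```

Returning a new list rather than mutating in place makes it easy to compare the dataset before and after the corrections are applied.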
Todo
We should merge any other useful parts of the ThermoPyL API in here.
Other datasets
In future, we will add interfaces to other online datasets, such as
BindingDB for retrieving host-guest binding affinity datasets.