The framework implements a number of calculation backends which integrate with the dask distributed and job-queue libraries.

The DaskLocalCluster backend wraps around the dask LocalCluster class to distribute tasks on a single machine:

worker_resources = ComputeResources(
number_of_gpus=1,
preferred_gpu_toolkit=GPUToolkit.CUDA,
)

...


Its main purpose is for use when debugging calculations locally, or when running calculations on machines with large numbers of CPUs or GPUs.

The DaskLSFBackend and DaskPBSBackend backends wrap around the dask LSFCluster and PBSCluster classes respectively, and both inherit the BaseDaskJobQueueBackend class which implements the core of their functionality. They predominantly run in an adaptive mode, whereby the backend will automatically scale up or down the number of workers based on the current number of tasks that the backend is trying to execute.

These backends integrate with the queueing systems which most HPC cluster use to manage task execution. They work by submitting jobs into the queueing system which themselves spawn dask workers, which in turn then execute tasks on the available compute nodes:

# Create the object which describes the compute resources each worker should request from
# the queueing system.
worker_resources = QueueWorkerResources(
number_of_gpus=1,
preferred_gpu_toolkit=QueueWorkerResources.GPUToolkit.CUDA,
wallclock_time_limit="05:59",
)

# Create the backend object.
setup_script_commands = [
f"conda activate openff-evaluator",
]

minimum_number_of_workers=1,
maximum_number_of_workers=max_number_of_workers,
resources_per_worker=queue_resources,
queue_name="gpuqueue",
setup_script_commands=setup_script_commands,
)

with calculation_backend:
...


The setup_script_commands argument takes a list of commands which should be run by the queue job submission script before spawning the actual worker. This enables setting up custom environments, and setting any required environmental variables.

### Configuration¶

To ensure optimal behaviour we recommend changing / uncommenting the following settings in the dask distributed configuration file (this can be found at ~/.config/dask/distributed.yaml):

distributed:

worker:
daemon: False

comm:
timeouts:
connect: 10s
tcp: 30s

deploy:
lost-worker-timeout: 15s


See the dask documentation for more information about changing dask settings.