slurm_cluster_setup

acme.slurm_cluster_setup(partition: str = 'partition_name', n_cores: int = 1, n_workers: int = 1, processes_per_worker: int = 1, mem_per_worker: str = '1GB', n_workers_startup: int = 1, timeout: int = 60, interactive: bool = True, interactive_wait: int = 10, start_client: bool = True, job_extra: List = [], invalid_partitions: List = [], **kwargs: Optional[Any]) → Optional[Union[Client, SLURMCluster]]

Start a distributed Dask cluster of parallel processing workers using SLURM

NOTE If you are working on the ESI HPC cluster, please use esi_cluster_setup() instead!

Parameters:
  • partition (str) – Name of SLURM partition/queue to use

  • n_cores (int) – Number of CPU cores per SLURM worker

  • n_workers (int) – Number of SLURM workers (=jobs) to spawn

  • processes_per_worker (int) – Number of processes to use per SLURM job (=worker). Should be greater than one only if the chosen partition contains nodes that expose multiple cores per job.

  • mem_per_worker (str) – Memory allocation for each worker

  • n_workers_startup (int) – Number of spawned SLURM workers to wait for. The code does not return until either n_workers_startup SLURM jobs are running or the timeout interval (see below) has been exceeded.

  • timeout (int) – Number of seconds to wait for requested workers to start (see n_workers_startup).

  • interactive (bool) – If True, user input is queried in case fewer than n_workers_startup workers could be started within the provided waiting period (determined by timeout). The code waits interactive_wait seconds for a user choice; if none is provided, it continues with the current number of running workers (if greater than zero). If interactive is False and no worker could be started within timeout seconds, a TimeoutError is raised.

  • interactive_wait (int) – Countdown interval (seconds) to wait for a user response in case fewer than n_workers_startup workers could be started. If no choice is provided within the given time, the code automatically proceeds with the current number of active dask workers.

  • start_client (bool) – If True, a distributed computing client is launched and attached to the dask worker cluster. If start_client is False, only a distributed computing cluster is started to which compute-clients can connect.

  • job_extra (list) – Extra sbatch parameters to pass to SLURMCluster.

  • invalid_partitions (list) – List of partition names (strings) that are not available for launching dask workers.
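As a sketch of how these parameters fit together, the keyword arguments can be collected in a dictionary and unpacked into the call. The partition name "myqueue" and all resource figures below are placeholders, not real queue names or recommended values:

```python
# Hypothetical settings for a 10-worker cluster; every value here is a
# placeholder to illustrate the parameters documented above.
settings = {
    "partition": "myqueue",          # SLURM partition/queue to use
    "n_cores": 2,                    # CPU cores per SLURM worker
    "n_workers": 10,                 # number of SLURM jobs to spawn
    "mem_per_worker": "8GB",         # memory request per job
    "n_workers_startup": 5,          # return once 5 jobs are running
    "timeout": 120,                  # seconds to wait for workers
    "interactive": False,            # raise TimeoutError instead of prompting
    "job_extra": ["--job-name=acme_workers"],  # extra sbatch flags
}
# client = acme.slurm_cluster_setup(**settings)
```

The actual call is commented out since it submits real SLURM jobs; on a cluster, unpacking `settings` as shown is equivalent to passing each keyword explicitly.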

Returns:

proc – A distributed computing client (if start_client = True) or a distributed computing cluster (otherwise). If no SLURM workers can be started within the given timeout interval, proc is set to None.

Return type:

object or None
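Since proc is None when startup times out, callers should guard the return value before using it. A minimal sketch, assuming start_client = False so that the function returns a SLURMCluster to which a dask.distributed.Client is attached manually (the helper attach_client below is hypothetical, not part of ACME):

```python
def attach_client(cluster):
    """Return a Client bound to `cluster`, or None if startup failed.

    `cluster` is whatever slurm_cluster_setup(..., start_client=False)
    returned: a SLURMCluster on success, None on timeout.
    """
    if cluster is None:  # no SLURM workers came up within `timeout`
        return None
    from distributed import Client  # import only when actually needed
    return Client(cluster)
```

On a real cluster this would be used as `client = attach_client(acme.slurm_cluster_setup(partition="myqueue", start_client=False))`, with "myqueue" again a placeholder.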

See also

dask_jobqueue.SLURMCluster

launch a dask cluster of SLURM workers

esi_cluster_setup

start a SLURM worker cluster on the ESI HPC infrastructure

local_cluster_setup

start a local Dask multi-processing cluster on the host machine

cluster_cleanup

remove dangling parallel processing worker-clusters