slurm_cluster_setup
- acme.slurm_cluster_setup(partition: str = 'partition_name', n_cores: int = 1, n_workers: int = 1, processes_per_worker: int = 1, mem_per_worker: str = '1GB', n_workers_startup: int = 1, timeout: int = 60, interactive: bool = True, interactive_wait: int = 10, start_client: bool = True, job_extra: List = [], invalid_partitions: List = [], **kwargs: Optional[Any]) → Optional[Union[Client, SLURMCluster]]
Start a distributed Dask cluster of parallel processing workers using SLURM
NOTE: If you are working on the ESI HPC cluster, please use esi_cluster_setup() instead!
- Parameters:
partition (str) – Name of SLURM partition/queue to use
n_cores (int) – Number of CPU cores per SLURM worker
n_workers (int) – Number of SLURM workers (=jobs) to spawn
processes_per_worker (int) – Number of processes to use per SLURM job (=worker). Should be greater than one only if the chosen partition contains nodes that expose multiple cores per job.
mem_per_worker (str) – Memory allocation for each worker
n_workers_startup (int) – Number of spawned SLURM workers to wait for. The code does not return until either n_workers_startup SLURM jobs are running or the timeout interval (see below) has been exceeded.
timeout (int) – Number of seconds to wait for requested workers to start (see n_workers_startup).
interactive (bool) – If True, user input is queried in case fewer than n_workers_startup workers could be started within the provided waiting period (determined by timeout). The code waits interactive_wait seconds for a user choice; if none is provided, it continues with the current number of running workers (if greater than zero). If interactive is False and no worker could be started within timeout seconds, a TimeoutError is raised.
interactive_wait (int) – Countdown interval (seconds) to wait for a user response in case fewer than n_workers_startup workers could be started. If no choice is provided within the given time, the code automatically proceeds with the current number of active dask workers.
start_client (bool) – If True, a distributed computing client is launched and attached to the dask worker cluster. If start_client is False, only a distributed computing cluster is started to which compute-clients can connect.
job_extra (list) – Extra sbatch parameters to pass to SLURMCluster.
invalid_partitions (list) – List of partition names (strings) that are not available for launching dask workers.
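For illustration, entries of job_extra are raw sbatch command-line options that are forwarded verbatim to the underlying SLURMCluster; the specific flags below are assumptions chosen for the example, not defaults:

```python
# Hypothetical extra sbatch options; each list entry is one complete
# command-line flag passed through to sbatch by SLURMCluster
job_extra = ["--job-name=acme_worker", "--output=/tmp/acme_%j.out"]
```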
- Returns:
proc – A distributed computing client (if start_client = True) or a distributed computing cluster (otherwise). If no SLURM workers can be started within the given timeout interval, proc is set to None.
- Return type:
object or None
See also
dask_jobqueue.SLURMCluster
launch a dask cluster of SLURM workers
esi_cluster_setup
start a SLURM worker cluster on the ESI HPC infrastructure
local_cluster_setup
start a local Dask multi-processing cluster on the host machine
cluster_cleanup
remove dangling parallel processing worker-clusters
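Putting the parameters together, a minimal usage sketch follows. The partition name "myqueue" is an assumption for illustration; substitute a partition that actually exists on your SLURM installation:

```python
# Keyword arguments for a hypothetical request of 10 single-core SLURM
# workers with 8 GB of memory each; wait up to 120 s for the first 2
# workers to come online, and fail with a TimeoutError instead of
# prompting if none start in time
cluster_kwargs = {
    "partition": "myqueue",   # assumed partition name; adapt to your cluster
    "n_cores": 1,
    "n_workers": 10,
    "mem_per_worker": "8GB",
    "n_workers_startup": 2,
    "timeout": 120,
    "interactive": False,
}

# On a machine with SLURM and ACME installed, this call would launch the
# worker cluster and return an attached distributed computing client:
# from acme import slurm_cluster_setup
# client = slurm_cluster_setup(**cluster_kwargs)
```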