slurm_cluster_setup

acme.slurm_cluster_setup(partition: str = 'partition_name', n_cores: int = 1, n_workers: int = 1, processes_per_worker: int = 1, mem_per_worker: str | None = '1GB', n_workers_startup: int = 1, timeout: int = 60, interactive: bool = True, interactive_wait: int = 10, start_client: bool = True, job_extra: List = [], worker_extra_args: List[str] | None = None, scheduler_options: Dict | None = None, avail_partitions: List = [], invalid_partitions: List = [], mem_cushion: int = 100, **kwargs: Any | None) → Client | SLURMCluster | None

Start a distributed Dask cluster of parallel processing workers using SLURM.

NOTE: If you are working on the ESI or CoBIC HPC cluster, please use esi_cluster_setup() or bic_cluster_setup() instead!

Parameters:

partition (str) – Name of SLURM partition/queue to use
n_cores (int) – Number of CPU cores per SLURM worker
n_workers (int) – Number of SLURM workers (=jobs) to spawn
processes_per_worker (int) – Number of processes to use per SLURM job (=worker). Should be greater than one only if the chosen partition contains nodes that expose multiple cores per job.
mem_per_worker (str or None) – Memory allocation for each worker. If None, the partition’s DefMemPerCPU setting is queried and used instead.
n_workers_startup (int) – Number of spawned SLURM workers to wait for. The code does not return until either n_workers_startup SLURM jobs are running or the timeout interval (see below) has been exceeded.
timeout (int) – Number of seconds to wait for requested workers to start (see n_workers_startup).
interactive (bool) – If True, user input is queried in case not enough workers (set by n_workers_startup) could be started in the provided waiting period (determined by timeout). The code waits interactive_wait seconds for a user choice - if none is provided, it continues with the current number of running workers (if greater than zero). If interactive is False and no worker could be started within timeout seconds, a TimeoutError is raised.
interactive_wait (int) – Countdown interval (seconds) to wait for a user response in case fewer than n_workers_startup workers could be started. If no choice is provided within the given time, the code automatically proceeds with the current number of active dask workers.
start_client (bool) – If True, a distributed computing client is launched and attached to the dask worker cluster. If start_client is False, only a distributed computing cluster is started to which compute-clients can connect.
job_extra (list) – Extra sbatch parameters to pass to SLURMCluster.
worker_extra_args (list or None) – Additional arguments to be passed to distributed.Worker
scheduler_options (dict or None) – Additional arguments to be passed to distributed.Scheduler
avail_partitions (list) – List of valid partition names (strings) that are available for launching dask workers. If not provided, partitions are fetched at runtime using sinfo.
invalid_partitions (list) – List of partition names (strings) that are not available for launching dask workers.
mem_cushion (int) – Amount of memory to “withhold” from mem_per_worker to stay clear of partition limits (either imposed via QoS or MaxMemPerCPU)
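
As a rough illustration of how the parameters above fit together, the following sketch collects a set of keyword arguments and forwards them to slurm_cluster_setup. The partition name "my_partition", all numeric values and the helper names example_setup_kwargs and start_example_cluster are placeholders invented for this example, not values or functions provided by ACME. The import is deferred so the argument set can be inspected without a SLURM installation at hand.

```python
from typing import Any, Dict


def example_setup_kwargs(partition: str = "my_partition") -> Dict[str, Any]:
    """Collect keyword arguments for slurm_cluster_setup (illustrative values only)."""
    return {
        "partition": partition,     # placeholder, replace with a real partition name
        "n_cores": 2,               # CPU cores per SLURM worker
        "n_workers": 10,            # SLURM jobs to request
        "mem_per_worker": "2GB",    # memory per worker
        "n_workers_startup": 5,     # return as soon as 5 workers are running ...
        "timeout": 120,             # ... or after 120 seconds, whichever comes first
        "interactive": False,       # do not prompt for user input
        "start_client": True,       # attach a client to the cluster
    }


def start_example_cluster(partition: str = "my_partition"):
    """Launch workers via ACME; needs the acme package and a reachable SLURM cluster."""
    # Deferred import: only required when workers are actually launched
    from acme import slurm_cluster_setup

    return slurm_cluster_setup(**example_setup_kwargs(partition))
```

Keeping n_workers_startup below n_workers lets the call return once part of the requested workers is up, while the remaining jobs may still be pending in the SLURM queue.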

Returns:

proc – A distributed computing client (if start_client = True) or a distributed computing cluster (otherwise). If no SLURM workers can be started within the given timeout interval, proc is set to None.

Return type:

Client, SLURMCluster or None
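
Since the function can return a client, a cluster or None, and may raise a TimeoutError in non-interactive mode, calling code usually has to distinguish these cases. The sketch below shows one possible way to do so. The helpers safe_setup and describe_proc are hypothetical names introduced for this example, and telling a client from a cluster by the presence of a submit method is an assumption made to keep the snippet free of imports; an isinstance check against distributed.Client would be the stricter alternative.

```python
from typing import Any, Callable, Optional


def safe_setup(setup_func: Callable[..., Any], **kwargs: Any) -> Optional[Any]:
    """Call a cluster setup function and map a startup timeout to None."""
    try:
        return setup_func(**kwargs)
    except TimeoutError:
        # Raised in non-interactive mode if no worker came up within `timeout`
        return None


def describe_proc(proc: Optional[Any]) -> str:
    """Classify the return value: 'none', 'client' or 'cluster'."""
    if proc is None:
        return "none"
    # Assumption: only client objects expose a callable `submit`
    if callable(getattr(proc, "submit", None)):
        return "client"
    return "cluster"
```

A caller could then write proc = safe_setup(slurm_cluster_setup, partition="my_partition", interactive=False) and fall back to local computation whenever describe_proc(proc) yields "none"; the partition name is again a placeholder.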