local_cluster_setup

acme.local_cluster_setup(n_workers: int | None = None, mem_per_worker: str | None = None, interactive: bool = True) → Client | None[source]

Start a local distributed Dask multi-processing cluster

Parameters:

n_workers (int) – Number of local workers to start (this should align with the locally available hardware, see distributed.LocalCluster for details)
mem_per_worker (str) – Memory cap for each local worker (corresponds to the memory_limit keyword of a distributed.worker.Worker)
interactive (bool) – If True, a confirmation dialog is displayed to ensure proper encapsulation of calls to local_cluster_setup inside a script’s main module block. See Notes for details. If interactive is False, the dialog is not shown.

Returns:

client – A distributed computing client. If a client cannot be started, proc is set to None.

Return type:

distributed.Client or None

Notes

The way Python spawns new processes requires an explicit separation of initialization code (i.e., code blocks that should only be executed once) from the actual program code. Specifically, everything that is supposed to be invoked only once by the parent spawner must be encapsulated in a script’s main block. Otherwise, initialization code is not only run once by the parent process but executed by every child process at import time.

This means that starting a local multi-processing cluster has to be wrapped inside a script’s main module block, otherwise, every child process created by the multi-processing cluster starts a multi-processing cluster itself and so on escalating to an infinite recursion. Thus, if local_cluster_setup is called inside a script, it has to be encapsulated in the script’s main module block, i.e.,

if __name__ == "__main__":
    ...
    local_cluster_setup()
    ...

Note that this capsulation is only required inside Python scripts. Launching local_cluster_setup in the (i)Python shell or inside a Jupyter notebook does not suffer from this problem. A more in-depth technical discussion of this limitation can be found in Dask’s GitHub issue tracker.

Examples

The following command launches a local distributed computing cluster using all CPU cores available on the host machine

>>> client = local_cluster_setup()

The underlying distributed computing cluster can be accessed using

>>> client.cluster