Easily deploy Dask using MPI
The Dask-MPI project makes it easy to deploy Dask from within an existing MPI
environment, such as one created with the common MPI command-line launchers
mpiexec. Such environments are commonly found in high performance
supercomputers, academic research institutions, and other clusters where MPI
has already been installed.
Dask-MPI provides two convenient interfaces to launch Dask, either from within a batch script or directly from the command-line.
Batch Script Example¶
You can turn your batch Python script into an MPI executable
from dask_mpi import initialize initialize() from dask.distributed import Client client = Client() # Connect this local process to remote workers
This makes your Python script launchable directly with
mpirun -np 4 python my_client_script.py
This deploys the Dask scheduler and workers as well as the user’s Client process within a single cohesive MPI computation.
Command Line Example¶
Alternatively you can launch a Dask cluster directly from the command-line
dask-mpi command and specifying a scheduler file where Dask can
write connection information.
mpirun -np 4 dask-mpi --scheduler-file ~/dask-scheduler.json
You can then access this cluster either from a separate batch script or from an
interactive session (such as a Jupyter Notebook) by referencing the same scheduler
from dask.distributed import Client client = Client(scheduler_file='~/dask-scheduler.json')
Use Job Queuing System Directly¶
You can also use Dask Jobqueue to deploy Dask directly on a job queuing system like SLURM, SGE, PBS, LSF, Torque, or others. This can be especially nice when you want to dynamically scale your cluster during your computation, or for interactive use.