Dask-MPI
Contents
Dask-MPI¶
Easily deploy Dask using MPI
The Dask-MPI project makes it easy to deploy Dask from within an existing MPI
environment, such as one created with the common MPI command-line launchers
mpirun
or mpiexec
. Such environments are commonly found in high performance
supercomputers, academic research institutions, and other clusters where MPI
has already been installed.
Dask-MPI provides two convenient interfaces to launch Dask, either from within a batch script or directly from the command-line.
Batch Script Example¶
You can turn your batch Python script into an MPI executable
with the dask_mpi.initialize
function.
from dask_mpi import initialize
initialize()
from dask.distributed import Client
client = Client() # Connect this local process to remote workers
This makes your Python script launchable directly with mpirun
or mpiexec
.
mpirun -np 4 python my_client_script.py
This deploys the Dask scheduler and workers as well as the user’s Client process within a single cohesive MPI computation.
Command Line Example¶
Alternatively you can launch a Dask cluster directly from the command-line
using the dask-mpi
command and specifying a scheduler file where Dask can
write connection information.
mpirun -np 4 dask-mpi --scheduler-file ~/dask-scheduler.json
You can then access this cluster either from a separate batch script or from an
interactive session (such as a Jupyter Notebook) by referencing the same scheduler
file that dask-mpi
created.
from dask.distributed import Client
client = Client(scheduler_file='~/dask-scheduler.json')
Use Job Queuing System Directly¶
You can also use Dask Jobqueue to deploy Dask directly on a job queuing system like SLURM, SGE, PBS, LSF, Torque, or others. This can be especially nice when you want to dynamically scale your cluster during your computation, or for interactive use.