Dask-MPI with Interactive JobsΒΆ

Dask-MPI can be used to easily launch an entire Dask cluster in an existing MPI environment, and attach a client to that cluster in an interactive session.

In this scenario, you would launch the Dask cluster using the Dask-MPI command-line interface (CLI) dask-mpi.

mpirun -np 4 dask-mpi --scheduler-file scheduler.json

In this example, the above code will use MPI to launch the Dask Scheduler on MPI rank 0 and Dask Workers (or Nannies) on all remaining MPI ranks.

It is advisable, as shown in the previous example, to use the --scheduler-file option when using the dask-mpi CLI. The --scheduler-file option saves the location of the Dask Scheduler to a file that can be referenced later in your interactive session. For example, the following code would create a Dask Client and connect it to the Scheduler using the scheduler JSON file.

from distributed import Client
client = Client(scheduler_file='/path/to/scheduler.json')

As long as your interactive session has access to the same filesystem where the scheduler JSON file is saved, this procedure will let you run your interactive session easily attach to your separate dask-mpi job.

After a Dask cluster has been created, the dask-mpi CLI can be used to add more workers to the cluster by using the --no-scheduler option.

mpirun -n 5 dask-mpi --scheduler-file scheduler.json --no-scheduler

In this example (above), 5 more workers will be created and they will be registered with the Scheduler (whose address is in the scheduler JSON file).


Running with a Job Scheduler

In High-Performance Computing environments, job schedulers, such as LSF, PBS, or SLURM, are commonly used to request the necessary resources needed for an MPI job, such as the number of CPU cores, the total memory needed, and/or the number of nodes over which to spread out the MPI job. In such a case, it is advisable that the user place the mpirun ... dask-mpi ... command in a job submission script, with the number of MPI ranks (e.g., -np 4) matches the number of cores requested from the job scheduler.


MPI Jobs and Dask Nannies

It is many times useful to launch your Dask-MPI cluster (using dask-mpi) with Dask Nannies (i.e., with the --nanny option), rather than strictly with Dask Workers. This is because the Dask Nannies can relaunch a worker when a failure occurs. However, in some MPI environments, Dask Nannies will not be able to work as expected. This is because some installations of MPI may restrict the number of actual running processes from exceeding the number of MPI ranks requested. When using Dask Nannies, the Nanny process is executed and runs in the background after forking a Worker process. Hence, one Worker process will exist for each Nanny process. Some MPI installations will kill any forked process, and you will see many errors arising from the Worker processes being killed. If this happens, disable the use of Nannies with the --no-nanny option to dask-mpi.

For more details on how to use the dask-mpi command, see the Command-Line Interface (CLI).