Dask-MPI with Interactive Jobs
Dask-MPI can be used to launch an entire Dask cluster in an existing MPI environment and to attach a client to that cluster from an interactive session.
In this scenario, you would launch the Dask cluster using the Dask-MPI command-line interface (CLI), dask-mpi:
mpirun -np 4 dask-mpi --scheduler-file scheduler.json
The above command uses MPI to launch the Dask Scheduler on MPI rank 0 and Dask Workers (or Nannies) on all remaining MPI ranks.
It is advisable, as shown in the previous example, to use the --scheduler-file option when using the dask-mpi CLI. The --scheduler-file option saves the location of the Dask Scheduler to a file that can be referenced later in your interactive session. For example, the following code would create a Dask Client and connect it to the Scheduler using the scheduler JSON file.
from distributed import Client
client = Client(scheduler_file='/path/to/scheduler.json')
As long as your interactive session has access to the same filesystem where the scheduler JSON file is saved, this procedure lets your interactive session easily attach to your separate dask-mpi job.
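Once connected, the client behaves like any other Dask client. As a minimal, illustrative sketch (the computation here is hypothetical), you could submit work to the MPI-launched workers and gather the result:
# Submit a trivial task to the cluster and block until the result arrives
future = client.submit(lambda x: x + 1, 10)
print(future.result())  # prints 11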
After a Dask cluster has been created, the dask-mpi CLI can be used to add more workers to the cluster by using the --no-scheduler option.
mpirun -n 5 dask-mpi --scheduler-file scheduler.json --no-scheduler
In this example, 5 more workers will be created and registered with the Scheduler (whose address is read from the scheduler JSON file).
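From your interactive session, you can verify that the new workers have joined the cluster. This is a minimal sketch using the client created earlier:
# Count the workers currently registered with the Scheduler
n_workers = len(client.scheduler_info()['workers'])
print(n_workers)  # should now reflect the additional workers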
Tip
Running with a Job Scheduler
In High-Performance Computing environments, job schedulers, such as LSF, PBS, or SLURM, are commonly used to request the resources needed for an MPI job, such as the number of CPU cores, the total memory, and/or the number of nodes over which to spread out the MPI job. In such a case, it is advisable to place the mpirun ... dask-mpi ... command in a job submission script, with the number of MPI ranks (e.g., -np 4) matching the number of cores requested from the job scheduler.
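For example, a minimal SLURM submission script might look like the following (a sketch only; the job name, walltime, and any environment setup are assumptions that will vary by site):
#!/bin/bash
#SBATCH --job-name=dask-mpi   # hypothetical job name
#SBATCH --ntasks=4            # one MPI rank per Dask process
#SBATCH --time=01:00:00       # example walltime

# Rank 0 runs the Scheduler; ranks 1-3 run Workers
mpirun -np 4 dask-mpi --scheduler-file scheduler.json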
Warning
MPI Jobs and Dask Nannies
It is often useful to launch your Dask-MPI cluster (using dask-mpi) with Dask Nannies (i.e., with the --worker-class distributed.Nanny option), rather than strictly with Dask Workers, because a Dask Nanny can relaunch a worker when a failure occurs. However, in some MPI environments, Dask Nannies will not work as expected. This is because some installations of MPI restrict the number of actual running processes from exceeding the number of MPI ranks requested. When using Dask Nannies, each Nanny process forks a Worker process and continues running in the background, so one Worker process exists for each Nanny process. Some MPI installations will kill any forked process, and you will see many errors arising from the Worker processes being killed. If this happens, disable the use of Nannies by passing the --worker-class distributed.Worker option to dask-mpi.
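For example, the following launches the same cluster as before but with plain Workers instead of Nannies:
mpirun -np 4 dask-mpi --scheduler-file scheduler.json --worker-class distributed.Worker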
For more details on how to use the dask-mpi command, see the Command-Line Interface (CLI).