
The srun command, used by the mpirun command to launch the MPI tasks in parallel,
determines the number of tasks to launch from the SLURM_NPROCS environment variable that
was set by LSF-HPC. Recall that the value of this environment variable is equivalent to the
number provided by the -n option of the bsub command.
Consider an HP XC system configuration in which lsfhost.localdomain is the LSF
execution host and nodes n[1-10] are compute nodes in the lsf partition. Each node contains
2 processors, providing 20 processors for use by LSF jobs.
Example 7-6 runs a hello_world MPI program on four processors.
Example 7-6: Submitting an HP-MPI Job
$ bsub -n4 -I mpirun -srun ./hello_world
Job <75> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world! I’m 0 of 4 on n2
Hello world! I’m 1 of 4 on n2
Hello world! I’m 2 of 4 on n4
Hello world! I’m 3 of 4 on n4
Example 7-7 runs the same hello_world MPI program on four processors, but uses the
external SLURM scheduler to request one task per node.
Example 7-7: Submitting an HP-MPI Job with a Specific Topology Request
$ bsub -n4 -ext "SLURM[nodes=4]" -I mpirun -srun ./hello_world
Job <77> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world! I’m 0 of 4 on n1
Hello world! I’m 1 of 4 on n2
Hello world! I’m 2 of 4 on n3
Hello world! I’m 3 of 4 on n4
If the MPI job requires an appfile, or if some other reason prohibits using the srun
command as the task launcher, you must first determine the host names of the nodes to which
mpirun’s standard task launcher should launch the tasks. In such cases, write a batch script
that performs this preprocessing. Several methods are available for determining the nodes in
an allocation. One is to use the SLURM_JOBID environment variable with the squeue
command to query the nodes. Another is to use LSF environment variables such as
LSB_HOSTS and LSB_MCPU_HOSTS, which are prepared by the HP XC job starter script.
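As a minimal sketch of the LSF environment variable method, the following POSIX shell function turns LSB_MCPU_HOSTS into appfile lines for mpirun. It assumes that LSB_MCPU_HOSTS holds whitespace-separated pairs of host name and CPU count (for example, "n1 2 n2 2"), as in standard LSF, and that each appfile line uses the HP-MPI "-h host -np count program" form. The function name make_appfile is hypothetical; check both assumptions against your HP-MPI and LSF-HPC documentation before relying on it.

```shell
#!/bin/sh
# make_appfile (hypothetical helper): print one appfile line per host
# listed in LSB_MCPU_HOSTS.
# ASSUMPTION: LSB_MCPU_HOSTS holds "host ncpus host ncpus ..." pairs.
# ASSUMPTION: the appfile line format is "-h host -np count program" (HP-MPI).
# $1 is the program to launch on each host, e.g. ./hello_world.
make_appfile() {
    prog=$1
    # Word-split the variable into this function's positional parameters.
    set -- $LSB_MCPU_HOSTS
    # Emit one line per host/count pair; an unpaired trailing word is ignored.
    while [ $# -ge 2 ]; do
        printf '%s\n' "-h $1 -np $2 $prog"
        shift 2
    done
}
```

Inside a job script, the output could be redirected to a file (for example, make_appfile ./hello_world > appfile) and that file passed to mpirun as its appfile; with HP-MPI this is assumed to be mpirun -f appfile.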
7.4.6 Submitting a Batch Job or Job Script
The bsub command format to submit a batch job or job script is:
bsub -n num-procs [bsub-options] script-name
The -n num-procs parameter specifies the number of processors the job requests. -n num-procs
is required for parallel jobs. script-name is the name of the batch job or script. Any other
bsub options can also be specified. The script can contain one or more srun or mpirun
commands and their options.
The script is executed once on the first allocated node, and any srun or mpirun
commands within the script can use some or all of the allocated compute nodes.