HP XC System 2.x Software User Manual

6 Using SLURM
6.1 Introduction
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource
management and job scheduling. SLURM is a reliable, efficient, open source, fault-tolerant
job and compute resource manager with features that make it suitable for large-scale,
high-performance computing environments. SLURM can report on machine status and perform
partition management, job management, and job scheduling.
The SLURM Reference Manual is available on the HP XC Documentation CD-ROM and from
the following Web site: http://www.llnl.gov/LCdocs/slurm/.
As a system resource manager, SLURM has the following key functions:
• Allocate exclusive and/or non-exclusive access to resources (compute nodes) to users for
  some duration of time so they can perform work
• Provide a framework for starting, executing, and monitoring work (normally a parallel
  job) on the set of allocated nodes
• Arbitrate conflicting requests for resources by managing a queue of pending work
Section 1.4.3 describes the interaction between SLURM and LSF.
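The three resource-manager functions listed above (allocating nodes, starting work on them, and queuing requests that cannot yet be satisfied) can be pictured with a small toy model. This is only an illustrative sketch, not SLURM's actual implementation: the class name ToyResourceManager, its methods, the node names, and the strict first-in, first-out policy are all assumptions made for the example, and it models exclusive node access only.

```python
from collections import deque


class ToyResourceManager:
    """Toy model of a resource manager (hypothetical, not SLURM code)."""

    def __init__(self, nodes):
        self.free = set(nodes)   # idle compute nodes
        self.allocated = {}      # job id -> set of nodes held exclusively
        self.pending = deque()   # FIFO queue of (job id, node count) requests

    def submit(self, job_id, node_count):
        """Queue a request, then start whatever now fits."""
        self.pending.append((job_id, node_count))
        self._schedule()

    def finish(self, job_id):
        """Release a finished job's nodes and let pending work start."""
        self.free |= self.allocated.pop(job_id)
        self._schedule()

    def _schedule(self):
        # Strict FIFO arbitration: stop at the first request that does not fit.
        while self.pending and self.pending[0][1] <= len(self.free):
            job_id, node_count = self.pending.popleft()
            nodes = set(sorted(self.free)[:node_count])
            self.free -= nodes
            self.allocated[job_id] = nodes


rm = ToyResourceManager(["n1", "n2", "n3", "n4"])
rm.submit("a", 3)   # fits: job "a" is allocated three nodes
rm.submit("b", 2)   # only one node is free, so job "b" waits in the queue
```

Calling rm.finish("a") would release the three nodes and allow job "b" to start. A real scheduler also has to handle priorities, partitions, shared access, and requests larger than the machine, none of which this sketch attempts.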
6.2 SLURM Commands
Users interact with SLURM through its command line utilities. SLURM has the following basic
commands: srun, scancel, squeue, sinfo, and scontrol, which can run on any
node in the HP XC system. These commands are summarized in Table 6-1 and described
in the following sections.
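As a rough illustration of driving these utilities from a script, the sketch below builds argument lists for srun, squeue, and scancel and runs them only if the command is present on the node. The helper names, the user name, and the job ID are hypothetical. The option letters used (-N for node count, -u for user, -s for signal) are assumed from common SLURM usage and should be checked against the man pages for the SLURM version installed on the system.

```python
import shutil
import subprocess


def srun_cmd(program, nodes=1, args=()):
    """Argument list to launch `program` on `nodes` nodes (assumes srun -N)."""
    return ["srun", "-N", str(nodes), program, *args]


def squeue_cmd(user=None):
    """Argument list to show the queue, optionally for one user (assumes -u)."""
    cmd = ["squeue"]
    if user is not None:
        cmd += ["-u", user]
    return cmd


def scancel_cmd(job_id, signal=None):
    """Argument list to cancel a job, or to signal it instead (assumes -s)."""
    cmd = ["scancel"]
    if signal is not None:
        cmd += ["-s", signal]
    cmd.append(str(job_id))
    return cmd


def run_if_available(cmd):
    """Run cmd and return its stdout, or None if the utility is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


# Hypothetical values for illustration only.
launch = srun_cmd("hostname", nodes=2)
listing = squeue_cmd(user="alice")
cancel = scancel_cmd(42)
```

Building the argument list separately from running it makes it possible to inspect or log the exact command before anything is submitted.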
Table 6-1: SLURM Commands

Command    Function
srun       Submits jobs to run under SLURM management. srun is used to submit a
           job for execution, allocate resources, attach to an existing
           allocation, or initiate job steps. srun can:
           • Submit a batch job and then terminate
           • Submit an interactive job and then persist to shepherd the job as
             it runs
           • Allocate resources to a shell and then spawn that shell for use in
             running subordinate jobs
squeue     Displays the queue of running and waiting jobs (or "job steps"),
           including the JobID (used for scancel) and the nodes assigned to
           each running job. It has a wide variety of filtering, sorting, and
           formatting options. By default, it reports the running jobs in
           priority order and then the pending jobs in priority order.
scancel    Cancels a pending or running job or job step. It can also be used to
           send a specified signal to all processes on all nodes associated
           with a job. Only job owners or administrators can cancel jobs.