Slurm

Submitting Jobs

Jobs are submitted to the Slurm batch system with the command

sbatch <SCRIPT>

where <SCRIPT> is a shell script which can contain additional parameters in the header to configure the job. All jobs are constrained to the requested amount of time, CPUs and memory.

Memory Limit

The default memory limits are deliberately set comparatively low. To find an appropriate limit for your job, first submit one job requesting a generous memory limit. After the job has finished, you can check its actual peak memory usage with the command

sacct -o MaxRSS -j JOBID
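The reported MaxRSS value can be turned into a memory request for later jobs. The following is a minimal sketch, not a site-provided tool: the function name suggest_mem_mb and the 20% headroom are assumptions chosen for illustration, and the input is assumed to be an integer with an optional K, M or G suffix (a value without suffix is treated as bytes).

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): convert a MaxRSS value such as
# "812340K" into a memory request in MB, with 20% headroom added.
suggest_mem_mb() {
    local rss="$1" value unit kb
    value="${rss%[KMG]}"      # numeric part, e.g. 812340
    unit="${rss#"$value"}"    # trailing unit letter, empty if there is none
    case "$unit" in
        K) kb="$value" ;;
        M) kb=$(( value * 1024 )) ;;
        G) kb=$(( value * 1024 * 1024 )) ;;
        *) kb=$(( value / 1024 )) ;;   # no suffix: assume plain bytes
    esac
    # Add 20% headroom, then round up to whole MB.
    echo $(( (kb * 12 / 10 + 1023) / 1024 ))
}
```

For example, suggest_mem_mb 812340K prints 952, which could then be used as --mem-per-cpu=952 for a single-core job.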

Walltime Limit

As with the memory limit, the default walltime limit is set quite short. Please estimate in advance how long the job will run and set the time limit accordingly.
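One way to pick a time limit is to measure a test run and add a safety margin. The sketch below assumes you already know the run time in seconds (for instance from the Elapsed field reported by sacct); the function name walltime_with_margin and the 25% margin are illustrative assumptions, not site recommendations.

```shell
#!/bin/bash
# Hypothetical helper: convert a measured run time in seconds into an
# hours:minutes:seconds string for sbatch -t, adding a 25% safety margin.
walltime_with_margin() {
    local secs=$(( $1 * 125 / 100 ))
    printf '%02d:%02d:%02d\n' \
        $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}
```

A test run of 12 hours (43200 seconds) gives walltime_with_margin 43200, which prints 15:00:00.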

Example: Single-Core Job

The following script describes a simple job requesting one CPU core and 1 GB of memory with a time limit of 15 hours:

#!/bin/bash

 

#SBATCH -o /home/myUID/output.%j.%N.log

Standard output and standard error are written to the specified file. The file should not be located on our Lustre file system.

#SBATCH -p medium

Run the job in the partition (queue) medium

#SBATCH -n 1

Number of tasks (by default one core per task)

#SBATCH -t 15:00:00

Time limit in the format hours:minutes:seconds (here 15 hours)

#SBATCH --mem-per-cpu=1024

Request 1 GB (1024 MB) of memory per core

#SBATCH --get-user-env

Use environment variables of interactive login

echo "I'm running on node $HOSTNAME"

The actual script to be executed

All options provided in the submission script can also be passed directly as parameters to sbatch.
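As an illustration, the #SBATCH header of the single-core example above corresponds to the command line built in the sketch below. The function print_submit_cmd and the script name job.sh are placeholders introduced here; the function only prints the command so it can be inspected before submitting.

```shell
#!/bin/bash
# Print the sbatch call that mirrors the #SBATCH header of the example above.
# The script name is a placeholder; remove "echo" to actually submit the job.
print_submit_cmd() {
    echo sbatch -o "/home/myUID/output.%j.%N.log" -p medium -n 1 \
        -t 15:00:00 --mem-per-cpu=1024 --get-user-env "$1"
}

print_submit_cmd job.sh
```

Options given on the command line take precedence over the same options set in the script header, which makes it easy to override a single value such as the time limit for one submission.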

Examples: Multi-Core Job on One Node

The following sbatch options submit a job requesting 1 task with 4 cores on one node. The total memory requested on the node is 4 GB:

sbatch -n 1 --cpus-per-task 4 --mem=4000 <SCRIPT>

The following sbatch options submit a job requesting 4 tasks, each with 1 core, on one node. The total memory requested on the node is 4 GB:

sbatch -n 4 --mem=4000 <SCRIPT>

The following option avoids the use of in-core multi-threading. It advises Slurm to allocate only one thread from each core to the job:

sbatch -n 4 --mem=4000 --hint=nomultithread <SCRIPT>
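Inside a multi-core job script, the program still has to be told how many cores it may use. The fragment below is a sketch for the 1-task, 4-core case: it reads SLURM_CPUS_PER_TASK, which Slurm sets when --cpus-per-task is requested, and passes it on via OMP_NUM_THREADS. The helper name threads_for_task is made up for this example, and OMP_NUM_THREADS only affects OpenMP programs.

```shell
#!/bin/bash
# Sketch of a job-script fragment for the 1-task, 4-core case.
# SLURM_CPUS_PER_TASK is set by Slurm only when --cpus-per-task was requested,
# so fall back to 1 core otherwise (and when testing outside Slurm).
threads_for_task() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# OMP_NUM_THREADS is read by OpenMP programs; other programs need their own option.
export OMP_NUM_THREADS="$(threads_for_task)"
echo "Running with $OMP_NUM_THREADS thread(s)"
```

The fallback to 1 keeps the script usable for quick tests on a login node, where the Slurm variables are not defined.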

Job Scheduling

Job priorities are calculated from the fairshare and from the length of time a job has been waiting in the queue. The most important factor is the fairshare. A detailed description of how the fairshare priority is calculated can be found here.

The longer your job waits in the queue, the higher its priority grows. To check your current fairshare status, you can use the following command

sshare -u <UID>

To check your current job priority use the command

sprio -j <JOBID>

which provides some details on how your job priority is calculated.

Job Arrays

Job arrays can be used to submit and manage a large number of jobs with similar settings.

The following example submits a job array with 5 individual jobs:

sbatch --array=0-4 job.sh

The ID of the individual job can be accessed via the $SLURM_ARRAY_TASK_ID variable. The minimum value of job array IDs is 0 and the maximum is set to 999. This means that the maximum length of a job array is 1000.

More information about job arrays can be found here.
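A typical use of $SLURM_ARRAY_TASK_ID is to let each array task select its own input. The following sketch of a job.sh assumes a made-up naming scheme input_<ID>.dat with zero-padded IDs; the function input_for_task is hypothetical and would be adapted to your own file layout.

```shell
#!/bin/bash
# job.sh sketch: each array task derives its input file from the task ID.
# The input_<ID>.dat naming scheme is a made-up example.
input_for_task() {
    printf 'input_%03d.dat\n' "$1"
}

# Outside a job array the variable is unset, so default to 0 for local tests.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
echo "Task $task_id processes $(input_for_task "$task_id")"
```

Submitted with sbatch --array=0-4 job.sh, the five tasks would work on input_000.dat through input_004.dat.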

Overview of Slurm commands

The following table gives an overview of the available Slurm commands:

Task                                                            Slurm Command
Job submission                                                  sbatch <SCRIPT>
Submit a job for execution or initiate job steps in real time   srun
Cancel pending or running jobs                                  scancel <JOBID>
List queued jobs                                                squeue
Job status / Details                                            scontrol show job <JOBID>
Job status by user                                              squeue -u <UID>
List queues (partitions)                                        sinfo
List queue configuration details                                scontrol show partition
List nodes                                                      sinfo -N
                                                                scontrol show nodes
GUI                                                             smap (shell)
                                                                sview (gui)

Further Documentation

Detailed documentation is available from the Slurm web page: