Run Applications

Query the installed software

To run an application you first have to load its module. You can list the installed software with:

module avail

which prints a list like this:

-------------------------- /usr/share/Modules/modulefiles --------------------------
dot         module-git  module-info modules     null        use.own

------------------------------- /etc/modulefiles -------------------------------
arcanist/06-2015                     cuda/7.0.28                          intelcomp/2015.3.187                 libphutil/880c0fb344                 openmpi/1.10.0/gcc/4.8.2             petsc/3.5.4/openmpi/1.10.0/gcc/4.8.2 ruby/1.9.3-p547
boost/1.59.0                         gcc/5.2.0                            intelmpi/5.1.1.109                   libyaml/0.1.5                        openmpi/1.10.0/gcc/5.2.0             pgi/15.7                             slurm/14.11.8
cppcheck/1.69                        gtest/1.6.0                          knem/1.1.2                           mkl/2015.3.187                       openmpi/1.10.0/icc/2015.3.187        poco/1.4.7                           superlu/4.3
cuda/6.5.14                          hdf5_ompi1100_gcc520/1.8.15-p1       libphutil/06-2015                    munge/0.5.11                         petsc/3.5.4/boost/1.59.0             rrdtool/1.5.3

Once you have found what you need, load the module with:

module load gcc/5.2.0

Now you can run all commands that belong to this software package:

gcc test.c -o test.exe

To see which modules are currently loaded, type module list. To remove a loaded module, use module unload <module-file>. More information is available in the manual page, which you can open with man module.
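The load and list steps above can be combined into a small helper that loads a module only when it is not loaded yet. This is a minimal sketch, assuming the module command on this cluster maintains the colon-separated LOADEDMODULES environment variable, as Environment Modules normally does; the function names is_module_loaded and ensure_module are hypothetical.

```shell
#!/bin/bash
# Succeed if the given modulefile appears in the colon-separated
# LOADEDMODULES list (assumed to be maintained by the module command).
is_module_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Load a module only when it is not loaded yet.
ensure_module() {
    if is_module_loaded "$1"; then
        echo "$1 is already loaded"
    else
        module load "$1"
    fi
}

# Usage on the cluster: ensure_module gcc/5.2.0
```

Such a check is mostly useful in scripts that may be started from shells with different modules already loaded.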

Submitting something to the cluster

Up to this point you have only used the "frontend" node, which is meant for compiling applications and for development work.

To use the resources of the whole cluster you have to go through the scheduler.

On this cluster the scheduler is SLURM (Simple Linux Utility for Resource Management). To use it, load its module:

module load slurm

Query Information from SLURM

Now you can run commands such as:

$ sinfo 
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up   infinite      1  down* node041
all*         up   infinite      8  alloc gpu[03-10]
all*         up   infinite     79   idle gpu[01-02],node[001-040,042-078]
compute      up   infinite      1  down* node041
compute      up   infinite     77   idle node[001-040,042-078]
gpu          up   infinite      8  alloc gpu[03-10]
gpu          up   infinite      2   idle gpu[01-02]

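The output tells you which nodes are allocated, idle or down. In the example shown, the node "node041" is down, the nodes "gpu03" to "gpu10" are allocated, and the remaining 79 nodes (gpu[01-02] and node[001-040,042-078]) are idle. The asterisk after "all" marks the default partition; the asterisk after "down" means that the node is not responding.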
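If you only want to know how many nodes of one partition are in a given state, the NODES column can be summed up with awk. The following is a minimal sketch that assumes the default sinfo column layout shown above; the function name count_nodes is hypothetical, and the sample lines are copied from that output.

```shell
#!/bin/bash
# Sum the NODES column of sinfo-style text read from stdin.
# $1 = partition name, $2 = node state (for example idle, alloc, down).
count_nodes() {
    awk -v part="$1" -v state="$2" '
        NR > 1 {                        # skip the header line
            p = $1; sub(/\*$/, "", p)   # strip the default-partition marker
            s = $5; sub(/\*$/, "", s)   # strip the not-responding marker
            if (p == part && s == state) total += $4
        }
        END { print total + 0 }
    '
}

# Sample lines taken from the sinfo output shown above.
sample='PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up   infinite      1  down* node041
all*         up   infinite      8  alloc gpu[03-10]
all*         up   infinite     79   idle gpu[01-02],node[001-040,042-078]
compute      up   infinite      1  down* node041
compute      up   infinite     77   idle node[001-040,042-078]
gpu          up   infinite      8  alloc gpu[03-10]
gpu          up   infinite      2   idle gpu[01-02]'

printf '%s\n' "$sample" | count_nodes compute idle   # prints 77
# On the cluster: sinfo | count_nodes compute idle
```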

The command

$ squeue

lists the jobs that are currently running or waiting to run:

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               777   compute ep_12.sh   geistn PD       0:00     77 (BeginTime)
               776       gpu ep_10.sh   geistn  R    4:52:32      8 gpu[03-10]

In this example, job 777 is pending (PD) because it is scheduled to start at a specific time (reason BeginTime), while job 776 is running (R) and has been running for 4 hours and 52 minutes on 8 GPU nodes.
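To look up the state of a single job in such a listing, the ST column can be picked out with awk. This is a minimal sketch that assumes the default squeue column layout shown above and a job name without spaces; the function name job_state is hypothetical, and the sample lines are copied from that output.

```shell
#!/bin/bash
# Print the ST column for one job ID from squeue-style text on stdin.
# $1 = job ID
job_state() {
    awk -v id="$1" 'NR > 1 && $1 == id { print $5 }'
}

# Sample lines taken from the squeue output shown above.
sample='             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               777   compute ep_12.sh   geistn PD       0:00     77 (BeginTime)
               776       gpu ep_10.sh   geistn  R    4:52:32      8 gpu[03-10]'

printf '%s\n' "$sample" | job_state 777   # prints PD
printf '%s\n' "$sample" | job_state 776   # prints R
# On the cluster: squeue | job_state 777
```

On a busy cluster, squeue -u "$USER" restricts the listing to your own jobs.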

Writing a submit file

To run your own program, open an editor of your choice (e.g. vim or nano) and write a file containing the commands needed to start your calculation.

Keep in mind that you have to load the necessary modules in this submit file.

Each submit file starts with #!/path/to/the/interpreter (almost always #!/bin/bash). The lines directly after it can list arguments to sbatch, each prefixed with #SBATCH. The available arguments are documented in man sbatch and in the SLURM documentation on the web.

The options you will need most often are

  • -N <number-of-nodes>
  • ...

This submit file starts your application on ten nodes:

#!/bin/bash
#SBATCH -J my_job               # job name
#SBATCH -N 10                   # number of nodes
#SBATCH -n 10                   # number of MPI processes, here 1 per node
#SBATCH --partition=compute     # choose nodes from partition
#SBATCH -o %j.out               # stdout file name (%j: job ID)
#SBATCH -e %j.err               # stderr file name (%j: job ID)
#SBATCH -t 24:00:00             # max run time (hh:mm:ss), max 72h!
#SBATCH --mail-type=end         # send an e-mail when the job ends
#SBATCH --mail-user=BRAIN_USER@uni-greifswald.de

## optional: print job information from SLURM's environment variables
echo "On which nodes it executes:"
echo "$SLURM_JOB_NODELIST"
echo " "
echo "jobname: $SLURM_JOB_NAME"

## load modules
module load gcc/5.2.0   # for instance

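# The batch script itself runs only on the first allocated node;
# srun starts one process per task (-n) across the allocation.
# MPI programs may need mpirun instead, depending on the MPI installation.
srun ./my_application my_arguments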
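Because the run time is limited to 72 hours, it can be worth checking the value passed to -t before submitting. The following is a minimal sketch that only handles the hh:mm:ss form used in the submit file above (sbatch accepts other time formats as well); the function names walltime_seconds and walltime_ok are hypothetical.

```shell
#!/bin/bash
# Convert a time limit in hh:mm:ss form to seconds.
walltime_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so that values like 08 are not read as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Succeed if the time limit does not exceed 72 hours.
walltime_ok() {
    [ "$(walltime_seconds "$1")" -le $(( 72 * 3600 )) ]
}

walltime_seconds 24:00:00   # prints 86400
if walltime_ok 24:00:00; then
    echo "time limit is within 72h"
fi
```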

To tell the scheduler to run it, call sbatch with your script as the first argument:

sbatch my_submit_script.sh

sbatch replies with a job ID. Remember this ID to find your job in the output of squeue.
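Instead of remembering the ID by hand, a script can extract it from the reply. This is a minimal sketch, assuming sbatch answers with a line of the form "Submitted batch job <id>"; the function name job_id_from_reply and the ID 778 are hypothetical and only used for illustration.

```shell
#!/bin/bash
# Extract the numeric job ID from sbatch's reply line.
# Prints nothing if the reply does not have the expected form.
job_id_from_reply() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p' <<< "$1"
}

reply='Submitted batch job 778'   # hypothetical reply, for illustration
job_id=$(job_id_from_reply "$reply")
echo "job id: $job_id"            # prints: job id: 778

# On the cluster:
#   job_id=$(job_id_from_reply "$(sbatch my_submit_script.sh)")
#   squeue -j "$job_id"
```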

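By default, the output your application would normally print to the terminal is redirected to a file called slurm-<jobid>.out. The submit file above overrides this with -o and -e, so its output goes to <jobid>.out and its error messages go to <jobid>.err.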