Run Applications
In order to run an application you have to load its module. You can query the installed software with:
module avail
which will give you a list like this:
-------------------------------- /usr/share/Modules/modulefiles --------------------------------
dot module-git module-info modules null use.own
--------------------------------------- /etc/modulefiles ---------------------------------------
arcanist/06-2015 cuda/7.0.28 intelcomp/2015.3.187 libphutil/880c0fb344 openmpi/1.10.0/gcc/4.8.2 petsc/3.5.4/openmpi/1.10.0/gcc/4.8.2 ruby/1.9.3-p547
boost/1.59.0 gcc/5.2.0 intelmpi/5.1.1.109 libyaml/0.1.5 openmpi/1.10.0/gcc/5.2.0 pgi/15.7 slurm/14.11.8
cppcheck/1.69 gtest/1.6.0 knem/1.1.2 mkl/2015.3.187 openmpi/1.10.0/icc/2015.3.187 poco/1.4.7 superlu/4.3
cuda/6.5.14 hdf5_ompi1100_gcc520/1.8.15-p1 libphutil/06-2015 munge/0.5.11 petsc/3.5.4/boost/1.59.0 rrdtool/1.5.3
After you have found what you need, load the module with:
module load gcc/5.2.0
Now you can run all commands that belong to this software package, for example:
gcc test.c -o test.exe
To see which modules you have already loaded, type module list. To remove a loaded module, use module unload <module-file>. You can find more information in the manual by typing man module.
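For example, to check which modules are active and to remove the compiler loaded above again (the module name is just the one used in this example):
module list                # show the currently loaded modules
module unload gcc/5.2.0    # remove the compiler module again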
Submitting a job to the cluster
Up to this point you have only used the "frontend" node, which is meant for compiling applications and development work.
If you want to use the resources of the whole cluster, you need to go through the scheduler.
On this cluster the scheduler is SLURM (Simple Linux Utility for Resource Management).
Query Information from SLURM
Now you can run some commands like
$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
all*       up     infinite       1  down*  node041
all*       up     infinite       8  alloc  gpu[03-10]
all*       up     infinite      79  idle   gpu[01-02],node[001-040,042-078]
compute    up     infinite       1  down*  node041
compute    up     infinite      77  idle   node[001-040,042-078]
gpu        up     infinite       8  alloc  gpu[03-10]
gpu        up     infinite       2  idle   gpu[01-02]
The output tells you which nodes are allocated, idle, or down. In the example shown, node041 is down, the nodes gpu03-gpu10 are allocated, and the nodes node[001-040,042-078] are idle.
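If you are only interested in part of this information, sinfo can be narrowed down; for example (the partition name is taken from the listing above):
$ sinfo -p gpu    # show only the "gpu" partition
$ sinfo -N -l     # one line per node, with more details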
The command
$ squeue
returns information about the jobs that are currently running or are waiting to run:
JOBID  PARTITION  NAME      USER    ST  TIME     NODES  NODELIST(REASON)
  777  compute    ep_12.sh  geistn  PD  0:00        77  (BeginTime)
  776  gpu        ep_10.sh  geistn  R   4:52:32      8  gpu[03-10]
In this example, job 777 is scheduled to start at a specific time and is therefore pending (PD), while job 776 has been running (R) for 4 hours and 52 minutes on 8 GPU systems.
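On a busy cluster this list can get long. You can filter it, for example to your own jobs or to a single job ID (776 is just the ID from the example above):
$ squeue -u $USER    # show only your own jobs
$ squeue -j 776      # show only the job with this ID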
Writing a submit file
In order to run your own program, open an editor of your choice (vim/nano) and write a file with the commands necessary to start your calculations.
Keep in mind that you have to load the necessary modules in this submit file.
Each submit file starts with #!/path/to/the/interpreter (almost always /bin/bash). On the lines after that, arguments to sbatch can be listed as #SBATCH directives. You can find all available arguments in the sbatch man page (man sbatch) or in the SLURM documentation on the web.
The options you will need most often are
- -N <number-of-nodes>: the number of nodes to allocate
- the other #SBATCH options used in the example below (-J, -n, --partition, -o, -e, -t)
This submit file will start your application on ten systems:
#!/bin/bash
#SBATCH -J my_job # job name
#SBATCH -N 10 # number of nodes
#SBATCH -n 10 # number of MPI processes, here 1 per node
#SBATCH --partition=compute # choose nodes from partition
#SBATCH -o %j.out # stdout file name (%j: job ID)
#SBATCH -e %j.err # stderr file name (%j: job ID)
#SBATCH -t 24:00:00 # max run time (hh:mm:ss), max 72h!
#SBATCH --mail-type=end
#SBATCH --mail-user=BRAIN_USER@uni-greifswald.de
## print some job information from SLURM environment variables (optional)
echo "Nodes the job runs on:"
echo "$SLURM_JOB_NODELIST"
echo " "
echo "jobname: $SLURM_JOB_NAME"
## load modules
module load gcc/5.2.0 # for instance
srun ./my_application my_arguments   # srun starts the MPI processes on the allocated nodes
In order to tell the scheduler to run this, you need to call sbatch with your script as the first argument:
sbatch my_submit_script.sh
You will get an answer from sbatch with a job ID. Remember this ID to find your job in the output of squeue.
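The whole sequence looks roughly like this (the job ID 778 is only a made-up example):
$ sbatch my_submit_script.sh
Submitted batch job 778
$ squeue -j 778    # check the state of this job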
The output your application would normally print to the terminal is redirected to a file. Without the -o/-e options it would be called slurm-<jobid>.out; with the options used in the example above it ends up in <jobid>.out and <jobid>.err.
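You can simply look at these files while the job is running or after it has finished (again with a made-up job ID):
$ cat 778.out    # stdout of the job
$ cat 778.err    # stderr of the job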