Parallel Cluster Guides: Difference between revisions

From Micro and Nano Mechanics Group

Latest revision as of 23:12, 1 May 2012

su-ahpcrc

The default settings for MPI will cause the code to use GCC rather than the Intel compilers. To change this:

  • EXTREMELY IMPORTANT: Ensure that nothing related to Intel has been added to your path in .bashrc and .bash_profile
  • Run:
$ mpi-selector --query
default:mvapich2_intel-1.2
level:user
  • If your result is something other than the above (version number may differ), execute:
$ mpi-selector --set mvapich2_intel-1.2
  • Log out and reconnect to the cluster
  • If everything is correct, running which mpicc should report:
$ which mpicc
/usr/mpi/intel/mvapich2-1.2/bin/mpicc
  • Add the following to makefile.sys, if not present:
########################################################
#
#    System type:  su-ahpcrc
#    
#    Stanford ME linux system using intel compilers
#
########################################################

#
#    Define parallel and serial compilers and compiler flags
#    and set the default compiler based on the execution mode
#    (defined by user in makefile)
#
CC_PARALLEL.su-ahpcrc       = mpicc
CPP_PARALLEL.su-ahpcrc      = mpicxx
CCFLAG_PARALLEL.su-ahpcrc   = -longdouble -DLONGDOUBLE -DPARALLEL=1 
CPPFLAG_PARALLEL.su-ahpcrc  = 

CC_SERIAL.su-ahpcrc         = icc
CPP_SERIAL.su-ahpcrc        = icpc
CCFLAG_SERIAL.su-ahpcrc     = -DLONGDOUBLE
CPPFLAG_SERIAL.su-ahpcrc    = 

F90.su-ahpcrc               = ifort
F90_OPTS.su-ahpcrc          = 
F90_LIB.su-ahpcrc           = -L/opt/intel/fce/10.1.015/lib -lifcore

CC.su-ahpcrc                = $(CC_$(MODE).su-ahpcrc)
CPP.su-ahpcrc               = $(CPP_$(MODE).su-ahpcrc)
CCFLAG.su-ahpcrc            = $(CCFLAG_$(MODE).su-ahpcrc)
CPPFLAG.su-ahpcrc           = $(CPPFLAG_$(MODE).su-ahpcrc)

XLIB_LIBDIR.su-ahpcrc       = /usr/X11R6/lib64
XLIB_LIB.su-ahpcrc          = -L$(XLIB_LIBDIR.su-ahpcrc) -lX11 -lpthread
XLIB_INCS.su-ahpcrc         =

MPI_LIBDIR.su-ahpcrc        = -L/export/apps/mvapich/intel/lib
MPI_LIB.su-ahpcrc           = -lmpich
MPI_INCS.su-ahpcrc          = 

OPENMP_FLAG.su-ahpcrc       = -openmp

#
#    Identify any additional libraries and paths needed for compilation
#    on this system type
#
LIB_PARALLEL.su-ahpcrc      =
INCS_PARALLEL.su-ahpcrc     =

LIB_SERIAL.su-ahpcrc        =-L../lib
INCS_SERIAL.su-ahpcrc       =
  • When compiling the code, seeing icc and comments about vectorizing loops such as "StressTableGen.c(357): (col. 37) remark: LOOP WAS VECTORIZED." are signs that the Intel compilers were used.
  • Use a PBS script similar to the following (/opt/mpiexec/bin/mpiexec --comm=pmi is the only important difference):
#!/bin/bash
#PBS -N ParaDiS
#PBS -j oe
#PBS -l nodes=1:ppn=8,walltime=24:00:00
#PBS -V

### ---------------------------------------
### BEGINNING OF EXECUTION
### ---------------------------------------

echo The master node of this job is `hostname`
echo The working directory is `echo $PBS_O_WORKDIR`
echo This job runs on the following nodes:
echo `cat $PBS_NODEFILE`

ncpu=`cat $PBS_NODEFILE | wc -w`
echo "Number of processors = $ncpu "

### end of information preamble

cd $PBS_O_WORKDIR

echo $PWD

PARADIS_O_DIR="tests/fmm_8cpu_results"
mkdir -p $PARADIS_O_DIR
cmd="/opt/mpiexec/bin/mpiexec --comm=pmi -np $ncpu bin/paradis tests/fmm_8cpu.ctrl"
$cmd >& $PARADIS_O_DIR/paradis.log
  • Submit the job using a command similar to:
$ qsub paradis.pbs
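
The "which mpicc" check above can also be scripted, so that a wrong MPI selection is caught before compiling. The following is only a sketch: the function name check_mpicc is made up for this example, and the expected path must be whatever your cluster reports.

```shell
#!/bin/bash
# check_mpicc is a hypothetical helper, not part of ParaDiS or mpi-selector.
# Usage: check_mpicc EXPECTED_PATH
check_mpicc() {
    local expected="$1"
    local found
    # command -v prints the path of the mpicc that the shell would run
    found="$(command -v mpicc || true)"
    if [ -z "$found" ]; then
        echo "mpicc not found on PATH"
        return 2
    fi
    if [ "$found" = "$expected" ]; then
        echo "OK: mpicc is $found"
        return 0
    fi
    echo "MISMATCH: mpicc is $found, expected $expected"
    return 1
}

# Example call on su-ahpcrc (commented out so the sketch has no side effects):
# check_mpicc /usr/mpi/intel/mvapich2-1.2/bin/mpicc
```

A non-zero return value means you should rerun mpi-selector, log out, and reconnect before building.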

mc2

By default, you will not have access to Intel compilers or MPI. They need to be added with the module command.

  • EXTREMELY IMPORTANT: Ensure that nothing related to Intel has been added to your path in .bashrc and .bash_profile
  • To see the list of available modules use: module avail
  • To temporarily add the modules for your current session you can use module add
  • To permanently add modules issue a command similar to this:
$ module purge
$ module initrm intel/intel-11 mvapich/1.2rc1-intel-11-dell-gen2
$ module initadd intel/intel-12 mvapich2/1.7rc1-intel-12
$ module load intel/intel-12 mvapich2/1.7rc1-intel-12
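
To check the "EXTREMELY IMPORTANT" item above, we can search the startup files for anything that mentions Intel. This is a minimal sketch; find_intel_lines is a name invented here. Depending on how the module command is configured, lines that "module initadd" itself writes may also match, and those are expected.

```shell
#!/bin/bash
# find_intel_lines is a hypothetical helper for illustration.
# Prints matches as file:line:text; returns 0 if the files are clean, 1 otherwise.
find_intel_lines() {
    local f status=0
    for f in "$@"; do
        [ -f "$f" ] || continue
        # -H always prints the file name, -n the line number, -i ignores case
        if grep -H -n -i 'intel' "$f"; then
            status=1
        fi
    done
    return "$status"
}

# Typical call (commented out so the sketch has no side effects):
# find_intel_lines ~/.bashrc ~/.bash_profile
```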


run ParaDiS in MC2

To run a ParaDiS input on mc2, we need to edit the job script "paradis.serial.run" and submit it.

In paradis.serial.run

First, we specify the job name in "paradis.serial.run", as follows.

### Job id
#PBS -N CYL_T4_MC2.3

Here the job name is CYL_T4_MC2.3. This is the name that appears in the queue listing when you check whether the job is running.

Next, we need to specify the input file and the path where the log file will be written, as follows.

bin/paradiscyl tests/CYL_test/CYL_T5_MC2/CYL_T4_MC2_3.ctrl >& tests/CYL_test/CYL_T4_MC2/CYL_T4_MC2_3.log 

The first part ("bin/paradiscyl tests/CYL_test/CYL_T5_MC2/CYL_T4_MC2_3.ctrl") is exactly the same as the command you use to run an input file on your own computer. The symbol ">&" redirects both standard output and standard error to the log file named after it ("tests/CYL_test/CYL_T4_MC2/CYL_T4_MC2_3.log").
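
The effect of the redirection can be tried without ParaDiS. In the sketch below, noisy is a made-up stand-in for a program that writes to both output streams. The portable spelling "> file 2>&1" is used; in bash, ">& file" is shorthand for the same thing.

```shell
#!/bin/bash
# noisy is a hypothetical stand-in for a program such as bin/paradiscyl.
noisy() {
    echo "progress message"        # written to standard output
    echo "warning message" >&2     # written to standard error
}

log="$(mktemp)"
# Send standard output to the log, then point standard error at the same place.
noisy > "$log" 2>&1
cat "$log"
```

Both lines end up in the log file, which is why the ParaDiS log contains error messages as well as normal progress output.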

Then, we need to turn off the X-window display in the file named by "winDefaultsFile" in the ctrl file. For example, if it is "inputs/paradis.xdefaults", modify that file as follows. The same change is needed to run the simulation on any computer that has no X server.

enable_window = 0    # Toggle enabling/disabling simulation X-window.
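
If many input sets need this change, it can be applied with sed instead of an editor. This is a sketch under the assumption that the file uses the "name = value" layout of the line shown above; disable_xwindow is a name invented for this example.

```shell
#!/bin/bash
# disable_xwindow is a hypothetical helper for illustration.
# Usage: disable_xwindow FILE   (a copy of the original is kept as FILE.bak)
disable_xwindow() {
    local file="$1"
    cp "$file" "$file.bak"
    # Replace only the numeric value of the enable_window line;
    # the trailing comment and all other lines are left untouched.
    sed 's/^\([[:space:]]*enable_window[[:space:]]*=\)[[:space:]]*[0-9][0-9]*/\1 0/' \
        "$file.bak" > "$file"
}

# Typical call (commented out; the path is the example from the text):
# disable_xwindow inputs/paradis.xdefaults
```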

submit paradis.serial.run

Finally, submit this file ("paradis.serial.run") on MC2, as follows.

$ qsub paradis.serial.run 
6006.mc2.stanford.edu

The job id ("6006.mc2") will be shown on the screen.

After that, we can check whether the job is running using "qstat" or "showq".

$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
5976.mc2                  YSZ_CC5          inmyway         597:13:0 R default        
5977.mc2                  YSZ_CC6          inmyway         597:11:2 R default        
5978.mc2                  YSZ_CC7          inmyway         597:05:2 R default        
5979.mc2                  YSZ_CC8          inmyway         597:08:0 R default        
5980.mc2                  YSZ_CC9          inmyway         597:08:5 R default        
5981.mc2                  YSZ_CC10         inmyway         597:07:1 R default        
5982.mc2                  YSZ_CC11         inmyway         398:06:0 R default        
5999.mc2                  ...mmps_Test-hcp yanmingw        147:57:1 R default        
6000.mc2                  AuSi_Lammps_Test yanmingw        147:48:1 R default        
6005.mc2                  YSZ_S2_NEB       inmyway         15:46:24 R default 
6006.mc2                  CYL_T4_MC2.3     iryu            00:00:00 R default
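
When the queue is long, it helps to show only your own jobs. The sketch below filters a listing in the default layout shown above, where the user name is the third column. jobs_of_user is a name invented here, and the filter assumes job names contain no spaces.

```shell
#!/bin/bash
# jobs_of_user is a hypothetical helper; it reads a qstat listing on standard input.
# Usage: qstat | jobs_of_user USERNAME
jobs_of_user() {
    local user="$1"
    # Keep only the rows whose third whitespace-separated field equals the user name
    awk -v u="$user" '$3 == u'
}

# Typical call on the cluster (commented out because it needs qstat):
# qstat | jobs_of_user "$USER"
```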

Moreover, you can see the progress in the log file, as follows.

$ cd tests/CYL_test/CYL_T4_MC2/
$ vi CYL_T4_MC2_3.log

check compilers

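We can check which shared libraries the executable is linked against, as follows. Entries from the Intel installation (libimf.so, libsvml.so, libintlc.so.5) indicate that the Intel compilers were used.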

$ ldd bin/paradiscyl
       libm.so.6 => /lib64/libm.so.6 (0x00000038dba00000)
       libX11.so.6 => /usr/lib64/libX11.so.6 (0x00000038de200000)
       libpthread.so.0 => /lib64/libpthread.so.0 (0x00000038dc200000)
       libfftw3.so.3 => /usr/lib64/libfftw3.so.3 (0x0000003440200000)
       libgsl.so.0 => /home/iryu/usr/lib/libgsl.so.0 (0x00002b38ee018000)
       libgslcblas.so.0 => /home/iryu/usr/lib/libgslcblas.so.0 (0x00002b38ee545000)
       libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000038ee000000)
       libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000038ebc00000)
       libc.so.6 => /lib64/libc.so.6 (0x00000038db600000)
       libdl.so.2 => /lib64/libdl.so.2 (0x00000038dbe00000)
       libXau.so.6 => /usr/lib64/libXau.so.6 (0x00000038dd600000)
       libXdmcp.so.6 => /usr/lib64/libXdmcp.so.6 (0x00000038dde00000)
       /lib64/ld-linux-x86-64.so.2 (0x00000038db200000)
       libimf.so => /share/apps/intel/lib/intel64/libimf.so (0x00002b38ee7af000)
       libsvml.so => /share/apps/intel/lib/intel64/libsvml.so (0x00002b38eeb92000)
       libintlc.so.5 => /share/apps/intel/lib/intel64/libintlc.so.5 (0x00002b38ef23a000)
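
A library that the loader cannot resolve is reported by ldd as "not found" instead of a path. The following sketch picks those entries out of the listing. missing_libs is a name invented for this example.

```shell
#!/bin/bash
# missing_libs is a hypothetical helper; it reads ldd output on standard input
# and prints the name of every library reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Typical call (commented out because it needs the ParaDiS executable):
# ldd bin/paradiscyl | missing_libs
```

An empty result means every library was resolved.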

If a required library cannot be found, we need to add its directory to the library search path in your ~/.bash_profile. To do that, open your ~/.bash_profile, as follows.

$ cd ~
$ vi ~/.bash_profile

In "~/.bash_profile", we need to add the library directories to "LD_LIBRARY_PATH", as follows.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/iryu/Codes/fftw3_lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/home/iryu/usr/lib
export TARGET=mc2
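
Because the same export lines also appear in the job script, the search path can grow each time they run. If LD_LIBRARY_PATH starts out empty, the plain form also leaves a leading colon, which the loader treats as the current directory. The sketch below avoids both; add_lib_dir is a name invented here, and the directories are the ones from the text.

```shell
#!/bin/bash
# add_lib_dir is a hypothetical helper for illustration.
# Appends a directory to LD_LIBRARY_PATH unless it is already listed.
add_lib_dir() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*)
            ;;  # already present, nothing to do
        *)
            # Only insert the separating colon when the variable is non-empty
            LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1"
            ;;
    esac
    export LD_LIBRARY_PATH
}

# Typical calls in ~/.bash_profile (commented out so the sketch changes nothing):
# add_lib_dir /home/iryu/Codes/fftw3_lib
# add_lib_dir /home/iryu/usr/lib
```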

After that, copy the missing library files into one of the directories listed in LD_LIBRARY_PATH. For example, if "libfftw3.so.3" is missing,

$ cp libfftw3.so.3 ~/usr/lib/

kill the job

If you want to stop the calculation, you can do it using "qdel"

$ qdel 6005

, where 6005 is the job id, which you can see with "qstat".
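
The number that qdel takes here is the leading part of the identifier printed by qsub (for example "6006" in "6006.mc2.stanford.edu"). A small sketch of extracting it in a script follows. job_number is a name invented for this example.

```shell
#!/bin/bash
# job_number is a hypothetical helper for illustration.
# Prints the part of a job identifier that precedes the first dot.
job_number() {
    # ${1%%.*} removes the longest suffix that starts with a dot
    echo "${1%%.*}"
}

# Typical use on the cluster (commented out because it needs qsub and qdel):
# full_id="$(qsub paradis.serial.run)"
# qdel "$(job_number "$full_id")"
```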

paradis.serial.run

 #!/bin/bash

 ### Job id
 #PBS -N CYL_T4_MC2.3
 ### #PBS -N fmm.8cpu

 #PBS -j oe

 ### ppn : # of cpus / walltime = running time
 #PBS -l nodes=1:ppn=1,walltime=48:00:00
 #PBS -V

 ### ---------------------------------------
 ### BEGINNING OF EXECUTION
 ### ---------------------------------------

 echo The master node of this job is `hostname`
 echo The working directory is `echo $PBS_O_WORKDIR`
 echo This job runs on the following nodes:
 echo `cat $PBS_NODEFILE`

 ncpu=`cat $PBS_NODEFILE | wc -w`
 echo "Number of processors = $ncpu "

 ### end of information preamble

 cd $PBS_O_WORKDIR

 echo $PWD

 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/iryu/Codes/fftw3_lib
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/home/iryu/usr/lib

 ### To see which compiler is being used.(You can see it from Job_id.oJOBNAME) 
 ###ldd bin/paradiscyl

 bin/paradiscyl tests/CYL_test/CYL_T5_MC2/CYL_T4_MC2_3.ctrl >& tests/CYL_test/CYL_T4_MC2/CYL_T4_MC2_3.log

Run several single-cpu jobs in MC2

If you want to run several jobs that each use just one cpu, you can start them all from a single job script. For example, if you want to run two jobs whose commands are

bin/paradiscyl tests/CYL_TEST1.ctrl >& tests/CYL_TEST1.log
bin/paradiscyl tests/CYL_TEST2.ctrl >& tests/CYL_TEST2.log

, you can use the following file.

 #!/bin/bash

 ### Job id
 #PBS -N CYL_TEST
 ### #PBS -N fmm.8cpu

 #PBS -j oe

 ### ppn : # of cpus / walltime = running time
 #PBS -l nodes=1:ppn=1,walltime=48:00:00
 #PBS -V

 ### ---------------------------------------
 ### BEGINNING OF EXECUTION
 ### ---------------------------------------

 echo The master node of this job is `hostname`
 echo The working directory is `echo $PBS_O_WORKDIR`
 echo This job runs on the following nodes:
 echo `cat $PBS_NODEFILE`

 ncpu=`cat $PBS_NODEFILE | wc -w`
 echo "Number of processors = $ncpu "

 ### end of information preamble

 cd $PBS_O_WORKDIR

 echo $PWD

 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/iryu/Codes/fftw3_lib
 export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/home/iryu/usr/lib

 ### To see which compiler is being used.(You can see it from Job_id.oJOBNAME) 
 ###ldd bin/paradiscyl

 bin/paradiscyl tests/CYL_TEST1.ctrl >& tests/CYL_TEST1.log &
 bin/paradiscyl tests/CYL_TEST2.ctrl >& tests/CYL_TEST2.log &

 wait

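The only differences are the ampersand sign (&) at the end of each command line, which runs the command in the background, and the final "wait", which keeps the job script alive until all background commands have finished.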
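
For more than a few cases, the same pattern can be written as a loop. The sketch below uses run_case as a made-up stand-in for the real command (for example "bin/paradiscyl tests/NAME.ctrl") and writes its logs to a temporary directory, so it can be tried anywhere.

```shell
#!/bin/bash
# run_case is a hypothetical stand-in for the real simulation command.
run_case() {
    echo "running $1"
}

outdir="$(mktemp -d)"
for name in CYL_TEST1 CYL_TEST2 CYL_TEST3; do
    # The trailing & starts each case in the background, so the loop continues at once
    run_case "$name" > "$outdir/$name.log" 2>&1 &
done

# Block until every background case has finished;
# without this line the script would end while the cases are still running.
wait
ls "$outdir"
```

Whether several background processes may share the cpus requested by "ppn" depends on the cluster's policy, so the resource request may need to be adjusted to match the number of cases.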