Micro and Nano Mechanics Group


Computing Clusters

The Micro and Nano Mechanics Group primarily uses two clusters here at Stanford: MC-CC and WCR (also known as Glacial). MC-CC is a cluster supported by the Mechanics and Computation Group in the Mechanical Engineering Department; its name stands for "Mechanics and Computation Computing Cluster". WCR, or alternatively Glacial, is a cluster hosted by the Mechanical Engineering Department and is named for the initials of the late William Reynolds (not the Reynolds of the Reynolds number, but a Stanford professor who did research in thermodynamics). To use either cluster, you must be working with a faculty member affiliated with the cluster. MC-CC has 45 nodes with 2 processors per node, and WCR has 221 nodes with 8 processors per node.

Hints for Working with the Clusters

Below you will find information regarding nuances of working with these computing clusters. This is not meant to be an exhaustive list or a tutorial, but a collection of advice and tools. The items are in no particular order, and we will update them as we remember more.

Semaphores

A semaphore is a type of shared resource on the clusters. Because each machine has only a limited number of semaphores, semaphores left behind on the nodes can cause communication problems in parallel runs. If any job you run terminates abnormally, it may leave Semaphore Arrays on the machine: if you ran the job on the head node, the Semaphore Array stays on the head node; otherwise it remains on the compute node you were using. This can cause problems with future parallel jobs run by you or other users, so it is good manners to clean up your Semaphores frequently.

How do I clean up Semaphores, you ask? Easy! You need to run the cleanipcs script to clean your semaphores from the machine. First, find cleanipcs on the cluster using the locate command. Then copy it to your home directory and chmod it so that you have execute permission. Now you can run the program by typing

 $ ./cleanipcs
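For reference, the setup steps just described might look something like the sketch below. The path that locate prints varies from cluster to cluster, so /path/to/cleanipcs here is just a placeholder for whatever location it reports.

 $ locate cleanipcs            # find where the script is installed on this cluster
 $ cp /path/to/cleanipcs ~/    # copy it to your home directory, using the path locate printed
 $ chmod u+x ~/cleanipcs       # give yourself execute permission on your copy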

Running ./cleanipcs will clean all of your Semaphores from the machine. However, on a cluster this only cleans the Semaphores from the head node. The problem usually lies with Semaphores on the compute nodes, which can be cleaned using the cluster-fork command. cluster-fork runs the given command (e.g. ./cleanipcs) on all nodes or on a subset of nodes. To clean the compute nodes, use

 $ cluster-fork ./cleanipcs
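If you only want to clean a handful of nodes, cluster-fork can usually be restricted to a subset of hosts. The --nodes option below is the one provided on Rocks-based clusters, and the node names are just examples; check cluster-fork --help on your cluster, since the exact syntax may differ.

 $ cluster-fork --nodes="compute-0-0 compute-0-1" ./cleanipcs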

On some clusters, the command cluster-fork has been replaced with rocks run host. In this case, use

 $ rocks run host "./cleanipcs"
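You can also target a single compute node by naming it explicitly. The host name compute-0-0 below is only an example; substitute one of your cluster's actual node names.

 $ rocks run host compute-0-0 "./cleanipcs"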

How do I know if I have a Semaphore problem? Ok, that's easy too. Just run the command

 $ /usr/bin/ipcs -s

This will list all of the Semaphore Arrays in use on the head node. The ipcs command lists all active interprocess communication objects (shared memory segments, message queues, and semaphores); with the -s option it shows only semaphore sets. For the compute nodes, just add the cluster-fork prefix. The cleanipcs script mentioned above internally uses the ipcrm command to remove interprocess communication objects.
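For example, to check every compute node for leftover semaphore sets (on clusters that use rocks run host instead, substitute that prefix as above):

 $ cluster-fork /usr/bin/ipcs -s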

Finally, how do I know if this is crashing my parallel job? Well, likely your job will crash with the following error:

 p4_error: semget failed for setnum:0

If you see this error in your parallel job, you may have a problem with Semaphore Arrays. Run the ipcs command to see which nodes have Semaphore Arrays and clean up yours. You cannot clean up Semaphores owned by other users, so you may have to contact them to free all of the Semaphores.
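If you would rather remove a stray Semaphore Array by hand instead of running cleanipcs, you can filter the ipcs output by your username and pass the semaphore id to ipcrm, which is the same command cleanipcs uses internally. This is just a sketch; <semid> stands for the id shown in the semid column of the ipcs output.

 $ /usr/bin/ipcs -s | grep $USER    # list only the semaphore sets you own
 $ ipcrm -s <semid>                 # remove one set by its id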


Using the Intel Compiler on MC2

MC2 has an Intel compiler installed in the folder "/opt/intel", but it is not active by default in your home directory. To compile something with icc (the C compiler) or icpc (the C++ compiler), you need to activate the Intel compiler environment. You can do this simply by adding a line to your .bashrc file:

 source /opt/intel/Compiler/11.1/064/bin/iccvars.sh intel64

Because MC2 is a 64-bit machine, you need intel64 (on a 32-bit machine you would use ia32 instead). Then log out of the machine and reconnect. After that, you can use the Intel compiler from your home folder. This is needed to compile ParaDiS 2.5.1.
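To check that the compiler environment was picked up after you reconnect, you can ask the compilers for their versions. This is only a sanity check; the exact paths and version strings depend on what is installed on MC2.

 $ which icc        # should point somewhere under /opt/intel
 $ icc --version    # prints the Intel C compiler version if the environment is active
 $ icpc --version   # same check for the C++ compiler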