Micro and Nano Mechanics Group
Revision as of 16:30, 6 February 2008 by Cweinber (Talk)

Computing Clusters

The Micro and Nano Mechanics Group primarily uses two clusters here at Stanford: MC-CC and WCR (also called Glacial). MC-CC is a cluster supported by the Mechanics and Computation Group in the Mechanical Engineering Department; its name stands for "Mechanics and Computation Computing Cluster". WCR, or alternatively Glacial, is a cluster hosted by the Mechanical Engineering Department and is named for the initials of the late William Reynolds (not the Reynolds of the Reynolds number, but a Stanford professor who did research in thermodynamics). To use either cluster, you must be working with a faculty member affiliated with it. MC-CC has 45 nodes with two processors each; WCR has 221 nodes with eight processors per node.

Hints for working with the clusters

Below you will find information regarding nuances of working with these computing clusters. This is not meant to be an exhaustive list nor a tutorial, but a collection of advice and tools. The items are in no particular order, and we will update them as we remember more.

Semaphores

A semaphore is a type of shared resource on the clusters that can cause communication problems in parallel runs. If a job of yours terminates abnormally, it may leave Semaphore Arrays behind on the computer. If you ran the job on the head node, the Semaphore Arrays are left on the head node; otherwise they are left on the compute node you were using. These leftovers can cause problems for future parallel jobs, both yours and other users'. It is therefore good manners to clean up your semaphores frequently.

How do I clean up semaphores, you ask? Easy! You need to run the cleanipcs script to remove your semaphores from the computer. First, find cleanipcs on the cluster using the locate command. Then copy it to your home directory and chmod it so that you have execute privileges. Now run it by typing ./cleanipcs. This cleans all of your semaphores from the machine; on a cluster, however, it only cleans the head node. The problem usually lies with semaphores on the compute nodes, which can be cleaned using the cluster-fork command: cluster-fork ./cleanipcs.
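Put together, the steps above might look like the following; the source path for cleanipcs shown in the copy command is only a hypothetical example of what locate might report on your cluster.

```shell
# Find the cleanipcs script somewhere on the cluster
locate cleanipcs

# Copy it to your home directory and give yourself execute permission
# (the source path below is hypothetical; use the path locate printed)
cp /opt/mpich/sbin/cleanipcs ~/
chmod u+x ~/cleanipcs

# Clean your semaphores on the head node
cd ~ && ./cleanipcs

# Clean your semaphores on every compute node
cluster-fork ./cleanipcs
```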

How do I know if I have a semaphore problem? That's easy too. Just run the command /usr/bin/ipcs -s, which lists all of the Semaphore Arrays in use on the head node. For the compute nodes, prefix the command with cluster-fork.
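Concretely, checking the head node and then every compute node looks like this:

```shell
# List the Semaphore Arrays currently allocated on the head node
/usr/bin/ipcs -s

# Run the same check on every compute node
cluster-fork /usr/bin/ipcs -s
```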

Finally, how do I know if this is what crashed my parallel job? Most likely your job will crash with the following error: p4_error: semget failed for setnum:0. If you see this error in your parallel job, you may have a problem with leftover Semaphore Arrays. Run the ipcs command to see which nodes have Semaphore Arrays and clean up yours. You cannot clean up semaphores owned by other users, so you may have to contact them to free all of the semaphores. This is why cleaning up your semaphores is good manners: leftovers can affect all users.
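If you are curious what a cleanup amounts to, here is a minimal sketch of the idea, assuming the standard util-linux ipcs and ipcrm tools: parse the semaphore listing and remove every array owned by you.

```shell
# Remove every semaphore array owned by the current user.
# On Linux, "ipcs -s" prints one line per array with the columns
# key, semid, owner, perms, nsems, so the owner is field 3 and the
# semaphore id to pass to ipcrm is field 2.
for id in $(ipcs -s | awk -v u="$USER" '$3 == u { print $2 }'); do
    ipcrm -s "$id"
done
```

Note that ipcrm refuses to remove arrays you do not own, which is exactly why each user must clean up after themselves.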