<?xml version="1.0" standalone="no"?>

<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
          "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<article>
<articleinfo>
  <title>NPACI Rocks at NCSU: Beginning Computing</title>
  <authorgroup>
    <author><firstname>Jack</firstname><surname>Neely</surname>
      <affiliation>
        <orgname>NC State University College of PAMS</orgname>
        <address>
          <email>pco@pams.ncsu.edu</email>
        </address>
      </affiliation>
    </author>
  </authorgroup>
</articleinfo>

<section><title>The Next Generation of Beowulf Software in PAMS</title>

<para>PAMS is moving its High Performance Computing Beowulf cluster to a new software stack called <ulink url="http://www.rocksclusters.org">NPACI Rocks</ulink> and phasing out the very old version of the Beowulf software from <ulink url="http://www.scyld.com">Scyld</ulink>.  The new software offers all the same benefits as the old but works very differently.  The goal of this short document is to give users of our new Beowulf software a quick start to computing on the cluster and pointers to more information.</para>

<para>The most noticeable difference is that we have changed our batch system to the Sun Grid Engine and now <emphasis>require</emphasis> users to submit all jobs through it.  (Jobs that bypass the batch system, or that are run directly on the head node, will be terminated by the system administrators for the Beowulf.)  Another noticeable change is that the software is now based on the much more modern Red Hat Linux 7.3.</para>

</section>

<section><title>Running Simple Compute Jobs</title>

<para>On to submitting jobs.  Each job is nothing more than a shell script.  You submit this shell script via <command>qsub</command> and the Sun Grid Engine (SGE) will schedule it and then run it.  For non-parallel compute jobs it's pretty simple.  For example:</para>

<screen>[jjneely@beo-test jjneely]$ cat job.csh
#!/bin/tcsh

echo "Running job job.csh"
date
sleep 20
date
echo "WOOT!"

[jjneely@beo-test jjneely]$ qsub job.csh
your job 6 ("job.csh") has been submitted
[jjneely@beo-test jjneely]$
</screen>

<para>The script <filename>job.csh</filename> is my "compute job" and I've used the <command>qsub</command> command to submit it to the system.  After the job has been scheduled and run, it will leave its output as files in your home directory on the cluster.</para>

<screen>[jjneely@beo-test jjneely]$ ls  job.*
job.csh  job.csh.e6  job.csh.o6
[jjneely@beo-test jjneely]$
</screen>

<para>Here we see the script and the output files.  The output file names always end in a numerical ID, which is the job ID assigned by SGE.  The "e" and "o" files hold the job's standard error and standard output, respectively.</para>

<screen>[jjneely@beo-test jjneely]$ cat job.csh.o6
Running job job.csh
Mon Feb 10 18:45:14 GMT 2003
Mon Feb 10 18:45:34 GMT 2003
WOOT!
[jjneely@beo-test jjneely]$
</screen>

<para>There are many options to <command>qsub</command> and different ways of running jobs, such as "array jobs", which run the same script multiple times to get different output.  A handy option is <option>-cwd</option>, which runs the job from the directory you submitted it from and puts your output files in that directory.  All of the details can be found in the <ulink url="http://gridengine.sunsource.net/project/gridengine/documentation.html">documentation for the Sun Grid Engine</ulink>.</para>
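
<para>As a rough illustration of the two options just mentioned, here is a minimal sketch of an array job script that also uses <option>-cwd</option>.  The <option>-S</option>, <option>-t</option>, and <option>-cwd</option> requests are embedded on lines starting with "#$".  The script name, the task range 1-4, and the output file naming scheme are assumptions made up for this example, not something our cluster requires.</para>

<programlisting>
```shell
#!/bin/sh
# Hypothetical array job script (sketch only).
#$ -S /bin/sh
#$ -cwd
#$ -t 1-4

# SGE sets SGE_TASK_ID for each task of an array job.  Default to 1
# so the script can also be tried by hand outside the batch system.
task_id=${SGE_TASK_ID:-1}

# Build a per-task output file name (this naming scheme is an assumption).
result_name() {
    echo "result.$1.txt"
}

echo "Task $task_id would write to $(result_name "$task_id")"
```
</programlisting>

<para>Each of the four tasks runs the same script with a different task ID, so each can pick its own input or output file.</para>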

</section>

<section><title>Running MPI Jobs</title>

<para>Next, how do you run an MPI job?  MPI jobs must also be submitted via <command>qsub</command>, and SGE knows exactly how to work with them.  There are three things you'll need to run an MPI job.  First, you will have to request a <emphasis>parallel environment</emphasis> and specify how many compute nodes (this can be a range) you need for your MPI job.  Second, you'll need to use the <varname>$NSLOTS</varname> variable so that SGE will tell MPI how many processes to use.  Finally, you'll need to use the path <filename>$TMPDIR/machines</filename> for your machines file to tell MPI which nodes to run on.</para>

<para>That's a lot easier than it looks.  You can request the environment and number of nodes either on the command line or in your job script.  On the command line, pass <parameter>-pe mpich &lt;range&gt;</parameter> to qsub.  To embed the request in your job script, start a new line with "#$" followed by the same arguments you would give to qsub.  The "mpich" is the environment you are requesting.  (That's also the only MPI environment we have right now.)  The range can be a single number for a definite number of nodes or a string like "2-4,8,16" to specify that you would like to use 2, 3, 4, 8, or 16 nodes depending on what's available to the system at that time.</para>
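
<para>To make the range form concrete, the following is a minimal sketch of a job script that embeds a range request and then reports how many slots SGE actually granted.  The helper function and the fallback value of 1 are assumptions added so the script can be tried by hand.  Under SGE, <varname>$NSLOTS</varname> is set for you.</para>

<programlisting>
```shell
#!/bin/sh
# Sketch of a job script that accepts any count from the requested range.
#$ -S /bin/sh
#$ -pe mpich 2-4,8,16

# NSLOTS is set by SGE to the number of slots actually granted.
# Fall back to 1 when the script is run by hand as a dry run.
slots=${NSLOTS:-1}

# Hypothetical helper that prints a short summary of the grant.
describe_grant() {
    if [ "$1" -eq 1 ]; then
        echo "1 slot granted"
    else
        echo "$1 slots granted"
    fi
}

describe_grant "$slots"
```
</programlisting>

<para>The same request could instead be given at submission time as <command>qsub -pe mpich 2-4,8,16</command> followed by the name of your script.</para>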

<para>Let's look at an example that runs the popular Linpack benchmark.  (You'll need your <filename>HPL.dat</filename> in your home directory.)  Here's my job:</para>

<screen>[jjneely@beo-test jjneely]$ cat linpack.csh
#!/bin/tcsh

#$ -pe mpich 4

echo "I have $NSLOTS slots."

/opt/mpich/ethernet/gcc/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines /opt/hpl-eth/bin/xhpl

[jjneely@beo-test jjneely]$
</screen>

<para>You can see how I've requested the environment and used the supplied variables as command line arguments to the <command>mpirun</command> program.  That's pretty much it.  Use qsub to submit the script and off it goes.</para>

<para>One more detail to note with MPI.  Our Beowulf uses MPICH's Ethernet device for its communications layer and GCC for its compiler.  So when running and building MPI programs you'll need to use the MPI scripts, binaries, and libraries found in <filename>/opt/mpich/ethernet/gcc</filename>, as we did in the above job script.  As compilers are added to the system you may find other directories in <filename>/opt/mpich/ethernet</filename> that contain MPI tools specific to those compilers.</para>
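
<para>For building, a sketch along the following lines may help.  It assumes that MPICH's <command>mpicc</command> compiler wrapper sits in the same <filename>bin</filename> directory as the <command>mpirun</command> used above, and that <filename>hello.c</filename> is an MPI source file of your own.  Both are assumptions for illustration.</para>

<programlisting>
```shell
#!/bin/sh
# Sketch: put the GCC/Ethernet MPICH tools first on the PATH, then build.
MPI_HOME=/opt/mpich/ethernet/gcc
PATH="$MPI_HOME/bin:$PATH"
export PATH

# hello.c is a hypothetical MPI source file in the current directory.
if command -v mpicc >/dev/null; then
    if [ -f hello.c ]; then
        mpicc -o hello hello.c
    else
        echo "hello.c not found; nothing to build"
    fi
else
    echo "mpicc not found under $MPI_HOME/bin"
fi
```
</programlisting>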

</section>

<section><title>Seeing Job Status</title>

<para>How do you see the status of the jobs and queues on the cluster?  This is done with the <command>qstat</command> command.  Running <command>qstat</command> without any options lists the jobs that are currently running.  If the command produces no output, no jobs are running.  The man page for <command>qstat</command> explains all the options.</para>

<screen>[jjneely@beo-test jjneely]$ qstat
job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
---------------------------------------------------------------------------------------------
      8     0 linpack.csh jjneely      t     02/11/2003 15:13:32 compute-0- SLAVE
      8     0 linpack.csh jjneely      t     02/11/2003 15:13:32 compute-0- MASTER
            0 linpack.csh jjneely      t     02/11/2003 15:13:32 compute-0- SLAVE
      8     0 linpack.csh jjneely      t     02/11/2003 15:13:32 compute-0- SLAVE
      8     0 linpack.csh jjneely      t     02/11/2003 15:13:32 compute-0- SLAVE
[jjneely@beo-test jjneely]$
</screen>

<para>Some of the more useful options to <command>qstat</command> are <option>-f</option> for full output, and <option>-u</option> to specify a user.  With these options you can get detailed information about your currently running jobs.  Also, if you remember your job ID number you can use the <option>-j</option> option to see detailed information about just that job.</para>
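
<para>These options can be combined in a small wrapper.  The sketch below is only an illustration: the helper name is made up, and the script simply prints the command when <command>qstat</command> is not installed on the machine where it is run.  Given a job ID it uses <option>-j</option>, otherwise it asks for full output restricted to your own user.</para>

<programlisting>
```shell
#!/bin/sh
# Sketch: choose a qstat command line from an optional job ID argument.
# With a job ID, show that job's details; otherwise list your own jobs.
build_cmd() {
    if [ -n "$1" ]; then
        echo "qstat -j $1"
    else
        echo "qstat -f -u ${USER:-$(id -un)}"
    fi
}

cmd=$(build_cmd "$1")
echo "Running: $cmd"

# Only run the command where qstat is actually available.
if command -v qstat >/dev/null; then
    $cmd
fi
```
</programlisting>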

</section>

<section>
<title>Have Problems?  Need More Information?</title>

<para>If you have problems using the PAMS Beowulf or just need more information about using the system, please email us at <email>pams_hpc@help.ncsu.edu</email>.</para>

</section>

</article>
