Running the GEOS-5 SBU Benchmark


==Build and install the model==

First, untar the model tarball (be sure to do this in $NOBACKUP!):

  $ tar xf Heracles-UNSTABLE-MPT-Benchmark.2017Feb13.tar.gz

Next, set up ESMADIR:

  $ setenv ESMADIR <directory-to>/Heracles-UNSTABLE-MPT-Benchmark/GEOSagcm

Note that $ESMADIR is the directory just above src/.
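
The setenv syntax above is for csh/tcsh. If your shell is bash, the equivalent (same path as above) is:

  $ export ESMADIR=<directory-to>/Heracles-UNSTABLE-MPT-Benchmark/GEOSagcm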

Go into the src/ directory of your model. Following the example above:

  $ cd $ESMADIR/src

Set up the environment by sourcing the g5_modules file:

  $ source g5_modules
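
Sourcing g5_modules loads the compiler and library modules the build expects. To confirm what was loaded, the standard environment-modules command works (the module names shown will vary by system):

  $ module list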

To build the model, you have two choices. First, you can use the parallel_build.csh script to submit a PBS job that compiles the model:

  $ ./parallel_build.csh
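
Since parallel_build.csh submits a batch job, you can watch its progress with the scheduler's usual tools. For example, on a PBS system such as Pleiades (Discover's scheduler commands may differ):

  $ qstat -u $USER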

Alternatively, you can build the model interactively using:

  $ make install

To capture the install log, we recommend tee'ing the output to a file:

  $ make install |& tee make.install.log (on tcsh)
  $ make install 2>&1 | tee make.install.log (on bash)

Note you can also build in parallel interactively with:

  $ make -j8 pinstall |& tee make.pinstall.log (on tcsh)

where 8 is the number of parallel jobs. From testing, eight jobs is about as many as are useful; you can use more, but little additional benefit will accrue.

By default, the Intel Fortran compiler (ifort) is used for the build process. For other compilers, contact matthew.thompson@nasa.gov for instructions to use GCC or PGI compilers.
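
Putting the pieces together, a complete interactive build session under tcsh might look like the following sketch (the path under $NOBACKUP is an assumption; point ESMADIR at wherever you untarred the tarball):

  $ setenv ESMADIR $NOBACKUP/Heracles-UNSTABLE-MPT-Benchmark/GEOSagcm
  $ cd $ESMADIR/src
  $ source g5_modules
  $ make -j8 pinstall |& tee make.pinstall.log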

==Monitor the build process==

The build can be monitored using the gmh.pl utility in the Config directory. From the src directory, run:

  $ Config/gmh.pl -Av make.install.log

This outputs the build status as:

                            --------
                            Packages
                            --------
  
           >>>> Fatal Error           .... Ignored Error
  
   [ok]      Config
   [ok]      GMAO_Shared
   [ok]      |    GMAO_mpeu
   [ok]      |    |    mpi0
   [ok]      |    GMAO_pilgrim
   [ok]      |    GMAO_gfio
   [ok]      |    |    r4
   [ok]      |    |    r8
   [ok]      |    GMAO_perllib
   [ok]      |    MAPL_cfio
   [ok]      |    |    r4
   [ok]      |    |    r8
   [ok]      |    MAPL_Base
   [ok]      |    |    TeX
   [ok]      |    GEOS_Shared
   [ 1] .... .... Chem_Base
   [ok]      |    Chem_Shared
   [ok]      |    GMAO_etc
   [ok]      |    GMAO_hermes
   [ 2] .... .... GFDL_fms
   [ok]      |    GEOS_Util
   [ok]      |    |    post
  
                            -------
                            Summary
                            -------
  
  IGNORED mpp_comm_sma.d mpp_transmit_sma.d Chem_AodMod.d (3 files in 2 packages)
  All 22 packages compiled successfully.

In case of errors, gmh summarizes exactly where the build failed by indicating the package where the error occurred. Caveat: it does not work on parallel-build output (the output is scrambled). So, if the parallel build fails, rerun the build sequentially (it will go quickly and die in the same place) and run gmh on that output for a summary.
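
Spelled out, that recovery sequence looks like this (tcsh; the log file name is only a suggestion):

  $ make install |& tee make.install.seq.log
  $ Config/gmh.pl -Av make.install.seq.log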

==Run the model==

===Setting up an experiment===

Go into the model application directory and run the setup script:

  $ cd $ESMADIR/src/Applications/GEOSgcm_App
  $ ./gcm_setup

and provide the required answers.

MAT: Add a sample run here

===Run a one-day test===

To run the job, first change to the experiment directory you specified in gcm_setup above. Then, copy a set of restarts into that directory. Sample restarts are provided on Discover (at /discover/nobackup/mathomp4/Restarts-G10) and on Pleiades (at /nobackup/gmao_SIteam/ModelData/Restarts-G10/).
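
For example, on Discover (a sketch; the layout under Restarts-G10 is an assumption, so list the directory first and pick the restart set that matches your experiment's resolution):

  $ ls /discover/nobackup/mathomp4/Restarts-G10
  $ cp /discover/nobackup/mathomp4/Restarts-G10/<your-resolution>/* <experiment-directory>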

Then, edit CAP.rc:

* JOB_SGMT should be 1 day (00000001 000000)
* NUM_SGMT should be 1
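
After the edit, the relevant CAP.rc lines should read (the exact spacing after the colon may differ; the sed script below keys on these names):

  JOB_SGMT: 00000001 000000
  NUM_SGMT: 1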

Also edit gcm_run.j by inserting "exit" after:

  $RUN_CMD $NPES ./GEOSgcm.x
  set rc =  $status
  echo       Status = $rc
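
After the edit, that part of gcm_run.j should read:

  $RUN_CMD $NPES ./GEOSgcm.x
  set rc =  $status
  echo       Status = $rc
  exit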

A script that can automate these processes is:

  #!/bin/bash
  
  # Make CAP.rc a one-day, single-segment run with MAPL timers enabled
  sed -i'.orig' -e '/^JOB_SGMT:/ s/000000[0-9][0-9]/00000001/' \
                -e '/^NUM_SGMT:/ s/[0-9][0-9]*/1/' \
                -e '/^MAPL_ENABLE_TIMERS:/ s/NO/YES/' CAP.rc
  
  # Exit gcm_run.j after the model run, add PBS email notification,
  # and link (rather than copy) restarts into scratch/
  sed -i'.orig' -e '/^echo       Sta/ a exit' \
                -e '/^#PBS -j oe/ a #PBS -m abe\n#PBS -M email@myserver.com' \
                -e '/^  if(-e $EXPDIR\/$rst ) \/bin/ s/cp/ln -s/' gcm_run.j

Note: this script also sets up email notification and links restarts into the scratch/ directory rather than copying them (which is nice when running at high resolution). Remember to replace email@myserver.com with your own address.
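
If you save the script in your experiment directory as, say, setup_oneday.sh (a hypothetical name), running it is just:

  $ cd <experiment-directory>
  $ bash setup_oneday.sh

Because of the -i'.orig' flags, the original CAP.rc and gcm_run.j are preserved as CAP.rc.orig and gcm_run.j.orig in case you need to revert.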

===Compare to a "correct" run by using cmpscratch===

# If all goes well, the two runs are bit-identical.
# If not, the question becomes: "What is correct?"
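
If cmpscratch is not available, a crude bit-for-bit check between two runs' scratch directories (a generic sketch, not the cmpscratch tool itself; the paths are assumptions) is:

  $ diff -rq <experiment-directory-1>/scratch <experiment-directory-2>/scratch

Any output means the runs differ somewhere; no output means the files compared are bit-identical.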