Running the GEOS-5 SBU Benchmark
Build and install the model
First, untar the model tarball in nobackup or swdev; the model alone will break the home quota:
$ tar xf GEOSadas-5_16_5-Benchmark.tar.gz
Next, set up ESMADIR:
$ setenv ESMADIR <directory-to>/GEOSadas-5_16_5-Benchmark/GEOSadas
ESMADIR is the directory just above src/, i.e., the one that contains it.
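As a quick sanity check that ESMADIR points at the right place, you can list its src/ directory (purely optional; if this fails, revisit the setenv above):

$ ls $ESMADIR/src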
Go into the src/ directory of your model. Following the example above:
$ cd $ESMADIR/src
Set up the environment by sourcing the g5_modules file:
$ source g5_modules
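If you want to confirm the environment loaded, you can list the loaded modules (just a check; the exact set of modules depends on the tag and host):

$ module list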
To build the model, you have two choices. First, you can use the parallel_build.csh script to submit a PBS job that compiles the model:
$ ./parallel_build.csh
or you can interactively build the model using:
$ make install
To capture the install log, we recommend tee'ing the output to a file:
$ make install |& tee make.install.log          (on tcsh)
$ make install 2>&1 | tee make.install.log      (on bash)
Note you can also build in parallel interactively with:
$ make -j8 pinstall |& tee make.pinstall.log (on tcsh)
where the number after -j is the number of parallel jobs. From testing, 8 jobs is about as much as is useful; you can use more, but no benefit will accrue.
By default, the Intel Fortran compiler (ifort) is used for the build process. For other compilers, contact matthew.thompson@nasa.gov for instructions to use GCC or PGI compilers.
Monitor build process
The build can be monitored using the utility gmh.pl in the directory Config. From the src directory:
$ Config/gmh.pl -Av make.install.log
This outputs the build status as follows:
 -------- Packages --------
 >>>> Fatal Error
 .... Ignored Error

 [ok]  Config
 [ok]  GMAO_Shared
 [ok]  |  GMAO_mpeu
 [ok]  |  |  mpi0
 [ok]  |  GMAO_pilgrim
 [ok]  |  GMAO_gfio
 [ok]  |  |  r4
 [ok]  |  |  r8
 [ok]  |  GMAO_perllib
 [ok]  |  MAPL_cfio
 [ok]  |  |  r4
 [ok]  |  |  r8
 [ok]  |  MAPL_Base
 [ok]  |  |  TeX
 [ok]  |  GEOS_Shared
 [ 1]  .... ....  Chem_Base
 [ok]  |  Chem_Shared
 [ok]  |  GMAO_etc
 [ok]  |  GMAO_hermes
 [ 2]  .... ....  GFDL_fms
 [ok]  |  GEOS_Util
 [ok]  |  |  post

 ------- Summary -------
 IGNORED
    mpp_comm_sma.d
    mpp_transmit_sma.d
    Chem_AodMod.d
 (3 files in 2 packages)

 All 22 packages compiled successfully.
In case of errors, gmh summarizes exactly where the build failed by indicating the package where the error occurred. Caveat: it does not work on parallel-build output (the output is scrambled). So, if the parallel build fails, rerun it sequentially (it will go quickly and die in the same place) and run gmh on that output for a summary.
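So a typical recovery workflow after a failed parallel build might look like the following, reusing the commands above (the log file name is just a convention):

$ make install |& tee make.install.log     (sequential rebuild, on tcsh)
$ Config/gmh.pl -Av make.install.log       (summarize where it died)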
Benchmark Run
The full benchmark is a 5-day run using a portable version of the GEOS-5 boundary conditions. This will use both disk space and cores. For effective benchmarking of I/O, it's recommended to run on a filesystem less congested than nobackup.
Learn to love tcsh
One preliminary note: GEOS-5 is, in many ways, a collection of csh/tcsh scripts. If things start going wrong, the answer can often be "change your shell to tcsh and try again". Yes, it's not bash/fish/zsh, but it is what it is. I don't think it's strictly necessary for this automated work, but it could come up.
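If you do run into shell-related trouble, one minimal workaround (our suggestion, not something the benchmark scripts require) is to start a tcsh just for the session rather than changing your login shell:

$ exec tcsh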
Setting up benchmark experiment
Go into the model application directory:
$ cd $ESMADIR/src/Applications/GEOSgcm_App
The experiment's home and experiment directories don't both have to be in the same location, but it's highly recommended that they are, and the script below assumes they are. Also, make sure they are in a nobackup directory.
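The defaults for these locations come from dot-files in your home directory; here is a sketch of seeding both to the same nobackup location (assuming the standard ~/.HOMDIRroot and ~/.EXPDIRroot files that gcm_setup and create_expt.py consult):

$ echo /discover/nobackup/$USER > ~/.HOMDIRroot
$ echo /discover/nobackup/$USER > ~/.EXPDIRroot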
For the next few commands, you will need to know the location of the portable BCs directory used for this experiment, which is referred to below as $PORTBCS. On discover, a version will always be at:
/discover/nobackup/mathomp4/HugeBCs-H50
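For convenience in the commands below, point a variable at that directory (the $PORTBCS name is just this page's convention):

$ setenv PORTBCS /discover/nobackup/mathomp4/HugeBCs-H50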
To create the experiment, run create_expt.py and choose a C720 horizontal resolution with climatological GOCART:
$ $PORTBCS/scripts/create_expt.py benchmark-GEOSadas-5_16-5-5day-c720 --horz c720 --ocean o3 --gocart C --account <ACCOUNTID> --expdir <root-for-experiment>
Found c720 horizontal resolution in experiment name
Using c720 horizontal resolution
Assuming default vertical resolution of 72
Using 72 vertical resolution
Ocean resolution of o3 passed in
Using o3 ocean resolution
Using climatological aerosols
Running gcm_setup...done.

Experiment is located in directory: <root-for-experiment>/benchmark-GEOSadas-5_16-5-5day-c720
If you don't pass in an account-id, you'll get the default of g0620 (the developer's account).
Setup and Run Benchmark
Now change to the experiment directory and run MakeSBUBench.bash, which will set up the experiment:
$ $PORTBCS/scripts/MakeSBUBench.bash
The script sets the run up for 5 days using 5400 cores, and trips other flags to best emulate Ops.
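If you want to confirm the segment setup before submitting, you can peek at the experiment's CAP.rc (JOB_SGMT and NUM_SGMT control the segment length and count):

$ grep -E 'JOB_SGMT|NUM_SGMT' CAP.rc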
NOTE: This will also set the experiment to run in this SLURM environment:
#SBATCH --partition=preops
#SBATCH --qos=benchmark
This is the partition/qos the script's developer uses to run 5400-core jobs; others will likely need something different. Before submitting with sbatch, please edit these lines to a partition/qos that you have access to and that can accept a 5400-core job.
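For example, a minimal sketch of that edit with sed (the partition and qos names here are placeholders; substitute ones your account can use):

$ sed -i -e 's/--partition=preops/--partition=<your-partition>/' -e 's/--qos=benchmark/--qos=<your-qos>/' gcm_run.j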
Finally, submit the job:
$ sbatch gcm_run.j
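Once submitted, you can watch the job with the usual SLURM tools, e.g.:

$ squeue -u $USER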