Running the GEOS-5 SBU Benchmark: Difference between revisions

(18 intermediate revisions by the same user not shown)

Line 1:

==Build and install the model==

First, untar the model tarball (in ~~$NOBACKUP!!!)~~:

First, untar the model tarball in <tt>nobackup</tt> or <tt>swdev</tt>; the model alone will break the <tt>home</tt> quota:

$ tar xf ~~Heracles~~-~~UNSTABLE-MPT~~-Benchmark~~.2017Feb13~~.tar.gz

$ tar xf GEOSadas-5_16_5-Benchmark.tar.gz

Next, set up ESMADIR:

$ setenv ESMADIR <directory-to>/~~Heracles~~-~~UNSTABLE-MPT~~-Benchmark/~~GEOSagcm~~

$ setenv ESMADIR <directory-to>/GEOSadas-5_16_5-Benchmark/GEOSadas

ESMADIR is the directory just above src/ (that is, the one that contains it).
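A quick way to confirm that ESMADIR points at the right level is to check for the application directory used later on this page. The following is a minimal POSIX sh sketch, not part of the GEOS-5 tooling; the helper name <tt>check_esmadir</tt> is hypothetical, and it assumes a valid ESMADIR contains <tt>src/Applications/GEOSgcm_App</tt>.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a candidate ESMADIR.
# Assumption: a valid ESMADIR contains src/Applications/GEOSgcm_App,
# the directory this page changes into when setting up experiments.
check_esmadir() {
    dir="$1"
    if [ -d "$dir/src/Applications/GEOSgcm_App" ]; then
        echo "ok: $dir"
        return 0
    fi
    echo "not an ESMADIR (no src/Applications/GEOSgcm_App): $dir" >&2
    return 1
}
```

From a Bourne-style shell, call it as <tt>check_esmadir "$ESMADIR"</tt>. The <tt>setenv</tt> commands on this page are tcsh syntax; the sh equivalent is <tt>export ESMADIR=...</tt>.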

Line 84:

All 22 packages compiled successfully.

In case of errors, gmh summarizes exactly where it happens by indicating the package where it ~~occured~~. Caveat: it does not work in parallel (output is scrambled). So, if the parallel build fails, rerun it sequentially (it will go quickly and die in the same place) and run gmh on the output for a summary.

In case of errors, gmh summarizes exactly where it happens by indicating the package where it occurred. Caveat: it does not work in parallel (output is scrambled). So, if the parallel build fails, rerun it sequentially (it will go quickly and die in the same place) and run gmh on the output for a summary.

<!--

===Advanced features===

Line 106:

These effectively let you change whatever you want - useful for debugging, etc. For example, you can set your timers in ~/.esma_base.mk.

~~-->~~

==Run ~~the model~~==

==Test Build with One-Day Run==

To make sure all works, we will first try setting up a simple one-day experiment.

===Setting up a one-day experiment===

Line 116:

Line 117:

$ cd $ESMADIR/src/Applications/GEOSgcm_App

~~$ echo $NOBACKUP > ~/.HOMDIRroot~~

~~$ echo $NOBACKUP > ~/.EXPDIRroot~~

These echos set up some defaults. While they both don't have to be in the same location, it's highly recommended they are and the use of the script below assumes they are. Also, make sure they are in a nobackup directory.

Now you can run <tt>create_expt.py</tt>:

<nowiki>$ ~mathomp4/bin/create_expt.py -h

usage: create_expt.py [-h] [-v] [-q] [--expdsc EXPDSC]

usage: create_expt.py [-h] [-v] [-q] [--expdsc EXPDSC] [--expdir EXPDIR]

[--horz {a,b,c,d,e,c12,c24,c48,c90,c180,c360,c720,c1440,c2880}]

[--vert {72,132}] [--ocean {o1,o2,o3}] [--land {1,2}]

Line 144:

Line 141:

-q, --quiet Quietly Setup Experiment (no printing)

--expdsc EXPDSC Experiment Description (Default: same as expid)

--expdir EXPDIR Experiment Directory Root *NOT CONTAINING EXPID*

(Default is what is in ~/.EXPDIRroot:

/discover/nobackup/mathomp4 )

--horz {a,b,c,d,e,c12,c24,c48,c90,c180,c360,c720,c1440,c2880}

Horizontal Resolution (Default: c48 on clusters, c12

Line 160:

--gpu Setup Experiment to use GPUs</nowiki>

This is a script that attempts to ease setting up a GEOS-5 AGCM run. It has some "smarts" in that it will detect from an experiment ID some information, but you can be explicit (to the point you can specify impossible experiments too).

To actually create the experiment, run the script:

Your first choice is where you'd like to run the experiment. $NOBACKUP or other disks with space are vital. For example, if you'd like to run the experiment in /discover/nobackup/username/experiment-name, then run the script with --expdir /discover/nobackup/username. To actually create the experiment, run the script and choose a C48 horizontal resolution:

$ ~mathomp4/bin/create_expt.py test-1day-expt --horz c48 --account <ACCOUNTID>

$ ~mathomp4/bin/create_expt.py test-1day-expt --horz c48 --account <ACCOUNTID> --expdir /discover/nobackup/username

Horizontal resolution c48 passed in

Using c48 horizontal resolution

Line 177:

Running gcm_setup...done.

Experiment is located in directory: /discover/nobackup/~~mathomp4~~/test-1day-expt

Experiment is located in directory: /discover/nobackup/username/test-1day-expt

If you don't pass in an account-id, you'll get the default of g0620 (the developer's account).

Line 251:

At this point, you should be able to <tt>sbatch gcm_run.j</tt> and the model should run a day.

-->

==Benchmark Run==

The full benchmark run is a 5-day run using a portable version of the GEOS-5 boundary conditions. It needs a large amount of disk space and many cores. For effective benchmarking of I/O, it's recommended to run on a filesystem less congested than nobackup.

===Learn to love tcsh===

One preliminary note is that GEOS-5 is, in many ways, a collection of csh/tcsh scripts. If things start going wrong, the answer can often be "change your shell to tcsh and try". Yes, it's not bash/fish/zsh, but it is what it is. I don't think it's entirely necessary for just this automated work, but it could happen.

===Setting up benchmark experiment===

Go into the model application directory and do a couple of preliminary commands:

$ cd $ESMADIR/src/Applications/GEOSgcm_App

These echos set up some defaults. While they both don't have to be in the same location, it's highly recommended they are and the use of the script below assumes they are. Also, make sure they are in a nobackup directory.

For the next few commands, you will need to know the location of the portable BCs directory used for this experiment, which is referred to below as <tt>$PORTBCS</tt>. On discover, a version will always be at:

/discover/nobackup/mathomp4/HugeBCs-H50
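Before going further, it may help to verify that the chosen portable BCs directory actually provides the two scripts this page calls. This is a minimal POSIX sh sketch under that assumption; the helper name <tt>check_portbcs</tt> is hypothetical and is not shipped with the BCs.

```shell
#!/bin/sh
# Hypothetical helper: check that a portable-BCs directory provides the
# two scripts this page runs from $PORTBCS/scripts.
# Returns 0 if both exist, 1 if either is missing.
check_portbcs() {
    root="$1"
    missing=0
    for script in create_expt.py MakeSBUBench.bash; do
        if [ ! -f "$root/scripts/$script" ]; then
            echo "missing: $root/scripts/$script" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

On discover this would be called as <tt>check_portbcs /discover/nobackup/mathomp4/HugeBCs-H50</tt>. In tcsh, the variable itself is set with <tt>setenv PORTBCS /discover/nobackup/mathomp4/HugeBCs-H50</tt>.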

To create the experiment, run create_expt.py and choose a C720 horizontal resolution with climatological GOCART:

$ $PORTBCS/scripts/create_expt.py benchmark-GEOSadas-5_16-5-5day-c720 --horz c720 --ocean o3 --gocart C --account <ACCOUNTID> --expdir <root-for-experiment>

Found c720 horizontal resolution in experiment name

Using c720 horizontal resolution

Assuming default vertical resolution of 72

Using 72 vertical resolution

Ocean resolution of o3 passed in

Using o3 ocean resolution

Using climatological aerosols

Running gcm_setup...done.

Experiment is located in directory: <root-for-experiment>/benchmark-GEOSadas-5_16-5-5day-c720

Again, if you don't pass in an account-id, you'll get the default of g0620 (the developer's account).

===Setup and Run Benchmark===

Now ''change to the experiment directory'' and run MakeSBUBench.bash, which sets up the experiment:

$ $PORTBCS/scripts/MakeSBUBench.bash

The script sets the run up for 5 days using 5400 cores, and other flags are tripped to best emulate Ops.

'''NOTE''': This will also set the experiment to run in this SLURM environment:

#SBATCH --partition=preops

#SBATCH --qos=benchmark

This is how the script's developer can run 5400-core jobs. Others might have different partition/qos to submit to. Please edit these before sbatch submission to a <tt>partition/qos</tt> that you have access to that can accept a 5400-core job.
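The edit can be done by hand in any editor. As one possible shortcut, here is a minimal POSIX sh sketch that rewrites the two directives in place. The function name <tt>set_slurm_target</tt> is hypothetical. It assumes the directives appear at the start of a line exactly as shown above, and that the partition and qos names contain no <tt>/</tt> or <tt>&</tt> characters.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the partition and qos directives in a
# SLURM job script such as gcm_run.j.
# Usage: set_slurm_target FILE PARTITION QOS
# Assumptions: the directives start at column 0 as "#SBATCH --partition="
# and "#SBATCH --qos=", and the new names contain no "/" or "&".
set_slurm_target() {
    file="$1"
    partition="$2"
    qos="$3"
    tmpfile="$file.tmp.$$"
    # Write to a temporary file, then copy back so the original file's
    # permissions are preserved (avoids the non-portable "sed -i").
    sed -e "s/^#SBATCH --partition=.*/#SBATCH --partition=$partition/" \
        -e "s/^#SBATCH --qos=.*/#SBATCH --qos=$qos/" \
        "$file" > "$tmpfile" &&
        cat "$tmpfile" > "$file" &&
        rm -f "$tmpfile"
}
```

For example, <tt>set_slurm_target gcm_run.j <partition> <qos></tt>, followed by <tt>grep '^#SBATCH' gcm_run.j</tt> to confirm the result before submitting.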

Finally, submit the job:

$ sbatch gcm_run.j

@@ Line 1: / Line 1: @@
 ==Build and install the model==
-First, untar the model tarball (in $NOBACKUP!!!):
+First, untar the model tarball in <tt>nobackup</tt> or <tt>swdev</tt>; the model alone will break the <tt>home</tt> quota:
-  $ tar xf Heracles-UNSTABLE-MPT-Benchmark.2017Feb13.tar.gz
+  $ tar xf GEOSadas-5_16_5-Benchmark.tar.gz
 Next, set up ESMADIR:
-  $ setenv ESMADIR <directory-to>/Heracles-UNSTABLE-MPT-Benchmark/GEOSagcm
+  $ setenv ESMADIR <directory-to>/GEOSadas-5_16_5-Benchmark/GEOSadas
 ESMADIR is the directory just above src/ (that is, the one that contains it).
@@ Line 84: / Line 84: @@
   All 22 packages compiled successfully.
-In case of errors, gmh summarizes exactly where it happens by indicating the package where it occured. Caveat: it does not work in parallel (output is scrambled). So, if the parallel build fails, rerun it sequentially (it will go quickly and die in the same place) and run gmh on the output for a summary.
+In case of errors, gmh summarizes exactly where it happens by indicating the package where it occurred. Caveat: it does not work in parallel (output is scrambled). So, if the parallel build fails, rerun it sequentially (it will go quickly and die in the same place) and run gmh on the output for a summary.
 <!--
 ===Advanced features===
@@ Line 106: / Line 106: @@
 These effectively let you change whatever you want - useful for debugging, etc. For example, you can set your timers in ~/.esma_base.mk.
--->
-==Run the model==
+==Test Build with One-Day Run==
 To make sure all works, we will first try setting up a simple one-day experiment.
 ===Setting up a one-day experiment===
@@ Line 116: / Line 117: @@
   $ cd $ESMADIR/src/Applications/GEOSgcm_App
- $ echo $NOBACKUP > ~/.HOMDIRroot
- $ echo $NOBACKUP > ~/.EXPDIRroot
-These echos set up some defaults. While they both don't have to be in the same location, it's highly recommended they are and the use of the script below assumes they are. Also, make sure they are in a nobackup directory.
 Now you can run <tt>create_expt.py</tt>:
   <nowiki>$ ~mathomp4/bin/create_expt.py -h
-usage: create_expt.py [-h] [-v] [-q] [--expdsc EXPDSC]
+usage: create_expt.py [-h] [-v] [-q] [--expdsc EXPDSC] [--expdir EXPDIR]
                        [--horz {a,b,c,d,e,c12,c24,c48,c90,c180,c360,c720,c1440,c2880}]
                        [--vert {72,132}] [--ocean {o1,o2,o3}] [--land {1,2}]
@@ Line 144: / Line 141: @@
    -q, --quiet           Quietly Setup Experiment (no printing)
    --expdsc EXPDSC       Experiment Description (Default: same as expid)
+  --expdir EXPDIR       Experiment Directory Root *NOT CONTAINING EXPID*
+                        (Default is what is in ~/.EXPDIRroot:
+                        /discover/nobackup/mathomp4 )
    --horz {a,b,c,d,e,c12,c24,c48,c90,c180,c360,c720,c1440,c2880}
                          Horizontal Resolution (Default: c48 on clusters, c12
@@ Line 160: / Line 160: @@
    --gpu                 Setup Experiment to use GPUs</nowiki>
 This is a script that attempts to ease setting up a GEOS-5 AGCM run. It has some "smarts" in that it will detect from an experiment ID some information, but you can be explicit (to the point you can specify impossible experiments too).
-To actually create the experiment, run the script:
+Your first choice is where you'd like to run the experiment. $NOBACKUP or other disks with space are vital. For example, if you'd like to run the experiment in /discover/nobackup/username/experiment-name, then run the script with --expdir /discover/nobackup/username. To actually create the experiment, run the script and choose a C48 horizontal resolution:
-  $ ~mathomp4/bin/create_expt.py test-1day-expt --horz c48 --account <ACCOUNTID>
+  $ ~mathomp4/bin/create_expt.py test-1day-expt --horz c48 --account <ACCOUNTID> --expdir /discover/nobackup/username
   Horizontal resolution c48 passed in
   Using c48 horizontal resolution
@@ Line 177: / Line 177: @@
    Running gcm_setup...done.
-  Experiment is located in directory: /discover/nobackup/mathomp4/test-1day-expt
+  Experiment is located in directory: /discover/nobackup/username/test-1day-expt
 If you don't pass in an account-id, you'll get the default of g0620 (the developer's account).
@@ Line 251: / Line 251: @@
 At this point, you should be able to <tt>sbatch gcm_run.j</tt> and the model should run a day.
+-->
+==Benchmark Run==
+The full benchmark run is a 5-day run using a portable version of the GEOS-5 boundary conditions. It needs a large amount of disk space and many cores. For effective benchmarking of I/O, it's recommended to run on a filesystem less congested than nobackup.
+===Learn to love tcsh===
+One preliminary note is that GEOS-5 is, in many ways, a collection of csh/tcsh scripts. If things start going wrong, the answer can often be "change your shell to tcsh and try". Yes, it's not bash/fish/zsh, but it is what it is. I don't think it's entirely necessary for just this automated work, but it could happen.
+===Setting up benchmark experiment===
+Go into the model application directory and do a couple of preliminary commands:
+ $ cd $ESMADIR/src/Applications/GEOSgcm_App
+These echos set up some defaults. While they both don't have to be in the same location, it's highly recommended they are and the use of the script below assumes they are. Also, make sure they are in a nobackup directory.
+For the next few commands, you will need to know the location of the portable BCs directory used for this experiment, which is referred to below as <tt>$PORTBCS</tt>. On discover, a version will always be at:
+ /discover/nobackup/mathomp4/HugeBCs-H50
+To create the experiment, run create_expt.py and choose a C720 horizontal resolution with climatological GOCART:
+ $ $PORTBCS/scripts/create_expt.py benchmark-GEOSadas-5_16-5-5day-c720 --horz c720 --ocean o3 --gocart C --account <ACCOUNTID> --expdir <root-for-experiment>
+ Found c720 horizontal resolution in experiment name
+ Using c720 horizontal resolution
+ Assuming default vertical resolution of 72
+ Using 72 vertical resolution
+ Ocean resolution of o3 passed in
+ Using o3 ocean resolution
+ Using climatological aerosols
+  Running gcm_setup...done.
+ Experiment is located in directory: <root-for-experiment>/benchmark-GEOSadas-5_16-5-5day-c720
+Again, if you don't pass in an account-id, you'll get the default of g0620 (the developer's account).
+===Setup and Run Benchmark===
+Now ''change to the experiment directory'' and run MakeSBUBench.bash, which sets up the experiment:
+ $ $PORTBCS/scripts/MakeSBUBench.bash
+The script sets the run up for 5 days using 5400 cores, and other flags are tripped to best emulate Ops.
+'''NOTE''': This will also set the experiment to run in this SLURM environment:
+ #SBATCH --partition=preops
+ #SBATCH --qos=benchmark
+This is how the script's developer can run 5400-core jobs. Others might have different partition/qos to submit to. Please edit these before sbatch submission to a <tt>partition/qos</tt> that you have access to that can accept a 5400-core job.
+Finally, submit the job:
+ $ sbatch gcm_run.j