Ganymed 4.0 User's Guide

This page describes in detail how to set up and optimize a global model run of GEOS-5 Ganymed 4.0 on NCCS discover and NAS pleiades and generally make the model do what you want. It assumes that you have already run the model as described in Ganymed 4.0 Quick Start.

Back to GEOS-5 Documentation for Ganymed 4.0

Compiling the Model

Most of the time for longer runs you will be using a release version of the model, perhaps compiled with a different version of one or more of the model's gridded components, as defined by subdirectories in the source code. This process starts with checking out the stock model from the repository using the command

 cvs co -r  TAGNAME -d DIRECTORY Fortuna

where TAGNAME is the model "tag" (version). A tag in CVS marks the versions of the source files in the repository that together make up a particular version of the model. A sample release tag is Ganymed-4_0_BETA8, indicating the latest patch of version Ganymed 4.0 for general use. DIRECTORY is the directory in which the source code tree will be created. If you are using a stock model tag it is reasonable to name the directory after the tag. This directory determines which model build (presumably in your own space) a particular experiment uses. Some scripts use the environment variable ESMADIR, which should be set to the absolute (full) pathname of this directory.
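
For example, checking out the BETA8 release and setting ESMADIR might look like the following (a sketch; USERNAME and the nobackup path are placeholders for your own space):

 cvs co -r Ganymed-4_0_BETA8 -d Ganymed-4_0_BETA8 Fortuna
 setenv ESMADIR /discover/nobackup/USERNAME/Ganymed-4_0_BETA8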

When a modified version of some model component is saved to the repository, its tag (different from the standard model tag) is supposed to be applied only to the directories containing modified files. This means that if you need to use a variant tag of a gridded component, you will have to cd to that directory and update to the variant tag. So, for example, if you needed to apply updates to the SatSim gridded component, you would have to cd several levels down to the directory GEOSsatsim_GridComp and run

 cvs upd -r  VARIANT_TAGNAME

The source code will then incorporate the tag's modifications.

Once the checkout from the repository is completed, you are ready to compile. cd to the src directory at the top of the source code directory tree and from a csh shell run source g5_modules. This will load the appropriate modules and create the necessary environment for compiling and running. It is tailored to the individual systems that GEOS-5 usually runs on, so it probably won't work elsewhere. After that you can run make install, which will create the necessary executables in the directory ARCH/bin, where ARCH is the local architecture (most often Linux).
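A typical compile sequence therefore looks like the following (a sketch, assuming a csh-family shell and ESMADIR set as described above):

 cd $ESMADIR/src
 source g5_modules      # loads the compilers, MPI and other modules
 make install           # installs executables under $ESMADIR/Linux/bin (ARCH/bin)
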

Setting up a Global Model Run

The following describes how to set up a global model run. The procedure to set up a single column model run is described in Ganymed 4.0 Single Column Model.

Using gcm_setup

The setup script for global runs, gcm_setup, is in the directory src/Applications/GEOSgcm_App. The following is an example of a session with the setup script, with commentary:
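
To start such a session, run the script from its directory (a typical invocation, assuming ESMADIR points at your build as above):

 cd $ESMADIR/src/Applications/GEOSgcm_App
 ./gcm_setup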


 Enter the Experiment ID:

Enter a name and hit return. For this example we'll set the experiment ID to "myexp42". Experiment IDs must contain no whitespace and must not start with a digit, since the ID will be used as the prefix of job names, and PBS imposes certain limits on job names.


Enter a 1-line Experiment Description:

This should be short but descriptive, since it will be used to label plots. It can have spaces, though the string will be stored with underscores for the spaces. Provide a description and hit return.

Enter the Atmospheric Horizontal Resolution code:
-----------------------------------------------------------
     Lat/Lon                     Cubed-Sphere
-----------------------------------------------------------
   b --  2  deg                c48  --  2   deg 
   c --  1  deg                c90  --  1   deg 
   d -- 1/2 deg                c180 -- 1/2  deg (56-km) 
   e -- 1/4 deg (35-km)        c360 -- 1/4  deg (28-km)  
                               c720 -- 1/8  deg (14-km) 
                               c1440 - 1/16 deg ( 7-km) 
 

Here you choose whether to run with lat/lon domain decomposition (i.e. how the globe gets distributed to processors) or with the cubed sphere. The science between the two should be identical and the output transparent to the difference, but lat/lon is deprecated and all further development should be done with the cubed sphere.

The options b/c/d/e select a resolution with lat/lon, and c48-c1440 select one with the cubed sphere. Enter a resolution like so:

c48

and hit enter.

Do you wish to run the COUPLED Ocean/Sea-Ice Model? (Default: NO or FALSE)

You probably don't, so hit enter.

Enter the Data_Ocean Horizontal Resolution code: o1 (1  -deg,  360x180  Reynolds) Default
                                                 o2 (1/4-deg, 1440x720  MERRA-2)
                                                 o8 (1/8-deg, 2880x1440 OSTIA)

This selects the source of SST boundary conditions, 1 degree Reynolds, 1/4 degree MERRA-2 or 1/8 degree OSTIA. Unless you are using a higher-resolution experiment, the default will suffice.

Do you wish to run GOCART? (Default: NO or FALSE)

GOCART is the interactive chemistry package, as opposed to prescribed chemistry. It incurs a significant performance cost, so unless you know you want it, you should go with the default. The following assumes that you have entered "y". Otherwise, skip two steps to "Enter the tag..."


Enter the GOCART Emission Files to use: "CMIP" (Default), "PIESA", or "OPS":

Select your favorite emission files here.

Enter the AERO_PROVIDER: GOCART (Default) or PCHEM:

Here you again get to choose between interactive (GOCART) and prescribed (PCHEM) aerosols.

Enter the tag or directory (/filename) of the HISTORY.AGCM.rc.tmpl to use
(To use HISTORY.AGCM.rc.tmpl from current build, Type:  Current         )
-------------------------------------------------------------------------
Hit ENTER to use Default Tag/Location: (Current)

This provides a default HISTORY.rc (output specification) file. The initial default will be the tag of the build in which you are running gcm_setup. The idea is that you can save a custom HISTORY.rc to the repository and have it checked out for your experiments.

 
Enter Desired Location for HOME Directory (to contain scripts and RC files)
Hit ENTER to use Default Location:
----------------------------------
Default:  /discover/nobackup/aeichman/myexp42

This option determines where the experiment's HOME directory is located, which holds the basic job scripts and the major RC files (AGCM.rc, CAP.rc and HISTORY.rc). The first time you run the script it will default to a subdirectory named geos5 under your account's home directory, remember what you decide (in ~/.HOMDIRroot), and use that as the default on subsequent runs. This initial default is fine, though another possibility is to enter your nobackup space, as shown here, which places the experiment's HOME directory files together with the rest of the experiment's files.

Enter Desired Location for EXPERIMENT Directory (to contain model output and restart files)
Hit ENTER to use Default Location:
----------------------------------
Default:  /discover/nobackup/aeichman/myexp42

This determines the experiment directory, where restart files and various job output is stored. These are the storage-intensive parts and so default to the nobackup space.

Enter Location for Build directory containing:  src/ Linux/ etc...
Hit ENTER to use Default Location:
----------------------------------
Default:  /discover/nobackup/aeichman/Ganymed-4_0_BETA8

This determines which of your local builds is used to create the experiment. It defaults to the build containing the script you are running, which is generally what you want.

Current GROUPS: g0620 gmaoint q_warp q_warp-test
Enter your GROUP ID for Current EXP: (Default: g0620)

This is used by the job accounting system. If you are not in the default group, you will probably have been informed.


sending incremental file list
GEOSgcm.x

sent 73969908 bytes  received 31 bytes  147939878.00 bytes/sec
total size is 73960783  speedup is 1.00
 
Creating gcm_run.j for Experiment: myexp42 
Creating gcm_post.j for Experiment: myexp42 
Creating gcm_archive.j for Experiment: myexp42 
Creating gcm_regress.j for Experiment: myexp42 
Creating gcm_convert.j for Experiment: myexp42 
Creating gcm_plot.tmpl for Experiment: myexp42 
Creating gcm_forecast.tmpl for Experiment: myexp42 
Creating gcm_forecast.setup for Experiment: myexp42 
Creating CAP.rc.tmpl for Experiment: myexp42 
Creating AGCM.rc.tmpl for Experiment: myexp42 
Creating HISTORY.rc.tmpl for Experiment: myexp42 
Creating ExtData.rc.tmpl for Experiment: myexp42 
Creating fvcore_layout.rc for Experiment: myexp42 
 
Done!
-----

Build Directory: /discover/nobackup/aeichman/Ganymed-4_0_BETA8
----------------
 
 
The following executable has been placed in your Experiment Directory:
----------------------------------------------------------------------
/discover/nobackup/aeichman/Ganymed-4_0_BETA8/Linux/bin/GEOSgcm.x
 
 
You must now copy your Initial Conditions into: 
----------------------------------------------- 
/discover/nobackup/aeichman/myexp42


Applications/GEOSgcm_App> 

And the experiment is set up. After you copy initial condition files (aka restarts) to the experiment directory, you can submit your job.
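
For example (the restart source path is a placeholder for wherever your restarts reside):

 cd /discover/nobackup/USERNAME/myexp42
 cp /path/to/restarts/* .       # restarts for your resolution, plus cap_restart
 qsub gcm_run.j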

Do not copy old experiments

When creating related experiments, you will be tempted to copy the experiment directory tree of an older experiment. Do not copy old experiments, run gcm_setup instead. There are numerous instances where an experiment-specific directory is used in the run scripts created from templates by gcm_setup and they will wreak subtle and pervasive havoc if executed in an unexpected environment. This warning is especially true between model versions. A useful and relatively safe exception to this rule is to copy previously used examples of HISTORY.rc. However, you need to change the lines labeled EXPID and EXPDSC to the values in your automatically-generated HISTORY.rc or the plotting will fail.

Using restart files

Restart files provide the initial conditions for a run, and a set needs to be copied into a fresh experiment directory before running. This includes the file cap_restart, which provides the model starting date and time as plain text. Restart files themselves are resolution-specific and sometimes change between model versions. As of the current model version, they are flat binary files with no metadata, so they tend to be stored together with restarts of the same provenance, with the date either embedded in the filename or given in an accompanying cap_restart, typically under a directory indicating the model version.

A cleanly completed model run will leave a set of restarts and the corresponding cap_restart in its experiment directory. Another source is /archive/u/aeichman/restarts. Restarts are also left during runs in date-labeled tarballs in the restarts directory under the experiment directory before being transferred to the user's /archive space. You may have to create the cap_restart, which is simply one line of text with the date of the restart files in the format YYYYMMDD HHMMSS (with a space).
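
For example, a cap_restart that starts the model at 21z on 14 April 2000 consists of the single line:

 20000414 210000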

Failing the above sources, you can convert restarts from different resolutions and model versions, including MERRA, as described in Regridding Restarts for Ganymed 1.0.

What Happens During a Run

When the script gcm_run.j starts running, it creates a directory called scratch and copies or links into it the model executable, rc files, restarts and boundary conditions necessary to run the model. It also creates a directory for each of the output collections (named with the prefix geosgcm_ in the default setup), both under the holding directory, for output awaiting post-processing, and in the experiment directory, for post-processed output. It also tars the restarts and moves the tarball to the restarts directory.

Then the executable GEOSgcm.x is run in the scratch directory, starting with the date in cap_restart and running for the length of a segment. A segment is the length of model time that the model integrates before returning, letting gcm_run.j do some housekeeping and then running another segment. A model job will typically run a number of segments before trying to resubmit itself, hopefully before the allotted wallclock time of the job runs out.

The processing that the various batch jobs perform is illustrated in the figure below:

[Figure: flow of model output through the RUN, POST, ARCH and PLT batch jobs]

Each time a segment ends, gcm_run.j submits a post-processing job before starting a new segment or exiting. The post-processing job moves the model output from the scratch directory to the respective collection directory under holding. It then determines whether there is enough output to create a monthly or seasonal mean; if so, it creates them, moves them to the collection directories in the experiment directory, tars up the daily output, and submits an archiving job. The archiving job tries to move the tarred daily output, the monthly and seasonal means, and any tarred restarts to the user's space in the archive filesystem. The post-processing script also determines (assuming the default settings) whether enough output exists to create plots; if so, a plotting job is submitted to the queue. The plotting script produces a number of pre-determined plots as .gif files in the plot_CLIM directory in the experiment directory.

You can check on jobs in the queue with qstat. The jobs associated with the run will be named with the experiment name appended with the type of job it is: RUN, POST, ARCH or PLT.
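
For example (the job-name suffix format shown is illustrative):

 qstat -u $USER     # look for jobs like myexp42_RUN, myexp42_POST, myexp42_ARCH, myexp42_PLT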

As explained above, the contents of the cap_restart file determine the start of the model run in model time, which determines the boundary conditions and the time stamps of the output. The end time may be set in CAP.rc with the property END_DATE (format YYYYMMDD HHMMSS, with a space), though integration is usually leisurely enough that one can just kill the job or rename the run script gcm_run.j so that it is not resubmitted to the job queue.
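
For example, to end the run at 0z on 1 January 2001, CAP.rc would contain the line:

 END_DATE: 20010101 000000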

Tuning a run

Most of the other properties in CAP.rc are discussed elsewhere, but two that are important for understanding how the batch jobs work are JOB_SGMT, the length of a segment, and NUM_SGMT, the number of segments that the job tries to run before resubmitting itself and exiting. JOB_SGMT is in the format YYYYMMDD HHMMSS (but usually expressed in days) and NUM_SGMT is an integer, so the product of the two is the total model time that a job will attempt to run. It may be tempting to just run one long segment, but much housekeeping is done between segments, such as saving state in the form of restarts and spawning the archiving jobs that keep your account from running over disk quota. So when tuning how much model time a job covers, it is usually best to manipulate JOB_SGMT.
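
For example, with the following (illustrative) settings a job would integrate four 15-day segments, about two months of model time, before resubmitting itself:

 JOB_SGMT: 00000015 000000
 NUM_SGMT: 4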

Determining Output: HISTORY.rc

The contents of the file HISTORY.rc (in your experiment HOME directory) tell the model what state and diagnostic fields to output, and how. The default HISTORY.rc provides many fields as is, but you may want to modify it to suit your needs.

File format

The top of a default HISTORY.rc will look something like this:

EXPID:  myexp42
EXPDSC: this_is_my_experiment
  
 
COLLECTIONS: 'geosgcm_prog'
             'geosgcm_surf'
             'geosgcm_moist'
             'geosgcm_turb'

[....]

The attribute EXPID must match the name of the experiment HOME directory; this is only an issue if you copy the HISTORY.rc from a different experiment. The EXPDSC attribute is used to label the plots. The COLLECTIONS attribute contains a list of strings naming the output collections to be created. The contents of the individual collections are specified after this list. Individual collections can be "turned off" by commenting out the relevant line with a #.
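
For example, to disable the geosgcm_surf collection from the default list above while keeping the others:

COLLECTIONS: 'geosgcm_prog'
#            'geosgcm_surf'
             'geosgcm_moist'
             'geosgcm_turb'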

The following is an example of a collection specification:

  geosgcm_prog.template:  '%y4%m2%d2_%h2%n2z.nc4',
  geosgcm_prog.archive:   '%c/Y%y4',
  geosgcm_prog.format:    'CFIO',
  geosgcm_prog.frequency:  060000,
  geosgcm_prog.resolution: 144 91,
  geosgcm_prog.vscale:     100.0,
  geosgcm_prog.vunit:     'hPa',
  geosgcm_prog.vvars:     'log(PLE)' , 'DYN'          ,
  geosgcm_prog.levels:     1000 975 950 925 900 875 850 825 800 775 750 725 700 650 600 550 500 450 400 350 300 250 200 150 100 70 50 40 30 20 10 7 5 4 3 2 1 0.7 0.5 0.4 0.3 0.2 0.1 0.07 0.05 0.04 0.03 0.02,
  geosgcm_prog.fields:    'PHIS'     , 'AGCM'         ,
                          'T'        , 'DYN'          ,
                          'PS'       , 'DYN'          ,
                          'ZLE'      , 'DYN'          , 'H'   ,
                          'OMEGA'    , 'DYN'          ,
                          'Q'        , 'MOIST'        , 'QV'  ,
                          ::

The individual collection attributes are described below, but what users modify most is the fields attribute, which determines the exports saved in the collection. Each field record is a string with the name of an export from the model, followed by a string with the name of the gridded component that exports it, separated by a comma. Entries with a third column specify the name under which the export is saved in the collection file, used when that name should differ from the export's own.

What exports are available?

To add export fields to the HISTORY.rc you will need to know what fields the model provides, which gridded component provides them, and their name. The most straightforward way to do this is to use PRINTSPEC. The setting for PRINTSPEC is in the file CAP.rc. By default the line looks like so:

PRINTSPEC: 0  # (0: OFF, 1: IMPORT & EXPORT, 2: IMPORT, 3: EXPORT)

Setting PRINTSPEC to 3 will make the model send to standard output a list of exports available to HISTORY.rc in the model's current configuration, and then exit without integrating. The list includes each export's gridded component and short name (both necessary to include in HISTORY.rc), long (descriptive) name, units, and number of dimensions. Note that run-time options can affect the exports available, so see to it that you have those set as you intend. The other PRINTSPEC values are useful for debugging.

While you can set PRINTSPEC, submit gcm_run.j with qsub, and get the export list as part of the PBS standard output, there are quicker ways of obtaining the list. One way is to run the model as a single column model on a single processor, as explained in Ganymed 1.0 Single Column Model. Another way is to use an existing experiment. In the scratch directory of an experiment that has already run, change PRINTSPEC in CAP.rc as above. Then, in the file AGCM.rc, change the values of NX and NY (near the beginning of the file) to 1. Then, from an interactive job (one processor will suffice), run the executable GEOSgcm.x in scratch. You will need to run source src/g5_modules in the model's build tree to set up the environment. The model executable will simply output the export list to stdout.
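
A minimal sketch of that procedure (paths are placeholders; run from an interactive single-processor job):

 cd /discover/nobackup/USERNAME/myexp42/scratch
 # in CAP.rc:  set PRINTSPEC: 3
 # in AGCM.rc: set NX: 1 and NY: 1
 source /path/to/your/build/src/g5_modules
 ./GEOSgcm.x > export_list.txt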

Outputting Derived Fields

In addition to writing export fields created by model components (we will refer to these as model fields), the user may specify new fields that will be evaluated using the MAPL parser; these will be referred to as derived fields in the following discussion. A derived field is evaluated from an expression that involves other fields in the collection as variables, applied element by element to create the new field. Derived fields are specified like a regular field from a gridded component in a history collection, with three comma-separated strings: in place of a variable name comes the expression string to be evaluated; then comes a string that MUST be present and should name a gridded component; finally comes a string that MUST be present and gives the name of the new variable, which will be the variable's name in the output file. In general the expression will involve variables, functions, and real numbers. Derived fields are evaluated before time and spatial (vertical and horizontal) averaging.

Here are some rules about expressions:

  1. Fields in an expression can only be model fields.
  2. If the model field has an alias you must use the alias in the expression.
  3. You can not mix center and edge fields in an expression. You can mix 2D and 3D fields if the 3D fields are all center or all edge. In this case each level of the 3D field is combined with the 2D field. Another way to think of this is that in an expression involving a 2D and a 3D field, the 2D field gets promoted to a 3D field with the same data in each level.
  4. When parsing an expression the parser first checks if the fields in an expression are part of the collection. Any model field in a collection can be used in an expression in the same collection. However, there might be cases where you wish to output an expression but not the model fields used in the expression. In this case if the parser does not find the field in the collection it checks the gridded component name after the expression for the model field. If the field is found in the gridded component it can use it in the expression. Note that if you have an expression with two model fields from different gridded components you can not use this mechanism to output the expression without outputting either field. One of them must be in the collection.
  5. The alias of an expression can not be used in a subsequent expression.

Here are the rules for the expressions themselves. The following can appear in the expression string:

  1. The function string can contain the following mathematical operators +, -, *, /, ^ and ()
  2. Variable names - Parsing of variable names is case sensitive.
  3. The following single argument fortran intrinsic functions and user defined functions are implemented: exp, log10, log, sqrt, sinh, cosh, tanh, sin, cos, tan, asin, acos, atan, heav (the Heaviside step function). Parsing of functions is case insensitive.
  4. Integers or real constants. To be recognized as explicit constants these must conform to the format [+|-][nnn][.nnn][e|E|d|D][+|-][nnn] where nnn means any number of digits. The mantissa must contain at least one digit before or following an optional decimal point. Valid exponent identifiers are 'e', 'E', 'd' or 'D'. If they appear they must be followed by a valid exponent!
  5. Operations are evaluated in the order
    1. expressions in brackets
    2. -X unary minus
    3. X^Y exponentiation
    4. X*Y X/Y multiplication and division
    5. X+Y X-Y addition and subtraction

In the following example we create a collection that has three derived fields: the magnitude of the wind, the temperature in Fahrenheit, and the temperature cubed:

  geosgcm_prog.template:  '%y4%m2%d2_%h2%n2z.nc4',
  geosgcm_prog.archive:   '%c/Y%y4',
  geosgcm_prog.format:    'CFIO',
  geosgcm_prog.frequency:  060000,
  geosgcm_prog.resolution: 144 91,
  geosgcm_prog.fields:    'U'             , 'DYN'          ,
                          'V'             , 'DYN'          ,
                          'T'             , 'DYN'          ,
                          'sqrt(U*U+V*V)' , 'DYN'          , 'Wind_Magnitude'   ,
                          '(T-273.15)*1.8+32.0' , 'DYN'    , 'TF' ,
                          'T^3'           , 'DYN',         'T3' ,
                          ::

Special Requirements

Perpetual ("Groundhog Day") mode

GEOS-5 Ganymed 1.0 and later can be run in "perpetual mode", automatically running with the same forcings for a time period delineated as a calendar year, month or day. The desired time period is set in CAP.rc with the parameters PERPETUAL_YEAR, PERPETUAL_MONTH and PERPETUAL_DAY. Set all three to run with the forcings for a particular day, and set NUM_SGMT to the number of times you wish to run it; the history collection files will be appended with dates starting with the one in cap_restart and generally incrementing for the number of days in NUM_SGMT.
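
For example, to repeat the forcings of a single day (illustrative values):

 PERPETUAL_YEAR:  2000
 PERPETUAL_MONTH: 6
 PERPETUAL_DAY:   21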

Saving restarts during a segment

post.rc

Parser Expression Guide

Both the History and ExtData components use the MAPL parser implemented in Ganymed-1.0, so, to avoid duplication, what constitutes a valid expression for the parser is documented here.

The MAPL parser evaluates an ESMF field element by element using an expression string which could contain other ESMF fields as variables. For example, an expression such as log(A) would produce a field where each element of the new field is the log of that particular element of A.

The following can appear in the expression string:

  1. The function string can contain the following mathematical operators +, -, *, /, ^, and ()
  2. Variable names. See the documentation for the History and ExtData components for what variables names can be used in the particular application of the parser.
  3. The following single argument functions, which are case insensitive: exp, log10, log, sqrt, sinh, cosh, tanh, sin, cos, tan, asin, acos, atan, heav (the Heaviside step function)
  4. Integers or real constants specified in the following format: [+|-][nnn][.nnn][e|E|d|D[+|-]nnn] where nnn is any number of digits. The mantissa must contain at least one digit before or following an optional decimal point. Valid exponent identifiers are 'e', 'E', 'd', or 'D'. If they appear they must be followed by a valid exponent!

Operations are evaluated in the following order:

  1. () expressions in brackets
  2. -X unary minus
  3. X^Y exponentiation
  4. X*Y X/Y multiplication and division
  5. X+Y X-Y addition and subtraction

There are several logical requirements one must be cognisant of when creating parser expressions. Since the expression is evaluated element by element, any fields used in an expression must be conformal. In other words, the underlying arrays in every field must have the same dimensions. Thus it is illegal to specify an expression involving two fields that have different vertical levels, such as a center and an edge variable. The one exception is operations involving 2D and 3D fields when the first two dimensions of the 3D field are the same as the 2D field. In this case the expression is evaluated between each level of the 3D field and the 2D field, resulting in a 3D field. This could be used, for example, to scale each level of a 3D field by a 2D field.

The parser also obeys undef arithmetic. Any operation involving the hard coded undef value (MAPL_UNDEF) is undef.

The following are several examples of valid expressions. For the examples it is assumed that A, B, C, and D are conformal fields.

  1. B*2.0e0
  2. sqrt(A*A+B*B)
  3. A*heav(B)
  4. A^(C+D)-2.0e-3