GEOS-5 Software Engineering

Pchakrab (talk | contribs)
 
Latest revision as of 17:39, 29 March 2013

Overall Configuration Management

Source Code Configuration Management

General Policies

The ESMA Project

The ESMA project is concerned with the deployment of modeling and data assimilation applications that are part of the ESMF testbed applications.

This project is hosted at: cvsacl.nccs.nasa.gov

Project admins are:

  • da Silva, Arlindo
  • Lucchesi, Rob
  • Todling, Ricardo
  • Takacs, Larry

The ESMA CVS repository

CVSACL is a version control server that runs CVS with the access control patches applied (https://progress.nccs.nasa.gov/trac/admin/wiki/CVSACL). Access to CVSACL requires an NCCS account (https://www.nccs.nasa.gov/). The web interface to the ESMA CVS project is available at https://cvsacl.nccs.nasa.gov/cgi-bin/.

The ESMA CVS repository has a flat directory structure designed to accommodate a variety of modeling systems. The repository holds 'Applications', 'Components' and other software libraries needed to build earth modeling systems. The directories under esma/src/ are Applications/, Components/, Config/, Couplers/, Documentation/, Shared/ and of course CVS/.

Packages: A package is a collection of source files that provides one or more software deliverables:

  • Libraries
  • Executables (binaries, scripts)
  • Includes (.h, .mod)
  • Configuration (Resource) files (.rc)
  • Examples

Modules: CVS modules are used to compose individual modeling systems. A module is a collection of packages that together make up a standalone application, e.g. GEOSGCM_m0 (GMAO Unified Model, GEOS-5). A complete list of ESMA CVS modules is available at CVSROOT/modules.

Some examples are:

# ESMF/MAPL Tutorial
G5tutorial        -d G5tutorial/src  esma/src/Applications/G5tutorial &Config &GEOSgcm_Shared_m2

# Ganymed
Ganymed           -d GEOSagcm &GEOSGCM_m3
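
How such a module line determines the checkout directory can be sketched in a few lines of shell. This is an illustration only, assuming a POSIX shell and the standard CVS modules-file convention that -d names the working directory; the function name module_workdir is hypothetical and not part of the ESMA tools.

```shell
# Hypothetical helper: print the working directory that a CVSROOT/modules
# line would create. Handles only the -d option; &module references and
# other options are ignored in this sketch.
module_workdir() {
    set -- $1                 # split the module line into words
    mod_name=$1
    shift
    while [ $# -gt 0 ]; do
        if [ "$1" = "-d" ]; then
            echo "$2"         # -d names the working directory
            return 0
        fi
        shift
    done
    echo "$mod_name"          # without -d, the module name is used
}

module_workdir 'Ganymed -d GEOSagcm &GEOSGCM_m3'   # prints GEOSagcm
```

This is why checking out the Ganymed module produces a directory named GEOSagcm rather than Ganymed.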

CVS/CVSACL

Approaches, loopholes etc.

There are two basic approaches to configuration management under CVS:

  1. Two separate branches
    • Development branch - free for all
    • Production branch: lead custodian merges dev->prod
  2. Single branch approach (GMAO approach)
    • Individual developers commit mods to 'development' branch
    • Custodian (gate keeper) issues tags identifying fully tested and stable releases. Checked in modifications may be accepted or rejected by the custodian
    • Branches are created only when conflicts are unavoidable

With standard CVS there is no good way to enforce either policy, and several security loopholes are present:

  • Anyone with 'checkout' privileges also has 'check in' privileges. One workaround is to use a pserver, which has its own vulnerabilities.
  • There is no good way to protect tags.
  • Users can log on to the machine where the repository resides and tinker with files.

There really is no good way to enforce CVS policies.

Solution: CVS/ACL

CVS/ACL is a patch to CVS for access control list management. It provides ACL definitions per module, directory and file, on a given branch or tag, for remote CVS repository connections. As a result, the execution of all CVS subcommands can be controlled with eight different permissions:

  1. (n) no access
  2. (r) read - allows subcommands: annotate, checkout, diff, export, log, rannotate, rdiff, rlog, status
  3. (w) write - allows only the cvs commit (check-in) action; adding files to or removing files from the repository is not allowed
  4. (t) tag
  5. (c) create
  6. (d) delete
  7. (a) full access except admin rights
  8. (p) acl admin
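
The effect of these permission letters can be illustrated with a toy check. This is a sketch under stated assumptions, not the CVS/ACL implementation: the function name acl_allows is hypothetical, the mapping of (t) to the "tag" subcommand is an assumption, and the (c), (d) and (p) permissions are not modeled.

```shell
# Toy model: does a permission string such as "rt" allow a cvs subcommand?
acl_allows() {
    acl_perms=$1
    case "$2" in
        annotate|checkout|diff|export|log|rannotate|rdiff|rlog|status)
            acl_need=r ;;
        commit) acl_need=w ;;
        tag)    acl_need=t ;;   # assumption: (t) governs "cvs tag"
        *)      return 1 ;;     # subcommand not modeled in this sketch
    esac
    case "$acl_perms" in
        *n*)            return 1 ;;  # (n) no access
        *a*)            return 0 ;;  # (a) full access except admin rights
        *"$acl_need"*)  return 0 ;;
        *)              return 1 ;;
    esac
}

# A user holding only read and tag (rt) cannot commit.
if acl_allows rt commit; then
    echo "commit allowed"
else
    echo "commit denied"
fi
```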

The advantages of CVS/ACL are:

  • Branches/packages are protected from modification by 'non-authorized' groups.
  • General users check out code, experiment with it and check in modified files to their 'own private branches'.
  • Tags from major releases are protected from modification by non-authorized users.
  • General users are not allowed to directly manipulate repository files.

GMAO development group

CVSACL implementation

  1. CVSROOT is :ext:USERNAME@cvsacldirect:/cvsroot/esma
  2. Development is conducted on the trunk (a.k.a. HEAD)
    1. Packages on this branch are associated with one or more GMAO development groups.
    2. The custodian groups have 'full access' (a) to the packages on this branch; all others have 'read' and 'tag' (rt) privileges.
    3. General users modifying files (for which they do NOT have check-in privileges) commit these files on their own private branches.

For uniformity, we suggest following the CVS Best Practices.

Tag conventions

CVS/ACL administration

Release Engineering

The ESMA Build Mechanism

Baselibs: Managing External Dependencies

Building GEOS-5 AGCM

This section lists the steps to check out and build the GEOS-5 AGCM using the latest stable tag: Ganymed-2_1_p5.

Set up CVSROOT

First, set up your CVSROOT environment variable using the scheme provided on the NCCS CVSACL webpage (requires NCCS login):

On Discover and NAS: CVSROOT=:ext:$USER@cvsacldirect:/cvsroot/esma

Elsewhere:           CVSROOT=:ext:$USER@ctunnel:/cvsroot/esma

Start the tunnel (machines other than Discover and Pleiades)
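
The two settings above can be wrapped in a small helper. This is a minimal sketch, assuming a POSIX shell such as bash (tcsh users would use setenv instead); the function name esma_cvsroot and its "direct" argument are hypothetical, and only the two CVSROOT strings come from the text above.

```shell
# Hypothetical helper: print the CVSROOT for a site class and user name.
# "direct" stands for Discover and NAS; anything else uses the tunnel.
esma_cvsroot() {
    case "$1" in
        direct) echo ":ext:$2@cvsacldirect:/cvsroot/esma" ;;
        *)      echo ":ext:$2@ctunnel:/cvsroot/esma" ;;
    esac
}

# Example: set CVSROOT for the current user on Discover or NAS.
CVSROOT=$(esma_cvsroot direct "${USER:-username}")
export CVSROOT
echo "$CVSROOT"
```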

Checking out the model

Make the directory in which you wish to check out the model and do the actual checkout:

$ mkdir G21p5
$ cd G21p5
$ cvs co -r Ganymed-2_1_p5 Ganymed

In general, one uses $ cvs co -r <Tag Name> <Module Name> where <Tag Name> is the tag for the model to check out (e.g., Ganymed-2_0_UNSTABLE, Fortuna-2_5_p6) and <Module Name> is the module (e.g., Ganymed, Fortuna).

Build and install the model

Go into the src/ directory of your model. Following the example above:

$ cd G21p5/GEOSagcm/src

Set up the environment by sourcing the g5_modules file:

$ source g5_modules

To build the model, you have two choices. First, you can use the parallel_build.csh script to submit a PBS job that compiles the model:

$ ./parallel_build.csh

or you can interactively build the model using:

$ gmake install

To capture the install log, we recommend tee'ing the output to a file:

$ gmake install |& tee make.install.log (on tcsh)
$ gmake install 2>&1 | tee make.install.log (on bash)

Note you can also build in parallel interactively with:

$ gmake --jobs=N pinstall |& tee make.install.log (on tcsh)

where N is the number of parallel processes. For best performance, N should be, say, 2 less than the number of cores. So, on a Westmere node, use 10. For the sake of others, do this on an interactive node.
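
The rule of thumb for N can be written as a tiny helper. This is a sketch, assuming a POSIX shell; the function name build_jobs is hypothetical, and the 12-core figure is only what the Westmere example above implies (10 = 12 - 2).

```shell
# Hypothetical helper: number of parallel jobs for a given core count,
# following the "2 less than the number of cores" suggestion, with a
# floor of one job on very small machines.
build_jobs() {
    jobs_n=$(( $1 - 2 ))
    if [ "$jobs_n" -lt 1 ]; then
        jobs_n=1
    fi
    echo "$jobs_n"
}

# For a 12-core node this prints: gmake --jobs=10 pinstall
echo "gmake --jobs=$(build_jobs 12) pinstall"
```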

By default, the Intel Fortran compiler (ifort) is used for the build process. One can specify a different compiler (e.g. pgfortran) through the ESMA_FC variable, as in:

$ gmake install ESMA_FC=pgfortran |& tee make.install.log (on tcsh)

or

$ gmake ESMA_FC=pgfortran --jobs=N pinstall |& tee make.install.log (on tcsh)

Monitor build process

The build can be monitored using the gmh.pl utility in the Config directory. From the src directory, the command

$ Config/gmh.pl -Av make.install.log

outputs the build status as:

                          --------
                          Packages
                          --------

         >>>> Fatal Error           .... Ignored Error

 [ok]      Config
 [ok]      GMAO_Shared
 [ok]      |    GMAO_mpeu
 [ok]      |    |    mpi0
 [ok]      |    GMAO_pilgrim
 [ok]      |    GMAO_gfio
 [ok]      |    |    r4
 [ok]      |    |    r8
 [ok]      |    GMAO_perllib
 [ok]      |    MAPL_cfio
 [ok]      |    |    r4
 [ok]      |    |    r8
 [ok]      |    MAPL_Base
 [ok]      |    |    TeX
 [ok]      |    GEOS_Shared
 [ 1] .... .... Chem_Base
 [ok]      |    Chem_Shared
 [ok]      |    GMAO_etc
 [ok]      |    GMAO_hermes
 [ 2] .... .... GFDL_fms
 [ok]      |    GEOS_Util
 [ok]      |    |    post 

                          -------
                          Summary
                          -------

IGNORED mpp_comm_sma.d mpp_transmit_sma.d Chem_AodMod.d (3 files in 2 packages)
All 22 packages compiled successfully.

In case of errors, gmh summarizes exactly where they happen by indicating the package in which each occurred. Caveat: it does not work on the log of a parallel build (the output is scrambled). So, if the parallel build fails, rerun it sequentially (it will go quickly and die in the same place) and run gmh on the output for a summary.
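
For a long report it can help to list only the packages that are not marked [ok]. This is a minimal sketch, assuming a POSIX shell with awk, a report laid out exactly like the sample above, and that gmh.pl writes its report to standard output; the function name flagged_packages and the file name gmh.report are hypothetical.

```shell
# Print the names of packages whose status column holds a count such as
# "[ 1]" instead of "[ok]". The package name is the last field on the line.
flagged_packages() {
    awk '/^ *\[ *[0-9]+\]/ { print $NF }' "$1"
}

# Possible use (assumption: the report goes to standard output):
#   Config/gmh.pl -Av make.install.log > gmh.report
#   flagged_packages gmh.report
```

For the sample report above, this would list Chem_Base and GFDL_fms.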


Advanced features

Check load balance of build

The model includes build-timing tools, Config/esma_timer.sh and Config/esma_tgraph.pl, which are useful for timing the build and checking its load balance. These timers can be hooked into the build process by setting the following variables, which are defined in ESMA_base.mk:

ESMA_base.mk:ESMA_TIMER     = # command to time build steps (for compilation)
ESMA_base.mk:ESMA_TIMER_CI  = # command to start timer (for user to bracket code segments)
ESMA_base.mk:ESMA_TIMER_CO  = # command to end   timer (for user to bracket code segments)

Customize build

A build can be customized using $HOME/.esma_xxxx.mk files, which the build picks up when they exist:

ESMA_BASE = ESMA_base.mk $(wildcard $(HOME)/.esma_base.mk) 
ESMA_ARCH = ESMA_arch.mk $(wildcard $(HOME)/.esma_arch.mk) 
ESMA_POST = ESMA_post.mk $(wildcard $(HOME)/.esma_post.mk)

These effectively let you change whatever you want - useful for debugging, etc. For example, you can set your timers in ~/.esma_base.mk.

Testing and validation