GEOS-5 Quick Start

This page describes the minimum steps required to build and run GEOS-5 Fortuna 2.0 on '''discover'''.  You should successfully complete the steps in these instructions before doing anything more complicated.  '''FORTUNA 2.0 IS NOW DEPRECATED AND MINIMALLY SUPPORTED, AS IS THIS DOCUMENTATION.  PLEASE SEE THE DOCUMENTATION FOR THE LATEST VERSION.'''


== Checking Out and Compiling GEOS-5 ==
 
The following assumes that you know your way around Unix, have successfully logged into your NCCS account (presumably on the '''discover''' cluster) and have an account on the source code repository with the proper <code>ssh</code> configuration -- see the progress repository quick  start: https://progress.nccs.nasa.gov/trac/admin/wiki/CVSACL. 


The commands below assume that your shell is <code>csh</code>.  Since the scripts that build and run GEOS-5 also tend to be written in <code>csh</code>, don't bother trying to port too much of this into an alternative shell.  If you prefer a different shell, it is easiest just to open a <code>csh</code> process to build the model and set up your experiment.


Furthermore, model builds should be created in your space under <code>/discover/nobackup</code>, as creating them under your home directory will quickly wipe out your disk quota.
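
For example, to work from your own nobackup space before checking anything out (substituting your username for ''USERID''):

 cd /discover/nobackup/''USERID''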


Set the following three environment variables:


 setenv CVS_RSH ssh
 setenv CVSROOT :ext:''USERID''@progressdirect.nccs.nasa.gov:/cvsroot/esma
 setenv BASEDIR /discover/nobackup/projects/gmao/share/dao_ops/Baselibs/v3.1.5_build1


where ''USERID'' is, of course, your repository username, which should be the same as your NASA and NCCS username. Then, issue the command:


 cvs co -r Fortuna-2_0 Fortuna


This should check out the latest stable version of the model from the repository and create a directory called <code>GEOSagcm</code>.  <code>cd</code> into <code>GEOSagcm/src</code> and <code>source</code> the file called <code>g5_modules</code>:


 cd GEOSagcm/src
 source g5_modules


If you then type


 module list


You should see:


 Currently Loaded Modulefiles:
   1) comp/intel-9.1.052  2) lib/mkl-9.1.023      3) mpi/impi-3.2.011


If this all worked, then type:


 gmake install


This will build the model.  It will take about 40 minutes.  If this works, it should create a directory under <code>GEOSagcm</code> called <code>Linux/bin</code>.  In here you should find the executable <code>GEOSgcm.x</code>.
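
A quick way to confirm the build, assuming you are still in <code>GEOSagcm/src</code> (this simply lists the executable):

 ls -l ../Linux/bin/GEOSgcm.x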


== Running GEOS-5 ==


First of all, to run jobs on the cluster you will need to set up passwordless <code>ssh</code> (which operates within the cluster).  To do so, run the following from your '''discover''' home directory:


 cd .ssh
 ssh-keygen -t dsa
 cat id_dsa.pub >> authorized_keys
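
A quick sanity check that key-based authentication is working (this just logs back into the same node, which exercises the same <code>authorized_keys</code> file; it should print a hostname without asking for a password):

 ssh localhost hostname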


Similarly, transferring the daily output files (in monthly tarballs) requires passwordless authentication from '''discover''' to '''palm'''.  While in <code>~/.ssh</code> on '''discover''', run


 ssh-keygen -t dsa


Then, log into '''palm''' and cut and paste the contents of the <code>id_rsa.pub</code> and <code>id_dsa.pub</code> files on '''discover''' into the <code>~/.ssh/authorized_keys</code> file on '''palm'''.
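
If you would rather not cut and paste by hand, a one-liner like the following (run from '''discover''', and assuming <code>~/.ssh</code> already exists on '''palm''') has the same effect; you will be prompted for your password this one time:

 cat ~/.ssh/id_dsa.pub | ssh palm 'cat >> ~/.ssh/authorized_keys'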


To set the model up to run, in the  <code>GEOSagcm/src/Applications/GEOSgcm_App</code> directory we run:


 gcm_setup


The <code>gcm_setup</code> script asks you a few questions, such as an experiment name (with no spaces, called ''EXPID'') and a description (spaces are OK).  It will also ask you for the model resolution, expecting one of the available lat-lon domain sizes, with the dimensions separated by a space.  For your first time out you will probably want to enter <code>144 91</code> (corresponding to ~2 degree resolution).  Towards the end it will ask you for a group ID -- the default is g0602 (the GMAO modeling group).  Enter whatever is appropriate for your account.  The rest of the questions provide defaults which will be suitable for now, so just press enter for these.


The script produces an experiment directory (''EXPDIR'') in your space as <code>/discover/nobackup/''USERID''/''EXPID''</code>, which contains, among other things, the sub-directories:


*<code>post</code>  (containing the post-processing job script)
*<code>archive</code>  (containing an incomplete archiving job script)
*<code>plot</code>  (containing an incomplete plotting job script)


The post-processing script will complete (i.e., add necessary commands to) the archiving and plotting scripts as it runs. The setup script that you ran also creates an experiment home directory (''HOMEDIR'') as <code>~''USERID''/geos5/''EXPID''</code>  containing the run scripts and GEOS resource (<code>.rc</code>) files.


The run scripts need some more environment variables -- here are the minimum contents of a <code>.cshrc</code>:


 umask 022
 unlimit
 limit stacksize unlimited
 source ~/.g5_modules
 set arch = `uname`
 setenv LD_LIBRARY_PATH ${LIBRARY_PATH}:${BASEDIR}/${arch}/lib


where <code>.g5_modules</code> is simply a copy of the <code>g5_modules</code> that you ran earlier before compiling.  The <code>umask 022</code> is not strictly necessary, but it will make the various files readable to others, which will facilitate data sharing and user support.  Your home directory <code>~''USERID''</code> is also inaccessible to others by default; running <code>chmod 755 ~</code> is helpful.
 
Copy the restart (initial condition) files and associated <code>cap_restart</code> into ''EXPDIR''.  Keep the "originals" handy since if the job stumbles early in the run it might stop after having renamed them.  The model expects restart filenames to end in "rst" but produces them with the date and time appended, so you may have to rename them.  The <code>cap_restart</code> file is simply one line of text with the date of the restart files in the format YYYYMMDD<space>HHMMSS.  The boundary conditions/forcings are provided by symbolic links created by the run script.  If you need an arbitrary set of restarts, you can copy them from <code>/discover/nobackup/aeichman/restarts/Fortuna-2_0/144x91/20080327_benchmark</code>.
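
As an illustration, a <code>cap_restart</code> for restarts valid at 0z on 27 March 2008 (the date of the benchmark set mentioned above) would contain the single line:

 20080327 000000

A rename of a restart file with a date/time suffix might look like the following; the filename here is hypothetical, since the exact names depend on which restart set you are using:

 mv fvcore_internal_rst.20080327_0000z fvcore_internal_rst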
 
 
The script you submit, <code>gcm_run.j</code>, is in ''HOMEDIR''.  It should be ready to go as is, though you may eventually want to tune JOB_SGMT (the number of days per segment, the interval between saving restarts) and NUM_SGMT (the number of segments attempted in a job) to maximize your run time.  Leave END_DATE alone in Fortuna 2.0 -- there is a bug that erroneously resubmits the script after this date.  You can stop the run by commenting out the <code>qsub $HOMDIR/gcm_run.j</code> at the end of the script, which will prevent the script from being resubmitted.  Those and the PBS (batch system) parameters at the beginning are all that you will usually want to change in the script.
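
For reference, the resubmission line near the end of <code>gcm_run.j</code> looks like this; commenting it out as shown will stop the run after the current job completes:

 #qsub $HOMDIR/gcm_run.j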
 
Submit the job with <code>qsub gcm_run.j</code>.  You can keep track of it with <code>qstat</code> or <code>qstat | grep ''USERID''</code>, or stdout with <code>tail -f /discover/pbs_spool/''JOBID''.OU</code>, ''JOBID'' being returned by <code>qsub</code> and displayed with <code>qstat</code>.  Jobs can be killed with <code>qdel ''JOBID''</code>.  The standard out and standard error will be delivered as files to the working directory at the time you submitted the job.
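
Putting the job-control commands together (the ''JOBID'' shown is hypothetical -- use the one returned by <code>qsub</code>, and your own username in place of ''USERID''):

 qsub gcm_run.j
 qstat | grep USERID
 tail -f /discover/pbs_spool/1234567.OU
 qdel 1234567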
 
== Output and Plots ==
 
During a normal run, the <code>gcm_run.j</code> script will run the model for the segment length (the current default is 8 days).  The model creates its output files (with an <code>nc4</code> extension), also called collections (of output variables), in the <code>''EXPDIR''/scratch</code> directory.  After each segment, the script moves the output to <code>''EXPDIR''/holding</code> and spawns a post-processing batch job, which partitions and moves the output files within the <code>holding</code> directory to their own distinct collection directories, again partitioned into the appropriate year and month.  The post-processing script then checks to see if a full month of data is present.  If not, the post-processing job ends.  If there is a full month, the script runs the time-averaging executable to produce a monthly mean file in <code>''EXPDIR''/geos_gcm_*</code>, and then spawns a new batch job which archives the data onto the mass-storage drives (<code>/archive/u/''USERID''/GEOS5.0/''EXPID''</code>).


If a monthly average file was made, the post-processing script will also check to see if it should spawn a plot job.  Currently, our criteria for plotting are: 1) if the month created was February or August, AND 2) there are at least 3 monthly average files, then a plotting job for the seasons DJF or JJA will be issued.  The plots are created as gifs in <code>''EXPDIR''/plots</code>.
 
The post-processing script can be found in:
<code>GEOSagcm/src/GMAO_Shared/GEOS_Util/post/gcmpost.script</code>.  The <code>nc4</code> output files can be opened and plotted with <code>gradsnc4</code> -- see http://www.iges.org/grads/gadoc/tutorial.html for a tutorial, but use <code>sdfopen</code> instead of <code>open</code>.
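
A minimal <code>gradsnc4</code> session might look like the following (the collection filename and the variable name are hypothetical -- <code>q file</code> lists what is actually in the file, and <code>d</code> displays one of those variables):

 gradsnc4
 ga-> sdfopen geosgcm_prog.200803.nc4
 ga-> q file
 ga-> d t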
 
The contents of the output files (including which variables get saved) may be configured in the HISTORY template file in ''HOMEDIR'' -- a good description of this file may be found at http://modelingguru.nasa.gov/clearspace/docs/DOC-1190.