Fortuna 2.1 Quick Start
Latest revision as of 12:28, 13 October 2010
This page describes the minimum steps required to build and run GEOS-5 Fortuna 2.1 on discover. You should successfully complete the steps in these instructions before doing anything more complicated.
Checking Out and Compiling GEOS-5
The following assumes that you know your way around Unix, have successfully logged into your NCCS account (presumably on the discover cluster) and have an account on the source code repository with the proper ssh configuration -- see the progress repository quick start: https://progress.nccs.nasa.gov/trac/admin/wiki/CVSACL.
The commands below assume that your shell is csh. Since the scripts that build and run GEOS-5 are mostly written in csh as well, there is little point in trying to port them to another shell. If you prefer a different shell, it is easiest just to open a csh process to build the model and your experiment.
Furthermore, model builds should be created in your space under /discover/nobackup, as creating them under your home directory will quickly exhaust your disk quota.
Set the following three environment variables:
setenv CVS_RSH ssh
setenv CVSROOT :ext:USERID@progressdirect.nccs.nasa.gov:/cvsroot/esma
setenv BASEDIR /discover/nobackup/projects/gmao/share/dao_ops/Baselibs/v3.1.5_build1
where USERID is, of course, your repository username, which should be the same as your NASA and NCCS username. Then, issue the command:
cvs co -r Fortuna-2_1_p2 Fortuna
This should check out the latest stable version of the model from the repository and create a directory called GEOSagcm. cd into GEOSagcm/src and source the file called g5_modules:
source g5_modules
If you then type
module list
you should see:
Currently Loaded Modulefiles:
  1) comp/intel-11.0.083   2) other/mpi/mvapich2-1.4.1/intel-11.0.083
If this all worked, then type:
gmake install
This will build the model, which takes about 40 minutes. If the build succeeds, it creates a directory under GEOSagcm called Linux/bin, in which you should find the executable GEOSgcm.x.
Running GEOS-5
First of all, to run jobs on the cluster you will need to set up passwordless ssh (which operates within the cluster). To do so, run the following from your discover home directory:
cd .ssh
ssh-keygen -t dsa
cat id_dsa.pub >> authorized_keys
Similarly, transferring the daily output files (in monthly tarballs) requires passwordless authentication from discover to palm. While in ~/.ssh on discover, run
ssh-keygen -t rsa
Then, log into palm and cut and paste the contents of the id_rsa.pub and id_dsa.pub files on discover into the ~/.ssh/authorized_keys file on palm. Problems with ssh should be referred to NCCS support.
To set the model up to run, change to the GEOSagcm/src/Applications/GEOSgcm_App directory and run:
gcm_setup
The gcm_setup script asks you a few questions, such as an experiment name (with no spaces, called EXPID) and a description (spaces are fine). It will also ask you for the model resolution, expecting one of the available lat-lon domain sizes with the two dimensions separated by a space. For your first time out you will probably want to enter 144 91 (corresponding to ~2 degree resolution). Towards the end it will ask you for a group ID -- the default is g0602 (GMAO modeling group). Enter whatever is appropriate for you. The remaining questions offer defaults that are suitable for now, so just press enter for these.
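As a rough check on what a resolution such as 144 91 means, the grid spacing can be worked out from the two dimensions. The sketch below assumes a regular lat-lon grid in which the longitudes wrap around the globe and the latitude points include both poles. The function name grid_spacing is made up for illustration and is not part of GEOS-5.

```python
def grid_spacing(im, jm):
    """Return (dlon, dlat) in degrees for an im x jm lat-lon grid.

    Assumes longitudes are periodic (im intervals over 360 degrees) and
    latitudes run pole to pole inclusive (jm - 1 intervals over 180 degrees).
    """
    dlon = 360.0 / im
    dlat = 180.0 / (jm - 1)
    return dlon, dlat


if __name__ == "__main__":
    dlon, dlat = grid_spacing(144, 91)
    print(f"144 x 91 -> {dlon} deg longitude by {dlat} deg latitude")
```

Under these assumptions, 144 91 works out to 2.5 degrees in longitude by 2 degrees in latitude.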
The script produces an experiment directory (EXPDIR) in your space as /discover/nobackup/USERID/EXPID, which contains, among other things, the sub-directories:
post (containing the post-processing job script)
archive (containing an incomplete archiving job script)
plot (containing an incomplete plotting job script)
The post-processing script will complete (i.e., add necessary commands to) the archiving and plotting scripts as it runs. The setup script that you ran also creates an experiment home directory (HOMEDIR) as ~USERID/geos5/EXPID containing the run scripts and GEOS resource (.rc) files.
The run scripts need some more environment variables -- here are the minimum contents of a .cshrc:
umask 022
unlimit
limit stacksize unlimited
source ~/.g5_modules
set arch = `uname`
setenv LD_LIBRARY_PATH ${LIBRARY_PATH}:${BASEDIR}/${arch}/lib
where .g5_modules is simply a copy of the g5_modules that you ran earlier before compiling. The umask 022 is not strictly necessary, but it will make the various files readable to others, which will facilitate data sharing and user support. Your home directory ~USERID is also inaccessible to others by default; running chmod 755 ~ is helpful.
Copy the restart (initial condition) files and associated cap_restart into EXPDIR. Keep the "originals" handy since if the job stumbles early in the run it might stop after having renamed them. The model expects restart filenames to end in "rst" but produces them with the date and time appended, so you may have to rename them. The cap_restart file is simply one line of text with the date of the restart files in the format YYYYMMDD<space>HHMMSS. The boundary conditions/forcings are provided by symbolic links created by the run script. If you need an arbitrary set of restarts, you can copy them from /discover/nobackup/aeichman/test2_1.
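The one-line cap_restart format described above can be produced and checked with a few lines of Python. This is only a sketch: the function names are hypothetical helpers, not part of the GEOS-5 scripts, and the date used in the example is arbitrary.

```python
from datetime import datetime


def cap_restart_line(when):
    """Format a datetime as the single cap_restart line: YYYYMMDD HHMMSS."""
    return when.strftime("%Y%m%d %H%M%S")


def parse_cap_restart(line):
    """Parse a cap_restart line back into a datetime.

    Raises ValueError if the line does not match YYYYMMDD<space>HHMMSS.
    """
    return datetime.strptime(line.strip(), "%Y%m%d %H%M%S")


if __name__ == "__main__":
    # Arbitrary restart date, chosen only for illustration.
    print(cap_restart_line(datetime(2000, 1, 1, 21, 0, 0)))
```

Parsing the line back before submitting a job is a cheap way to catch a malformed cap_restart, for example one with a missing space or a truncated time.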
The script you submit, gcm_run.j, is in HOMEDIR. It should be ready to go as is. The parameter END_DATE in CAP.rc (previously in gcm_run.j) can be set to the date you want the run to stop -- this works in Fortuna 2.1 where it did not in Fortuna 2.0. Also in Fortuna 2.1, you may edit the .rc files directly instead of the template (.tmpl) files. Alternatively, you can stop the run by commenting out the line if ( $capdate < $enddate ) qsub $HOMDIR/gcm_run.j at the end of the script, which prevents the script from being resubmitted, or by renaming the script file. You may eventually want to tune two parameters in the CAP.rc file to maximize your run time: JOB_SGMT (the number of days per segment, which is also the interval between saving restarts) and NUM_SGMT (the number of segments attempted in a job).
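The resubmission test and the effect of the two segment parameters can be illustrated with a short sketch. It assumes that the dates being compared are written as YYYYMMDD integers and that the segment length is a plain number of days. Both function names are hypothetical and do not appear in the GEOS-5 scripts.

```python
import math


def should_resubmit(capdate, enddate):
    """Mirror the run script's test `if ( $capdate < $enddate )`.

    Both arguments are assumed to be YYYYMMDD integers, so numeric
    comparison orders them chronologically.
    """
    return capdate < enddate


def jobs_needed(total_days, days_per_segment, segments_per_job):
    """Estimate how many job submissions cover total_days of simulation."""
    days_per_job = days_per_segment * segments_per_job
    return math.ceil(total_days / days_per_job)
```

For instance, jobs_needed(365, 8, 4) estimates the number of submissions for a one-year run with 8-day segments and a hypothetical 4 segments per job. Raising the number of segments per job reduces the number of submissions, subject to the wall-clock limit of the queue.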
Submit the job with qsub gcm_run.j. You can keep track of it with qstat or qstat | grep USERID, or follow its standard output with tail -f /discover/pbs_spool/JOBID.OU, where JOBID is the job number returned by qsub and displayed by qstat. Jobs can be killed with qdel JOBID. The standard output and standard error will be delivered as files to the directory you were in when you submitted the job.
Output and Plots
During a normal run, the gcm_run.j script will run the model for the segment length (current default is 8 days). The model creates output files (with an nc4 extension), also called collections (of output variables), in the EXPDIR/scratch directory. After each segment, the script moves the output to the EXPDIR/holding directory and spawns a post-processing batch job, which partitions the output files within the holding directory and moves each to its own collection directory, which is in turn partitioned by year and month. The post-processing script then checks to see if a full month of data is present. If not, the post-processing job ends. If there is a full month, the script will then run the time-averaging executable to produce a monthly mean file in EXPDIR/geos_gcm_*. The post-processing script then spawns a new batch job which will archive the data onto the mass-storage drives (/archive/u/USERID/GEOS5.0/EXPID).
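The "full month present" test in the flow above can be pictured with the sketch below. It is not the logic of the actual post-processing script: it assumes one output file per calendar day, and the function name has_full_month is invented for illustration.

```python
import calendar
from datetime import date


def has_full_month(days_present, year, month):
    """Return True if days_present contains every calendar day of year/month.

    days_present is any iterable of datetime.date objects, one per daily
    output file (an assumption about how the output is organised).
    """
    ndays = calendar.monthrange(year, month)[1]
    needed = {date(year, month, d) for d in range(1, ndays + 1)}
    return needed.issubset(set(days_present))
```

With 8-day segments, a month is usually completed only partway through a segment, so several post-processing jobs may end early before one of them finds a full month and produces the monthly mean.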
If a monthly average file was made, the post-processing script will also check to see if it should spawn a plot job. Currently, our criteria for plotting are: 1) if the month created was February or August, AND 2) there are at least 3 monthly average files, then a plotting job for the seasons DJF or JJA will be issued. The plots are created as gifs in EXPDIR/plots.
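The two plotting criteria can be written as a small decision function. This is a sketch of the rule as stated here, not code from the post-processing script. The name plot_season is hypothetical, and the pairing of February with DJF and August with JJA is an assumption based on those months closing each season.

```python
def plot_season(month, n_monthly_means):
    """Return 'DJF' or 'JJA' if a plot job would be issued, else None.

    month is the calendar month (1-12) of the monthly mean just created;
    n_monthly_means is how many monthly average files exist so far.
    """
    if n_monthly_means < 3:
        return None
    # Assumed mapping: February closes DJF, August closes JJA.
    return {2: "DJF", 8: "JJA"}.get(month)
```

A run therefore has to produce at least three monthly means and reach the end of a February or an August before any seasonal plots appear.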
The post-processing script can be found in: GEOSagcm/src/GMAO_Shared/GEOS_Util/post/gcmpost.script. The nc4 output files can be opened and plotted with gradsnc4 -- see http://www.iges.org/grads/gadoc/tutorial.html for a tutorial, but use sdfopen instead of open.
The contents of the output files (including which variables get saved) may be configured in the HOMEDIR/HISTORY/tmpl -- a good description of this file may be found at http://modelingguru.nasa.gov/clearspace/docs/DOC-1190 .