Fortuna 2.1 Quick Start: Difference between revisions

(9 intermediate revisions by the same user not shown)

Line 3:

== Checking Out and Compiling GEOS-5 ==

The following assumes that you know your way around Unix, have successfully logged into your NCCS account (presumably on the '''discover''' cluster) and have an account on ~~'''progress'''~~ with the proper <code>ssh</code> ~~preparation~~ -- see the progress repository quick start: https://progress.nccs.nasa.gov/trac/admin/wiki/~~QuickStart~~.

The following assumes that you know your way around Unix, have successfully logged into your NCCS account (presumably on the '''discover''' cluster) and have an account on the source code repository with the proper <code>ssh</code> configuration -- see the progress repository quick start: https://progress.nccs.nasa.gov/trac/admin/wiki/CVSACL.

The commands below assume that your shell is <code>csh</code>. Since the scripts to build and run GEOS-5 tend to be written in the same, you shouldn't bother trying to import too much into an alternative shell. If you prefer a different shell, it is easiest just to open a <code>csh</code> process to build the model and your experiment.

Line 12:

setenv CVS_RSH ssh

setenv CVSROOT :ext:''USERID''@~~progress~~.nccs.nasa.gov:/cvsroot/esma

setenv CVSROOT :ext:''USERID''@progressdirect.nccs.nasa.gov:/cvsroot/esma

setenv BASEDIR /discover/nobackup/projects/gmao/share/dao_ops/Baselibs/v3.1.5_build1

where ''USERID'' is, of course, your ~~'''progress'''~~ username, which should be the same as your NASA and NCCS username. Then, issue the command:

where ''USERID'' is, of course, your repository username, which should be the same as your NASA and NCCS username. Then, issue the command:

cvs co -r Fortuna-~~2_0~~ Fortuna

cvs co -r Fortuna-2_1_p2 Fortuna

This should check out the latest stable version of the model from ~~'''progress'''~~ and create a directory called <code>GEOSagcm</code>. <code>cd</code> into <code>GEOSagcm/src</code> and <code>source</code> the file called <code>g5_modules</code>:

This should check out the latest stable version of the model from the repository and create a directory called <code>GEOSagcm</code>. <code>cd</code> into <code>GEOSagcm/src</code> and <code>source</code> the file called <code>g5_modules</code>:

source g5_modules

Line 30:

Currently Loaded Modulefiles:

1) comp/intel-9.1.~~052~~ 2) ~~lib~~/~~mkl~~-9.1~~.023 3) mpi~~/~~impi~~-3.2.~~011~~

1) comp/intel-11.0.083 2) other/mpi/mvapich2-1.4.1/intel-11.0.083

If this all worked, then type:

Line 50:

ssh-keygen -t dsa

Then, log into '''palm''' and cut and paste the contents of the <code>id_rsa.pub</code> and <code>id_dsa.pub</code> files on '''discover''' into the <code>~/.ssh/authorized_keys</code> file on '''palm'''.

Then, log into '''palm''' and cut and paste the contents of the <code>id_rsa.pub</code> and <code>id_dsa.pub</code> files on '''discover''' into the <code>~/.ssh/authorized_keys</code> file on '''palm'''. Problems with <code>ssh</code> should be referred to NCCS support.

To set the model up to run, in the <code>GEOSagcm/src/Applications/GEOSgcm_App</code> directory we run:

Line 78:

where <code>.g5_modules</code> is simply a copy of the <code>g5_modules</code> that you ran earlier before compiling. The <code>umask 022</code> is not strictly necessary, but it will make the various files readable to others, which will facilitate data sharing and user support. Your home directory <code>~''USERID''</code> is also inaccessible to others by default; running <code>chmod 755 ~</code> is helpful.

Copy the restart (initial condition) files and associated <code>cap_restart</code> into ''EXPDIR''. Keep the "originals" handy since if the job stumbles early in the run it might stop after having renamed them. The model expects restart filenames to end in "rst" but produces them with the date and time appended, so you may have to rename them. The <code>cap_restart</code> file is simply one line of text with the date of the restart files in the format YYYYMMDD<space>HHMMSS. The boundary conditions/forcings are provided by symbolic links created by the run script. If you need an arbitrary set of restarts, you can copy them from <code>/discover/nobackup/aeichman/~~restarts/Fortuna-2_0/144x91/20080327_benchmark~~</code>.

Copy the restart (initial condition) files and associated <code>cap_restart</code> into ''EXPDIR''. Keep the "originals" handy since if the job stumbles early in the run it might stop after having renamed them. The model expects restart filenames to end in "rst" but produces them with the date and time appended, so you may have to rename them. The <code>cap_restart</code> file is simply one line of text with the date of the restart files in the format YYYYMMDD<space>HHMMSS. The boundary conditions/forcings are provided by symbolic links created by the run script. If you need an arbitrary set of restarts, you can copy them from <code>/discover/nobackup/aeichman/test2_1</code>.
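As a minimal sketch of the <code>cap_restart</code> format described above: the date and time below (27 March 2008, 21z) are hypothetical values chosen only for illustration, and the commands are assumed to be run from inside ''EXPDIR''. The lines starting with <code>#</code> are comments and need not be typed.

```shell
# Hypothetical restart date/time: 2008-03-27 at 21:00:00 (illustration only).
# cap_restart holds a single line: YYYYMMDD, one space, HHMMSS.
echo "20080327 210000" > cap_restart

# Show the file to confirm it contains exactly one line in that format.
cat cap_restart
```

Both commands behave the same in <code>csh</code> and in Bourne-type shells, so this works whichever shell is open. Use the date and time that match your own restart files.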

The script you submit, <code>gcm_run.j</code>, is in ''HOMEDIR''. It should be ready to go as is~~, though you may eventually want to tune JOB_SGMT~~ (~~the number of days per segment, the internal between saving restarts) and NUM_SGMT (the number of segments attempted~~ in ~~a job~~) to ~~maximize your~~ run ~~time~~. ~~Leave END_DATE alone~~ in Fortuna 2.~~0 -- there is a bug that erroneously resubmits~~ the ~~script after this date~~. ~~You can~~ stop the run by commenting out the <code>qsub $HOMDIR/gcm_run.j</code> at the end of the script, which will prevent the script from being resubmitted. ~~Those and~~ the ~~PBS~~ (~~batch system~~) ~~parameters at~~ the ~~beginning are all that you will usually want~~ to ~~change in the script~~.

The script you submit, <code>gcm_run.j</code>, is in ''HOMEDIR''. It should be ready to go as is. The parameter END_DATE in <code>CAP.rc</code> (previously in <code>gcm_run.j</code>) can be set to the date you want the run to stop -- this works in Fortuna 2.1 where it did not in Fortuna 2.0. Also in Fortuna 2.1, you may edit the <code>.rc</code> files directly instead of the template (<code>.tmpl</code>) files. An alternative way to stop the run is to comment out the line <code>if ( $capdate < $enddate ) qsub $HOMDIR/gcm_run.j</code> at the end of the script, which prevents the script from being resubmitted, or to rename the script file. You may eventually want to tune two parameters in <code>CAP.rc</code>, JOB_SGMT (the number of days per segment, the interval between saving restarts) and NUM_SGMT (the number of segments attempted in a job), to maximize your run time.
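For orientation, the three <code>CAP.rc</code> parameters named above might look roughly like the fragment below. This is a sketch, not a copy of the distributed file: it assumes the usual <code>Label: value</code> layout of the <code>.rc</code> files and assumes that JOB_SGMT is written as a duration in the same YYYYMMDD<space>HHMMSS pattern used for dates. All values are illustrative, so check them against the <code>CAP.rc</code> in your own ''HOMEDIR''.

```
# Illustrative values only; layout and value formats are assumptions.
END_DATE:     20080427 210000     # date and time at which the run should stop
JOB_SGMT:     00000015 000000     # segment length, here 15 days
NUM_SGMT:     4                   # segments attempted per submitted job
```

With values like these, each submitted job would attempt up to four 15-day segments, saving restarts at the end of each segment.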

Submit the job with <code>qsub gcm_run.j</code>. You can keep track of it with <code>qstat</code> or <code>qstat | grep ''USERID''</code>, or stdout with <code>tail -f /discover/pbs_spool/''JOBID''.OU</code>, ''JOBID'' being returned by <code>qsub</code> and displayed with <code>qstat</code>. Jobs can be killed with <code>qdel ''JOBID''</code>. The standard out and standard error will be delivered as files to the working directory at the time you submitted the job.

@@ Line 3: / Line 3: @@
 == Checking Out and Compiling GEOS-5 ==
-The following assumes that you know your way around Unix, have successfully logged into your NCCS account (presumably on the '''discover''' cluster) and have an account on '''progress''' with the proper <code>ssh</code> preparation -- see the progress repository quick  start: https://progress.nccs.nasa.gov/trac/admin/wiki/QuickStart.
+The following assumes that you know your way around Unix, have successfully logged into your NCCS account (presumably on the '''discover''' cluster) and have an account on the source code repository with the proper <code>ssh</code> configuration -- see the progress repository quick  start: https://progress.nccs.nasa.gov/trac/admin/wiki/CVSACL.
 The commands below assume that your shell is <code>csh</code>.  Since the scripts to build and run GEOS-5  tend to be written in the same, you shouldn't bother trying to import too much into an alternative shell.  If you prefer a different shell, it is easiest just to open a <code>csh</code> process to build the model and your experiment.
@@ Line 12: / Line 12: @@
   setenv CVS_RSH ssh
-  setenv CVSROOT :ext:''USERID''@progress.nccs.nasa.gov:/cvsroot/esma
+  setenv CVSROOT :ext:''USERID''@progressdirect.nccs.nasa.gov:/cvsroot/esma
   setenv BASEDIR /discover/nobackup/projects/gmao/share/dao_ops/Baselibs/v3.1.5_build1
-where ''USERID'' is, of course, your '''progress''' username, which should be the same as your NASA and NCCS username.  Then, issue the command:
+where ''USERID'' is, of course, your repository username, which should be the same as your NASA and NCCS username.  Then, issue the command:
-  cvs co -r  Fortuna-2_0  Fortuna
+  cvs co -r  Fortuna-2_1_p2 Fortuna
-This should check out the latest stable version of the model from '''progress''' and create a directory called <code>GEOSagcm</code>.  <code>cd</code> into <code>GEOSagcm/src</code> and <code>source</code> the file called <code>g5_modules</code>:
+This should check out the latest stable version of the model from the repository and create a directory called <code>GEOSagcm</code>.  <code>cd</code> into <code>GEOSagcm/src</code> and <code>source</code> the file called <code>g5_modules</code>:
   source g5_modules
@@ Line 30: / Line 30: @@
   Currently Loaded Modulefiles:
-  1) comp/intel-9.1.052   2) lib/mkl-9.1.023      3) mpi/impi-3.2.011
+  1) comp/intel-11.0.083                       2) other/mpi/mvapich2-1.4.1/intel-11.0.083
 If this all worked, then type:
@@ Line 50: / Line 50: @@
    ssh-keygen -t dsa
-Then, log into  '''palm''' and cut and paste the contents of the <code>id_rsa.pub</code> and <code>id_dsa.pub</code> files on '''discover''' into the  <code>~/.ssh/authorized_keys</code> file on   '''palm'''.
+Then, log into  '''palm''' and cut and paste the contents of the <code>id_rsa.pub</code> and <code>id_dsa.pub</code> files on '''discover''' into the  <code>~/.ssh/authorized_keys</code> file on   '''palm'''.  Problems with <code>ssh</code> should be referred to NCCS support.
 To set the model up to run, in the  <code>GEOSagcm/src/Applications/GEOSgcm_App</code> directory we run:
@@ Line 78: / Line 78: @@
 where <code>.g5_modules</code> is simply a copy of the <code>g5_modules</code> that you ran earlier before compiling.  The <code>umask 022</code> is not strictly necessary, but it will make the various files readable to others, which will facilitate data sharing and user support.  Your home directory <code>~''USERID''</code> is also inaccessible to others by default; running <code>chmod 755 ~</code> is helpful.
-Copy the restart (initial condition) files and associated <code>cap_restart</code> into ''EXPDIR''.   Keep the "originals" handy since if the job stumbles early in the run it might stop after having renamed them.  The model expects restart filenames to end in "rst" but produces them with the date and time appended, so you may have to rename them.   The <code>cap_restart</code> file is simply one line of text with the date of the restart files in the format YYYYMMDD<space>HHMMSS.  The boundary conditions/forcings are provided by symbolic links created by the run script.  If you need an arbitrary set of restarts, you can copy them from <code>/discover/nobackup/aeichman/restarts/Fortuna-2_0/144x91/20080327_benchmark</code>.
+Copy the restart (initial condition) files and associated <code>cap_restart</code> into ''EXPDIR''.   Keep the "originals" handy since if the job stumbles early in the run it might stop after having renamed them.  The model expects restart filenames to end in "rst" but produces them with the date and time appended, so you may have to rename them.   The <code>cap_restart</code> file is simply one line of text with the date of the restart files in the format YYYYMMDD<space>HHMMSS.  The boundary conditions/forcings are provided by symbolic links created by the run script.  If you need an arbitrary set of restarts, you can copy them from <code>/discover/nobackup/aeichman/test2_1</code>.
-The script you submit, <code>gcm_run.j</code>, is in ''HOMEDIR''.  It should be ready to go as is, though you may eventually want to tune JOB_SGMT (the number of days per segment, the internal between saving restarts) and NUM_SGMT (the number of segments attempted in a job) to maximize your run time.  Leave END_DATE alone in Fortuna 2.0 -- there is a bug that erroneously resubmits the script after this date.  You can stop the run by commenting out the <code>qsub $HOMDIR/gcm_run.j</code> at the end of the script, which will prevent the script from being resubmitted.  Those and the PBS (batch system) parameters at the beginning are all that you will usually want to change in the script.
+The script you submit, <code>gcm_run.j</code>, is in ''HOMEDIR''.  It should be ready to go as is.  The parameter END_DATE in <code>CAP.rc</code> (previously in <code>gcm_run.j</code>) can be set to the date you want the run to stop -- this works in Fortuna 2.1 where it did not in Fortuna 2.0.  Also in Fortuna 2.1, you may edit the <code>.rc</code> files directly instead of the template (<code>.tmpl</code>) files.  An alternative way to stop the run is to comment out the line <code>if ( $capdate < $enddate ) qsub $HOMDIR/gcm_run.j</code> at the end of the script, which prevents the script from being resubmitted, or to rename the script file.  You may eventually want to tune two parameters in <code>CAP.rc</code>, JOB_SGMT (the number of days per segment, the interval between saving restarts) and NUM_SGMT (the number of segments attempted in a job), to maximize your run time.
 Submit the job with <code>qsub gcm_run.j</code>.  You can keep track of it with <code>qstat</code> or <code>qstat | grep ''USERID''</code>, or stdout with <code>tail -f /discover/pbs_spool/''JOBID''.OU</code>, ''JOBID'' being returned by <code>qsub</code> and displayed with <code>qstat</code>.  Jobs can be killed with <code>qdel ''JOBID''</code>.  The standard out and standard error will be delivered as files to the working directory at the time you submitted the job.