Debugging the GEOS-5 GCM

This page describes the process of building GEOS-5 Ganymed 1.0 on NCCS discover for debugging purposes.

It is assumed that you are able to build and run the model as detailed on the Ganymed 1.0 Quick Start page. Indeed, if you couldn't, you wouldn't know that you had an issue worth debugging.

Obtaining the model (optional)

If you need to obtain the model, do so as described on the Ganymed 1.0 Quick Start page.

Compiling the model for debugging

Setup modules for compiling

To compile the model for debugging, first set up the environment for compiling by sourcing the g5_modules file located in GEOSagcm/src

$ cd GEOSagcm/src
$ source g5_modules

This should set up the modules for compiling:

$ module list
Currently Loaded Modulefiles:
 1) comp/intel-11.0.083                      3) lib/mkl-10.0.3.020
 2) mpi/impi-3.2.2.006                       4) other/SIVO-PyD/spd_1.6.0_gcc-4.3.4-sp1
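
Sourcing g5_modules also sets the BASEDIR environment variable (you can see this reported in the parallel_build.csh output below). As a quick sanity check, you can print it; the exact path shown depends on your installation:

$ echo $BASEDIR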

Make the model with debug options

Now compile the model for debugging by running make with the argument BOPT=g:

$ make install BOPT=g

This should take roughly 30 minutes to build.
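
If the build fails, a complete log is useful for diagnosis. A minimal way to capture one under csh/tcsh, where |& pipes both stdout and stderr; the log file name here is just an example:

$ make install BOPT=g |& tee makeinstall.log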

Optional method of debug compiling with parallel_build.csh

You can also compile the model for debugging with the parallel_build.csh script by passing the optional -debug flag:

> ./parallel_build.csh -debug
g5_modules: Setting BASEDIR and modules for discover25

   ================
    PARALLEL BUILD 
   ================

The build will proceed with 10 parallel processes on 12 CPUs.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
LOG and info will be written to the src/BUILD_LOG_DIR directory.
Do the following to check the status/success of the build:

  cd BUILD_LOG_DIR
  ./gmh.pl [-v] LOG[.n]

Note: Check the latest version of LOG for the most recent build info.



Sp Code|  Org  | Sponsor            | Research
-------+-------+--------------------+----------------------------------
 g0620 | 610.1 | Michele Rienecker  | GMAO - Systems and Data Synthesis

select group: [g0620] 
qsub -W group_list=g0620 -N parallel_build -l select=1:ncpus=12:mpiprocs=10:proc=west -l walltime=1:00:00 -S /bin/csh -V -j oe ./parallel_build.csh
1203711.pbsa1
unset echo
1203711.pbsa1   mathomp4 debug    parallel_b    --    1  12 3892mb 01:00 Q   -- 

As the output states, you can monitor the progress of the job in the BUILD_LOG_DIR directory. (Note: this output will differ slightly depending on your project code and username.)
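
For example, to follow the build as it runs (assuming the LOG file name mentioned in the output above):

$ cd BUILD_LOG_DIR
$ tail -f LOG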

Set up a run

Following the Ganymed 1.0 Quick Start page, set up a run using the gcm_setup script.
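
For reference, a typical invocation looks like the following; the path to gcm_setup shown here is typical of Ganymed-era checkouts and may differ in yours. The script asks a series of interactive questions about the experiment:

$ cd GEOSagcm/src/Applications/GEOSgcm_App
$ ./gcm_setup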

Running the GCM under Totalview

Much of this section is based on the Totalview entry in NCCS's Computing Primer.

Determine MPI Layout

First, determine the MPI layout you will be running under. For example, if you have just set up a 2-degree lat-lon run, you will most likely be running on 4 nodes with 12 cores per node. You can determine this by looking at both AGCM.rc:

$ head -10 AGCM.rc 

# Atmospheric Model Configuration Parameters
# ------------------------------------------
           NX: 4
           NY: 12
      AGCM_IM: 144
      AGCM_JM: 91
      AGCM_LM: 72
AGCM_GRIDNAME: PC144x91-DC

and gcm_run.j:

$ head -10 gcm_run.j 
#!/bin/csh -f

#######################################################################
#                     Batch Parameters for Run Job
####################################################################### 

#PBS -l walltime=12:00:00
#PBS -l select=4:ncpus=12:mpiprocs=12
#PBS -N test-G10p1-_RUN
#PBS -q general

where the NX and NY entries in AGCM.rc and the #PBS -l select line in gcm_run.j show this information. The total number of MPI processes is NX × NY = 4 × 12 = 48, which matches the 4 nodes × 12 MPI processes per node requested in the select line.

Submit an Interactive Job

Next, submit an interactive job using the MPI geometry from gcm_run.j. Note that we use xsub because Totalview is an X application, so the DISPLAY environment variable must be passed through:

$ xsub -I -V -l select=4:ncpus=12:mpiprocs=12,walltime=2:00:00
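
Once the interactive job starts, it is worth confirming that your DISPLAY made it to the compute node before launching Totalview. xclock is one common test, assuming it is installed:

$ echo $DISPLAY
$ xclock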

Reload modules for model

Now that you are in a new shell, you must once again source g5_modules so that you have the correct setup.
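
For example, assuming the same GEOSagcm/src location used when building:

$ cd GEOSagcm/src
$ source g5_modules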

Load Totalview Module and Setup Environment

Next, you must load the Totalview module:

$ module load tool/tview-8.9.2.2

At present, Totalview 8.9.2.2 is both the current and recommended Totalview version to load. You should still have the modules used to build the model loaded, so your module environment should contain at least:

$ module list
Currently Loaded Modulefiles:
  1) comp/intel-11.0.083                      3) lib/mkl-10.0.3.020                       5) tool/tview-8.9.2.2
  2) mpi/impi-3.2.2.006                       4) other/SIVO-PyD/spd_1.6.0_gcc-4.3.4-sp1   

Then, in your ~/.cshrc or ~/.tcshrc file, add:

setenv TVDSVRLAUNCHCMD ssh 

and, for safety's sake, run this command interactively as well:

$ setenv TVDSVRLAUNCHCMD ssh 

Totalview needs this setting to launch its remote debug servers over ssh. It is safe to leave it permanently in your csh/tcsh setup file, since it affects nothing but Totalview.
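
To double-check that the variable is set in your current shell before starting Totalview:

$ echo $TVDSVRLAUNCHCMD
ssh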

Edit gcm_run.j to use Totalview

Once you have everything set up, you must alter gcm_run.j so that the model launches under Totalview.
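
The page cuts off here, but as a sketch of the usual pattern (not necessarily the exact edit this page went on to describe), the line in gcm_run.j that launches the executable is changed so that the MPI starter itself runs under Totalview. The process count and executable name below are illustrative:

# In gcm_run.j, the model is normally launched with a line similar to
# (illustrative; the exact form varies by setup):
#   mpirun -np 48 ./GEOSgcm.x
# Launch the MPI starter under Totalview instead; everything after
# -a is passed to mpirun rather than to Totalview:
totalview mpirun -a -np 48 ./GEOSgcm.x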