Debugging the GEOS-5 GCM
This page describes the process of building GEOS-5 Ganymed 1.0 on NCCS discover for debugging purposes.
It is assumed that you are able to build and run the model as detailed on the Ganymed 1.0 Quick Start page. Indeed, if you couldn't, you wouldn't know that you had an issue worth debugging.
Obtaining the model (optional)
If you need to, obtain the model as described in the Ganymed 1.0 Quick Start page.
Compiling the model for debugging
Setup modules for compiling
To compile the model for debugging, first set up the environment by sourcing the g5_modules file located in GEOSagcm/src:
$ cd GEOSagcm/src
$ source g5_modules
This should set up the modules for compiling:
$ module list
Currently Loaded Modulefiles:
  1) comp/intel-11.0.083   3) lib/mkl-10.0.3.020
  2) mpi/impi-3.2.2.006    4) other/SIVO-PyD/spd_1.6.0_gcc-4.3.4-sp1
Make the model with debug options
Now compile the model for debugging by running make with the optional argument BOPT=g:
$ make install BOPT=g
This should take roughly 30 minutes to build.
Optional method of debug compiling with parallel_build.csh
You can also compile the model with the parallel_build.csh script by passing the optional -debug flag:
> ./parallel_build.csh -debug
g5_modules: Setting BASEDIR and modules for discover25

================
 PARALLEL BUILD
================

The build will proceed with 10 parallel processes on 12 CPUs.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

LOG and info will be written to the src/BUILD_LOG_DIR directory.
Do the following to check the status/success of the build:

   cd BUILD_LOG_DIR
   ./gmh.pl [-v] LOG[.n]

Note: Check the latest version of LOG for the most recent build info.

Sp Code| Org   | Sponsor            | Research
-------+-------+--------------------+----------------------------------
 g0620 | 610.1 | Michele Rienecker  | GMAO - Systems and Data Synthesis

select group: [g0620]
qsub -W group_list=g0620 -N parallel_build -l select=1:ncpus=12:mpiprocs=10:proc=west -l walltime=1:00:00 -S /bin/csh -V -j oe ./parallel_build.csh
1203711.pbsa1
unset echo
1203711.pbsa1   mathomp4   debug   parallel_b   --    1  12 3892mb 01:00 Q   --
As the output states, you can monitor the progress of the job in the BUILD_LOG_DIR directory. (Note: This output will be slightly different depending on the Project Code and Username.)
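For example, once the job has started, you can check the status of the most recent build (as the output above notes, the LOG file may carry a numeric suffix such as LOG.1 for repeated builds):

$ cd BUILD_LOG_DIR
$ ./gmh.pl -v LOG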
Set up a run
Following the Ganymed 1.0 Quick Start page, set up a run using the gcm_setup script.
Running the GCM under Totalview
Much of what is included in this section is based on the Totalview entry in NCCS's Computing Primer.
Determine MPI Layout
Next, determine the MPI layout you will be running under. For example, if you have just set up a 2-degree lat-lon run, you'll most likely be running on 4 nodes with 12 cores per node. This can be determined by looking at both AGCM.rc:
$ head -10 AGCM.rc

# Atmospheric Model Configuration Parameters
# ------------------------------------------

NX: 4
NY: 12

AGCM_IM: 144
AGCM_JM: 91
AGCM_LM: 72
AGCM_GRIDNAME: PC144x91-DC
and gcm_run.j:
$ head -10 gcm_run.j
#!/bin/csh -f

#######################################################################
#                 Batch Parameters for Run Job
#######################################################################

#PBS -l walltime=12:00:00
#PBS -l select=4:ncpus=12:mpiprocs=12
#PBS -N test-G10p1-_RUN
#PBS -q general
where the NX and NY entries in AGCM.rc and the select line in gcm_run.j show you that information. Note that NX x NY = 4 x 12 = 48 total MPI processes, which matches the 4 nodes with 12 MPI processes per node requested by select=4:ncpus=12:mpiprocs=12.
Submit an Interactive Job
Next, submit an interactive job using the MPI geometry from gcm_run.j. Note that we use xsub because DISPLAY must be passed through, as Totalview is an X application:
$ xsub -I -V -l select=4:ncpus=12:mpiprocs=12,walltime=2:00:00
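Once the interactive job starts, you can quickly confirm that DISPLAY was passed through to the compute node (if it prints nothing, Totalview's GUI will not be able to open):

$ echo $DISPLAY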
Reload modules for model
Now that you are in a new shell, you must once again source g5_modules so that you have the correct setup.
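For example, assuming the model was built in GEOSagcm under your home directory (adjust the path to match your own checkout):

$ cd ~/GEOSagcm/src
$ source g5_modules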
Load Totalview Module and Setup Environment
Next, you must load the Totalview module:
$ module load tool/tview-8.9.2.2
At present, Totalview 8.9.2.2 is the current and recommended version to load. You should still have the modules loaded from building the model, so your module environment should at least contain:
$ module list
Currently Loaded Modulefiles:
  1) comp/intel-11.0.083   3) lib/mkl-10.0.3.020                       5) tool/tview-8.9.2.2
  2) mpi/impi-3.2.2.006    4) other/SIVO-PyD/spd_1.6.0_gcc-4.3.4-sp1
Then, in your ~/.cshrc or ~/.tcshrc file, add:
setenv TVDSVRLAUNCHCMD ssh
and, for safety's sake, run this command interactively as well:
$ setenv TVDSVRLAUNCHCMD ssh
This is needed by Totalview. It is safe to leave it permanently in your csh/tcsh setup file, since it only affects Totalview.
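You can verify that the variable is set in your current shell with:

$ echo $TVDSVRLAUNCHCMD
ssh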
Edit gcm_run.j to use Totalview
Once you have everything set up, you must alter gcm_run.j so that the model is run under Totalview.