The instructions on this page are a summary of the GEOS-5 tutorial Arlindo da Silva has been presenting. For the moment, we presume that the user has an account on '''sourcemotel''' and access to the NCCS machines (e.g., '''discover''').
== How to Check Out the Code ==

=== Find a place to store and build the code ===

The GEOS-5 (AGCM) source code checks out at about 40 MB of space. Once compiled, the complete package is about 500 MB. Your home space on '''discover''' may not be sufficient for checking out and building the code. You should consider either requesting a larger quota in your home space (call the TAG at x6-9120 and ask, telling them you are doing GEOS-5 development work) or building in your (larger) ''nobackup'' space. But remember, ''nobackup'' is not backed up. So be careful...

One strategy I like is to check the code out to my ''nobackup'' space, but then make a symlink from my home space back to it. For example, if I have my code stored at /discover/nobackup/colarco/GEOSagcm, I would make a symlink in my home space to point to that, something like:
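 % ln -s /discover/nobackup/colarco/GEOSagcm ~/GEOSagcm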

=== Set up your environment to check out code ===

With the above step complete, let's get ourselves ready to check out the code. We'll be using the UNIX ''cvs'' command to check out the code. The basic syntax is:

 % cvs -d $CVSROOT checkout -r TAGNAME MODULENAME

Here, $CVSROOT specifies the CVS repository we'll be getting the code from, MODULENAME is the particular module (set of code) we'll be checking out, and TAGNAME is a particular version of that module. Let's fill in the blanks:

 % cvs -d :ext:c-sourcemotel.gsfc.nasa.gov:/cvsroot/esma co -r GEOSagcm-Eros_7_25 GEOSagcm

So our module is ''GEOSagcm'' and the tag is ''GEOSagcm-Eros_7_25''. Note that I substituted the shortcut ''co'' for ''checkout'' in the above command.

The above command is generally valid. You ought to be able to execute it and check out some code. If you don't have your ''ssh keys'' set up on '''sourcemotel''' you should be prompted for your '''sourcemotel''' password. The assumption here is that your username on '''sourcemotel''' is the same as on the machine you are checking the code out on. If not, modify the command like this:

 % cvs -d :ext:SOURCEMOTEL_USERNAME@c-sourcemotel.gsfc.nasa.gov:/cvsroot/esma co -r GEOSagcm-Eros_7_25 GEOSagcm

Here's a shortcut. So that you don't have to type in the ''-d :ext:c-sourcemotel.gsfc.nasa.gov:/cvsroot/esma'' business all the time, you can add the following lines to your, e.g., ''.cshrc'' file:

 setenv CVSROOT ':ext:c-sourcemotel.gsfc.nasa.gov:/cvsroot/esma'
 setenv CVS_RSH ssh

Modify as appropriate if you need to put your username in, or if you use a different shell put the analog of these lines into your startup file.
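For example, in ''bash'' the equivalent lines for your ''.bashrc'' would be:

 export CVSROOT=':ext:c-sourcemotel.gsfc.nasa.gov:/cvsroot/esma'
 export CVS_RSH=ssh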

If you set that up, you should now be able to type:

 % cvs co -r GEOSagcm-Eros_7_25 GEOSagcm

Enter your password when prompted and the code should check out.

If you want to avoid typing in your password all the time (recommended) you need to set up your ''ssh keys''. There are some good instructions on this [http://code613-3.gsfc.nasa.gov/Computing/security/ssh_keys.html here]. Once you have created your keys on the local machine (i.e., '''discover''') you want to give '''sourcemotel''' the contents of the public key file (~/.ssh/id_dsa.pub). Log on to '''sourcemotel''' in your browser, click on the "My Page" tab, and then on the "Account Maintenance" heading. Scroll down the page and you'll see a little block called "Shell Account Information." Inside that is a link to "Edit Keys." Click the link and then paste the contents of your id_dsa.pub file as a new line into the window. Click "Update" and you're good to go. It may take 10-15 minutes for '''sourcemotel''' to propagate the updated keys, so until that happens you may still have to type in your password.
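If you haven't created keys before, generating a DSA key pair on '''discover''' looks something like:

 % ssh-keygen -t dsa
 % cat ~/.ssh/id_dsa.pub

Accept the default file location, optionally enter a passphrase, and paste the output of the ''cat'' command into the '''sourcemotel''' key window as described above.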

=== Set up your environment to build the code ===

Now you've checked out the code. You should have a directory called GEOSagcm in front of you. You're almost ready to build the code at this point.

The first thing to do is to make sure you have a compiler and the necessary libraries available. I'll assume you're on '''discover'''. In this case, you want to load some ''modules''. Modules define versions of compilers and libraries (like MPI and math libraries) used by the compiler. When you load the modules, they set your environment up to find the relevant versions. Again, assuming your shell is '''csh''' or '''tcsh''', add the following lines to your ''.cshrc'' file and re-source it:

 source /usr/share/modules/init/csh
 module purge
 module load comp/intel-9.1.052
 module load lib/mkl-9.1.023
 module load mpi/impi-3.1.038

This is a set of modules that will work with the version of GEOS-5 we are trying to compile here. Other module definitions are possible (you could, for example, look at the module definitions in /home/dasilva/.cshrc).
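After re-sourcing your ''.cshrc'', you can confirm what is loaded with:

 % module list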

The model also has dependencies on the so-called "Baselibs." These will generally be prepared in advance for you. You just need to tell your shell where to find them. On '''discover''', you can set your environment up to know where the Baselibs are by setting the environment variable BASEDIR as follows:

 setenv BASEDIR /discover/nobackup/projects/gmao/share/dao_ops/Baselibs/v2.1.1-build3

In fact, it's a good idea to add this line to your .cshrc file too. While you're at it, add the following line as well:

 setenv ESMADIR /home/colarco/GEOSagcm

where you edit the path to point wherever your GEOSagcm directory is located. This sets the location where the build gets installed. It should default to this location, but in case you have multiple instances of the code checked out it's a good idea to be explicit.

Now, with that set, navigate to the source directory:

 % cd GEOSagcm/src

At this point, you can build by issuing the following command:

 % gmake install

If you do that, go away and take a coffee break. A long one. This may take an hour or more to build. There are a couple of ways to speed this process up. One way is to build the code without optimization:

 % gmake install FOPT=-g

The code builds faster in this instance, but be warned that without optimization the resulting executable will run very slowly.

A better way is to do a parallel build. To do this, start an interactive queue (on '''discover'''):

 % qsub -I -W group_list=g0604 -N g5Debug -l ncpus=8 -l walltime=12:00:00 -S /bin/tcsh -V -j eo

Note that the string following "group_list=" is your group-id code. It's the project that gets charged for the computer time you use. If you're not on "g0604" that's okay; the queue system will let you know and it won't start your job. To find out which group you belong to, issue the following command:

 % getsponsor

and you'll get a table of sponsor codes available to you. Enter one of those codes as the group_list string and try again.

Wait, what have we done here? We've started an interactive queue (interactive in the sense that you have a command line) where we now have 8 cpus allocated to us (and us alone!) for the next 12 hours. We can use all 8 cpus to speed up our build as follows:

 % gmake --jobs=8 pinstall

The syntax here is that "--jobs=" specifies the number of cpus to use (up to the 8 we've requested in our queue) and "pinstall" means to do a parallel install. Don't worry, the result should be the same as "gmake install" above but should take a fraction of the time.

What if something goes wrong? Sometimes the build just doesn't go right. It's useful to save the output that scrolls by on the screen to a file so you can analyze it later. Modify any of the build examples above as follows to capture the text to a file called "make.log":

 % gmake --jobs=8 pinstall |& tee make.log

and now you have a record of how the build progressed. When the build completes (successfully or otherwise) you can analyze the build results by issuing the following command:

 % Config/gmh.pl -v make.log

and you'll be given a list of what compiled and what didn't, which will hopefully allow you to go in and find any problems.

If all goes well, you should have a brand-new build of GEOS-5. Step back up out of the ''src'' directory and you should see the following sub-directories:

 Config
 CVS
 Linux
 src

In the ''Linux'' directory you'll find:

 bin
 Config
 doc
 etc
 include
 lib

The executables are in the ''bin'' directory.

In this example, the directory ''GEOSagcm'' is the root directory everything ends up under. You can specify another location by setting the environment variable ESMADIR to some other location and installing again.
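For example (the path here is hypothetical), run from the ''src'' directory as before:

 % setenv ESMADIR /discover/nobackup/colarco/GEOSagcm-test
 % gmake install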

== How to Set Up and Run an Experiment ==

Now that you've built the code, let's try to run it. In the exercise that follows, we will ''clone'' a previous experiment. This will give you the basic idea of how to set up an experiment that we can refine in later exercises.

In what follows I will assume we are working on the NCCS computer '''discover'''.

Before we get going, let's make some light edits to your .cshrc file. First, near the top of the file add the word:

 unlimit

This is the csh built-in that removes the shell's resource limits (stack size and the like); Arlindo says it is important.

Next, let's set up an environment variable GEOSUTIL that points to the GEOS_Util directory in your model build. In my case, it looks something like:

 setenv GEOSUTIL /home/colarco/GEOSagcm/src/Shared/GEOS_Util

The GEOS_Util stuff is needed by the post-processing scripts.

Finally, let's make sure that the binaries from your compiled GEOS-5 code are in your path. Include the following line somewhere in your .cshrc file:

 setenv PATH .:/home/colarco/GEOSagcm/Linux/bin:$PATH

where obviously you replace the particular path to my binaries (/home/colarco/GEOSagcm/Linux/bin) with your own. Note what this implies: it won't be a good idea to move or clean this directory while the model is running!
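Once you re-source your .cshrc, you can confirm the shell finds the model executable:

 % which GEOSgcm.x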

=== Decide on Your Experiment ID and Set Up the Associated Directories ===

Now we need to decide on an '''experiment ID'''; that is, the name of our experiment. I'll call my experiment '''dragnet''', but you can pick any name you like. Usually it's something to do with the experiment.

Next, we need to set up two directory structures. The first, the '''home directory''', will contain the scripts we use to run the experiment. We don't want to lose these, so we'll make the home directory for the experiment actually reside in our home directories on '''discover''', which are backed up and recoverable in case anything goes wrong. Since I potentially will run many GEOS-5 experiments before I retire, I'm accumulating these home directories in a sub-directory called ''geos5''. So, if you follow that, make the home directory:

 % mkdir -p /home/colarco/geos5/dragnet

where you substitute your username for ''colarco''.

The second directory is the '''experiment directory''', which is where the experiment is actually run from. This space is volatile, and the experiment can accumulate quite a bit of data as it runs, so we need a fairly large disk space to contain it. I put these things on my ''nobackup'' space on '''discover''', but note that ''nobackup'' means this is not backed up.

 % mkdir /discover/nobackup/colarco/dragnet

There's nothing magical about these directory structures; you can use whatever you like. You'll just have to edit your scripts accordingly.

=== Populate the Experiment Directory ===

At the beginning of the experiment, the experiment directory will contain a few things:

# Resource files
# Restart files
# The GEOSgcm.x executable (the model itself)

I've provided a sample experiment directory (/discover/nobackup/colarco/dragnet). There are a number of files that end in ".rc". These are the ''resource files''. Copy these to your experiment directory, e.g.,
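 % cp /discover/nobackup/colarco/dragnet/*.rc /discover/nobackup/USERNAME/YOUR_EXPID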

I've prepared a set of restarts based on a previous GEOS-5 run that will serve for this example. You can find these files in /discover/nobackup/colarco/restart0/b72/. Copy all the "*rst" files to your experiment directory, e.g.,

 % cp /discover/nobackup/colarco/restart0/b72/*rst /discover/nobackup/colarco/dragnet

Finally, copy your GEOSgcm.x executable from wherever you built the model to your experiment directory, e.g.,

 % cp /discover/nobackup/colarco/GEOSagcm/Linux/bin/GEOSgcm.x /discover/nobackup/colarco/dragnet

There's one other file you need: in my experiment directory you'll find a file called ''cap_restart''. Copy this over to your experiment directory as well. Take a look at this file. It has one line with two numbers:

 20021001 210000

These are the YYYYMMDD and HHMMSS timestamps I am using to start the model run. Note that I chose these dates to correspond to the date at which the restarts I gave you are valid.

I haven't told you much about the resource files. Suffice it to say for now that the set I am providing here differs from the resource files in your source code checkout. I've modified them to run aerosols and to use aerosol source files that will work for the time period we will simulate.

=== Populate the Home Directory ===

Now go into your home directory and copy the contents of my sample into yours:

 % cd /home/USERNAME/geos5/YOUR_EXPID
 % cp /home/colarco/geos5/dragnet/* .

There are several files here:

 AGCM.tmpl
 CAP.tmpl
 HISTORY.tmpl
 gcm_run.j
 gcm_post.j
 gcm_post.script
 gcm_regress.j

For the moment, we don't need to do anything to AGCM.tmpl or CAP.tmpl. They are filled in with information from the run script and written as AGCM.rc and CAP.rc in your experiment directory. (As a rule, if you have AGCM.rc, CAP.rc, and HISTORY.rc in your experiment directory, you might want to remove them before starting an experiment so that you know you are using the *.tmpl files from your home directory.)
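For example, using my experiment directory:

 % cd /discover/nobackup/colarco/dragnet
 % rm -f AGCM.rc CAP.rc HISTORY.rc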

We have to make several edits to the remaining files. In particular, we need to change the group ID code used to charge your jobs to PBS, we need to change the experiment ID to match the ID you've chosen for your experiment, and we need to modify various path variables to point to your user ID (not mine!). You can locate the places to change by using ''grep'' on the files. For example, look for instances of my group ID ''r0605'':

 % grep r0605 *

On these files you'll find the following:

 gcm_post.j:#PBS -W group_list=r0605
 gcm_regress.j:#PBS -W group_list=r0605
 gcm_run.j:#PBS -W group_list=r0605

Unless you have permission to charge to r0605 you need to change these. Use your favorite editor to do this (or see the ''sed'' shortcut below). Recall, to find your group ID, use ''getsponsor''.
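If you'd rather not edit each file by hand, a ''sed'' one-liner will make the substitution in place (here with a hypothetical group ID, g0000):

 % sed -i 's/r0605/g0000/' gcm_post.j gcm_regress.j gcm_run.j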

Here's the result of grepping on my username:

 % grep colarco *

 AGCM.tmpl:#REPLAY_FILE: /discover/nobackup/colarco/replay_b/winds/b5_merrasc_jan92.ana.eta.%y4%m2%d2_%h2z.hdf
 gcm_post.j:setenv GEOSUTIL /home/colarco/GEOS_Util
 gcm_post.j:setenv SOURCE /discover/nobackup/colarco/$EXPID
 gcm_post.j:setenv ARCHIVE /archive/u/colarco/GEOS5.0/$EXPID
 gcm_post.j:setenv HOMDIR /home/colarco/geos5/$EXPID
 gcm_regress.j:setenv GEOSUTIL /home/colarco/GEOS_Util
 gcm_regress.j:setenv EXPDIR /discover/nobackup/colarco/$EXPID
 gcm_regress.j:setenv HOMDIR /home/colarco/geos5/$EXPID
 gcm_run.j:setenv GEOSUTIL /home/colarco/GEOS_Util
 gcm_run.j:setenv EXPDIR /discover/nobackup/colarco/$EXPID
 gcm_run.j:setenv HOMDIR /home/colarco/geos5/$EXPID
 gcm_run.j:/bin/ln -s /discover/nobackup/colarco/SST/dataoceanfile_MERRA_sst_1971-current.${OGCM_IM}x${OGCM_JM}.LE sst.data
 gcm_run.j:/bin/ln -s /discover/nobackup/colarco/SST/dataoceanfile_MERRA_ice_temperature_1971-current.${OGCM_IM}x${OGCM_JM}.LE sstsi.data
 gcm_run.j:/bin/ln -s /discover/nobackup/colarco/SST/dataoceanfile_MERRA_fraci_1971-current.${OGCM_IM}x${OGCM_JM}.LE fraci.data

Ignore the entry in AGCM.tmpl for now; it's been commented out. Ignore also the last three entries (pertaining to the location of the SST files); for whatever reason there were incompatible SST files on '''discover''' and these were copied over by hand from '''palm'''. For now, we'll use these. All the other instances of the username "colarco" should be modified.

Finally, here are the results of grepping on my experiment ID:

 % grep dragnet *

 gcm_post.j:setenv EXPID dragnet
 gcm_regress.j:#PBS -N dragnet
 gcm_regress.j:setenv EXPID dragnet
 gcm_run.j:#PBS -N dragnet
 gcm_run.j:setenv EXPID dragnet
 HISTORY.tmpl:EXPID: dragnet

You should change all instances of '''dragnet''' here to reflect your experiment ID (the ''sed'' trick shown above works for these substitutions too). Note that lines that look like ''#PBS -N dragnet'' are really just the name of the experiment that the PBS queueing system reports. This does not have to replicate your experiment ID at all; it is merely a string that helps you identify your job running in PBS.

=== Run the Simulation ===

At this point we're ready to take a crack at running the model. Go into your experiment home directory (e.g., /home/colarco/geos5/dragnet). You can start the model job by issuing:

 % qsub gcm_run.j

and check its progress by issuing:

 % qstat | grep YOUR_USERNAME
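If you need to kill a job, note its job ID from the ''qstat'' output and issue:

 % qdel JOBID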