Converting GEOS code from SLES 11 to SLES 12

This page details the changes typically needed to move a code base from SLES 11 to SLES 12.

If you have questions, please contact the SI Team.

Generic SLES 12 Information

NCCS SLES 12 FAQ

NCCS has created a Frequently Asked Questions page to help answer some generic questions about the SLES 12 transition.

Missing tools (xxdiff, ImageMagick, etc.)

You might notice that utilities like xxdiff and magick are no longer in your path. On SLES 12, NCCS has moved many non-system-required utilities into Lmod modulefiles. So to get xxdiff, you should:

module load xxdiff

and similar for ImageMagick, tkcvs, and more. If a utility is not available, contacting NCCS Support is the first step.
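
If you are not sure of the exact module name, Lmod can search for it. A short illustrative session (the module names shown here are assumptions; use whatever name and version the search reports):

 module spider imagemagick   # find the exact module name and versions
 module load ImageMagick     # then load the name that spider reports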

Complaints on login about missing modules

You might see complaints on logging into SLES 12 that modules cannot be found. This is often because you are loading modules in your .bashrc or .tcshrc files. A workaround while we are still in mixed SLES 11/SLES 12 mode is to do something like this for bash:

if [[ -e /etc/os-release ]]
then
   module load <modules-available-on-sles12>
else
   module load <modules-available-on-sles11>
fi

and similar for tcsh:

if (-e /etc/os-release) then
   module load <modules-available-on-sles12>
else
   module load <modules-available-on-sles11>
endif

Here we are using the fact that /etc/os-release only exists on the SLES 12 systems as a proxy.

GEOSenv

This also applies to module use statements. If, for convenience, you have been loading GEOSenv at startup in interactive shells, something like:

if [[ -e /etc/os-release ]]
then
   module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES12
else
   module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES11
fi
module load GEOSenv

or in tcsh:

if (-e /etc/os-release) then
   module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES12
else
   module use -a /discover/swdev/gmao_SIteam/modulefiles-SLES11
endif
module load GEOSenv

would work, as there is a GEOSenv module in both the SLES 11 and SLES 12 SI Team modulefile directories. (Of course, any other module loads must be guarded in the same way.)

If you don't do this, you'll get errors like this on SLES 12:

Lmod has detected the following error: The following module(s) are unknown: "other/git" 

Please check the spelling or version number. Also try "module spider ..."

It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore-cache load "other/git"

Also make sure that all modulefiles written in TCL start with the string #%Module

Executing this command requires loading "other/git" which failed while processing the following
module(s): 

    Module fullname  Module Filename
    ---------------  ---------------
    GEOSenv          /discover/swdev/gmao_SIteam/modulefiles-SLES11/GEOSenv

Missing shared libraries

Eventually you might see something like:

rs_numtiles.x: error while loading shared libraries: libssl.so.0.9.8: cannot open shared object file: No such file or directory

This usually (though not always) means you are trying to run an executable built on SLES 11 on an SLES 12 node.
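
A quick way to confirm is to run ldd on the executable and look for libraries that cannot be resolved:

 ldd rs_numtiles.x | grep "not found"

If old system libraries such as libssl.so.0.9.8 show up as not found, the executable needs to be rebuilt on SLES 12.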

GEOS Specific SLES 12

Building code on SLES 12

NCCS has requested that, if at all possible, GEOS parallel builds be done on compute nodes rather than head nodes. If you use parallel_build.csh, this is done for you by default. However, if you use the more manual CMake-then-Make/Ninja build process for GEOS, you should run the make (or ninja) step on a compute node.

NCCS has noticed performance degradation on head nodes when people run builds like make -j12, since (at the moment) there are fewer SLES 12 head nodes during this transition (some are still SLES 11).
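
If you are building by hand, one way to follow this guidance is to get an interactive SLURM allocation and run the build inside it. A rough sketch only; the node constraint, time, and build directory are placeholders, and the exact procedure for interactive jobs should follow NCCS's documentation:

 salloc --nodes=1 --constraint=hasw --time=1:00:00
 # then, from within the allocation:
 cd <your-build-directory>
 make -j12 install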

g5_modules

The first challenge is getting a g5_modules file that works with your tag. Start by looking at the version of Baselibs your tag uses. Note that if you want to take advantage of both the Haswell and Skylake nodes on SLES 12, you should use Intel MPI as your MPI stack.

NOTE: If you use MPT, you must build on the Haswell compute nodes on SLES 12; building any code with MPT on a Skylake node will fail due to missing libraries.


Baselibs 4

If your Baselibs is based on version 4 (say, 4.0.6), the best one to try is:

/gpfsm/dhome/mathomp4/GitG5Modules/SLES12/4.0.11/g5_modules.intel1805.impi1910

If you need MPT, you can use:

/gpfsm/dhome/mathomp4/GitG5Modules/SLES12/4.0.11/g5_modules.intel1805.mpt217

NOTE: 4.0.11 is not an exact version match, but it is essentially equivalent (it has the same version of ESMF), with newer versions of some libraries that needed updating for newer OSes and Intel 18+.

Baselibs 5

Many of the tags with GEOS use Baselibs 5.1.x, so 5.1.8 is a good substitute. You can use:

/gpfsm/dhome/mathomp4/GitG5Modules/SLES12/5.1.8-Github/g5_modules.intel1805.impi1910

or for MPT:

 /gpfsm/dhome/mathomp4/GitG5Modules/SLES12/5.1.8-Github/g5_modules.intel1805.mpt217

For 5.2.x tags, use:

/gpfsm/dhome/mathomp4/GitG5Modules/SLES12/5.2.8-Github/g5_modules.intel1805.impi1910

or for MPT:

/gpfsm/dhome/mathomp4/GitG5Modules/SLES12/5.2.8-Github/g5_modules.intel1805.mpt217

Baselibs 6

If you are using Baselibs 6.0.x, the files to use are:

 /gpfsm/dhome/mathomp4/GitG5Modules/SLES12/6.0.4/g5_modules.intel1805.impi1910
 /gpfsm/dhome/mathomp4/GitG5Modules/SLES12/6.0.4/g5_modules.intel1805.mpt217
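
Whichever Baselibs series you need, a common way to use one of these files is to copy it over your tag's src/g5_modules and source it (csh/tcsh) before building. A sketch, assuming a hypothetical checkout location:

 cd $NOBACKUP/MyGEOSTag/src
 cp /gpfsm/dhome/mathomp4/GitG5Modules/SLES12/6.0.4/g5_modules.intel1805.impi1910 g5_modules
 source g5_modules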

src/Config

NOTE: Most of the fundamental issues with moving to SLES 12 come from the files in src/Config. Older tags of GEOS don't handle Intel 18+ and Intel MPI 19+ well due to differences in flags and library names. Without updating the files here, the whole build will fall apart because the build system doesn't know Intel 18+ exists!

Heracles

For a Heracles-based tag, you can start by updating these files to the versions in bw_Heracles-5_4_p3-SLES12:

ESMA_arch.mk
fdp

Icarus

Old-style make system

For an older Icarus-based tag (one that doesn't have ifort.mk and mpi.mk), try updating these files to the versions in Icarus-3_2_p9-SLES12:

ESMA_arch.mk
fdp

New-style make system

For a new-style make system with files like ifort.mk and mpi.mk, you will need some changes to account for the newer compiler and MPI stacks.

mpi.mk

For this file, the Intel MPI section should look like:

 ifdef I_MPI_ROOT
     FC := mpiifort
     INC_MPI := $(I_MPI_ROOT)/include64
     LIB_MPI := -L$(I_MPI_ROOT)/lib64 -lmpifort -lmpi # Intel MPI
     LIB_MPI_OMP := -L$(I_MPI_ROOT)/lib64 -lmpifort -lmpi # Intel MPI
 else

Jason

A good comparison for Jason tags would be to compare against Jason-3_6_p1 when it comes to Config and other make issues.

f2py

One large challenge will be f2py-based files. Any f2py build that depends on $(LIB_SDF) will need updating due to GEOSpyD (the Python stack on SLES 12). The fix can be demonstrated with GFIO_.so. It was originally built as:

GFIO_.$(F2PYEXT): GFIO_py.F90 r4_install
       $(F2PY) -c -m GFIO_ $(M). $(M)$(INC_SDF) \
                GFIO_py.F90 r4/libGMAO_gfio_r4.a $(LIB_SDF) $(LIB_SYS) \
                only: gfioopen gfiocreate gfiodiminquire gfioinquire\
                      gfiogetvar gfiogetvart gfioputvar gfiogetbegdatetime\
                      gfiointerpxy gfiointerpnn gfiocoordnn gfioclose :

Notice how we pass in $(LIB_SDF) to f2py? The fix for this is to define a new $(XLIBS) and add that:

XLIBS =
ifeq ($(wildcard /etc/os-release),)
   XLIBS = -L/usr/lib64 -lssl -lcrypto
endif

GFIO_.$(F2PYEXT): GFIO_py.F90 r4_install
       $(F2PY) -c -m GFIO_ $(M). $(M)$(INC_SDF) \
                GFIO_py.F90 r4/libGMAO_gfio_r4.a $(LIB_SDF) $(LIB_SYS) $(XLIBS)\
                only: gfioopen gfiocreate gfiodiminquire gfioinquire\
                      gfiogetvar gfiogetvart gfioputvar gfiogetbegdatetime\
                      gfiointerpxy gfiointerpnn gfiocoordnn gfioclose :

Here we use the fact that the file /etc/os-release doesn't exist on SLES 11.

NOTE: this does not present itself at compile time, but rather as a run-time error like:

ImportError: ..../Linux/lib/Python/GFIO_.so: undefined symbol: SSLeay

A (possibly partial) list of f2py builds that need this fix:

src/GMAO_Shared/GMAO_ods/GNUmakefile
src/GMAO_Shared/Chem_Base/GNUmakefile
src/GMAO_Shared/GMAO_gfio/GNUmakefile

Hardcoded -openmp in make

Build time error

Another common theme is a hardcoded -openmp flag in a GNUmakefile. Intel deprecated the -openmp flag and, as of Intel 18, removed it in favor of -qopenmp.

Examples can be seen in GEOSgcs_GridComp/GEOSgcm_GridComp/GEOSagcm_GridComp/GEOSphysics_GridComp/GEOSsurface_GridComp/Shared/Raster/src/GNUmakefile:

RASTER_OMPFLAG =
ifeq ($(ESMA_FC), ifort)
#       RASTER_OMPFLAG = -openmp
endif
OPENMP_FLAG = -openmp

As noted above, Intel renamed this flag to -qopenmp, but a better change is to use the generic $(OMPFLAG) variable:

RASTER_OMPFLAG =
ifeq ($(ESMA_FC), ifort)
#       RASTER_OMPFLAG = $(OMPFLAG)
endif
OPENMP_FLAG = $(OMPFLAG)

Link-time error

If your executable fails to link with errors about undefined kmp... symbols, this is often caused by linking with -openmp, or by not linking against the OpenMP runtime at all. An example is Applications/NCEP_Etc/NCEP_enkf/GNUmakefile_in in GEOSadas-5_17_0p5B:

USER_LDFLAGS = -openmp

which should become:

USER_LDFLAGS = $(OMPFLAG)

Issues with passing SetServices to ESMF_GridCompSetServices

You might occasionally get an error with a call to ESMF_GridCompSetServices. For example, building GEOSadas-5_17_0p5B, you will encounter:

geos_pertmod.F90(328): error #7061: The characteristics of dummy argument 1 of the associated actual procedure differ from the characteristics of dummy argument 1 of the dummy procedure.   [AGCMPERT_SETSERVICES]
   call ESMF_GridCompSetServices ( pertmod_gc, agcmPert_SetServices, rc=ier)
-----------------------------------------------^
compilation aborted for geos_pertmod.F90 (code 1)

The issue is that Intel 18+ is much stricter about Fortran interfaces, and requires that the procedure passed to ESMF_GridCompSetServices have the exact signature ESMF requires for a SetServices routine. That interface is:

    interface
      subroutine userRoutine(gridcomp, rc)
        use ESMF_CompMod
        implicit none
        type(ESMF_GridComp)        :: gridcomp ! must not be optional
        integer, intent(out)       :: rc       ! must not be optional
      end subroutine
    end interface

So, if we look at the SetServices in GEOS_AgcmPertGridComp.F90, we see:

! !IROUTINE: SetServices -- Sets ESMF services for this component

! !INTERFACE:

    subroutine SetServices ( GC, RC )

! !ARGUMENTS:

    type(ESMF_GridComp), intent(INOUT) :: GC  ! gridded component
    integer            , intent(  out) :: RC  ! return code

Notice that the GC is intent(INOUT), but ESMF's interface does not have this. The solution is to remove the intent:

! !IROUTINE: SetServices -- Sets ESMF services for this component

! !INTERFACE:

    subroutine SetServices ( GC, RC )

! !ARGUMENTS:

    type(ESMF_GridComp)                :: GC  ! gridded component
    integer            , intent(  out) :: RC  ! return code

This can occur with other Gridded Components where RC is optional or has the wrong intent.

Internal Compiler Error with ADA_Module.F90

Occasionally, when you try to build ADA_Module.F90 with Intel 18+, you will get an Internal Compiler Error (ICE) and the build will crash. This is a compiler bug in Intel and can be worked around by changing the optimization level of this file to anything other than -O2. However, doing that can (and most likely will) change answers if this file is used.

One possible workaround that has worked is to change your TMPDIR when building. Why? No idea. But it seems to help. If building by hand, do:

setenv TMPDIR /tmp
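
or, equivalently, for bash users:

 export TMPDIR=/tmp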

If using parallel_build.csh, use parallel_build.csh -tmpdir /tmp.

No rule to make target 'it'

This one is due to differences in the base GNU include files: at some point, large C-style block comments were added to the standard include files, and these confuse the Fortran preprocessing step. There are various ways to solve this.

.P90 file

If the error happens with a .P90 file, then src/Config/ESMA_base.mk needs to be changed such that:

.P90.o:
       @sed -e "/\!.*'/s/'//g" $< | $(CPP) -C -ansi -DANSI_CPP $(FPPFLAGS) > $*___.s90

becomes:

.P90.o:
       @sed -e "/\!.*'/s/'//g" $< | $(CPP) -C -nostdinc -std=c99 -DANSI_CPP $(FPPFLAGS) > $*___.s90

In this case, the -nostdinc flag solved the 'it' issue. It also turns out the newer cpp removed the -ansi flag, so we substitute -std=c99.

Other files

If this happens with another Fortran file, usually that means that directory is doing its own preprocessing. For example, GMAO_gems can encounter this because it has:

$(SRCS): %.f90: src/%.f90
	@echo Preprocessing $@ from $<
	-@$(FPP) -P -D_PARALLEL -C $< >$@
communication_primitives.f90: src/communication_primitives.mpi.f90
	@echo Preprocessing $@ from $<
	-@$(FPP) -P -D_PARALLEL -C $< >$@

This needs to change to:

$(SRCS): %.f90: src/%.f90
	@echo Preprocessing $@ from $<
	-@$(FPP) -P -D_PARALLEL -nostdinc -C $< >$@
communication_primitives.f90: src/communication_primitives.mpi.f90
	@echo Preprocessing $@ from $<
	-@$(FPP) -P -D_PARALLEL -nostdinc -C $< >$@

Double continuation characters

Some code has doubled continuation characters, which leads to:

odas/odas_decorrelation.F90(354): warning #5152: Invalid use of '&'. Not followed by comment or end of line
               chi2 = chi0, angle1 = odas_grid.angles(i, j, 1), angle2 = v.angle, scale1 = odas_grid.scales(i, j, 1), scale2 = & &
---------------------------------------------------------------------------------------------------------------------------------^

As the warning says, remove the second ampersand. Note that this simple error can trigger many additional errors, because the compiler then mis-parses the rest of the file.
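
In other words, a simplified, illustrative fix looks like:

 ! before: the doubled continuation character is invalid
 chi2 = chi0 + & &
        scale1
 ! after: a single continuation character
 chi2 = chi0 + &
        scale1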

Undefined references to MPI, PMPI, etc

If you encounter messages like this:

ld: /discover/swdev/mathomp4/Baselibs/ESMA-Baselibs-4.0.11-SLES12/x86_64-unknown-linux-gnu/ifort_18.0.5.274-mpt_2.17/Linux/lib/libesmf.so: 
undefined reference to `MPI::Op::Free()'

when linking an executable, it usually means that the link step is missing $(LIB_MPI). Add it to the link flags (in this case *after* $(LIB_ESMF)).

ld: failed to convert GOTPCREL relocation; relink with --no-relax

The S2S V2 code on transition to SLES 12 encountered this error:

ld: failed to convert GOTPCREL relocation; relink with --no-relax

A possible solution (still under testing) is to do exactly what the message says and add -Wl,--no-relax to USER_LDFLAGS.

All my Perl scripts are failing!

As an overall comparison, a good tag to cvs diff against is GEOSadas-5_25_1_p2. You are looking for changes like those described below.

Shell

Many (if not all) of the Perl scripts in the ADAS will fail due to their dependence on an older version of Shell.pm that doesn't exist on SLES 12. Indeed, use Shell isn't even valid on SLES 12!

For many (if not most) scripts, you don't even need a use Shell statement. For example, if your file has:

use Shell qw(rm);

or:

use Shell qw(cat cut wc); # shell commands

and a search of the script turns up no rm(), cat(), cut(), or wc() calls, remove the line! Often it is cruft accrued over the years.

The main takeaway is that, in the end, any Perl script with use Shell will need to be changed so that use Shell is removed. In one case (see below), you can switch to CPAN::Shell, which is the Shell module on SLES 12, but the goal, again, is to remove use Shell.

rm (the most common)

This is by far the most common change needed. Change all:

rm();

to:

unlink();
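
Note that unlink() takes only a list of filenames, so any flags that were being passed to rm() (such as -f) must be dropped. A hypothetical before/after (the filename is only an example):

 # before (depends on use Shell)
 rm("-f", "$rundir/scratch.dat");

 # after (core Perl, no Shell needed)
 unlink("$rundir/scratch.dat");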

cp

If you need cp(), do not use:

use Shell qw(cp);

use:

use File::Copy "cp";

mv

For mv(), use:

use File::Copy "mv";
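
A short, hypothetical example using both (filenames are placeholders):

 use File::Copy qw(cp mv);

 cp("input.rc", "run/input.rc")       or die "cp failed: $!";
 mv("run/out.nc4", "archive/out.nc4") or die "mv failed: $!";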

rcp rsh...

Change:

use Shell qw(cat rcp rsh scp ssh);   # make C-shell commands available

to:

use CPAN::Shell qw(cat rcp rsh scp ssh);   # make C-shell commands available

Note this doesn't work for some of the Shell commands above because, for example, rm() was removed from Shell.pm.

timelocal.pl

Replace:

require('timelocal.pl');

with:

use Time::Local;
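
Time::Local's timelocal() takes the same argument order as the old timelocal.pl (seconds, minutes, hours, day of month, zero-based month, year), so existing calls should keep working. An illustrative example:

 use Time::Local;

 # 13 Apr 2020 12:30:00 local time (note the 0-based month)
 my $epoch = timelocal(0, 30, 12, 13, 3, 2020);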

foreach loop issues

Another difference between the Perl on SLES 11 and the Perl on SLES 12 is the handling of some foreach loops in GEOS. Examples can be found in fvsetup in some tags. For example:

foreach $dir qw(ana diag daotovs etc obs prog rs run recycle fcst asens anasa) {

will fail with something like:

syntax error at ./fvsetup line 4557, near "$dir qw(ana diag daotovs etc obs prog rs run recycle fcst asens anasa)"

The solution is to surround the qw() list with its own set of parentheses:

foreach $dir (qw(ana diag daotovs etc obs prog rs run recycle fcst asens anasa)) {
             *                                                                 *

Essentially, all your foreach loops should have parentheses around the list you are looping over.

A GEOSadas user found that these files:

idcheck.pl
fvsetup
gen_silo_arc.pl
monthly.yyyymm.pl.tmpl

had foreach issues.

ImportError: No module named cross_validation

You might encounter this on SLES 12:

    from sklearn.cross_validation import cross_val_score
ImportError: No module named cross_validation

This is due to the newer Python stack (and thus a newer scikit-learn) on SLES 12: sklearn.cross_validation was replaced by sklearn.model_selection. A solution is to do:

# sklearn changed where cross_val_score exists
try:
    from sklearn.model_selection  import cross_val_score
except ImportError:
    from sklearn.cross_validation import cross_val_score

Use of #PBS pragmas in run scripts

With SLES 12, NCCS has removed support for using #PBS pragmas in SLURM sbatch scripts. You will need to convert them to the appropriate #SBATCH pragmas; various conversion guides can be found on the web.
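
As a rough illustration (the job name, sizes, and times are placeholders), a header like:

 #PBS -N geosjob
 #PBS -l select=2:ncpus=28
 #PBS -l walltime=1:00:00
 #PBS -o geosjob.out

would become something like:

 #SBATCH --job-name=geosjob
 #SBATCH --nodes=2
 #SBATCH --ntasks-per-node=28
 #SBATCH --time=1:00:00
 #SBATCH --output=geosjob.out

Account and partition/QOS directives (e.g. #SBATCH --account=...) should be set per NCCS's guidance for your group.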