Using the ExtData component
Overview of the ExtData Component
MAPL_ExtDataGridCompMod is an internal MAPL gridded component used to fill import fields in a MAPL hierarchy from netCDF files on disk. It is usually one of the three gridded components in the cap, or main program, of a MAPL application, the others being the root of the MAPL hierarchy it is servicing and MAPL_HistoryGridCompMod. It is instantiated, and all its registered methods are run, automatically by the cap. MAPL_ExtDataGridCompMod provides data to fields in the Import states of MAPL components that are not satisfied by a MAPL_AddConnectivity call in the MAPL hierarchy. In a MAPL application, fields added to the Import state of a component are passed up the MAPL hierarchy in search of a connectivity to another component that will provide data to fill the import. If a connectivity is not found, these fields eventually reach the cap. At that point, any fields whose connectivity has not been satisfied are passed to MAPL_ExtDataGridCompMod through its Export state. MAPL_ExtDataGridCompMod is in essence a provider of last resort for import fields that need to be filled with data. Like other components, it has a run method that is called at every step of your MAPL application; what actually happens when it runs is determined by its resource file. The graphic below illustrates where ExtData fits in the MAPL application.
The user provides a resource file available to the MAPL_ExtDataGridCompMod GC. At its heart, this resource file provides a connection between a field name and a variable name in a netCDF file on disk. The component receives a list of fields that need to be filled and parses the resource file to determine whether it can fill a variable of that name. We will refer to each field name-file variable combination as a primary export. Each primary export is an announcement that MAPL_ExtDataGridCompMod is capable of filling a field named A with data contained in variable B on file xyz. Note that the field name in each primary export does not need to be a field that is actually required by the model; the component only processes the primary exports that are needed. The resource file should be viewed as an announcement of what MAPL_ExtDataGridCompMod can provide. In addition to simply announcing what it can provide, the user can specify other information, such as how frequently to update the data from disk. This could be at every step, just once when starting the model run, or at a particular time each day. MAPL_ExtDataGridCompMod also allows data to be shifted and scaled.
Using ExtData requires constructing your input files in a logical way. There is one general principle to always remember: unless the file represents a climatology, you must have data that spans the time you intend to run your application. For example, if your data files carry dates in 2009 and your application tries to run in 2001, it will crash!
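One way to guard against this is a quick sanity check on the file dates before launching a run. The following is a minimal sketch; the function name and the idea of pre-parsing dates out of the file names are illustrative, not part of ExtData itself:

```python
from datetime import datetime

def files_cover_run(file_dates, run_start, run_end):
    """Return True if the dates found in the data files span the run window.

    file_dates: datetimes parsed from the file names (hypothetical helper output);
    run_start/run_end: the application clock's start and stop times.
    """
    return min(file_dates) <= run_start and run_end <= max(file_dates)

# Data files dated through 2009 cannot serve a 2001 run:
dates = [datetime(2009, 1, 1), datetime(2009, 12, 31)]
print(files_cover_run(dates, datetime(2001, 6, 1), datetime(2001, 6, 2)))  # False
```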
ExtData Resource File
The ExtData resource file can have three types of entries: a Primary Export section, a Derived Export section, and a Mask section. The file itself is processed sequentially, so it can have any number of Primary, Derived, and Mask sections. Each of these sections is a list of single-line entries, and each entry describes how to supply a variable of a given name with data. The tables currently use %% instead of the standard :: delimiters because of an odd behaviour in ESMF.
Primary Exports
The first section is the primary exports. Below is an example of a primary export block; it may have as many entries as the user desires. Note that only the last three entries in a line are optional, even though the units are not currently used (this may change someday).
```
PrimaryExports%%
# ---------|---------|-----|-----|------|-------------|----------------------|--------|-------|---------------------------------|-----------------------------------------------------------|
#  Export  |         |     |  V  |      |             |_______ Refresh ______|____ Factors ___|________ External File __________|______________________External File Time Data______________|
#  Name    |  Units  | Dim | Loc | Clim |Conservative |    Time Template     | Offset | Scale | Variable |       Template        |               Reference Time and frequency                |
# ---------|---------|-----|-----|------|-------------|----------------------|--------|-------|----------|----------------------|-----------------------------------------------------------|
ALBNF        NA        xy     c     N        N          0                       0.0      1.0    ALBNF      myfile.%y4%m2%d2.nc4   2000-04-15T00:00:00P03:00
du001        NA        xyz    c     N        N          0                       0.0      1.0    du001      /dev/null
%%
```
The following is an explanation of each entry in a line.
- Export Name - This is the actual name of the export in the application to fill
- Units - not currently used; just a placeholder for now
- Dimensions - xy for 2D or xyz for 3D
- Vertical location - c for center, e for edge; if the variable is 2D, enter c or e, but the value will obviously not be used
- Clim - enter Y if the file is a 12-month climatology, otherwise enter N. If you specify a climatology, the data can be on either one file or 12 files (one per month) if they are templated appropriately. Note that at this time ExtData does not support a generic climatology; this may be an option in the future.
- Conservative - enter Y if the data should be regridded in a mass-conserving fashion through a tiling file. Otherwise enter N to use the non-conservative bilinear regridding.
- Refresh template - you have 4 choices:
- 1.) Enter '-'. In this case the field will be updated only once, the first time ExtData runs.
- 2.) Enter a refresh template of the form %y4-%m2-%d2T%h2:%n2:00 to set the recurring time at which to update the variable. The variable will be updated when the evaluated template changes. For example, a template of the form %y4-%m2-%d2T12:00:00 will cause the variable to be updated at the start of a new day. Note that ExtData will use the evaluated template as the working time for reading the file and will try to interpolate to that time. So in the example of %y4-%m2-%d2T12:00:00, when the clock hits 2007-08-02T00:00:00 the variable is updated, but the time used for reading and interpolation is 2007-08-02T12:00:00.
- 3.) Enter '0' to update the variable at every step. ExtData will do a linear interpolation using the available times on the file. This option allows the user more sophisticated options for breaking the data up into different files. See the discussion further down about the file frequency.
- 4.) Enter Phr:mn, where hr is a two-digit hour and mn is a two-digit minute, or enter Pyear-mm-ddThr:mn, where year, mm, and dd are a year, month, and day. This is an interval at which to update the variable, counted from the start time of the clock used in the program.
- Offset - this is a factor the variable will be shifted by. If you enter "none", no shifting will be performed. If you do not want to shift, do not put 0.0: that wastes time, since the component will add 0.0 to every point instead of skipping the shift.
- Scale - this is a factor the variable will be scaled by. If you enter "none", no scaling will be performed. If you do not want to scale, do not put 1.0: that wastes time, since the component will multiply by 1.0 instead of skipping the scaling.
- Variable - This is the name of the variable ON THE FILE. It need not be the same as the export name.
- File Template - this is a GrADS-style template. If there are no tokens in the template name, ExtData will assume that all the data is on one file. Note that if the data on file is at a different resolution than the application grid, the underlying I/O library ExtData uses will regrid the data to the application grid. The user can enter /dev/null to simply fill the import with zeros, or enter /dev/null:300.0 to set the import to a non-zero constant value.
- Reference time and frequency - the next keyword is optional; note that if all your data is on one file there is no point to it. This is a time and time interval describing the start time and frequency of the file template you provided, in the form %y4-%m2-%d2T%h2%n2P%y4-%m2-%d2T%h2%n2, where the time before the P is a reference time and the time after the P is a time interval. The year, month, and day can be left off the time interval, in which case they are assumed to be zero. This keyword says that the first time the file template is valid for is the reference time, and that there will be files matching the supplied template at the given interval. It provides a direct way to specify the file frequency when ExtData cannot determine it from the template alone. For example, data every half hour, or every 3 hours, cannot be "guessed" from a file template whose last token is a minute. A valid entry would be 2012-01-01T21:00P03:00, which says you have a file every 3 hours starting at 21z on 01/01/2012.
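To make the GrADS-style tokens concrete, here is a small sketch of how a file template is evaluated against a model time. The token set shown is the subset used in this document's examples, and the helper name is illustrative, not ExtData's actual internal routine:

```python
from datetime import datetime

# GrADS-style tokens mapped to strftime equivalents (subset used in this guide).
TOKENS = {
    "%y4": "%Y",  # 4-digit year
    "%m2": "%m",  # 2-digit month
    "%d2": "%d",  # 2-digit day
    "%h2": "%H",  # 2-digit hour
    "%n2": "%M",  # 2-digit minute
}

def expand_template(template, when):
    """Substitute each GrADS token in `template` with the matching field of `when`."""
    for token, fmt in TOKENS.items():
        template = template.replace(token, when.strftime(fmt))
    return template

print(expand_template("myfile.%y4%m2%d2.nc4", datetime(2007, 8, 2)))
# myfile.20070802.nc4
```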
Refresh Template
The refresh template allows quite a bit of flexibility in how the data can be distributed in the files and when the data is updated.
If you enter '-', the field will be updated only the first time ExtData runs. It will take the current model time, apply it to the file template provided, and finally try to find data on the file at the current model time to fill the variable. It will not do time interpolation.
If you enter a refresh template to update at a certain time each day, month, etc., as described above, there are a few considerations. When ExtData tries to fill the field at the time described by the template, it will once again apply the current model time to the file template and then try to find data on the file at that time. It will not do time interpolation.
The most complex case is entering '0' for the refresh template. In this case the field gets updated every step, and ExtData will try to interpolate to the current time using the data on file; it can interpolate between data points in different files. To accomplish this, the first time ExtData runs it tries to find two times that bracket the start time, and these brackets are updated as time advances in your application. But how does it know what time to apply to the file template if the data is spread across multiple files? This is where the idea of a file frequency and reference time comes in. The user can specify these either with the optional keywords on the primary export or by letting ExtData "guess" them from the file template. Starting from the reference time and stepping by the given frequency, ExtData tries to find the bracketing data in the file whose time is closest to the current time; if it does not find the data there, it checks the next time. If the reference time and frequency are not given, ExtData guesses them from the file template. For example, if the last token in the template is an hour, the frequency is one hour and the reference time is the current date with everything after the hour set to zero. So with a file template of myfile_%y4%m2%d2_%h2.nc4, ExtData will assume the frequency is 1 hour and the reference time is the top of the hour on whatever date you start the application. The following picture illustrates another example.
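The bracket search described above can be sketched as simple arithmetic: step from the reference time in whole multiples of the frequency to land on the file times on either side of the current model time. The function name is illustrative, not ExtData's actual routine:

```python
from datetime import datetime, timedelta

def bracketing_file_times(reference, frequency, current):
    """Return the file times at-or-before and after `current`, stepping by
    `frequency` from `reference` (the search ExtData performs for refresh '0')."""
    n = (current - reference) // frequency   # whole intervals already elapsed
    left = reference + n * frequency
    return left, left + frequency

# Files every 3 hours starting 2013-07-01 01:30z; model time is 09:00z:
left, right = bracketing_file_times(datetime(2013, 7, 1, 1, 30),
                                    timedelta(hours=3),
                                    datetime(2013, 7, 1, 9, 0))
print(left, right)  # 2013-07-01 07:30:00  2013-07-01 10:30:00
```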
Finally, suppose you had data arranged in files like those below, with one time per file:
d591_fpit.tavg3_3d_nav_Ne.20130701_0130z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_0430z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_0730z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_1030z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_1330z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_1630z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_1930z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_2230z.nc4
and so on . . .
If the variable PLE was contained in the files and you wanted to do continuous time interpolation, you would add a line like this to ExtData.rc:
PLE NA xyz e N N 0 none none d591_fpit.tavg3_3d_nav_Ne.%y4%m2%d2_%h2%n2z.nc4 2013-07-01T01:30:00 hours 3
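With an entry like that, ExtData brackets each model time with the two surrounding files (say the 01:30z and 04:30z files) and interpolates linearly between them. A minimal sketch of that interpolation follows; the sample values 100.0 and 106.0 are made up for illustration:

```python
from datetime import datetime

def time_interp(t, t0, v0, t1, v1):
    """Linearly interpolate between the bracketing samples (t0, v0) and (t1, v1)."""
    w = (t - t0).total_seconds() / (t1 - t0).total_seconds()
    return (1.0 - w) * v0 + w * v1

# Model time midway between the 01:30z and 04:30z files:
v = time_interp(datetime(2013, 7, 1, 3, 0),
                datetime(2013, 7, 1, 1, 30), 100.0,
                datetime(2013, 7, 1, 4, 30), 106.0)
print(v)  # 103.0
```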
Special Options
There are several special options and cases that the user should be aware of.
Constant Files
Occasionally the user might want to satisfy a variable with a data set that is constant (not in the sense of setting the import to a single value, but geospatial data with only one time on a file whose template has no tokens) and that should not be time interpolated. If the user specifies "-" for the refresh template (update once) and ExtData finds that the file template has no tokens and that the file itself has only one time, ExtData will note this and update the variable once with this set of data.
Vector Variables
When importing winds into an application, the user has the option to handle these specially so that the u and v components are not treated separately as scalars, but as proper vector quantities when regridding to the target grid. Currently this is only supported when going from lat-lon data to an application on the cube-sphere grid. The syntax for this is as follows:
UC0;VC0 'm s-1' xyz C N N 0 0.0 1.0 U;V file_template
Notice that the difference is that the two components of the wind are entered on one line, separated by a semi-colon; likewise, the corresponding variables in the file are separated by a semi-colon. In addition to regridding to the cube, the u and v components will be moved to the stagger location and rotation (along the meridional and zonal directions, or along the cube faces) specified in the import spec. For example, to import winds to the cube-sphere grid at the C-grid stagger location, rotated along the cube faces, you would define the imports as follows:
```
call MAPL_AddImportSpec ( gc,                                    &
     SHORT_NAME = 'UC0',                                         &
     LONG_NAME  = 'eastward_wind_on_C-Grid_after_advection',     &
     UNITS      = 'm s-1',                                       &
     STAGGERING = MAPL_CGrid,                                    &
     ROTATION   = MAPL_RotateCube,                               &
     DIMS       = MAPL_DimsHorzVert,                             &
     VLOCATION  = MAPL_VLocationCenter,                 RC=STATUS )
VERIFY_(STATUS)

call MAPL_AddImportSpec ( gc,                                    &
     SHORT_NAME = 'VC0',                                         &
     LONG_NAME  = 'northward_wind_on_C-Grid_before_advection',   &
     UNITS      = 'm s-1',                                       &
     STAGGERING = MAPL_CGrid,                                    &
     ROTATION   = MAPL_RotateCube,                               &
     DIMS       = MAPL_DimsHorzVert,                             &
     VLOCATION  = MAPL_VLocationCenter,                 RC=STATUS )
VERIFY_(STATUS)
```
The possible staggering locations are MAPL_AGrid, MAPL_CGrid, and MAPL_DGrid. The possible rotation options are MAPL_RotateLL (winds will be along the meridional and zonal directions) or MAPL_RotateCube (winds will be rotated along the cube faces).
Masks
```
Masks%%
# ---------|------------|--------------|
#  Export  |  Name on   |              |
#  Name    |  File      |     File     |
# ---------|------------|--------------|
CO_MASK      regionMask   path_to_file
%%
```
Masks represent a special kind of data. Sometimes one has data that is purely integers; for example, different regions of the world tagged with particular integer indices. Just like primary entries, a mask entry satisfies an import. However, masks are treated specially and interpolated so that the values on the application grid remain integers. There is also no facility to time-interpolate this data, so the file should have only one time entry. Masks are updated only once, during initialization.
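One way to keep regridded values as integers is nearest-neighbor sampling rather than any averaging scheme (averaging region indices would produce meaningless fractional regions). The sketch below shows the idea in 1-D for clarity; the function name is illustrative and the real component works on 2-D horizontal grids:

```python
def regrid_mask_nearest(mask, src_lats, dst_lats):
    """Sample the source mask at the nearest source point, so the values
    on the destination grid stay integers (no averaging of region indices)."""
    out = []
    for lat in dst_lats:
        # Index of the source point closest to this destination point.
        i = min(range(len(src_lats)), key=lambda j: abs(src_lats[j] - lat))
        out.append(mask[i])
    return out

# A two-region mask on a coarse grid, sampled onto a new grid:
print(regrid_mask_nearest([3, 3, 9, 9], [-60.0, -20.0, 20.0, 60.0], [-45.0, 45.0]))
# [3, 9]
```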
Derived Exports
The user can also specify derived exports. The user specifies an import to satisfy, an expression, and a refresh template. When the import is updated, it is updated with the result of the expression. The expression is evaluated using the MAPL parser component and can involve any fields in the primary export section. Although it is currently not checked, the refresh template on the derived export should be the same as that of the primary exports in the expression. Also, the primary exports in the expression need not actually be needed to fulfill an import; they can still be used in a derived expression. In addition to arithmetic expressions, two functions are currently supported for derived expressions: a region mask and a zone mask.
The region mask function has the following syntax:
regionmask(variable_to_mask,name_of_mask;n1,n2,...)
the variable_to_mask is an entry from the primary exports
the name_of_mask is an entry from the mask exports
following these two entries there is a semi-colon and a comma-separated list of integers. Anywhere the mask equals one of these integers, the value from the variable being masked is used; outside these points the variable is undefined.
The zone masking function has the following syntax:
zonemask(variable_to_mask,lower_lat,upper_lat)
the variable_to_mask is an entry from the primary exports
lower_lat and upper_lat are latitudes in degrees. Anywhere between these values, the value of the variable from the primary export is used; outside them, the variable is undefined.
```
DerivedExports%%
# ---------|--------------------------------------|----------------------|
#  Export  |                                      |       Refresh        |
#  Name    |              Expression              |       Template       |
# ---------|--------------------------------------|----------------------|
CO_CH4nbeu   regionmask(CO_CH4,CO_regionMask;3,9)   %y4-%m2-%d2t12:00:00
CO_CH4bbbo   zonemask(CO_CH4,45,90)                 %y4-%m2-%d2t12:00:00
%%
```
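The semantics of the two masking functions can be sketched as follows. This is an illustrative 1-D model, assuming a fill value for "undefined" points; the actual fill value ExtData uses is not specified here:

```python
UNDEF = 1.0e15  # hypothetical fill value for "undefined" points

def regionmask(field, mask, indices):
    """Keep field values where the mask holds one of the listed region indices."""
    return [f if m in indices else UNDEF for f, m in zip(field, mask)]

def zonemask(field, lats, lower_lat, upper_lat):
    """Keep field values between the two latitudes; undefined elsewhere."""
    return [f if lower_lat <= lat <= upper_lat else UNDEF
            for f, lat in zip(field, lats)]

# Three points tagged with region indices 3, 5, 9; keep regions 3 and 9:
print(regionmask([1.0, 2.0, 3.0], [3, 5, 9], {3, 9}))
# Three points at 10N, 50N, 80N; keep the 45N-90N zone:
print(zonemask([1.0, 2.0, 3.0], [10.0, 50.0, 80.0], 45, 90))
```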