Using the ExtData component

Revision as of 12:49, 14 May 2015 by Bmauer

Overview of the ExtData Component

MAPL_ExtDataGridCompMod is an internal MAPL gridded component used to fulfill import fields in a MAPL hierarchy from NetCDF files on disk. It is usually one of the three gridded components in the cap or main program of a MAPL application, the others being the root of the MAPL hierarchy it is servicing and MAPL_HistoryGridCompMod. It is instantiated, and all its registered methods are run, automatically by the CAP. MAPL_ExtDataGridCompMod will provide data to fields in the Import states of MAPL components that are not satisfied by a MAPL_AddConnectivity call in the MAPL hierarchy.

In a MAPL application, fields added to the Import state of a component are passed up the MAPL hierarchy looking for a connectivity to another component that will provide data to fill the import. If a connectivity is not found, these fields will eventually reach the cap. At this point any fields that have not had their connectivity satisfied are passed to MAPL_ExtDataGridCompMod through its Export state. MAPL_ExtDataGridCompMod is in essence a provider of last resort for Import fields that need to be filled with data. Like other components, it has a run method that gets called every step in your MAPL application. What actually happens when it is run is determined by the resource file. The graphic below illustrates where ExtData fits in the MAPL application.

 

The user provides a resource file to the MAPL_ExtDataGridCompMod gridded component. At its heart this resource file provides a connection between a field name and a variable name in a NetCDF file on disk. The component receives a list of fields that need to be filled and parses the resource file to determine whether MAPL_ExtDataGridCompMod can fill a variable of that name. We will refer to each combination of field name and file variable as a primary export. Each primary export is an announcement that MAPL_ExtDataGridCompMod is capable of filling a field named A with data contained in variable B on file xyz. Note that the field name in a primary export does not need to be a field that the model actually needs filled. The component only processes primary exports that are needed. The resource file should be viewed as an announcement of what MAPL_ExtDataGridCompMod can provide. In addition to announcing what it can provide, the user can specify other information, such as how frequently to update the data from disk. This could be at every step, just once when starting the model run, or at a particular time each day. MAPL_ExtDataGridCompMod also allows data to be shifted and scaled.

Using ExtData requires constructing your input files in a logical way. There is one general principle to always remember: unless the file represents a climatology, you must have data that spans the time you intend to run your application. For example, if you have data files with dates in 2009 on them and your application tries to run in 2001, it will crash!

ExtData Resource File

Primary Exports

The ExtData.rc file itself can have up to 3 parts. The first is the primary exports. Below is an example of a primary export block. It may have as many entries as the user desires. Note that only the last three entries in a line are optional; the units entry must be present even though it is not currently used (this may change someday).

PrimaryExports::
# ---------|---------|-----|-----|------|------------|----------------------|--------|-------|---------------------------------|-----------------------------------------------------------|
#  Export  |         |     | V   |      |            |_______ Refresh ______|____ Factors ___|________ External File __________|______________________External File Time Data______________|
#  Name    | Units   | Dim | Loc | Clim |Conservative|     Time Template    | Offset | Scale | Variable |      Template        |  Reference Time    | File Frequency Unit | File Frequency |
# ---------|---------|-----|-----|------|------------|----------------------|--------|-------|----------|----------------------|--------------------|---------------------|----------------|
ALBNF           NA      xy    c      N        N               0               0.0      1.0     ALBNF    myfile.%y4%m2%d2.nc4   2000-04-15T00:00:00           days               1
du001           NA      xyz   c      N        N               0               0.0      1.0     du001    /dev/null
::

The following is an explanation of each entry in a line.

Export Name - This is the actual name of the export in the application to fill
Units - Units not currently used
Dimensions - xy for 2D or xyz for 3D
Vertical location - c for center, e for edge. For 2D fields enter either c or e; the value is not used.
Clim - Enter Y if the file is a 12-month climatology, otherwise enter N. If you specify a climatology, the data can be in either one file or 12 files, one per month, provided the file names are templated appropriately. Note that at this time ExtData does not support a generic climatology. This may be an option in the future.
Conservative - Enter Y if you wish any regridding to be performed in a mass-conserving fashion through a tiling file. Otherwise enter N to use the non-conservative bilinear regridding.
Refresh template - You have 3 choices:
1.) Enter '-'. In this case the field will be updated only once, the first time ExtData runs.
2.) Enter a refresh template of the form %y4-%m2-%d2T%h2:%n2:00 to set the recurring time at which to update the variable. The variable will be updated when the evaluated template changes. For example, a template of the form %y4-%m2-%d2T12:00:00 will cause the variable to be updated at the start of a new day. Note that ExtData will use the evaluated template as the working time for reading the file and will try to interpolate to that time. So in the example of %y4-%m2-%d2T12:00:00, when the clock hits 2007-08-02T00:00:00 it will update the variable, but the time it will use for reading and interpolation is 2007-08-02T12:00:00.
3.) Enter '0' to update the variable at every step. ExtData will do a linear interpolation using the available times on the file. This option gives the user more sophisticated options for breaking the data up into different files. See the discussion further down about the file frequency.
Offset - The amount the variable will be shifted by. If you enter "none", no shifting will be performed.
Scale - The factor the variable will be scaled by. If you enter "none", no scaling will be performed.
Variable - This is the name of the variable ON THE FILE. It need not be the same as the export name. You can also set this to /dev/null to set the
File Template - This is a GrADS-style template. If there are no tokens in the template name, ExtData will assume that all the data is in one file. Note that if the data on file is at a different resolution than the application grid, the underlying I/O library ExtData uses will regrid the data to the application grid.
The next 3 keywords are optional, but all 3 must be supplied if any is used. If your data is in one file, these keywords are not needed.
Reference Time - When trying to find a file to get data from, the times tried when applying the template will be some multiple of a time interval from this reference time.
File Frequency Units - The frequency units for the file, options are years, months, days, hours, minutes.
File Frequency - An integer specifying the frequency in the units given in the previous keyword.
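
To make the column order concrete, here is a minimal Python sketch that splits one primary export line into named entries. It is not part of MAPL: the PrimaryExport class, its attribute names, and the parse_primary_export function are hypothetical, and the sketch assumes entries are separated by whitespace and that a line has either 11 entries or all 14.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrimaryExport:
    """One row of the PrimaryExports table (hypothetical container)."""
    name: str
    units: str
    dims: str
    vloc: str
    clim: bool
    conservative: bool
    refresh: str
    offset: Optional[float]
    scale: Optional[float]
    variable: str
    template: str
    reference_time: Optional[str] = None
    frequency_unit: Optional[str] = None
    frequency: Optional[int] = None


def _factor(token: str) -> Optional[float]:
    # "none" means the shift or scale is skipped entirely.
    return None if token.lower() == "none" else float(token)


def parse_primary_export(line: str) -> PrimaryExport:
    tokens = line.split()
    # The last three entries are optional, but must appear together.
    if len(tokens) not in (11, 14):
        raise ValueError(f"expected 11 or 14 entries, got {len(tokens)}")
    entry = PrimaryExport(
        name=tokens[0], units=tokens[1], dims=tokens[2], vloc=tokens[3],
        clim=tokens[4].upper() == "Y",
        conservative=tokens[5].upper() == "Y",
        refresh=tokens[6],
        offset=_factor(tokens[7]), scale=_factor(tokens[8]),
        variable=tokens[9], template=tokens[10],
    )
    if len(tokens) == 14:
        entry.reference_time = tokens[11]
        entry.frequency_unit = tokens[12]
        entry.frequency = int(tokens[13])
    return entry


row = parse_primary_export(
    "ALBNF NA xy c N N 0 0.0 1.0 ALBNF myfile.%y4%m2%d2.nc4 "
    "2000-04-15T00:00:00 days 1"
)
print(row.name, row.template, row.frequency_unit, row.frequency)
```

Rejecting lines with 12 or 13 entries mirrors the rule that the three optional keywords must all be supplied together.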


Refresh Template

The refresh template allows quite a bit of flexibility in how the data can be distributed in the files and when the data is updated.

If you enter '-' the field will be updated only the first time ExtData runs. It will take the current model time, apply that to the file template provided and finally try to find data on the file at the current model time to fill the variable. It will not do time interpolation.

If you enter a refresh template to update at a certain time each day, month, etc., as described above, there are a few considerations. When ExtData tries to fill the field at the time described by the template, it will once again apply the current model time to the file template and then try to find data on file at that time. It will not do time interpolation.
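
The rule that a field is refreshed whenever the evaluated template changes can be illustrated with a short Python sketch. This is only an illustration, not ExtData's code: the expand and refresh_steps functions are hypothetical, only the five tokens used on this page are handled, and the sketch assumes the very first step always counts as an update.

```python
from datetime import datetime, timedelta


def expand(template: str, t: datetime) -> str:
    """Fill the GrADS-style tokens used on this page with the time t."""
    return (template.replace("%y4", f"{t.year:04d}")
                    .replace("%m2", f"{t.month:02d}")
                    .replace("%d2", f"{t.day:02d}")
                    .replace("%h2", f"{t.hour:02d}")
                    .replace("%n2", f"{t.minute:02d}"))


def refresh_steps(template, start, step, nsteps):
    """Return (model time, evaluated template) for each step that refreshes."""
    updates = []
    previous = None
    for i in range(nsteps):
        now = start + i * step
        current = expand(template, now)
        # A refresh happens only when the evaluated string changes.
        if current != previous:
            updates.append((now, current))
            previous = current
    return updates


# Hourly steps across midnight with a "noon of each day" template.
for model_time, working_time in refresh_steps(
        "%y4-%m2-%d2T12:00:00", datetime(2007, 8, 1, 21), timedelta(hours=1), 6):
    print(model_time.isoformat(), "->", working_time)
```

With hourly steps starting at 2007-08-01T21:00:00, the second refresh falls on the step at 2007-08-02T00:00:00, and the evaluated template at that step is 2007-08-02T12:00:00.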

The most complex case is if you enter '0' for the refresh template. In this case the field is updated every step, and ExtData interpolates to the current time using the data on file; it can interpolate between data points in different files. To do this, the first time ExtData runs it tries to find two times that bracket the start time. These are updated as time advances in your application. But how does it know what time to apply to the file template if the data is spread across multiple files? This is where the file frequency and reference time come in. The user can specify them with the 3 optional keywords of the primary export, or let ExtData "guess" them from the file template. Starting from the reference time and stepping by the given frequency, ExtData looks for the bracketing data on the file whose time is closest to the current time. If it does not find the data there, it checks the next time. When the reference time and frequency are not given, ExtData guesses them from the file template. For example, if the last token in the template is an hour, then the frequency is one hour and the reference time is the current date and time with everything after the hour set to zero. If the file template is myfile_%y4%m2%d2_%h2.nc4, ExtData will assume the frequency is 1 hour and the reference time is the top of the hour on whatever date you start the application. The following picture illustrates another example.
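
The guessing rule itself is simple enough to sketch in Python. The guess_frequency function below is hypothetical and is not ExtData's implementation. The text only describes the case where the last token is an hour; extending the same rule to the year, month, day and minute tokens is an assumption made here for illustration.

```python
import re
from datetime import datetime

# Tokens used on this page, from coarsest to finest.
_UNITS = {"%y4": "years", "%m2": "months", "%d2": "days",
          "%h2": "hours", "%n2": "minutes"}
_ORDER = ["years", "months", "days", "hours", "minutes"]


def guess_frequency(template: str, start: datetime):
    """Guess (unit, frequency, reference time) from the last template token."""
    tokens = re.findall(r"%(?:y4|m2|d2|h2|n2)", template)
    if not tokens:
        return None  # no tokens: all data is assumed to be in one file
    unit = _UNITS[tokens[-1]]
    # Keep the start time down to the guessed unit and reset everything finer.
    parts = [start.year, start.month, start.day, start.hour, start.minute]
    defaults = [1, 1, 1, 0, 0]
    keep = _ORDER.index(unit) + 1
    reference = datetime(*(parts[:keep] + defaults[keep:]))
    return unit, 1, reference


print(guess_frequency("myfile_%y4%m2%d2_%h2.nc4", datetime(2013, 7, 1, 5, 42, 10)))
```

For the template myfile_%y4%m2%d2_%h2.nc4 and a start at 05:42:10, the sketch returns a frequency of 1 hour and a reference time at 05:00:00 on the start date.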

Finally, suppose you had data that was arranged in files like those below, with one time per file:

d591_fpit.tavg3_3d_nav_Ne.20130701_0130z.nc4

d591_fpit.tavg3_3d_nav_Ne.20130701_0430z.nc4

d591_fpit.tavg3_3d_nav_Ne.20130701_0730z.nc4

d591_fpit.tavg3_3d_nav_Ne.20130701_1030z.nc4

d591_fpit.tavg3_3d_nav_Ne.20130701_1330z.nc4

d591_fpit.tavg3_3d_nav_Ne.20130701_1630z.nc4

d591_fpit.tavg3_3d_nav_Ne.20130701_1930z.nc4

d591_fpit.tavg3_3d_nav_Ne.20130701_2230z.nc4

and so on . . .

If the variable PLE was contained in the files and you wanted to do continuous time interpolation, you would add a line like this to ExtData.rc:

PLE NA xyz e N N 0 none none PLE d591_fpit.tavg3_3d_nav_Ne.%y4%m2%d2_%h2%n2z.nc4 2013-07-01T01:30:00 hours 3
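
As a rough Python sketch of how the reference time and file frequency in the PLE entry above select files, the code below steps from the reference time in multiples of the frequency to find the two file times that bracket a model time, and computes a linear interpolation weight. The function names are hypothetical, and the sketch assumes each file holds a single time equal to the time in its name and that the frequency has a fixed length (hours, minutes or days). ExtData's actual search may differ.

```python
from datetime import datetime, timedelta


def expand(template: str, t: datetime) -> str:
    """Fill the GrADS-style tokens used on this page with the time t."""
    return (template.replace("%y4", f"{t.year:04d}")
                    .replace("%m2", f"{t.month:02d}")
                    .replace("%d2", f"{t.day:02d}")
                    .replace("%h2", f"{t.hour:02d}")
                    .replace("%n2", f"{t.minute:02d}"))


def bracketing_files(template, reference, frequency, now):
    """Return (left file, right file, weight of the right file) for time now."""
    # Whole number of frequency intervals between the reference time and now.
    n = (now - reference) // frequency
    left = reference + n * frequency
    right = left + frequency
    weight = (now - left) / frequency
    return expand(template, left), expand(template, right), weight


TEMPLATE = "d591_fpit.tavg3_3d_nav_Ne.%y4%m2%d2_%h2%n2z.nc4"
left_file, right_file, w = bracketing_files(
    TEMPLATE, datetime(2013, 7, 1, 1, 30), timedelta(hours=3),
    datetime(2013, 7, 1, 5, 0))
print(left_file)
print(right_file)
print(w)  # interpolated value = (1 - w) * left_value + w * right_value
```

For a model time of 2013-07-01T05:00:00 the bracketing files are the 0430z and 0730z files from the listing above.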

Masks

Masks::
# ---------|------------|------|
#  Export  | Name on    |      |
#  Name    | File       | File |
# ---------|------------|------|
CO_MASK      regionMask   path_to_file
::

Masks represent a special kind of data. Sometimes one has data that is purely integers. For example, this might be different regions of the world that are tagged with a particular integer index. Just like a primary entry, a mask entry satisfies an import. However, masks are treated specially: they are interpolated so that the values on the application grid remain integers. There is also no facility to time interpolate this data, so the file should have only one time entry. The masks are updated only once, during initialization.
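
The Python sketch below illustrates why integer masks need special treatment. It compares linear interpolation with nearest-neighbor sampling on a one-dimensional mask. Nearest-neighbor is used here purely as an assumed example of an integer-preserving method; this page does not say which scheme ExtData actually uses, and both functions are hypothetical.

```python
def regrid_nearest(values, n_out):
    """Sample a 1-D mask onto n_out cells by taking the containing source cell."""
    n_in = len(values)
    out = []
    for j in range(n_out):
        # Centre of output cell j, measured in source-cell widths.
        centre = (j + 0.5) * n_in / n_out
        out.append(values[min(int(centre), n_in - 1)])
    return out


def regrid_linear(values, n_out):
    """Linearly interpolate a 1-D field between source cell centres."""
    n_in = len(values)
    out = []
    for j in range(n_out):
        # Position of the output cell centre in source index coordinates.
        x = (j + 0.5) * n_in / n_out - 0.5
        x = min(max(x, 0.0), n_in - 1.0)
        i0 = int(x)
        i1 = min(i0 + 1, n_in - 1)
        w = x - i0
        out.append((1.0 - w) * values[i0] + w * values[i1])
    return out


region_mask = [1, 1, 3, 3]  # two regions tagged 1 and 3
print(regrid_nearest(region_mask, 8))
print(regrid_linear(region_mask, 8))
```

Linear interpolation produces values between 1 and 3 near the region boundary, which no longer correspond to any region index, while the nearest-neighbor result contains only the original integers.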

Derived Exports

The user can also specify derived exports. The user specifies an import to satisfy, an expression, and a refresh template. When the import is updated, it is updated with the result of the expression. The expression is evaluated using the MAPL parser component and can involve any fields in the primary export section. Although it is currently not checked, the refresh template on the derived export should be the same as that of the primary exports in the expression. Also, the primary exports in the expression need not themselves be needed to fulfill an import; they can still be used in a derived expression.
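
The idea of a derived export can be sketched in Python as an arithmetic expression evaluated point by point over named fields. This is only an analogy: the syntax accepted by the MAPL parser is not described on this page, the derive function is hypothetical, and the field name du002 is made up for the example.

```python
import ast
import operator

# Arithmetic operators supported by this sketch.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node, point):
    """Evaluate a parsed expression using the field values at one grid point."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, point),
                                   _eval(node.right, point))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, point)
    if isinstance(node, ast.Name):
        return point[node.id]  # look up a primary export by name
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)
    raise ValueError("unsupported element in expression")


def derive(expression, fields):
    """Apply the expression element by element to equally sized fields."""
    tree = ast.parse(expression, mode="eval").body
    n = len(next(iter(fields.values())))
    return [_eval(tree, {name: data[i] for name, data in fields.items()})
            for i in range(n)]


primary = {"du001": [1.0, 2.0], "du002": [0.5, 0.5]}
print(derive("du001 + 2.0*du002", primary))
```

Any field in the dictionary can appear in the expression, whether or not it is needed elsewhere, which parallels the statement that primary exports used in a derived expression need not themselves fulfill an import.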