Using the ExtData component
Revision as of 11:18, 20 December 2018
Overview of the ExtData Component
MAPL_ExtDataGridCompMod is an internal MAPL gridded component used to fulfill import fields in a MAPL hierarchy from NetCDF files containing gridded, geospatial data on disk. It is usually one of the three gridded components in the cap or main program of a MAPL application, the others being the root of the MAPL hierarchy it is servicing and MAPL_HistoryGridCompMod. It is instantiated, and all its registered methods are run, automatically by the CAP. MAPL_ExtDataGridCompMod provides data to fields in the Import states of MAPL components that are not satisfied by a MAPL_AddConnectivity call in the MAPL hierarchy. In a MAPL application, fields added to the Import state of a component are passed up the MAPL hierarchy looking for a connectivity to another component that will provide data to fill the import. If a connectivity is not found, these fields eventually reach the cap. At this point any fields that have not had their connectivity satisfied are passed to MAPL_ExtDataGridCompMod through its Export state. MAPL_ExtDataGridCompMod is in essence a provider of last resort for Import fields that need to be filled with data. Like other components, it has a run method that gets called every step in your MAPL application. What actually happens when it is run is determined by the resource file. The graphic below illustrates where ExtData fits in the MAPL application.
The user provides a resource file available to the MAPL_ExtDataGridCompMod GC. At its heart this resource file provides a connection between a field name and a variable name in a NetCDF file on disk. The component receives a list of fields that need to be supplied with data and parses the resource file to determine if MAPL_ExtDataGridCompMod can supply a variable of that name. We will refer to each field name-file variable combination as a primary export of ExtData. Each primary export is an announcement that MAPL_ExtDataGridCompMod is capable of filling a field named A with data contained in variable B on file xyz. Note that the field name in each primary export does not need to be a field that the model actually needs filled. The component only processes primary exports that are needed. The resource file should be viewed as an announcement of what MAPL_ExtDataGridCompMod can provide, not what it necessarily will provide. In addition to announcing what MAPL_ExtDataGridCompMod can provide, the user can specify other information such as how frequently to update the data from disk and how the data is organized. This update could be at every step, just once when starting the model run, or at a particular time each day. The resource file also allows tremendous flexibility in how the user chooses to organize the data files. Finally, MAPL_ExtDataGridCompMod allows data to be shifted and scaled, and lets the user control what method is used to regrid the file to the application grid.
Using ExtData requires constructing your input files in a logical way. There is one general principle to always remember: unless the file represents a climatology, you must have data that spans the time you intend to run your application. For example, if you have data files with dates in 2009 on them and your application tries to run in 2001, it will crash!
ExtData Resource File
The ExtData resource file can have 3 types of entries: a Primary Export section, a Derived Export section, and a Mask section. The file itself is processed sequentially, so it can have any number of Primary, Derived, and Mask sections. Each of these sections is a list of single line entries. Each entry describes how to supply a variable of a given name with data. The tables currently use %% instead of the standard :: identifiers because of an odd behaviour in ESMF.
Primary Exports - Before Heracles-5_4
The first table the user can define is a PrimaryExports table. Below is an example of a PrimaryExport table. This may have as many entries as the user desires. Note that only the last entry in a line is optional.
PrimaryExports%%
# ---------|---------|-----|-----|------|-------------|----------------------|--------|-------|----------|-------------------------|------------------------------|
# Export   |         |     | V   |      |             |_______ Refresh ______|____ Factors ___|________ External File _____________|__ External File Time Data ___|
# Name     | Units   | Dim | Loc | Clim |Conservative |     Time Template    | Offset | Scale | Variable |        Template         | Reference Time and frequency |
# ---------|---------|-----|-----|------|-------------|----------------------|--------|-------|----------|-------------------------|------------------------------|
ALBNF        NA        xy    c     N      N             0                      0.0      1.0     ALBNF      myfile.%y4%m2%d2_%h2z.nc4 2000-04-15T00:00:00P03:00
du001        NA        xyz   c     N      N             0                      0.0      1.0     du001      /dev/null
%%
The following is an explanation of each entry in a line.
- Export Name - The name of the import in the application that this entry will satisfy.
- Units - Not currently used. This is a placeholder that might someday be used for automatic unit conversion, for example.
- Dimensions - xy for 2D or xyz for 3D.
- Vertical location - c for center, e for edge. For 2D data enter c or e; the value is not used.
- Clim - Enter Y if the file is a 12 month climatology, otherwise enter N. For a climatology, the data can be on either one file or 12 files, one per month, if they are templated appropriately. Note that at this time ExtData does not support a generic climatology. This may be an option in the future.
- Conservative - Enter Y if the data should be regridded in a mass conserving fashion through a tiling file. Otherwise enter N to use the non-conservative bilinear regridding.
- Refresh template - you have 4 choices
- 1.) Enter '-'. In this case the field will only be updated once, the first time ExtData runs.
- 2.) Enter a refresh template of the form %y4-%m2-%d2T%h2:%n2:00 to set the recurring time to update the file. The file will be updated when the evaluated template changes. For example, a template of the form %y4-%m2-%d2T12:00:00 will cause the variable to be updated at the start of a new day. Note that ExtData will use the evaluated template as the working time for reading the file and will try to interpolate to that time. So in the example of %y4-%m2-%d2T12:00:00, when the clock hits 2007-08-02T00:00:00 it will update the variable, but the time it will use for reading and interpolation is 2007-08-02T12:00:00.
- 3.) Enter '0' to update the variable at every step. ExtData will do a linear interpolation to the current time using the available data.
- 4.) Enter Phr:mn, where hr is a two digit hour and mn is a two digit minute, or enter Pyear-mm-ddThr:mn, where year, mm, and dd are a year, month, and day. This is an interval at which to update the variable, measured from the start time of the clock used in the program.
- Offset - The value the variable will be shifted by. If you enter "none", no shifting will be performed. If you do not want to shift, do not enter 0.0, as time will be wasted applying a shift of 0.0 instead of skipping the shifting.
- Scale - The factor the variable will be scaled by. If you enter "none", no scaling will be performed. If you do not want to scale, do not enter 1.0, as time will be wasted multiplying by 1.0 instead of skipping the scaling.
- Variable - The name of the variable ON THE FILE. It need not be the same as the export name.
- File Template - A GrADS style template describing the time structure of your data if it is broken into multiple files. If there are no tokens in the template name, ExtData will assume that all the data is on one file. Note that if the data on file is at a different resolution than the application grid, the underlying I/O library ExtData uses will regrid the data to the application grid. The user can enter /dev/null to simply fill the import with zero, or enter /dev/null:300.0 to set the import to a non-zero constant value.
- File reference time and frequency - This keyword is optional, and if your data is on one file there is no point to it. This entry is the time and time interval that describe the start time and frequency of the file template you provided. It has the form %y4-%m2-%d2T%h2:%n2P%y4-%m2-%d2T%h2:%n2, where the time before the P is a reference time and the time after the P is a time interval. The year, month, and day can be left off the time interval, in which case they are assumed to be zero. This keyword says that the first time the file template is good for is the reference time and that there will be files using the supplied file template at the interval provided. This provides a direct way to specify the file frequency when ExtData cannot determine it from the template. For example, if you have files every half hour or every 3 hours, this cannot be "guessed" from a file template whose last token is a minute or an hour. A valid entry would be 2012-01-01T21:00P03:00, which says you have a file every 3 hours starting at 21z on 01/01/2012.
Primary Exports - Heracles-5_4 on
The first table the user can define is a PrimaryExports table. Below is an example of a PrimaryExport table. This may have as many entries as the user desires. Note that only the last entry in a line is optional.
PrimaryExports%%
# ---------|---------|------|-------------|----------------------|--------|-------|----------|-------------------------|------------------------------|
# Export   |         |      |             |_______ Refresh ______|____ Factors ___|________ External File _____________|__ External File Time Data ___|
# Name     | Units   | Clim |Conservative |     Time Template    | Offset | Scale | Variable |        Template         | Reference Time and frequency |
# ---------|---------|------|-------------|----------------------|--------|-------|----------|-------------------------|------------------------------|
ALBNF        NA        N      N             0                      0.0      1.0     ALBNF      myfile.%y4%m2%d2_%h2z.nc4 2000-04-15T00:00:00P03:00
du001        NA        N      N             0                      0.0      1.0     du001      /dev/null
%%
The following is an explanation of each entry in a line.
- Export Name - The name of the import in the application that this entry will satisfy.
- Units - Not currently used. This is a placeholder that might someday be used for automatic unit conversion, for example.
- Dimensions and Vertical location - These two entries do not appear in the table format shown above for Heracles-5_4 onward; the example lines go directly from Units to Clim.
- Clim - Enter Y if the file is a 12 month climatology. In that case the data can be on either one file or 12 files, one per month, if they are templated appropriately. Enter N if the data is not a climatological data set. Enter a year (2008 for example) if you have a year's worth of data that you want to use as a climatology but that is not 12 monthly times. For example, you might have multiple years of data but want to use one particular year as a climatology.
- Conservative - Enter Y if the data should be regridded in a mass conserving fashion through a tiling file. Enter N to use the non-conservative bilinear regridding. Enter V to regrid using the conservative voting option. Enter F;integer to regrid using the fractional conservative option.
- Refresh template - you have 4 choices
- 1.) Enter '-'. In this case the field will only be updated once, the first time ExtData runs.
- 2.) Enter a refresh template of the form %y4-%m2-%d2T%h2:%n2:00 to set the recurring time to update the file. The file will be updated when the evaluated template changes. For example, a template of the form %y4-%m2-%d2T12:00:00 will cause the variable to be updated at the start of a new day. Note that ExtData will use the evaluated template as the working time for reading the file and will try to interpolate to that time. So in the example of %y4-%m2-%d2T12:00:00, when the clock hits 2007-08-02T00:00:00 it will update the variable, but the time it will use for reading and interpolation is 2007-08-02T12:00:00.
- 3.) Enter '0' to update the variable at every step. ExtData will do a linear interpolation to the current time using the available data.
- 4.) Enter Phr:mn, where hr is a two digit hour and mn is a two digit minute, or enter Pyear-mm-ddThr:mn, where year, mm, and dd are a year, month, and day. This is an interval at which to update the variable, measured from the start time of the clock used in the program.
- Offset - The value the variable will be shifted by. If you enter "none", no shifting will be performed. If you do not want to shift, do not enter 0.0, as time will be wasted applying a shift of 0.0 instead of skipping the shifting.
- Scale - The factor the variable will be scaled by. If you enter "none", no scaling will be performed. If you do not want to scale, do not enter 1.0, as time will be wasted multiplying by 1.0 instead of skipping the scaling.
- Variable - The name of the variable ON THE FILE. It need not be the same as the export name.
- File Template - A GrADS style template describing the time structure of your data if it is broken into multiple files. If there are no tokens in the template name, ExtData will assume that all the data is on one file. Note that if the data on file is at a different resolution than the application grid, the underlying I/O library ExtData uses will regrid the data to the application grid. The user can enter /dev/null to simply fill the import with zero, or enter /dev/null:300.0 to set the import to a non-zero constant value.
- File reference time and frequency - This keyword is optional, and if your data is on one file there is no point to it. This entry is the time and time interval that describe the start time and frequency of the file template you provided. It has the form %y4-%m2-%d2T%h2:%n2P%y4-%m2-%d2T%h2:%n2, where the time before the P is a reference time and the time after the P is a time interval. The year, month, and day can be left off the time interval, in which case they are assumed to be zero. This keyword says that the first time the file template is good for is the reference time and that there will be files using the supplied file template at the interval provided. This provides a direct way to specify the file frequency when ExtData cannot determine it from the template. For example, if you have files every half hour or every 3 hours, this cannot be "guessed" from a file template whose last token is a minute or an hour. A valid entry would be 2012-01-01T21:00P03:00, which says you have a file every 3 hours starting at 21z on 01/01/2012.
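To make the reference time and frequency entry concrete, the following is a minimal Python sketch of how such an entry splits into a start time and an interval. It is not the ExtData implementation: the helper name parse_reference is hypothetical, and the sketch assumes the interval is given in the short hr:mn (or hr:mn:sc) form with the year, month, and day left off.

```python
from datetime import datetime, timedelta

def parse_reference(entry):
    """Split 'REFERENCE_TIME P INTERVAL' into (datetime, timedelta).

    Hypothetical helper for illustration only. It assumes the interval
    has no year-month-day part, i.e. it looks like 03:00 or 03:00:00.
    """
    ref_text, interval_text = entry.split("P", 1)
    # The reference time may or may not carry a seconds field.
    if ref_text.count(":") == 2:
        fmt = "%Y-%m-%dT%H:%M:%S"
    else:
        fmt = "%Y-%m-%dT%H:%M"
    reference = datetime.strptime(ref_text, fmt)
    parts = [int(p) for p in interval_text.split(":")]
    seconds = parts[2] if len(parts) > 2 else 0
    interval = timedelta(hours=parts[0], minutes=parts[1], seconds=seconds)
    return reference, interval

reference, interval = parse_reference("2012-01-01T21:00P03:00")
# Times at which files are expected: 21z, then 00z and 03z the next day.
file_times = [reference + n * interval for n in range(3)]
```

Under these assumptions, the entry 2012-01-01T21:00P03:00 yields a first file at 21z on 2012-01-01 and further files every 3 hours after that.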
Refresh Template
The refresh template allows quite a bit of flexibility in how the data can be distributed in the files and when the data is updated. In all cases, ExtData will try to time interpolate: to the current model time if the refresh template is 0, or to the time obtained by applying the current application time to the refresh template if it is a GrADS style entry with tokens.
If you enter '-', the field will be updated only the first time ExtData runs. It will take the current model time and try to interpolate to that time using the available data.
If you enter a refresh template to update at a certain time each day, month, etc., as described above, there are a few considerations. When ExtData tries to fill the field at the time described by the template, it will once again apply the current model time to the refresh template, then interpolate to that time using the available data.
If you enter '0' for the refresh template, the field gets updated every step and ExtData will try to interpolate to the application time using the data on file.
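As a rough illustration of the templated refresh case, the sketch below evaluates a refresh template against a model clock and records an update whenever the evaluated string changes. It is a simplified model, not ExtData code: the function evaluate_template is hypothetical, only the tokens %y4, %m2, %d2, %h2, and %n2 are handled, and treating the very first step as an update is an assumption.

```python
from datetime import datetime, timedelta

def evaluate_template(template, when):
    """Substitute GrADS style tokens in `template` with fields of `when`."""
    fields = {
        "%y4": f"{when.year:04d}",
        "%m2": f"{when.month:02d}",
        "%d2": f"{when.day:02d}",
        "%h2": f"{when.hour:02d}",
        "%n2": f"{when.minute:02d}",
    }
    for token, text in fields.items():
        template = template.replace(token, text)
    return template

refresh_template = "%y4-%m2-%d2T12:00:00"
clock = datetime(2007, 8, 1, 23, 0)
step = timedelta(minutes=30)

updates = []      # pairs of (clock time, working time used for reading)
previous = None
for _ in range(4):  # clock values 23:00, 23:30, 00:00, 00:30
    working_time = evaluate_template(refresh_template, clock)
    if working_time != previous:
        # The evaluated template changed, so the field is refreshed.
        updates.append((clock, working_time))
        previous = working_time
    clock += step
```

In this run the field is refreshed when the clock reaches 2007-08-02T00:00:00, and the working time used for reading and interpolation is 2007-08-02T12:00:00, matching the example given for the refresh template.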
ExtData always tries to interpolate to the current model time, or to the time evaluated from the refresh template, using the data files you provide via the file template. To do the interpolation at any time, ExtData must be able to find two times in the data series that bracket this time. These need not be on the same file, as it can interpolate between data on different files. Also, you should not have a time on a file that lies outside the range indicated by the file name. For example, if you have yearly files, so that your file template looks like myfile_%y4.nc4, and you have a 2008 file, the 2008 file should not have any time in it that falls outside of 2008. Violating this defeats the entire purpose of ExtData and will cause unpredictable behaviour. This is key to understanding how ExtData functions.
This bracketing data gets updated as time advances in your application. But how does ExtData know where to find it, especially if the data is spread across multiple files? If your file template has no GrADS tokens, then all the data needed to span the time your application will run must be in that one file. More likely, if you have multiple files, you specify a GrADS style file template. When ExtData tries to find the bracketing times, it takes the time it is trying to interpolate to and applies that to the file template. It looks for data at two times that bracket the current time in this file. If it does not find either the left or right bracket time, it looks in the previous file (for the left) or the next file (for the right). How does it know what the next file is? If the user does not specify a reference time and frequency, it looks at the rightmost token and assumes that is the frequency of the files. For example, if the rightmost token is %d2, it assumes that files are daily, so to find the right bracket time it will advance the time by one day, re-evaluate the file template, and look in that file to update the right bracket time. If you have a file every 3 hours, for example, this logic would not work. In this case you must specify a reference time and frequency using the optional keyword after the file template to tell it explicitly when the files start and how to advance the time to find the bracketing data if the data cannot be found on the current file.
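The bracketing requirement can be illustrated with a small Python sketch of linear interpolation in time. This is only a model of the idea, not the ExtData code: the function name interpolate and the sample values are made up, and the data are plain numbers rather than gridded fields.

```python
from bisect import bisect_right
from datetime import datetime

def interpolate(times, values, target):
    """Linearly interpolate `values` (given at sorted `times`) to `target`.

    Raises ValueError when no two times bracket `target`, which mirrors
    the requirement that the data must span the requested time.
    """
    right = bisect_right(times, target)
    if right == len(times) and times[-1] == target:
        return values[-1]  # exactly on the last available time
    if right == 0 or right == len(times):
        raise ValueError("no data times bracket the requested time")
    left = right - 1
    span = (times[right] - times[left]).total_seconds()
    weight = (target - times[left]).total_seconds() / span
    return values[left] + weight * (values[right] - values[left])

# Hypothetical series with one value every 3 hours.
times = [datetime(2013, 7, 1, 0), datetime(2013, 7, 1, 3), datetime(2013, 7, 1, 6)]
values = [10.0, 20.0, 40.0]
midpoint = interpolate(times, values, datetime(2013, 7, 1, 4, 30))
```

Here 04:30 lies halfway between the 03:00 and 06:00 values, so the result is halfway between 20.0 and 40.0. Requesting a time outside the series, such as a date in 2001, raises an error in this sketch.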
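The rightmost-token rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ExtData logic itself: the function name guessed_file_frequency is hypothetical and only the tokens %y4, %m2, %d2, %h2, and %n2 are recognized.

```python
import re

# Time unit implied by each recognized token.
UNITS = {"y4": "year", "m2": "month", "d2": "day", "h2": "hour", "n2": "minute"}

def guessed_file_frequency(template):
    """Return the unit implied by the rightmost token, or None if no tokens."""
    tokens = re.findall(r"%(y4|m2|d2|h2|n2)", template)
    return UNITS[tokens[-1]] if tokens else None

hourly_guess = guessed_file_frequency("myfile.%y4%m2%d2_%h2z.nc4")
yearly_guess = guessed_file_frequency("myfile_%y4.nc4")
single_file = guessed_file_frequency("myfile.nc4")
```

A template ending in %h2 is guessed to be hourly even when the files are actually 3 hours apart, which is why the explicit reference time and frequency is needed in that case.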
The following picture illustrates a simple example of how this works for a simple case where you have data on hourly files.
Finally suppose you had data that was arranged in files like below with one time per file:
d591_fpit.tavg3_3d_nav_Ne.20130701_0130z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_0430z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_0730z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_1030z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_1330z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_1630z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_1930z.nc4
d591_fpit.tavg3_3d_nav_Ne.20130701_2230z.nc4
and so on . . .
If the variable PLE was contained in the files and you wanted to do continuous time interpolation of PLE, you would add a line like this to ExtData.rc:
PLE NA xyz e N N 0 none none PLE d591_fpit.tavg3_3d_nav_Ne.%y4%m2%d2_%h2%n2z.nc4 2013-07-01T01:30:00P03:00
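The file names listed above can be reproduced from the file template together with the reference time 2013-07-01T01:30:00 and the 3 hour interval. The Python sketch below shows the arithmetic. The helpers expand and file_for are hypothetical names, and the lookup rule (latest file start time not after the requested time) is an assumption made for illustration.

```python
from datetime import datetime, timedelta

def expand(template, when):
    """Substitute GrADS style tokens in `template` with fields of `when`."""
    fields = {
        "%y4": f"{when.year:04d}",
        "%m2": f"{when.month:02d}",
        "%d2": f"{when.day:02d}",
        "%h2": f"{when.hour:02d}",
        "%n2": f"{when.minute:02d}",
    }
    for token, text in fields.items():
        template = template.replace(token, text)
    return template

template = "d591_fpit.tavg3_3d_nav_Ne.%y4%m2%d2_%h2%n2z.nc4"
reference = datetime(2013, 7, 1, 1, 30)
interval = timedelta(hours=3)

def file_for(target):
    """Name of the file whose start time is the latest one not after `target`."""
    steps = (target - reference) // interval  # whole intervals since reference
    return expand(template, reference + steps * interval)

# The first eight files implied by the reference time and frequency.
names = [expand(template, reference + n * interval) for n in range(8)]
left_file = file_for(datetime(2013, 7, 1, 5, 0))
```

For a model time of 05:00 on 2013-07-01 the left bracket comes from the 0430z file, and advancing by one interval gives the 0730z file for the right bracket.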
Special Options
There are several special options and cases that the user should be aware of.
Time Offset in Continuous Update
If the user specifies a refresh template of 0, they can tell ExtData to apply a time offset when updating the variable. The offset follows a semi-colon and has the general form 0;%h2%n2%s2, where the hour can be left out to give 0;%n2%s2. For example, entering 0;3000 applies an offset of +30 minutes when time interpolating to update the variable, and entering 0;-3000 applies an offset of -30 minutes.
Climatologies
Sometimes data represents a yearly climatology, in that you have data that spans a year and you want ExtData to recycle this year of data. ExtData supports 12 month climatologies, i.e. one time per month. If the climatology keyword is set to yes, you are telling ExtData that you have a file with 12 times spanning a year, or 12 files, one per month. In either case ExtData does not use the application clock's year but replaces it with the year in the file for interpolation, and wraps the data around at the end of the year.
Note in Heracles-5_4 onward this has been expanded so that you can have a climatological data set that is not just 12 timesteps. For example, suppose you had several years of daily files and you wanted to use one particular year. You can now specify a year in the keyword telling it to use that year for the interpolation. Essentially it replaces the clock year with the year you specify and correctly wraps back to the first piece of data in the year when you cross into the new year.
Constant Files
Occasionally the user might want to satisfy a variable with a data set that is constant in time and should not be time interpolated. This does not mean setting the import to a single value; rather, it means geospatial data with only one time on a file whose template has no tokens. If the user specifies "-" for the refresh template (update once), and ExtData finds that the file template has no tokens and that the file itself has only one time, ExtData will note this and update the variable once with this set of data.
Vector Variables
When importing winds into an application, the user has the option to handle these specially so that the u and v components are not treated separately as scalars, but as proper vector quantities when regridding to the target grid. Currently this is only supported when going from Lat-Lon data to an application on the Cube-Sphere grid. The syntax for this is as follows:
UC0;VC0 'm s-1' xyz C N N 0 0.0 1.0 U;V file_template
Notice the difference is that the two components of the wind are entered on one line and separated by a semi-colon; likewise the corresponding variables in the file are separated by a semi-colon. In addition to regridding to the cube, the u and v components will be moved to the stagger location and rotation (along the meridional and zonal directions or along the cube faces) specified in the import spec. For example, to import winds to the cube-sphere grid on the c-grid stagger location and rotated along the cube faces, you would define the imports as such:
call MAPL_AddImportSpec ( gc,                                  &
     SHORT_NAME = 'UC0',                                       &
     LONG_NAME  = 'eastward_wind_on_C-Grid_after_advection',   &
     UNITS      = 'm s-1',                                     &
     STAGGERING = MAPL_CGrid,                                  &
     ROTATION   = MAPL_RotateCube,                             &
     DIMS       = MAPL_DimsHorzVert,                           &
     VLOCATION  = MAPL_VLocationCenter,  RC=STATUS  )
VERIFY_(STATUS)
call MAPL_AddImportSpec ( gc,                                  &
     SHORT_NAME = 'VC0',                                       &
     LONG_NAME  = 'northward_wind_on_C-Grid_before_advection', &
     UNITS      = 'm s-1',                                     &
     STAGGERING = MAPL_CGrid,                                  &
     ROTATION   = MAPL_RotateCube,                             &
     DIMS       = MAPL_DimsHorzVert,                           &
     VLOCATION  = MAPL_VLocationCenter,  RC=STATUS  )
VERIFY_(STATUS)
The possible staggering locations are MAPL_AGrid, MAPL_CGrid, and MAPL_DGrid. The possible rotation options are MAPL_RotateLL (winds will be along meridional and zonal directions) or MAPL_RotateCube (winds will be rotated along the cube faces).
Masks
NOTE AS OF HERACLES-5_4 THE MASK SECTION HAS BEEN ELIMINATED. YOU CAN REPLICATE THIS FUNCTIONALITY BY CHOOSING V FOR THE REGRIDDING KEYWORD AND USING A - FOR THE REFRESH TEMPLATE.
Masks%%
# ---------|------------|-------------|
# Export   | Name on    |             |
# Name     | File       | File        |
# ---------|------------|-------------|
CO_MASK      regionMask   path_to_file
%%
Masks represent a special kind of data. Sometimes one has data that is purely integers. For example this might be different regions of the world that are tagged with a particular integer index. Just like primary entries a mask entry satisfies an import. However they are treated specially and interpolated so that the values on the application grid remain integers. There is also no facility to time interpolate this data so the file should only have one time entry. The masks are updated only once during initialization.
Derived Exports
The user can also specify Derived Export lists. These are import fields that are satisfied via an arithmetic expression or function using fields from the primary or mask export lists. The user specifies an import to satisfy, an expression involving primary or mask exports, and a refresh template. The import is updated with the result of the expression. The expression can be an arithmetic expression using fields from the Primary Export lists and is evaluated using the MAPL parser component. Although it is currently not checked, the refresh template on the derived export should be the same as that of the primary exports in the expression. Also, the primary exports in the expression need not themselves be needed to fulfil an import but can still be used in a derived expression. In this case the ExtData component adds the Primary Export to the list of fields it needs to fulfil and allocates space for it. In addition to arithmetic expressions, two functions are currently supported for derived expressions: a region mask and a zone mask.
The region mask function has the following syntax:
regionmask(variable_to_mask,name_of_mask;n1,n2,...)
the variable_to_mask is an entry from the primary exports
the name_of_mask is an entry from the mask exports
following these two entries, there is a semi-colon and a comma-separated list of integers. Anywhere the mask is one of these integers it uses the value from the variable to be masked, outside these points the variable is undefined.
The zone masking function has the following syntax:
zonemask(variable_to_mask,lower_lat,upper_lat)
the variable_to_mask is an entry from the primary exports
lower_lat and upper_lat are the latitudes in degrees. Anywhere between these values the value of the variable from the primary export is used, outside the variable is undefined.
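The effect of the two masking functions can be sketched on plain Python lists. This is only an illustration of the rules described above, not the MAPL implementation: the functions below are stand-ins, None is used as a hypothetical marker for the undefined value, the latitude of each point is passed in explicitly, and treating the latitude bounds as inclusive is an assumption.

```python
UNDEF = None  # hypothetical stand-in for the undefined value

def regionmask(values, mask, regions):
    """Keep values where the integer mask is one of `regions`."""
    return [v if m in regions else UNDEF for v, m in zip(values, mask)]

def zonemask(values, lats, lower_lat, upper_lat):
    """Keep values whose latitude lies between the two bounds (inclusive)."""
    return [v if lower_lat <= lat <= upper_lat else UNDEF
            for v, lat in zip(values, lats)]

# Four sample points with made-up values, region indices, and latitudes.
values = [1.0, 2.0, 3.0, 4.0]
region_index = [3, 5, 9, 1]
lats = [-60.0, 30.0, 45.0, 80.0]

by_region = regionmask(values, region_index, {3, 9})
by_zone = zonemask(values, lats, 45, 90)
```

With regions 3 and 9 selected, only the first and third points keep their values; with the 45 to 90 degree zone, only the last two points do.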
The following are some examples of derived expressions. The refresh template follows the same logic as the refresh template in the Primary Exports.
DerivedExports%%
# ---------|-----------------------------------|---------------------|
# Import   | Expression                        | Refresh             |
# Name     |                                   | Template            |
# ---------|-----------------------------------|---------------------|
CO_CH4nbeu regionmask(CO_CH4,CO_regionMask;3,9) %y4-%m2-%d2t12:00:00
CO_CH4bbbo zonemask(CO_CH4,45,90) %y4-%m2-%d2t12:00:00
CO_CH4bbbo boxmask(CO_CH4,minLat,maxlat,minLon,maxLon) %y4-%m2-%d2t12:00:00
UVMAG U^2+V^2 0
%%