Useful Tools
== Climate Data Operators (CDO) ==
As stated by their creators, the [https://code.zmaw.de/projects/cdo Climate Data Operators] (CDO) are:
 CDO is a collection of command line Operators to manipulate and analyse Climate and NWP model Data.
=== Where to find CDO ===
==== NCCS ====
Users can access CDO in a variety of ways. On discover, the easiest way is to load the <tt>cdo</tt> module:
 module load other/cdo
It is also included in GMAO-Baselibs-4_0_0 and higher (corresponding to Ganymed-3_0 or newer). It will be located at
 $BASEDIR/Linux/bin/cdo
once <tt>g5_modules</tt> is sourced.
==== NAS ====
At NAS, there are a couple of options. First, it is available through Baselibs as above. You can also access a portable version (compiled with the system gcc) at
 /nobackup/gmao_SIteam/Utilities/bin/cdo
As this version is completely portable, you can also copy it to a local <tt>bin</tt> directory if desired.
==== GSFC Desktops ====
If you are on a GMAO desktop and have access to <tt>/ford1</tt>, there is a version at
 /ford1/local/EL6-64/bin/cdo
as well as at
 /ford1/share/gmao_SIteam/Utilities/bin/cdo
The <tt>/ford1/share</tt> version may be newer than the <tt>/ford1/local</tt> version, as it will usually be whatever version is in the latest Baselibs tag.
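If you move between these systems, a small helper can pick whichever copy is present. The following is only a sketch in POSIX shell: the function name <tt>find_cdo</tt> is made up for illustration, and the candidate paths in the commented usage line are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: print the first usable cdo executable and return 0,
# or return non-zero if none is found.
# Candidates given as arguments are tried in order before falling back to PATH.
find_cdo() {
    for candidate in "$@"; do
        # Skip empty entries (e.g. when BASEDIR is unset) and non-executables.
        if [ -n "$candidate" ] && [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Fall back to whatever "module load other/cdo" put on PATH.
    command -v cdo
}

# Example (commented out): prefer the Baselibs copy, then the portable NAS copy.
# CDO=$(find_cdo "${BASEDIR:+$BASEDIR/Linux/bin/cdo}" /nobackup/gmao_SIteam/Utilities/bin/cdo) \
#     || echo "cdo not found" >&2
```

The explicit candidates are checked first so that a script gets the same copy regardless of which modules happen to be loaded; swap the order if you would rather trust the module.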
=== Example Uses of CDO ===
==== Display info about a file ====
CDO offers many operators that provide information about a file. The two most important are '''infon''' and '''sinfon'''. Note the '''n''' at the end: it means "display variable names". If you leave it off, you'll still get information, but the variables will be labeled -1, -2, -3, etc.
===== Short Information =====
If you just want a summary of what's in the file, use '''sinfon''':
 cdo sinfon: Processed 26 variables over 1 timestep ( 0.02s )
===== Information and Simple Statistics =====
The other operator, '''infon''', provides more information and some useful statistics:
As you can see, you get not only information but also statistics. It provides the number of values on the grid, <tt>Gridsize</tt>; the number of missing values, <tt>Miss</tt>; and the <tt>Minimum</tt>, <tt>Mean</tt>, and <tt>Maximum</tt> for each <tt>Level</tt> of each variable.
==== Diff two files ====
One of the main attractions of CDO is diffing two files. With binary files, such as the old restarts, one could use <tt>cmp</tt> or <tt>diff</tt> to check for differences. However, <tt>cmp</tt> will not work on NetCDF files because, while the data might be the same, the metadata almost surely won't be:
 $ cmp stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
       mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
 stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
 mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 differ: byte 55, line 4
There is already a tool for doing this with HDF5/NetCDF4 files: <tt>h5diff</tt>. However, <tt>h5diff</tt> is slow (at least the version we have):
 $ time h5diff stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
               mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
 attribute: <Title of </>> and <Title of </>>
 37 differences found
 0.116u 0.412s '''0:44.38''' 1.1% 0+0k 0+0io 0pf+0w
CDO, however, seems to be much more efficient at doing diff comparisons:
 $ time cdo diffn stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
                  mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
 0 of 1201 records differ
 cdo diffn: Processed 39344760 values from 52 variables over 2 timesteps ( 0.49s )
 0.424u 0.068s '''0:04.28''' 11.2% 0+0k 0+0io 0pf+0w
Of course, 44 seconds versus 4 seconds isn't much (and with smaller files, GPFS can sometimes 'cache' the files, making the comparison seem fast), but at c360 the difference grows. Let's diff two c360 NetCDF4 <tt>fvcore_internal_rst</tt> files:
 $ time h5diff fvcore_internal_rst.control.nc4 fvcore_internal_rst.test.nc4
 3.188u 6.824s '''14:57.19''' 1.1% 0+0k 400+0io 0pf+0w
 $ time cdo diffn fvcore_internal_rst.control.nc4 fvcore_internal_rst.test.nc4
 0 of 505 records differ
 cdo diffn: Processed 785376000 values from 14 variables over 2 timesteps ( 18.76s )
 17.125u 1.648s '''2:33.97''' 12.1% 0+0k 0+0io 0pf+0w
This tells us two things: the <tt>h5diff</tt> we have to use is slow, and you should develop at lower resolution if diffing restarts is part of your process.
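To put numbers on "slow", the elapsed times from the <tt>time</tt> output above can be turned into ratios. This is just arithmetic on the four wall-clock figures already shown; the helper names <tt>to_seconds</tt> and <tt>speedup</tt> are made up for this sketch.

```shell
#!/bin/sh
# Convert a "[minutes:]seconds" wall-clock time, as printed by time, to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ if (NF == 2) print $1 * 60 + $2; else print $1 }'
}

# Print how many times longer the first timing is than the second.
speedup() {
    awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 0:44.38 0:04.28     # c48 history file:  prints 10.4
speedup 14:57.19 2:33.97    # c360 restart file: prints 5.8
```

So in these two runs <tt>cdo diffn</tt> was roughly ten times faster on the c48 file and roughly six times faster on the c360 restarts. These are single timings on a shared filesystem, so treat the ratios as rough.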
==== Behavior when two files differ ====
All the above examples show two files that don't differ. If they do differ, CDO provides additional useful information:
 For each pair of fields the operator prints one line with the following information:
  - Date and Time
  - Level, Gridsize and number of Missing values
  - Occurrence of coefficient pairs with different signs (S)
  - Occurrence of zero values (Z)
  - Maxima of absolute difference of coefficient pairs
  - Maxima of relative difference of non-zero coefficient pairs with equal signs
  - Parameter name
as seen here:
 $ cdo diffn stock-G40U-CPU-2013Jun11-6hour-144.geosgcm_prog.20000415_0000z.nc4 stock-G40U-GPU-2013Jun11-6hour-144.geosgcm_prog.20000415_0000z.nc4
              Date     Time Level Gridsize  Miss : S Z Max_Absdiff Max_Reldiff : Parameter name
    1 : 2000-04-15 00:00:00  1000    13104  5606 : F T     0.64627    0.041020 : H
    2 : 2000-04-15 00:00:00   975    13104  3202 : F F     0.64383     0.19488 : H
    3 : 2000-04-15 00:00:00   950    13104  2394 : F F     0.64948   0.0014438 : H
    4 : 2000-04-15 00:00:00   925    13104  2047 : F F     0.65057  0.00083438 : H
    5 : 2000-04-15 00:00:00   900    13104  1793 : F F     0.67102  0.00067943 : H
    6 : 2000-04-15 00:00:00   875    13104  1555 : F F     0.66370  0.00058364 : H
    7 : 2000-04-15 00:00:00   850    13104  1401 : F F     0.65869  0.00043395 : H
    8 : 2000-04-15 00:00:00   825    13104  1305 : F F     0.64941  0.00036629 : H
    9 : 2000-04-15 00:00:00   800    13104  1188 : F F     0.63110  0.00033837 : H
 ...snip...
  477 : 2000-04-15 00:00:00   0.2    13104     0 : F F    0.019527     0.32925 : V
  478 : 2000-04-15 00:00:00   0.1    13104     0 : F F    0.059674     0.77863 : V
  479 : 2000-04-15 00:00:00  0.07    13104     0 : F F    0.057358     0.46071 : V
  480 : 2000-04-15 00:00:00  0.05    13104     0 : T F    0.021697     0.74741 : V
  481 : 2000-04-15 00:00:00  0.04    13104     0 : T F    0.021576     0.40354 : V
  482 : 2000-04-15 00:00:00  0.03    13104     0 : F F    0.020691     0.34921 : V
  483 : 2000-04-15 00:00:00  0.02    13104     0 : T F    0.020302     0.62736 : V
   438 of 483 records differ
   389 of 483 records differ more than 0.001
 cdo diffn: Processed 12658464 values from 26 variables over 2 timesteps ( 0.17s )
 $ echo $?
 0
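With hundreds of differing records, it can help to boil the listing down to one line per variable. The sketch below assumes the whitespace-separated, 14-field record layout shown in the output above (other CDO versions may print different columns), and the function name <tt>max_absdiff_by_var</tt> is made up for this example.

```shell
#!/bin/sh
# Summarize "cdo diffn" output read from stdin: largest Max_Absdiff per variable.
# Assumes the 14-field record layout shown above; other CDO versions may differ.
max_absdiff_by_var() {
    awk '
        # Record lines: N : date time level gridsize miss : S Z absdiff reldiff : name
        NF == 14 && $2 == ":" && $8 == ":" && $13 == ":" {
            name = $14
            # Compare numerically, but keep the value exactly as cdo printed it.
            if (!(name in max) || $11 + 0 > max[name] + 0) max[name] = $11
        }
        END { for (name in max) print name, max[name] }
    ' | sort
}

# Example (commented out; file names are placeholders):
# cdo diffn file1.nc4 file2.nc4 | max_absdiff_by_var
```

The header, the <tt>...records differ</tt> summary lines, and the final <tt>cdo diffn:</tt> line do not match the record pattern, so they are ignored.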
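'''NOTE''': CDO returns a status of 0 any time it doesn't encounter an error. Thus, if two files are different, CDO will report the differences, but the return status will still be 0 because <tt>cdo</tt> technically completed successfully, as seen above. Any tests you have that depend on the return status of <tt>cmp</tt> for binary files therefore need to be altered for NetCDF4. One possible formulation (with <tt>file1.nc4</tt> and <tt>file2.nc4</tt> standing in for the two files being compared) is shown here. The <tt>awk</tt> pattern takes the count from the first summary line only, since a second "differ more than" line is printed when the files differ:
 set NUMDIFF = `cdo -s diffn file1.nc4 file2.nc4 | awk '/records differ/ {print $1; exit}'`
 if ( "$NUMDIFF" == 0 ) then
    echo "files match"
 else
    echo "files differ"
 endif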
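For scripts written in a Bourne-style shell rather than csh, the same idea can be wrapped in a filter whose exit status behaves like <tt>cmp</tt>'s. This is only a sketch: the function name <tt>no_records_differ</tt> is made up, and it relies on the "N of M records differ" summary line shown above being present in the output.

```shell
#!/bin/sh
# Read "cdo diffn" output on stdin; succeed (status 0) only if 0 records differ.
# Assumes the "N of M records differ" summary line shown above.
no_records_differ() {
    awk '
        /records differ/ { found = 1; if ($1 + 0 != 0) bad = 1 }
        END {
            # Fail if any summary reports differences, or if no summary was
            # printed at all (for example because cdo itself failed).
            if (found && !bad) exit 0
            exit 1
        }
    '
}

# Example (commented out; file names are placeholders):
# if cdo -s diffn control.nc4 test.nc4 | no_records_differ; then
#     echo "files match"
# else
#     echo "files differ"
# fi
```

Treating a missing summary line as a failure is a deliberate choice here, so that a crashed or missing <tt>cdo</tt> is never mistaken for "no differences".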
==== Extract field(s) from a file ====
===== Extract variable(s) from a file =====
Often, our NetCDF4 files have many variables and we only care about one. CDO allows one to extract or ''select'' one or more variables. For example, if you only want CLCN, use '''selname''':
 cdo sinfon: Processed 1 variable over 1 timestep ( 0.00s )
===== Extract time(s) from a file =====
To extract one or more years from a multi-timestep file, you can use <tt>selyear,</tt>''year''. For multiple years, you can either do <tt>cdo '''selyear,1999,2000,2001'''</tt> or <tt>cdo '''selyear,1999/2001'''</tt>.
 cdo sinfon: Processed 7 variables over 36 timesteps ( 0.00s )
===== Other select operators =====
CDO has many of these operators:
 selparam      Select parameters by identifier
 delparam      Delete parameters by identifier
 selcode       Select parameters by code number
 delcode       Delete parameters by code number
 selname       Select parameters by name
 delname       Delete parameters by name
 selstdname    Select parameters by standard name
 sellevel      Select levels
 sellevidx     Select levels by index
 selgrid       Select grids
 selzaxis      Select z-axes
 selltype      Select GRIB level types
 seltabnum     Select parameter table numbers
 seltimestep   Select timesteps
 seltime       Select times
 selhour       Select hours
 selday        Select days
 selmon        Select months
 selyear       Select years
 selseas       Select seasons
 seldate       Select dates
 selsmon       Select single month
 sellonlatbox  Select a longitude/latitude box
 selindexbox   Select an index box
==== Combining Operators ====
Often, you want to do multiple operations on a file. You could, say, do a '''selname''' and output only one variable to a file, then a '''sellevel''' on that new file to select a single level, and then a '''selyear''' on ''that'', etc.:
 $ cdo '''selname,OX''' pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.nc4
 cdo selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 9.60s )
 $ cdo '''sellevel,1.5''' onlyOX.nc4 onlyOX.only1.5.nc4
 cdo sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 0.76s )
 $ cdo '''selyear,1999''' onlyOX.only1.5.nc4 onlyOX.only1.5.only1999.nc4
 cdo selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 0.05s )
 $ cdo sinfon onlyOX.only1.5.only1999.nc4
    File format: netCDF4
     -1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name
      1 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 1 1 91 1 F32 : OX
    Grid coordinates :
      1 : lonlat > size : dim = 91 nx = 0 ny = 91
          lat : first = -1.57079637 last = 1.57079625 inc = 0.0349065065 radians
    Vertical coordinates :
      1 : generic layer : 1.5
    Time coordinate : 12 steps
      RefTime = 1870-01-15 12:00:00 Units = hours Calendar = standard
   YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
   1999-01-14 22:00:00  1999-02-14 08:00:00  1999-03-16 18:00:00  1999-04-16 04:00:00
   1999-05-16 14:00:00  1999-06-16 00:00:00  1999-07-16 10:00:00  1999-08-15 20:00:00
   1999-09-15 06:00:00  1999-10-15 16:00:00  1999-11-15 02:00:00  1999-12-15 12:00:00
 cdo sinfon: Processed 1 variable over 12 timesteps ( 0.00s )
Of course, this is not only annoying but also wasteful, as you are creating many temporary files. Instead, CDO allows one to "combine" or "chain" operators. This is done by prefixing each chained operator with a dash (<tt>-operator</tt>):
 cdo -L operatorN -operatorN-1 ... -operator2 -operator1 input (output)
The operators are applied from right to left, so <tt>operator1</tt> runs first on the input. The <tt>-L</tt> is used because HDF5 isn't thread-safe as currently compiled; it "locks" I/O, which prevents problems. We are working on getting CDO to work better in parallel.
So, doing the above operations in one step:
 $ cdo -L selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX-1.5-1999.nc4
 cdo selyear: Started child process "sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe1.1)".
 cdo(2) sellevel: Started child process "selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe2.1)".
 cdo(3) selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 3.18s )
 cdo(2) sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 3.18s )
 cdo selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 3.18s )
 $ cdo sinfon onlyOX-1.5-1999.nc4
    File format: netCDF4
     -1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name
      1 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 1 1 91 1 F32 : OX
    Grid coordinates :
      1 : lonlat > size : dim = 91 nx = 0 ny = 91
          lat : first = -1.57079637 last = 1.57079625 inc = 0.0349065065 radians
    Vertical coordinates :
      1 : generic layer : 1.5
    Time coordinate : 12 steps
      RefTime = 1870-01-15 12:00:00 Units = hours Calendar = standard
   YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
   1999-01-14 22:00:00  1999-02-14 08:00:00  1999-03-16 18:00:00  1999-04-16 04:00:00
   1999-05-16 14:00:00  1999-06-16 00:00:00  1999-07-16 10:00:00  1999-08-15 20:00:00
   1999-09-15 06:00:00  1999-10-15 16:00:00  1999-11-15 02:00:00  1999-12-15 12:00:00
 cdo sinfon: Processed 1 variable over 12 timesteps ( 0.00s )
This file is the same as the one produced in three steps:
 $ cdo diffn onlyOX.only1.5.only1999.nc4 onlyOX-1.5-1999.nc4
   0 of 12 records differ
 cdo diffn: Processed 2184 values from 2 variables over 24 timesteps ( 0.00s )
Of course, you could even do the '''diffn''' in the same command:
 $ cdo -L diffn -selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.only1.5.only1999.nc4
 cdo diffn: Started child process "selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe1.1)".
 cdo(2) selyear: Started child process "sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe2.1)".
 cdo(3) sellevel: Started child process "selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe3.1)".
   0 of 12 records differ
 cdo(4) selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 3.13s )
 cdo(3) sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 3.13s )
 cdo(2) selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 3.13s )
 cdo diffn: Processed 2184 values from 2 variables over 24 timesteps ( 3.13s )
Note: if you don't want the extraneous information, use '''-s''' to enable '''silent''' mode:
 $ cdo -s -L diffn -selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.only1.5.only1999.nc4
   0 of 12 records differ
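Because a chain is read right to left, it is easy to get the order backwards when writing one by hand. The sketch below builds the command line from operators listed in the order you want them applied. The function name <tt>chain_cmd</tt> and the file names are made up for illustration, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: build a chained cdo command from operators listed in
# the order they should run.
# Usage: chain_cmd INPUT OUTPUT OP1 [OP2 ...]
chain_cmd() {
    input=$1 output=$2
    shift 2
    chain=""
    for op in "$@"; do
        # Each later operator goes to the left of the earlier ones.
        if [ -z "$chain" ]; then chain="-$op"; else chain="-$op $chain"; fi
    done
    # Drop the dash on the leftmost operator to match the form used above.
    printf 'cdo -L %s %s %s\n' "${chain#-}" "$input" "$output"
}

chain_cmd in.nc4 out.nc4 selname,OX sellevel,1.5 selyear,1999
# prints: cdo -L selyear,1999 -sellevel,1.5 -selname,OX in.nc4 out.nc4
```

The printed line has the same shape as the one-step command used in this section: <tt>selname</tt>, the first operation, ends up rightmost, next to the input file.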
== TkCVS ==
The main website for TkCVS can be found [http://www.twobarleycorns.net/tkcvs.html here].
On discover, you can use TkCVS by loading the <tt>other/tkcvs-8.2.3</tt> module:
 module load other/tkcvs-8.2.3
== ack ==
The main website for ack can be found [http://beyondgrep.com/ here].
If you'd like to try ack out, you can run:
 curl http://beyondgrep.com/ack-2.04-single-file > ~/bin/ack && chmod 0755 !#:3
and it will install ack for you in your local <tt>bin</tt> directory.
[[Category:SI Team]]
[[Category:Brown Bags]]