Useful Tools
== Climate Data Operators (CDO) ==
As stated by their creators, the [https://code.zmaw.de/projects/cdo Climate Data Operators] (CDO) are:
CDO is a collection of command line Operators to manipulate and analyse Climate and NWP model Data.
=== Where to find CDO === | |||
==== NCCS ==== | |||
Users can access CDO in a variety of ways. On discover, the easiest way is to load the cdo module: | |||
module load other/cdo | |||
It is also included in GMAO-Baselibs-4_0_0 and higher (corresponding to Ganymed-3_0 or newer). It will be located at
$BASEDIR/Linux/bin/cdo
once <tt>g5_modules</tt> is sourced.
==== NAS ==== | |||
At NAS, there are a couple of options. First, it is available through Baselibs as above. You can also use a portable version (compiled with the system gcc) at
/nobackup/gmao_SIteam/Utilities/bin/cdo | |||
As this is completely portable, you can also copy this to a local <tt>bin</tt> directory if desired. | |||
==== GSFC Desktops ==== | |||
If you are on a GMAO desktop and have access to <tt>/ford1</tt>, there is a version at | |||
/ford1/local/EL6-64/bin/cdo | |||
as well as at: | |||
/ford1/share/gmao_SIteam/Utilities/bin/cdo | |||
The <tt>/ford1/share</tt> version could be more bleeding-edge than the <tt>/ford1/local</tt> version as it will usually be whatever version is in the latest Baselibs tag. | |||
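If you move between these systems, a small helper can pick whichever copy is present. The following is only a sketch: the function name <tt>find_cdo</tt> is made up for illustration, and the candidate paths in the usage comment are simply the ones listed above.

```shell
# Print the first candidate path that is an executable regular file.
# find_cdo is a hypothetical helper name, not part of CDO or Baselibs.
find_cdo() {
    for candidate in "$@"; do
        if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example call (BASEDIR is only set once g5_modules has been sourced):
#   find_cdo "${BASEDIR:-/nonexistent}/Linux/bin/cdo" \
#            /nobackup/gmao_SIteam/Utilities/bin/cdo \
#            /ford1/share/gmao_SIteam/Utilities/bin/cdo \
#            /ford1/local/EL6-64/bin/cdo
```

The function returns a non-zero status when none of the candidates exists, so a calling script can fall back to <tt>module load other/cdo</tt> or stop with an error.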
=== Example Uses of CDO === | |||
==== Display info about a file ==== | |||
CDO offers many operators that provide information about a file. The most important two are '''infon''' and '''sinfon'''. Note the '''n''' at the end: that means "display variable names". If you don't provide that, you'll still get information, but it'll be for variables -1, -2, -3, etc. | |||
===== Short Information ===== | |||
If you just want a summary of what's in the file, use '''sinfon''': | |||
$ cdo '''sinfon''' stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 | |||
File format: netCDF4 | |||
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name | |||
1 : unknown unknown instant 48 1 16380 1 F32 : CLCN | |||
2 : unknown unknown instant 48 1 16380 1 F32 : CLLS | |||
3 : unknown unknown instant 48 1 16380 1 F32 : CNVMF0 | |||
4 : unknown unknown instant 48 1 16380 1 F32 : CNVMFC | |||
5 : unknown unknown instant 48 1 16380 1 F32 : CNVMFD | |||
6 : unknown unknown instant 48 1 16380 1 F32 : EVAPC | |||
7 : unknown unknown instant 48 1 16380 1 F32 : FCLD | |||
8 : unknown unknown instant 1 2 16380 1 F32 : PHIS | |||
9 : unknown unknown instant 48 1 16380 1 F32 : QI | |||
10 : unknown unknown instant 48 1 16380 1 F32 : QICN | |||
11 : unknown unknown instant 48 1 16380 1 F32 : QILS | |||
12 : unknown unknown instant 48 1 16380 1 F32 : QL | |||
13 : unknown unknown instant 48 1 16380 1 F32 : QLCN | |||
14 : unknown unknown instant 48 1 16380 1 F32 : QLLS | |||
15 : unknown unknown instant 48 1 16380 1 F32 : QR | |||
16 : unknown unknown instant 48 1 16380 1 F32 : REVAN | |||
17 : unknown unknown instant 48 1 16380 1 F32 : REVCN | |||
18 : unknown unknown instant 48 1 16380 1 F32 : REVLS | |||
19 : unknown unknown instant 48 1 16380 1 F32 : RH1 | |||
20 : unknown unknown instant 48 1 16380 1 F32 : RICE | |||
21 : unknown unknown instant 48 1 16380 1 F32 : RLIQ | |||
22 : unknown unknown instant 48 1 16380 1 F32 : RSUAN | |||
23 : unknown unknown instant 48 1 16380 1 F32 : RSUCN | |||
24 : unknown unknown instant 48 1 16380 1 F32 : RSULS | |||
25 : unknown unknown instant 48 1 16380 1 F32 : SUBLC | |||
26 : unknown unknown instant 48 1 16380 1 F32 : THIM | |||
Grid coordinates : | |||
1 : lonlat > size : dim = 16380 nlon = 180 nlat = 91 | |||
lon : first = -180 last = 178 inc = 2 degrees_east circular | |||
lat : first = -90 last = 90 inc = 2 degrees_north | |||
Vertical coordinates : | |||
1 : pressure hPa : 1000 975 950 925 900 875 850 825 800 775 750 725 | |||
700 650 600 550 500 450 400 350 300 250 200 150 | |||
100 70 50 40 30 20 10 7 5 4 3 2 1 0.699999988 0.5 | |||
0.400000006 0.300000012 0.200000003 0.100000001 | |||
0.0700000003 0.0500000007 0.0399999991 0.0299999993 | |||
0.0199999996 | |||
2 : surface : 0 | |||
Time coordinate : 1 step | |||
RefTime = 2000-04-15 09:00:00 Units = minutes Calendar = STANDARD | |||
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss | |||
2000-04-15 09:00:00 | |||
cdo sinfon: Processed 26 variables over 1 timestep ( 0.02s ) | |||
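If all you need from this listing is the variable names, the table can be filtered with awk. This is a sketch that assumes the layout shown above (a header row containing "Parameter name", followed by the "Grid coordinates" section); the function name <tt>sinfon_names</tt> is made up for illustration, and other CDO versions may format the table differently.

```shell
# Read `cdo sinfon` output on stdin and print one variable name per line.
# sinfon_names is a hypothetical helper name; the layout is assumed from the
# listing above.
sinfon_names() {
    awk '
        /Parameter name/   { in_table = 1; next }  # header row starts the table
        /Grid coordinates/ { in_table = 0 }        # the next section ends it
        in_table && NF     { print $NF }           # the name is the last field
    '
}

# Usage (assumes cdo is on your PATH):
#   cdo sinfon file.nc4 | sinfon_names
```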
===== Information and Simple Statistics ===== | |||
The other operator, '''infon''', provides more information and some useful statistics: | |||
$ cdo infon stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 | |||
-1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name | |||
1 : 2000-04-15 09:00:00 1000 16380 7283 : 0.0000 0.00055663 0.053927 : CLCN | |||
2 : 2000-04-15 09:00:00 975 16380 3443 : 0.0000 0.0053316 0.19191 : CLCN | |||
3 : 2000-04-15 09:00:00 950 16380 2381 : 0.0000 0.021698 0.52759 : CLCN | |||
4 : 2000-04-15 09:00:00 925 16380 2008 : 0.0000 0.046928 0.62649 : CLCN | |||
5 : 2000-04-15 09:00:00 900 16380 1730 : 0.0000 0.068160 0.76086 : CLCN | |||
6 : 2000-04-15 09:00:00 875 16380 1488 : 0.0000 0.083106 0.78922 : CLCN | |||
7 : 2000-04-15 09:00:00 850 16380 1340 : 0.0000 0.082435 0.85127 : CLCN | |||
8 : 2000-04-15 09:00:00 825 16380 1216 : 0.0000 0.064390 0.79455 : CLCN | |||
9 : 2000-04-15 09:00:00 800 16380 1135 : 0.0000 0.049025 0.67987 : CLCN | |||
10 : 2000-04-15 09:00:00 775 16380 1024 : 0.0000 0.039010 0.71728 : CLCN | |||
11 : 2000-04-15 09:00:00 750 16380 927 : 0.0000 0.033034 0.67065 : CLCN | |||
12 : 2000-04-15 09:00:00 725 16380 800 : 0.0000 0.028919 0.82533 : CLCN | |||
13 : 2000-04-15 09:00:00 700 16380 680 : 0.0000 0.025265 0.71002 : CLCN | |||
14 : 2000-04-15 09:00:00 650 16380 161 : 0.0000 0.023027 0.60691 : CLCN | |||
15 : 2000-04-15 09:00:00 600 16380 21 : 0.0000 0.029113 0.79990 : CLCN | |||
16 : 2000-04-15 09:00:00 550 16380 2 : 0.0000 0.031865 0.86731 : CLCN | |||
17 : 2000-04-15 09:00:00 500 16380 0 : 0.0000 0.021714 0.86754 : CLCN | |||
18 : 2000-04-15 09:00:00 450 16380 0 : 0.0000 0.016821 0.80432 : CLCN | |||
19 : 2000-04-15 09:00:00 400 16380 0 : 0.0000 0.017517 0.82690 : CLCN | |||
20 : 2000-04-15 09:00:00 350 16380 0 : 0.0000 0.029577 0.84485 : CLCN | |||
21 : 2000-04-15 09:00:00 300 16380 0 : 0.0000 0.037508 0.84544 : CLCN | |||
22 : 2000-04-15 09:00:00 250 16380 0 : 0.0000 0.048294 0.80001 : CLCN | |||
23 : 2000-04-15 09:00:00 200 16380 0 : 0.0000 0.053185 0.77885 : CLCN | |||
24 : 2000-04-15 09:00:00 150 16380 0 : 0.0000 0.035487 0.83891 : CLCN | |||
25 : 2000-04-15 09:00:00 100 16380 0 : 0.0000 0.00044752 0.19931 : CLCN | |||
26 : 2000-04-15 09:00:00 70 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
27 : 2000-04-15 09:00:00 50 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
28 : 2000-04-15 09:00:00 40 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
29 : 2000-04-15 09:00:00 30 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
30 : 2000-04-15 09:00:00 20 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
31 : 2000-04-15 09:00:00 10 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
32 : 2000-04-15 09:00:00 7 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
33 : 2000-04-15 09:00:00 5 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
34 : 2000-04-15 09:00:00 4 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
35 : 2000-04-15 09:00:00 3 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
36 : 2000-04-15 09:00:00 2 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
37 : 2000-04-15 09:00:00 1 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
38 : 2000-04-15 09:00:00 0.7 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
39 : 2000-04-15 09:00:00 0.5 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
40 : 2000-04-15 09:00:00 0.4 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
41 : 2000-04-15 09:00:00 0.3 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
42 : 2000-04-15 09:00:00 0.2 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
43 : 2000-04-15 09:00:00 0.1 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
44 : 2000-04-15 09:00:00 0.07 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
45 : 2000-04-15 09:00:00 0.05 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
46 : 2000-04-15 09:00:00 0.04 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
47 : 2000-04-15 09:00:00 0.03 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
48 : 2000-04-15 09:00:00 0.02 16380 0 : 0.0000 0.0000 0.0000 : CLCN | |||
49 : 2000-04-15 09:00:00 1000 16380 7283 : 0.0000 0.019079 0.97125 : CLLS | |||
50 : 2000-04-15 09:00:00 975 16380 3443 : 0.0000 0.068198 0.94812 : CLLS | |||
51 : 2000-04-15 09:00:00 950 16380 2381 : 0.0000 0.083635 0.93380 : CLLS | |||
52 : 2000-04-15 09:00:00 925 16380 2008 : 0.0000 0.096190 0.83637 : CLLS | |||
... | |||
As you can see, you get not only information but also statistics. For each <tt>Level</tt> of each variable, it provides the number of values on the grid, <tt>Gridsize</tt>; the number of missing values, <tt>Miss</tt>; and the <tt>Minimum</tt>, <tt>Mean</tt>, and <tt>Maximum</tt>.
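Because every record is a single line, the statistics are easy to filter. The sketch below lists the levels of one variable that contain missing values. It assumes the column positions of the listing above (Level is field 5 and Miss is field 7 when split on whitespace), which may differ in other CDO versions; <tt>levels_with_missing</tt> is a made-up name.

```shell
# Read `cdo infon` output on stdin; for the variable named in $1, print
# "level miss" for every level with at least one missing value.
# Field positions are assumed from the listing above.
levels_with_missing() {
    awk -v var="$1" '$NF == var && $7 ~ /^[0-9]+$/ && $7 > 0 { print $5, $7 }'
}

# Usage (assumes cdo is on your PATH):
#   cdo infon file.nc4 | levels_with_missing CLCN
```

For the file above, the first line printed for CLCN would be <tt>1000 7283</tt>.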
==== Diff two files ==== | |||
One of the main attractions of CDO is diffing two files. With binary files, such as the old restarts, one could use <tt>cmp</tt> or <tt>diff</tt> to check for differences. However, <tt>cmp</tt> will not work on NetCDF files because, while the data might be the same, the metadata almost certainly won't be:
$ cmp stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \ | |||
mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 | |||
stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \ | |||
mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 differ: byte 55, line 4 | |||
There already is a tool for doing this with HDF5/NetCDF4 files: <tt>h5diff</tt>. However, <tt>h5diff</tt> is slow (at least the version we have): | |||
$ time h5diff stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \ | |||
mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 | |||
attribute: <Title of </>> and <Title of </>> | |||
37 differences found | |||
0.116u 0.412s '''0:44.38''' 1.1% 0+0k 0+0io 0pf+0w | |||
CDO, however, seems to be much more efficient at doing diff comparisons: | |||
$ time cdo diffn stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \ | |||
mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 | |||
0 of 1201 records differ | |||
cdo diffn: Processed 39344760 values from 52 variables over 2 timesteps ( 0.49s ) | |||
0.424u 0.068s '''0:04.28''' 11.2% 0+0k 0+0io 0pf+0w | |||
Of course, 44 seconds versus 4 seconds isn't much (and with smaller files, GPFS can sometimes 'cache' the files, making it seem fast), but with c360 the times get much larger. Let's diff two c360 NetCDF4 fvcore_internal_rst files:
$ time h5diff fvcore_internal_rst.control.nc4 fvcore_internal_rst.test.nc4 | |||
3.188u 6.824s '''14:57.19''' 1.1% 0+0k 400+0io 0pf+0w | |||
$ time cdo diffn fvcore_internal_rst.control.nc4 fvcore_internal_rst.test.nc4 | |||
0 of 505 records differ | |||
cdo diffn: Processed 785376000 values from 14 variables over 2 timesteps ( 18.76s ) | |||
17.125u 1.648s '''2:33.97''' 12.1% 0+0k 0+0io 0pf+0w | |||
This tells us two things: the <tt>h5diff</tt> we have to use is slow, and one should develop at lower resolution if diffing restarts is part of the process.
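To put numbers on the comparison, the elapsed (wall-clock) times highlighted in the listings above can be converted to seconds and divided. This is just a worked calculation; the function names are made up for illustration.

```shell
# Convert an elapsed time of the form [h:]m:ss.ss to seconds.
elapsed_to_seconds() {
    awk -v t="$1" 'BEGIN {
        n = split(t, part, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + part[i]
        printf "%.2f\n", secs
    }'
}

# Ratio of two elapsed times, rounded to one decimal place.
speedup() {
    awk -v a="$(elapsed_to_seconds "$1")" -v b="$(elapsed_to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

speedup 0:44.38 0:04.28     # c48 history file: prints 10.4
speedup 14:57.19 2:33.97    # c360 restart:     prints 5.8
```

So in these two runs <tt>cdo diffn</tt> was roughly 10 times faster than <tt>h5diff</tt> at c48 and roughly 6 times faster at c360.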
==== Behavior when two files differ ==== | |||
All the above examples show two files that don't differ. If they do differ, CDO provides additional useful information like: | |||
For each pair of fields the operator prints one line with the following information: | |||
- Date and Time | |||
- Level, Gridsize and number of Missing values | |||
- Occurrence of coefficient pairs with different signs (S) | |||
- Occurrence of zero values (Z) | |||
- Maxima of absolute difference of coefficient pairs | |||
- Maxima of relative difference of non-zero coefficient pairs with equal signs | |||
- Parameter name | |||
as seen here: | |||
$ cdo diffn stock-G40U-CPU-2013Jun11-6hour-144.geosgcm_prog.20000415_0000z.nc4 stock-G40U-GPU-2013Jun11-6hour-144.geosgcm_prog.20000415_0000z.nc4 | |||
Date Time Level Gridsize Miss : S Z Max_Absdiff Max_Reldiff : Parameter name | |||
1 : 2000-04-15 00:00:00 1000 13104 5606 : F T 0.64627 0.041020 : H | |||
2 : 2000-04-15 00:00:00 975 13104 3202 : F F 0.64383 0.19488 : H | |||
3 : 2000-04-15 00:00:00 950 13104 2394 : F F 0.64948 0.0014438 : H | |||
4 : 2000-04-15 00:00:00 925 13104 2047 : F F 0.65057 0.00083438 : H | |||
5 : 2000-04-15 00:00:00 900 13104 1793 : F F 0.67102 0.00067943 : H | |||
6 : 2000-04-15 00:00:00 875 13104 1555 : F F 0.66370 0.00058364 : H | |||
7 : 2000-04-15 00:00:00 850 13104 1401 : F F 0.65869 0.00043395 : H | |||
8 : 2000-04-15 00:00:00 825 13104 1305 : F F 0.64941 0.00036629 : H | |||
9 : 2000-04-15 00:00:00 800 13104 1188 : F F 0.63110 0.00033837 : H | |||
...snip... | |||
477 : 2000-04-15 00:00:00 0.2 13104 0 : F F 0.019527 0.32925 : V | |||
478 : 2000-04-15 00:00:00 0.1 13104 0 : F F 0.059674 0.77863 : V | |||
479 : 2000-04-15 00:00:00 0.07 13104 0 : F F 0.057358 0.46071 : V | |||
480 : 2000-04-15 00:00:00 0.05 13104 0 : T F 0.021697 0.74741 : V | |||
481 : 2000-04-15 00:00:00 0.04 13104 0 : T F 0.021576 0.40354 : V | |||
482 : 2000-04-15 00:00:00 0.03 13104 0 : F F 0.020691 0.34921 : V | |||
483 : 2000-04-15 00:00:00 0.02 13104 0 : T F 0.020302 0.62736 : V | |||
438 of 483 records differ | |||
389 of 483 records differ more than 0.001 | |||
cdo diffn: Processed 12658464 values from 26 variables over 2 timesteps ( 0.17s ) | |||
$ echo $? | |||
0 | |||
'''NOTE''': CDO returns a status of 0 any time it doesn't encounter an error. Thus, if two files are different, CDO will report the differences, but the return status will still be 0 because cdo technically completed successfully, as seen above. Any tests that depend on the return status of <tt>cmp</tt> for binary files therefore need to be altered for NetCDF4. One possible formulation, where <tt>file1.nc4</tt> and <tt>file2.nc4</tt> stand in for the two files being compared and <tt>success</tt> and <tt>failure</tt> for whatever the test should do, is:
# Keep only the "N of M records differ" line, not "... differ more than 0.001"
set NUMDIFF = `cdo -s diffn file1.nc4 file2.nc4 | grep 'records differ *$' | awk '{print $1}'`
if ( "$NUMDIFF" == "0" ) then
success
else
failure
endif
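For scripts written in sh or bash rather than csh, the same check might look like the sketch below. It assumes that cdo is on your PATH and that <tt>cdo -s diffn</tt> prints the "N of M records differ" summary line as in the listings on this page; the function names are made up for illustration. If no summary line is found (for example because cdo failed), the check deliberately reports a mismatch rather than a match.

```shell
# Read `cdo diffn` output on stdin and print the number of differing records.
# Only the "N of M records differ" line is used; the "... differ more than
# 0.001" line is ignored. Fails if no summary line is present.
count_differing_records() {
    awk '/records differ *$/ { print $1; found = 1; exit }
         END { exit !found }'
}

# Succeed only if cdo reports zero differing records for the two files.
files_match() {
    n=$(cdo -s diffn "$1" "$2" | count_differing_records) || return 1
    [ "$n" -eq 0 ]
}

# Usage:
#   if files_match control.nc4 test.nc4; then echo same; else echo different; fi
```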
==== Extract field(s) from a file ====
===== Extract variable(s) from a file ===== | |||
Often, our NetCDF4 files have many variables and we only care about one. CDO allows one to extract or ''select'' one or more variables. For example, if you only want CLCN, use '''selname''': | |||
$ cdo '''selname,CLCN''' mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 onlyclcn.nc4 | |||
cdo selname: Processed 786240 values from 26 variables over 1 timestep ( 0.03s ) | |||
$ cdo sinfon onlyclcn.nc4 | |||
File format: netCDF4 | |||
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name | |||
1 : unknown unknown instant 48 1 16380 1 F32 : CLCN | |||
Grid coordinates : | |||
1 : lonlat > size : dim = 16380 nx = 180 ny = 91 | |||
lon : first = -180 last = 178 inc = 2 degrees_east circular | |||
lat : first = -90 last = 90 inc = 2 degrees_north | |||
Vertical coordinates : | |||
1 : pressure hPa : 1000 975 950 925 900 875 850 825 800 775 750 725 | |||
700 650 600 550 500 450 400 350 300 250 200 150 | |||
100 70 50 40 30 20 10 7 5 4 3 2 1 0.699999988 0.5 | |||
0.400000006 0.300000012 0.200000003 0.100000001 | |||
0.0700000003 0.0500000007 0.0399999991 0.0299999993 | |||
0.0199999996 | |||
Time coordinate : 1 step | |||
RefTime = 2000-04-15 09:00:00 Units = minutes Calendar = standard | |||
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss | |||
2000-04-15 09:00:00 | |||
cdo sinfon: Processed 1 variable over 1 timestep ( 0.00s ) | |||
===== Extract time(s) from a file ===== | |||
To extract a single year (or multiple years) from a multi-step file, you can use <tt>selyear</tt>,''year''. For multiple years, you can use either <tt>cdo '''selyear,1999,2000,2001'''</tt> or <tt>cdo '''selyear,1999/2001'''</tt>.
...
cdo sinfon: Processed 7 variables over 36 timesteps ( 0.00s )
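The two ways of writing a span of years are equivalent. The sketch below spells out what the slash form stands for by expanding it into the comma form. CDO accepts the range directly, so this helper (a made-up name) is purely illustrative.

```shell
# Expand "1999/2001" into "1999,2000,2001"; a single year passes through.
# expand_years is a hypothetical helper name used only for illustration.
expand_years() {
    first=${1%%/*}
    last=${1##*/}
    list=""
    year=$first
    while [ "$year" -le "$last" ]; do
        list="${list:+$list,}$year"
        year=$((year + 1))
    done
    printf '%s\n' "$list"
}

expand_years 1999/2001    # prints 1999,2000,2001
```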
===== Other select operators =====
CDO has many of these operators: | |||
selparam Select parameters by identifier | |||
delparam Delete parameters by identifier | |||
selcode Select parameters by code number | |||
delcode Delete parameters by code number | |||
selname Select parameters by name | |||
delname Delete parameters by name | |||
selstdname Select parameters by standard name | |||
sellevel Select levels | |||
sellevidx Select levels by index | |||
selgrid Select grids | |||
selzaxis Select z-axes | |||
selltype Select GRIB level types | |||
seltabnum Select parameter table numbers | |||
seltimestep Select timesteps | |||
seltime Select times | |||
selhour Select hours | |||
selday Select days | |||
selmon Select months | |||
selyear Select years | |||
selseas Select seasons | |||
seldate Select dates | |||
selsmon Select single month | |||
sellonlatbox Select a longitude/latitude box | |||
selindexbox Select an index box | |||
==== Combining Operators ==== | |||
Often, you want to do multiple operations on a file. You could, say, do a '''selname''' and output only one variable to a file, then a '''sellevel''' on that new file to select a single level, and then a '''selyear''' on ''that'', etc.: | |||
$ cdo '''selname,OX''' pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.nc4 | |||
cdo selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 9.60s ) | |||
$ cdo '''sellevel,1.5''' onlyOX.nc4 onlyOX.only1.5.nc4 | |||
cdo sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 0.76s ) | |||
$ cdo '''selyear,1999''' onlyOX.only1.5.nc4 onlyOX.only1.5.only1999.nc4 | |||
cdo selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 0.05s ) | |||
$ cdo sinfon onlyOX.only1.5.only1999.nc4 | |||
File format: netCDF4 | |||
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name | |||
1 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 1 1 91 1 F32 : OX | |||
Grid coordinates : | |||
1 : lonlat > size : dim = 91 nx = 0 ny = 91 | |||
lat : first = -1.57079637 last = 1.57079625 inc = 0.0349065065 radians | |||
Vertical coordinates : | |||
1 : generic layer : 1.5 | |||
Time coordinate : 12 steps | |||
RefTime = 1870-01-15 12:00:00 Units = hours Calendar = standard | |||
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss | |||
1999-01-14 22:00:00 1999-02-14 08:00:00 1999-03-16 18:00:00 1999-04-16 04:00:00 | |||
1999-05-16 14:00:00 1999-06-16 00:00:00 1999-07-16 10:00:00 1999-08-15 20:00:00 | |||
1999-09-15 06:00:00 1999-10-15 16:00:00 1999-11-15 02:00:00 1999-12-15 12:00:00 | |||
cdo sinfon: Processed 1 variable over 12 timesteps ( 0.00s ) | |||
Of course, this is not only annoying but also wasteful, as you are creating many temporary files. Instead, CDO allows one to "combine" or "chain" operators. This is done by prefixing each chained operator with a dash; the operators are applied from right to left, so operator1 runs first:
cdo -L operatorN -operatorN-1 ... -operator2 -operator1 input (output)
The <tt>-L</tt> option is used because HDF5 isn't thread-safe as currently compiled; it "locks" I/O so that access is serialized, which avoids the problem. We are working on getting CDO to work better in parallel.
So, doing the above operations in one step:
$ cdo -L selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX-1.5-1999.nc4 | |||
cdo selyear: Started child process "sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe1.1)". | |||
cdo(2) sellevel: Started child process "selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe2.1)". | |||
cdo(3) selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 3.18s ) | |||
cdo(2) sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 3.18s ) | |||
cdo selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 3.18s ) | |||
$ cdo sinfon onlyOX-1.5-1999.nc4 | |||
File format: netCDF4 | |||
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name | |||
1 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 1 1 91 1 F32 : OX | |||
Grid coordinates : | |||
1 : lonlat > size : dim = 91 nx = 0 ny = 91 | |||
lat : first = -1.57079637 last = 1.57079625 inc = 0.0349065065 radians | |||
Vertical coordinates : | |||
1 : generic layer : 1.5 | |||
Time coordinate : 12 steps | |||
RefTime = 1870-01-15 12:00:00 Units = hours Calendar = standard | |||
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss | |||
1999-01-14 22:00:00 1999-02-14 08:00:00 1999-03-16 18:00:00 1999-04-16 04:00:00 | |||
1999-05-16 14:00:00 1999-06-16 00:00:00 1999-07-16 10:00:00 1999-08-15 20:00:00 | |||
1999-09-15 06:00:00 1999-10-15 16:00:00 1999-11-15 02:00:00 1999-12-15 12:00:00 | |||
cdo sinfon: Processed 1 variable over 12 timesteps ( 0.00s ) | |||
This file is the same as the one done in three steps: | |||
$ cdo diffn onlyOX.only1.5.only1999.nc4 onlyOX-1.5-1999.nc4 | |||
0 of 12 records differ | |||
cdo diffn: Processed 2184 values from 2 variables over 24 timesteps ( 0.00s ) | |||
Of course, you could even do the '''diffn''' as well in the command: | |||
$ cdo -L diffn -selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.only1.5.only1999.nc4 | |||
cdo diffn: Started child process "selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe1.1)". | |||
cdo(2) selyear: Started child process "sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe2.1)". | |||
cdo(3) sellevel: Started child process "selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe3.1)". | |||
0 of 12 records differ | |||
cdo(4) selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 3.13s ) | |||
cdo(3) sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 3.13s ) | |||
cdo(2) selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 3.13s ) | |||
cdo diffn: Processed 2184 values from 2 variables over 24 timesteps ( 3.13s ) | |||
Note: if you don't want the extraneous information, use '''-s''' to enable '''silent''' mode: | |||
cdo -s -L diffn -selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.only1.5.only1999.nc4 | |||
0 of 12 records differ | |||
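Since chained operators are written in the reverse of the order in which they run, it can be convenient in scripts to list them in application order and let a helper flip them. This is a sketch only; <tt>build_chain</tt> is a made-up name.

```shell
# Build the operator part of a chained cdo call from operators given in the
# order they should be applied (first applied comes first).
# build_chain is a hypothetical helper name used only for illustration.
build_chain() {
    chain=""
    for op in "$@"; do
        # Each newly applied operator goes to the left of what is already there.
        chain="-$op${chain:+ $chain}"
    done
    # Drop the dash on the outermost operator to match the form used above.
    printf '%s\n' "${chain#-}"
}

build_chain selname,OX sellevel,1.5 selyear,1999
# prints: selyear,1999 -sellevel,1.5 -selname,OX
```

The result can then be spliced into a command line, for example <tt>cdo -L $(build_chain selname,OX sellevel,1.5 selyear,1999) input.nc4 output.nc4</tt>, relying on the shell's word splitting of the unquoted substitution.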
== TkCVS == | |||
The main website for TkCVS can be found [http://www.twobarleycorns.net/tkcvs.html here]. | |||
On discover, you can use TkCVS by loading the <tt>other/tkcvs-8.2.3</tt> module: | |||
module load other/tkcvs-8.2.3 | |||
== ack ==
The main website for ack can be found [http://beyondgrep.com/ here]. | |||
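If you'd like to try ack out, you can run:
curl http://beyondgrep.com/ack-2.04-single-file > ~/bin/ack && chmod 0755 !#:3
and it will install ack for you in your local <tt>bin</tt> directory. Here <tt>!#:3</tt> is shell history expansion for word 3 of the current command line (counting from 0), i.e. <tt>~/bin/ack</tt>, so the command is meant to be typed at an interactive prompt rather than placed in a script.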
[[Category:SI Team]]
[[Category:Brown Bags]]
Latest revision as of 07:58, 2 February 2017
Climate Data Operators (CDO)
As stated by their creators, the Climate Data Operators (CDO) are:
CDO is a collection of command line Operators to manipulate and analyse Climate and NWP model Data.
Where to find CDO
NCCS
Users can access CDO in a variety of ways. On discover, the easiest way is to load the cdo module:
module load other/cdo
It is also included in GMAO-Baselibs-4_0_0 and higher (corresponds to Ganymed-3_0 or newer). It will be located at
$BASEDIR/Linux/bin/cdo
once g5_modules is sourced.
NAS
At NAS, there are a couple options. First, it is available through Baselibs as above. You can also access a portable version (compiled with system gcc) at
/nobackup/gmao_SIteam/Utilities/bin/cdo
As this is completely portable, you can also copy this to a local bin directory if desired.
GSFC Desktops
If you are on a GMAO desktop and have access to /ford1, there is a version at
/ford1/local/EL6-64/bin/cdo
as well as at:
/ford1/share/gmao_SIteam/Utilities/bin/cdo
The /ford1/share version could be more bleeding-edge than the /ford1/local version as it will usually be whatever version is in the latest Baselibs tag.
Example Uses of CDO
Display info about a file
CDO offers many operators that provide information about a file. The most important two are infon and sinfon. Note the n at the end: that means "display variable names". If you don't provide that, you'll still get information, but it'll be for variables -1, -2, -3, etc.
Short Information
If you just want a summary of what's in the file, use sinfon:
$ cdo sinfon stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 File format: netCDF4 -1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name 1 : unknown unknown instant 48 1 16380 1 F32 : CLCN 2 : unknown unknown instant 48 1 16380 1 F32 : CLLS 3 : unknown unknown instant 48 1 16380 1 F32 : CNVMF0 4 : unknown unknown instant 48 1 16380 1 F32 : CNVMFC 5 : unknown unknown instant 48 1 16380 1 F32 : CNVMFD 6 : unknown unknown instant 48 1 16380 1 F32 : EVAPC 7 : unknown unknown instant 48 1 16380 1 F32 : FCLD 8 : unknown unknown instant 1 2 16380 1 F32 : PHIS 9 : unknown unknown instant 48 1 16380 1 F32 : QI 10 : unknown unknown instant 48 1 16380 1 F32 : QICN 11 : unknown unknown instant 48 1 16380 1 F32 : QILS 12 : unknown unknown instant 48 1 16380 1 F32 : QL 13 : unknown unknown instant 48 1 16380 1 F32 : QLCN 14 : unknown unknown instant 48 1 16380 1 F32 : QLLS 15 : unknown unknown instant 48 1 16380 1 F32 : QR 16 : unknown unknown instant 48 1 16380 1 F32 : REVAN 17 : unknown unknown instant 48 1 16380 1 F32 : REVCN 18 : unknown unknown instant 48 1 16380 1 F32 : REVLS 19 : unknown unknown instant 48 1 16380 1 F32 : RH1 20 : unknown unknown instant 48 1 16380 1 F32 : RICE 21 : unknown unknown instant 48 1 16380 1 F32 : RLIQ 22 : unknown unknown instant 48 1 16380 1 F32 : RSUAN 23 : unknown unknown instant 48 1 16380 1 F32 : RSUCN 24 : unknown unknown instant 48 1 16380 1 F32 : RSULS 25 : unknown unknown instant 48 1 16380 1 F32 : SUBLC 26 : unknown unknown instant 48 1 16380 1 F32 : THIM Grid coordinates : 1 : lonlat > size : dim = 16380 nlon = 180 nlat = 91 lon : first = -180 last = 178 inc = 2 degrees_east circular lat : first = -90 last = 90 inc = 2 degrees_north Vertical coordinates : 1 : pressure hPa : 1000 975 950 925 900 875 850 825 800 775 750 725 700 650 600 550 500 450 400 350 300 250 200 150 100 70 50 40 30 20 10 7 5 4 3 2 1 0.699999988 0.5 0.400000006 0.300000012 0.200000003 0.100000001 0.0700000003 0.0500000007 
0.0399999991 0.0299999993 0.0199999996 2 : surface : 0 Time coordinate : 1 step RefTime = 2000-04-15 09:00:00 Units = minutes Calendar = STANDARD YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss 2000-04-15 09:00:00 cdo sinfon: Processed 26 variables over 1 timestep ( 0.02s )
Information and Simple Statistics
The other operator, infon, provides more information and some useful statistics:
$ cdo infon stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 -1 : Date Time Level Gridsize Miss : Minimum Mean Maximum : Parameter name 1 : 2000-04-15 09:00:00 1000 16380 7283 : 0.0000 0.00055663 0.053927 : CLCN 2 : 2000-04-15 09:00:00 975 16380 3443 : 0.0000 0.0053316 0.19191 : CLCN 3 : 2000-04-15 09:00:00 950 16380 2381 : 0.0000 0.021698 0.52759 : CLCN 4 : 2000-04-15 09:00:00 925 16380 2008 : 0.0000 0.046928 0.62649 : CLCN 5 : 2000-04-15 09:00:00 900 16380 1730 : 0.0000 0.068160 0.76086 : CLCN 6 : 2000-04-15 09:00:00 875 16380 1488 : 0.0000 0.083106 0.78922 : CLCN 7 : 2000-04-15 09:00:00 850 16380 1340 : 0.0000 0.082435 0.85127 : CLCN 8 : 2000-04-15 09:00:00 825 16380 1216 : 0.0000 0.064390 0.79455 : CLCN 9 : 2000-04-15 09:00:00 800 16380 1135 : 0.0000 0.049025 0.67987 : CLCN 10 : 2000-04-15 09:00:00 775 16380 1024 : 0.0000 0.039010 0.71728 : CLCN 11 : 2000-04-15 09:00:00 750 16380 927 : 0.0000 0.033034 0.67065 : CLCN 12 : 2000-04-15 09:00:00 725 16380 800 : 0.0000 0.028919 0.82533 : CLCN 13 : 2000-04-15 09:00:00 700 16380 680 : 0.0000 0.025265 0.71002 : CLCN 14 : 2000-04-15 09:00:00 650 16380 161 : 0.0000 0.023027 0.60691 : CLCN 15 : 2000-04-15 09:00:00 600 16380 21 : 0.0000 0.029113 0.79990 : CLCN 16 : 2000-04-15 09:00:00 550 16380 2 : 0.0000 0.031865 0.86731 : CLCN 17 : 2000-04-15 09:00:00 500 16380 0 : 0.0000 0.021714 0.86754 : CLCN 18 : 2000-04-15 09:00:00 450 16380 0 : 0.0000 0.016821 0.80432 : CLCN 19 : 2000-04-15 09:00:00 400 16380 0 : 0.0000 0.017517 0.82690 : CLCN 20 : 2000-04-15 09:00:00 350 16380 0 : 0.0000 0.029577 0.84485 : CLCN 21 : 2000-04-15 09:00:00 300 16380 0 : 0.0000 0.037508 0.84544 : CLCN 22 : 2000-04-15 09:00:00 250 16380 0 : 0.0000 0.048294 0.80001 : CLCN 23 : 2000-04-15 09:00:00 200 16380 0 : 0.0000 0.053185 0.77885 : CLCN 24 : 2000-04-15 09:00:00 150 16380 0 : 0.0000 0.035487 0.83891 : CLCN 25 : 2000-04-15 09:00:00 100 16380 0 : 0.0000 0.00044752 0.19931 : CLCN 26 : 2000-04-15 09:00:00 70 16380 0 : 0.0000 0.0000 
0.0000 : CLCN 27 : 2000-04-15 09:00:00 50 16380 0 : 0.0000 0.0000 0.0000 : CLCN 28 : 2000-04-15 09:00:00 40 16380 0 : 0.0000 0.0000 0.0000 : CLCN 29 : 2000-04-15 09:00:00 30 16380 0 : 0.0000 0.0000 0.0000 : CLCN 30 : 2000-04-15 09:00:00 20 16380 0 : 0.0000 0.0000 0.0000 : CLCN 31 : 2000-04-15 09:00:00 10 16380 0 : 0.0000 0.0000 0.0000 : CLCN 32 : 2000-04-15 09:00:00 7 16380 0 : 0.0000 0.0000 0.0000 : CLCN 33 : 2000-04-15 09:00:00 5 16380 0 : 0.0000 0.0000 0.0000 : CLCN 34 : 2000-04-15 09:00:00 4 16380 0 : 0.0000 0.0000 0.0000 : CLCN 35 : 2000-04-15 09:00:00 3 16380 0 : 0.0000 0.0000 0.0000 : CLCN 36 : 2000-04-15 09:00:00 2 16380 0 : 0.0000 0.0000 0.0000 : CLCN 37 : 2000-04-15 09:00:00 1 16380 0 : 0.0000 0.0000 0.0000 : CLCN 38 : 2000-04-15 09:00:00 0.7 16380 0 : 0.0000 0.0000 0.0000 : CLCN 39 : 2000-04-15 09:00:00 0.5 16380 0 : 0.0000 0.0000 0.0000 : CLCN 40 : 2000-04-15 09:00:00 0.4 16380 0 : 0.0000 0.0000 0.0000 : CLCN 41 : 2000-04-15 09:00:00 0.3 16380 0 : 0.0000 0.0000 0.0000 : CLCN 42 : 2000-04-15 09:00:00 0.2 16380 0 : 0.0000 0.0000 0.0000 : CLCN 43 : 2000-04-15 09:00:00 0.1 16380 0 : 0.0000 0.0000 0.0000 : CLCN 44 : 2000-04-15 09:00:00 0.07 16380 0 : 0.0000 0.0000 0.0000 : CLCN 45 : 2000-04-15 09:00:00 0.05 16380 0 : 0.0000 0.0000 0.0000 : CLCN 46 : 2000-04-15 09:00:00 0.04 16380 0 : 0.0000 0.0000 0.0000 : CLCN 47 : 2000-04-15 09:00:00 0.03 16380 0 : 0.0000 0.0000 0.0000 : CLCN 48 : 2000-04-15 09:00:00 0.02 16380 0 : 0.0000 0.0000 0.0000 : CLCN 49 : 2000-04-15 09:00:00 1000 16380 7283 : 0.0000 0.019079 0.97125 : CLLS 50 : 2000-04-15 09:00:00 975 16380 3443 : 0.0000 0.068198 0.94812 : CLLS 51 : 2000-04-15 09:00:00 950 16380 2381 : 0.0000 0.083635 0.93380 : CLLS 52 : 2000-04-15 09:00:00 925 16380 2008 : 0.0000 0.096190 0.83637 : CLLS ...
As you can see, you not only get information, but statistics. It provides the number of values on the grid, Gridsize; the number of Missing Values, Miss; and the Minimum, Mean, and Maximum for each Level for each value.
Diff two files
One of the main reasons CDO is attractive is for diffing two files. With binary files, such as the old restarts, one could use cmp or diff to check for differences. However, cmp will not work on NetCDF files because while the data might be the same, the metadata surely won't:
$ cmp stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \ mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \ mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 differ: byte 55, line 4
There already is a tool for doing this with HDF5/NetCDF4 files: h5diff. However, h5diff is slow (at least the version we have):
$ time h5diff stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
      mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
attribute: <Title of </>> and <Title of </>>
37 differences found
0.116u 0.412s 0:44.38 1.1% 0+0k 0+0io 0pf+0w
CDO, however, seems to be much more efficient at doing diff comparisons:
$ time cdo diffn stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
      mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
0 of 1201 records differ
cdo diffn: Processed 39344760 values from 52 variables over 2 timesteps ( 0.49s )
0.424u 0.068s 0:04.28 11.2% 0+0k 0+0io 0pf+0w
Of course, 44 seconds versus 4 seconds isn't a big difference in absolute terms (and with smaller files, GPFS can sometimes 'cache' the files, making the comparison seem fast), but at c360 the gap grows. Let's diff two c360 NetCDF4 fvcore_internal_rst files:
$ time h5diff fvcore_internal_rst.control.nc4 fvcore_internal_rst.test.nc4
3.188u 6.824s 14:57.19 1.1% 0+0k 400+0io 0pf+0w
$ time cdo diffn fvcore_internal_rst.control.nc4 fvcore_internal_rst.test.nc4
0 of 505 records differ
cdo diffn: Processed 785376000 values from 14 variables over 2 timesteps ( 18.76s )
17.125u 1.648s 2:33.97 12.1% 0+0k 0+0io 0pf+0w
This tells us two things: the h5diff we have to use is slow (about 15 minutes here, versus about 2.5 minutes for CDO), and, if diffing restarts is part of your process, you should develop at lower resolution.
Behavior when two files differ
All the above examples show two files that don't differ. If they do differ, CDO provides additional useful information like:
For each pair of fields the operator prints one line with the following information:
- Date and Time
- Level, Gridsize and number of Missing values
- Occurrence of coefficient pairs with different signs (S)
- Occurrence of zero values (Z)
- Maxima of absolute difference of coefficient pairs
- Maxima of relative difference of non-zero coefficient pairs with equal signs
- Parameter name
as seen here:
$ cdo diffn stock-G40U-CPU-2013Jun11-6hour-144.geosgcm_prog.20000415_0000z.nc4 stock-G40U-GPU-2013Jun11-6hour-144.geosgcm_prog.20000415_0000z.nc4
Date Time Level Gridsize Miss : S Z Max_Absdiff Max_Reldiff : Parameter name
1 : 2000-04-15 00:00:00 1000 13104 5606 : F T 0.64627 0.041020 : H
2 : 2000-04-15 00:00:00 975 13104 3202 : F F 0.64383 0.19488 : H
3 : 2000-04-15 00:00:00 950 13104 2394 : F F 0.64948 0.0014438 : H
4 : 2000-04-15 00:00:00 925 13104 2047 : F F 0.65057 0.00083438 : H
5 : 2000-04-15 00:00:00 900 13104 1793 : F F 0.67102 0.00067943 : H
6 : 2000-04-15 00:00:00 875 13104 1555 : F F 0.66370 0.00058364 : H
7 : 2000-04-15 00:00:00 850 13104 1401 : F F 0.65869 0.00043395 : H
8 : 2000-04-15 00:00:00 825 13104 1305 : F F 0.64941 0.00036629 : H
9 : 2000-04-15 00:00:00 800 13104 1188 : F F 0.63110 0.00033837 : H
...snip...
477 : 2000-04-15 00:00:00 0.2 13104 0 : F F 0.019527 0.32925 : V
478 : 2000-04-15 00:00:00 0.1 13104 0 : F F 0.059674 0.77863 : V
479 : 2000-04-15 00:00:00 0.07 13104 0 : F F 0.057358 0.46071 : V
480 : 2000-04-15 00:00:00 0.05 13104 0 : T F 0.021697 0.74741 : V
481 : 2000-04-15 00:00:00 0.04 13104 0 : T F 0.021576 0.40354 : V
482 : 2000-04-15 00:00:00 0.03 13104 0 : F F 0.020691 0.34921 : V
483 : 2000-04-15 00:00:00 0.02 13104 0 : T F 0.020302 0.62736 : V
438 of 483 records differ
389 of 483 records differ more than 0.001
cdo diffn: Processed 12658464 values from 26 variables over 2 timesteps ( 0.17s )
$ echo $?
0
NOTE: CDO returns a status of 0 any time it doesn't encounter an error. Thus, if two files are different, CDO will report the differences, but the return status will still be 0 because cdo technically completed successfully, as seen above. Any tests that depend on the return status of cmp for binary files therefore need to be altered for NetCDF4. One possible formulation is:
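# file1.nc4 and file2.nc4 are placeholders for the two files to compare.
# head -1 keeps only the first summary line ("N of M records differ").
set NUMDIFF = `cdo -s diffn file1.nc4 file2.nc4 | grep differ | head -1 | awk '{print $1}'`
if ( "$NUMDIFF" == 0 ) then
   echo "success"
else
   echo "failure"
endif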
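For scripts written in a Bourne-style shell rather than csh, the same check could be sketched as below. The function names count_differing_records and files_match are hypothetical helpers, not part of CDO, and the sketch assumes that cdo is on the PATH and writes its "records differ" summary to standard output, as the pipe in the formulation above already assumes.

```shell
#!/bin/sh
# Hypothetical helper: read `cdo diffn` output on stdin and print the
# number of differing records taken from the first summary line
# ("N of M records differ"). Exits non-zero if no summary line is seen.
count_differing_records() {
    awk '/records differ/ && !found { print $1; found = 1 }
         END { exit (found ? 0 : 1) }'
}

# Hypothetical helper: compare two files with cdo diffn.
# Returns 0 if no records differ, 1 if some differ,
# and 2 if no summary line could be found (for example, cdo failed).
files_match() {
    n=$(cdo -s diffn "$1" "$2" | count_differing_records) || return 2
    [ "$n" -eq 0 ]
}

# Usage (file names are placeholders):
#   if files_match control.nc4 test.nc4; then echo "success"; else echo "failure"; fi
```

Keeping the "no summary line" case separate from "records differ" means a failed cdo run is not mistaken for matching files.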
Extract field(s) from a file
Extract variable(s) from a file
Often, our NetCDF4 files have many variables and we only care about one. CDO allows one to extract or select one or more variables. For example, if you only want CLCN, use selname:
$ cdo selname,CLCN mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 onlyclcn.nc4
cdo selname: Processed 786240 values from 26 variables over 1 timestep ( 0.03s )
$ cdo sinfon onlyclcn.nc4
File format: netCDF4
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name
1 : unknown unknown instant 48 1 16380 1 F32 : CLCN
Grid coordinates :
1 : lonlat > size : dim = 16380 nx = 180 ny = 91
lon : first = -180 last = 178 inc = 2 degrees_east circular
lat : first = -90 last = 90 inc = 2 degrees_north
Vertical coordinates :
1 : pressure hPa : 1000 975 950 925 900 875 850 825 800 775 750 725 700 650 600 550 500 450 400 350 300 250 200 150 100 70 50 40 30 20 10 7 5 4 3 2 1 0.699999988 0.5 0.400000006 0.300000012 0.200000003 0.100000001 0.0700000003 0.0500000007 0.0399999991 0.0299999993 0.0199999996
Time coordinate : 1 step
RefTime = 2000-04-15 09:00:00 Units = minutes Calendar = standard
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
2000-04-15 09:00:00
cdo sinfon: Processed 1 variable over 1 timestep ( 0.00s )
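Since selname accepts a comma-separated list of names, several variables can be extracted in one call. The following is a minimal sketch of a wrapper that assembles that list; extract_vars is a hypothetical name, the file names in the usage comment are placeholders, and cdo is assumed to be on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: extract one or more named variables into a new file.
# Usage: extract_vars INFILE OUTFILE VAR [VAR...]
extract_vars() {
    if [ $# -lt 3 ]; then
        echo "usage: extract_vars INFILE OUTFILE VAR [VAR...]" >&2
        return 1
    fi
    infile=$1
    outfile=$2
    shift 2
    # Join the remaining arguments with commas: CLCN CLLS -> CLCN,CLLS
    names=$(IFS=,; echo "$*")
    cdo selname,"$names" "$infile" "$outfile"
}

# Usage (placeholder file names; CLCN and CLLS are variables from the
# infon output shown earlier):
#   extract_vars moist.nc4 clouds.nc4 CLCN CLLS
```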
Extract time(s) from a file
To extract a single year (or multiple years) from a multi-step file, you can use selyear,year. For multiple years, you can use either a list, cdo selyear,1999,2000,2001, or a range, cdo selyear,1999/2001.
$ cdo sinfon pchem.species.CMIP-5.1870-2097.z_91x72.nc4
File format: netCDF4
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name
1 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : OX
2 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : N2O
3 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : CFC11
4 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : CFC12
5 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : CH4
6 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : HCFC22
7 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : H2O
Grid coordinates :
1 : lonlat > size : dim = 91 nx = 0 ny = 91
lat : first = -1.57079637 last = 1.57079625 inc = 0.0349065065 radians
Vertical coordinates :
1 : generic layer : 1.5 2.63499999 4.01500034 5.67999983 7.76499939 10.4499998 13.960001 18.5400009 24.4899979 32.1749992 42.0400009 54.6300011 70.5950012 90.7249985 115.995003 147.565002 186.790009 235.26001 294.829987 367.649994 456.169983 563.179993 691.830017 845.63501 1028.49011 1246.01501 1505.02502 1812.43494 2176.09985 2604.90991 3108.89014 3699.26978 4390.96533 5201.58984 6149.56494 7255.78467 8543.89941 10051.4355 11825 13911.501 16366.1504 19254.0977 22651.3496 26647.9004 31279.1504 35625 39375 43125 46875 50625 54375 58125 61875 65625 69375 73125.0156 76250 78750 81250.0156 83750.0156 85750.0156 87250.0078 88750 90249.9844 91749.9844 93250.0078 94750 96249.9766 97374.9766 98124.9922 98875 99625
Time coordinate : 2736 steps
RefTime = 1870-01-15 12:00:00 Units = hours Calendar = standard
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
1870-01-15 12:00:00 1870-02-14 22:00:00 1870-03-17 08:00:00 1870-04-16 18:00:00
1870-05-17 04:00:00 1870-06-16 14:00:00 1870-07-17 00:00:00 1870-08-16 10:00:00
1870-09-15 20:00:00 1870-10-16 06:00:00 1870-11-15 16:00:00 1870-12-16 02:00:00
1871-01-15 12:00:00 1871-02-14 22:00:00 1871-03-17 08:00:00 1871-04-16 18:00:00
...snip...
2096-07-21 20:00:00 2096-08-21 06:00:00 2096-09-20 16:00:00 2096-10-21 02:00:00
2096-11-20 12:00:00 2096-12-20 22:00:00 2097-01-20 08:00:00 2097-02-19 18:00:00
2097-03-22 04:00:00 2097-04-21 14:00:00 2097-05-22 00:00:00 2097-06-21 10:00:00
2097-07-21 20:00:00 2097-08-21 06:00:00 2097-09-20 16:00:00 2097-10-21 02:00:00
cdo sinfon: Processed 7 variables over 2736 timesteps ( 0.11s )
$ cdo selyear,1999,2000,2001 pchem.species.CMIP-5.1870-2097.z_91x72.nc4 ~/test.nc4
cdo selyear: Processed 1651104 values from 7 variables over 2736 timesteps ( 0.73s )
$ cdo sinfon ~/test.nc4
File format: netCDF4
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name
1 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : OX
2 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : N2O
3 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : CFC11
4 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : CFC12
5 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : CH4
6 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : HCFC22
7 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 72 1 91 1 F32 : H2O
Grid coordinates :
1 : lonlat > size : dim = 91 nx = 0 ny = 91
lat : first = -1.57079637 last = 1.57079625 inc = 0.0349065065 radians
Vertical coordinates :
1 : generic layer : 1.5 2.63499999 4.01500034 5.67999983 7.76499939 10.4499998 13.960001 18.5400009 24.4899979 32.1749992 42.0400009 54.6300011 70.5950012 90.7249985 115.995003 147.565002 186.790009 235.26001 294.829987 367.649994 456.169983 563.179993 691.830017 845.63501 1028.49011 1246.01501 1505.02502 1812.43494 2176.09985 2604.90991 3108.89014 3699.26978 4390.96533 5201.58984 6149.56494 7255.78467 8543.89941 10051.4355 11825 13911.501 16366.1504 19254.0977 22651.3496 26647.9004 31279.1504 35625 39375 43125 46875 50625 54375 58125 61875 65625 69375 73125.0156 76250 78750 81250.0156 83750.0156 85750.0156 87250.0078 88750 90249.9844 91749.9844 93250.0078 94750 96249.9766 97374.9766 98124.9922 98875 99625
Time coordinate : 36 steps
RefTime = 1870-01-15 12:00:00 Units = hours Calendar = standard
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
1999-01-14 22:00:00 1999-02-14 08:00:00 1999-03-16 18:00:00 1999-04-16 04:00:00
1999-05-16 14:00:00 1999-06-16 00:00:00 1999-07-16 10:00:00 1999-08-15 20:00:00
1999-09-15 06:00:00 1999-10-15 16:00:00 1999-11-15 02:00:00 1999-12-15 12:00:00
2000-01-14 22:00:00 2000-02-14 08:00:00 2000-03-15 18:00:00 2000-04-15 04:00:00
2000-05-15 14:00:00 2000-06-15 00:00:00 2000-07-15 10:00:00 2000-08-14 20:00:00
2000-09-14 06:00:00 2000-10-14 16:00:00 2000-11-14 02:00:00 2000-12-14 12:00:00
2001-01-13 22:00:00 2001-02-13 08:00:00 2001-03-15 18:00:00 2001-04-15 04:00:00
2001-05-15 14:00:00 2001-06-15 00:00:00 2001-07-15 10:00:00 2001-08-14 20:00:00
2001-09-14 06:00:00 2001-10-14 16:00:00 2001-11-14 02:00:00 2001-12-14 12:00:00
cdo sinfon: Processed 7 variables over 36 timesteps ( 0.00s )
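The range form of selyear is convenient when the span of years comes from script variables. Below is a minimal sketch of a wrapper around it; select_years is a hypothetical name, the file names in the usage comment are placeholders, and cdo is assumed to be on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: keep only the timesteps that fall in FIRST..LAST.
# Usage: select_years INFILE OUTFILE FIRST [LAST]
select_years() {
    if [ $# -lt 3 ]; then
        echo "usage: select_years INFILE OUTFILE FIRST [LAST]" >&2
        return 1
    fi
    infile=$1
    outfile=$2
    first=$3
    last=${4:-$3}          # a single year when LAST is omitted
    if [ "$first" -gt "$last" ]; then
        echo "select_years: first year must not exceed last year" >&2
        return 1
    fi
    if [ "$first" -eq "$last" ]; then
        years=$first
    else
        years="$first/$last"   # range form, e.g. 1999/2001
    fi
    cdo selyear,"$years" "$infile" "$outfile"
}

# Usage (placeholder output name):
#   select_years pchem.species.CMIP-5.1870-2097.z_91x72.nc4 subset.nc4 1999 2001
```

For the monthly file shown above, a three-year range should yield 36 timesteps, which cdo sinfon on the output can confirm.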
Other select operators
CDO has many selection operators like these:
selparam Select parameters by identifier
delparam Delete parameters by identifier
selcode Select parameters by code number
delcode Delete parameters by code number
selname Select parameters by name
delname Delete parameters by name
selstdname Select parameters by standard name
sellevel Select levels
sellevidx Select levels by index
selgrid Select grids
selzaxis Select z-axes
selltype Select GRIB level types
seltabnum Select parameter table numbers
seltimestep Select timesteps
seltime Select times
selhour Select hours
selday Select days
selmon Select months
selyear Select years
selseas Select seasons
seldate Select dates
selsmon Select single month
sellonlatbox Select a longitude/latitude box
selindexbox Select an index box
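Most of these operators take their arguments the same way selname and selyear do, as a comma-separated list after the operator name. As one illustration, sellonlatbox takes its bounds in the order lon1,lon2,lat1,lat2. The sketch below wraps it in a hypothetical helper called select_box; the file names in the usage comment are placeholders and cdo is assumed to be on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: cut a longitude/latitude box out of a file.
# Usage: select_box INFILE OUTFILE LON1 LON2 LAT1 LAT2
select_box() {
    if [ $# -ne 6 ]; then
        echo "usage: select_box INFILE OUTFILE LON1 LON2 LAT1 LAT2" >&2
        return 1
    fi
    # sellonlatbox expects western and eastern longitude first,
    # then southern and northern latitude.
    cdo sellonlatbox,"$3","$4","$5","$6" "$1" "$2"
}

# Usage (placeholder file names): a band from 30S to 30N at all longitudes
#   select_box moist.nc4 tropics.nc4 -180 180 -30 30
```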
Combining Operators
Often, you want to do multiple operations on a file. You could, say, do a selname and output only one variable to a file, then a sellevel on that new file to select a single level, and then a selyear on that, etc.:
$ cdo selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.nc4
cdo selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 9.60s )
$ cdo sellevel,1.5 onlyOX.nc4 onlyOX.only1.5.nc4
cdo sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 0.76s )
$ cdo selyear,1999 onlyOX.only1.5.nc4 onlyOX.only1.5.only1999.nc4
cdo selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 0.05s )
$ cdo sinfon onlyOX.only1.5.only1999.nc4
File format: netCDF4
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name
1 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 1 1 91 1 F32 : OX
Grid coordinates :
1 : lonlat > size : dim = 91 nx = 0 ny = 91
lat : first = -1.57079637 last = 1.57079625 inc = 0.0349065065 radians
Vertical coordinates :
1 : generic layer : 1.5
Time coordinate : 12 steps
RefTime = 1870-01-15 12:00:00 Units = hours Calendar = standard
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
1999-01-14 22:00:00 1999-02-14 08:00:00 1999-03-16 18:00:00 1999-04-16 04:00:00
1999-05-16 14:00:00 1999-06-16 00:00:00 1999-07-16 10:00:00 1999-08-15 20:00:00
1999-09-15 06:00:00 1999-10-15 16:00:00 1999-11-15 02:00:00 1999-12-15 12:00:00
cdo sinfon: Processed 1 variable over 12 timesteps ( 0.00s )
Of course, this is not only annoying but also wasteful, as you are creating several temporary files. Instead, CDO allows one to "combine" or "chain" operators. This is done by prefixing each chained operator with a dash (-operator):
cdo -L operatorN -operatorN-1 ... -operator2 -operator1 input (output)
The -L is used because HDF5 isn't thread-safe as currently compiled. It "locks" I/O so that access is sequential, which prevents problems when operators are chained. We are working on getting CDO to work better in parallel.
So, doing the above operations in one step:
$ cdo -L selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX-1.5-1999.nc4
cdo selyear: Started child process "sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe1.1)".
cdo(2) sellevel: Started child process "selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe2.1)".
cdo(3) selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 3.18s )
cdo(2) sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 3.18s )
cdo selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 3.18s )
$ cdo sinfon onlyOX-1.5-1999.nc4
File format: netCDF4
-1 : Institut Source Ttype Levels Num Gridsize Num Dtype : Parameter name
1 : unknown http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant 1 1 91 1 F32 : OX
Grid coordinates :
1 : lonlat > size : dim = 91 nx = 0 ny = 91
lat : first = -1.57079637 last = 1.57079625 inc = 0.0349065065 radians
Vertical coordinates :
1 : generic layer : 1.5
Time coordinate : 12 steps
RefTime = 1870-01-15 12:00:00 Units = hours Calendar = standard
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
1999-01-14 22:00:00 1999-02-14 08:00:00 1999-03-16 18:00:00 1999-04-16 04:00:00
1999-05-16 14:00:00 1999-06-16 00:00:00 1999-07-16 10:00:00 1999-08-15 20:00:00
1999-09-15 06:00:00 1999-10-15 16:00:00 1999-11-15 02:00:00 1999-12-15 12:00:00
cdo sinfon: Processed 1 variable over 12 timesteps ( 0.00s )
This file is the same as the one done in three steps:
$ cdo diffn onlyOX.only1.5.only1999.nc4 onlyOX-1.5-1999.nc4
0 of 12 records differ
cdo diffn: Processed 2184 values from 2 variables over 24 timesteps ( 0.00s )
Of course, you could even do the diffn as well in the command:
$ cdo -L diffn -selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.only1.5.only1999.nc4
cdo diffn: Started child process "selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe1.1)".
cdo(2) selyear: Started child process "sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe2.1)".
cdo(3) sellevel: Started child process "selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe3.1)".
0 of 12 records differ
cdo(4) selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 3.13s )
cdo(3) sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 3.13s )
cdo(2) selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 3.13s )
cdo diffn: Processed 2184 values from 2 variables over 24 timesteps ( 3.13s )
Note: if you don't want the extraneous information, use -s to enable silent mode:
$ cdo -s -L diffn -selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.only1.5.only1999.nc4
0 of 12 records differ
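Because a chain is written in the reverse of the order in which the operators are applied (the operator nearest the input file runs first), it is easy to get the order wrong when building commands in a script. The following is a minimal sketch of a wrapper that takes operators in the order of application and reverses them; cdo_chain is a hypothetical helper, not part of CDO, and it assumes cdo is on the PATH and that no operator argument contains spaces.

```shell
#!/bin/sh
# Hypothetical helper: run a chain of operators given in the order they
# should be applied.
# Usage: cdo_chain INFILE OUTFILE OPERATOR [OPERATOR...]
cdo_chain() {
    if [ $# -lt 3 ]; then
        echo "usage: cdo_chain INFILE OUTFILE OPERATOR [OPERATOR...]" >&2
        return 1
    fi
    infile=$1
    outfile=$2
    shift 2
    chain=""
    for op in "$@"; do
        # Put each new operator in front; the ones already collected are
        # applied earlier, so they move right and gain a leading dash.
        chain="$op${chain:+ -$chain}"
    done
    # $chain is deliberately unquoted so it splits into one word per operator.
    cdo -L $chain "$infile" "$outfile"
}

# Usage: equivalent to
#   cdo -L selyear,1999 -sellevel,1.5 -selname,OX INFILE OUTFILE
#   cdo_chain pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX-1.5-1999.nc4 \
#       selname,OX sellevel,1.5 selyear,1999
```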
TkCVS
The main website for TkCVS can be found here.
On discover, you can use TkCVS by loading the other/tkcvs-8.2.3 module:
module load other/tkcvs-8.2.3
ack
The main website for ack can be found at http://beyondgrep.com/.
If you'd like to try ack out, you can run:
curl http://beyondgrep.com/ack-2.04-single-file > ~/bin/ack && chmod 0755 !#:3
and it will install ack for you in your local bin directory.
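The !#:3 in that command is history expansion (it stands for the fourth word typed so far on the current line, here ~/bin/ack), so it only works when typed at an interactive prompt. For use inside a script, a sketch of the same steps without history expansion might look like this; install_ack is a hypothetical name, and the default URL is simply the one from the command above.

```shell
#!/bin/sh
# Hypothetical helper: download the single-file ack and make it executable.
# Usage: install_ack [URL] [BINDIR]
install_ack() {
    url=${1:-http://beyondgrep.com/ack-2.04-single-file}
    bindir=${2:-$HOME/bin}
    # -f: fail on HTTP errors, -sS: quiet but still report errors,
    # -o: write the download to the named file.
    mkdir -p "$bindir" &&
        curl -fsS -o "$bindir/ack" "$url" &&
        chmod 0755 "$bindir/ack"
}

# Usage:
#   install_ack
```

Using -f keeps an HTTP error page from being saved as ~/bin/ack and then marked executable.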