Useful Tools

From GEOS-5
== Climate Data Operators (CDO) ==


As stated by their creators, the [https://code.zmaw.de/projects/cdo Climate Data Operators] (CDO) are:


CDO is a collection of command line Operators to manipulate and analyse Climate and NWP model Data.


=== Where to find CDO ===


==== NCCS ====
 
Users can access CDO in a variety of ways. On discover, the easiest way is to load the cdo module:
 
module load other/cdo
 
It is also included in GMAO-Baselibs-4_0_0 and higher (corresponds to Ganymed-3_0 or newer). It will be located at
 
  $BASEDIR/Linux/bin/cdo
 
once <tt>g5_modules</tt> is sourced.
 
==== NAS ====
 
At NAS, there are a couple of options. First, it is available through Baselibs as above. You can also access a portable version (compiled with system gcc) at
 
/nobackup/gmao_SIteam/Utilities/bin/cdo
 
As this is completely portable, you can also copy this to a local <tt>bin</tt> directory if desired.
 
==== GSFC Desktops ====
 
If you are on a GMAO desktop and have access to <tt>/ford1</tt>, there is a version at
 
/ford1/local/EL6-64/bin/cdo
 
as well as at:
 
/ford1/share/gmao_SIteam/Utilities/bin/cdo
 
The <tt>/ford1/share</tt> version could be more bleeding-edge than the <tt>/ford1/local</tt> version as it will usually be whatever version is in the latest Baselibs tag.
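
A script that has to run on more than one of these systems can probe the locations above in order. The following is only a sketch: the function name <tt>find_cdo</tt> is made up for illustration, and the candidate paths you pass in are assumed to be the ones listed in this section.

```shell
# find_cdo: print the first executable cdo among the candidate paths given as
# arguments; otherwise fall back to whatever "cdo" is on PATH (for example
# after "module load other/cdo"). The function name is illustrative only and
# is not part of CDO or Baselibs.
find_cdo() {
    for candidate in "$@"; do
        # -x is also true for searchable directories, so exclude those
        if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Returns non-zero (and prints nothing) if no cdo is on PATH either
    command -v cdo
}
```

For example, <tt>find_cdo "$BASEDIR/Linux/bin/cdo" /nobackup/gmao_SIteam/Utilities/bin/cdo</tt> would print the Baselibs copy if it exists and the portable NAS copy otherwise.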
 
=== Example Uses of CDO ===
 
==== Display info about a file ====
 
CDO offers many operators that provide information about a file. The most important two are '''infon''' and '''sinfon'''. Note the '''n''' at the end: that means "display variable names". If you don't provide that, you'll still get information, but it'll be for variables -1, -2, -3, etc.
 
===== Short Information =====
 
If you just want a summary of what's in the file, use '''sinfon''':
 
$ cdo '''sinfon''' stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4
  File format: netCDF4
    -1 : Institut Source  Ttype    Levels Num  Gridsize Num Dtype : Parameter name
    1 : unknown  unknown  instant      48  1    16380  1  F32  : CLCN     
    2 : unknown  unknown  instant      48  1    16380  1  F32  : CLLS     
    3 : unknown  unknown  instant      48  1    16380  1  F32  : CNVMF0   
    4 : unknown  unknown  instant      48  1    16380  1  F32  : CNVMFC   
    5 : unknown  unknown  instant      48  1    16380  1  F32  : CNVMFD   
    6 : unknown  unknown  instant      48  1    16380  1  F32  : EVAPC     
    7 : unknown  unknown  instant      48  1    16380  1  F32  : FCLD     
    8 : unknown  unknown  instant      1  2    16380  1  F32  : PHIS     
    9 : unknown  unknown  instant      48  1    16380  1  F32  : QI       
    10 : unknown  unknown  instant      48  1    16380  1  F32  : QICN     
    11 : unknown  unknown  instant      48  1    16380  1  F32  : QILS     
    12 : unknown  unknown  instant      48  1    16380  1  F32  : QL       
    13 : unknown  unknown  instant      48  1    16380  1  F32  : QLCN     
    14 : unknown  unknown  instant      48  1    16380  1  F32  : QLLS     
    15 : unknown  unknown  instant      48  1    16380  1  F32  : QR       
    16 : unknown  unknown  instant      48  1    16380  1  F32  : REVAN     
    17 : unknown  unknown  instant      48  1    16380  1  F32  : REVCN     
    18 : unknown  unknown  instant      48  1    16380  1  F32  : REVLS     
    19 : unknown  unknown  instant      48  1    16380  1  F32  : RH1       
    20 : unknown  unknown  instant      48  1    16380  1  F32  : RICE     
    21 : unknown  unknown  instant      48  1    16380  1  F32  : RLIQ     
    22 : unknown  unknown  instant      48  1    16380  1  F32  : RSUAN     
    23 : unknown  unknown  instant      48  1    16380  1  F32  : RSUCN     
    24 : unknown  unknown  instant      48  1    16380  1  F32  : RSULS     
    25 : unknown  unknown  instant      48  1    16380  1  F32  : SUBLC     
    26 : unknown  unknown  instant      48  1    16380  1  F32  : THIM     
  Grid coordinates :
    1 : lonlat      > size      : dim = 16380  nlon = 180  nlat = 91
                        lon      : first = -180  last = 178  inc = 2  degrees_east  circular
                        lat      : first = -90  last = 90  inc = 2  degrees_north
  Vertical coordinates :
    1 : pressure            hPa : 1000 975 950 925 900 875 850 825 800 775 750 725
                                    700 650 600 550 500 450 400 350 300 250 200 150
                                    100 70 50 40 30 20 10 7 5 4 3 2 1 0.699999988 0.5
                                    0.400000006 0.300000012 0.200000003 0.100000001
                                    0.0700000003 0.0500000007 0.0399999991 0.0299999993
                                    0.0199999996
    2 : surface                  : 0
  Time coordinate :  1 step
    RefTime =  2000-04-15 09:00:00  Units = minutes  Calendar = STANDARD
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  2000-04-15 09:00:00
cdo sinfon: Processed 26 variables over 1 timestep ( 0.02s )
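
If you only need the variable names from that summary, for example to loop over them in a script, the table can be filtered with <tt>awk</tt>. This is a sketch that assumes the <tt>sinfon</tt> layout shown above (numbered rows ending in the parameter name, followed by a "Grid coordinates" section); the function name <tt>sinfon_varnames</tt> is hypothetical.

```shell
# sinfon_varnames: read "cdo sinfon" output on stdin and print one variable
# name per line. Assumes the layout shown above; other CDO versions may
# format the table differently.
sinfon_varnames() {
    awk '
        /Grid coordinates/ { exit }                  # the variable table ends here
        $1 ~ /^[0-9]+$/ && $2 == ":" { print $NF }   # the "-1" header row is skipped
    '
}
```

It would be used as <tt>cdo sinfon file.nc4 | sinfon_varnames</tt>. CDO also has a <tt>showname</tt> operator that prints variable names directly, which may be the simpler choice when it suits.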
 
===== Information and Simple Statistics =====
 
The other operator, '''infon''', provides more information and some useful statistics:
 
$ cdo infon stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4
    -1 :      Date    Time  Level Gridsize    Miss :    Minimum        Mean    Maximum : Parameter name
    1 : 2000-04-15 09:00:00    1000    16380    7283 :      0.0000  0.00055663    0.053927 : CLCN     
    2 : 2000-04-15 09:00:00    975    16380    3443 :      0.0000  0.0053316    0.19191 : CLCN     
    3 : 2000-04-15 09:00:00    950    16380    2381 :      0.0000    0.021698    0.52759 : CLCN     
    4 : 2000-04-15 09:00:00    925    16380    2008 :      0.0000    0.046928    0.62649 : CLCN     
    5 : 2000-04-15 09:00:00    900    16380    1730 :      0.0000    0.068160    0.76086 : CLCN     
    6 : 2000-04-15 09:00:00    875    16380    1488 :      0.0000    0.083106    0.78922 : CLCN     
    7 : 2000-04-15 09:00:00    850    16380    1340 :      0.0000    0.082435    0.85127 : CLCN     
    8 : 2000-04-15 09:00:00    825    16380    1216 :      0.0000    0.064390    0.79455 : CLCN     
    9 : 2000-04-15 09:00:00    800    16380    1135 :      0.0000    0.049025    0.67987 : CLCN     
    10 : 2000-04-15 09:00:00    775    16380    1024 :      0.0000    0.039010    0.71728 : CLCN     
    11 : 2000-04-15 09:00:00    750    16380    927 :      0.0000    0.033034    0.67065 : CLCN     
    12 : 2000-04-15 09:00:00    725    16380    800 :      0.0000    0.028919    0.82533 : CLCN     
    13 : 2000-04-15 09:00:00    700    16380    680 :      0.0000    0.025265    0.71002 : CLCN     
    14 : 2000-04-15 09:00:00    650    16380    161 :      0.0000    0.023027    0.60691 : CLCN     
    15 : 2000-04-15 09:00:00    600    16380      21 :      0.0000    0.029113    0.79990 : CLCN     
    16 : 2000-04-15 09:00:00    550    16380      2 :      0.0000    0.031865    0.86731 : CLCN     
    17 : 2000-04-15 09:00:00    500    16380      0 :      0.0000    0.021714    0.86754 : CLCN     
    18 : 2000-04-15 09:00:00    450    16380      0 :      0.0000    0.016821    0.80432 : CLCN     
    19 : 2000-04-15 09:00:00    400    16380      0 :      0.0000    0.017517    0.82690 : CLCN     
    20 : 2000-04-15 09:00:00    350    16380      0 :      0.0000    0.029577    0.84485 : CLCN     
    21 : 2000-04-15 09:00:00    300    16380      0 :      0.0000    0.037508    0.84544 : CLCN     
    22 : 2000-04-15 09:00:00    250    16380      0 :      0.0000    0.048294    0.80001 : CLCN     
    23 : 2000-04-15 09:00:00    200    16380      0 :      0.0000    0.053185    0.77885 : CLCN     
    24 : 2000-04-15 09:00:00    150    16380      0 :      0.0000    0.035487    0.83891 : CLCN     
    25 : 2000-04-15 09:00:00    100    16380      0 :      0.0000  0.00044752    0.19931 : CLCN     
    26 : 2000-04-15 09:00:00      70    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    27 : 2000-04-15 09:00:00      50    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    28 : 2000-04-15 09:00:00      40    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    29 : 2000-04-15 09:00:00      30    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    30 : 2000-04-15 09:00:00      20    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    31 : 2000-04-15 09:00:00      10    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    32 : 2000-04-15 09:00:00      7    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    33 : 2000-04-15 09:00:00      5    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    34 : 2000-04-15 09:00:00      4    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    35 : 2000-04-15 09:00:00      3    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    36 : 2000-04-15 09:00:00      2    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    37 : 2000-04-15 09:00:00      1    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    38 : 2000-04-15 09:00:00    0.7    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    39 : 2000-04-15 09:00:00    0.5    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    40 : 2000-04-15 09:00:00    0.4    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    41 : 2000-04-15 09:00:00    0.3    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    42 : 2000-04-15 09:00:00    0.2    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    43 : 2000-04-15 09:00:00    0.1    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    44 : 2000-04-15 09:00:00    0.07    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    45 : 2000-04-15 09:00:00    0.05    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    46 : 2000-04-15 09:00:00    0.04    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    47 : 2000-04-15 09:00:00    0.03    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    48 : 2000-04-15 09:00:00    0.02    16380      0 :      0.0000      0.0000      0.0000 : CLCN     
    49 : 2000-04-15 09:00:00    1000    16380    7283 :      0.0000    0.019079    0.97125 : CLLS     
    50 : 2000-04-15 09:00:00    975    16380    3443 :      0.0000    0.068198    0.94812 : CLLS     
    51 : 2000-04-15 09:00:00    950    16380    2381 :      0.0000    0.083635    0.93380 : CLLS     
    52 : 2000-04-15 09:00:00    925    16380    2008 :      0.0000    0.096190    0.83637 : CLLS     
...
 
As you can see, you get not only information but also statistics. For each <tt>Level</tt> of each variable, it provides the number of values on the grid, <tt>Gridsize</tt>; the number of missing values, <tt>Miss</tt>; and the <tt>Minimum</tt>, <tt>Mean</tt>, and <tt>Maximum</tt>.
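
Because the output is column-oriented, it is easy to post-process. As a sketch, the hypothetical function below turns the <tt>Miss</tt> and <tt>Gridsize</tt> columns into a missing-value fraction per record; the column positions are an assumption based on the <tt>infon</tt> layout shown above.

```shell
# infon_missing: read "cdo infon" output on stdin and print, for every record,
# the parameter name, the level, and the fraction of grid points that are
# missing (Miss / Gridsize). Field numbers assume the layout shown above:
#   $5 = Level, $6 = Gridsize, $7 = Miss, last field = parameter name.
infon_missing() {
    awk '
        $1 ~ /^[0-9]+$/ && $2 == ":" && $8 == ":" && $12 == ":" && $6 + 0 > 0 {
            printf "%s %s %.3f\n", $NF, $5, $7 / $6
        }
    '
}
```

It would be used as <tt>cdo infon file.nc4 | infon_missing</tt>; the header row and the trailing "Processed" line are skipped because they do not start with a record number.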
 
==== Diff two files ====
 
One of the main reasons CDO is attractive is for diffing two files. With binary files, such as the old restarts, one could use <tt>cmp</tt> or <tt>diff</tt> to check for differences. However, <tt>cmp</tt> will not work on NetCDF files because, while the data might be the same, the metadata almost certainly won't be:
 
$ cmp stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
      mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
  mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 differ: byte 55, line 4
 
 
There already is a tool for doing this with HDF5/NetCDF4 files: <tt>h5diff</tt>. However, <tt>h5diff</tt> is slow (at least the version we have):
 
$ time h5diff stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
              mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
attribute: <Title of </>> and <Title of </>>
37 differences found
0.116u 0.412s '''0:44.38''' 1.1% 0+0k 0+0io 0pf+0w
 
CDO, however, seems to be much more efficient at doing diff comparisons:
 
$ time cdo diffn stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
                  mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
  0 of 1201 records differ
cdo diffn: Processed 39344760 values from 52 variables over 2 timesteps ( 0.49s )
0.424u 0.068s '''0:04.28''' 11.2% 0+0k 0+0io 0pf+0w
 
Of course, 44 seconds versus 4 seconds isn't a big difference in absolute terms (and with smaller files, GPFS can sometimes 'cache' the files, making things seem fast), but at c360 the gap grows. Let's diff two c360 NetCDF4 fvcore_internal_rst files:
 
$ time h5diff fvcore_internal_rst.control.nc4 fvcore_internal_rst.test.nc4
3.188u 6.824s '''14:57.19''' 1.1% 0+0k 400+0io 0pf+0w
$ time cdo diffn fvcore_internal_rst.control.nc4 fvcore_internal_rst.test.nc4
  0 of 505 records differ
cdo diffn: Processed 785376000 values from 14 variables over 2 timesteps ( 18.76s )
17.125u 1.648s '''2:33.97''' 12.1% 0+0k 0+0io 0pf+0w
 
This tells us two things: the <tt>h5diff</tt> we have to use is slow, and you should develop at lower resolution if diffing restarts is part of your process.
 
==== Behavior when two files differ ====
 
All the above examples show two files that don't differ. If they do differ, CDO provides additional useful information like:
 
          For each pair of fields the operator prints one line with the following information:
          - Date and Time
          - Level, Gridsize and number of Missing values
          - Occurrence of coefficient pairs with different signs (S)
          - Occurrence of zero values (Z)
          - Maxima of absolute difference of coefficient pairs
          - Maxima of relative difference of non-zero coefficient pairs with equal signs
          - Parameter name
 
as seen here:
 
$ cdo diffn stock-G40U-CPU-2013Jun11-6hour-144.geosgcm_prog.20000415_0000z.nc4 stock-G40U-GPU-2013Jun11-6hour-144.geosgcm_prog.20000415_0000z.nc4
              Date    Time  Level Gridsize    Miss : S Z  Max_Absdiff Max_Reldiff : Parameter name
    1 : 2000-04-15 00:00:00    1000    13104    5606 : F T      0.64627    0.041020 : H         
    2 : 2000-04-15 00:00:00    975    13104    3202 : F F      0.64383    0.19488 : H         
    3 : 2000-04-15 00:00:00    950    13104    2394 : F F      0.64948  0.0014438 : H         
    4 : 2000-04-15 00:00:00    925    13104    2047 : F F      0.65057  0.00083438 : H         
    5 : 2000-04-15 00:00:00    900    13104    1793 : F F      0.67102  0.00067943 : H         
    6 : 2000-04-15 00:00:00    875    13104    1555 : F F      0.66370  0.00058364 : H         
    7 : 2000-04-15 00:00:00    850    13104    1401 : F F      0.65869  0.00043395 : H         
    8 : 2000-04-15 00:00:00    825    13104    1305 : F F      0.64941  0.00036629 : H         
    9 : 2000-04-15 00:00:00    800    13104    1188 : F F      0.63110  0.00033837 : H         
...snip...
  477 : 2000-04-15 00:00:00    0.2    13104      0 : F F    0.019527    0.32925 : V         
  478 : 2000-04-15 00:00:00    0.1    13104      0 : F F    0.059674    0.77863 : V         
  479 : 2000-04-15 00:00:00    0.07    13104      0 : F F    0.057358    0.46071 : V         
  480 : 2000-04-15 00:00:00    0.05    13104      0 : T F    0.021697    0.74741 : V         
  481 : 2000-04-15 00:00:00    0.04    13104      0 : T F    0.021576    0.40354 : V         
  482 : 2000-04-15 00:00:00    0.03    13104      0 : F F    0.020691    0.34921 : V         
  483 : 2000-04-15 00:00:00    0.02    13104      0 : T F    0.020302    0.62736 : V         
  438 of 483 records differ
  389 of 483 records differ more than 0.001
cdo diffn: Processed 12658464 values from 26 variables over 2 timesteps ( 0.17s )
$ echo $?
0
 
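'''NOTE''': CDO returns a status of 0 any time it doesn't encounter an error. Thus, if two files are different, CDO will report the differences, but the return status will still be 0 because cdo technically completed successfully, as seen above. Any tests that depend on the return status of <tt>cmp</tt> for binary files therefore need to be altered for NetCDF4. One possible formulation in csh is shown here, where <tt>$FILE1</tt> and <tt>$FILE2</tt> stand for the two files being compared and the <tt>echo</tt> lines are placeholders for whatever your test should do. The <tt>grep</tt> pattern matches only the first summary line, because a second line (<tt>... records differ more than 0.001</tt>) is also printed when the files differ:
 
set NUMDIFF = `cdo -s diffn $FILE1 $FILE2 | grep 'records differ *$' | awk '{print $1}'`
if ( "$NUMDIFF" == "0" ) then
  echo "success"
else
  echo "failure"
endif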
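
For Bourne-style shells (sh, bash), the same idea can be wrapped in a function that returns a usable exit status. This is a minimal sketch, assuming the <tt>diffn</tt> summary format shown above; the function names <tt>count_differing_records</tt> and <tt>files_match</tt> are made up for illustration.

```shell
# count_differing_records: read "cdo diffn" output on stdin and print the
# number of differing records. Only the summary line ending in
# "records differ" is used; the "... differ more than 0.001" line is ignored.
# Exits non-zero if no summary line is found at all.
count_differing_records() {
    awk '
        /records differ[ \t]*$/ { print $1; found = 1 }
        END { if (!found) exit 1 }
    '
}

# files_match FILE1 FILE2: return 0 if cdo reports zero differing records,
# and 1 otherwise (including when no summary line could be parsed).
files_match() {
    n=$(cdo -s diffn "$1" "$2" | count_differing_records) || return 1
    [ "$n" -eq 0 ]
}
```

A test script could then use <tt>if files_match control.nc4 test.nc4; then ... fi</tt> in the same way it previously used <tt>cmp</tt>.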
 
==== Extract field(s) from a file ====
 
===== Extract variable(s) from a file =====
 
Often, our NetCDF4 files have many variables and we only care about one. CDO allows one to extract or ''select'' one or more variables. For example, if you only want CLCN, use '''selname''':
 
$ cdo '''selname,CLCN''' mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 onlyclcn.nc4
cdo selname: Processed 786240 values from 26 variables over 1 timestep ( 0.03s )
$ cdo sinfon onlyclcn.nc4
   File format: netCDF4
    -1 : Institut Source  Ttype    Levels Num  Gridsize Num Dtype : Parameter name
    1 : unknown  unknown  instant      48  1    16380  1  F32  : CLCN     
  Grid coordinates :
    1 : lonlat      > size      : dim = 16380  nx = 180  ny = 91
                        lon      : first = -180  last = 178  inc = 2  degrees_east  circular
                        lat      : first = -90  last = 90  inc = 2  degrees_north
  Vertical coordinates :
    1 : pressure            hPa : 1000 975 950 925 900 875 850 825 800 775 750 725
                                    700 650 600 550 500 450 400 350 300 250 200 150
                                    100 70 50 40 30 20 10 7 5 4 3 2 1 0.699999988 0.5
                                    0.400000006 0.300000012 0.200000003 0.100000001
                                    0.0700000003 0.0500000007 0.0399999991 0.0299999993
                                    0.0199999996
  Time coordinate :  1 step
    RefTime =  2000-04-15 09:00:00  Units = minutes  Calendar = standard
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  2000-04-15 09:00:00
cdo sinfon: Processed 1 variable over 1 timestep ( 0.00s )
===== Extract time(s) from a file =====
To extract a single (or multiple) year from a multi-step file, you can use <tt>selyear</tt>,''year''. For multiple years, you can either do <tt>cdo '''selyear,1999,2000,2001'''</tt> or <tt>cdo '''selyear,1999/2001'''</tt>.
$ cdo sinfon pchem.species.CMIP-5.1870-2097.z_91x72.nc4
  <nowiki>File format: netCDF4
     -1 : Institut Source  Ttype    Levels Num  Gridsize Num Dtype : Parameter name
     1 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72  1        91  1  F32  : OX         
...snip...
     6 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72  1        91  1  F32  : HCFC22     
     7 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72  1        91  1  F32  : H2O         
</nowiki>  Grid coordinates :
     1 : lonlat      > size      : dim = 91  nx = 0  ny = 91
                         lat      : first = -1.57079637  last = 1.57079625  inc = 0.0349065065  radians
...snip...
   1870-09-15 20:00:00  1870-10-16 06:00:00  1870-11-15 16:00:00  1870-12-16 02:00:00
   1871-01-15 12:00:00  1871-02-14 22:00:00  1871-03-17 08:00:00  1871-04-16 18:00:00
...snip...
   2096-07-21 20:00:00  2096-08-21 06:00:00  2096-09-20 16:00:00  2096-10-21 02:00:00
   2096-11-20 12:00:00  2096-12-20 22:00:00  2097-01-20 08:00:00  2097-02-19 18:00:00
   2097-03-22 04:00:00  2097-04-21 14:00:00  2097-05-22 00:00:00  2097-06-21 10:00:00
   2097-07-21 20:00:00  2097-08-21 06:00:00  2097-09-20 16:00:00  2097-10-21 02:00:00
cdo sinfon: Processed 7 variables over 2736 timesteps ( 0.11s )
$ cdo '''selyear,1999,2000,2001''' pchem.species.CMIP-5.1870-2097.z_91x72.nc4 ~/test.nc4
cdo selyear: Processed 1651104 values from 7 variables over 2736 timesteps ( 0.73s )
$ cdo sinfon ~/test.nc4
  <nowiki>File format: netCDF4
    -1 : Institut Source  Ttype    Levels Num  Gridsize Num Dtype : Parameter name
    1 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72  1        91  1  F32  : OX       
    2 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72  1        91  1  F32  : N2O       
    3 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72  1        91  1  F32  : CFC11     
    4 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72  1        91  1  F32  : CFC12     
    5 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72  1        91  1  F32  : CH4       
    6 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72  1        91  1  F32  : HCFC22   
    7 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72  1        91  1  F32  : H2O       
</nowiki>   Grid coordinates :
    1 : lonlat      > size      : dim = 91  nx = 0  ny = 91
                        lat      : first = -1.57079637  last = 1.57079625  inc = 0.0349065065  radians
  Vertical coordinates :
    1 : generic            layer : 1.5 2.63499999 4.01500034 5.67999983 7.76499939
                                    10.4499998 13.960001 18.5400009 24.4899979 32.1749992
                                    42.0400009 54.6300011 70.5950012 90.7249985 115.995003
                                    147.565002 186.790009 235.26001 294.829987 367.649994
                                    456.169983 563.179993 691.830017 845.63501 1028.49011
                                    1246.01501 1505.02502 1812.43494 2176.09985 2604.90991
                                    3108.89014 3699.26978 4390.96533 5201.58984 6149.56494
                                    7255.78467 8543.89941 10051.4355 11825 13911.501
                                    16366.1504 19254.0977 22651.3496 26647.9004 31279.1504
                                    35625 39375 43125 46875 50625 54375 58125 61875
                                    65625 69375 73125.0156 76250 78750 81250.0156 83750.0156
                                    85750.0156 87250.0078 88750 90249.9844 91749.9844
                                    93250.0078 94750 96249.9766 97374.9766 98124.9922
                                    98875 99625
  Time coordinate :  36 steps
    RefTime =  1870-01-15 12:00:00  Units = hours  Calendar = standard
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  1999-01-14 22:00:00  1999-02-14 08:00:00  1999-03-16 18:00:00  1999-04-16 04:00:00
  1999-05-16 14:00:00  1999-06-16 00:00:00  1999-07-16 10:00:00  1999-08-15 20:00:00
  1999-09-15 06:00:00  1999-10-15 16:00:00  1999-11-15 02:00:00  1999-12-15 12:00:00
  2000-01-14 22:00:00  2000-02-14 08:00:00  2000-03-15 18:00:00  2000-04-15 04:00:00
  2000-05-15 14:00:00  2000-06-15 00:00:00  2000-07-15 10:00:00  2000-08-14 20:00:00
  2000-09-14 06:00:00  2000-10-14 16:00:00  2000-11-14 02:00:00  2000-12-14 12:00:00
  2001-01-13 22:00:00  2001-02-13 08:00:00  2001-03-15 18:00:00  2001-04-15 04:00:00
  2001-05-15 14:00:00  2001-06-15 00:00:00  2001-07-15 10:00:00  2001-08-14 20:00:00
  2001-09-14 06:00:00  2001-10-14 16:00:00  2001-11-14 02:00:00  2001-12-14 12:00:00
cdo sinfon: Processed 7 variables over 36 timesteps ( 0.00s )
 
===== Other select operators =====
CDO has many other selection (and deletion) operators:
 
  selparam Select parameters by identifier
  delparam Delete parameters by identifier
  selcode Select parameters by code number
  delcode Delete parameters by code number
  selname Select parameters by name
  delname Delete parameters by name
  selstdname Select parameters by standard name
  sellevel Select levels
  sellevidx Select levels by index
  selgrid Select grids
  selzaxis Select z-axes
  selltype Select GRIB level types
  seltabnum Select parameter table numbers
  seltimestep Select timesteps
  seltime Select times
  selhour Select hours
  selday Select days
  selmon Select months
  selyear Select years
  selseas Select seasons
  seldate Select dates
  selsmon Select single month
  sellonlatbox Select a longitude/latitude box
  selindexbox Select an index box
 
==== Combining Operators ====
 
Often, you want to do multiple operations on a file. You could, say, do a '''selname''' and output only one variable to a file, then a '''sellevel''' on that new file to select a single level, and then a '''selyear''' on ''that'', etc.:
 
$ cdo '''selname,OX''' pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.nc4
cdo selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 9.60s )
$ cdo '''sellevel,1.5''' onlyOX.nc4 onlyOX.only1.5.nc4
cdo sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 0.76s )
$ cdo '''selyear,1999''' onlyOX.only1.5.nc4 onlyOX.only1.5.only1999.nc4
cdo selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 0.05s )
$ cdo sinfon onlyOX.only1.5.only1999.nc4
  File format: netCDF4
    -1 : Institut Source  Ttype    Levels Num  Gridsize Num Dtype : Parameter name
    1 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      1  1        91  1  F32  : OX       
  Grid coordinates :
    1 : lonlat      > size      : dim = 91  nx = 0  ny = 91
                        lat      : first = -1.57079637  last = 1.57079625  inc = 0.0349065065  radians
  Vertical coordinates :
    1 : generic            layer : 1.5
  Time coordinate :  12 steps
    RefTime =  1870-01-15 12:00:00  Units = hours  Calendar = standard
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  1999-01-14 22:00:00  1999-02-14 08:00:00  1999-03-16 18:00:00  1999-04-16 04:00:00
  1999-05-16 14:00:00  1999-06-16 00:00:00  1999-07-16 10:00:00  1999-08-15 20:00:00
  1999-09-15 06:00:00  1999-10-15 16:00:00  1999-11-15 02:00:00  1999-12-15 12:00:00
cdo sinfon: Processed 1 variable over 12 timesteps ( 0.00s )
 
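Of course, this is not only annoying but also wasteful, as you are creating many temporary files. Instead, CDO allows one to "combine" or "chain" operators. This is done by prefixing each chained operator with a dash; the operators are applied from right to left, starting with the one next to the input file: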
 
  cdo -L operatorN -operatorN-1 ... -operator2 -operator1 input (output)


The -L is used because HDF5 isn't thread-safe as currently compiled. This "locks" I/O preventing an issue. We are working on trying to get CDO to work in parallel better.
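
Since the rightmost operator is applied first, it is easy to get the order backwards when writing a chain by hand. As a sketch, the hypothetical helper below takes the operators in the order they should be applied and prints the corresponding command line in the form given above; it only builds the string and does not run CDO.

```shell
# cdo_chain INPUT OUTPUT OP1 [OP2 ...]: print a chained cdo command in which
# OP1 is applied first, then OP2, and so on. In the printed command the
# operators therefore appear in reverse order, each chained one prefixed with
# a dash. The helper name is illustrative only.
cdo_chain() {
    [ "$#" -ge 3 ] || return 1
    input=$1
    output=$2
    shift 2
    chain=""
    for op in "$@"; do
        if [ -z "$chain" ]; then
            chain="-$op"
        else
            chain="-$op $chain"   # later operators go further to the left
        fi
    done
    # The outermost (leftmost) operator is written without the leading dash
    printf 'cdo -L %s %s %s\n' "${chain#-}" "$input" "$output"
}
```

Calling <tt>cdo_chain in.nc4 out.nc4 selname,OX sellevel,1.5 selyear,1999</tt> prints a command in which <tt>selyear</tt> comes first and <tt>selname</tt> last, the same order as the three separate steps above collapsed into one.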


So, doing the above operations in one step:
 
$ cdo -L selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX-1.5-1999.nc4
cdo selyear: Started child process "sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe1.1)".
cdo(2) sellevel: Started child process "selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe2.1)".
cdo(3) selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 3.18s )
cdo(2) sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 3.18s )
cdo selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 3.18s )
$ cdo sinfon onlyOX-1.5-1999.nc4
  File format: netCDF4
    -1 : Institut Source  Ttype    Levels Num  Gridsize Num Dtype : Parameter name
    1 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      1  1        91  1  F32  : OX       
  Grid coordinates :
    1 : lonlat      > size      : dim = 91  nx = 0  ny = 91
                        lat      : first = -1.57079637  last = 1.57079625  inc = 0.0349065065  radians
  Vertical coordinates :
    1 : generic            layer : 1.5
  Time coordinate :  12 steps
    RefTime =  1870-01-15 12:00:00  Units = hours  Calendar = standard
  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
  1999-01-14 22:00:00  1999-02-14 08:00:00  1999-03-16 18:00:00  1999-04-16 04:00:00
  1999-05-16 14:00:00  1999-06-16 00:00:00  1999-07-16 10:00:00  1999-08-15 20:00:00
  1999-09-15 06:00:00  1999-10-15 16:00:00  1999-11-15 02:00:00  1999-12-15 12:00:00
cdo sinfon: Processed 1 variable over 12 timesteps ( 0.00s )
 
This file is the same as the one done in three steps:
 
$ cdo diffn onlyOX.only1.5.only1999.nc4 onlyOX-1.5-1999.nc4
  0 of 12 records differ
cdo diffn: Processed 2184 values from 2 variables over 24 timesteps ( 0.00s )
 
Of course, you could even do the '''diffn''' as well in the command:
 
$ cdo -L diffn -selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.only1.5.only1999.nc4
cdo diffn: Started child process "selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe1.1)".
cdo(2) selyear: Started child process "sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe2.1)".
cdo(3) sellevel: Started child process "selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe3.1)".
  0 of 12 records differ
cdo(4) selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 3.13s )
cdo(3) sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 3.13s )
cdo(2) selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 3.13s )
cdo diffn: Processed 2184 values from 2 variables over 24 timesteps ( 3.13s )
 
Note: if you don't want the extraneous information, use '''-s''' to enable '''silent''' mode:
 
$ cdo -s -L diffn -selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.only1.5.only1999.nc4
  0 of 12 records differ
 
== TkCVS ==
 
The main website for TkCVS can be found [http://www.twobarleycorns.net/tkcvs.html here].
 
On discover, you can use TkCVS by loading the <tt>other/tkcvs-8.2.3</tt> module:
 
module load other/tkcvs-8.2.3


== ack ==


The main website for ack can be found [http://beyondgrep.com/ here].
If you'd like to try ack out, you can run:
  curl http://beyondgrep.com/ack-2.04-single-file > ~/bin/ack && chmod 0755 !#:3
and it will install ack for you in your local bin directory.


[[Category:SI Team]]
[[Category:Brown Bags]]

Latest revision as of 07:58, 2 February 2017

    7 : unknown  unknown  instant      48   1     16380   1  F32  : FCLD       
    8 : unknown  unknown  instant       1   2     16380   1  F32  : PHIS       
    9 : unknown  unknown  instant      48   1     16380   1  F32  : QI         
   10 : unknown  unknown  instant      48   1     16380   1  F32  : QICN       
   11 : unknown  unknown  instant      48   1     16380   1  F32  : QILS       
   12 : unknown  unknown  instant      48   1     16380   1  F32  : QL         
   13 : unknown  unknown  instant      48   1     16380   1  F32  : QLCN       
   14 : unknown  unknown  instant      48   1     16380   1  F32  : QLLS       
   15 : unknown  unknown  instant      48   1     16380   1  F32  : QR         
   16 : unknown  unknown  instant      48   1     16380   1  F32  : REVAN      
   17 : unknown  unknown  instant      48   1     16380   1  F32  : REVCN      
   18 : unknown  unknown  instant      48   1     16380   1  F32  : REVLS      
   19 : unknown  unknown  instant      48   1     16380   1  F32  : RH1        
   20 : unknown  unknown  instant      48   1     16380   1  F32  : RICE       
   21 : unknown  unknown  instant      48   1     16380   1  F32  : RLIQ       
   22 : unknown  unknown  instant      48   1     16380   1  F32  : RSUAN      
   23 : unknown  unknown  instant      48   1     16380   1  F32  : RSUCN      
   24 : unknown  unknown  instant      48   1     16380   1  F32  : RSULS      
   25 : unknown  unknown  instant      48   1     16380   1  F32  : SUBLC      
   26 : unknown  unknown  instant      48   1     16380   1  F32  : THIM       
  Grid coordinates :
    1 : lonlat       > size      : dim = 16380  nlon = 180  nlat = 91
                       lon       : first = -180  last = 178  inc = 2  degrees_east  circular
                       lat       : first = -90  last = 90  inc = 2  degrees_north
  Vertical coordinates :
    1 : pressure             hPa : 1000 975 950 925 900 875 850 825 800 775 750 725 
                                   700 650 600 550 500 450 400 350 300 250 200 150 
                                   100 70 50 40 30 20 10 7 5 4 3 2 1 0.699999988 0.5 
                                   0.400000006 0.300000012 0.200000003 0.100000001 
                                   0.0700000003 0.0500000007 0.0399999991 0.0299999993 
                                   0.0199999996 
    2 : surface                  : 0 
  Time coordinate :  1 step
    RefTime =  2000-04-15 09:00:00  Units = minutes  Calendar = STANDARD
 YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
 2000-04-15 09:00:00
cdo sinfon: Processed 26 variables over 1 timestep ( 0.02s )

Information and Simple Statistics

The other operator, infon, provides more information and some useful statistics:

$ cdo infon stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4
   -1 :       Date     Time   Level Gridsize    Miss :     Minimum        Mean     Maximum : Parameter name
    1 : 2000-04-15 09:00:00    1000    16380    7283 :      0.0000  0.00055663    0.053927 : CLCN       
    2 : 2000-04-15 09:00:00     975    16380    3443 :      0.0000   0.0053316     0.19191 : CLCN       
    3 : 2000-04-15 09:00:00     950    16380    2381 :      0.0000    0.021698     0.52759 : CLCN       
    4 : 2000-04-15 09:00:00     925    16380    2008 :      0.0000    0.046928     0.62649 : CLCN       
    5 : 2000-04-15 09:00:00     900    16380    1730 :      0.0000    0.068160     0.76086 : CLCN       
    6 : 2000-04-15 09:00:00     875    16380    1488 :      0.0000    0.083106     0.78922 : CLCN       
    7 : 2000-04-15 09:00:00     850    16380    1340 :      0.0000    0.082435     0.85127 : CLCN       
    8 : 2000-04-15 09:00:00     825    16380    1216 :      0.0000    0.064390     0.79455 : CLCN       
    9 : 2000-04-15 09:00:00     800    16380    1135 :      0.0000    0.049025     0.67987 : CLCN       
   10 : 2000-04-15 09:00:00     775    16380    1024 :      0.0000    0.039010     0.71728 : CLCN       
   11 : 2000-04-15 09:00:00     750    16380     927 :      0.0000    0.033034     0.67065 : CLCN       
   12 : 2000-04-15 09:00:00     725    16380     800 :      0.0000    0.028919     0.82533 : CLCN       
   13 : 2000-04-15 09:00:00     700    16380     680 :      0.0000    0.025265     0.71002 : CLCN       
   14 : 2000-04-15 09:00:00     650    16380     161 :      0.0000    0.023027     0.60691 : CLCN       
   15 : 2000-04-15 09:00:00     600    16380      21 :      0.0000    0.029113     0.79990 : CLCN       
   16 : 2000-04-15 09:00:00     550    16380       2 :      0.0000    0.031865     0.86731 : CLCN       
   17 : 2000-04-15 09:00:00     500    16380       0 :      0.0000    0.021714     0.86754 : CLCN       
   18 : 2000-04-15 09:00:00     450    16380       0 :      0.0000    0.016821     0.80432 : CLCN       
   19 : 2000-04-15 09:00:00     400    16380       0 :      0.0000    0.017517     0.82690 : CLCN       
   20 : 2000-04-15 09:00:00     350    16380       0 :      0.0000    0.029577     0.84485 : CLCN       
   21 : 2000-04-15 09:00:00     300    16380       0 :      0.0000    0.037508     0.84544 : CLCN       
   22 : 2000-04-15 09:00:00     250    16380       0 :      0.0000    0.048294     0.80001 : CLCN       
   23 : 2000-04-15 09:00:00     200    16380       0 :      0.0000    0.053185     0.77885 : CLCN       
   24 : 2000-04-15 09:00:00     150    16380       0 :      0.0000    0.035487     0.83891 : CLCN       
   25 : 2000-04-15 09:00:00     100    16380       0 :      0.0000  0.00044752     0.19931 : CLCN       
   26 : 2000-04-15 09:00:00      70    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   27 : 2000-04-15 09:00:00      50    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   28 : 2000-04-15 09:00:00      40    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   29 : 2000-04-15 09:00:00      30    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   30 : 2000-04-15 09:00:00      20    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   31 : 2000-04-15 09:00:00      10    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   32 : 2000-04-15 09:00:00       7    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   33 : 2000-04-15 09:00:00       5    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   34 : 2000-04-15 09:00:00       4    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   35 : 2000-04-15 09:00:00       3    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   36 : 2000-04-15 09:00:00       2    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   37 : 2000-04-15 09:00:00       1    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   38 : 2000-04-15 09:00:00     0.7    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   39 : 2000-04-15 09:00:00     0.5    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   40 : 2000-04-15 09:00:00     0.4    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   41 : 2000-04-15 09:00:00     0.3    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   42 : 2000-04-15 09:00:00     0.2    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   43 : 2000-04-15 09:00:00     0.1    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   44 : 2000-04-15 09:00:00    0.07    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   45 : 2000-04-15 09:00:00    0.05    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   46 : 2000-04-15 09:00:00    0.04    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   47 : 2000-04-15 09:00:00    0.03    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   48 : 2000-04-15 09:00:00    0.02    16380       0 :      0.0000      0.0000      0.0000 : CLCN       
   49 : 2000-04-15 09:00:00    1000    16380    7283 :      0.0000    0.019079     0.97125 : CLLS       
   50 : 2000-04-15 09:00:00     975    16380    3443 :      0.0000    0.068198     0.94812 : CLLS       
   51 : 2000-04-15 09:00:00     950    16380    2381 :      0.0000    0.083635     0.93380 : CLLS       
   52 : 2000-04-15 09:00:00     925    16380    2008 :      0.0000    0.096190     0.83637 : CLLS       
...

As you can see, infon reports statistics as well as metadata. For each level of each variable, it gives the number of grid points (Gridsize), the number of missing values (Miss), and the Minimum, Mean, and Maximum.
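Because infon prints one whitespace-separated line per record, its output is easy to post-process. The following is a minimal sketch, not part of CDO: the function name levels_with_missing is hypothetical, and it assumes the column layout shown in the output above (record number, date, time, level, Gridsize, Miss, then the statistics and the name).

```shell
#!/bin/sh
# levels_with_missing (hypothetical helper): read `cdo infon` output on stdin
# and print "name level miss" for every record with at least one missing value.
# Assumes the 13-field record layout shown in the infon example.
levels_with_missing() {
  awk 'NF == 13 && $2 == ":" && $7 ~ /^[0-9]+$/ && ($7 + 0) > 0 {
         print $13, $5, $7
       }'
}

# Demo on two record lines copied from the infon output (records 1 and 17).
printf '%s\n' \
  '    1 : 2000-04-15 09:00:00    1000    16380    7283 :      0.0000  0.00055663    0.053927 : CLCN' \
  '   17 : 2000-04-15 09:00:00     500    16380       0 :      0.0000    0.021714     0.86754 : CLCN' |
  levels_with_missing
# prints: CLCN 1000 7283
```

With a real file, the assumed usage would be cdo infon file.nc4 | levels_with_missing, where file.nc4 is a placeholder.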

Diff two files

One of the main attractions of CDO is diffing two files. With binary files, such as the old restarts, one could use cmp or diff to check for differences. However, cmp is not useful on NetCDF files: even when the data are identical, the metadata almost certainly are not, so the files still differ byte for byte:

$ cmp stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
      mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
  mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 differ: byte 55, line 4


There already is a tool for doing this with HDF5/NetCDF4 files: h5diff. However, h5diff is slow (at least the version we have):

$ time h5diff stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
              mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
attribute: <Title of </>> and <Title of </>>
37 differences found
0.116u 0.412s 0:44.38 1.1%	0+0k 0+0io 0pf+0w

CDO, however, seems to be much more efficient at doing diff comparisons:

$ time cdo diffn stock-G40U-Intel11-2013Jun10-1day-c48.geosgcm_moist.20000415_0900z.nc4 \
                 mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4
 0 of 1201 records differ
cdo diffn: Processed 39344760 values from 52 variables over 2 timesteps ( 0.49s )
0.424u 0.068s 0:04.28 11.2%	0+0k 0+0io 0pf+0w

Of course, 44 seconds versus 4 seconds isn't much in absolute terms (and with smaller files, GPFS can sometimes 'cache' the files, making the comparison seem fast), but at c360 the gap grows. Let's diff two c360 NetCDF4 fvcore_internal_rst files:

$ time h5diff fvcore_internal_rst.control.nc4 fvcore_internal_rst.test.nc4
3.188u 6.824s 14:57.19 1.1%	0+0k 400+0io 0pf+0w
$ time cdo diffn fvcore_internal_rst.control.nc4 fvcore_internal_rst.test.nc4
  0 of 505 records differ
cdo diffn: Processed 785376000 values from 14 variables over 2 timesteps ( 18.76s )
17.125u 1.648s 2:33.97 12.1%	0+0k 0+0io 0pf+0w

This tells us two things: the h5diff we have is slow (about 15 minutes versus about 2.5 minutes for CDO), and you should develop at lower resolution if diffing restarts is part of your process.

Behavior when two files differ

All the above examples show two files that don't differ. If they do differ, CDO provides additional useful information like:

          For each pair of fields the operator prints one line with the following information:
          - Date and Time
          - Level, Gridsize and number of Missing values
          - Occurrence of coefficient pairs with different signs (S)
          - Occurrence of zero values (Z)
          - Maxima of absolute difference of coefficient pairs
          - Maxima of relative difference of non-zero coefficient pairs with equal signs
          - Parameter name

as seen here:

$ cdo diffn stock-G40U-CPU-2013Jun11-6hour-144.geosgcm_prog.20000415_0000z.nc4 stock-G40U-GPU-2013Jun11-6hour-144.geosgcm_prog.20000415_0000z.nc4
              Date     Time   Level Gridsize    Miss : S Z  Max_Absdiff Max_Reldiff : Parameter name
    1 : 2000-04-15 00:00:00    1000    13104    5606 : F T      0.64627    0.041020 : H          
    2 : 2000-04-15 00:00:00     975    13104    3202 : F F      0.64383     0.19488 : H          
    3 : 2000-04-15 00:00:00     950    13104    2394 : F F      0.64948   0.0014438 : H          
    4 : 2000-04-15 00:00:00     925    13104    2047 : F F      0.65057  0.00083438 : H          
    5 : 2000-04-15 00:00:00     900    13104    1793 : F F      0.67102  0.00067943 : H          
    6 : 2000-04-15 00:00:00     875    13104    1555 : F F      0.66370  0.00058364 : H          
    7 : 2000-04-15 00:00:00     850    13104    1401 : F F      0.65869  0.00043395 : H          
    8 : 2000-04-15 00:00:00     825    13104    1305 : F F      0.64941  0.00036629 : H          
    9 : 2000-04-15 00:00:00     800    13104    1188 : F F      0.63110  0.00033837 : H          
...snip...
  477 : 2000-04-15 00:00:00     0.2    13104       0 : F F     0.019527     0.32925 : V          
  478 : 2000-04-15 00:00:00     0.1    13104       0 : F F     0.059674     0.77863 : V          
  479 : 2000-04-15 00:00:00    0.07    13104       0 : F F     0.057358     0.46071 : V          
  480 : 2000-04-15 00:00:00    0.05    13104       0 : T F     0.021697     0.74741 : V          
  481 : 2000-04-15 00:00:00    0.04    13104       0 : T F     0.021576     0.40354 : V          
  482 : 2000-04-15 00:00:00    0.03    13104       0 : F F     0.020691     0.34921 : V          
  483 : 2000-04-15 00:00:00    0.02    13104       0 : T F     0.020302     0.62736 : V          
 438 of 483 records differ
 389 of 483 records differ more than 0.001
cdo diffn: Processed 12658464 values from 26 variables over 2 timesteps ( 0.17s )
$ echo $?
0
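The per-record lines in the diffn output above can be filtered to show only the records whose relative difference exceeds a tolerance. This is a minimal sketch, not a CDO feature: the function name records_over_reldiff is hypothetical, and it assumes the column layout shown above, with Max_Reldiff in the twelfth whitespace-separated field and the parameter name in the fourteenth.

```shell
#!/bin/sh
# records_over_reldiff (hypothetical helper): read `cdo diffn` output on stdin
# and print "name level Max_Reldiff" for each record whose Max_Reldiff is
# greater than the tolerance passed as the first argument.
records_over_reldiff() {
  awk -v tol="$1" '
    # Record lines: num : date time level gridsize miss : S Z absdiff reldiff : name
    NF == 14 && $2 == ":" && $13 == ":" && ($12 + 0) > (tol + 0) {
      print $14, $5, $12
    }'
}

# Demo on two record lines copied from the diffn output (records 1 and 3).
printf '%s\n' \
  '    1 : 2000-04-15 00:00:00    1000    13104    5606 : F T      0.64627    0.041020 : H' \
  '    3 : 2000-04-15 00:00:00     950    13104    2394 : F F      0.64948   0.0014438 : H' |
  records_over_reldiff 0.01
# prints: H 1000 0.041020
```

The header line and the summary lines have a different number of fields, so they are skipped by the NF == 14 test.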

NOTE: CDO returns a status of 0 whenever it doesn't encounter an error. Thus, if two files differ, CDO reports the differences, but the return status is still 0 because cdo technically completed successfully, as seen above. Any tests that depend on the return status of cmp for binary files therefore need to be altered for NetCDF4. One possible formulation is:

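# file1.nc4 and file2.nc4 are placeholders for the two files to compare.
# head -1 keeps only the first summary line ("N of M records differ").
set NUMDIFF = `cdo -s diffn file1.nc4 file2.nc4 | grep differ | head -1 | awk '{print $1}'`
if ( "$NUMDIFF" == "0" ) then
  echo "success"
else
  echo "failure"
endif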
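For scripts written in sh or bash rather than csh, the same check can be sketched as below. The function name count_differing and the file names in the usage comment are hypothetical, and the sketch assumes the summary line has the form "N of M records differ" as in the output shown earlier.

```shell
#!/bin/sh
# count_differing (hypothetical helper): read `cdo -s diffn` output on stdin
# and print N from the first "N of M records differ" summary line.
# Prints "unknown" if no such line is found.
count_differing() {
  awk '/records differ *$/ { print $1; found = 1; exit }
       END { if (!found) print "unknown" }'
}

# Demo on the two summary lines copied from the differing-files example.
printf '%s\n' ' 438 of 483 records differ' \
              ' 389 of 483 records differ more than 0.001' | count_differing
# prints: 438

# Assumed usage with real files (placeholder names):
#   n=$(cdo -s diffn control.nc4 test.nc4 | count_differing)
#   if [ "$n" = 0 ]; then echo success; else echo failure; fi
```

Matching only lines that end in "records differ" skips the second "differ more than 0.001" line, so the result is always a single value.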

Extract field(s) from a file

Extract variable(s) from a file

Often, our NetCDF4 files have many variables and we only care about one. CDO allows one to extract or select one or more variables. For example, if you only want CLCN, use selname:

$ cdo selname,CLCN mat-WW3-G40U-2013Jun10-NOWAVE-1day-c48.geosgcm_moist.20000415_0900z.nc4 onlyclcn.nc4
cdo selname: Processed 786240 values from 26 variables over 1 timestep ( 0.03s )
$ cdo sinfon onlyclcn.nc4
  File format: netCDF4
   -1 : Institut Source   Ttype    Levels Num  Gridsize Num Dtype : Parameter name
    1 : unknown  unknown  instant      48   1     16380   1  F32  : CLCN       
  Grid coordinates :
    1 : lonlat       > size      : dim = 16380  nx = 180  ny = 91
                       lon       : first = -180  last = 178  inc = 2  degrees_east  circular
                       lat       : first = -90  last = 90  inc = 2  degrees_north
  Vertical coordinates :
    1 : pressure             hPa : 1000 975 950 925 900 875 850 825 800 775 750 725 
                                   700 650 600 550 500 450 400 350 300 250 200 150 
                                   100 70 50 40 30 20 10 7 5 4 3 2 1 0.699999988 0.5 
                                   0.400000006 0.300000012 0.200000003 0.100000001 
                                   0.0700000003 0.0500000007 0.0399999991 0.0299999993 
                                   0.0199999996 
  Time coordinate :  1 step
    RefTime =  2000-04-15 09:00:00  Units = minutes  Calendar = standard
 YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
 2000-04-15 09:00:00
cdo sinfon: Processed 1 variable over 1 timestep ( 0.00s )

Extract time(s) from a file

To extract a single year (or multiple years) from a multi-step file, you can use selyear,year. For multiple years, you can either list them, as in cdo selyear,1999,2000,2001, or give a range, as in cdo selyear,1999/2001.

$ cdo sinfon pchem.species.CMIP-5.1870-2097.z_91x72.nc4
  File format: netCDF4
    -1 : Institut Source   Ttype    Levels Num  Gridsize Num Dtype : Parameter name
     1 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : OX         
     2 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : N2O        
     3 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : CFC11      
     4 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : CFC12      
     5 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : CH4        
     6 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : HCFC22     
     7 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : H2O        
   Grid coordinates :
    1 : lonlat       > size      : dim = 91  nx = 0  ny = 91
                       lat       : first = -1.57079637  last = 1.57079625  inc = 0.0349065065  radians
  Vertical coordinates :
    1 : generic            layer : 1.5 2.63499999 4.01500034 5.67999983 7.76499939 
                                   10.4499998 13.960001 18.5400009 24.4899979 32.1749992 
                                   42.0400009 54.6300011 70.5950012 90.7249985 115.995003 
                                   147.565002 186.790009 235.26001 294.829987 367.649994 
                                   456.169983 563.179993 691.830017 845.63501 1028.49011 
                                   1246.01501 1505.02502 1812.43494 2176.09985 2604.90991 
                                   3108.89014 3699.26978 4390.96533 5201.58984 6149.56494 
                                   7255.78467 8543.89941 10051.4355 11825 13911.501 
                                   16366.1504 19254.0977 22651.3496 26647.9004 31279.1504 
                                   35625 39375 43125 46875 50625 54375 58125 61875 
                                   65625 69375 73125.0156 76250 78750 81250.0156 83750.0156 
                                   85750.0156 87250.0078 88750 90249.9844 91749.9844 
                                   93250.0078 94750 96249.9766 97374.9766 98124.9922 
                                   98875 99625 
  Time coordinate :  2736 steps
    RefTime =  1870-01-15 12:00:00  Units = hours  Calendar = standard
 YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
 1870-01-15 12:00:00  1870-02-14 22:00:00  1870-03-17 08:00:00  1870-04-16 18:00:00
 1870-05-17 04:00:00  1870-06-16 14:00:00  1870-07-17 00:00:00  1870-08-16 10:00:00
 1870-09-15 20:00:00  1870-10-16 06:00:00  1870-11-15 16:00:00  1870-12-16 02:00:00
 1871-01-15 12:00:00  1871-02-14 22:00:00  1871-03-17 08:00:00  1871-04-16 18:00:00
...snip...
 2096-07-21 20:00:00  2096-08-21 06:00:00  2096-09-20 16:00:00  2096-10-21 02:00:00
 2096-11-20 12:00:00  2096-12-20 22:00:00  2097-01-20 08:00:00  2097-02-19 18:00:00
 2097-03-22 04:00:00  2097-04-21 14:00:00  2097-05-22 00:00:00  2097-06-21 10:00:00
 2097-07-21 20:00:00  2097-08-21 06:00:00  2097-09-20 16:00:00  2097-10-21 02:00:00
cdo sinfon: Processed 7 variables over 2736 timesteps ( 0.11s )

$ cdo selyear,1999,2000,2001 pchem.species.CMIP-5.1870-2097.z_91x72.nc4 ~/test.nc4
cdo selyear: Processed 1651104 values from 7 variables over 2736 timesteps ( 0.73s )

$ cdo sinfon ~/test.nc4
  File format: netCDF4
    -1 : Institut Source   Ttype    Levels Num  Gridsize Num Dtype : Parameter name
     1 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : OX         
     2 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : N2O        
     3 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : CFC11      
     4 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : CFC12      
     5 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : CH4        
     6 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : HCFC22     
     7 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant      72   1        91   1  F32  : H2O        
   Grid coordinates :
    1 : lonlat       > size      : dim = 91  nx = 0  ny = 91
                       lat       : first = -1.57079637  last = 1.57079625  inc = 0.0349065065  radians
  Vertical coordinates :
    1 : generic            layer : 1.5 2.63499999 4.01500034 5.67999983 7.76499939 
                                   10.4499998 13.960001 18.5400009 24.4899979 32.1749992 
                                   42.0400009 54.6300011 70.5950012 90.7249985 115.995003 
                                   147.565002 186.790009 235.26001 294.829987 367.649994 
                                   456.169983 563.179993 691.830017 845.63501 1028.49011 
                                   1246.01501 1505.02502 1812.43494 2176.09985 2604.90991 
                                   3108.89014 3699.26978 4390.96533 5201.58984 6149.56494 
                                   7255.78467 8543.89941 10051.4355 11825 13911.501 
                                   16366.1504 19254.0977 22651.3496 26647.9004 31279.1504 
                                   35625 39375 43125 46875 50625 54375 58125 61875 
                                   65625 69375 73125.0156 76250 78750 81250.0156 83750.0156 
                                   85750.0156 87250.0078 88750 90249.9844 91749.9844 
                                   93250.0078 94750 96249.9766 97374.9766 98124.9922 
                                   98875 99625 
  Time coordinate :  36 steps
    RefTime =  1870-01-15 12:00:00  Units = hours  Calendar = standard
 YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
 1999-01-14 22:00:00  1999-02-14 08:00:00  1999-03-16 18:00:00  1999-04-16 04:00:00
 1999-05-16 14:00:00  1999-06-16 00:00:00  1999-07-16 10:00:00  1999-08-15 20:00:00
 1999-09-15 06:00:00  1999-10-15 16:00:00  1999-11-15 02:00:00  1999-12-15 12:00:00
 2000-01-14 22:00:00  2000-02-14 08:00:00  2000-03-15 18:00:00  2000-04-15 04:00:00
 2000-05-15 14:00:00  2000-06-15 00:00:00  2000-07-15 10:00:00  2000-08-14 20:00:00
 2000-09-14 06:00:00  2000-10-14 16:00:00  2000-11-14 02:00:00  2000-12-14 12:00:00
 2001-01-13 22:00:00  2001-02-13 08:00:00  2001-03-15 18:00:00  2001-04-15 04:00:00
 2001-05-15 14:00:00  2001-06-15 00:00:00  2001-07-15 10:00:00  2001-08-14 20:00:00
 2001-09-14 06:00:00  2001-10-14 16:00:00  2001-11-14 02:00:00  2001-12-14 12:00:00
cdo sinfon: Processed 7 variables over 36 timesteps ( 0.00s )

Other select operators

CDO has many selection operators:

 selparam	 Select parameters by identifier
 delparam	 Delete parameters by identifier
 selcode	 Select parameters by code number
 delcode	 Delete parameters by code number
 selname	 Select parameters by name
 delname	 Delete parameters by name
 selstdname	 Select parameters by standard name
 sellevel	 Select levels
 sellevidx	 Select levels by index
 selgrid	 Select grids
 selzaxis	 Select z-axes
 selltype	 Select GRIB level types
 seltabnum	 Select parameter table numbers
 seltimestep	 Select timesteps
 seltime	 Select times
 selhour	 Select hours
 selday 	 Select days
 selmon 	 Select months
 selyear	 Select years
 selseas	 Select seasons
 seldate	 Select dates
 selsmon	 Select single month
 sellonlatbox	 Select a longitude/latitude box
 selindexbox	 Select an index box

Combining Operators

Often, you want to do multiple operations on a file. You could, say, do a selname and output only one variable to a file, then a sellevel on that new file to select a single level, and then a selyear on that, etc.:

$ cdo selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.nc4
cdo selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 9.60s )
$ cdo sellevel,1.5 onlyOX.nc4 onlyOX.only1.5.nc4
cdo sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 0.76s )
$ cdo selyear,1999 onlyOX.only1.5.nc4 onlyOX.only1.5.only1999.nc4
cdo selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 0.05s )
$ cdo sinfon onlyOX.only1.5.only1999.nc4
  File format: netCDF4
   -1 : Institut Source   Ttype    Levels Num  Gridsize Num Dtype : Parameter name
    1 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant       1   1        91   1  F32  : OX         
  Grid coordinates :
    1 : lonlat       > size      : dim = 91  nx = 0  ny = 91
                       lat       : first = -1.57079637  last = 1.57079625  inc = 0.0349065065  radians
  Vertical coordinates :
    1 : generic            layer : 1.5 
  Time coordinate :  12 steps
    RefTime =  1870-01-15 12:00:00  Units = hours  Calendar = standard
 YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
 1999-01-14 22:00:00  1999-02-14 08:00:00  1999-03-16 18:00:00  1999-04-16 04:00:00
 1999-05-16 14:00:00  1999-06-16 00:00:00  1999-07-16 10:00:00  1999-08-15 20:00:00
 1999-09-15 06:00:00  1999-10-15 16:00:00  1999-11-15 02:00:00  1999-12-15 12:00:00
cdo sinfon: Processed 1 variable over 12 timesteps ( 0.00s )

Of course, this is not only annoying but wasteful, as you are creating many temporary files. Instead, CDO allows one to "combine" or "chain" operators. This is done by prefixing each chained operator with a dash (-operator); the operators are applied from right to left, starting with the one next to the input file:

 cdo -L operatorN -operatorN-1 ... -operator2 -operator1 input (output)

The -L is used because HDF5 isn't thread-safe as currently compiled. It "locks" I/O, which prevents thread-safety problems. We are working on getting CDO to work better in parallel.
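Since the chain is written in the reverse of the order in which the operators run, it can be convenient to build it from a list given in application order. The following is a minimal sketch; the function name chain_ops is hypothetical and not part of CDO.

```shell
#!/bin/sh
# chain_ops (hypothetical helper): take operators in the order they should be
# applied and print them in CDO chain syntax, i.e. reversed, with every
# operator except the leftmost one prefixed by a dash.
# Note: uses the shell variables "chain" and "op" as scratch space.
chain_ops() {
  chain=""
  for op in "$@"; do
    if [ -z "$chain" ]; then
      chain="$op"          # first operator applied ends up rightmost
    else
      chain="$op -$chain"  # later operators are prepended
    fi
  done
  printf '%s\n' "$chain"
}

# Demo: select the variable first, then the level, then the year.
chain_ops selname,OX sellevel,1.5 selyear,1999
# prints: selyear,1999 -sellevel,1.5 -selname,OX
```

The printed string can then be placed between cdo -L and the input file name, for example cdo -L $(chain_ops selname,OX sellevel,1.5 selyear,1999) input.nc4 output.nc4, where the file names are placeholders and the command substitution is left unquoted on purpose so that it splits into separate arguments.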

So, doing the above operations in one step:

$ cdo -L selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX-1.5-1999.nc4
cdo selyear: Started child process "sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe1.1)".
cdo(2) sellevel: Started child process "selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe2.1)".
cdo(3) selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 3.18s )
cdo(2) sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 3.18s )
cdo selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 3.18s )
$ cdo sinfon onlyOX-1.5-1999.nc4
  File format: netCDF4
   -1 : Institut Source   Ttype    Levels Num  Gridsize Num Dtype : Parameter name
    1 : unknown  http://geos5.org/wiki/index.php?title=GEOS-5_Configuration_for_AR5 instant       1   1        91   1  F32  : OX         
  Grid coordinates :
    1 : lonlat       > size      : dim = 91  nx = 0  ny = 91
                       lat       : first = -1.57079637  last = 1.57079625  inc = 0.0349065065  radians
  Vertical coordinates :
    1 : generic            layer : 1.5 
  Time coordinate :  12 steps
    RefTime =  1870-01-15 12:00:00  Units = hours  Calendar = standard
 YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss  YYYY-MM-DD hh:mm:ss
 1999-01-14 22:00:00  1999-02-14 08:00:00  1999-03-16 18:00:00  1999-04-16 04:00:00
 1999-05-16 14:00:00  1999-06-16 00:00:00  1999-07-16 10:00:00  1999-08-15 20:00:00
 1999-09-15 06:00:00  1999-10-15 16:00:00  1999-11-15 02:00:00  1999-12-15 12:00:00
cdo sinfon: Processed 1 variable over 12 timesteps ( 0.00s )

This file is the same as the one done in three steps:

$ cdo diffn onlyOX.only1.5.only1999.nc4 onlyOX-1.5-1999.nc4
 0 of 12 records differ
cdo diffn: Processed 2184 values from 2 variables over 24 timesteps ( 0.00s )

Of course, you could even do the diffn as part of the same chained command:

$ cdo -L diffn -selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.only1.5.only1999.nc4
cdo diffn: Started child process "selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe1.1)".
cdo(2) selyear: Started child process "sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe2.1)".
cdo(3) sellevel: Started child process "selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 (pipe3.1)".
 0 of 12 records differ
cdo(4) selname: Processed 17926272 values from 7 variables over 2736 timesteps ( 3.13s )
cdo(3) sellevel: Processed 248976 values from 1 variable over 2736 timesteps ( 3.13s )
cdo(2) selyear: Processed 1092 values from 1 variable over 2736 timesteps ( 3.13s )
cdo diffn: Processed 2184 values from 2 variables over 24 timesteps ( 3.13s )

Note: if you don't want the extraneous information, use -s to enable silent mode:

$ cdo -s -L diffn -selyear,1999 -sellevel,1.5 -selname,OX pchem.species.CMIP-5.1870-2097.z_91x72.nc4 onlyOX.only1.5.only1999.nc4
 0 of 12 records differ

TkCVS

The main website for TkCVS is http://www.twobarleycorns.net/tkcvs.html.

On discover, you can use TkCVS by loading the other/tkcvs-8.2.3 module:

module load other/tkcvs-8.2.3

ack

The main website for ack is http://beyondgrep.com/.

If you'd like to try ack out, you can run:

 curl http://beyondgrep.com/ack-2.04-single-file > ~/bin/ack && chmod 0755 !#:3

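and it will install ack for you in your local bin directory. The !#:3 is shell history expansion for the fourth word of the current command line (here ~/bin/ack), so the command as written only works when typed at an interactive prompt.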