Copernicus Marine for GFTS#

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import intake
import hvplot.xarray  # noqa
# catalog_url specifies the URL for the catalog for reference data used.
catalog_url = "https://data-taos.ifremer.fr/kerchunk/ref-copernicus.yaml"

Open intake catalog#

cat = intake.open_catalog(catalog_url)

Get model data#

  • Data model is read as Xarray Dataset e.g. only metadata is loaded in memory.

chunks = {"lat": -1, "lon": -1, "depth": 11, "time": 8}

Get Temperature#

model = cat.data(type="TEM", chunks=chunks).to_dask()
model
/srv/conda/envs/notebook/lib/python3.12/site-packages/intake_xarray/base.py:21: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  'dims': dict(self._ds.dims),
<xarray.Dataset> Size: 13TB
Dimensions:  (depth: 33, lat: 1240, lon: 958, time: 41952)
Coordinates:
  * depth    (depth) float32 132B 0.0 3.0 5.0 10.0 ... 2e+03 3e+03 4e+03 5e+03
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
  * time     (time) datetime64[ns] 336kB 2019-01-01T01:00:00 ... 2023-10-15
Data variables:
    thetao   (time, depth, lat, lon) float64 13TB dask.array<chunksize=(8, 11, 1240, 958), meta=np.ndarray>
Attributes: (12/13)
    Conventions:          CF-1.7
    contact:              servicedesk.cmems@mercator-ocean.eu
    creation_date:        2020-09-30T00:08:59Z
    credit:               E.U. Copernicus Marine Service Information (CMEMS)
    forcing_data_source:  ECMWF Global Atmospheric Model (HRES); UKMO NATL12;...
    history:              See source and creation_date attributes
    ...                   ...
    licence:              http://marine.copernicus.eu/services-portfolio/serv...
    netcdf-version-id:    netCDF-4
    product:              NORTHWESTSHELF_ANALYSIS_FORECAST_PHY_004_013
    references:           http://marine.copernicus.eu/
    source:               PS-OS 44, AMM-FOAM 1.5 km (tidal) NEMO v3.6_WAVEWAT...
    title:                hourly-instantaneous potential temperature (3D)

Standardize metadata#

  • We change the metadata of the dataset to make it easier for end-users

Rename variables#

model = model.rename({"thetao": "TEMP"})
model
<xarray.Dataset> Size: 13TB
Dimensions:  (depth: 33, lat: 1240, lon: 958, time: 41952)
Coordinates:
  * depth    (depth) float32 132B 0.0 3.0 5.0 10.0 ... 2e+03 3e+03 4e+03 5e+03
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
  * time     (time) datetime64[ns] 336kB 2019-01-01T01:00:00 ... 2023-10-15
Data variables:
    TEMP     (time, depth, lat, lon) float64 13TB dask.array<chunksize=(8, 11, 1240, 958), meta=np.ndarray>
Attributes: (12/13)
    Conventions:          CF-1.7
    contact:              servicedesk.cmems@mercator-ocean.eu
    creation_date:        2020-09-30T00:08:59Z
    credit:               E.U. Copernicus Marine Service Information (CMEMS)
    forcing_data_source:  ECMWF Global Atmospheric Model (HRES); UKMO NATL12;...
    history:              See source and creation_date attributes
    ...                   ...
    licence:              http://marine.copernicus.eu/services-portfolio/serv...
    netcdf-version-id:    netCDF-4
    product:              NORTHWESTSHELF_ANALYSIS_FORECAST_PHY_004_013
    references:           http://marine.copernicus.eu/
    source:               PS-OS 44, AMM-FOAM 1.5 km (tidal) NEMO v3.6_WAVEWAT...
    title:                hourly-instantaneous potential temperature (3D)

Get other model variables into our dataset#

  • XE: Sea surface height above geoid

  • HO: Bathymetry

  • mask: Land-Sea mask

model = model.assign(
    {
        "XE": cat.data(type="SSH", chunks=chunks).to_dask().get("zos"),
        "H0": (
            cat.data_tmp(type="mdt", chunks=chunks)
            .to_dask()
            .get("deptho")
            .rename({"latitude": "lat", "longitude": "lon"})
        ),
        "mask": (
            cat.data_tmp(type="mdt", chunks=chunks)
            .to_dask()
            .get("mask")
            .rename({"latitude": "lat", "longitude": "lon"})
        ),
    }
)
model
/srv/conda/envs/notebook/lib/python3.12/site-packages/intake_xarray/base.py:21: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  'dims': dict(self._ds.dims),
/srv/conda/envs/notebook/lib/python3.12/site-packages/xarray/core/dataset.py:277: UserWarning: The specified chunks separate the stored chunks along dimension "depth" starting at index 11. This could degrade performance. Instead, consider rechunking after loading.
  warnings.warn(
<xarray.Dataset> Size: 14TB
Dimensions:  (depth: 33, lat: 1240, lon: 958, time: 41952)
Coordinates:
  * depth    (depth) float32 132B 0.0 3.0 5.0 10.0 ... 2e+03 3e+03 4e+03 5e+03
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
  * time     (time) datetime64[ns] 336kB 2019-01-01T01:00:00 ... 2023-10-15
Data variables:
    TEMP     (time, depth, lat, lon) float64 13TB dask.array<chunksize=(8, 11, 1240, 958), meta=np.ndarray>
    XE       (time, lat, lon) float64 399GB dask.array<chunksize=(8, 1240, 958), meta=np.ndarray>
    H0       (lat, lon) float32 5MB dask.array<chunksize=(620, 479), meta=np.ndarray>
    mask     (depth, lat, lon) float32 157MB dask.array<chunksize=(11, 1240, 958), meta=np.ndarray>
Attributes: (12/13)
    Conventions:          CF-1.7
    contact:              servicedesk.cmems@mercator-ocean.eu
    creation_date:        2020-09-30T00:08:59Z
    credit:               E.U. Copernicus Marine Service Information (CMEMS)
    forcing_data_source:  ECMWF Global Atmospheric Model (HRES); UKMO NATL12;...
    history:              See source and creation_date attributes
    ...                   ...
    licence:              http://marine.copernicus.eu/services-portfolio/serv...
    netcdf-version-id:    netCDF-4
    product:              NORTHWESTSHELF_ANALYSIS_FORECAST_PHY_004_013
    references:           http://marine.copernicus.eu/
    source:               PS-OS 44, AMM-FOAM 1.5 km (tidal) NEMO v3.6_WAVEWAT...
    title:                hourly-instantaneous potential temperature (3D)

Compute Dynamic depth and dynamic bathymetry#

  • Apply a simple formula and set the attributes (metadata) accordingly

model = model.assign(
    {
        "dynamic_depth": lambda ds: (ds["depth"] + ds["XE"]).assign_attrs(
            {
                "units": "m",
                "positive": "down",
                "long_name": "Dynamic Depth",
                "standard_name": "dynamic_depth",
            }
        ),
        "dynamic_bathymetry": lambda ds: (ds["H0"] + ds["XE"]).assign_attrs(
            {
                "units": "m",
                "positive": "down",
                "long_name": "Dynamic Bathymetry",
                "standard_name": "dynamic_bathymetry",
            }
        ),
    }
)

What is xarray?#

Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like multi-dimensional arrays, which allows for a more intuitive, more concise, and less error-prone developer experience.

How is xarray structured?#

Xarray has two core data structures, which build upon and extend the core strengths of NumPy and Pandas libraries. Both data structures are fundamentally N-dimensional:

  • DataArray is the implementation of a labeled, N-dimensional array. It is an N-D generalization of a Pandas.Series. The name DataArray itself is borrowed from Fernando Perez’s datarray project, which prototyped a similar data structure.

  • Dataset is a multi-dimensional, in-memory array database. It is a dict-like container of DataArray objects aligned along any number of shared dimensions, and serves a similar purpose in xarray as the pandas.DataFrame.

model
<xarray.Dataset> Size: 27TB
Dimensions:             (depth: 33, lat: 1240, lon: 958, time: 41952)
Coordinates:
  * depth               (depth) float32 132B 0.0 3.0 5.0 ... 3e+03 4e+03 5e+03
  * lat                 (lat) float32 5kB 46.0 46.01 46.03 ... 62.72 62.73 62.74
  * lon                 (lon) float32 4kB -16.0 -15.97 -15.94 ... 12.97 13.0
  * time                (time) datetime64[ns] 336kB 2019-01-01T01:00:00 ... 2...
Data variables:
    TEMP                (time, depth, lat, lon) float64 13TB dask.array<chunksize=(8, 11, 1240, 958), meta=np.ndarray>
    XE                  (time, lat, lon) float64 399GB dask.array<chunksize=(8, 1240, 958), meta=np.ndarray>
    H0                  (lat, lon) float32 5MB dask.array<chunksize=(620, 479), meta=np.ndarray>
    mask                (depth, lat, lon) float32 157MB dask.array<chunksize=(11, 1240, 958), meta=np.ndarray>
    dynamic_depth       (depth, time, lat, lon) float64 13TB dask.array<chunksize=(33, 8, 1240, 958), meta=np.ndarray>
    dynamic_bathymetry  (lat, lon, time) float64 399GB dask.array<chunksize=(620, 479, 8), meta=np.ndarray>
Attributes: (12/13)
    Conventions:          CF-1.7
    contact:              servicedesk.cmems@mercator-ocean.eu
    creation_date:        2020-09-30T00:08:59Z
    credit:               E.U. Copernicus Marine Service Information (CMEMS)
    forcing_data_source:  ECMWF Global Atmospheric Model (HRES); UKMO NATL12;...
    history:              See source and creation_date attributes
    ...                   ...
    licence:              http://marine.copernicus.eu/services-portfolio/serv...
    netcdf-version-id:    netCDF-4
    product:              NORTHWESTSHELF_ANALYSIS_FORECAST_PHY_004_013
    references:           http://marine.copernicus.eu/
    source:               PS-OS 44, AMM-FOAM 1.5 km (tidal) NEMO v3.6_WAVEWAT...
    title:                hourly-instantaneous potential temperature (3D)

Accessing Coordinates and Data Variables#

DataArray, within Datasets, can be accessed through:

  • the dot notation like Dataset.NameofVariable

  • or using square brackets, like Dataset[‘NameofVariable’] (NameofVariable needs to be a string so use quotes or double quotes)

model.TEMP
<xarray.DataArray 'TEMP' (time: 41952, depth: 33, lat: 1240, lon: 958)> Size: 13TB
dask.array<open_dataset-thetao, shape=(41952, 33, 1240, 958), dtype=float64, chunksize=(8, 11, 1240, 958), chunktype=numpy.ndarray>
Coordinates:
  * depth    (depth) float32 132B 0.0 3.0 5.0 10.0 ... 2e+03 3e+03 4e+03 5e+03
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
  * time     (time) datetime64[ns] 336kB 2019-01-01T01:00:00 ... 2023-10-15
Attributes:
    long_name:      Sea Water Potential Temperature
    standard_name:  sea_water_potential_temperature
    units:          degrees_C
    valid_max:      30000
    valid_min:      -30000
model["TEMP"]
<xarray.DataArray 'TEMP' (time: 41952, depth: 33, lat: 1240, lon: 958)> Size: 13TB
dask.array<open_dataset-thetao, shape=(41952, 33, 1240, 958), dtype=float64, chunksize=(8, 11, 1240, 958), chunktype=numpy.ndarray>
Coordinates:
  * depth    (depth) float32 132B 0.0 3.0 5.0 10.0 ... 2e+03 3e+03 4e+03 5e+03
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
  * time     (time) datetime64[ns] 336kB 2019-01-01T01:00:00 ... 2023-10-15
Attributes:
    long_name:      Sea Water Potential Temperature
    standard_name:  sea_water_potential_temperature
    units:          degrees_C
    valid_max:      30000
    valid_min:      -30000

Same can view the attributes only with DataArray.attrs ; this will return a dictionary.

model.TEMP.attrs
{'long_name': 'Sea Water Potential Temperature',
 'standard_name': 'sea_water_potential_temperature',
 'units': 'degrees_C',
 'valid_max': 30000,
 'valid_min': -30000}

Check metadata of each variable#

model.TEMP
<xarray.DataArray 'TEMP' (time: 41952, depth: 33, lat: 1240, lon: 958)> Size: 13TB
dask.array<open_dataset-thetao, shape=(41952, 33, 1240, 958), dtype=float64, chunksize=(8, 11, 1240, 958), chunktype=numpy.ndarray>
Coordinates:
  * depth    (depth) float32 132B 0.0 3.0 5.0 10.0 ... 2e+03 3e+03 4e+03 5e+03
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
  * time     (time) datetime64[ns] 336kB 2019-01-01T01:00:00 ... 2023-10-15
Attributes:
    long_name:      Sea Water Potential Temperature
    standard_name:  sea_water_potential_temperature
    units:          degrees_C
    valid_max:      30000
    valid_min:      -30000
model.XE
<xarray.DataArray 'XE' (time: 41952, lat: 1240, lon: 958)> Size: 399GB
dask.array<open_dataset-zos, shape=(41952, 1240, 958), dtype=float64, chunksize=(8, 1240, 958), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
  * time     (time) datetime64[ns] 336kB 2019-01-01T01:00:00 ... 2023-10-15
Attributes:
    long_name:      Sea surface height above geoid
    standard_name:  sea_surface_height_above_geoid
    units:          m
    valid_max:      10000
    valid_min:      -10000
model.H0
<xarray.DataArray 'H0' (lat: 1240, lon: 958)> Size: 5MB
dask.array<open_dataset-deptho, shape=(1240, 958), dtype=float32, chunksize=(620, 479), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
Attributes:
    long_name:      Bathymetry
    standard_name:  sea_floor_depth_below_geoid
    units:          m
model.mask
<xarray.DataArray 'mask' (depth: 33, lat: 1240, lon: 958)> Size: 157MB
dask.array<open_dataset-mask, shape=(33, 1240, 958), dtype=float32, chunksize=(11, 1240, 958), chunktype=numpy.ndarray>
Coordinates:
  * depth    (depth) float32 132B 0.0 3.0 5.0 10.0 ... 2e+03 3e+03 4e+03 5e+03
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
Attributes:
    long_name:      Land-sea mask: sea = 1 ; land = 0
    standard_name:  sea_binary_mask
    units:          1
model.dynamic_depth
<xarray.DataArray 'dynamic_depth' (depth: 33, time: 41952, lat: 1240, lon: 958)> Size: 13TB
dask.array<add, shape=(33, 41952, 1240, 958), dtype=float64, chunksize=(33, 8, 1240, 958), chunktype=numpy.ndarray>
Coordinates:
  * depth    (depth) float32 132B 0.0 3.0 5.0 10.0 ... 2e+03 3e+03 4e+03 5e+03
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
  * time     (time) datetime64[ns] 336kB 2019-01-01T01:00:00 ... 2023-10-15
Attributes:
    units:          m
    positive:       down
    long_name:      Dynamic Depth
    standard_name:  dynamic_depth
model.dynamic_bathymetry
<xarray.DataArray 'dynamic_bathymetry' (lat: 1240, lon: 958, time: 41952)> Size: 399GB
dask.array<add, shape=(1240, 958, 41952), dtype=float64, chunksize=(620, 479, 8), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
  * time     (time) datetime64[ns] 336kB 2019-01-01T01:00:00 ... 2023-10-15
Attributes:
    units:          m
    positive:       down
    long_name:      Dynamic Bathymetry
    standard_name:  dynamic_bathymetry

Xarray and Memory usage#

Once a Data Array|Set is opened, xarray loads into memory only the coordinates and all the metadata needed to describe it. The underlying data, the component written into the datastore, are loaded into memory as a NumPy array, only once directly accessed; once in there, it will be kept to avoid re-readings. This brings the fact that it is good practice to have a look to the size of the data before accessing it. A classical mistake is to try loading arrays bigger than the memory with the obvious result of killing a notebook Kernel or Python process. If the dataset does not fit in the available memory, then the only option will be to load it through the chunking; later on, in the tutorial ‘chunking_introduction’, we will introduce this concept.

As the size of the data is too big to fit into memory, we need to chunk the data (and this is where we can use dask to parallelise.

model.TEMP.data
Array Chunk
Bytes 11.97 TiB 797.55 MiB
Shape (41952, 33, 1240, 958) (8, 11, 1240, 958)
Dask graph 15732 chunks in 2 graph layers
Data type float64 numpy.ndarray
41952 1 958 1240 33

Select and Plot#

As it is not easy to remember the order of dimensions, Xarray really helps by making it possible to select the position using names:

.isel -> selection based on positional index .sel -> selection based on coordinate values

model.TEMP.isel(time=0, depth=0)
<xarray.DataArray 'TEMP' (lat: 1240, lon: 958)> Size: 10MB
dask.array<getitem, shape=(1240, 958), dtype=float64, chunksize=(1240, 958), chunktype=numpy.ndarray>
Coordinates:
    depth    float32 4B 0.0
  * lat      (lat) float32 5kB 46.0 46.01 46.03 46.04 ... 62.7 62.72 62.73 62.74
  * lon      (lon) float32 4kB -16.0 -15.97 -15.94 -15.91 ... 12.94 12.97 13.0
    time     datetime64[ns] 8B 2019-01-01T01:00:00
Attributes:
    long_name:      Sea Water Potential Temperature
    standard_name:  sea_water_potential_temperature
    units:          degrees_C
    valid_max:      30000
    valid_min:      -30000
model.TEMP.isel(time=0, depth=0).plot(cmap="coolwarm")
<matplotlib.collections.QuadMesh at 0x7f9b9205bc50>
_images/e079a24ab1611002b6d4925bbd241d92b09443ee7ed0c09673c2036a674ba687.png

Reproject data#

fig = plt.figure(1, figsize=[20, 10])

# We're using cartopy and are plotting in PlateCarree projection
# (see documentation on cartopy)
ax = plt.subplot(1, 1, 1, projection=ccrs.PlateCarree())
ax.coastlines(resolution="10m")
ax.gridlines(draw_labels=True)

model["TEMP"].sel(time="2023-10-14T22:00:00.000000000").isel(depth=0).plot(
    ax=ax, transform=ccrs.PlateCarree()
)

# One way to customize your title
plt.title("Copernicus Marine", fontsize=18)
Text(0.5, 1.0, 'Copernicus Marine')
_images/928010a6c67c8e238aaec6c7c6eae0d4e4f441e816ba66a14573231e8432cc4f.png
fig = plt.figure(1, figsize=[20, 10])

# We're using cartopy and are plotting in PlateCarree projection
# (see documentation on cartopy)
ax = plt.subplot(1, 1, 1, projection=ccrs.AlbersEqualArea())
ax.coastlines(resolution="10m")
ax.gridlines(draw_labels=True)

model["TEMP"].sel(time="2023-10-14T22:00:00.000000000").isel(depth=0).plot(
    ax=ax, transform=ccrs.PlateCarree()
)

# One way to customize your title
plt.title("Copernicus Marine", fontsize=18)
Text(0.5, 1.0, 'Copernicus Marine')
_images/573b8cfe8ee9233c6c5713e6c521514a1a6368b3e6913a9ed38167aa16bb2949.png

Interactive visualization with holoviews#

model.isel(time=0, depth=0).hvplot(cmap="coolwarm", width=1000, height=1000)

Having a look to data distribution can reveal a lot about the data.

model["TEMP"][0].hvplot.hist(cmap="coolwarm", bins=25, width=800, height=700)

Multi-plots using groupby#

To be able to visualize interactively all the different available times, we can use groupby time.

model["TEMP"].isel(depth=0).hvplot(
    groupby="time", cmap="coolwarm", width=800, height=700
)