ERA5 hourly data on single levels from 1940 to present
This notebook shows how to authenticate with the DestinE API, query and download ERA5 single-level reanalysis data using the DEDL HDA service, and visualize the result with Earthkit.
To search and access DEDL data, a DestinE user account is needed.
Earthkit and HDA Polytope used in this context are both packages provided by the European Centre for Medium-Range Weather Forecasts (ECMWF).
It demonstrates how to use the HDA (Harmonized Data Access) API to access ERA5 hourly data on single levels, and how to visualize the data using the Earthkit package provided by ECMWF.
The method used to access this dataset can be applied to all the Climate Data Store datasets provided through HDA.
The complete list of the Climate Data Store federated datasets provided by HDA: https://
Below are the main steps covered by this tutorial.
Authenticate: How to authenticate to search and access DEDL collections.
Order data: How to order ERA5 hourly data on single levels through HDA.
Download data: How to download the ordered data through HDA.
Visualize: How to visualize the downloaded data with Earthkit.
Authenticate¶
First we import the required packages
import requests
import json
import os
from getpass import getpass
from IPython.display import JSON
import destinelab as deauth
from tqdm import tqdm
from urllib.parse import unquote
from time import sleep
We get an access token for the API
DESP_USERNAME = input("Please input your DESP username or email: ")
DESP_PASSWORD = getpass("Please input your DESP password: ")
auth = deauth.AuthHandler(DESP_USERNAME, DESP_PASSWORD)
access_token = auth.get_token()
if access_token is not None:
    print("DEDL/DESP Access Token Obtained Successfully")
else:
    print("Failed to Obtain DEDL/DESP Access Token")
auth_headers = {"Authorization": f"Bearer {access_token}"}
Order¶
Climate Data Store datasets need to be ordered. Below are the steps to order the data of interest.
HDA endpoint¶
The HDA API is based on the SpatioTemporal Asset Catalog (STAC) specification, so it is convenient to define a constant with its endpoint.
HDA_STAC_ENDPOINT = "https://hda.data.destination-earth.eu/stac/v2"
Collection discovery¶
Each HDA collection has its own ID, used to query the collection. Let’s discover the ID of our collection of interest using the HDA discovery API with some filters, e.g. the federated provider along with the “ERA5” free-text parameter.
response = requests.get(f"{HDA_STAC_ENDPOINT}/collections", params={"query": '{"federation:backends": {"eq": "cop_cds"}}', "q": "ERA5"})
JSON(response.json())
The collection “ERA5 hourly data on single levels from 1940 to present” has the id “EO.ECMWF.DAT.REANALYSIS_ERA5_SINGLE_LEVELS” in HDA. We will use this ID for all the following operations.
COLLECTION_ID = "EO.ECMWF.DAT.REANALYSIS_ERA5_SINGLE_LEVELS"
Filtering¶
The “ERA5 hourly data on single levels from 1940 to present” dataset, like the other datasets provided by the CDS (Climate Data Store), can be subset by requesting only the data of interest.
The set of parameters available to subset the data is exposed through the HDA queryables endpoint. For this specific collection: Filter Options. To understand how the HDA queryables API works, you can also have a look at the queryables notebook.
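A STAC queryables response is a JSON Schema document whose properties describe the filterable parameters. As a minimal sketch, the snippet below extracts the parameter names from a hypothetical response excerpt (the sample dictionary and helper name are illustrative, not the actual API output):

```python
# Hypothetical excerpt of a STAC queryables response (a JSON Schema document);
# the real response is served by the collection's queryables endpoint.
sample_queryables = {
    "$schema": "https://json-schema.org/draft/2019-09/schema",
    "type": "object",
    "properties": {
        "ecmwf:variable": {"title": "Variable", "type": "array"},
        "ecmwf:year": {"title": "Year", "type": "string"},
        "ecmwf:data_format": {"title": "Format", "enum": ["grib", "netcdf"]},
    },
}

def list_filter_parameters(queryables: dict) -> list:
    """Return the names of the parameters that can be used to subset a collection."""
    return sorted(queryables.get("properties", {}).keys())

print(list_filter_parameters(sample_queryables))
```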
Using the information provided by the queryables endpoint, we can download the data we are interested in. In this example we will download the 2m temperature and sea surface temperature data for the hottest day in 2024, July 22nd.
Searches on asynchronous datasets, such as the CDS datasets, always return a single item:
filters = {
    key: {"eq": value}
    for key, value in {
        "ecmwf:data_format": "grib",
        "ecmwf:variable": ["2m_temperature", "sea_surface_temperature"],
        "ecmwf:time": "12:00",
        "ecmwf:day": "22",
        "ecmwf:month": "7",
        "ecmwf:year": "2024"
    }.items()
}
response = requests.post(f"{HDA_STAC_ENDPOINT}/search", headers=auth_headers, json={
    "collections": [COLLECTION_ID],
    "query": filters
})
if response.status_code != 200:
    print(response.text)
response.raise_for_status()
product = response.json()["features"][0]
JSON(product)
The single item returned (above) contains:
The product id “ERA5_SL_ORDERABLE_...”, which is a placeholder; its name contains the term “ORDERABLE”.
The storage:tier, which indicates that the product is “offline”
The order:status, which indicates that the product is “orderable”
Request params used for the order, extracted from the search result:
ecmwf:variable: “2m_temperature”, “sea_surface_temperature”
ecmwf:day: “22”
ecmwf:dataset: “reanalysis-era5-single-levels”
ecmwf:month: “7”
ecmwf:data_format: “grib”
ecmwf:time: “12:00”
ecmwf:year: “2024”
ecmwf:product_type: “reanalysis”
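The order parameters above are simply the provider-specific “ecmwf:” keys of the item’s properties. A minimal sketch of pulling them out of a search result (the sample properties dict and helper name are illustrative):

```python
# Hypothetical item properties, shaped like the search result above.
item_properties = {
    "datetime": "2024-07-22T12:00:00Z",
    "order:status": "orderable",
    "storage:tier": "offline",
    "ecmwf:variable": ["2m_temperature", "sea_surface_temperature"],
    "ecmwf:year": "2024",
    "ecmwf:month": "7",
    "ecmwf:day": "22",
    "ecmwf:time": "12:00",
    "ecmwf:data_format": "grib",
}

def extract_order_params(properties: dict) -> dict:
    # Keep only the provider-specific "ecmwf:" keys, which form the order body.
    return {k: v for k, v in properties.items() if k.startswith("ecmwf:")}

order_body = extract_order_params(item_properties)
```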
Order data¶
We now have all the information to order the data.
From the search results we know that the product is orderable and offline, so we need to order the product we searched for.
response = requests.post(f"{HDA_STAC_ENDPOINT}/collections/{COLLECTION_ID}/order", json={
    "ecmwf:data_format": "grib",
    "ecmwf:variable": ["2m_temperature", "sea_surface_temperature"],
    "ecmwf:time": "12:00",
    "ecmwf:day": "22",
    "ecmwf:month": "7",
    "ecmwf:year": "2024"
}, headers=auth_headers)
if response.status_code != 200:
    print(response.content)
response.raise_for_status()
ordered_item = response.json()
product_id = ordered_item["id"]
storage_tier = ordered_item["properties"].get("storage:tier", "online")
order_status = ordered_item["properties"].get("order:status", "unknown")
federation_backend = ordered_item["properties"].get("federation:backends", [None])[0]
print(f"Product ordered: {product_id}")
print(f"Provider: {federation_backend}")
print(f"Storage tier: {storage_tier} (product must have storage tier \"online\" to be downloadable)")
print(f"Order status: {order_status}")
Poll the API until product is ready¶
We request the product itself to get an update of its status.
# Timeout and step for polling (seconds)
TIMEOUT = 300
STEP = 1
ONLINE_STATUS = "online"
self_url = f"{HDA_STAC_ENDPOINT}/collections/{COLLECTION_ID}/items/{product_id}"
item = {}
for i in range(0, TIMEOUT, STEP):
    print(f"Polling {i + 1}/{TIMEOUT // STEP}")
    response = requests.get(self_url, headers=auth_headers)
    response.raise_for_status()
    item = response.json()
    storage_tier = item["properties"].get("storage:tier", ONLINE_STATUS)
    if storage_tier == ONLINE_STATUS:
        download_url = item["assets"]["downloadLink"]["href"]
        print("Product is ready to be downloaded.")
        print(f"Asset URL: {download_url}")
        break
    sleep(STEP)
else:
    order_status = item["properties"].get("order:status", "unknown")
    print(f"We could not download the product after {TIMEOUT // STEP} tries. Current order status is {order_status}")
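The polling loop above can be factored into a reusable helper. Below is a sketch with an injectable fetch function, so the logic can be exercised without the network (the helper name and the stubbed responses are made up for illustration):

```python
from time import sleep

def wait_until_online(fetch_item, timeout=300, step=1, online="online"):
    """Poll fetch_item() until the item's storage tier is online.

    fetch_item is any zero-argument callable returning a STAC item dict;
    in the notebook it would wrap requests.get on the item's self URL.
    Returns the item once online, or None if the timeout is reached.
    """
    for _ in range(0, timeout, step):
        item = fetch_item()
        if item["properties"].get("storage:tier", online) == online:
            return item
        sleep(step)
    return None

# Stubbed fetcher: the item comes online on the third poll.
responses = iter([
    {"properties": {"storage:tier": "offline"}},
    {"properties": {"storage:tier": "offline"}},
    {"properties": {"storage:tier": "online"}},
])
item = wait_until_online(lambda: next(responses), timeout=5, step=1)
```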
Download¶
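The download cell below derives the local filename from the Content-Disposition response header. That parsing step can be sketched in isolation (the helper name and sample header value are illustrative, assuming a simple `filename="..."` header shape):

```python
from urllib.parse import unquote

def filename_from_content_disposition(header, fallback):
    """Derive a local filename from a Content-Disposition header, else use a fallback."""
    if header and "filename=" in header:
        # Take the quoted value after "filename=" and decode percent-escapes.
        return unquote(header.split("filename=")[1].strip('"'))
    return fallback

name = filename_from_content_disposition(
    'attachment; filename="era5%20data.grib"', "download.bin"
)
```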
response = requests.get(download_url, stream=True, headers=auth_headers)
response.raise_for_status()
content_disposition = response.headers.get('Content-Disposition')
total_size = int(response.headers.get("content-length", 0))
if content_disposition:
    filename = content_disposition.split('filename=')[1].strip('"')
    filename = unquote(filename)
else:
    filename = os.path.basename(download_url)
# Open a local file in binary write mode and write the content
print(f"Downloading {filename}")
with tqdm(total=total_size, unit="B", unit_scale=True) as progress_bar:
    with open(filename, 'wb') as f:
        for data in response.iter_content(1024):
            progress_bar.update(len(data))
            f.write(data)
Visualize¶
import earthkit.data
import earthkit.plots
import earthkit.regrid
data = earthkit.data.from_source("file", filename)
earthkit.plots.quickplot(data)