
HDA PySTAC-Client Introduction

This notebook shows the basic use of DestinE Data Lake Harmonised Data Access using pystac-client.

Prerequisites:
  • To search and access DEDL data, a DestinE user account is needed.

References:
  • DestinE Data Lake (DEDL) Harmonised Data Access (HDA) documentation

Credit:
  • This notebook uses PySTAC
    © PySTAC Developers
    Licensed under Apache License 2.0

    This notebook iterates through Collections and Items and performs simple spatio-temporal searches.

    Obtain DEDL Access Token to use the HDA service

    import requests
    import json
    import os
    from getpass import getpass
    import destinelab as deauth
    DESP_USERNAME = input("Please input your DESP username or email: ")
    DESP_PASSWORD = getpass("Please input your DESP password: ")
    
    auth = deauth.AuthHandler(DESP_USERNAME, DESP_PASSWORD)
    access_token = auth.get_token()
    if access_token is not None:
        print("DEDL/DESP Access Token Obtained Successfully")
    else:
        print("Failed to Obtain DEDL/DESP Access Token")
    
    auth_headers = {"Authorization": f"Bearer {access_token}"}
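The cell above builds the header even when no token was returned, in which case it would read "Bearer None". A minimal sketch of a guard follows; `build_auth_headers` is a hypothetical helper written for this illustration and is not part of destinelab.

```python
from typing import Optional


def build_auth_headers(access_token: Optional[str]) -> dict:
    """Return bearer-token headers, failing early if no token was obtained."""
    if not access_token:
        # Without this guard the header would literally read "Bearer None".
        raise ValueError("No access token available; check your DESP credentials")
    return {"Authorization": f"Bearer {access_token}"}


# Example with a placeholder token (not a real credential)
example_headers = build_auth_headers("example-token")
print(example_headers)
```

In the notebook, the same helper could be called with the `access_token` obtained above so that a failed login stops the run before any request is sent.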

    Set username and password as environment variables to be used for DEDL data access

    import os
    
    os.environ["EODAG__DEDL__AUTH__CREDENTIALS__USERNAME"] = DESP_USERNAME
    os.environ["EODAG__DEDL__AUTH__CREDENTIALS__PASSWORD"] = DESP_PASSWORD
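The variable names above appear to follow a double-underscore convention in which each segment after the `EODAG` prefix names one level of a nested configuration. The sketch below only illustrates that reading of the names; `split_eodag_env_name` is a hypothetical helper, not EODAG's own parser.

```python
def split_eodag_env_name(name: str) -> list:
    """Split an EODAG-style variable name into lower-case configuration keys."""
    prefix = "EODAG__"
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with {prefix!r}")
    # Each double underscore separates one nesting level
    return [part.lower() for part in name[len(prefix):].split("__")]


print(split_eodag_env_name("EODAG__DEDL__AUTH__CREDENTIALS__USERNAME"))
```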

    Create pystac client object for HDA STAC API

    We first connect to an API by retrieving the root catalog, or landing page, of the API with the Client.open function.

    from pystac_client import Client
    
    HDA_API_URL = "https://hda.data.destination-earth.eu/stac/v2"
    cat = Client.open(HDA_API_URL, headers=auth_headers)
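A landing page is a JSON document whose `links` array advertises the endpoints of the API; in a STAC API, the `data` relation typically points at the collections endpoint and `search` at item search. The sketch below walks such a document by hand. The `landing_page` dictionary and its example.com URLs are invented for illustration and do not reproduce the HDA response.

```python
# Hypothetical, heavily trimmed landing page; a real response has more fields
landing_page = {
    "type": "Catalog",
    "id": "example-root",
    "links": [
        {"rel": "self", "href": "https://example.com/stac"},
        {"rel": "data", "href": "https://example.com/stac/collections"},
        {"rel": "search", "href": "https://example.com/stac/search", "method": "GET"},
        {"rel": "search", "href": "https://example.com/stac/search", "method": "POST"},
    ],
}


def find_links(catalog: dict, rel: str) -> list:
    """Return the href of every link whose relation type equals `rel`."""
    return [link["href"] for link in catalog.get("links", []) if link.get("rel") == rel]


print(find_links(landing_page, "data"))
print(find_links(landing_page, "search"))
```

pystac-client performs this kind of link lookup internally, which is why the notebook only needs the root URL.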

    Query all available collections

    As with a static catalog, the get_collections function iterates through the Collections in the Catalog. Because this is an API, it can retrieve all the Collections in a single call rather than fetching each one individually.

    from rich.console import Console
    import rich.table
    
    console = Console()
    
    hda_collections = cat.get_collections()
    
    table = rich.table.Table(title="HDA collections", expand=True)
    table.add_column("ID", style="cyan", justify="right", no_wrap=True)
    table.add_column("Title", style="violet", no_wrap=True)
    for collection in hda_collections:
        table.add_row(collection.id, collection.title)
    console.print(table)

    Obtain provider information for each individual collection

    table = rich.table.Table(title="HDA collections | Providers", expand=True)
    table.add_column("Title", style="cyan", justify="right", no_wrap=True)
    table.add_column("Provider", style="violet", no_wrap=True)
    
    hda_collections = cat.get_collections()
    
    for collection in hda_collections:
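        # Fetch the full collection document by ID
        collection_details = cat.get_collection(collection.id)
        # providers can be None when a collection declares none
        provider = ','.join(str(x.name) for x in (collection_details.providers or []))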
        table.add_row(collection_details.title, provider)
    console.print(table)

    Inspect Items of a Collection

    The main functions for getting items return iterators, and pystac-client handles the retrieval of additional pages when needed. In the search below, max_items=10 caps the total number of items returned, however many pages the API would otherwise serve.

    coll_name = 'EO.ESA.DAT.SENTINEL-1.L1_GRD'
    search = cat.search(
        max_items=10,
        collections=[coll_name],
        bbox=[-72.5, 40.5, -72, 41],  # west, south, east, north
        datetime="2023-09-09T00:00:00Z/2023-09-20T23:59:59Z",
    )
    
    coll_items = search.item_collection()
    console.print(f"For collection {coll_name} we found {len(coll_items)} items")
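The `bbox` and `datetime` arguments of the search follow fixed conventions: a 2D bounding box is ordered west, south, east, north in degrees, and a time range is written as two timestamps joined by a slash. The sketch below shows one way to check and build such values before searching; `validate_bbox` and `datetime_interval` are hypothetical helpers, not part of pystac-client.

```python
from datetime import datetime, timezone


def validate_bbox(bbox):
    """Check a 2D bbox given as [west, south, east, north] in degrees."""
    if len(bbox) != 4:
        raise ValueError("expected four values: west, south, east, north")
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    # west may exceed east for boxes crossing the antimeridian, so only
    # the latitudes are required to be ordered
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return [west, south, east, north]


def datetime_interval(start: datetime, end: datetime) -> str:
    """Format a closed interval as 'start/end' with UTC 'Z' timestamps."""
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes to avoid ambiguity")
    if start > end:
        raise ValueError("start must not be after end")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    start_utc = start.astimezone(timezone.utc).strftime(fmt)
    end_utc = end.astimezone(timezone.utc).strftime(fmt)
    return f"{start_utc}/{end_utc}"


bbox = validate_bbox([-72.5, 40.5, -72, 41])
interval = datetime_interval(
    datetime(2023, 9, 9, 0, 0, 0, tzinfo=timezone.utc),
    datetime(2023, 9, 20, 23, 59, 59, tzinfo=timezone.utc),
)
print(bbox, interval)
```

The two values produced here match the ones passed to `cat.search` in the cell above.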
    import geopandas
    
    df = geopandas.GeoDataFrame.from_features(coll_items.to_dict(), crs="epsg:4326")
    df.head()
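The paging behaviour described at the start of this section can be sketched with a plain generator: pages are requested only when the consumer needs more items, and an item cap stops further requests. Everything here is a stand-in written for illustration (`FAKE_ITEMS`, `fetch_page`, `iter_items`, and the offset-based paging); it does not reproduce how pystac-client or the HDA API page results.

```python
# Hypothetical stand-in for a paged API: 25 fake items served 10 per page
FAKE_ITEMS = [f"item-{i}" for i in range(25)]
PAGE_SIZE = 10
requests_made = []  # records the offset of every simulated request


def fetch_page(offset: int) -> dict:
    """Pretend to call the API and return one page plus the next offset."""
    requests_made.append(offset)
    page = FAKE_ITEMS[offset:offset + PAGE_SIZE]
    has_more = offset + PAGE_SIZE < len(FAKE_ITEMS)
    return {"features": page, "next": offset + PAGE_SIZE if has_more else None}


def iter_items(fetch, max_items=None):
    """Yield items one by one, requesting further pages only when needed."""
    yielded = 0
    offset = 0
    while offset is not None:
        if max_items is not None and yielded >= max_items:
            return  # cap reached: do not request another page
        page = fetch(offset)
        for item in page["features"]:
            if max_items is not None and yielded >= max_items:
                return
            yield item
            yielded += 1
        offset = page["next"]


first_ten = list(iter_items(fetch_page, max_items=10))
print(len(first_ten), "items from", len(requests_made), "request(s)")
```

Calling `iter_items(fetch_page)` without a cap would walk all three simulated pages instead.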
    
    

    Inspect STAC assets of an item

    import rich.table
    
    selected_item = coll_items[3]
    
    table = rich.table.Table(title="Assets in STAC Item")
    table.add_column("Asset Key", style="cyan", no_wrap=True)
    table.add_column("Description")
    for asset_key, asset in selected_item.assets.items():
        table.add_row(asset_key, asset.title)
    
    console.print(table)

    from IPython.display import Image

    # Fetch the quick-look image; the request needs the bearer token
    url = selected_item.assets["quick-look.png"].href
    response = requests.get(url, headers=auth_headers)
    response.raise_for_status()

    Image(data=response.content)

    down_uri = selected_item.assets["downloadLink"].href
    console.print(f"Download link of asset is {down_uri}")

    Download asset to JupyterLab

    selected_item.id

    selected_item.assets["downloadLink"]

    # Make HTTP request for remote file data
    data = requests.get(selected_item.assets["downloadLink"].href,
                        headers=auth_headers)
    # Stop here on an HTTP error rather than saving the error body to disk
    data.raise_for_status()
    # Use the part of the media type after "/" as the file extension
    mtype = selected_item.assets["downloadLink"].media_type.split("/")[1]
    # Save file data to local copy
    with open(f"{selected_item.id}.{mtype}", 'wb') as file:
        file.write(data.content)
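The download cell assumes that the asset always declares a media type and holds the whole file in memory before writing it. The sketch below shows a more defensive variant of both steps using only the standard library; `extension_from_media_type` and `write_chunks` are hypothetical helpers, and the `"bin"` fallback extension is an arbitrary choice made for this example.

```python
import os
import tempfile
from typing import Iterable, Optional


def extension_from_media_type(media_type: Optional[str], default: str = "bin") -> str:
    """Derive a file extension from a media type such as 'application/zip'."""
    if not media_type:
        return default  # asset did not declare a media type
    # Drop parameters such as '; charset=utf-8' before splitting
    essence = media_type.split(";", 1)[0].strip()
    _, _, subtype = essence.partition("/")
    return subtype.lower() or default


def write_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write byte chunks to `path` one at a time and return the total size."""
    total = 0
    with open(path, "wb") as file:
        for chunk in chunks:
            file.write(chunk)
            total += len(chunk)
    return total


# Demonstration with in-memory chunks instead of a real HTTP response
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "example." + extension_from_media_type("application/zip"))
    written = write_chunks([b"abc", b"defg"], target)
    print(os.path.basename(target), written)
```

With requests, the chunks could come from a response opened with `stream=True` and read through `iter_content`, so that large products are not loaded into memory at once.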