
STACK Service Dask

This notebook introduces authentication and multi-cluster management using the DEDL Stack client with OIDC, enabling users to securely spawn, monitor, and scale Dask clusters across Central and LUMI locations within the DestinE Data Lake.


Authentication via OIDC password grant flow

The DEDL Stack client library provides the DaskOIDC helper class to authenticate a user against the identity provider of the DestinE Data Lake. The user's password is handed directly to the request object and is not stored. The refresh token is used to request a new access token once the current one expires.
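To illustrate what the password grant flow looks like under the hood, here is a minimal sketch of building the form body such a request would carry. The function name, token URL, and client ID are illustrative assumptions, not part of the dedl_stack_client API, which handles all of this internally:

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real identity provider URL is configured by the client library.
TOKEN_URL = "https://auth.example.org/realms/dedl/protocol/openid-connect/token"

def password_grant_payload(username: str, password: str, client_id: str) -> bytes:
    """Build the form-encoded body for an OIDC resource-owner password grant."""
    return urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }).encode()

# The password only lives in the request body; nothing is persisted.
body = password_grant_payload("alice", "s3cret", "dedl-client")
print(body.decode())
```

A successful response to such a request contains both an access token and a refresh token; the latter is what the client uses to renew the former on expiry.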

The DaskMultiCluster class provides an abstraction layer to spawn multiple Dask clusters, one per location, within the data lake. By default each cluster is composed of 2 workers, with adaptive scaling enabled up to a maximum of 10 workers. Each worker is configured with 2 cores and 2 GB of RAM by default. These settings can be changed via the exposed cluster options, up to the service quota of the individual user role:

  • Worker cores:

    • min: 1

    • max: ..::service-quota::..

  • Worker memory:

    • min: 1 GB

    • max: ..::service-quota::.. GB

Dask worker and scheduler nodes are based on a custom-built container image that aims to match the environment (the Jupyter kernel) of the DEDL JupyterLab instance. A warning is displayed if a version mismatch is detected. Feel free to run your workloads with a custom image by replacing the container image in the cluster options object.
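The mismatch warning comes from comparing the versions of the local kernel environment with those of the scheduler and workers. A simplified, standard-library sketch of such a check (the function is illustrative, not part of the library):

```python
import sys

def versions_match(scheduler_python: str) -> bool:
    """Compare the local Python version string to the scheduler's (illustrative)."""
    local = ".".join(map(str, sys.version_info[:3]))
    return local == scheduler_python

# Comparing the environment against itself always matches; a scheduler running a
# different Python version would trigger a warning like the one described above.
print(versions_match(".".join(map(str, sys.version_info[:3]))))  # → True
```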

from dedl_stack_client.authn import DaskOIDC
from dedl_stack_client.dask import DaskMultiCluster
from rich.prompt import Prompt

myAuth = DaskOIDC(username=Prompt.ask(prompt="Username"))
myDEDLClusters = DaskMultiCluster(auth=myAuth)
myDEDLClusters.new_cluster()

Print the client object details per location as well as the link to the Dask dashboard.

with myDEDLClusters.as_current(location="central") as myclient:
    print(myclient)
    print(myclient.dashboard_link)
with myDEDLClusters.as_current(location="lumi") as myclient:
    print(myclient)
    print(myclient.dashboard_link)

Shut down all clusters and free up all resources.

myDEDLClusters.shutdown()