GFTS JupyterHub

Overview

The GFTS Hub is a powerful modeling environment based on JupyterHub, designed for reconstructing fish tracks using biologging data and environmental ocean fields. Hosted on the OVH cloud infrastructure, it provides a scalable and interactive workspace for researchers and data scientists.

Features

  • Interactive Data Analysis: Utilize Jupyter notebooks for dynamic data exploration and modeling.

  • Robust Computational Resources: Leverage cloud capabilities for large-scale computations.

  • Integration with Environmental Data: Access and analyze datasets from Copernicus Marine and other sources.

Setup and administration

This section provides an overview of where to find information and source code related to the deployment and administration of GFTS JupyterHub. It also explains how and by whom users can be added.

GFTS Hub deployment

The source code and documentation to deploy the GFTS Hub on the DestinE Cloud Infrastructure (currently at OVH) are available in the gfts-track-reconstruction folder, located in the root directory.

A good starting point is the README.md file, located in the gfts-track-reconstruction/jupyterhub folder.

GFTS Hub User Image

The user image is defined in the gfts-track-reconstruction/jupyterhub/images/user folder and consists of:

  • Dockerfile, which defines the Pangeo base image we use and shows how additional conda and pip packages are installed, among other details;

  • conda-requirements.txt, which lists the packages to be installed with conda. To suggest a new conda package, edit this file and open a Pull Request. Be sure to follow the requirements.txt file format when adding new packages;

  • requirements.txt, which lists the packages installed with pip. Similarly, you can suggest new pip packages by editing this file and submitting a Pull Request.
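For instance, proposing a new pip package means appending one line to requirements.txt in the requirements file format (the package name and version pin below are purely illustrative):

```
# requirements.txt — one requirement per line, optionally version-pinned
some-package==1.2.3
```

The same format applies to conda-requirements.txt for conda packages; a Pull Request against either file is then reviewed before the user image is rebuilt.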

S3 Buckets

Storage on the GFTS Hub itself is limited, so we have set up S3 buckets to manage the data needed for the GFTS project. These buckets fall into two categories:

  1. Public S3 buckets: these buckets are used to share data with all users of the platform.

    • gfts-reference-data: This S3 bucket contains reference datasets (such as Copernicus Marine data) that have been copied here to speed up access and processing. Most GFTS users will only require read-only access to this bucket; a few GFTS users can write to it to copy new reference datasets. If you need additional reference datasets, please create an issue.

    • destine-gfts-data-lake: This S3 bucket contains datasets generated by the GFTS, which are intended to be made public to all users with access to the GFTS Hub.

  2. Private S3 buckets: these buckets contain datasets that are private to a specific group. All users in a given group have read and write access to their group's bucket. Users can store private datasets there, or save intermediate results to be shared in destine-gfts-data-lake once validated.

    • gfts-ifremer: This S3 bucket contains datasets that are private to IFREMER GFTS users.

    • gfts-vliz: This S3 bucket contains datasets that are private to VLIZ GFTS users.
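The access rules above can be summarized as a small permission map. The bucket names below come from this page; the helper functions themselves are a hypothetical sketch, not part of the GFTS codebase (and the special case of the few users who may write to gfts-reference-data is not modeled):

```python
# Illustrative sketch of the GFTS bucket access rules (hypothetical helper).
PUBLIC_READ = {"gfts-reference-data", "destine-gfts-data-lake"}
GROUP_BUCKETS = {
    "ifremer": "gfts-ifremer",
    "vliz": "gfts-vliz",
}

def can_read(user_group: str, bucket: str) -> bool:
    """All hub users can read the public buckets; group members can
    also read their own group's private bucket."""
    return bucket in PUBLIC_READ or GROUP_BUCKETS.get(user_group) == bucket

def can_write(user_group: str, bucket: str) -> bool:
    """Write access is limited to the user's own group bucket."""
    return GROUP_BUCKETS.get(user_group) == bucket

print(can_read("ifremer", "gfts-reference-data"))  # True
print(can_write("ifremer", "gfts-vliz"))           # False
```

In practice these rules are enforced by the S3 credentials issued to each user, not by client-side code; the sketch only makes the read/write matrix explicit.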

Admin Hub

  • User Management: Administrators can add new users to the GFTS Hub, granting access to computational resources and datasets.

  • Deployment Configuration: Details on configuring the JupyterHub environment, including Docker image setups and package installations.

  • Data Access Control: Instructions for managing permissions to S3 buckets and other data resources.
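As an illustration, in a standard JupyterHub deployment adding a user typically means extending the allowed and admin user sets in jupyterhub_config.py. The usernames below are placeholders, and the GFTS Hub's actual configuration lives in the gfts-track-reconstruction/jupyterhub folder:

```python
# jupyterhub_config.py — illustrative fragment; usernames are placeholders
c.Authenticator.allowed_users = {"new-user"}
c.Authenticator.admin_users = {"hub-admin"}
```

Refer to the deployment sources mentioned above for how the GFTS Hub actually wires up authentication.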

See also

For detailed instructions on adding new users, please refer to the Admin Hub documentation.