Setup instructions and User Access#

This section provides an overview of where to find information and source code related to the deployment and administration of GFTS JupyterHub. It also explains how users can be added and by whom.

GFTS Hub deployment#

The source code and documentation for deploying the GFTS Hub on the DestinE Cloud Infrastructure (currently at OVH) are available in the gfts-track-reconstruction folder, located in the root directory of the repository.

A good starting point is the README.md file, located in the gfts-track-reconstruction/jupyterhub folder.

GFTS Hub User Image#

The user image is defined in the gfts-track-reconstruction/jupyterhub/images/user folder and consists of:

  • Dockerfile, where you can review the Pangeo base image we use and see how additional conda and pip packages are installed, among other details;

  • conda-requirements.txt contains the list of packages to be installed with conda. To suggest a new conda package, edit this file and open a Pull Request. Be sure to follow the requirements.txt format when adding new packages (see the sketch after this list).

  • requirements.txt contains the list of packages installed with pip. Similarly, you can suggest adding new pip packages by editing this file and submitting a Pull Request.
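Both files follow the standard requirements.txt syntax: one package per line, with optional version specifiers. As a purely illustrative sketch (these package names and version pins are made up):

some-package==1.2.3
another-package>=2.0,<3
package-without-pin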

S3 Buckets#

Limited storage is available on the GFTS Hub itself, so we have set up several S3 buckets to manage the data needed for the GFTS project. These buckets fall into two categories:

  1. Public S3 buckets: these buckets make data available to all users of the platform.

    • gfts-reference-data: This S3 bucket contains reference datasets (such as Copernicus Marine) that have been copied here to speed up access and data processing. Most GFTS users only require read-only access to this bucket; a few GFTS users can write to it to copy new reference datasets. If you need additional reference datasets, please open an issue in the DestinE_ESA_GFTS repository.

    • destine-gfts-data-lake: This S3 bucket contains datasets generated by the GFTS, which are intended to be made public for all users with access to the GFTS Hub.

  2. Private S3 buckets: these buckets contain datasets that are private to a specific group. All users of a given group have read and write access to their corresponding buckets. Users can store private datasets or save intermediate results that will be shared in destine-gfts-data-lake once validated.

    • gfts-ifremer: This S3 bucket contains datasets that are private to IFREMER GFTS users.

    • gfts-vliz: This S3 bucket contains datasets that are private to VLIZ GFTS users.
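As a hedged illustration of how the two categories behave from a notebook on the hub (the file names and paths below are made up, and the write example assumes membership in the gfts-ifremer group):

import s3fs

# credentials are provisioned for each hub user, so anon=False picks them up
s3 = s3fs.S3FileSystem(anon=False)

# any GFTS Hub user can read the public buckets
s3.ls("gfts-reference-data")

# members of the gfts-ifremer group can also write to their private bucket
s3.put("results.nc", "gfts-ifremer/intermediate/results.nc")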

Access to the GFTS Hub and S3 Buckets#

Getting access to the GFTS Hub and S3 Buckets#

The first step is to open an issue in the DestinE_ESA_GFTS repository with the following information:

  1. The GitHub username of the person you want to add to the GFTS Hub;

  2. The list of S3 buckets this new person would need to access.

  3. If a new group of users is required, please specify the name of the new private S3 bucket to be created for this group, and identify any existing users who need access to it. A new group of users is necessary if you have a unique set of biologging data that must remain private and cannot be shared publicly, or if you need to share intermediate, non-validated results within a specific group before making them available to the GFTS community.
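For example, a hypothetical issue could read:

Title: GFTS Hub access for newusername

1. GitHub username: newusername
2. S3 buckets: gfts-reference-data (read-only), gfts-ifremer (read and write)
3. No new group or private bucket needed.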

See also

The current list of authorized GFTS users can be found in gfts-track-reconstruction/jupyterhub/gfts-hub/values.yaml.

Giving access to the GFTS Hub and existing S3 Buckets (Admin only)#

Anyone can open a Pull Request to add a new user with read-only access to gfts-reference-data and destine-gfts-data-lake. There is only one step:

  1. Add the new user's GitHub username, in lowercase, to gfts-track-reconstruction/jupyterhub/gfts-hub/values.yaml.
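As a rough sketch of what that entry might look like (this assumes the usual Zero to JupyterHub layout, which may differ from the actual file; follow the existing user entries in values.yaml for the exact location):

hub:
  config:
    Authenticator:
      allowed_users:
        - newusername  # GitHub username, lowercase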

Once the PR is merged, the GitHub user will have read-only access to gfts-reference-data and destine-gfts-data-lake and will be able to run:

import s3fs

# anon=False uses the S3 credentials provisioned for the hub user
s3 = s3fs.S3FileSystem(anon=False)

# list the contents of the public reference-data bucket
s3.listdir("gfts-reference-data")

To grant read access to private data, or write access to any bucket, the user must also be added to an s3 group in the tofu configuration. The following steps can only be performed by a GFTS Hub admin:

  1. Add the GitHub username (lowercase) to one of the s3_ groups in gfts-track-reconstruction/jupyterhub/tofu/main.tf for the following permissions (see the example after this list):

    • s3_ifremer_developers: write access to gfts-ifremer and gfts-reference-data

    • s3_ifremer_users: write access to gfts-ifremer only

    • s3_vliz_users: write access to gfts-vliz only

    • s3_admins: admin access to all s3 buckets

    If you need to create a new user group and a private S3 bucket for them, please read the next section on creating a new user group before proceeding with steps 2–5.

  2. Run tofu apply to apply the S3 permissions. Ensure you are in the gfts-track-reconstruction/jupyterhub/tofu folder and have run source secrets/ovh-creds.sh before executing the tofu command.

  3. Update gfts-track-reconstruction/jupyterhub/secrets/config.yaml with the output of the command tofu output -json s3_credentials_json. This command needs to be executed in the tofu folder after applying the S3 permissions with tofu apply. If the file contains binary content, it means you do not have the rights to add new users to the GFTS S3 buckets and will need to ask a GFTS admin for assistance.

  4. Run pytest in the tofu directory to test s3 permissions.

  5. Don’t forget to commit and push your changes!
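For step 1, adding a user to an existing group in main.tf is a one-line change, for example (the usernames are illustrative):

s3_ifremer_users = toset([
  "existinguser",
  "newusername",  # new GitHub username, lowercase
])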

Steps 2 and 3 are what actually grant the JupyterHub user S3 access.
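Assuming you start from the repository root, steps 2 to 4 might look like this:

cd gfts-track-reconstruction/jupyterhub/tofu
source secrets/ovh-creds.sh             # admin credentials (see the Caution box below)
tofu apply                              # step 2: apply the S3 permissions
tofu output -json s3_credentials_json   # step 3: paste into jupyterhub/secrets/config.yaml
pytest                                  # step 4: test the S3 permissions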

Creating a new group of users (Admin only)#

If you need to create a new user group with a corresponding private S3 bucket, follow the additional steps below (to be completed after step 1 and before step 2).

Choose a short group name (e.g., fewer than 8 characters), typically the organisation name of the user(s) or its acronym. We suggest adding the prefix gfts- (e.g. gfts-ifremer, gfts-vliz, etc.). In the example below, we add a new group of users called gfts-vliz:

  • Add the new bucket name in the s3_buckets variable in gfts-track-reconstruction/jupyterhub/tofu/main.tf:

  s3_buckets = toset([
    "gfts-vliz",
    "gfts-ifremer",
    "gfts-reference-data",
    "destine-gfts-data-lake",
  ])

  • Create a new variable to list the users who will have access to the new S3 bucket. Locate the variable s3_ifremer_users and add the new variable immediately after it:

  s3_vliz_users = toset([
    "davidcasalsvliz",
  ])

  • Update the s3_users variable by adding the new list of users (here local.s3_vliz_users):

s3_users = setunion(local.s3_readonly_users, local.s3_admins, local.s3_vliz_users, local.s3_ifremer_developers, local.s3_ifremer_users)

  • Create a new resource policy for this new group of users (search for resource "ovh_cloud_project_user_s3_policy" "s3_ifremer_users" to locate the section on resource policies for users):

resource "ovh_cloud_project_user_s3_policy" "s3_vliz_users" {
  for_each     = local.s3_vliz_users
  service_name = local.service_name
  user_id      = ovh_cloud_project_user.s3_users[each.key].id
  policy = jsonencode({
    "Statement" : concat([
      {
        "Sid" : "Admin",
        "Effect" : "Allow",
        "Action" : local.s3_admin_action,
        "Resource" : [
          "arn:aws:s3:::${aws_s3_bucket.gfts-vliz.id}",
          "arn:aws:s3:::${aws_s3_bucket.gfts-vliz.id}/*",
        ]
      },
    ], local.s3_default_policy)
  })
}

Make sure you replace vliz with the new group name!

  • Create the new S3 bucket by locating resource "aws_s3_bucket" "gfts-ifremer" and adding the new bucket configuration immediately after it:

resource "aws_s3_bucket" "gfts-vliz" {
  bucket = "gfts-vliz"
}

Caution

The following packages need to be installed on your system:

  1. ssh-vault

  2. git-crypt

  3. opentofu
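For example, on macOS with Homebrew, all three can be installed as follows (installation on other platforms varies; see each project's documentation):

brew install ssh-vault git-crypt opentofu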

As an admin, you’ll need to set up your environment. The GFTS maintainer will provide you with a key encrypted with your GitHub SSH key. Save the content sent by the maintainer into a file named ssh-vault.txt. At the moment, the keys are known to annefou and minrk.

# decrypt the key with your SSH key, unlock the git-crypt secrets, then remove the keyfile
cat ssh-vault.txt | ssh-vault view | base64 --decode > keyfile && git-crypt unlock keyfile && rm keyfile

Before executing the command above, ensure you have changed the directory to the root of the DestinE_ESA_GFTS git repository.

After running the previous command, you should be able to cat gfts-track-reconstruction/jupyterhub/tofu/secrets/ovh-creds.sh and see plain text rather than binary content.

Finally, to initialize your environment and execute tofu commands, change to the gfts-track-reconstruction/jupyterhub/tofu folder and source secrets/ovh-creds.sh, e.g.:

source secrets/ovh-creds.sh   # load the OVH credentials into your shell
tofu init                     # initialize providers and state
tofu apply                    # apply the current configuration

Then you are ready to follow the steps explained above to grant S3 bucket access to a new user.