# Setup instructions and User Access
This section provides an overview of where to find information and source code related to the deployment and administration of GFTS JupyterHub. It also explains how users can be added and by whom.
## GFTS Hub deployment
The source code and documentation to deploy the GFTS Hub on the DestinE Cloud Infrastructure (currently at OVH) are available in the `gfts-track-reconstruction` folder, located in the root directory.
A good starting point is the `README.md` file, located in the `gfts-track-reconstruction/jupyterhub` folder.
## GFTS Hub User Image
The user image is defined in the `gfts-track-reconstruction/jupyterhub/images/user` folder and consists of:
- `Dockerfile`, where you can review the Pangeo base image we are using and see how additional conda and pip packages are installed, among other details;
- `conda-requirements.txt`, which contains the list of packages to be installed with conda. To suggest the installation of a new conda package, edit this file and make a Pull Request. Be sure to follow the `requirements.txt` format when adding new packages (see the example after this list);
- `requirements.txt`, which contains the list of packages installed with pip. Similarly, you can suggest adding new pip packages by editing this file and submitting a Pull Request.
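For instance, a suggested addition to either file would follow the usual `requirements.txt` conventions, one package per line with an optional version pin. The package names and versions below are purely illustrative:

```
# one package per line, optionally pinned to an exact version
xarray
pandas==2.2.0
```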
## S3 Buckets
Limited storage is available on the GFTS Hub itself, so we have set up several S3 buckets to manage the data we need for the GFTS project. These buckets fall into two categories (a short access example follows the list):
- Public S3 buckets: these buckets are used to share data with all users of the platform.
  - `gfts-reference-data`: contains reference datasets (such as Copernicus Marine) that have been copied here to speed up access and data processing. Most GFTS users will only require read-only access to this bucket; a few GFTS users can write to it to copy new reference datasets. If you need additional reference datasets, please create an issue.
  - `destine-gfts-data-lake`: contains datasets generated by the GFTS, which are intended to be made public for all users with access to the GFTS Hub.
- Private S3 buckets: these buckets contain datasets that are private to a specific group. All users of a given group have read and write access to their group's bucket. Users can store private datasets there, or save intermediate results to be shared in `destine-gfts-data-lake` once validated.
  - `gfts-ifremer`: contains datasets that are private to IFREMER GFTS users.
  - `gfts-vliz`: contains datasets that are private to VLIZ GFTS users.
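To illustrate the two categories, here is a minimal sketch using `s3fs` (the same library shown later in this page). It assumes you belong to the `gfts-ifremer` group, and the file path is hypothetical:

```python
import s3fs

# Credentials are provided by the GFTS Hub environment.
s3 = s3fs.S3FileSystem(anon=False)

# Public bucket: every GFTS Hub user can read the reference datasets.
print(s3.ls("gfts-reference-data"))

# Private bucket: read/write access is limited to the matching group.
# The path below is purely illustrative.
with s3.open("gfts-ifremer/examples/intermediate-result.txt", "w") as f:
    f.write("to be shared in destine-gfts-data-lake once validated")
```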
## Access to the GFTS Hub and S3 Buckets
### Getting access to the GFTS Hub and S3 Buckets
The first step is to create an issue with the following information:

- The GitHub username of the person you want to add to the GFTS Hub;
- The list of S3 buckets this new person would need to access.
If a new group of users is required, please specify the name of the new private S3 bucket to be created for this group, and identify any existing users who need access to it. A new group is necessary if you have a unique set of biologging data that must remain private and cannot be shared publicly, or if you need to share intermediate, non-validated results within a specific group before making them available to the GFTS community.
> **See also:** the current list of authorized GFTS users can be found in `gfts-track-reconstruction/jupyterhub/gfts-hub/values.yaml`.
### Giving access to the GFTS Hub and existing S3 Buckets (Admin only)
Everyone can initiate a Pull Request to add a new user with read-only access to `gfts-reference-data` and `destine-gfts-data-lake`.
There is only one step:

1. Add the new user (GitHub username, in lowercase) in `gfts-track-reconstruction/jupyterhub/gfts-hub/values.yaml`.
When the PR is merged, the GitHub user will have read-only access to `gfts-reference-data` and `destine-gfts-data-lake` and will be able to run:
```python
import s3fs

# Credentials are provided by the GFTS Hub environment; anon=False uses them.
s3 = s3fs.S3FileSystem(anon=False)

# List the contents of the public reference-data bucket.
s3.listdir("gfts-reference-data")
```
To grant read access to private data, or write access, the user must be added to an s3 group in the tofu configuration. This adds the following steps, which can only be done by a GFTS Hub admin:
2. Add the GitHub username (in lowercase) to one of the `s3_` groups in `gfts-track-reconstruction/jupyterhub/tofu/main.tf` for the following permissions:
   - `s3_ifremer_developers`: write access to `gfts-ifremer` and `gfts-reference-data`
   - `s3_ifremer_users`: write access to `gfts-ifremer` only
   - `s3_vliz_users`: write access to `gfts-vliz` only
   - `s3_admins`: admin access to all S3 buckets
If you need to create a new user group and a private S3 bucket for them, please read the next section on creating a new user group before proceeding with steps 3–6.
3. Run `tofu apply` to apply the S3 permissions. Ensure you are in the `gfts-track-reconstruction/jupyterhub/tofu` folder before executing the `tofu` command, and that you have run `source secrets/ovh-creds.sh`.
4. Update `gfts-track-reconstruction/jupyterhub/secrets/config.yaml` with the output of the command `tofu output -json s3_credentials_json`. This command needs to be executed in the `tofu` folder after applying the S3 permissions with `tofu apply`. If the file contains binary content, it means you do not have the rights to add new users to the GFTS S3 buckets and will need to ask a GFTS admin for assistance.
5. Run `pytest` in the `tofu` directory to test the S3 permissions (a sketch of such a check is shown below).
6. Don't forget to commit and push your changes!
Steps 3 and 4 are what actually grant the JupyterHub user S3 access.
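As an illustration of what step 5 verifies, a permission check could look like the minimal pytest sketch below. This is only an assumption about the general shape of such a test, not the actual test suite shipped in the `tofu` directory:

```python
import s3fs

def test_user_can_list_reference_data():
    # Assumes the user's S3 credentials are available in the environment.
    s3 = s3fs.S3FileSystem(anon=False)
    # Every GFTS user should at least be able to list the public bucket.
    assert s3.ls("gfts-reference-data")
```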
### Creating a new group of users (Admin only)
If you need to create a new user group with a corresponding private S3 bucket, follow the additional steps below (to be completed after step 1 and before step 2).
Choose a new group name (keep it short, e.g. fewer than 8 characters); it can be the organisation name of the user(s) or its acronym. We suggest adding the prefix `gfts-` (e.g. `gfts-ifremer`, `gfts-vliz`, etc.). In the example below, we are adding a new group of users called `gfts-vliz`:
1. Add the new bucket name to the `s3_buckets` variable in `gfts-track-reconstruction/jupyterhub/tofu/main.tf`:
   ```hcl
   s3_buckets = toset([
     "gfts-vliz",
     "gfts-ifremer",
     "gfts-reference-data",
     "destine-gfts-data-lake",
   ])
   ```
2. Create a new variable listing the users who will have access to the new S3 bucket. Locate the variable `s3_ifremer_users` and add the new variable immediately after it:
   ```hcl
   s3_vliz_users = toset([
     "davidcasalsvliz",
   ])
   ```
3. Update the `s3_users` variable by adding the new list of users (here `local.s3_vliz_users`):
   ```hcl
   s3_users = setunion(local.s3_readonly_users, local.s3_admins, local.s3_vliz_users, local.s3_ifremer_developers, local.s3_ifremer_users)
   ```
4. Create a new resource policy for this new group of users (search for `resource "ovh_cloud_project_user_s3_policy" "s3_ifremer_users"` to locate the section on resource policies for users):
   ```hcl
   resource "ovh_cloud_project_user_s3_policy" "s3_vliz_users" {
     for_each     = local.s3_vliz_users
     service_name = local.service_name
     user_id      = ovh_cloud_project_user.s3_users[each.key].id
     policy = jsonencode({
       "Statement" : concat([
         {
           "Sid" : "Admin",
           "Effect" : "Allow",
           "Action" : local.s3_admin_action,
           "Resource" : [
             "arn:aws:s3:::${aws_s3_bucket.gfts-vliz.id}",
             "arn:aws:s3:::${aws_s3_bucket.gfts-vliz.id}/*",
           ]
         },
       ], local.s3_default_policy)
     })
   }
   ```
   Make sure you replace `vliz` with the new group name!
5. Create the new S3 bucket by locating `resource "aws_s3_bucket" "gfts-ifremer"` and adding the new bucket configuration immediately after it:
   ```hcl
   resource "aws_s3_bucket" "gfts-vliz" {
     bucket = "gfts-vliz"
   }
   ```
You are now done with the configuration of the new group and its corresponding private S3 bucket. Go back to the previous section on giving access to the GFTS Hub and S3 buckets and follow steps 2-6.
> **Caution:** the following packages need to be installed on your system: at minimum `ssh-vault`, `git-crypt`, and `tofu`, which are used in the commands below.
As an admin, you'll need to set up your environment. The GFTS maintainer will provide you with a key encrypted with your GitHub SSH key. Save the content sent by the GFTS maintainer into a file, and name it `ssh-vault.txt`. At the moment, the keys are known to annefou and minrk.
```bash
cat ssh-vault.txt | ssh-vault view | base64 --decode > keyfile && git-crypt unlock keyfile && rm keyfile
```
Before executing the command above, ensure you have changed directory to the root of the `DestinE_ESA_GFTS` git repository.
Thanks to the previous command, you should be able to `cat gfts-track-reconstruction/jupyterhub/tofu/secrets/ovh-creds.sh` and see a plain-text file (binary content means the repository is still locked).
Finally, to initialize your environment and execute `tofu` commands, you need to change directory to the `gfts-track-reconstruction/jupyterhub/tofu` folder and source `secrets/ovh-creds.sh`, e.g.:
```bash
source secrets/ovh-creds.sh
tofu init
tofu apply
```
Then you are ready to go and can follow the steps explained above to grant access to S3 buckets to a new user.