Panel dashboard to visualise fish tracks#


This notebook plots a Panel dashboard to visualize data for the GFTS project#


The Panel dashboard displays the following information:

  • The temperature measured by the fish.

  • The depth measured by the fish.

  • The evolution of the latitude and the longitude over time.

  • The trajectory generated by the algorithm.


First of all, you need to access the following JupyterHub server. You will need to log in with your GitHub account and select a configuration to start a notebook server.

To access this notebook, you have to clone the following GitHub repository. First, fork the repository (see this link). Then clone your forked version using the git section in the notebook sidebar. Here is a tutorial.
If you want to submit code for review, please see this rule of participation.
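If you prefer working from a terminal or a notebook cell rather than the git sidebar, the cloning step boils down to a single command; the URL below is a placeholder, replace it with the address of your own fork:

!git clone https://github.com/<your-username>/<your-fork>.git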

To be able to run this notebook you need to clone the pangeo-fish repository. You can reuse the steps described above; there is no need to fork the repository this time. Once you have cloned the pangeo-fish repository, install it with the following command:
!pip install pangeo-fish/
Note: you might need to update the path to point to the pangeo-fish folder you cloned.
Normally, you can turn the first cell of the notebook into a code cell (select the cell and press the Y key) and run it. Wait for the installation to finish, then turn it back into a raw cell (select the cell and press the R key); otherwise this cell will needlessly be run by Panel every time you launch the dashboard. You can also use the notebook toolbar to change the cell type from code to raw.

Normally, all the other libraries needed by this notebook should already be installed once this has been done. To start the preview with Panel, click the blue Panel logo in the notebook toolbar.
This will open a new tab that displays the Panel dashboard.
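If the Panel logo is not visible in your toolbar, the dashboard can also be served from a terminal with Panel's command-line interface; the notebook filename below is a placeholder, replace it with the actual name of this notebook:

panel serve <this_notebook>.ipynb --show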

Please note that all the tracks have been generated but not checked manually, so some can display incoherent data. They will be checked and corrected later on.

The trajectories available from S3 were generated using the papermill_launcher.ipynb notebook; see that notebook to learn more about how they were produced.
Using this panel, users can examine the results of a computation, which helps to understand how the algorithm behaves in different situations.

Here is a table that sums up all the generations. You can pick a value from the generation_name column and set the variable generation_name accordingly to access the corresponding generation (see the example after the table).

| generation_name | dataset | bbox | correction methods | comment |
| --- | --- | --- | --- | --- |
| tracks | IBI_MULTIYEAR_PHY_005_002 | latitude: [42, 53], longitude: [-8, 4] | None | 415 computed, 12 wrong tracks |
| tracks_2 | IBI_MULTIYEAR_PHY_005_002 | latitude: [40, 56], longitude: [-13, 5] | Coastal variability, alpha=0.01 and MarkovAutoregression | 431 computed, 32 wrong tracks. New bounding box that extends to all the fish recapture positions. |
| tracks_3 | IBI_MULTIYEAR_PHY_005_002 | latitude: [40, 56], longitude: [-13, 5] | Coastal variability, alpha=0.01 and MarkovAutoregression | 395 computed, 39 wrong tracks. Implements a new method to correct anomalies in the biologging data and tests a model of the coastal variability. This generation is incomplete because of an issue in the data. |
| tracks_4 | IBI_MULTIYEAR_PHY_005_002 | latitude: [40, 56], longitude: [-13, 5] | Coastal variability, alpha=0.01 | 421 computed, 60 wrong tracks. Same parameters as the previous generation but with corrected input data. Does not include the biologging anomaly fix. |
| DK_1 | IBI_MULTIYEAR_PHY_005_002 | latitude: [40, 56], longitude: [-13, 5] | Markov autoregression | Focuses only on fixing the issues observed for tags from the Dunkerque region, which are subject to heat spike anomalies. A spike detection algorithm based on Markov autoregression was implemented to correct them. |
| DK_2 | IBI_MULTIYEAR_PHY_005_002 | latitude: [40, 56], longitude: [-13, 5] | Diff | Uses another technique, since Markov autoregression did not detect all the spikes. |
| DK_3 | IBI_MULTIYEAR_PHY_005_002 | latitude: [40, 56], longitude: [-13, 5] | Diff | Uses a lower maximum speed and contains only the results for the tags that had already failed before. |
| DK_final | IBI_MULTIYEAR_PHY_005_002 | latitude: [40, 56], longitude: [-13, 5] | Diff | Contains the data for all the DK tags: the corrected ones from DK_3 and the ones that were already correct from the beginning. |

In the table above, a wrong track is a track that presents incoherent results or for which the algorithm that estimates the fish speed did not converge. In a nutshell, these are the tracks we know for sure are wrong, but there may be others that are wrong too.
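For example, to explore the last generation listed in the table, set the variable (defined in a cell further below) to the corresponding folder name:

# Any value from the generation_name column of the table above works here
generation_name = "DK_final"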

# Import necessary libraries and modules.
import holoviews as hv
import hvplot.xarray  # noqa
import movingpandas as mpd
import pandas as pd
import panel as pn
import s3fs
import xarray as xr
from pangeo_fish import visualization
from pangeo_fish.io import open_tag, read_trajectories
from pangeo_fish.tags import to_time_slice

Parameters Explanation#

This section of the code is responsible for setting up the necessary parameters and configurations to access and process the data for analysis. Below is an explanation of each parameter:

  1. S3 Filesystem Setup (s3):

    • This configures access to an S3-like storage system (OVH cloud in this case). It sets up authentication and defines the endpoint URL to access the data.

  2. generation_name:

    • This variable defines the folder name where the results are stored. You can update this to change the dataset being accessed.

  3. remote_path:

    • The base path to the folder where the tags are stored in the S3 bucket. It points to the “bargip” subdirectory under the “gfts-ifremer/tags” folder.

  4. tag_list_ and tag_list:

    • These variables list all available tags within the specified folder (determined by generation_name). The tags are cleaned to only contain the relevant part of the path.

  5. cloud_root:

    • Specifies the root URL for tag data stored in the cloud (S3). This is the base location where all files for the analysis are stored.

  6. tag_root:

    • Defines the root URL where the cleaned tag data, used for computation, is located. This is derived from cloud_root and the “cleaned” folder.

  7. scratch_root:

    • Specifies the directory where the GFTS computation data is stored. It combines the cloud_root with the folder for the current generation of tracks.

  8. storage_options:

    • Contains the options used to configure the storage system.

  9. bbox (Bounding Box):

    • Defines the geographical region (latitude and longitude range) for which the analysis is focused. The values here cover a region in the Atlantic Ocean.

  10. track_modes:

    • Specifies the two types of tracks that have been computed for GFTS: “mean” and “mode”. These represent different methods of track analysis for the data.

Each of these parameters sets up essential parts of the data access and storage for running the analysis on the fish tracking data.

## Update the following with each experiment you examine

# The base S3 path to the tag data for the experiment
remote_path = "gfts-ifremer/tags/bargip"

# The name of the folder where the results are stored
generation_name = "tracks_4"

# bbox, bounding box, defines the latitude and longitude range for the analysis area.
bbox = {"latitude": [40, 56], "longitude": [-13, 5]}


## Parameters to access the data; in GFTS these parameters are static

# track_modes are the two types of track that have been computed for GFTS.
track_modes = ["mean", "mode"]

cloud_root = f"s3://{remote_path}"

# tag_root specifies the root URL for tag data used for this computation.
tag_root = f"{cloud_root}/cleaned"


pn.extension()
s3 = s3fs.S3FileSystem(
    anon=False,
    client_kwargs={
        "endpoint_url": "https://s3.gra.perf.cloud.ovh.net",
    },
)


# storage_options specifies options for the filesystem storing and/or opening output files.
storage_options = {
    "anon": False,
    # 'profile' : "gfts",
    "client_kwargs": {
        "endpoint_url": "https://s3.gra.perf.cloud.ovh.net",
        "region_name": "gra",
    },
}


# Tag list is the list of available tags

tag_list_ = s3.ls(f"{remote_path}/{generation_name}")
tag_list = [
    tag.replace(f"{remote_path}/{generation_name}/", "")
    for tag in tag_list_
    if tag.replace(f"{remote_path}/{generation_name}/", "")
]


# scratch_root specifies the root directory where the GFTS computation data are stored.
scratch_root = f"{cloud_root}/{generation_name}"
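With the parameters above defined, a quick sanity check is to confirm that the selected generation actually contains tags; this reuses the objects created in the previous cells:

# Print how many tags were found for the selected generation and show the first few entries
print(f"{len(tag_list)} tags found in {scratch_root}")
print(tag_list[:5])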

Plotting functions#

Here are short descriptions of what each function plots:

  1. plot_time_series:

    • Plots a time series of temperature, depth, and the fish’s latitude and longitude over time. It visualizes how these parameters change throughout the tracking period for a given tag ID.

  2. plot_track:

    • Plots the movement track of a fish (mean and mode trajectories) on a map, color-coded by month. It shows the fish’s path over time for a given tag ID.

  3. plot_emission:

    • Plots the emission probability and states on a map for a specific time range. It compares these two datasets to visualize the fish’s possible states and their corresponding emissions.

# pn.cache stores all the plot outputs to avoid redoing the computation every time. You might need to disable it when manipulating a large number of files.
# @pn.cache


# Functions to plot the different visualizations for a given tag id
def plot_time_series(tag_id="CB_A11071"):
    # load trajectories
    trajectories = read_trajectories(
        track_modes, f"{scratch_root}/{tag_id}", storage_options, format="parquet"
    )

    # Converting the trajectories to pandas DataFrames to access data easily
    mean_df = trajectories.trajectories[0].df
    mode_df = trajectories.trajectories[1].df

    tag = open_tag(tag_root, tag_id)
    time_slice = to_time_slice(tag["tagging_events/time"])

    time = tag["dst"].ds.time
    cond = (time <= time_slice.stop) & (time >= time_slice.start)

    tag_log = tag["dst"].ds.where(cond, drop=True)

    min_ = tag_log.time[0]
    max_ = tag_log.time[-1]

    time_slice = slice(min_.data, max_.data)

    tag_log = tag["dst"].ds.sel(time=time_slice)

    # Creating pandas Series to build xarray datasets
    mean_lon_ = pd.Series(mean_df.geometry.x, name="longitude")
    mean_lat_ = pd.Series(mean_df.geometry.y, name="latitude")
    mode_lon_ = pd.Series(mode_df.geometry.x, name="longitude")
    mode_lat_ = pd.Series(mode_df.geometry.y, name="latitude")

    # Creating xarray datasets
    mean_coords = xr.Dataset(pd.concat([mean_lon_, mean_lat_], axis=1))
    mode_coords = xr.Dataset(pd.concat([mode_lon_, mode_lat_], axis=1))

    # Assigning dataarrays to variables
    mean_lon = mean_coords["longitude"]
    mean_lat = mean_coords["latitude"]
    mode_lon = mode_coords["longitude"]
    mode_lat = mode_coords["latitude"]

    tag_log["depth"] = tag_log["pressure"]
    temp_plot = tag_log["temperature"].hvplot(
        color="Red", title="Temperature (°C)", grid=True, height=200, width=600
    )
    depth_plot = (-tag_log["depth"]).hvplot(
        color="Blue", title="Depth (m)", grid=True, height=200, width=600
    )
    # Latitude and longitude of the estimated positions over time
    lat_plot = (
        mean_lat.hvplot(label="mean", clim=[mean_lat_.min(), mean_lat_.max()])
        * mode_lat.hvplot(label="mode", clim=[mode_lat_.min(), mode_lat_.max()])
    ).opts(height=200, width=600, show_grid=True, title="Fish latitude over time")
    lon_plot = (
        mean_lon.hvplot(label="mean", clim=[mean_lon_.min(), mean_lon_.max()])
        * mode_lon.hvplot(label="mode", clim=[mode_lon_.min(), mode_lon_.max()])
    ).opts(height=200, width=600, show_grid=True, title="Fish longitude over time")

    return (temp_plot + depth_plot + lat_plot + lon_plot).cols(1)


def plot_track(tag_id="CB_A11071"):
    # Read the sigma parameter used for this tag (storage_options are needed to reach the S3 bucket)
    sigma = pd.read_json(
        f"{scratch_root}/{tag_id}/parameters.json", storage_options=storage_options
    ).to_dict()[0]["sigma"]
    trajectories = read_trajectories(
        track_modes, f"{scratch_root}/{tag_id}", storage_options, format="parquet"
    )

    # Converting the trajectories to pandas DataFrames to access data easily
    mean_df = trajectories.trajectories[0].df
    mode_df = trajectories.trajectories[1].df

    # Adding month data
    mean_df["month"] = mean_df.index.month
    mode_df["month"] = mode_df.index.month

    # Converting back to trajectories
    mean_traj = mpd.Trajectory(
        mean_df, traj_id=mean_df.traj_id.drop_duplicates().values[0]
    )
    mode_traj = mpd.Trajectory(
        mode_df, traj_id=mode_df.traj_id.drop_duplicates().values[0]
    )
    trajectories = mpd.TrajectoryCollection([mean_traj, mode_traj])

    traj_plots = [
        traj.hvplot(
            c="month",
            tiles="CartoLight",
            cmap="rainbow",
            title=f"{tag_id} , {traj.id}, {sigma}",
            width=375,
            height=375,
        )
        for traj in trajectories.trajectories
    ]

    return hv.Layout(traj_plots).cols(1)


def plot_emission(tag_id="CB_A11071"):
    # Note: this might not work if Dask or a slider is involved; still to be tested.
    emission = xr.open_dataset(
        f"{scratch_root}/{tag_id}/combined.zarr",
        engine="zarr",
        chunks={},
        inline_array=True,
        storage_options=storage_options,
    ).rename_vars({"pdf": "emission"})

    states = xr.open_dataset(
        f"{scratch_root}/{tag_id}/states.zarr",
        engine="zarr",
        chunks={},
        inline_array=True,
        storage_options=storage_options,
    ).where(emission["mask"])

    data = xr.merge([states, emission.drop_vars(["mask"])])
    plot1 = visualization.plot_map(
        data["states"].sel(time=slice("2015-09-04", "2015-09-10")), bbox, cmap="cool"
    ).opts(height=350, width=600)
    plot2 = visualization.plot_map(
        data["emission"].sel(time=slice("2015-09-04", "2015-09-10")), bbox, cmap="cool"
    ).opts(height=350, width=600)
    plot = hv.Layout([plot1, plot2]).cols(1)

    return plot
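Each of these functions can also be called on its own, outside of the Panel layout, to inspect a single tag directly in the notebook; example_tag below is just an illustration, any entry of tag_list works:

# Display the time series and the track for a single tag without launching the dashboard
example_tag = tag_list[0]
pn.Column(plot_time_series(tag_id=example_tag), plot_track(tag_id=example_tag))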
# Panel parameters
value = tag_list[0]
# Initializing the widget for tag selection
tag_widget = pn.widgets.Select(name="tag_id", value=value, options=tag_list)

# Binding widget with the plots

time_plot = pn.bind(plot_time_series, tag_id=tag_widget)
track_plot = pn.bind(plot_track, tag_id=tag_widget)
# The emission plot is commented out because it takes too long to load in Panel
# emission_plot = pn.bind(plot_emission,tag_id=tag_widget)
track_emission = pn.Row(time_plot, track_plot)

# Combining plots with the widget
plots = pn.Row(tag_widget, track_emission)

pn.template.FastListTemplate(
    site="Tag data display",
    title="plots",
    main=plots,
).servable();
plots
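As noted in the comment above the plotting functions, pn.cache can store the plot outputs so that re-selecting a tag does not trigger a new read from S3. A minimal sketch of how it could be enabled, wrapping the existing functions instead of uncommenting the decorator:

# Wrap the plotting functions so repeated selections of the same tag reuse cached results.
# You might want to skip this when browsing a large number of tags, as the cache grows accordingly.
cached_time_series = pn.cache(plot_time_series)
cached_track = pn.cache(plot_track)

# Then bind the cached versions to the widget exactly like the originals:
# time_plot = pn.bind(cached_time_series, tag_id=tag_widget)
# track_plot = pn.bind(cached_track, tag_id=tag_widget)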