Research Article: Robust learning algorithms for capturing oceanic dynamics and transport of Noctiluca blooms using linear dynamical models

Date Published: June 13, 2019

Publisher: Public Library of Science

Author(s): Yan Yan, Tony Jebara, Ryan Abernathey, Joaquim Goes, Helga Gomes, Juan A. Añel.


Blooms of Noctiluca in the Gulf of Oman and the Arabian Sea have been intensifying in recent years and now pose a threat to regional fisheries and to the long-term health of an ecosystem supporting a coastal population of nearly 120 million people. We present the results of a local-scale data analysis investigating the onset and patterns of the Noctiluca blooms, which form annually during the winter monsoon in the Gulf of Oman and the Arabian Sea. Our approach combines methods from physical and biological oceanography with machine learning techniques. In particular, we present a robust algorithm, the variable-length Linear Dynamic Systems (vLDS) model, that extracts the causal factors and latent dynamics at the local scale along each individual drifter trajectory, and we demonstrate its effectiveness by using it to generate predictive plots for all variables and to test macroscopic scientific hypotheses. The vLDS model is a new algorithm specifically designed to analyze the irregular dataset from surface velocity drifters, in which the multivariate time-series trajectories have unequal lengths. The test results provide local-scale statistical evidence to support and check the macroscopic physical and biological oceanography hypotheses on the Noctiluca blooms; they also help identify complementary local trajectory-scale dynamics that might not be visible or discoverable at the macroscopic scale. As a machine learning methodology, the vLDS model also generalizes: it can be used to investigate important causal factors and hidden dynamics associated with ocean biogeochemical processes and phenomena at the population level and at the local trajectory scale.

Partial Text

With the optimally selected latent space dimension k = 11, the vLDS algorithm obtains a set of model parameters Θ := {A, C, Γ, Σ, μ0, V0} when the stopping criterion inside the Expectation-Maximization algorithm is reached. The spatial distribution of the vLDS prediction error for the chlorophyll a concentration is shown in Fig 9. Fig 10 shows the prediction results for some drifter IDs in the cross-validation dataset, using Eq (1) and the expected conditional mean of the latent variables at the last iteration of the Expectation steps 4, 5, and 6 in Algorithm 2. The dark lines are the observations, and the cyan lines are the predictions. Most of the hidden dynamics of the float profiles in the cross-validation dataset are well captured by the vLDS model. The R² values of the drifters in Fig 10 are 0.95, 0.98, 0.98, 0.98, and 0.99, respectively. We note the positive correlations among ‘chlor_a’, ‘cdm’, and ‘kd490’ in the recovered vLDS latent dynamics (cyan lines in Fig 10) at the local drifter scale and at the population level in the cross-validation dataset. The model captures this correlation with some overshooting or undershooting in certain regions. Also, ‘t865’, the aerosol optical thickness over water, turns out to be independent of the chlorophyll a concentration and the other ocean profiles. Moreover, the spatial information, namely the longitude, latitude, velocity, and speed of the float and its distance to the nearest coast, is well recovered by the vLDS model (lat and spd are not shown in Figs 10 and 11 due to space limitations).
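
To make the role of the parameter set Θ concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard linear-Gaussian state-space form, which this excerpt does not spell out: z_t = A z_{t-1} + w_t with w_t ~ N(0, Γ), x_t = C z_t + v_t with v_t ~ N(0, Σ), and z_1 ~ N(μ0, V0). It runs only a forward Kalman filter with known parameters (the article's Expectation-Maximization procedure also learns Θ and uses the expected conditional mean of the latent variables), applies it to synthetic trajectories of unequal length, and scores the reconstruction of one observed variable with R². The function names, the dimensions (k = 2 here, not 11), and all parameter values are illustrative assumptions.

```python
import numpy as np

def simulate(T, A, C, Gamma, Sigma, mu0, V0, rng):
    """Draw one synthetic trajectory of length T from the assumed LDS."""
    k, d = A.shape[0], C.shape[0]
    z = rng.multivariate_normal(mu0, V0)            # first latent state
    xs = np.zeros((T, d))
    for t in range(T):
        if t > 0:
            z = A @ z + rng.multivariate_normal(np.zeros(k), Gamma)
        xs[t] = C @ z + rng.multivariate_normal(np.zeros(d), Sigma)
    return xs

def kalman_filter(x, A, C, Gamma, Sigma, mu0, V0):
    """Forward pass for one trajectory x of shape (T, d); T may vary.

    Returns the filtered latent means (T, k) and the one-step-ahead
    predictions of the observations (T, d).
    """
    T, k = x.shape[0], A.shape[0]
    mu = np.zeros((T, k))
    x_pred = np.zeros((T, x.shape[1]))
    m_prior, P_prior = mu0, V0                      # prior on the first state
    for t in range(T):
        x_pred[t] = C @ m_prior                     # prediction before seeing x[t]
        S = C @ P_prior @ C.T + Sigma               # innovation covariance
        K = P_prior @ C.T @ np.linalg.inv(S)        # Kalman gain
        m = m_prior + K @ (x[t] - x_pred[t])        # filtered mean
        V = (np.eye(k) - K @ C) @ P_prior           # filtered covariance
        mu[t] = m
        m_prior, P_prior = A @ m, A @ V @ A.T + Gamma   # propagate one step
    return mu, x_pred

def r_squared(y, y_hat):
    """Coefficient of determination for a single observed variable."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative parameters (assumptions, not values from the article).
rng = np.random.default_rng(0)
angle = 0.3
A = 0.95 * np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Gamma, Sigma = 0.1 * np.eye(2), 0.01 * np.eye(3)
mu0, V0 = np.zeros(2), np.eye(2)

# Trajectories of unequal length are filtered one at a time,
# so no padding or truncation is needed.
lengths = [25, 40, 60]
trajectories = [simulate(T, A, C, Gamma, Sigma, mu0, V0, rng) for T in lengths]

scores = []
for x in trajectories:
    mu, _ = kalman_filter(x, A, C, Gamma, Sigma, mu0, V0)
    x_hat = mu @ C.T                                # reconstruct observations
    scores.append(r_squared(x[:, 0], x_hat[:, 0]))
print(scores)
```

Because each trajectory is processed independently and shares the same Θ, sequences of different lengths pose no difficulty for the filtering step; in a learning setting, the per-trajectory statistics would be pooled when updating the parameters.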

We have introduced a new model, vLDS, and shown that it offers a new local-scale, trajectory-based data analysis tool to recover biogeochemical mechanisms underlying chaotic drifter trajectories that might be unobservable at the macroscopic scale or accessible only in controlled laboratory experiments. The vLDS model generates predictions that recover the causal relationship among the Noctiluca blooms, physical dispersal, and physico-chemical environments (Figs 10 and 11 and Table 3). The model’s generalization capability also summarizes, recovers, and predicts the latent dynamics from unknown held-out testing datasets, thus inspiring confidence in our local-scale findings along drifter trajectories and in our macroscopic findings from pooled data. The highly correlated relationships between ‘chlor_a’ and ‘cdm’ (colored dissolved organic matter, CDOM), and between ‘chlor_a’ and ‘kd490’ (light under the sea surface), are close to linear. The tightly correlated relationships between ‘chlor_a’ and ‘par’ (light on the sea surface, PAR), and between ‘chlor_a’ and ‘sst4’ (sea surface temperature, SST4), are nonlinear. The vLDS model does not provide evidence of a strong relationship between ‘t865’ and the latent dynamics of Noctiluca growth.
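
The distinction drawn above between near-linear and nonlinear relationships can be illustrated with a simple diagnostic. The sketch below is not the procedure used in the article: it compares the R² of a straight-line fit with that of a quadratic fit on purely synthetic data, and treats a large gain from the quadratic term as a sign of nonlinearity. The helper name `poly_r2` and the synthetic variables are assumptions made for illustration only.

```python
import numpy as np

def poly_r2(x, y, degree):
    """R^2 of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    y_hat = np.polyval(coeffs, x)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins (not drifter data): one linear and one curved relation.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 200)
relations = {
    "linear": 2.0 * x + 0.05 * rng.normal(size=x.size),
    "curved": x ** 2 + 0.05 * rng.normal(size=x.size),
}

# Gain in R^2 when a quadratic term is added to the straight-line fit.
gains = {name: poly_r2(x, y, 2) - poly_r2(x, y, 1)
         for name, y in relations.items()}
for name, gain in gains.items():
    print(name, round(gain, 3))
```

A gain near zero suggests that a straight line already explains the relationship, while a large gain points to curvature that a linear fit misses.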

The source code for the variable-length Linear Dynamical System (vLDS) method is available at: