Prediction of Water Quality Parameters in the Paraopeba River Basin Using Remote Sensing Products and Machine Learning.
Rafael Luís Silva Dias, Ricardo Santos Silva Amorim, Demetrius David da Silva, Elpídio Inácio Fernandes-Filho, Gustavo Vieira Veloso, Ronam Henrique Fonseca Macedo
Abstract
Monitoring surface water quality is essential for assessing water resources and identifying their quality patterns. Traditional monitoring methods, based on conventional point-sampling stations, are reliable but costly and limited in frequency and spatial coverage. These constraints hinder the evaluation of water quality parameters at the temporal and spatial scales required to detect the effects of extreme events on aquatic systems. Satellite imagery offers a viable complement that extends the temporal and spatial scales of traditional monitoring. However, limitations in spectral, spatial, temporal, and/or radiometric resolution still pose significant challenges to prediction accuracy. This study proposes a methodology for predicting optically active and inactive water quality parameters in lotic and lentic environments using remote-sensing data and machine-learning techniques. Three remote-sensing datasets were organized and evaluated: (i) data extracted from Sentinel-2 imagery; (ii) data obtained from raw PlanetScope (PS) imagery; and (iii) data from PS imagery normalized using the methodology developed by Dias. Water quality data were collected at 24 monitoring stations located along the Paraopeba River channel and in the Três Marias Reservoir, covering the period from 2016 to 2023. Four machine-learning algorithms were applied to predict water quality parameters: Random Forest, k-Nearest Neighbors, Support Vector Machines with Radial Basis Function Kernel, and Cubist. Model performance was evaluated using four statistical metrics: root-mean-square error, mean absolute error, Lin's concordance correlation coefficient, and the coefficient of determination. Models based on normalized PS data achieved the best performance in parameter estimation. Additionally, decision-tree-based algorithms showed superior generalization capability, outperforming the other models tested.
The proposed methodology proved suitable for this type of analysis, confirming not only the applicability of PS data but also providing relevant insights for its use in diverse environmental-monitoring applications.
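As an illustration of the four evaluation metrics named in the abstract, the following is a minimal sketch in plain Python. It is not the authors' code: the function name `evaluate` and the sample values are hypothetical, and the coefficient of determination is assumed here to be 1 − SS_res/SS_tot (some studies report the squared Pearson correlation instead).

```python
import math


def evaluate(observed, predicted):
    """Return RMSE, MAE, Lin's CCC and R^2 for paired observed/predicted values.

    Hypothetical helper for illustration only; variances and covariance
    use the population (1/n) form, as in Lin's original definition.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    # Root-mean-square error and mean absolute error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n

    if var_o == 0:
        raise ValueError("observed values are constant; R^2 is undefined")

    # Lin's concordance correlation coefficient: penalizes both scatter and bias.
    ccc = 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)

    # Coefficient of determination, assumed here as 1 - SS_res / SS_tot.
    ss_res = sum(e * e for e in errors)
    ss_tot = var_o * n
    r2 = 1 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "ccc": ccc, "r2": r2}


# Made-up values: predictions that follow the trend but carry a constant bias of +1.
biased = evaluate([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

In this made-up example the predictions are perfectly correlated with the observations, yet the constant offset yields an RMSE and MAE of 1, a concordance coefficient of about 0.71, and an R² of about 0.2, which shows why agreement metrics are reported alongside error metrics.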