Machine learning-based prediction of harmful algal blooms in water supply reservoirs

Bongseok Jeong, Maria Renee Chapeta, Mingu Kim, Jinho Kim, Jihoon Shin, Yoon Kyung Cha

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Harmful algal blooms (HABs) pose a potential risk to human and ecosystem health. HAB occurrences are influenced by numerous environmental factors; thus, accurate predictions of HABs, together with explanations of those predictions, are required to implement preventive water quality management. In this study, two machine learning (ML) algorithms, random forest (RF) and extreme gradient boosting (XGB), were employed to predict HABs in eight water supply reservoirs in South Korea. Using the synthetic minority oversampling technique (SMOTE) to address imbalanced HAB occurrences improved the classification performance of the ML algorithms. Although RF and XGB differed only marginally in performance, XGB exhibited more stable performance in the presence of data imbalance. Furthermore, a post hoc explanation technique, Shapley additive explanations (SHAP), was employed to estimate relative feature importance. Among the input features, water temperature and the concentrations of total nitrogen and total phosphorus appeared important in predicting HAB occurrences. The results suggest that the use of ML algorithms along with explanation methods increases the usefulness of predictive models as a decision-making tool for water quality management.
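
The oversampling step mentioned in the abstract can be illustrated with a minimal sketch of the core idea behind synthetic minority oversampling: new minority-class (bloom) samples are interpolated between an existing minority sample and one of its nearest minority neighbors. This is not the study's implementation. The function name `smote_oversample`, its parameters, and the feature values are hypothetical and chosen only for illustration, and the sketch uses only the Python standard library rather than any particular ML package.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; not the study's implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Made-up bloom samples: [water temperature (deg C), total phosphorus (mg/L)].
bloom_samples = [[26.1, 0.045], [27.4, 0.060], [25.3, 0.052], [28.0, 0.071]]
new_samples = smote_oversample(bloom_samples, n_new=6)
print(len(new_samples))
```

In practice, oversampling of this kind is normally applied to the training split only, so that the held-out data used to evaluate a classifier still reflect the original class imbalance.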

Original language: English
Pages (from-to): 304-318
Number of pages: 15
Journal: Water Quality Research Journal
Volume: 57
Issue number: 4
State: Published - 1 Nov 2022

Keywords

  • SHAP
  • cyanobacteria bloom
  • data imbalance
  • feature importance
  • machine learning
  • water quality management
