TY - JOUR
T1 - Simultaneous feature engineering and interpretation
T2 - Forecasting harmful algal blooms using a deep learning approach
AU - Kim, Tae Ho
AU - Shin, Jihoon
AU - Lee, Do Yeon
AU - Kim, Young Woo
AU - Na, Eunhye
AU - Park, Jong hwan
AU - Lim, Chaehong
AU - Cha, Yoon Kyung
N1 - Publisher Copyright: © 2022 Elsevier Ltd
PY - 2022/5/15
Y1 - 2022/5/15
N2 - Routine monitoring for harmful algal blooms (HABs) is generally undertaken at low temporal frequency (e.g., weekly to monthly) that is unsuitable for capturing highly dynamic variations in cyanobacteria abundance. Therefore, we developed a model incorporating reverse time attention with a decay mechanism (RETAIN-D) to forecast HABs with simultaneous improvements in temporal resolution, forecasting performance, and interpretability. The usefulness of RETAIN-D in forecasting HABs was illustrated by its application to two sites located in the lower sections of the Nakdong and Yeongsan rivers, South Korea, where HABs pose a critical water quality issue. Three variations of recurrent neural network models, i.e., long short-term memory (LSTM), gated recurrent unit (GRU), and reverse time attention (RETAIN), were adopted for comparisons of performance with RETAIN-D. Input features encompassing meteorological, hydrological, environmental, and biological factors were used to forecast cyanobacteria abundance (total cyanobacteria cell counts and cell counts of dominant cyanobacteria taxa). Incorporation of a decay mechanism into the deep learning structure in RETAIN-D allowed forecasts of HABs on a high temporal resolution (daily) without manual feature engineering, increasing the usefulness of resulting forecasts for water quality and resources management. RETAIN-D yielded a high degree of accuracy (RMSE = 0.29–1.67, R² = 0.76–0.98, MAE = 0.18–1.14, SMAPE = 9.77–87.94% for test sets; on natural log scales) across model outputs and sites, successfully capturing high variability and irregularities in the time series. RETAIN-D showed higher accuracy than RETAIN (except for comparable accuracy in forecasting Microcystis abundance at the Nakdong River site) and outperformed LSTM and GRU across all model outputs and sites. Ambient temperature had high importance in forecasting cyanobacteria abundance across all model outputs and sites, whereas the relative importance of other input features varied by the output and site. Increases in contributions with increasing irradiance, decreasing flow rates, and increasing residence time were more pronounced in summer than other seasons. Differences in the contributions of input features among different time steps (1 to 7 days prior to forecasting) were larger in the Yeongsan River site. RETAIN-D is applicable to a wide range of forecasting models that can benefit from improved temporal resolution, performance, and interpretability.
KW - Cyanobacteria
KW - Decay mechanism
KW - Explainable artificial intelligence
KW - Harmful algal bloom
KW - Recurrent neural network
KW - Reverse time attention mechanism
UR - http://www.scopus.com/inward/record.url?scp=85126510349&partnerID=8YFLogxK
U2 - 10.1016/j.watres.2022.118289
DO - 10.1016/j.watres.2022.118289
M3 - Article
C2 - 35303563
AN - SCOPUS:85126510349
SN - 0043-1354
VL - 215
JO - Water Research
JF - Water Research
M1 - 118289
ER -