TY - JOUR
T1 - Generalizability evaluations of heterogeneous ensembles for river health predictions
AU - Park, Taeseung
AU - Shin, Jihoon
AU - Park, Baekyung
AU - Moon, Jeongsuk
AU - Cha, Yoon Kyung
N1 - Publisher Copyright:
© 2024 The Authors
PY - 2024/9
Y1 - 2024/9
N2 - Predictive models leverage the relationships between environmental factors and river health to predict the river health at unmonitored sites. Such models should be generalizable to unseen data. Among various machine learning models, heterogeneous ensembles are known to be generalizable owing to their structural diversity. The present study compares the generalizability of heterogeneous ensembles with those of homogeneous ensembles and single models. The models classified five grades (very good to very poor) of river health indices (RHIs) for three taxa (benthic macroinvertebrates, fish, and diatoms) given various environmental factors (water quality, hydrology, meteorological, land cover, and stream properties) as inputs. The data were monitored at 2915 sites in the four major river watersheds in South Korea during the 2016–2021 period. The results indicated better generalizability of the heterogeneous and homogeneous ensembles than single models. Moreover, heterogeneous ensembles tended to show higher generalizability than homogeneous ensembles, although the differences were marginal. Weighted soft voting was the most generalizable of the heterogeneous ensembles, with losses ranging from 0.49 to 0.59 across the three taxa. Weighted soft voting also delivered acceptable classification performance on the test set, with accuracies ranging from 0.42 to 0.52 across the taxa. The relative contributions of the environmental factors to RHI predictions and the directions of their effects agreed with established knowledge, confirming the reliability of the predictions. However, as heterogeneous ensembles have been rarely applied to RHI prediction, the extent to which heterogeneous ensembles improve the generalizability of prediction must be investigated in future studies.
AB - Predictive models leverage the relationships between environmental factors and river health to predict the river health at unmonitored sites. Such models should be generalizable to unseen data. Among various machine learning models, heterogeneous ensembles are known to be generalizable owing to their structural diversity. The present study compares the generalizability of heterogeneous ensembles with those of homogeneous ensembles and single models. The models classified five grades (very good to very poor) of river health indices (RHIs) for three taxa (benthic macroinvertebrates, fish, and diatoms) given various environmental factors (water quality, hydrology, meteorological, land cover, and stream properties) as inputs. The data were monitored at 2915 sites in the four major river watersheds in South Korea during the 2016–2021 period. The results indicated better generalizability of the heterogeneous and homogeneous ensembles than single models. Moreover, heterogeneous ensembles tended to show higher generalizability than homogeneous ensembles, although the differences were marginal. Weighted soft voting was the most generalizable of the heterogeneous ensembles, with losses ranging from 0.49 to 0.59 across the three taxa. Weighted soft voting also delivered acceptable classification performance on the test set, with accuracies ranging from 0.42 to 0.52 across the taxa. The relative contributions of the environmental factors to RHI predictions and the directions of their effects agreed with established knowledge, confirming the reliability of the predictions. However, as heterogeneous ensembles have been rarely applied to RHI prediction, the extent to which heterogeneous ensembles improve the generalizability of prediction must be investigated in future studies.
KW - Bias–variance decomposition
KW - Generalizability
KW - Heterogenous ensembles
KW - Multimetric biological indices
KW - River health
KW - SHapley additive exPlanations
UR - http://www.scopus.com/inward/record.url?scp=85198501439&partnerID=8YFLogxK
U2 - 10.1016/j.ecoinf.2024.102719
DO - 10.1016/j.ecoinf.2024.102719
M3 - Article
AN - SCOPUS:85198501439
SN - 1574-9541
VL - 82
JO - Ecological Informatics
JF - Ecological Informatics
M1 - 102719
ER -