TY - JOUR
T1 - Learning hierarchical Bayesian networks to assess the interaction effects of controlling factors on spatiotemporal patterns of fecal pollution in streams
AU - Kim, Tae Ho
AU - Lee, Do Yeon
AU - Shin, Jihoon
AU - Kim, Young Woo
AU - Cha, Yoon Kyung
N1 - Publisher Copyright:
© 2021
PY - 2022/3/15
Y1 - 2022/3/15
N2 - The dynamics of fecal indicator bacteria, such as fecal coliforms (FC) in streams, are influenced by the interactions of a myriad of factors. To predict complex spatiotemporal patterns of FC in streams and assess the relative importance of numerous controlling factors, the adoption of a hierarchical Bayesian network (HBN) was proposed in this study. By introducing latent variables correlated to the observed variables into a Bayesian network, the HBN can represent causal relationships among a large set of variables with a multilevel hierarchy. The study area encompasses 215 sites across the watersheds of the four major rivers in South Korea. The monitoring data collected during the 2012–2019 period included 32 input variables pertaining to meteorology, geography, soil characteristics, land cover, urbanization index, livestock density, and point sources. As model endpoints, the exceedance probability of the FC standard concentration as well as two pollution characteristics (i.e., pollution degree and type), derived from FC load duration curves were used. The probability of exceeding an FC threshold value (200 CFU/100 mL) showed spatiotemporal variations, whereas pollution degree and type showed spatial variations that represent long-term severity and relative dominance of nonpoint and point source fecal pollution, respectively. The conceptual model was validated using structural equation modeling to develop the HBN. The results demonstrate that the HBN effectively simplified the model structure, while showing strong model performance (AUC = 0.81, accuracy = 0.74). The results of the sensitivity analysis indicate that land cover is the most important factor in predicting the probability of exceedance and pollution degree, whereas the urbanization index explains most of the variability in pollution type. Furthermore, the results of the scenario analysis suggest that the HBN provides an interpretable framework in which the interaction of controlling factors has causal relationships at different levels that can be identified and visualized.
AB - The dynamics of fecal indicator bacteria, such as fecal coliforms (FC) in streams, are influenced by the interactions of a myriad of factors. To predict complex spatiotemporal patterns of FC in streams and assess the relative importance of numerous controlling factors, the adoption of a hierarchical Bayesian network (HBN) was proposed in this study. By introducing latent variables correlated to the observed variables into a Bayesian network, the HBN can represent causal relationships among a large set of variables with a multilevel hierarchy. The study area encompasses 215 sites across the watersheds of the four major rivers in South Korea. The monitoring data collected during the 2012–2019 period included 32 input variables pertaining to meteorology, geography, soil characteristics, land cover, urbanization index, livestock density, and point sources. As model endpoints, the exceedance probability of the FC standard concentration as well as two pollution characteristics (i.e., pollution degree and type), derived from FC load duration curves were used. The probability of exceeding an FC threshold value (200 CFU/100 mL) showed spatiotemporal variations, whereas pollution degree and type showed spatial variations that represent long-term severity and relative dominance of nonpoint and point source fecal pollution, respectively. The conceptual model was validated using structural equation modeling to develop the HBN. The results demonstrate that the HBN effectively simplified the model structure, while showing strong model performance (AUC = 0.81, accuracy = 0.74). The results of the sensitivity analysis indicate that land cover is the most important factor in predicting the probability of exceedance and pollution degree, whereas the urbanization index explains most of the variability in pollution type. Furthermore, the results of the scenario analysis suggest that the HBN provides an interpretable framework in which the interaction of controlling factors has causal relationships at different levels that can be identified and visualized.
KW - Fecal coliforms
KW - Fecal indicator bacteria
KW - Hierarchical Bayesian network
KW - Land cover
KW - Load duration curve
KW - Structural equation modeling
UR - http://www.scopus.com/inward/record.url?scp=85121913340&partnerID=8YFLogxK
U2 - 10.1016/j.scitotenv.2021.152520
DO - 10.1016/j.scitotenv.2021.152520
M3 - Article
C2 - 34953848
AN - SCOPUS:85121913340
SN - 0048-9697
VL - 812
JO - Science of the Total Environment
JF - Science of the Total Environment
M1 - 152520
ER -