TY - JOUR
T1 - Clustering micropollutants and estimating rate constants of sorption and biodegradation using machine learning approaches
AU - Lim, Seung Ji
AU - Seo, Jangwon
AU - Seid, Mingizem Gashaw
AU - Lee, Jiho
AU - Ejerssa, Wondesen Workneh
AU - Lee, Doo Hee
AU - Jeong, Eunhoo
AU - Chae, Sung Ho
AU - Lee, Yunho
AU - Son, Moon
AU - Hong, Seok Won
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
N2 - Effluent from wastewater treatment plants is considered an important source of micropollutants (MPs) in aquatic environments. However, monitoring MPs in effluents is often inefficient owing to the variety in their types. Thus, this study derived marker constituents to estimate the behavior of MPs in each cluster using the self-organizing map (SOM), a machine learning-based clustering analysis method. In SOM analysis, the physicochemical properties, functional groups, and the initial biotransformation rules of 29 out of 42 MPs were used to ultimately estimate the degradation rate constants of 13 MPs. Consequently, when the physicochemical properties and functional groups were considered, SOM analysis showed outstanding performance to label MPs with an accuracy value of 0.75 for each aerobic and anoxic condition. Based on the clustering results, 11 MPs were determined to be marker constituents under each aerobic and anoxic condition. Moreover, an estimation method for the rate constants of unlabeled MPs was successfully developed using the identified markers with the random forest classifier. The proposed algorithm could estimate both sorption and biotransformation of MPs regardless of dominant removal mechanisms, whether the MPs were removed by sorption or biotransformation. An accuracy of 0.77 was calculated for estimating rate constants under both aerobic and anoxic conditions, which is remarkably higher than those reported previously. The proposed procedure could be extended further to efficiently monitor MPs in effluents.
AB - Effluent from wastewater treatment plants is considered an important source of micropollutants (MPs) in aquatic environments. However, monitoring MPs in effluents is often inefficient owing to the variety in their types. Thus, this study derived marker constituents to estimate the behavior of MPs in each cluster using the self-organizing map (SOM), a machine learning-based clustering analysis method. In SOM analysis, the physicochemical properties, functional groups, and the initial biotransformation rules of 29 out of 42 MPs were used to ultimately estimate the degradation rate constants of 13 MPs. Consequently, when the physicochemical properties and functional groups were considered, SOM analysis showed outstanding performance to label MPs with an accuracy value of 0.75 for each aerobic and anoxic condition. Based on the clustering results, 11 MPs were determined to be marker constituents under each aerobic and anoxic condition. Moreover, an estimation method for the rate constants of unlabeled MPs was successfully developed using the identified markers with the random forest classifier. The proposed algorithm could estimate both sorption and biotransformation of MPs regardless of dominant removal mechanisms, whether the MPs were removed by sorption or biotransformation. An accuracy of 0.77 was calculated for estimating rate constants under both aerobic and anoxic conditions, which is remarkably higher than those reported previously. The proposed procedure could be extended further to efficiently monitor MPs in effluents.
UR - https://www.scopus.com/pages/publications/85175204340
U2 - 10.1038/s41545-023-00282-6
DO - 10.1038/s41545-023-00282-6
M3 - Article
AN - SCOPUS:85175204340
SN - 2059-7037
VL - 6
JO - npj Clean Water
JF - npj Clean Water
IS - 1
M1 - 69
ER -