TY - JOUR
T1 - An interpretable machine learning method for supporting ecosystem management
T2 - Application to species distribution models of freshwater macroinvertebrates
AU - Cha, Yoon Kyung
AU - Shin, Jihoon
AU - Go, Byeong Geon
AU - Lee, Dae Seong
AU - Kim, Young Woo
AU - Kim, Tae Ho
AU - Park, Young Seuk
N1 - Publisher Copyright:
© 2021 Elsevier Ltd
PY - 2021/8/1
Y1 - 2021/8/1
N2 - Species distribution models (SDMs), which relate species occurrences to a suite of environmental variables, have been used as decision-making tools in ecosystem management. Complex machine learning (ML) algorithms that lack interpretability may hinder the use of SDMs for ecological explanation, possibly limiting their role as decision-support tools. To meet the growing demand for explainable ML, several interpretable ML methods have recently been proposed. Among these methods, SHapley Additive exPlanations (SHAP) has drawn attention for its robust theoretical justification and analytical gains. In this study, the utility of SHAP was demonstrated by applying it to SDMs of four benthic macroinvertebrate species. In addition to species responses, the dataset contained 22 environmental variables monitored at 436 sites across five major rivers of South Korea. A range of ML algorithms was employed for model development, and each ML model was trained and optimized using 10-fold cross-validation. Model evaluation on the test dataset indicated strong performance, with scores of ≥0.7 on all evaluation metrics for all ML algorithms and species. However, only the random forest algorithm behaved consistently with the known ecology of the investigated species. SHAP provides an integrated framework in which local interpretations, which incorporate local interaction effects, are combined to represent the global model structure. Consequently, this framework offered a novel opportunity to assess the importance of variables in predicting species occurrence not only across sites but also at individual sites. Furthermore, removing interaction effects from the variable importance values (SHAP values) clearly revealed non-linear species responses to variations in environmental variables, indicating the existence of ecological thresholds. This study provides guidelines for using a new interpretable method to support ecosystem management.
AB - Species distribution models (SDMs), which relate species occurrences to a suite of environmental variables, have been used as decision-making tools in ecosystem management. Complex machine learning (ML) algorithms that lack interpretability may hinder the use of SDMs for ecological explanation, possibly limiting their role as decision-support tools. To meet the growing demand for explainable ML, several interpretable ML methods have recently been proposed. Among these methods, SHapley Additive exPlanations (SHAP) has drawn attention for its robust theoretical justification and analytical gains. In this study, the utility of SHAP was demonstrated by applying it to SDMs of four benthic macroinvertebrate species. In addition to species responses, the dataset contained 22 environmental variables monitored at 436 sites across five major rivers of South Korea. A range of ML algorithms was employed for model development, and each ML model was trained and optimized using 10-fold cross-validation. Model evaluation on the test dataset indicated strong performance, with scores of ≥0.7 on all evaluation metrics for all ML algorithms and species. However, only the random forest algorithm behaved consistently with the known ecology of the investigated species. SHAP provides an integrated framework in which local interpretations, which incorporate local interaction effects, are combined to represent the global model structure. Consequently, this framework offered a novel opportunity to assess the importance of variables in predicting species occurrence not only across sites but also at individual sites. Furthermore, removing interaction effects from the variable importance values (SHAP values) clearly revealed non-linear species responses to variations in environmental variables, indicating the existence of ecological thresholds. This study provides guidelines for using a new interpretable method to support ecosystem management.
KW - EPT taxa
KW - Interpretable machine learning
KW - Macroinvertebrate
KW - SHAP
KW - Species distribution model
KW - Tree-based model
UR - http://www.scopus.com/inward/record.url?scp=85104952974&partnerID=8YFLogxK
U2 - 10.1016/j.jenvman.2021.112719
DO - 10.1016/j.jenvman.2021.112719
M3 - Article
C2 - 33946026
AN - SCOPUS:85104952974
SN - 0301-4797
VL - 291
JO - Journal of Environmental Management
JF - Journal of Environmental Management
M1 - 112719
ER -