TY - JOUR
T1 - Predicting Residential Water Demand with Machine-Based Statistical Learning
AU - Lee, Dongwoo
AU - Derrible, Sybil
N1 - Publisher Copyright: © 2019 American Society of Civil Engineers.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Predicting residential water demand is challenging because of two technical questions: (1) which data and variables should be used and (2) which modeling technique is most appropriate for high prediction accuracy. To address these issues, this article investigates 12 statistical techniques, including parametric models and machine learning (ML) models, to predict daily household water use. In addition, two data scenarios are adopted: one with only 6 variables that are generally available to cities and water utilities (general scenario), and one with all 19 variables available from the Residential End-Use 2016 database (REU 2016 scenario). The results for the REU 2016 scenario indicate that ML models outperform linear models. In particular, gradient boosting regression (GBR) performs best, with an adjusted R² of 0.69 compared to 0.54 for linear regression. The performance gap between ML and linear models becomes even wider for the general scenario, with an adjusted R² of 0.60 for GBR compared to 0.33 for linear regression. The findings in this article can be useful to researchers, municipalities, and utilities seeking novel modeling techniques that can provide consistent modeling performance (i.e., high prediction accuracy) depending on data availability. Future work could include the development of new measures to increase the interpretability of ML models to better understand causal relationships between independent variables and daily household water use.
AB - Predicting residential water demand is challenging because of two technical questions: (1) which data and variables should be used and (2) which modeling technique is most appropriate for high prediction accuracy. To address these issues, this article investigates 12 statistical techniques, including parametric models and machine learning (ML) models, to predict daily household water use. In addition, two data scenarios are adopted: one with only 6 variables that are generally available to cities and water utilities (general scenario), and one with all 19 variables available from the Residential End-Use 2016 database (REU 2016 scenario). The results for the REU 2016 scenario indicate that ML models outperform linear models. In particular, gradient boosting regression (GBR) performs best, with an adjusted R² of 0.69 compared to 0.54 for linear regression. The performance gap between ML and linear models becomes even wider for the general scenario, with an adjusted R² of 0.60 for GBR compared to 0.33 for linear regression. The findings in this article can be useful to researchers, municipalities, and utilities seeking novel modeling techniques that can provide consistent modeling performance (i.e., high prediction accuracy) depending on data availability. Future work could include the development of new measures to increase the interpretability of ML models to better understand causal relationships between independent variables and daily household water use.
UR - http://www.scopus.com/inward/record.url?scp=85074438667&partnerID=8YFLogxK
U2 - 10.1061/(ASCE)WR.1943-5452.0001119
DO - 10.1061/(ASCE)WR.1943-5452.0001119
M3 - Article
AN - SCOPUS:85074438667
SN - 0733-9496
VL - 146
JO - Journal of Water Resources Planning and Management - ASCE
JF - Journal of Water Resources Planning and Management - ASCE
IS - 1
M1 - 04019067
ER -