TY - JOUR
T1 - Inferring heterogeneous treatment effects of crashes on highway traffic
T2 - A doubly robust causal machine learning approach
AU - Li, Shuang
AU - Pu, Ziyuan
AU - Cui, Zhiyong
AU - Lee, Seunghyeon
AU - Guo, Xiucheng
AU - Ngoduy, Dong
N1 - Publisher Copyright: © 2024 Elsevier Ltd
PY - 2024/3
Y1 - 2024/3
N2 - Accurately estimating the causal effects of crashes on highway traffic is crucial for mitigating their negative impacts. Previous studies have developed a series of methods based on traditional causal inference theory and machine learning to estimate the impacts of crashes. Because the structures and variable dimensions of traditional causal inference models are pre-defined, they cannot accommodate the characteristics of individual crashes. They can only estimate the average causal effects for crashes in certain categories, e.g., crash types, crash severity, and occurrence locations. Machine learning-based algorithms cannot be used for causal reasoning because they rely on correlation rather than causation. However, because the impacts of crashes on traffic status vary across influential factors, such as time periods and locations, heterogeneous causal effects are essential for a better understanding of the effects on traffic status and for developing crash intervention strategies. To address these issues, this study proposes a novel doubly robust causal machine learning framework to infer heterogeneous treatment effects of crashes on highway traffic status. Doubly Robust Learning (DRL), which integrates machine learning techniques to perform predictive tasks, is incorporated into the framework for its stronger robustness. Considering that treatment predictors and colliders may bias the estimation results, a Conditional Shapley Value Index (CSVI) is proposed for selecting confounders from numerous factors. A 3-year crash dataset comprising 3594 real highway crashes in Washington is used to demonstrate the designed experiments, including constructing confidence intervals, evaluating estimation errors, and analyzing the sensitivity of variable selection to various CSVI thresholds. The results reveal the distinctive propagation and dissipation processes of congestion caused by various types of crashes. The results also validate the effectiveness of variable selection and the superiority in estimation accuracy over the selected baseline models. Future work includes considering spatial–temporal causal relationships and predicting counterfactual real-time traffic conditions.
AB - Accurately estimating the causal effects of crashes on highway traffic is crucial for mitigating their negative impacts. Previous studies have developed a series of methods based on traditional causal inference theory and machine learning to estimate the impacts of crashes. Because the structures and variable dimensions of traditional causal inference models are pre-defined, they cannot accommodate the characteristics of individual crashes. They can only estimate the average causal effects for crashes in certain categories, e.g., crash types, crash severity, and occurrence locations. Machine learning-based algorithms cannot be used for causal reasoning because they rely on correlation rather than causation. However, because the impacts of crashes on traffic status vary across influential factors, such as time periods and locations, heterogeneous causal effects are essential for a better understanding of the effects on traffic status and for developing crash intervention strategies. To address these issues, this study proposes a novel doubly robust causal machine learning framework to infer heterogeneous treatment effects of crashes on highway traffic status. Doubly Robust Learning (DRL), which integrates machine learning techniques to perform predictive tasks, is incorporated into the framework for its stronger robustness. Considering that treatment predictors and colliders may bias the estimation results, a Conditional Shapley Value Index (CSVI) is proposed for selecting confounders from numerous factors. A 3-year crash dataset comprising 3594 real highway crashes in Washington is used to demonstrate the designed experiments, including constructing confidence intervals, evaluating estimation errors, and analyzing the sensitivity of variable selection to various CSVI thresholds. The results reveal the distinctive propagation and dissipation processes of congestion caused by various types of crashes. The results also validate the effectiveness of variable selection and the superiority in estimation accuracy over the selected baseline models. Future work includes considering spatial–temporal causal relationships and predicting counterfactual real-time traffic conditions.
KW - Causal machine learning
KW - Doubly robust learning
KW - Heterogeneous treatment effect
KW - Highway crashes
KW - Neyman-Rubin causal model
UR - http://www.scopus.com/inward/record.url?scp=85186522737&partnerID=8YFLogxK
U2 - 10.1016/j.trc.2024.104537
DO - 10.1016/j.trc.2024.104537
M3 - Article
AN - SCOPUS:85186522737
SN - 0968-090X
VL - 160
JO - Transportation Research Part C: Emerging Technologies
JF - Transportation Research Part C: Emerging Technologies
M1 - 104537
ER -