Abstract
Spatial autocorrelation is a fundamental property of spatial data, and it violates the assumption of independence between training and test datasets in general cross-validation (CV). Previous studies have reported that strong positive spatial autocorrelation generally leads to optimistic bias in general CV results. Spatial CV methods have been developed to address this bias, but their effectiveness remains controversial because they can produce excessively pessimistic estimates. This study examines the impact of spatial autocorrelation on general CV results and validates the effectiveness of spatial CV. The first simulation explores how varying levels of spatial autocorrelation affect general CV results: strong and moderate positive spatial autocorrelation introduces optimistic bias, whereas weak positive or negative spatial autocorrelation has no significant impact. The second simulation shows that spatial CV methods can mitigate the optimistic bias in general CV results for spatial data with strong or moderate positive spatial autocorrelation. However, the hyperparameters of spatial CV should be adjusted to the level of spatial autocorrelation to avoid excessively pessimistic estimates.
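
To make the contrast between general CV and spatial CV concrete, the following is a minimal sketch, not the study's own procedure. It assumes grid-based block CV as one common form of spatial CV (the abstract does not name the methods tested), and the function names `random_folds`, `block_folds`, and `mean_nearest_train_distance`, the 20 × 20 grid, and the block size are all hypothetical choices made for illustration. It shows only how the two fold assignments differ in the distance between test points and their nearest training points, which is the mechanism through which positive spatial autocorrelation can leak information into general CV.

```python
import math
import random


def random_folds(n_points, k, seed=0):
    """General (random) CV: shuffle the point indices and deal them into k folds."""
    rng = random.Random(seed)
    order = list(range(n_points))
    rng.shuffle(order)
    labels = [0] * n_points
    for position, index in enumerate(order):
        labels[index] = position % k
    return labels


def block_folds(coords, block_size, k, seed=0):
    """Spatial block CV (assumed variant): all points in one square block share a fold."""
    block_of = [
        (math.floor(x / block_size), math.floor(y / block_size)) for x, y in coords
    ]
    blocks = sorted(set(block_of))
    random.Random(seed).shuffle(blocks)  # assign blocks to folds in random order
    fold_of_block = {block: i % k for i, block in enumerate(blocks)}
    return [fold_of_block[block] for block in block_of]


def mean_nearest_train_distance(coords, labels):
    """Average distance from each point to the nearest point in a different fold.

    When a point is held out for testing, every point in a different fold is
    training data, so this is the mean test-to-nearest-training distance.
    """
    total = 0.0
    for (point, fold) in zip(coords, labels):
        total += min(
            math.dist(point, other)
            for other, other_fold in zip(coords, labels)
            if other_fold != fold
        )
    return total / len(coords)


# Hypothetical sample locations: cell centres of a 20 x 20 unit grid.
coords = [(x + 0.5, y + 0.5) for x in range(20) for y in range(20)]

general = random_folds(len(coords), k=4, seed=0)
spatial = block_folds(coords, block_size=5.0, k=4, seed=0)

print("general CV:", round(mean_nearest_train_distance(coords, general), 3))
print("spatial CV:", round(mean_nearest_train_distance(coords, spatial), 3))
```

In this sketch `block_size` plays the role of a spatial CV hyperparameter: larger blocks push test points farther from training points. In line with the abstract's caution, a block size much larger than the range of the autocorrelation would be expected to make the estimate pessimistic, so it is a value to tune rather than maximise.
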
Original language | English |
---|---|
Journal | Cartography and Geographic Information Science |
State | Accepted/In press - 2024 |
Keywords
- cross-validation
- machine learning
- spatial autocorrelation
- spatial cross-validation
- spatial data