TY - JOUR
T1 - Assessing Effective Sampling Method and Sample Size for Species Distribution Modeling of Korean Red Pine (Pinus densiflora)
AU - Sung, Sun Yong
AU - Lee, Dong Kun
AU - Park, Chan
AU - Kim, Ho Gul
AU - Kil, Sung Ho
AU - Chae, Hee Mun
AU - Park, Gwan Soo
AU - Ohga, Shoji
N1 - Publisher Copyright: © 2018 Kyushu University. All rights reserved.
PY - 2018
Y1 - 2018
AB - Sampling method and sample size can alter the performance of species distribution models (SDMs). In this study, we identified an effective sampling method and sample size for modeling Korean red pine (Pinus densiflora). We used 3 sampling methods (simple random sampling, stratified sampling, and area-weighted sampling), 7 sample sizes (30, 50, 100, 200, 500, 1000, and 3000), and 8 SDMs (GLM, GAM, CTA, ANN, GBM, RF, FDA, and MAXENT). The performance of each model was evaluated using the area under the receiver operating characteristic curve (AUC), and differences among models were tested using ANOVA. The area-weighted sampling method was the most effective and stable. With random and stratified sampling, model performance increased as sample size increased; with area-weighted sampling, however, performance saturated once sample size exceeded 200, owing to spatial autocorrelation among samples. Performance differed among the models: RF and GBM performed best (AUC = 0.838 and 0.839, respectively), whereas ANN performed worst (AUC = 0.658). Therefore, sampling method and sample size should be considered carefully when selecting SDMs, depending on the objective of the study.
KW - Analysis of variance
KW - Area-weighted sampling
KW - BIOMOD2
KW - Korean red pine
UR - http://www.scopus.com/inward/record.url?scp=85132656773&partnerID=8YFLogxK
U2 - 10.5109/1955384
DO - 10.5109/1955384
M3 - Article
AN - SCOPUS:85132656773
SN - 0023-6152
VL - 63
SP - 211
EP - 221
JO - Journal of the Faculty of Agriculture, Kyushu University
JF - Journal of the Faculty of Agriculture, Kyushu University
IS - 2
ER -