Validity evaluation of a machine-learning model for chlorophyll a retrieval using Sentinel-2 from inland and coastal waters

Young Woo Kim, Tae Ho Kim, Jihoon Shin, Dae Seong Lee, Young Seuk Park, Yeji Kim, Yoon Kyung Cha

Research output: Contribution to journal › Article › peer-review

28 Scopus citations


The MultiSpectral Instrument (MSI) onboard Sentinel-2 provides satellite images at spatiotemporal resolutions suitable for chlorophyll a (Chl a) retrieval from inland and coastal waters. Machine-learning (ML) algorithms, including the light gradient boosting machine (LGBM), were employed for Chl a retrieval from MSI. The study area encompasses 78 lakes and estuaries located across four major river watersheds in South Korea. Matchup data between MSI overpasses and near-concurrent in situ Chl a measurements from December 2018 to April 2021 were included. The remote sensing reflectance (Rrs) values of six single spectral bands and four two-band ratios were used as the input features. Despite the difficulty of estimating Chl a in optically complex waters, the ML algorithms showed reasonable overall accuracy. Among the ML algorithms, LGBM exhibited the best performance (R² = 0.75, bias = -0.15, slope = 0.73, RMSE = 15.15 mg·m⁻³, MAE = 9.49 mg·m⁻³) over a wide range of trophic states. Post-hoc interpretation of the best-performing LGBM using Shapley additive explanations indicated that Rrs(704)/Rrs(665) was the most important feature, while Rrs(739)/Rrs(704) and Rrs(492)/Rrs(560) played auxiliary roles in Chl a retrieval through interaction with Rrs(704)/Rrs(665). Among-lake spatial variations of Chl a were explained by percent forest and agricultural area within the buffer zone at multiple scales (buffer widths of 50 m and 500 m). The associations between the modeled Chl a and buffer land cover types, that is, a decrease in Chl a concentration with an increase in percent forest and a decrease in percent agricultural area, were consistent with established ecological knowledge. Overall, the model interpretations and the spatial variations in Chl a within and among lakes confirmed the validity of LGBM for retrieving MSI-derived Chl a from lakes and estuaries. Our study can serve as a reference for evaluating the validity of complex ML models for inland water remote sensing.
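The two-band ratio features and the accuracy metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The function names and the dictionary layout are hypothetical, only the three ratios named in the abstract are included (the fourth is not specified there), and the metric definitions are common conventions that may differ from those used in the paper (for example, bias taken as the mean of predicted minus observed, slope from an ordinary least-squares fit of predicted on observed, and R² as the coefficient of determination).

```python
import math


def band_ratio_features(rrs):
    """Build the three two-band ratios named in the abstract.

    rrs is assumed to map an MSI band centre wavelength (nm) to its
    remote sensing reflectance value; this layout is hypothetical.
    """
    return {
        "r704_665": rrs[704] / rrs[665],
        "r739_704": rrs[739] / rrs[704],
        "r492_560": rrs[492] / rrs[560],
    }


def regression_metrics(observed, predicted):
    """Compute R2, bias, slope, RMSE and MAE under common definitions.

    These definitions are assumptions; the paper may define them
    differently (for instance, on log-transformed values).
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Sum of squares of observations about their mean.
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    # Cross-product term used for the least-squares slope.
    ss_cross = sum(
        (o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted)
    )
    ss_res = sum(e * e for e in errors)

    return {
        "r2": 1.0 - ss_res / ss_obs,
        "bias": sum(errors) / n,
        "slope": ss_cross / ss_obs,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }
```

Given matched lists of in situ and modeled Chl a values, `regression_metrics(observed, predicted)` returns the five statistics in one dictionary, while `band_ratio_features` turns one pixel's Rrs values into ratio inputs for a model.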

Original language: English
Article number: 108737
Journal: Ecological Indicators
State: Published - Apr 2022


  • Chlorophyll a
  • Inland and coastal waters
  • Land cover
  • MSI on-board Sentinel-2
  • Machine learning
  • Multiscale

