The implications of Simpson's paradox for cross-scale inference among lakes

Song S. Qian, Craig A. Stow, Farnaz Nojavan A., Joseph Stachelek, Yoonkyung Cha, Ibrahim Alameddine, Patricia Soranno

Research output: Contribution to journal › Article › peer-review

16 Scopus citations


Using cross-sectional data for making ecological inference started as a practical means of pooling data to enable meaningful empirical model development. For example, limnologists routinely use sample averages from numerous individual lakes to examine patterns across lakes. The basic assumption behind the use of cross-lake data is often that responses within and across lakes are identical. As data from multiple study units across a wide spatiotemporal scale are increasingly accessible to researchers, an assessment of this assumption is now feasible. In this study, we demonstrate that this assumption is usually unjustified, due largely to a statistical phenomenon known as Simpson's paradox. Through comparisons of a commonly used empirical model of the effect of nutrients on algal growth developed using several data sets, we discuss the cognitive importance of distinguishing factors affecting lake eutrophication operating at different spatial and temporal scales. Our study proposes the use of the Bayesian hierarchical modeling approach to properly structure the data analysis when data from multiple lakes are employed.
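The gap between within-lake and cross-lake responses described above can be illustrated with a small simulation. The sketch below is not the authors' data, model, or Bayesian hierarchical analysis; it uses plain least-squares slopes on synthetic values. Every name and number in it (`simulate`, `ols_slope`, a within-lake slope of 0.3, a cross-lake slope of 1.0, the noise levels, the log-scale nutrient and chlorophyll variables) is an assumption chosen only for illustration.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate(n_lakes=30, n_obs=40, beta_within=0.3, beta_across=1.0, seed=1):
    """Generate hypothetical (log nutrient, log chlorophyll) samples per lake.

    Each lake has its own long-term nutrient level. The lake's average
    response follows beta_across, while sample-to-sample variation inside
    the lake follows the (different) beta_within.
    """
    rng = random.Random(seed)
    lakes = []
    for _ in range(n_lakes):
        lake_level = rng.gauss(3.0, 1.0)  # hypothetical lake-mean log nutrient
        x = [lake_level + rng.gauss(0.0, 0.3) for _ in range(n_obs)]
        y = [
            beta_across * lake_level
            + beta_within * (xi - lake_level)
            + rng.gauss(0.0, 0.2)
            for xi in x
        ]
        lakes.append((x, y))
    return lakes


lakes = simulate()

# Slope estimated separately inside each lake, then averaged.
within_slopes = [ols_slope(x, y) for x, y in lakes]
mean_within = sum(within_slopes) / len(within_slopes)

# Slope estimated from one averaged point per lake (the cross-lake practice).
lake_mean_x = [sum(x) / len(x) for x, _ in lakes]
lake_mean_y = [sum(y) / len(y) for _, y in lakes]
across = ols_slope(lake_mean_x, lake_mean_y)

print(f"mean within-lake slope: {mean_within:.2f}")
print(f"cross-lake slope of lake means: {across:.2f}")
```

Under these assumptions the slope fitted to lake averages recovers the cross-lake relationship, not the within-lake one, so using it to predict how an individual lake responds to a nutrient change would be misleading. A hierarchical model, as the abstract proposes, would instead estimate lake-level and sample-level parameters jointly.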

Original language: English
Article number: 114855
Journal: Water Research
State: Published - 15 Oct 2019


  • Chlorophyll a
  • Multilevel/hierarchical model
  • NLA

