An Investigation of Microbial Groundwater Contamination Seasonality and Extreme Weather Event Interruptions Using “Big Data”, Time-Series Analyses, and Unsupervised Machine Learning

42 Pages Posted: 10 Dec 2024


Ioan Petculescu

affiliation not provided to SSRN

R. Stephen Brown

affiliation not provided to SSRN

Kevin McDermott

Public Health Ontario

Anna Majury

Queen's University - Department of Biomedical and Molecular Sciences; Queen's University - Department of Public Health Sciences; University of Toronto - Department of Laboratory Medicine and Pathobiology

Paul Hynds

University College Dublin - Irish Centre for Research in Applied Geosciences

Abstract

Temporal studies of groundwater potability have historically focused on E. coli detection rates, while non-E. coli coliforms (NEC) and microbial concentrations have remained comparatively understudied. Additionally, “big data” (i.e., large, diverse datasets that grow over time) have yet to be used to assess the effects of low return-period extreme weather events on groundwater quality. The current investigation used ≈1.1 million Ontarian private well samples collected between 2010 and 2021 to address these knowledge gaps by applying time-series decomposition, interrupted time-series analysis (ITSA), and unsupervised machine learning to five microbial contamination parameters: E. coli and NEC concentrations (CFU/100 mL), E. coli and NEC detection rates (%), and the calculated NEC:E. coli ratio. Time-series decompositions revealed E. coli concentrations and the NEC:E. coli ratio to be complementary metrics; concurrent interpretation of their seasonal signals indicated that localized contamination mechanisms dominate during winter months. ITSA findings highlighted the importance of hydrogeological time lags: for example, a significant increase in the E. coli detection rate (2.4% vs 1.8%, p = 0.02) was identified 12 weeks after the May 2017 flood event. Unsupervised machine learning spatially classified annual contamination cycles across Ontarian subregions (n = 27), with the highest inter-cluster variability identified among E. coli detection rates and the lowest among NEC detection rates and the NEC:E. coli ratio. Given the spatiotemporal consistency identified for NEC and the NEC:E. coli ratio, associated interpretations and recommendations are likely transferable across large, heterogeneous regions. The presented study may serve as a methodological blueprint for future temporal investigations employing “big” groundwater quality data.
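The abstract names interrupted time-series analysis (ITSA) but does not state its model specification. ITSA is commonly implemented as segmented regression, so the following is a minimal sketch of that general technique applied to a weekly detection-rate series, for illustration only. Everything in it is an assumption rather than the authors' method: the hypothetical function names (`segmented_regression`, `solve_linear_system`), the four-term model with an immediate level change and a post-event slope change, plain ordinary least squares with no adjustment for seasonality or serial correlation, and the synthetic input series. Because no such adjustment is made, the sketch reports coefficients only and no p-values.

```python
"""Illustrative ITSA sketch: segmented regression fitted by ordinary least squares.

Hypothetical example, not the paper's method. No autocorrelation or
seasonality adjustment is made, so no p-values are computed here.
"""
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b using Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations update both sides together.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def segmented_regression(y: Sequence[float], event_index: int) -> Dict[str, float]:
    """Fit y_t = b0 + b1*t + b2*D_t + b3*(t - T0)*D_t by ordinary least squares.

    D_t is 1 for weeks at or after the event (t >= T0) and 0 before it;
    b2 is the immediate level change and b3 the change in slope after the event.
    """
    if not 2 <= event_index <= len(y) - 2:
        raise ValueError("need at least two observations on each side of the event")
    rows = []
    for t in range(len(y)):
        d = 1.0 if t >= event_index else 0.0
        rows.append([1.0, float(t), d, (t - event_index) * d])
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear_system(xtx, xty)
    return {"intercept": b0, "pre_slope": b1, "level_change": b2, "slope_change": b3}


if __name__ == "__main__":
    # Synthetic weekly series (not study data): slope 1 before week 10, then a
    # level jump of 5 and an extra slope of 2 per week afterwards.
    series = [2 + t for t in range(10)] + [2 + t + 5 + 2 * (t - 10) for t in range(10, 20)]
    fit = segmented_regression(series, event_index=10)
    print({name: round(value, 6) for name, value in fit.items()})
```

Because the synthetic series is generated exactly from the model, the fit should recover the generating coefficients (2, 1, 5, 2) up to floating-point error. With real surveillance data, moving the breakpoint forward by several weeks is one simple way to probe the kind of hydrogeological time lag the abstract describes, although a production analysis would also need to account for seasonal structure and serial correlation.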

Keywords: Seasonal decomposition, interrupted time-series, machine learning, private wells, total coliforms, E. coli

Suggested Citation

Petculescu, Ioan and Brown, R. Stephen and McDermott, Kevin and Majury, Anna and Hynds, Paul, An Investigation of Microbial Groundwater Contamination Seasonality and Extreme Weather Event Interruptions Using “Big Data”, Time-Series Analyses, and Unsupervised Machine Learning. Available at SSRN: https://ssrn.com/abstract=5049890 or http://dx.doi.org/10.2139/ssrn.5049890

Ioan Petculescu

affiliation not provided to SSRN

R. Stephen Brown

affiliation not provided to SSRN

Kevin McDermott

Public Health Ontario

Kingston
Canada

Anna Majury

Queen's University - Department of Biomedical and Molecular Sciences

Kingston, Ontario K7L 3N6
Canada

Queen's University - Department of Public Health Sciences

Kingston, Ontario
Canada

University of Toronto - Department of Laboratory Medicine and Pathobiology

Toronto, Ontario
Canada

Paul Hynds (Contact Author)

University College Dublin - Irish Centre for Research in Applied Geosciences

