Predicting Groundwater Withdrawals Using Machine Learning with Limited Metering Data: Assessment of Training Data Requirements

Asfaw, Dawit Wolday; Smith, Ryan; Majumdar, Sayantan; Grote, Katherine; Fang, Bin; Wilson, Blake B.; Lakshmi, Venkataraman; Butler, James J.

doi:10.2139/ssrn.5177058

32 pages. Posted: 13 Mar 2025

Katherine Grote

Missouri University of Science and Technology

Blake B. Wilson

University of Kansas - Kansas Geological Survey

James J. Butler

University of Kansas - Kansas Geological Survey

Abstract

Groundwater level declines threaten the long-term prospects of many aquifers supporting irrigated agriculture. To implement sustainable groundwater solutions for these systems, a time series of groundwater pumping is needed. However, metering of pumping is limited in most parts of the United States and elsewhere. Some studies have used machine learning techniques to estimate pumping in regions where metering data are abundant, but the quality and quantity of data required to produce a robust estimate of regional groundwater pumping are neither well documented nor well studied. In many areas of the United States, 20% or fewer of high-capacity wells are metered. This study seeks to determine which parameters are most useful for predicting groundwater pumping and what quantity of data is needed. We carried out this study in a data-rich groundwater management district in the High Plains aquifer in the state of Kansas in the central United States. We built machine learning models using a random forest algorithm, with inputs drawn from public-domain remote sensing data, land surface model output, and hydrogeological variables, to predict pumping for the period from 2008 to 2020. We predicted pumping at two spatial scales: the point scale (individual wells) and a 2 km by 2 km grid in which data are aggregated within each grid cell. For both scales, we evaluated a range of training splits against a constant test set to understand the variability in model performance. Predictions based on point-scale inputs did not sufficiently capture the variability of actual pumping measurements. At the 2 km scale, however, a model trained on 10% of the total available data showed coefficient of determination (R²) values of 0.98 and 0.75 for training and testing, respectively. The total predicted volume of pumping, as well as the annual variation in pumping, also matched observations to within 3%. Knowing the irrigated crop area made it possible to sum predicted pumping over a grid, and this spatial aggregation reduced the uncertainty in pairing individual wells with irrigated areas; we find that summing the estimates improved both the spatial and temporal pumping estimates. These results suggest that in data-sparse regions where 10% of all irrigation wells are metered, reasonably accurate estimates of regional irrigation pumping are possible at the 2 km by 2 km scale, provided the irrigated area is known. This finding has significant implications for groundwater management in regions where metering is limited.
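
The evaluation design described in the abstract (aggregating well-level pumping onto a 2 km grid, holding a test set constant while the training share varies, and scoring with R² and total-volume agreement) can be illustrated with a minimal Python sketch. This is not the authors' code: it trains no random forest, and every function name, the tuple layout for wells, the use of projected coordinates in metres, and the choice to express the training share as a fraction of all records are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict

CELL_M = 2000.0  # assumed 2 km cell size, in metres of a projected coordinate system


def grid_cell(x_m, y_m, cell=CELL_M):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(x_m / cell), math.floor(y_m / cell))


def aggregate_to_grid(wells):
    """Sum pumping per (cell, year).

    `wells` is a hypothetical list of (x_m, y_m, year, volume) tuples.
    """
    totals = defaultdict(float)
    for x_m, y_m, year, volume in wells:
        totals[(grid_cell(x_m, y_m), year)] += volume
    return dict(totals)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def percent_volume_error(observed, predicted):
    """Signed difference between total predicted and total observed volume, in percent."""
    return 100.0 * (sum(predicted) - sum(observed)) / sum(observed)


def split_with_fixed_test(keys, test_fraction, train_fraction, seed=0):
    """Hold out a constant test set, then take a training subset from the rest.

    Both fractions are relative to the full set of keys (an assumption).
    With the same seed and test_fraction, the test set is identical for
    every train_fraction, so scores across training sizes are comparable.
    """
    rng = random.Random(seed)
    shuffled = sorted(keys)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    test, pool = shuffled[:n_test], shuffled[n_test:]
    n_train = min(len(pool), round(train_fraction * len(shuffled)))
    return pool[:n_train], test


# Usage with made-up numbers (not data from the study).
wells = [
    (100.0, 100.0, 2010, 5.0),
    (1900.0, 1500.0, 2010, 7.0),  # same 2 km cell as the first well
    (2100.0, 100.0, 2010, 3.0),   # neighbouring cell
    (100.0, 100.0, 2011, 4.0),    # same cell, different year
]
gridded = aggregate_to_grid(wells)
train_10, test_a = split_with_fixed_test(range(100), 0.2, 0.1)
train_50, test_b = split_with_fixed_test(range(100), 0.2, 0.5)
```

In a fuller version, a regressor would be fitted on the records selected by each training subset and scored on the shared test set with `r_squared` and `percent_volume_error`. Fixing the seed is what keeps the test set constant while only the training share changes.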

Keywords: Groundwater pumping, Irrigation, Remote sensing, Machine learning, Field-scale

Suggested Citation

Asfaw, Dawit Wolday and Smith, Ryan and Majumdar, Sayantan and Grote, Katherine and Fang, Bin and Wilson, Blake B. and Lakshmi, Venkataraman and Butler, James J., Predicting Groundwater Withdrawals Using Machine Learning with Limited Metering Data: Assessment of Training Data Requirements. Available at SSRN: https://ssrn.com/abstract=5177058 or http://dx.doi.org/10.2139/ssrn.5177058