Comparative Analysis of Dissolved Oxygen Predictions in the Yellow River Basin Using Different Environmental Predictors Based on Machine Learning
44 Pages. Posted: 9 Sep 2024
Abstract
Dissolved oxygen (DO) is a crucial water quality indicator of river health. Machine learning (ML) models have become popular for water quality prediction, but their accuracy depends heavily on the predictor variables used. Because predictor availability varies considerably, two questions arise. First, can easily accessible catchment attributes, combined with ML (specifically a random forest (RF) model), effectively predict river DO dynamics in the Yellow River Basin (YRB)? Second, is it necessary to collect additional water quality data to improve model performance? To address these questions, we collected approximately 10,800 monthly DO observations from 2016 to 2022 at 135 monitoring sites in the YRB and grouped 36 predictor variables into three sets: catchment attributes and meteorology (CAM), water quality parameters (WQPs), and a combination of both (CAM + WQP). The RF models performed satisfactorily, with a Nash–Sutcliffe efficiency (NSE) above 0.35 at 69%, 61%, and 73% of sites for CAM, WQP, and CAM + WQP, respectively. CAM alone outperformed WQP alone, and adding WQP to CAM yielded only a marginal improvement. Nevertheless, all models struggled at sites with substantial DO fluctuations, indicating an inherent limitation in reproducing extreme values. In heavily human-impacted regions such as the Fen River Basin, adding WQP notably improved DO prediction accuracy. Across the entire YRB, however, the CAM + WQP model performed worse in highly urbanized catchments, which we attribute to WQP reflecting anthropogenic activities only partially. Further analysis identified several factors that impede DO prediction, including unbalanced data in high-altitude watersheds, insufficient anthropogenic emission data, and a lack of water transfer information.
Feature importance and partial dependence analyses further identified key factors affecting DO, including water and soil temperature, precipitation, and pH. More DO sampling sites are needed in the plateau region of the YRB. Moreover, given the limited improvement offered by WQP and its constraints on spatial extrapolation, acquiring additional data on anthropogenic activity may do more to improve DO prediction than monitoring WQPs alone.
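The abstract scores each model with the Nash–Sutcliffe efficiency (NSE), defined as 1 minus the ratio of the sum of squared prediction errors to the sum of squared deviations of the observations from their mean, and reports the share of sites whose NSE exceeds 0.35. The minimal Python sketch below shows that bookkeeping, assuming each site supplies paired observed and simulated monthly DO series. The function names `nash_sutcliffe` and `share_of_sites_above` and the toy values are hypothetical illustrations, not the study's code or data.

```python
from statistics import fmean


def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_obs = fmean(observed)
    # Sum of squared prediction errors.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - sse / sst


def share_of_sites_above(site_series, threshold=0.35):
    """Return (fraction of sites with NSE > threshold, per-site NSE dict).

    site_series maps a site id to an (observed, simulated) pair of sequences.
    """
    scores = {site: nash_sutcliffe(obs, sim) for site, (obs, sim) in site_series.items()}
    passing = sum(1 for value in scores.values() if value > threshold)
    return passing / len(scores), scores


# Hypothetical toy data (mg/L), not taken from the study.
toy_sites = {
    "site_A": ([8.0, 9.0, 10.0, 11.0], [8.0, 9.0, 10.0, 11.0]),  # perfect fit
    "site_B": ([8.0, 9.0, 10.0, 11.0], [9.5, 9.5, 9.5, 9.5]),    # predicts the mean
}
share, scores = share_of_sites_above(toy_sites)
print(share, scores)
```

An NSE of 1 means a perfect fit and an NSE of 0 means the model is no better than predicting the observed mean, so the 0.35 threshold asks each site's model to beat that baseline by a moderate margin.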
Keywords: Water quality, Random forest, Catchment scale, Influential factor, Predictor variables