Comparative Analysis of Dissolved Oxygen Predictions in the Yellow River Basin Using Different Environmental Predictors Based on Machine Learning
33 Pages. Posted: 18 May 2024
Abstract
Dissolved oxygen (DO) is a crucial water quality indicator of river health. Machine learning (ML) models have become popular for water quality prediction, but their accuracy depends heavily on the predictor variables, and the availability of predictors varies considerably. This raises two questions. First, can easily accessible catchment attributes, combined with ML (specifically a random forest (RF) model), effectively predict river DO dynamics in the Yellow River Basin (YRB)? Second, is it necessary to collect additional water quality data to improve model performance? To address these questions, we collected approximately 11,500 monthly DO observations from 135 monitoring sites in the YRB for 2016–2022 and grouped the predictor variables into three sets: catchment attributes and meteorology (CAM), water quality parameters (WQPs), and a combination of both (CAM + WQP). The RF models performed satisfactorily, with a Nash–Sutcliffe efficiency (NSE) above 0.35 at 68.38%, 61.03%, and 72.06% of sites for CAM, WQP, and CAM + WQP, respectively. CAM alone outperformed WQP alone, and adding WQP to CAM yielded only a marginal improvement. Nevertheless, all models struggled at sites with large DO fluctuations, indicating an inherent limitation of the models in reproducing extreme values. In heavily human-impacted regions such as the Fen River Basin, however, adding WQP notably improved DO prediction accuracy, which is attributed to WQPs partially reflecting anthropogenic activities. Even so, both the WQP and CAM + WQP models performed worse in highly urbanized areas, implying that WQPs do not adequately capture human impacts. Further analysis identified several factors that hinder DO prediction, including unbalanced data in high-altitude watersheds, insufficient anthropogenic emission data, and a lack of information on water transfers.
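To make the performance criterion concrete: NSE compares the squared prediction error with the variance of the observations about their mean, so NSE = 1 is a perfect fit and NSE ≤ 0 means the model does no better than predicting the observed mean. The sketch below computes NSE per site and the percentage of sites exceeding the 0.35 threshold quoted above. It is a minimal illustration that assumes each site's observed and predicted monthly DO series are already paired; the function names and the sample values are hypothetical and are not taken from the study's code or data.

```python
from statistics import fmean


def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two paired series of length >= 2")
    mean_obs = fmean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))  # model error
    sst = sum((o - mean_obs) ** 2 for o in observed)  # spread about the mean
    if sst == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - sse / sst


def share_of_sites_above(site_series, threshold=0.35):
    """Percentage of sites whose NSE is strictly greater than `threshold`."""
    scores = [nash_sutcliffe(obs, sim) for obs, sim in site_series.values()]
    return 100.0 * sum(score > threshold for score in scores) / len(scores)


# Hypothetical sites: (observed DO, predicted DO) in mg/L.
sites = {
    "site_A": ([8.0, 9.0, 10.0], [8.0, 9.0, 10.0]),  # perfect fit, NSE = 1
    "site_B": ([8.0, 9.0, 10.0], [9.0, 9.0, 9.0]),   # predicts the mean, NSE = 0
}
print(share_of_sites_above(sites))
```

With one site at NSE = 1 and one at NSE = 0, half of the hypothetical sites pass the 0.35 threshold, so the script prints 50.0.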
Additionally, feature importance and partial dependence analyses identified the key factors affecting DO, including water and soil temperature, precipitation, and pH. Our findings underscore the need for additional DO sampling sites in the plateau region of the YRB. Moreover, given the limited improvement offered by WQPs and their constraints on spatial extrapolation, acquiring additional data on anthropogenic activity may do more to improve DO prediction than monitoring WQPs alone.
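The two interpretation tools named above can be illustrated with a model-agnostic sketch. Partial dependence averages the model's predictions while one predictor is fixed at each value of a grid. Permutation importance measures how much a score drops when one predictor's values are shuffled across samples. The abstract does not state which importance measure the study used (random forest implementations also commonly report impurity-based importance), so permutation importance here is a swapped-in, generic technique. The toy `toy_predict` function, the feature names, and all values are hypothetical stand-ins, not the fitted RF model.

```python
import random
from statistics import fmean


def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite only the chosen feature; other predictors keep their observed values.
        curve.append(fmean(predict({**row, feature: value}) for row in rows))
    return curve


def neg_mse(targets, preds):
    """Score where higher is better: negative mean squared error."""
    return -fmean((t - p) ** 2 for t, p in zip(targets, preds))


def permutation_importance(predict, rows, targets, feature,
                           score=neg_mse, n_repeats=10, seed=0):
    """Mean drop in `score` after shuffling one feature's column across rows."""
    rng = random.Random(seed)
    baseline = score(targets, [predict(row) for row in rows])
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        permuted = [{**row, feature: v} for row, v in zip(rows, column)]
        drops.append(baseline - score(targets, [predict(row) for row in permuted]))
    return fmean(drops)


# Hypothetical toy model (not fitted to any data): DO falls as water warms.
def toy_predict(row):
    return 14.0 - 0.2 * row["water_temp"] + 0.1 * row["pH"]


rows = [
    {"water_temp": 5.0, "pH": 7.0, "precip": 10.0},
    {"water_temp": 12.0, "pH": 8.0, "precip": 0.0},
    {"water_temp": 20.0, "pH": 7.5, "precip": 30.0},
    {"water_temp": 26.0, "pH": 8.2, "precip": 5.0},
]
targets = [toy_predict(row) for row in rows]

pd_curve = partial_dependence(toy_predict, rows, "water_temp", [0.0, 10.0, 20.0])
imp_temp = permutation_importance(toy_predict, rows, targets, "water_temp")
imp_precip = permutation_importance(toy_predict, rows, targets, "precip")
print(pd_curve, imp_temp, imp_precip)
```

Because the toy model ignores `precip`, shuffling it leaves every prediction unchanged and its importance is zero, while the partial dependence curve for `water_temp` falls by 2 mg/L per 10 °C, mirroring the model's slope.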
Keywords: Dissolved oxygen, Random forest, Yellow River, Influential factor, Predictor variables