Comparative Analysis of Dissolved Oxygen Predictions in the Yellow River Basin Using Different Environmental Predictors Based on Machine Learning

44 Pages. Posted: 9 Sep 2024

Lingling Liu

Chinese Research Academy of Environmental Sciences

Xiaoli Zhao

Chinese Research Academy of Environmental Sciences

Lingfeng Zhou

Chinese Research Academy of Environmental Sciences

Jiangtao Liu

Pennsylvania State University

Abstract

Dissolved oxygen (DO) is a crucial water quality indicator of river health. Machine learning (ML) models have gained popularity for water quality prediction; however, their accuracy depends heavily on the predictor variables used. Predictor availability varies considerably, which raises two questions. Can easily accessible catchment attributes, combined with ML (specifically a random forest (RF) model), effectively predict river DO dynamics in the Yellow River Basin (YRB)? And is it necessary to collect additional water quality data to improve model performance? To address these questions, we collected about 10,800 monthly DO observations for 2016-2022 from 135 monitoring sites in the YRB and categorized 36 predictor variables into three groups: catchment attributes and meteorology (CAM), water quality parameters (WQPs), and a combination of both (CAM + WQP). The RF models achieved satisfactory performance, with Nash–Sutcliffe efficiency exceeding 0.35 at 69%, 61%, and 73% of sites for CAM, WQP, and CAM + WQP, respectively. CAM alone outperformed WQP alone, and adding WQP to CAM yielded only marginal improvement. Nevertheless, all models struggled at sites with substantial DO fluctuations, indicating an inherent limitation in reproducing extreme values. In heavily human-impacted regions such as the Fen River Basin, by contrast, adding WQP notably improved DO prediction accuracy. However, across the entire YRB, the CAM + WQP model exhibited lower performance in highly urbanized catchments, which is attributed to the fact that WQP can partially, but not fully, reflect anthropogenic activities. Further analysis revealed several factors impeding DO prediction, including unbalanced data in high-altitude watersheds, insufficient anthropogenic emission data, and a lack of water transfer information. Additionally, feature importance and partial dependence analysis identified key factors affecting DO, including water and soil temperature, precipitation, and pH. More DO sampling sites are needed in the plateau region of the YRB. Moreover, given the limited improvement offered by WQP and its spatial extrapolation constraints, acquiring additional data on anthropogenic activity may benefit DO prediction more than monitoring WQPs alone.
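
The abstract reports performance as the share of sites whose Nash–Sutcliffe efficiency (NSE) exceeds 0.35. As a minimal sketch of that metric, assuming the standard definition NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2), the Python below computes NSE for one site and the fraction of sites above a threshold. The function names and the sample numbers are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean


def nash_sutcliffe(observed, simulated):
    """Return NSE = 1 - SSE / variance of observations about their mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    obs_mean = mean(observed)
    # Sum of squared model errors.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their own mean.
    spread = sum((o - obs_mean) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - residual / spread


def share_of_sites_above(nse_by_site, threshold=0.35):
    """Fraction of sites whose NSE is strictly greater than the threshold."""
    values = list(nse_by_site.values())
    return sum(v > threshold for v in values) / len(values)


# Hypothetical monthly DO values (mg/L) for one site; not data from the paper.
observed = [8.0, 9.0, 10.0, 11.0]
perfect_fit = nash_sutcliffe(observed, observed)        # model matches exactly
mean_only_fit = nash_sutcliffe(observed, [9.5] * 4)     # model predicts the mean

# Hypothetical per-site NSE scores, used only to show the threshold summary.
example_scores = {"site_a": 0.50, "site_b": 0.20, "site_c": 0.40, "site_d": 0.36}
example_share = share_of_sites_above(example_scores)
```

An NSE of 1 indicates a perfect match, 0 means the model is no better than predicting the observed mean, and negative values mean it is worse than the mean, which helps explain why sites with large DO fluctuations can score poorly when extremes are missed.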

Keywords: Water quality, Random forest, Catchment scale, Influential factor, Predictor variables

Suggested Citation

Liu, Lingling and Zhao, Xiaoli and Zhou, Lingfeng and Liu, Jiangtao, Comparative Analysis of Dissolved Oxygen Predictions in the Yellow River Basin Using Different Environmental Predictors Based on Machine Learning. Available at SSRN: https://ssrn.com/abstract=4951171 or http://dx.doi.org/10.2139/ssrn.4951171

Lingling Liu

Chinese Research Academy of Environmental Sciences

China

Xiaoli Zhao (Contact Author)

Chinese Research Academy of Environmental Sciences

Lingfeng Zhou

Chinese Research Academy of Environmental Sciences

China

Jiangtao Liu

Pennsylvania State University
