Machine Learning Based on Functional Principal Component Analysis to Identify Major Influential Factors of Wheat Yield
31 Pages Posted: 2 Sep 2022
Abstract
Assessing the response of crop yield to year-to-year climate variability at the field scale is often done using process-based models and regression techniques. Although powerful, these tools rely on strong assumptions and can lead to substantial prediction errors. In this study, we investigate the use of a flexible machine learning algorithm combining Random Forest and Functional Principal Component Analysis (FPCA) to relate field-scale wheat yield to local daily climate variables. Instead of computing seasonal, monthly, or other arbitrary time-frame climate averages, climate time series are decomposed into several basis functions by FPCA in order to summarize the dynamics of key climate variables with a limited number of easy-to-interpret components. Scores associated with these components are then used as inputs to a Random Forest algorithm for yield prediction. To evaluate our approach, we use a French national database including wheat yield data as well as climate and management practice data for 298 farm fields from 2011 to 2016 in four main producing regions. Depending on the region, our approach explains 62% to 81% of the yield variability when both agronomic and climate variables are included, 56% to 81% when agronomic variables are ignored, and 51% to 74% when climate variables are ignored. Based on a year-by-year cross-validation, RMSE ranges from 0.5 t ha⁻¹ to 1.8 t ha⁻¹ in non-extreme years. However, prediction error can reach 3.6 t ha⁻¹ under exceptional weather conditions, such as those experienced in 2016 in Northern France. We find that this new approach performs better than traditional yield forecasting techniques and that it can help agronomists easily identify the most influential factors for yield prediction.
Keywords: Random Forest, FPCA, Yield loss, on-farm yields, Variable importance, Accumulated Local Effects
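The pipeline described in the abstract (daily climate curves, FPCA scores, then Random Forest) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a discretized FPCA (ordinary PCA on curves sampled on a common daily grid, with no basis smoothing), uses NumPy and scikit-learn, and runs on purely synthetic data. All names (`fpca_fit`, `fpca_scores`, `nitrogen`, the simulated temperature curves) are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def fpca_fit(curves, n_components):
    """Discretized FPCA: PCA on curves observed on a common daily grid.

    curves has shape (n_fields, n_days). Returns the mean curve and the
    leading components, shape (n_components, n_days).
    """
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are orthonormal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_curve, vt[:n_components]


def fpca_scores(curves, mean_curve, components):
    """Project centered curves onto the components to obtain the scores."""
    return (curves - mean_curve) @ components.T


# --- Synthetic data (assumption: invented for illustration only) ---
rng = np.random.default_rng(0)
n_fields, n_days = 200, 240
days = np.linspace(0.0, 1.0, n_days)
level = rng.normal(size=(n_fields, 1))   # field-specific warm/cold season
trend = rng.normal(size=(n_fields, 1))   # field-specific early/late warming
temperature = (
    10.0
    + 8.0 * np.sin(np.pi * days)
    + 2.0 * level
    + 3.0 * trend * (days - 0.5)
    + rng.normal(scale=0.5, size=(n_fields, n_days))
)
nitrogen = rng.uniform(100.0, 200.0, size=n_fields)  # stand-in agronomic variable
wheat_yield = (
    7.0
    + 0.5 * level[:, 0]
    - 0.8 * trend[:, 0]
    + 0.01 * (nitrogen - 150.0)
    + rng.normal(scale=0.3, size=n_fields)
)

# --- Hold out the last 50 fields; fit FPCA on training curves only ---
train, test = slice(0, 150), slice(150, n_fields)
mean_curve, components = fpca_fit(temperature[train], n_components=3)
scores_train = fpca_scores(temperature[train], mean_curve, components)
scores_test = fpca_scores(temperature[test], mean_curve, components)

# FPCA scores plus the agronomic variable form the Random Forest inputs.
X_train = np.column_stack([scores_train, nitrogen[train]])
X_test = np.column_stack([scores_test, nitrogen[test]])
y_train, y_test = wheat_yield[train], wheat_yield[test]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

rmse = float(np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2)))
print(f"Hold-out RMSE on synthetic data: {rmse:.2f}")
print("Impurity-based importances:", model.feature_importances_)
```

Fitting the FPCA on the training fields only keeps information from the held-out fields out of the components. The year-by-year cross-validation mentioned in the abstract would replace the single hold-out split with one fold per harvest year.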