Machine Learning Based on Functional Principal Component Analysis to Identify Major Influential Factors of Wheat Yield

31 Pages Posted: 2 Sep 2022

Florent Bonneu

Avignon University

David Makowski

University of Paris-Saclay - INRAE

Julien Joly

affiliation not provided to SSRN

Denis Allard

affiliation not provided to SSRN

There are 2 versions of this paper

Abstract

Assessing the response of crop yield to year-to-year climate variability at the field scale is often done using process-based models and regression techniques. Although powerful, these tools rely on strong assumptions and can lead to substantial prediction errors. In this study, we investigate the use of a flexible machine learning algorithm combining Random Forest and Functional Principal Component Analysis (FPCA) to relate field-scale wheat yield to local daily climate variables. Instead of computing seasonal, monthly or other arbitrary time-frame climate averages, we decompose climate time series into several basis functions by FPCA in order to summarize the dynamics of key climate variables with a limited number of easy-to-interpret components. The scores associated with these components are then used as inputs to a Random Forest algorithm for yield prediction. To evaluate our approach, we use a French national database including wheat yield data as well as climate and management practice data for 298 farm fields from 2011 to 2016 in four main producing regions. Depending on the region, our approach explains 62% to 81% of the yield variability when both agronomic and climate variables are included, compared with 56% to 81% when agronomic variables are ignored and 51% to 74% when climate variables are ignored. Based on a year-by-year cross-validation, RMSE ranges from 0.5 t ha⁻¹ to 1.8 t ha⁻¹ in non-extreme years. However, prediction error can reach 3.6 t ha⁻¹ under exceptional weather conditions, such as those experienced in 2016 in Northern France. We find that this new approach performs better than traditional yield forecasting techniques and that it can help agronomists easily identify the most influential factors for yield prediction.
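As an illustration of the FPCA step described in the abstract, the following minimal sketch computes component scores from daily climate curves. It is not the authors' implementation: it assumes all curves are observed on the same regular daily grid, approximates FPCA by a singular value decomposition of the centered, discretized curves (without the basis smoothing a full FPCA would typically include), and uses synthetic temperature data. The function names `fit_fpca` and `fpca_scores` and all numeric settings are hypothetical.

```python
import numpy as np


def fit_fpca(curves, n_components):
    """Estimate a mean curve and principal component functions.

    curves: array of shape (n_fields, n_days), one daily series per row.
    Assumption of this sketch: every row is sampled on the same daily grid.
    """
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # Rows of vt are orthonormal directions of maximal variance across curves.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean_curve, components, explained[:n_components]


def fpca_scores(curves, mean_curve, components):
    """Project curves onto previously fitted components (one score per component)."""
    return (curves - mean_curve) @ components.T


# Synthetic stand-in for daily temperature curves (not data from the paper).
rng = np.random.default_rng(0)
n_fields, n_days = 60, 240
days = np.linspace(0.0, 1.0, n_days)
seasonal = np.sin(2 * np.pi * days)   # hypothetical seasonal mode of variation
trend = days - 0.5                    # hypothetical warming/cooling mode
a = rng.normal(0.0, 3.0, n_fields)
b = rng.normal(0.0, 1.0, n_fields)
noise = rng.normal(0.0, 0.1, (n_fields, n_days))
temperature = 10.0 + a[:, None] * seasonal + b[:, None] * trend + noise

mean_curve, components, explained = fit_fpca(temperature, n_components=2)
scores = fpca_scores(temperature, mean_curve, components)
print(scores.shape, explained.round(3))
```

Each row of `scores` could then serve as input features for a Random Forest regressor (for example scikit-learn's `RandomForestRegressor`), alongside agronomic variables. In a year-by-year cross-validation, fitting `fit_fpca` on the training years only and applying `fpca_scores` to the held-out year would avoid leaking information from the test year into the components.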

Keywords: Random Forest, FPCA, Yield loss, on-farm yields, Variable importance, Accumulated Local Effects

Suggested Citation

Bonneu, Florent and Makowski, David and Joly, Julien and Allard, Denis, Machine Learning Based on Functional Principal Component Analysis to Identify Major Influential Factors of Wheat Yield. Available at SSRN: https://ssrn.com/abstract=4207476 or http://dx.doi.org/10.2139/ssrn.4207476

Florent Bonneu

Avignon University

Avignon
France

David Makowski

University of Paris-Saclay - INRAE

Antony
France

Julien Joly

affiliation not provided to SSRN

No Address Available

Denis Allard (Contact Author)

affiliation not provided to SSRN

No Address Available
