Synthetic Data at Scale: A Paradigm to Efficiently Leverage Machine Learning in Agriculture

44 Pages, Posted: 9 Jan 2023

Jonathan Klein

King Abdullah University of Science and Technology (KAUST)

Rebekah E. Waller

King Abdullah University of Science and Technology (KAUST)

Sören Pirk

Adobe Research

Wojtek Pałubicki

Adam Mickiewicz University

Mark Tester

King Abdullah University of Science and Technology (KAUST)

Dominik Michels

King Abdullah University of Science and Technology (KAUST) - Department of Computer, Electrical and Mathematical Sciences & Engineering

Abstract

The rise of artificial intelligence (AI), and in particular of modern machine learning (ML) algorithms, has been one of the most exciting developments in agriculture within the last decade. While undisputedly powerful, these algorithms have one main drawback: they need sufficient and diverse training data. Collecting and annotating real datasets are the main cost drivers of ML developments, and while promising results have been shown on synthetically generated training data, generating such data poses difficulties of its own. In this contribution, we present a paradigm for the iterative, cost-efficient generation of synthetic training data. We demonstrate its application by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. In particular, we develop a binary classifier that distinguishes between healthy and infected tomato plants based on photographs taken by an unmanned aerial vehicle (UAV) in a greenhouse complex. The classifier is trained exclusively on synthetic images, which are generated iteratively to obtain optimal performance. In contrast to other approaches, which rely on a human assessment of the similarity between real and synthetic data, we introduce a structured, quantitative approach. We find that our approach leads to a more cost-efficient use of ML-aided computer vision tasks in agriculture.
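The abstract does not specify the quantitative criterion used to steer the iterations. The following is only an illustrative sketch, assuming (not taken from the paper) that each candidate set of generator settings is scored by the accuracy of a classifier trained purely on synthetic images and evaluated on a small set of real, annotated images. `GenerationConfig`, `infected_level`, `generate`, `train` and `select_config` are hypothetical names, and the one-number "images" are toy stand-ins for UAV photographs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

# Hypothetical sketch, NOT the authors' implementation.
# A sample is (image, label) with label 1 = infected, 0 = healthy.
Sample = Tuple[float, int]


@dataclass(frozen=True)
class GenerationConfig:
    """Hypothetical generator settings for one synthetic-data iteration."""
    name: str
    infected_level: float  # toy knob: how "diseased" synthetic infected samples look


def select_config(
    configs: Iterable[GenerationConfig],
    generate: Callable[[GenerationConfig], List[Sample]],
    train: Callable[[List[Sample]], Callable[[float], int]],
    real_val: Sequence[Sample],
) -> Tuple[Optional[GenerationConfig], float]:
    """Train on synthetic data only, score on real images, keep the best config."""
    best_cfg: Optional[GenerationConfig] = None
    best_acc = -1.0
    for cfg in configs:
        classifier = train(generate(cfg))  # training never sees real images
        correct = sum(classifier(x) == y for x, y in real_val)
        acc = correct / len(real_val)  # a number instead of a visual similarity judgement
        if acc > best_acc:
            best_cfg, best_acc = cfg, acc
    return best_cfg, best_acc


# --- Toy stand-ins: each "image" is one number, larger = more disease-like ---
def generate(cfg: GenerationConfig) -> List[Sample]:
    healthy = [(0.0, 0), (0.2, 0)]
    infected = [(cfg.infected_level, 1), (cfg.infected_level + 0.1, 1)]
    return healthy + infected


def train(data: List[Sample]) -> Callable[[float], int]:
    # Binary classifier: threshold halfway between mean healthy and mean infected value.
    mean0 = sum(x for x, y in data if y == 0) / sum(1 for _, y in data if y == 0)
    mean1 = sum(x for x, y in data if y == 1) / sum(1 for _, y in data if y == 1)
    threshold = (mean0 + mean1) / 2
    return lambda x: int(x >= threshold)


real_val = [(0.1, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
candidates = [
    GenerationConfig("A", 0.2),
    GenerationConfig("B", 0.8),
    GenerationConfig("C", 1.5),
]
best, score = select_config(candidates, generate, train, real_val)
print(best.name, score)  # B 1.0
```

The point of the sketch is that ranking iterations reduces to a score computed on a few held-out real images, so only those images need annotation; the fixed candidate list is used just to keep the example short, whereas an iterative scheme would refine the settings between rounds.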

Keywords: Artificial Intelligence, Data Generation and Annotation, Machine Learning, Synthetic Data, Tomato Plants (Solanum lycopersicum)

Suggested Citation

Klein, Jonathan and Waller, Rebekah E. and Pirk, Sören and Pałubicki, Wojtek and Tester, Mark and Michels, Dominik, Synthetic Data at Scale: A Paradigm to Efficiently Leverage Machine Learning in Agriculture. Available at SSRN: https://ssrn.com/abstract=4314564 or http://dx.doi.org/10.2139/ssrn.4314564

Jonathan Klein

King Abdullah University of Science and Technology (KAUST)

Thuwal 23955-6900
Thuwal, 4700
Saudi Arabia

Rebekah E. Waller

King Abdullah University of Science and Technology (KAUST)

Thuwal 23955-6900
Thuwal, 4700
Saudi Arabia

Sören Pirk

Adobe Research

321 Park Avenue
San Jose, CA 95113

Wojtek Pałubicki

Adam Mickiewicz University

Wieniawskiego 1
Poznan, 61-712
Poland

Mark Tester

King Abdullah University of Science and Technology (KAUST)

Thuwal 23955-6900
Thuwal, 4700
Saudi Arabia

Dominik Michels (Contact Author)

King Abdullah University of Science and Technology (KAUST) - Department of Computer, Electrical and Mathematical Sciences & Engineering

Al-Khawarizmi
Thuwal, 23955
Saudi Arabia