Generating Synthetic Data in Finance: Opportunities, Challenges and Pitfalls

10 Pages. Posted: 16 Jul 2020. Last revised: 11 Dec 2020

Date Written: June 23, 2020


Financial services generate a huge volume of data that is extremely complex and varied. These datasets are often stored in silos within organisations for various reasons, including, but not limited to, regulatory requirements and business needs. As a result, data sharing across different lines of business, as well as outside the organisation (e.g. with the research community), is severely limited. It is therefore critical to investigate methods for synthesising financial datasets that preserve the properties of the real data while respecting the privacy needs of the parties involved in a particular dataset.

This introductory paper aims to highlight the growing need for effective synthetic data generation in the financial domain. We identify three main areas of focus for the academic community: 1) generating realistic synthetic datasets; 2) measuring the similarities between real and generated datasets; and 3) ensuring that the generative process satisfies any privacy constraints.

Although these challenges are also present in other domains, the extra regulatory and privacy requirements add another dimension of complexity and offer a unique opportunity to study the topic in financial services. Finally, we aim to develop a shared vocabulary and context for generating synthetic financial data using two types of financial datasets as examples.

Keywords: synthetic data, generative models, privacy

JEL Classification: C

Suggested Citation

Assefa, Samuel, Generating Synthetic Data in Finance: Opportunities, Challenges and Pitfalls (June 23, 2020). Available at SSRN: or

Samuel Assefa (Contact Author)

JP Morgan Chase

383 Madison Avenue
New York, NY 10179-0001
United States
