Possibilities of Source Documentation and Disclosure for Generative AI Systems

9 pages. Posted: 31 Mar 2025. Last revised: 4 Mar 2025.


Sebastian Stober

Otto-von-Guericke University, Magdeburg

Date Written: February 28, 2025

Abstract

Training generative AI models requires large amounts of data, a significant portion of which is obtained by scraping the web. In addition, AI systems sometimes access web sources during operation to answer specific queries. This has led to a broad debate about copyright and usage rights. Undoubtedly, rights are affected here. Regardless of the extent to which legal claims exist, the question arises whether and how they can be asserted. A basic prerequisite is sufficiently detailed source documentation, together with an adequate means for rights holders to obtain information about the sources. Is this technically possible and feasible with reasonable effort? The short answer is: yes. It is technically possible and in many cases, especially for web sources, trivial to document sources and make them available for disclosure. This paper describes in detail what pragmatic solutions could look like.
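To illustrate why documenting web sources can be considered technically simple, the following is a minimal sketch and not the method described in the paper. It assumes a scraper that already holds the URL and the raw bytes of each fetched page. The names `SourceRecord`, `make_source_record`, and `records_for_domain` are hypothetical and chosen only for this example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical provenance entry for one scraped document."""
    url: str
    retrieved_at: str  # ISO 8601 timestamp
    sha256: str        # fingerprint of the exact bytes that were fetched
    num_bytes: int


def make_source_record(url, content, retrieved_at=None):
    """Build a record at scrape time from the URL and the raw content bytes."""
    when = retrieved_at or datetime.now(timezone.utc)
    return SourceRecord(
        url=url,
        retrieved_at=when.isoformat(),
        sha256=hashlib.sha256(content).hexdigest(),
        num_bytes=len(content),
    )


def records_for_domain(records, domain):
    """Return all records whose host is the given domain or a subdomain of it.

    This stands in for a disclosure query by a rights holder who controls
    a domain (an assumption made for this sketch).
    """
    domain = domain.lower()
    matches = []
    for record in records:
        host = (urlsplit(record.url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            matches.append(record)
    return matches
```

A rights holder's inquiry could then be answered by calling `records_for_domain(records, "example.org")` on the stored records. The content hash lets both sides check which version of a page was used without the documentation having to store the page itself.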

Keywords: Generative AI, Source Documentation, Source Disclosure, Data Provenance

Suggested Citation

Stober, Sebastian, Possibilities of Source Documentation and Disclosure for Generative AI Systems (February 28, 2025). Available at SSRN: https://ssrn.com/abstract=5165118 or http://dx.doi.org/10.2139/ssrn.5165118

Sebastian Stober (Contact Author)

Otto-von-Guericke University, Magdeburg
