Possibilities of Source Documentation and Disclosure for Generative AI Systems

Stober, Sebastian

doi:10.2139/ssrn.5165118

9 Pages; Posted: 31 Mar 2025; Last revised: 4 Mar 2025

Sebastian Stober

Otto-von-Guericke University, Magdeburg

Date Written: February 28, 2025

Abstract

Training generative AI models requires large amounts of data, a significant portion of which is obtained by scraping the web. In addition, some AI systems access web sources during operation to answer specific queries. This has led to a broad debate about copyright and usage rights, which are undoubtedly affected. Regardless of the extent to which legal claims exist, the question arises whether and how they can be asserted. A basic prerequisite is sufficiently detailed source documentation, together with an adequate means for rights holders to obtain information about the sources used. Is this technically possible and feasible with reasonable effort? The short answer is yes: it is technically possible, and in many cases (especially for web sources) trivial, to document sources and make them available for disclosure. This paper describes in detail what pragmatic solutions could look like.
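The claim that documenting web sources is largely trivial can be illustrated with a minimal sketch. It assumes a hypothetical schema (`SourceRecord`, `ProvenanceLog`, and the `purpose` field are illustrative names, not taken from the paper). Each fetched document is logged with its URL, a UTC retrieval timestamp, a SHA-256 fingerprint of the exact bytes, and whether it was used for training or for retrieval at run time. A rights holder could then query the log either by domain or by the bytes of their own document. This is an illustration under those assumptions, not the solution the paper proposes.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.parse import urlparse


@dataclass(frozen=True)
class SourceRecord:
    """One documented source (hypothetical schema for illustration)."""
    url: str
    retrieved_at: str  # ISO 8601 timestamp in UTC
    sha256: str        # fingerprint of the exact bytes that were used
    purpose: str       # hypothetical label, e.g. "training" or "retrieval"


class ProvenanceLog:
    """In-memory source log; a real system would persist these records."""

    def __init__(self) -> None:
        self._records: list[SourceRecord] = []

    def record(self, url: str, content: bytes, purpose: str) -> SourceRecord:
        """Document one source at the moment its content is fetched."""
        rec = SourceRecord(
            url=url,
            retrieved_at=datetime.now(timezone.utc).isoformat(),
            sha256=hashlib.sha256(content).hexdigest(),
            purpose=purpose,
        )
        self._records.append(rec)
        return rec

    def disclose_by_domain(self, domain: str) -> list[SourceRecord]:
        """Return all records whose host is `domain` or one of its subdomains."""
        domain = domain.lower()
        matches = []
        for rec in self._records:
            host = (urlparse(rec.url).hostname or "").lower()
            if host == domain or host.endswith("." + domain):
                matches.append(rec)
        return matches

    def disclose_by_content(self, content: bytes) -> list[SourceRecord]:
        """Return records whose stored fingerprint matches `content` byte for byte."""
        digest = hashlib.sha256(content).hexdigest()
        return [rec for rec in self._records if rec.sha256 == digest]


if __name__ == "__main__":
    log = ProvenanceLog()
    log.record("https://news.example.org/article/1", b"<html>article</html>", "training")
    log.record("https://blog.example.com/post", b"<html>post</html>", "retrieval")
    print(len(log.disclose_by_domain("example.org")))  # prints 1
    print(len(log.disclose_by_content(b"<html>post</html>")))  # prints 1
```

Matching on a hash of the exact bytes only detects verbatim copies of a document; recognizing modified or partial copies would require a different technique, such as near-duplicate detection.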

Keywords: Generative AI, Source Documentation, Source Disclosure, Data Provenance

Suggested Citation:

Stober, Sebastian, Possibilities of Source Documentation and Disclosure for Generative AI Systems (February 28, 2025). Available at SSRN: https://ssrn.com/abstract=5165118 or http://dx.doi.org/10.2139/ssrn.5165118