Fair Use and Machine Learning

43 Pages. Posted: 24 Jun 2019

Date Written: June 18, 2019


The world would beat a path to the door of anyone who built software that could reliably state whether a use of a copyrighted work was protected as fair use. But applying machine learning to fair use faces considerable hurdles. Fair use has generated hundreds of reported cases, but machine learning works best with far larger numbers of examples. More examples may be available: from mining the decision-making of web sites, from having humans judge fair use examples just as they label images to teach self-driving cars, and from using machine learning itself to generate examples. Beyond the number of examples, the data itself is more abstract than in the domains where machine learning has succeeded, such as computer vision and viewing recommendations. It is more abstract even than in machine translation, where the operative unit was the sentence, not a concept that could be distributed across an entire document. But techniques presently in use do find patterns in data to build more abstract features, and then apply the same process to those features to build still more abstract ones. Such automated processes may be able to supply the necessary conceptual building blocks. In addition, tools drawn from knowledge engineering (ironically, the branch of artificial intelligence that of late has been eclipsed by machine learning) may extract concepts from data such as judicial opinions. Such tools would include new methods of knowledge representation and automated tagging.
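One way to picture the human-labeling route is the aggregation step common to crowd-labeling pipelines: several raters judge the same scenario, and their answers are combined by majority vote. The Python below is a minimal sketch under stated assumptions; the function `aggregate_judgments`, the 0.7 agreement threshold, and the rater data are hypothetical and are not taken from the paper. Scenarios on which raters disagree are flagged rather than labeled, since contested examples would add noise to any training set.

```python
from collections import Counter


def aggregate_judgments(judgments, min_agreement=0.7):
    """Combine several raters' fair use judgments for one scenario.

    judgments must be non-empty. Returns (label, agreement), where agreement
    is the share of raters who chose the most common label. The label is
    None when the top choices tie or agreement falls below min_agreement,
    flagging the example as too contested to use as training data.
    """
    counts = Counter(judgments).most_common()
    top_label, top_votes = counts[0]
    agreement = top_votes / len(judgments)
    tied = len(counts) > 1 and counts[1][1] == top_votes
    if tied or agreement < min_agreement:
        return None, agreement
    return top_label, agreement


# Hypothetical rater responses for three invented scenarios (not real data).
RATINGS = {
    "parody_clip": ["fair", "fair", "not fair", "fair", "fair"],
    "full_album_upload": ["not fair"] * 5,
    "meme_remix": ["fair", "not fair", "fair", "not fair"],
}

if __name__ == "__main__":
    for scenario, judgments in RATINGS.items():
        print(scenario, aggregate_judgments(judgments))
```

Majority vote is only the simplest aggregation rule; it treats every rater as equally reliable, which a real labeling effort on a legal question might not want to assume.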

If the data problems are overcome, machine learning offers intriguing possibilities, but it also faces challenges arising from the nature of fair use law. Artificial neural networks have shown formidable performance in classification, but classifying fair use examples raises a number of questions. Fair use law is often considered contradictory, vague, and unpredictable; in computer science terminology, the data is “noisy.” That inconsistency could flummox artificial neural networks, or the networks could reveal consistencies that have eluded commentators. Other algorithms, such as nearest neighbor and support vector machines, could likewise both use and test legal reasoning by analogy. Another approach, decision trees, may be simpler than others in some respects, but it could work on smaller data sets (addressing one of the data issues above) and provide something that machine learning often lacks: transparency. Decision trees disclose their decision-making process, whereas neural networks, especially deep learning models, are opaque black boxes. Finally, unsupervised machine learning could be used to explore fair use case law for patterns, whether consistent structures in its jurisprudence or biases that have played an undisclosed role. Any patterns found, however, should be treated as possibilities pending testing by other means.
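To make the contrast between analogical reasoning and transparency concrete, the Python below is a minimal sketch under a hypothetical encoding: each example is scored from 0 to 1 on the four statutory fair use factors of 17 U.S.C. § 107, with higher scores leaning toward fair use. The six training examples, the factor names, and the functions `knn_predict` and `best_stump` are invented for illustration and are not drawn from the paper or from real cases. `knn_predict` labels a new scenario by majority vote of its most similar examples and returns those neighbors, much as a court points to analogous precedents; `best_stump` learns a depth-one decision tree and returns its rule as readable text.

```python
import math
from collections import Counter

# Hypothetical feature encoding: each example scores the four statutory fair
# use factors (17 U.S.C. § 107) from 0 to 1, higher meaning "favors fair use".
FACTORS = (
    "transformative_purpose",  # factor 1: purpose and character of the use
    "factual_nature",          # factor 2: nature of the copyrighted work
    "small_amount",            # factor 3: amount and substantiality used
    "no_market_harm",          # factor 4: effect on the potential market
)

# Invented toy examples for illustration only; not real cases or results.
TRAINING = [
    ((0.9, 0.8, 0.7, 0.9), "fair"),
    ((0.8, 0.4, 0.6, 0.8), "fair"),
    ((0.7, 0.9, 0.3, 0.7), "fair"),
    ((0.1, 0.2, 0.1, 0.2), "not fair"),
    ((0.2, 0.3, 0.9, 0.1), "not fair"),
    ((0.3, 0.1, 0.2, 0.3), "not fair"),
]


def knn_predict(query, examples, k=3):
    """Classify query by majority vote of its k nearest examples.

    Returns the label and the neighbors themselves, so the "analogous
    cases" behind the prediction can be inspected.
    """
    ranked = sorted(examples, key=lambda ex: math.dist(query, ex[0]))
    neighbors = ranked[:k]
    label = Counter(lbl for _, lbl in neighbors).most_common(1)[0][0]
    return label, neighbors


def best_stump(examples):
    """Learn a depth-one decision tree (a "stump") as a readable rule.

    Tries every factor and every observed value as a threshold, predicting
    "fair" when the factor score is at or above the threshold, and keeps
    the split with the fewest training errors.
    """
    best = None
    for i, name in enumerate(FACTORS):
        for threshold in sorted({x[i] for x, _ in examples}):
            errors = sum(
                (x[i] >= threshold) != (lbl == "fair") for x, lbl in examples
            )
            if best is None or errors < best[0]:
                best = (errors, name, threshold)
    errors, name, threshold = best
    return f"if {name} >= {threshold}: fair, else: not fair ({errors} training errors)"


if __name__ == "__main__":
    label, neighbors = knn_predict((0.85, 0.6, 0.5, 0.8), TRAINING)
    print(label)  # majority label of the 3 nearest toy examples
    for features, lbl in neighbors:
        print("  analogous example:", features, lbl)
    print(best_stump(TRAINING))
```

The stump's output is a single human-readable rule, which illustrates the transparency point; on this invented data it happens to split cleanly on the first factor, something real, noisy case law would be unlikely to allow.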

Keywords: artificial intelligence, machine learning, fair use, copyright, algorithm, intellectual property, data

Suggested Citation

McJohn, Stephen M. and McJohn, Ian, Fair Use and Machine Learning (June 18, 2019). Northeastern University Law Review, Forthcoming, Available at SSRN: https://ssrn.com/abstract=3406283

Stephen M. McJohn (Contact Author)

Suffolk University Law School

120 Tremont Street
Boston, MA 02108-4977
United States

Ian McJohn

