An Empirical Analysis of the Python Package Index (PyPI)

15 Pages. Posted: 26 Jul 2019. Last revised: 29 Jul 2019.

Ethan Bommarito

University of Michigan Ann Arbor

Michael James Bommarito

Bommarito Consulting, LLC; Licensio, LLC; Stanford Center for Legal Informatics; Michigan State College of Law

Date Written: July 25, 2019

Abstract

In this research, we provide a comprehensive empirical summary of the Python Package Index (PyPI), including both package metadata and source code, covering 178,592 packages, 1,745,744 releases, 76,997 contributors, and 156,816,750 import statements. We provide counts and trends for packages, releases, dependencies, category classifications, licenses, and package imports, as well as authors, maintainers, and organizations. As one of the largest and oldest software repositories as of publication, PyPI provides insight not just into the Python ecosystem today, but also into trends in software development and licensing more broadly over time. Within PyPI, we find that the growth of the repository has been robust under all measures, with a compound annual growth rate of 47% for active packages, 39% for new authors, and 61% for new import statements over the last 15 years. As with many similar social systems, we find a number of highly right-skewed distributions, including the distribution of releases per package, packages and releases per author, imports per package, and size per package and release. However, we also find that most packages are contributed by single individuals, not multiple individuals or organizations. The data, methods, and calculations herein provide an anchor for public discourse on PyPI and serve as a foundation for future research on the Python software ecosystem.
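For readers unfamiliar with the growth measure quoted above, the following is a minimal sketch of how a compound annual growth rate (CAGR) is conventionally computed from two observations. The function names and the sample counts are hypothetical and are not taken from the paper; the authors' exact calculation may differ.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two observations `years` apart.

    Standard definition: (end / start) ** (1 / years) - 1.
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_multiple(rate: float, years: float) -> float:
    """Total growth multiple implied by compounding `rate` for `years` years."""
    return (1.0 + rate) ** years


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not data from the paper).
    print(f"CAGR: {cagr(100, 1600, 4):.2%}")

    # Total multiple implied by the 47% rate quoted in the abstract,
    # assuming it compounds uniformly over the full 15 years.
    print(f"Implied multiple: {growth_multiple(0.47, 15):.1f}x")
```

Inverting the formula, as `growth_multiple` does, gives a rough sense of the total growth a quoted annual rate implies over the stated period.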

Keywords: software, software development, licensing, open source, dependency, collaboration, python, complex system

Suggested Citation

Bommarito, Ethan and Bommarito, Michael James, An Empirical Analysis of the Python Package Index (PyPI) (July 25, 2019). Available at SSRN: https://ssrn.com/abstract=3426281 or http://dx.doi.org/10.2139/ssrn.3426281

Ethan Bommarito

University of Michigan Ann Arbor

Michael James Bommarito (Contact Author)

Bommarito Consulting, LLC

MI 48098
United States

HOME PAGE: http://bommaritollc.com

Licensio, LLC

Okemos, MI 48864
United States

Stanford Center for Legal Informatics

559 Nathan Abbott Way
Stanford, CA 94305-8610
United States

Michigan State College of Law

318 Law College Building
East Lansing, MI 48824-1300
United States
