Mass Digitization of Chinese Court Decisions: How to Use Text as Data in the Field of Chinese Law

48 Pages. Posted: 15 Jun 2017. Last revised: 24 Nov 2019.

Benjamin L. Liebman

Columbia University - Law School

Margaret E. Roberts

University of California, San Diego (UCSD) - 21st Century China Center

Rachel E. Stern

University of California, Berkeley - Department of Jurisprudence & Social Policy; University of California, Berkeley - Berkeley Center on Comparative Equality & Anti-Discrimination Law

Alice Wang

Columbia Law School

Date Written: October 1, 2019

Abstract

Over the past five years, Chinese courts have placed tens of millions of court judgments online. We analyze the promise and pitfalls of using this remarkable new data source through the construction and examination of a dataset of 1,058,990 documents from Henan province. Courts posted judgments in roughly half of all cases in 2014 and, although the percentage of cases posted online has likely risen since then, the single greatest challenge facing researchers remains documenting gaps in the data. We find that the extent of missing data varies widely by court, and that intermediate courts disclose significantly more documents than basic level courts. But court level, GDP per capita, population, and mediation rates do not fully explain variation in disclosure rates. Further work is needed to better understand how resources and incentives might be skewing the data. Despite incomplete information, however, a topic model of 20,321 administrative court judgments demonstrates how mass digitization of court decisions opens a new window into the practice of everyday law in China. Unsupervised machine learning combined with close reading of selected cases reveals surprising trends in administrative disputes as well as important research questions. Taken together, our findings suggest a need for humility and methodological pluralism among scholars seeking to use large-scale data from Chinese courts. The vast amount of incomplete data now available may frustrate attempts to find quick answers to existing questions, but the data excel at opening new pathways for research and at adding nuance to existing assumptions about the role of courts in Chinese society.

Keywords: Data, Law, Chinese Courts, Court Cases, Text as Data, Court Judgements

Suggested Citation

Liebman, Benjamin L. and Roberts, Margaret E. and Stern, Rachel E. and Wang, Alice, Mass Digitization of Chinese Court Decisions: How to Use Text as Data in the Field of Chinese Law (October 1, 2019). 21st Century China Center Research Paper No. 2017-01, Columbia Public Law Research Paper No. 14-551, Available at SSRN: https://ssrn.com/abstract=2985861 or http://dx.doi.org/10.2139/ssrn.2985861

Benjamin L. Liebman

Columbia University - Law School

435 West 116th Street
New York, NY 10025
United States

Margaret E. Roberts (Contact Author)

University of California, San Diego (UCSD) - 21st Century China Center

9500 Gilman Drive #0519
La Jolla, CA 92093-0519
United States

Rachel E. Stern

University of California, Berkeley - Department of Jurisprudence & Social Policy

School of Law
University of California
Berkeley, CA 94720-2150
United States

University of California, Berkeley - Berkeley Center on Comparative Equality & Anti-Discrimination Law

Boalt Hall
Berkeley, CA 94720-7200
United States

Alice Wang

Columbia Law School

435 West 116th Street
New York, NY 10025
United States
