August 2021
We obtained case records from the Indian e-Courts platform, a public system put in place by the Indian government in 2013. For each case, the publicly available information includes the filing, registration, hearing, and decision dates; the petitioner and respondent names; the position of the presiding judge; the acts and sections under which the case was filed; and the final decision or disposition.
The database covers India's lower judiciary: the District and Sessions courts and all courts under their jurisdiction. We also obtained data on judges for all courts in the Indian lower judiciary from the e-Courts platform. For each judge, the data include the judge's name, their position or designation, and the start and end dates of each of the judge's appointments to a court. We joined the case-level data with the judge-level data using the judge's designation and the initial case filing date. However, this case-judge matching was conducted only for criminal cases, and its match rate ranges from 50% to 75% depending on the jurisdiction.
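The case-judge join described above can be thought of as an interval match: a case is linked to the judge who held the presiding position in that court on the date the case was filed. The following is a minimal sketch of that idea, not the authors' code. The record fields (`court`, `judge_position`, `filing_date`, `start_date`, `end_date`), the example records, and the rule that ambiguous or missing matches leave a case unmatched are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class Case:
    case_id: str
    court: str            # hypothetical court identifier
    judge_position: str   # designation of the presiding judge
    filing_date: date


@dataclass
class JudgeTenure:
    judge_id: str
    court: str
    judge_position: str
    start_date: date
    end_date: Optional[date]  # assumption: None means the appointment is ongoing


def match_case_to_judge(case: Case, tenures: List[JudgeTenure]) -> Optional[str]:
    """Return the judge whose tenure in the same court and position covers the filing date."""
    candidates = [
        t.judge_id
        for t in tenures
        if t.court == case.court
        and t.judge_position == case.judge_position
        and t.start_date <= case.filing_date
        and (t.end_date is None or case.filing_date <= t.end_date)
    ]
    # Assumption: zero or several overlapping tenures leave the case unmatched.
    return candidates[0] if len(candidates) == 1 else None


def match_cases(cases: List[Case], tenures: List[JudgeTenure]) -> Tuple[Dict[str, Optional[str]], float]:
    """Match every case and report the share of cases linked to exactly one judge."""
    matches = {c.case_id: match_case_to_judge(c, tenures) for c in cases}
    matched = sum(1 for judge in matches.values() if judge is not None)
    rate = matched / len(cases) if cases else 0.0
    return matches, rate


# Illustrative (made-up) records.
tenures = [
    JudgeTenure("J1", "court_A", "Sessions Judge", date(2012, 1, 1), date(2015, 6, 30)),
    JudgeTenure("J2", "court_A", "Sessions Judge", date(2015, 7, 1), None),
    JudgeTenure("J3", "court_B", "Chief Judicial Magistrate", date(2014, 1, 1), date(2016, 12, 31)),
]
cases = [
    Case("c1", "court_A", "Sessions Judge", date(2013, 5, 2)),
    Case("c2", "court_A", "Sessions Judge", date(2016, 1, 10)),
    Case("c3", "court_B", "Chief Judicial Magistrate", date(2018, 3, 3)),  # filed after the tenure ended
    Case("c4", "court_C", "Sessions Judge", date(2014, 2, 2)),             # no judge record for this court
]

matches, rate = match_cases(cases, tenures)
print(matches)                     # {'c1': 'J1', 'c2': 'J2', 'c3': None, 'c4': None}
print(f"match rate: {rate:.0%}")   # match rate: 50%
```

Cases whose filing date falls outside every recorded tenure, or whose court has no judge records, stay unmatched; gaps of this kind in the judge data are one plausible reason a real match rate would fall well below 100%.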
The e-Courts platform does not provide demographic metadata on judges or defendants. However, gender and religious identity can be inferred with high accuracy in India from individuals' names. We trained a bidirectional Long Short-Term Memory (LSTM) model on a large database of labeled names and then used it to assign these characteristics in the legal data. Each name was assigned two binary labels: male/female and Muslim/non-Muslim. Only the male/female classification is released in the public dataset. We manually verified random subsets of names classified by gender and religion, stratified across all states; this verification indicates an accuracy of 97% for both the gender and the religion classification.
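The manual check described above can be sketched as stratified sampling: draw a fixed number of classified names at random from each state, have a reviewer label them, and compute the share on which the model and the reviewer agree. This is a hypothetical illustration, not the authors' verification code, and it does not reproduce the LSTM classifier itself. The record fields (`id`, `state`, `predicted`), the per-state sample size, and the example labels are assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List


def stratified_sample(records: List[dict], per_state: int, seed: int = 0) -> List[dict]:
    """Draw up to `per_state` records at random from each state."""
    rng = random.Random(seed)
    by_state: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        by_state[record["state"]].append(record)
    sample = []
    for state in sorted(by_state):  # sorted so a fixed seed gives a reproducible sample
        group = by_state[state]
        sample.extend(rng.sample(group, min(per_state, len(group))))
    return sample


def verified_accuracy(sample: List[dict], manual_labels: Dict[str, str]) -> float:
    """Share of sampled names whose model label agrees with the manual label."""
    correct = sum(1 for r in sample if manual_labels[r["id"]] == r["predicted"])
    return correct / len(sample)


# Illustrative (made-up) classifier output and reviewer labels.
records = [
    {"id": "n1", "state": "Bihar", "predicted": "female"},
    {"id": "n2", "state": "Bihar", "predicted": "male"},
    {"id": "n3", "state": "Bihar", "predicted": "male"},
    {"id": "n4", "state": "Kerala", "predicted": "female"},
]
manual_labels = {"n1": "female", "n2": "male", "n3": "male", "n4": "male"}

sample = stratified_sample(records, per_state=2)
print(len(sample), verified_accuracy(sample, manual_labels))  # 3 0.6666666666666666
```

Pooling the strata as above yields an unweighted accuracy. If states are sampled equally but differ greatly in size, weighting each state's accuracy by its share of all names would give an estimate that better reflects the full dataset.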
All public data have been fully anonymized to prevent the identification of individual judges or litigants. The full sample contains 81.2 million cases, of which about 10 million are criminal cases. For more information on data construction, see Section 3 of our study on in-group bias among judges in India’s lower judiciary.
If you are a researcher interested in working with additional fields from the e-Courts dataset that are not included in the public data (such as the religion classification or e-Courts identifiers for back-linking), please email an author list and a brief project outline to info@devdatalab.org.
The diagram below illustrates the concordance between e-Courts HTML fields and the variable names in our data.
If you use these data, please cite our paper:
@Unpublished{ aabcdgns2021measuring,
author = {Ash, Elliot and Asher, Sam and Bhowmick, Aditi and Bhupatiraju, Sandeep and Chen, Daniel and Devi, Tanaya and Goessmann, Christoph and Novosad, Paul and Siddiqi, Bilal},
note = {Working Paper},
title = {{In-group bias in the Indian judiciary: Evidence from 5 million criminal cases}},
year = {2021}
}
For an introductory sample block of Stata code showing how to merge and use the data in this repository, please see the GitHub wiki entry here.
We thank Alison Campion, Rebecca Cai, Nikhitha Cheeti, Kritarth Jha, Romina Jafarian, Ornelie Manzambi, Chetana Sabnis, and Jonathan Tan for helpful research assistance. Special thanks to Sandeep Bhupatiraju for contributions to preparing the data. We thank the World Bank Program on Data and Evidence for Justice Reform, the World Bank Research Support Budget, the Emergent Ventures fund at the Mercatus Center, the UC Berkeley Center for Effective Global Action, and the DFID Economic Development and Institutions program for financial support.