Course: CS 109a: Introduction to Data Science
Course Level: Upper-level undergraduate
Course Description: “This course is the first half of a one-year introduction to data science. We will focus on the analysis of data to perform predictions using statistical and machine learning methods. Topics include data scraping, data management, data visualization, regression and classification methods, and deep neural networks. You will get ample practice through weekly homework assignments. The class material integrates the five key facets of an investigation using data: […]”
Module Topic: Algorithmic Fairness and Recidivism Prediction
Module Author: Heather Spradley
Semesters Taught: Spring 2020
In this module, we discuss algorithmic fairness, focusing on the special case of fairness in recidivism prediction. The central case study for the module is COMPAS, a recidivism prediction tool that is used widely in the criminal justice system. In 2016, ProPublica published a piece arguing that COMPAS is unfairly biased against black defendants on the grounds that the tool’s false positive rate for black defendants is higher than its false positive rate for white defendants. Northpointe, the company that developed COMPAS, responded by arguing that the tool is “racially neutral” because it is calibrated between races: any two individuals that receive the same score are equally likely to reoffend, regardless of race. After reconstructing and evaluating both arguments (and drawing on John Rawls’ views about procedural fairness in A Theory of Justice), we consider more general questions about fairness in recidivism prediction. How, in general, might preexisting racial bias affect the performance of recidivism prediction tools based on machine learning? What can data scientists working on recidivism prediction problems do to help ensure that the systems they develop are fair? And should the criminal justice system be using recidivism prediction algorithms to make decisions in the first place?
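The tension between these two notions of fairness can be made concrete with a small numerical sketch (the numbers below are invented for illustration, not drawn from the COMPAS data): a score can be perfectly calibrated within each group and still produce different false positive rates, simply because the groups have different base rates of reoffense.

```python
# Toy illustration with hypothetical numbers: a risk score that is calibrated
# within each group can still yield unequal false positive rates at a fixed
# decision threshold, once the groups' base rates of reoffense differ.

def false_positive_rate(buckets, threshold=0.5):
    """buckets: list of (score, n_people), where a fraction `score` of the
    people in that bucket actually reoffend (i.e., the score is calibrated).
    A defendant is flagged 'high risk' when score >= threshold."""
    false_pos = sum(n * (1 - s) for s, n in buckets if s >= threshold)
    negatives = sum(n * (1 - s) for s, n in buckets)
    return false_pos / negatives

# In both groups, exactly 20% of people scored 0.2 reoffend and 80% of people
# scored 0.8 reoffend -- so the score is calibrated for each group separately.
group_a = [(0.2, 100), (0.8, 100)]   # base rate of reoffense: 50%
group_b = [(0.2, 150), (0.8, 50)]    # base rate of reoffense: 35%

print(false_positive_rate(group_a))  # 0.2
print(false_positive_rate(group_b))  # ~0.077
```

Because more of group A's non-reoffenders sit above the threshold, group A's false positive rate is higher even though any two individuals with the same score are equally likely to reoffend, which is precisely the shape of the COMPAS dispute.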
Connection to Course Material: In this course, students learn how to build predictive models and consider various problems that interfere with the accuracy of these models, such as feedback loops. In the module, we consider how to develop predictive models that are both accurate and fair. We also challenge the idea that ensuring fairness requires sacrificing accuracy, particularly in the case of recidivism prediction.
Key Philosophical Questions:
Key Philosophical Concepts:
Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner, “Machine Bias” (ProPublica).
This piece by ProPublica initiated the debate about whether COMPAS is biased against black defendants. In addition to introducing students to one of the central arguments in that debate, the reading provides useful background about COMPAS and how it is used in the criminal justice system.
One of the main reasons for choosing this topic was the extent to which it has been discussed and addressed in the CS community. Although it is often helpful to draw students’ attention to brand-new issues they might not have noticed before, we had a unique opportunity here to address something that already felt pressing to students and to give them philosophical tools for navigating the ongoing discussion in the CS community. Since this is the article that drew so much attention to the topic, it served as a great starting point for walking students through the current discussion in their community while also introducing them to philosophical tools for reflecting on concepts such as “fairness” and “bias”.
Sample Class Activity:
In order to get students to feel the force of the ethical questions about predictive algorithms used in recidivism prediction, the module begins with two polls. In the first poll, students are asked to consider a scenario in which they are a judge making a pre-trial decision: they must decide whether to make that decision based on their own judgment or based on a risk assessment produced by a predictive algorithm. In the second poll, they are asked to make the same decision but from the perspective of the detainee about whom the pre-trial decision is being made. After the polls are complete, the students discuss the reasons for their answers. They are then asked to consider the common assumption that predictive tools will allow us to pass the buck on certain kinds of responsibility in high-stakes cases, and to discuss how the responsibility of the creator of a predictive algorithm is heightened by the way such algorithms are relied upon. This module was prepared for a very large class, so part of the motivation behind this activity was that it could be done with several hundred people. In smaller classes, longer or more complicated activities would be possible. Taking polls worked well in that students were able to respond simultaneously, and the results prompted interesting discussion.
In a post-module assignment, students are asked to explore recidivism data and corresponding COMPAS scores published by ProPublica. They are then asked to: (1) find correlations and differences between a defendant’s race and various other variables in the data; (2) write a short response to the question, “With respect to these variables, how could bias in the data or data collection be impacting or causing these differences?”; (3) build three predictive models from the data that leave out race and other correlating variables in different ways in order to see what impact different variables are having on the model; and (4) discuss the resulting false positive rates amongst different racial groups in each of their models and what implications this has for the fairness of predictive algorithms. One motivation behind this assignment was to get students to see just how different the results of various models can be depending on decisions made with respect to the fairness of the data. Many students tend to think of the task as one of representing data as accurately as possible in a model, but this exercise forces them to challenge that idea. They are able to think through in practice the relation between accuracy and fairness by building and running several models.
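The core computation in steps (3) and (4) can be sketched as follows. This is a minimal illustration using synthetic stand-in data and invented variable names (`race`, `priors`, `age`), not ProPublica’s published dataset; the assignment itself has students run the same comparison on the real data.

```python
# Sketch of steps (3)-(4): fit models that include or exclude race (and a
# race-correlated proxy), then compare false positive rates across groups.
# All data below is synthetic; column names are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
race = rng.integers(0, 2, n)          # 0/1 group indicator
priors = rng.poisson(2 + race, n)     # prior offenses, correlated with race
age = rng.integers(18, 70, n)
# Ground-truth reoffense here depends only on priors and age.
p = 1 / (1 + np.exp(-(0.4 * priors - 0.03 * age)))
reoffend = rng.random(n) < p

def fpr_by_group(model, X):
    """False positive rate per group: fraction of actual non-reoffenders
    whom the model flags as likely to reoffend."""
    pred = model.predict(X)
    out = {}
    for g in (0, 1):
        mask = (race == g) & ~reoffend   # actual non-reoffenders in group g
        out[g] = pred[mask].mean()
    return out

# Model 1 uses race explicitly; Model 2 drops race but keeps a correlated
# proxy (priors). Students build several such variants and compare them.
X_with = np.column_stack([race, priors, age])
X_without = np.column_stack([priors, age])
m1 = LogisticRegression().fit(X_with, reoffend)
m2 = LogisticRegression().fit(X_without, reoffend)
print(fpr_by_group(m1, X_with), fpr_by_group(m2, X_without))
```

Even in this toy setup, dropping race does not by itself equalize the false positive rates, since correlated variables like prior offenses carry much of the same information, which is the point the assignment is designed to surface.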
Since the course had recently covered the technical concepts of calibration and false positive rates, we assumed that spending time reviewing these concepts would be unnecessary. In practice, however, we found that some students were not fluent enough with these concepts to readily apply them in the context of a new discussion about algorithmic fairness. When we teach the module again, we plan to spend more time reviewing these concepts before introducing new material.
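For instructors planning that review, the two concepts reduce to a few lines of code. The toy labels and scores below are invented for illustration only:

```python
# Refresher on the two notions at issue in the COMPAS debate:
# - false positive rate: fraction of actual negatives flagged positive
# - calibration: among cases assigned score s, about a fraction s are positive

def false_positive_rate(y_true, y_pred):
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    negatives = sum(1 for t in y_true if not t)
    return fp / negatives

def calibration_by_score(y_true, scores):
    """Observed positive rate among cases sharing each predicted score."""
    groups = {}
    for t, s in zip(y_true, scores):
        groups.setdefault(s, []).append(t)
    return {s: sum(ts) / len(ts) for s, ts in groups.items()}

y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
y_pred = [s >= 0.5 for s in scores]
print(false_positive_rate(y_true, y_pred))   # 0.25: 1 of 4 negatives flagged
print(calibration_by_score(y_true, scores))  # {0.25: 0.25, 0.75: 0.75}
```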