
Embedded EthiCS @ Harvard: bringing ethical reasoning into the computer science curriculum.

Repository of Open Source Course Modules

 

Course: CS 109a: Introduction to Data Science

Course Level: Upper-level undergraduate

Course Description: “This course is the first half of a one‐year introduction to data science. We will focus on the analysis of data to perform predictions using statistical and machine learning methods. Topics include data scraping, data management, data visualization, regression and classification methods, and deep neural networks. You will get ample practice through weekly homework assignments. The class material integrates the five key facets of an investigation using data:

  1. data collection ‐ data wrangling, cleaning, and sampling to get a suitable data set
  2. data management ‐ accessing data quickly and reliably
  3. exploratory data analysis – generating hypotheses and building intuition
  4. prediction or statistical learning
  5. communication – summarizing results through visualization, stories, and interpretable summaries
(Course description)”


Module Topic: Algorithmic Fairness and Recidivism Prediction

Module Author: Heather Spradley

Semesters Taught: Spring 2020

Tags:

  • fairness (phil)
  • discrimination (phil)
  • prediction (CS)
  • machine learning (CS)
  • bias (both)

Module Overview:

In this module, we discuss algorithmic fairness, focusing on the special case of fairness in recidivism prediction. The central case study for the module is COMPAS, a recidivism prediction tool that is used widely in the criminal justice system. In 2016, ProPublica published a piece arguing that COMPAS is unfairly biased against black defendants on the grounds that the tool’s false positive rate for black defendants is higher than its false positive rate for white defendants. Northpointe, the company that developed COMPAS, responded by arguing that the tool is “racially neutral” because it is calibrated between races: any two individuals that receive the same score are equally likely to reoffend, regardless of race. After reconstructing and evaluating both arguments (and drawing on John Rawls’ views about procedural fairness in A Theory of Justice), we consider more general questions about fairness in recidivism prediction. How, in general, might preexisting racial bias affect the performance of recidivism prediction tools based on machine learning? What can data scientists working on recidivism prediction problems do to help ensure that the systems they develop are fair? And should the criminal justice system be using recidivism prediction algorithms to make decisions in the first place?
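
The crux of the disagreement between ProPublica and Northpointe is that two plausible statistical criteria, equal false positive rates across groups and calibration within score bands, are computed from the same set of predictions yet need not agree. The following is a minimal sketch of both computations, on entirely invented data rather than the actual COMPAS dataset; the group labels, scores, threshold, and score bands are hypothetical, and pandas is assumed.

    # Minimal illustration of the two fairness criteria at issue in the COMPAS debate.
    # All data here are invented for illustration; this is not the COMPAS dataset.
    import pandas as pd

    df = pd.DataFrame({
        "group":      ["A"] * 6 + ["B"] * 6,                   # hypothetical demographic groups
        "risk_score": [2, 4, 7, 8, 3, 9, 1, 3, 6, 8, 2, 7],    # invented 1-10 risk scores
        "reoffended": [0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1],    # invented observed outcomes
    })
    df["predicted_high_risk"] = df["risk_score"] >= 5           # threshold for the "high risk" label

    # ProPublica's criterion: the false positive rate per group, i.e. the share of
    # people who did not reoffend but were nevertheless labeled high risk.
    for group, sub in df.groupby("group"):
        non_reoffenders = sub[sub["reoffended"] == 0]
        fpr = non_reoffenders["predicted_high_risk"].mean()
        print(f"group {group}: false positive rate = {fpr:.2f}")

    # Northpointe's criterion: calibration, i.e. whether people in the same score band
    # reoffend at roughly the same rate regardless of group.
    df["score_band"] = pd.cut(df["risk_score"], bins=[0, 5, 10], labels=["low", "high"])
    print(df.groupby(["score_band", "group"], observed=True)["reoffended"].mean())

As the module's central argument brings out, a tool can satisfy the second criterion while violating the first, which is why the two sides reach opposite verdicts about the same system.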


Connection to Course Material: In this course, students learn how to build predictive models and consider various problems that interfere with the accuracy of these models, such as feedback loops. In the module, we consider how to develop predictive models that are both accurate and fair. We also challenge the idea that ensuring fairness requires sacrificing accuracy, particularly in the case of recidivism prediction.

 

Module Goals:

  1. Introduce students to the topic of algorithmic fairness, with a focus on fairness in recidivism prediction.
  2. Consider various ways in which predictive algorithms might be unfair, as well as how to develop fairer predictive algorithms.
  3. Equip students with philosophical tools to help them think more clearly about algorithmic fairness.

Key Philosophical Questions:

  1. What is fairness, and what features of a predictive algorithm make it fair?
  2. What kinds of features of an individual would a fair algorithm take into account?
  3. In what ways can data collection be done fairly or unfairly, and how does that affect the fairness of a predictive model trained on that data?

 

Key Philosophical Concepts:

  • Fairness
  • John Rawls’ “veil of ignorance” thought experiment
  • Moral relevance and irrelevance
  • Bias
  • Discrimination

Assigned Readings:

Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner, “Machine Bias” (ProPublica).

This piece by ProPublica initiated the debate about whether COMPAS is biased against black defendants. In addition to introducing students to one of the central arguments in that debate, the reading provides useful background about COMPAS and how it is used in the criminal justice system.
One of the main reasons for choosing this topic was the extent to which it has been discussed and addressed in the CS community. Although it is often helpful to draw students’ attention to brand-new issues they might not have noticed before, we had a unique opportunity here to address something that already felt pressing to students and to give them philosophical tools to navigate the ongoing discussion in the CS community. Since this is the article that drew so much attention to the topic, it served as a great starting point for walking students through the current discussion in their community while also introducing them to philosophical tools for reflecting on concepts such as “fairness” and “bias”.

 

Class Agenda:

  1. Overview.
  2. Case study: the COMPAS recidivism prediction tool.
  3. ProPublica’s argument that COMPAS is unfair.
  4. Philosophical concepts: fairness, moral relevance, John Rawls’ veil of ignorance thought experiment.
  5. Technical concepts: false positive rates and calibration.
  6. Argument that COMPAS is fair (based on Rob Long’s article “Fairness in Machine Learning”).
  7. Data and data collection as further objections to the fairness of COMPAS.
  8. Discussion.

Sample Class Activity:

In order to get students to feel the force of the ethical questions about predictive algorithms used in recidivism prediction, the module begins with two polls. In the first poll, students are asked to consider a scenario in which they are a judge making a pre-trial decision: they must decide whether to make that decision based on their own judgment or based on a risk assessment produced by a predictive algorithm. In the second poll, they are asked to make the same decision, but from the perspective of the detainee about whom the pre-trial decision is being made. After the polls are complete, the students discuss the reasons for their answers. They are then asked to consider the common assumption that predictive tools allow us to pass the buck on certain kinds of responsibilities in high-stakes cases, and to discuss how the responsibility of the algorithm’s creator is heightened by the degree to which such algorithms are relied upon. This module was prepared for a very large class, so part of the motivation behind this activity was that it could be done with several hundred people; in smaller classes, longer or more complicated activities would be possible. Taking polls worked well in that students were able to respond simultaneously, and the results prompted interesting discussion.


Module Assignment:

In a post-module assignment, students are asked to explore recidivism data and corresponding COMPAS scores published by ProPublica. They are then asked to: (1) find correlations and differences between a defendant’s race and various other variables in the data; (2) write a short response to the question, “With respect to these variables, how could bias in the data or data collection be impacting or causing these differences?”; (3) build three predictive models from the data that leave out race and other correlated variables in different ways in order to see what impact different variables have on the model; and (4) discuss the resulting false positive rates among different racial groups in each of their models and what implications this has for the fairness of predictive algorithms. One motivation behind this assignment was to get students to see just how different the results of various models can be depending on decisions made with respect to the fairness of the data. Many students tend to think of the task as one of representing data as accurately as possible in a model, but this exercise forces them to challenge that idea. Building and running several models lets them work through the relationship between accuracy and fairness in practice.
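
A rough sketch of how parts (3) and (4) might be approached appears below. This is not the assignment’s starter code: the file name, column names, and feature groupings are placeholders rather than the actual fields in ProPublica’s published data, and scikit-learn and pandas are assumed.

    # Sketch of the comparison described above: train models that include or exclude
    # race (and a variable correlated with it), then compare false positive rates by
    # group. File name, column names, and feature groupings are hypothetical.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("recidivism_data.csv")        # placeholder path to the ProPublica data
    target = "two_year_recid"                      # hypothetical outcome column

    feature_sets = {
        "all_features":          ["age", "priors_count", "charge_degree", "race"],
        "without_race":          ["age", "priors_count", "charge_degree"],
        "without_race_or_proxy": ["age", "charge_degree"],   # also drops a race-correlated variable
    }

    train, test = train_test_split(df, test_size=0.3, random_state=0)
    test = test.copy()

    for name, features in feature_sets.items():
        X_train = pd.get_dummies(train[features])
        X_test = pd.get_dummies(test[features]).reindex(columns=X_train.columns, fill_value=0)
        model = LogisticRegression(max_iter=1000).fit(X_train, train[target])
        test["pred"] = model.predict(X_test)

        # False positive rate per racial group: non-reoffenders predicted to reoffend.
        non_reoffenders = test[test[target] == 0]
        print(f"\n{name}: false positive rate by group")
        print(non_reoffenders.groupby("race")["pred"].mean())

The point of running all three models follows the motivation described above: the results can differ markedly depending on which race-correlated variables are included, which is exactly what the assignment asks students to examine.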


Lessons Learned:

Since the course had recently covered the technical concepts of calibration and false positive rates, we assumed that spending time reviewing these concepts would be unnecessary. In practice, however, we found that some students were not fluent enough with these concepts to readily apply them in the context of a new discussion about algorithmic fairness. When we teach the module again, we plan to spend more time reviewing these concepts before introducing new material.