Course: CS 181: Machine Learning
Course Level: Upper-level undergraduate
Course Description: "This course provides a broad and rigorous introduction to machine learning, probabilistic reasoning and decision making in uncertain environments."
Module Topic: Discrimination and Machine Learning
Module Author: Kate Vredenburgh
Semesters Taught: Spring 2018
Module Overview: In this module, we probe the ways that machine learning models can be discriminatory and examine different methods for preventing discriminatory outcomes. We begin by introducing two concepts of discrimination: disparate treatment and disparate impact. We then use those concepts to argue that there are at least four sets of important tools for reducing discrimination from the use of machine learning models in the social sphere: the reduction of bias from data, the definition of the optimization problem, the choice of features, and the use of statistical fairness criteria. Finally, we discuss an impossibility result regarding three statistical fairness criteria, and explain why this impossibility result is not surprising, given that the data is generated by biased institutions.
Connection to Course Technical Material: This topic connects to course content about bias (in the technical sense of the term from the machine learning literature). As we discuss in the module, technical bias can give rise to discriminatory bias. The module topic also connects with course content about feature extraction from data and optimization.
This topic was chosen because it connects material in the course with current research in machine learning on discrimination and statistical fairness criteria. It also connects with an important contemporary social issue: discrimination resulting from the use of machine learning models to make important decisions about how individuals are treated.
© 2018 by Kate Vredenburgh, "Discrimination and Machine Learning" is made available under a Creative Commons Attribution 4.0 International license (CC BY 4.0).
Key Philosophical Questions:
Question (1) is the over-arching question of the module. The rest of the questions are raised to help students think through different aspects of the over-arching question.
Key Philosophical Concepts:
Discrimination is an incredibly important concept for current work in computer science on machine learning and fairness. This module aims to show students that it is important to draw on domain experts such as lawyers to address ethical problems through design.
Barocas and Selbst discuss (1) how discrimination arises in algorithmic decision-making, and (2) whether that discrimination is wrongful, according to the disparate impact standard in the law. They identify two philosophical foundations for anti-discrimination law in the United States, and argue that these two foundations differ on when and why discrimination is wrongful.
Sample Class Activity: In small groups, discuss (1) whether the following case is a case of wrongful discrimination, according to the disparate impact standard, and (2) whether you think it is a case of discrimination. Whether you answered yes or no to (2), explain your reasoning.
This class activity facilitates student understanding of disparate impact and disparate treatment accounts of discrimination by asking them to determine whether the Glap case is a case of discrimination according to either of those standards. The activity also encourages students to begin to identify potential limitations of the disparate impact standard: many students judge that the Glap case is a case of wrongful discrimination, but this judgment cannot be explained by appeal to standard disparate impact accounts of discrimination. Finally, the activity sets up discussion of the impossibility result considered later in class. Given that discriminatory behavior by individuals produces some of the data on which the system is trained, is it surprising that individuals from subordinated groups have a higher probability of being incorrectly and unfavorably classified than those from privileged groups?
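To make the question above concrete, the following is a minimal Python sketch of how unequal error rates can arise. It assumes, purely for illustration, that the fairness criteria in play are equal positive predictive value (PPV), equal false negative rates (FNR), and equal false positive rates (FPR) across groups; the module itself does not specify which three criteria the impossibility result concerns. The group names, base rates, and the function name are hypothetical. The sketch uses the standard confusion-matrix identity FPR = p/(1 - p) * (1 - PPV)/PPV * (1 - FNR), where p is a group's base rate.

```python
def implied_false_positive_rate(base_rate, ppv, fnr):
    """Return the false positive rate implied by a group's base rate,
    positive predictive value (PPV), and false negative rate (FNR).

    Derived from confusion-matrix definitions:
        FPR = p / (1 - p) * (1 - PPV) / PPV * (1 - FNR)
    """
    if not (0 < base_rate < 1):
        raise ValueError("base_rate must be strictly between 0 and 1")
    if not (0 < ppv <= 1):
        raise ValueError("ppv must be in (0, 1]")
    if not (0 <= fnr <= 1):
        raise ValueError("fnr must be in [0, 1]")
    odds = base_rate / (1 - base_rate)
    return odds * ((1 - ppv) / ppv) * (1 - fnr)


# Hypothetical groups whose recorded base rates differ, for example
# because the historical data was produced by biased institutions.
base_rates = {"group_a": 0.3, "group_b": 0.5}

# Assumption: the classifier has the same PPV and FNR for both groups.
SHARED_PPV = 0.7
SHARED_FNR = 0.2

fpr_by_group = {
    name: implied_false_positive_rate(p, SHARED_PPV, SHARED_FNR)
    for name, p in base_rates.items()
}

for name, fpr in fpr_by_group.items():
    print(f"{name}: implied false positive rate = {fpr:.3f}")
```

Under these assumed numbers, holding PPV and FNR equal forces the group with the higher recorded base rate to have a higher false positive rate. If a positive label is the unfavorable outcome, members of that group are more likely to be incorrectly and unfavorably classified, which is one way to see why the impossibility result is unsurprising when base rates in the data differ.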
Module Assignment: Recall the Glap class activity. In class, we thought about the problem statically: given historical data, such as data about sales performance, who should Glap hire right now?
In this follow-up assignment, I want you to think about consumer behavior and firm hiring practice dynamically. Looking at features of the labor market dynamically allows you more, or different, degrees of freedom in your model. For example, in class, you probably took consumer preference about the race of their sales representative as given. What would happen if you allowed consumer preference to vary (say, on the basis of changing racial demographics in the sales force)?
Lessons Learned: Student response to this module has been overwhelmingly positive. A few lessons stand out.