
Embedded EthiCS @ Harvard: bringing ethical reasoning into the computer science curriculum.

Repository of Open Source Course Modules


Course: CS 271: Topics in Data Visualization

Course Level: Graduate

Course Description: "This course covers advanced topics in data visualization. Over the course of the semester, we will examine seminal works and recent state-of-the-art research in information visualization, scientific visualization, and visual analytics. Students are encouraged to bring in ongoing or related research. Topics covered in this class include interaction, storytelling, evaluation, color, volume rendering, vector field visualization, visualization in sciences, big data visualization, uncertainty visualization, and visualization for machine learning. Students will work on a semester-long visualization project that will allow them to visualize their own data sets. We will take a structured approach on how to read, analyze, present, and discuss research topics. Furthermore, we will employ peer-feedback and formal design critiques to analyze each other’s work."



Module Topic: The Ethics and Politics of Data Visualization

Module Author: Marion Boulicault

Semesters Taught: Spring 2020

Tags:

  • data visualization (CS)
  • design (CS)
  • big data (CS)
  • disability (phil)
  • feminist theory (phil)
  • stereotypes (phil)
  • ethical design principles (both)

Module Overview:

In this module, we discuss the ethical and political dimensions of data visualization. The module sets the stakes by beginning with a discussion of the social and epistemic power of data visualization in today’s world. It then focuses on a set of commonplace principles for effective data visualization. The political and ethical dimensions of each principle are considered and debated. Finally, the students are asked to expand on the meaning of ‘effective’ by brainstorming alternative data visualization principles that center the ethical and political dimensions.


Connection to Course Material: The course teaches technical skills, strategies, and principles for effective data visualization. The module examines the ethical and political dimensions of these skills, strategies, and principles. For example, one commonly suggested strategy for effective data visualization is to ‘reduce cognitive load’ for the audience. As part of the module, the Embedded EthiCS TA leads a discussion of one way cognitive load can be reduced: taking advantage of (and thereby potentially reinforcing) existing, problematic stereotypes, such as using blue to indicate male and pink to indicate female. By highlighting examples like these, the module provides a lens and a set of tools for identifying and analyzing the ethical dimensions of the technical practice of data visualization.
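To make the stereotype example concrete, here is a minimal, hypothetical sketch in Python with matplotlib. The data, category labels, and palette choices are invented for illustration; the module itself does not prescribe any particular library or palette.

```python
# Hypothetical sketch: choosing category colors deliberately rather than
# defaulting to stereotyped encodings (e.g., pink = female, blue = male).
import matplotlib.pyplot as plt

groups = ["Female", "Male", "Nonbinary"]  # illustrative categories
values = [42, 37, 8]                      # made-up counts for the sketch

# A stereotyped palette leans on existing associations to lower cognitive
# load, but risks reinforcing those associations.
stereotyped = ["pink", "lightblue", "gray"]

# A neutral palette carries no prior gender coding; the axis labels
# supply the color-to-category mapping instead.
neutral = ["tab:purple", "tab:olive", "tab:cyan"]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
ax1.bar(groups, values, color=stereotyped)
ax1.set_title("Stereotyped encoding")
ax2.bar(groups, values, color=neutral)
ax2.set_title("Neutral encoding")
fig.suptitle("Same data, different ethical trade-offs in color choice")
plt.tight_layout()
plt.show()
```

The neutral palette may ask slightly more of the viewer, since the mapping must be read off rather than guessed, and that is precisely the trade-off between reducing cognitive load and reinforcing stereotypes that the module asks students to weigh.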


Module Goals:

  1. Provide students with philosophical tools to think critically about the ethical implications of data visualization design choices.
  2. Introduce the idea that “data do not speak for themselves” and must always be considered and evaluated with respect to the context in which they are generated and used.
  3. Teach students how to think about the effects of design choices on marginalized groups, particularly people with disabilities.
  4. Give students access to resources for feminist and community-based data visualization principles and practices.

Key Philosophical Questions:

  1. Can data visualization ever be “objective”, and what do we mean by “objective”?
  2. What are some of the ethical dimensions of commonplace data visualization design principles?
  3. Why does attending to context matter when creating and evaluating data visualizations?
  4. How can we draw on work in disability studies and feminist theory to craft ethical data visualization principles?


Key Philosophical Concepts:

  • Feminist theory
  • Objectivity
  • Disability studies
  • Ethical design principles
  • Context

Assigned Readings:

Lundgard, Alan, Crystal Lee, and Arvind Satyanarayan. “Sociotechnical Considerations for Accessible Visualization Design.” IEEE Visualization Conference (VIS), Short Papers, 2019.


Class Agenda:

  1. Overview.
  2. Case study: the COMPAS recidivism prediction tool.
  3. ProPublica’s argument that COMPAS is unfair.
  4. Philosophical concepts: fairness, moral relevance, John Rawls’ veil of ignorance thought experiment.
  5. Technical concepts: false positive rates and calibration (stated formally in the sketch after this agenda).
  6. Argument that COMPAS is fair (based on Rob Long’s article “Fairness in Machine Learning”).
  7. Data and data collection as further objections to the fairness of COMPAS.
  8. Discussion.
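For reference, the two technical concepts in agenda item 5 can be stated formally. The notation below is ours, not the module’s: write $Y$ for whether a defendant actually reoffends, $\hat{Y}$ for the tool’s binary high-risk label, $S$ for its risk score, and $G$ for group membership.

```latex
% False positive rate for group g: among group-g defendants who did not
% reoffend, the fraction the tool labels high risk.
\mathrm{FPR}_g = \Pr\bigl(\hat{Y} = 1 \mid Y = 0,\; G = g\bigr)

% Calibration within groups: among defendants assigned risk score s, the
% observed reoffense rate is s, in every group.
\Pr\bigl(Y = 1 \mid S = s,\; G = g\bigr) = s \quad \text{for all } s \text{ and } g
```

When base rates of reoffense differ across groups, a predictor cannot in general satisfy both criteria at once; that tension is the crux of the fairness debate the agenda walks through.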

Sample Class Activity:

To get students to feel the force of the ethical questions raised by recidivism prediction, the module begins with two polls. In the first, students are asked to imagine that they are a judge making a pre-trial decision: they must decide whether to rely on their own judgment or on a risk assessment produced by a predictive algorithm. In the second, they are asked to make the same choice from the perspective of the detainee about whom the decision is being made. After the polls, students discuss the reasons for their answers. They are then asked to consider the common assumption that predictive tools let us pass the buck on certain kinds of responsibility in high-stakes cases, and to discuss how the responsibility of an algorithm’s creators is heightened by the degree to which such algorithms are relied upon.


Module Assignment:

In a post-module assignment, students are asked to explore the recidivism data and corresponding COMPAS scores published by ProPublica. They are then asked to: (1) find correlations and differences between a defendant’s race and other variables in the data; (2) write a short response to the question, “With respect to these variables, how could bias in the data or data collection be impacting or causing these differences?”; (3) build three predictive models that leave out race and race-correlated variables in different ways, in order to see what impact different variables have on the model; and (4) compare the resulting false positive rates among racial groups in each model and discuss what these imply for the fairness of predictive algorithms.
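A minimal sketch of what steps (3) and (4) might look like in Python with pandas and scikit-learn follows. The column and group names match ProPublica’s published compas-scores-two-years.csv, but treat them, and the feature sets, as assumptions to verify against the actual file; ProPublica also applied row-filtering steps that are omitted here.

```python
# Hypothetical sketch of assignment steps (3) and (4): fit models that
# include or exclude race and race-correlated features, then compare
# false positive rates across racial groups. Column names assume
# ProPublica's compas-scores-two-years.csv.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("compas-scores-two-years.csv")

# Three illustrative feature sets that drop race (and a race-correlated
# variable) in different ways, as the assignment asks.
feature_sets = {
    "all_features": ["age", "priors_count", "sex", "race"],
    "no_race": ["age", "priors_count", "sex"],
    "no_race_no_priors": ["age", "sex"],
}

for name, cols in feature_sets.items():
    X = pd.get_dummies(df[cols], drop_first=True)  # one-hot encode categoricals
    y = df["two_year_recid"]                       # 1 = reoffended within two years
    X_tr, X_te, y_tr, y_te, race_tr, race_te = train_test_split(
        X, y, df["race"], test_size=0.3, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = model.predict(X_te)

    print(f"\nModel '{name}':")
    for group in ["African-American", "Caucasian"]:  # labels as in the CSV
        # False positive rate: among true non-recidivists in this group,
        # the fraction the model flags as likely to reoffend.
        mask = ((race_te == group) & (y_te == 0)).to_numpy()
        print(f"  FPR, {group}: {pred[mask].mean():.3f}")
```

Even with race dropped, correlated features such as prior counts can reproduce the disparity in false positive rates, which is precisely what comparing the three models is meant to surface.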


Lessons Learned:

Since the course had recently covered the technical concepts of calibration and false positive rates, we assumed that spending time reviewing these concepts would be unnecessary. In practice, however, we found that some students were not fluent enough with these concepts to readily apply them in the context of a new discussion about algorithmic fairness. When we teach the module again, we plan to spend more time reviewing these concepts before introducing new material.