Applied Privacy for Data Science (CS 208) - 2022 Spring

Semester: Spring

Offered: 2022

Module Topic: Differential Privacy in Context
Module Author: Sophie Gibert

Course Level: Graduate
AY: 2021-2022

Course Description: “The risks to privacy when making human subjects data available for research and how to protect against these risks using the formal framework of differential privacy. Methods for attacking statistical data releases, the mathematics of and software implementations of differential privacy, deployed solutions in industry and government. Assignments will include implementation and experimentation on data science tasks.” (Course Description)

Semesters Taught: Spring 2022

Tags

  • Differential privacy [CS]
  • Privacy budget [CS]
  • Privacy [Phil]
  • Contextual integrity [Phil]
  • Information norms [Phil]
  • Context-appropriate information flow [Phil]
  • Individual values [Phil]
  • Collective values [Phil]
  • Context-specific values and goals [Phil]

Module Overview: This module introduces students to the framework of contextual integrity. The framework is used to help answer two questions: (1) When and why do some technologies and information practices raise legitimate privacy concerns? (2) When and to what extent do the tools of differential privacy help address legitimate privacy concerns? The module begins with an overview of Helen Nissenbaum’s theory of privacy as contextual integrity. Then, most of the time is spent on a scaffolded activity. Guided by a worksheet, students apply the contextual integrity framework to a fictionalized case in which differential privacy techniques are employed. To close, students reflect on the role that differential privacy played in their analysis and consider the strengths and limitations of differential privacy in addressing privacy concerns.

Connection to Course Material:

The topic was chosen because of its direct connection to the technical material covered in the course. The framework of contextual integrity was chosen because it calls attention to ways in which the inappropriate flow of information can be bad not only for individuals but also for collectives. By hypothesis, differential privacy does an adequate job of protecting individuals against some of the harms that come with re-identification but does little to address collective concerns about contemporary data practices.

Students in this course learn how to measure and protect against privacy risks using the tools of differential privacy. Differential privacy is sometimes thought to be a magic bullet for addressing privacy concerns. The module helps students think critically and systematically about whether and to what extent differential privacy can solve privacy problems.

Goals

Module Goals

By the end of the module, students will be able to:

  1. Explain the key components of the contextual integrity framework.
  2. Apply the framework to cases, including cases in which differential privacy techniques are employed.
  3. Reflect critically and systematically on whether and to what extent differential privacy addresses legitimate privacy concerns.

Key Philosophical Questions

Question 1: In order to assess the extent to which differential privacy techniques help address legitimate privacy concerns, students need to be able to identify legitimate privacy concerns and explain why they are legitimate.

Question 2: A key component of the theory of privacy as contextual integrity is the claim that different norms governing the flow of information are appropriate in different contexts. Since privacy is understood as the context-appropriate flow of information, the very same data practice may threaten privacy in one context and not another.

Question 3: By applying the framework of contextual integrity to a case involving differential privacy, students will gain insight into the kinds of privacy concerns that differential privacy is best suited to address.

  1. When and why do technologies and data practices raise legitimate privacy concerns?
  2. Why might the same data practice raise legitimate privacy concerns in one context and not in another?
  3. What kinds of privacy concerns do the tools of differential privacy tend to address, and what kinds do they tend not to address?

Materials

Key Philosophical Concepts

In accordance with the theory of contextual integrity, privacy is understood as the context-appropriate flow of information.

There are three steps to the contextual integrity framework. The first involves explaining how, if at all, the technology or information practice at issue disrupts the prevailing information norms of the social context in which it’s embedded.

The second step involves evaluating any disruptions identified in the first step in light of individual values, collective values, and context-specific values and goals.

The third step involves making a prescription or judgment. If the technology or information practice at issue disrupts the prevailing information norms in a way that does not serve general and context-specific values and goals, then something must change.

  • Contextual integrity
  • Privacy
  • Context-appropriate flow of information
  • Information norms
  • Social context
  • Individual values
  • Collective values
  • Context-specific values and goals

Assigned Readings

  • Introduction (pp. 1-11 only) and Chapter 7 (pp. 140-150 only) in Nissenbaum, Helen. 2010. Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford, CA: Stanford University Press.

    In Privacy in Context, Helen Nissenbaum defends the theory of privacy as contextual integrity and demonstrates how to apply the contextual integrity framework. These selections from the Introduction and Chapter 7 provide an overview of key ideas. Details of the theory and how to apply the framework are discussed during the module itself.

Implementation

Class Agenda

  1. Introduction and preparation for small-group discussion (5 minutes)
  2. Overview of the framework of contextual integrity, including a worked example (20 minutes)
  3. Activity in small groups: Scaffolded application of the framework to a case, aided by a worksheet, pausing at various points for class-wide discussion (40 minutes)
  4. Class-wide discussion of the role that differential privacy plays in the analysis (10 minutes)

Sample Class Activity

The fictionalized case was designed to involve central differential privacy techniques, as students do not learn about local differential privacy until later in the course.

Students are presented with a fictionalized case. The case is designed by the Embedded EthiCS TA in collaboration with the professor and CS TAs. In the case, a company called “Coachable” designs wearable fitness trackers for athletes and allows third parties, such as sports recruiters and advertisers, to perform differentially private analyses for their own purposes on the data it collects.

In small groups, students analyze the case using the framework of contextual integrity. They are provided with information about the company’s data practices and a worksheet that breaks the analysis into 9 questions. In questions 1-3, students describe how information flows in the case, describe how it flows in a relevant social context (that of the athlete-coach relationship), and identify disruptions—i.e., differences between the two. In question 4, they consider how their responses to questions 1-3 would differ if Coachable did not use differential privacy techniques. This reveals the role that differential privacy can play in preserving the prevailing informational norms of the relevant context. In questions 5-8, students evaluate the disruptions they identified in question 3 in light of general and context-specific values and goals. In question 9, students prescribe, making a judgment about whether any of Coachable’s data practices should change.
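To make the central-model setting of the Coachable case concrete, the following is a minimal Python sketch of how a trusted curator might answer a third party's counting query with the Laplace mechanism. It is an illustration under stated assumptions, not part of the module materials: the function name `dp_count`, the heart-rate data, and the choice of epsilon are all hypothetical, and the case does not specify which mechanism Coachable uses. Floating-point Laplace sampling of this kind is also known to be vulnerable to attacks, so a vetted library would be needed in practice.

```python
import random


def dp_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of records matching predicate (Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # A counting query has sensitivity 1: one person's record changes the
    # count by at most 1, so the Laplace noise scale is 1 / epsilon.
    # The difference of two independent Exponential(rate=epsilon) draws
    # is Laplace-distributed with scale 1 / epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical resting heart rates (bpm) held by the curator; the third
# party sees only the noisy answer, never the raw records.
resting_heart_rates = [52, 61, 48, 70, 55, 66, 59, 45]
noisy_answer = dp_count(resting_heart_rates, lambda bpm: bpm < 60, epsilon=0.5)
print(f"Noisy count of athletes under 60 bpm: {noisy_answer:.2f}")
```

A smaller epsilon adds more noise and gives stronger protection to each individual athlete. Note what the sketch does not change: the third party still learns an aggregate about the group, which is the kind of information flow the contextual integrity analysis asks students to evaluate.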

Module Assignment

Students in this course use Perusall for all their pre-class reading assignments. The platform worked well, allowing for many kinds of engagement and giving the Embedded EthiCS TA a good sense of what students understood and where their interests lay.

The case for the post-module assignment involved differential privacy in the context of machine learning, as requested by the professor and CS TAs, because students learned about machine learning applications in the class sessions immediately preceding the module.

The post-module assignment questions provided less scaffolding than the in-class activity questions. Instead of asking students to apply the framework in 9 simple steps, the assignment asked them to do so in 3 more complex steps.

Before the module, students are asked to read two excerpts from Privacy in Context. Students use Perusall to collectively annotate the reading, leave comments, and ask questions. The Embedded EthiCS TA and CS TAs monitor Perusall to answer questions and respond to comments.

After the module, students complete an assignment that builds on the in-class activity. The assignment presents the same fictionalized case discussed in class but adds a detail. In the new case, Coachable employs user data to train machine learning models that predict what kinds of ads its users will respond to. Differential privacy techniques are used during the model training process. Coachable uses insights from the model to micro-target ads to its users and releases the model to researchers, advertisers, and sports recruiters who are interested in the relationship between behaviors, demographic traits, and athletic performance.
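The assignment does not say which technique Coachable uses during training. As one possible illustration, the sketch below shows the core step of a DP-SGD-style approach: clip each user's gradient to a fixed norm, sum the clipped gradients, add Gaussian noise, and average. The function name `dp_average_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and the privacy accounting needed to turn the noise level into an (epsilon, delta) guarantee is omitted.

```python
import math
import random


def dp_average_gradient(per_example_grads, clip_norm, noise_multiplier, rng=None):
    """Average per-example gradients after clipping and adding Gaussian noise."""
    if rng is None:
        rng = random.Random()
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        # Clip so that no single example contributes more than clip_norm
        # (in L2 norm) to the sum.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    # Gaussian noise calibrated to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy_total = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(per_example_grads) for x in noisy_total]


# Hypothetical per-user gradients for a two-parameter ad-response model.
grads = [[3.0, 4.0], [0.0, 0.5], [-1.0, 2.0]]
update = dp_average_gradient(grads, clip_norm=1.0, noise_multiplier=1.1)
print(update)
```

Clipping bounds any one user's influence on the model update, and the noise masks what remains. The released model can still support micro-targeting of groups, which is relevant to the reflection in question 3.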

Students respond to 3 questions. In questions 1-2, they apply the contextual integrity framework to determine whether this new data practice raises any legitimate privacy concerns. In question 3, they reflect on the role that Coachable’s use of differential privacy played in their analysis.

Lessons Learned

Student responses to this module were overwhelmingly positive. There was high engagement across the room during the activity portion and class-wide discussions.

The Coachable case worked well: it engaged many kinds of students, including athletes, and the stakes were high enough to make the analysis valuable but low enough that students were comfortable discussing their ethical opinions with their peers.

Pedagogical lessons learned:

  • Worksheets work extremely well in this setting. They set clear expectations and provide a lot of structure. This is important in classrooms where students are not used to discussing ethics with their peers. They also contribute to engagement by giving students something to write on and minimizing their use of electronics. Writing forces students to put their thoughts onto paper, which can reveal gaps in their reasoning and understanding that would go overlooked in a typical conversation.
  • Students appreciate learning to apply a normative framework to their area of interest, especially when the framework has concrete steps. After learning to use a framework, they also appreciate the opportunity to critique it and discuss its limitations.