Embedded EthiCS™ @ Harvard: Bringing ethical reasoning into the computer science curriculum
Big Data Systems (CS 265) – 2023 Spring
Module Topic: Privacy and Promoting Public Health with Big Data Systems
Module Author: Michael Pope
Course Level: Graduate
AY: 2022-2023
Course Description: “Big data is everywhere. A fundamental goal across numerous modern businesses and sciences is to be able to utilize as many machines as possible, to consume as much information as possible and as fast as possible. The big challenge is how to turn data into useful knowledge. This is a moving target as both the underlying hardware and our ability to collect data evolve. In this class, we discuss how to design data systems, data structures, and algorithms for key data-driven areas, including relational systems, distributed systems, graph systems, noSQL, newSQL, machine learning, and neural networks. We see how they all rely on the same set of very basic concepts and we learn how to synthesize efficient solutions for any problem across these areas using those basic concepts.” (Harvard course catalog; course website)
Semesters Taught: Spring 2019, Spring 2020, Spring 2023
Tags
- Privacy [phil]
- Big data [CS]
- Data systems [CS]
- Trade-offs [both]
- Trust [phil]
- Design [CS]
- Public good [phil]
Module Overview
This module focuses on efforts to preserve privacy in contexts where big data systems can promote public goods. The primary case study, involving contact tracing applications, raises questions about how to leverage available data in the service of public health while preserving privacy. In particular, the module examines how centralized and decentralized approaches to data collection and storage can promote autonomy, democratic ideals, and protection from harm. The module concludes with a discussion of the limits of privacy.
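The contrast between centralized and decentralized approaches can be made concrete with a toy model of decentralized exposure notification. This is a minimal sketch under loudly stated assumptions: the names `Phone`, `rolling_id`, `new_daily_key`, and the 144-interval schedule are hypothetical, and identifiers are derived here with HMAC-SHA256 as a simplification, not the key schedule of the actual Apple/Google specification. What it illustrates is where the data lives: each phone keeps the identifiers it hears locally, and the server only ever sees the random keys of users who choose to report a positive test. In a centralized design, by contrast, contact records are typically uploaded so that the server performs the matching.

```python
import hashlib
import hmac
import os

# Toy sketch of decentralized exposure notification.
# ASSUMPTION: identifiers come from HMAC-SHA256 over a random daily key;
# the real Apple/Google specification uses different primitives.

INTERVALS_PER_DAY = 144  # hypothetical: one identifier per 10-minute window


def new_daily_key() -> bytes:
    """Random key that stays on the phone unless its owner reports a positive test."""
    return os.urandom(16)


def rolling_id(daily_key: bytes, interval: int) -> bytes:
    """Derive the short-lived identifier broadcast during one interval."""
    msg = interval.to_bytes(4, "little")
    return hmac.new(daily_key, msg, hashlib.sha256).digest()[:16]


class Phone:
    def __init__(self) -> None:
        self.daily_key = new_daily_key()
        self.heard = set()  # identifiers observed nearby, stored only on this device

    def broadcast(self, interval: int) -> bytes:
        return rolling_id(self.daily_key, interval)

    def observe(self, identifier: bytes) -> None:
        self.heard.add(identifier)

    def check_exposure(self, published_keys: list) -> bool:
        """Matching happens on the device: regenerate IDs from published keys and compare."""
        for key in published_keys:
            for interval in range(INTERVALS_PER_DAY):
                if rolling_id(key, interval) in self.heard:
                    return True
        return False


alice, bob, carol = Phone(), Phone(), Phone()
bob.observe(alice.broadcast(interval=42))  # Alice and Bob were near each other
carol.observe(bob.broadcast(interval=42))  # Carol was near Bob, not Alice

published = [alice.daily_key]  # Alice tests positive and uploads only her daily key
print(bob.check_exposure(published))    # True
print(carol.check_exposure(published))  # False
```

Matching on the device keeps the contact graph off the server; one trade-off of this choice is that public health authorities receive less data about how transmission unfolds.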
Connection to Course Technical Material
One primary promise of big data systems is their ability to manage and utilize large data sets for solving problems. In particular, harnessing large data sets to track and limit the impact of vectors of disease serves a primary public health goal. Yet, achieving that goal could require collecting, storing, and utilizing large amounts of sensitive data, raising questions around privacy, consent, and public trust.
This course focuses on cutting-edge research into the design and deployment of big data systems. The module integrates considerations of privacy’s social and ethical importance into the course’s focus on leveraging big data to enhance system performance.
Goals
Module Goals
- Identify potential privacy violations in leveraging big data systems for contact tracing.
- Discuss ways that data structures can preserve or fail to preserve privacy.
- Examine how performance goals interact with valuable functions of privacy.
Key Philosophical Questions
Q1: Through student discussion, this module generates a number of conceptions of privacy, ranging from solitude to control over others’ access to one’s information. For designing data systems, the module focuses on limits placed on others’ access to data.
Q2 and Q3: By looking at centralized and decentralized contact tracing applications during the COVID-19 pandemic, this module invites students to reflect on privacy’s value (e.g., in promoting autonomy) and on reasonable limits to privacy. Although privacy and public health goals align in many cases, the instances where they are in tension give students opportunities to negotiate and discern the appropriate balance. This is the focus of the module’s final discussion.
- What is privacy?
- Why should data system design choices protect privacy?
- What are the limits of privacy restrictions?
Materials
Key Philosophical Concepts
Introducing privacy in contexts where available data could be utilized to promote public goods, such as population health, helps students connect the value of privacy to justifying public trust and increasing system uptake.
- Privacy
- Public good
- Value trade-offs
- Trust and trustworthiness
Assigned Readings
Warzel and Ngu discuss how the evolution of Google’s privacy policy reflects the internet’s transformation, identifying the complexities of preserving privacy while providing a valuable service.
Sharon gives an overview of Google and Apple’s exposure notification API in connection with privacy concerns. The article raises questions both about the value of privacy and about the potential for privacy-preserving technologies to mask other social and ethical issues.
- Warzel C. and Ngu A. (2019). “Google’s 4,000-word privacy policy is a secret history of the internet.” The New York Times.
- Sharon T. (2021). “Blind-sided by privacy? Digital contact tracing, the Apple/Google API and big tech’s newfound role as global health policy makers.” Ethics and Information Technology.
Implementation
Class Agenda
- Introduction to contact tracing techniques, including their history
- Discussion: leveraging big data systems for contact tracing
- Case study: Apple and Google’s API for exposure notification
- Discussion: balancing privacy concerns and performance maximization
- Final activity and debrief: student identification and discussion of privacy concerns through various case studies
Sample Class Activity
This activity serves two module goals. First, it allows students to connect the contact tracing case study to the technical content of the course, especially efficiency concerns for data structures. Second, it presents a natural opportunity to develop a definition of privacy and apply it to the case study through student discussion.
Before privacy concerns are raised, students break into groups and discuss the types of data most relevant to contact tracing, including efficiency considerations for gathering and storing that data. Each group writes data sources on post-it notes and places them on the board. Once all input has been collected, the class generates possible reasons for using or avoiding each data source in contact tracing, which leads to a conceptualization of privacy through discussion. Potentially problematic data sources are deliberately ‘taken off the table’ by moving them to a special area of the board.
Module Assignment
There was no assignment for this module. One possible assignment would be to apply the module’s discussion of privacy to techniques for managing large data structures, such as logical deletion.
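One privacy question behind logical deletion is that a “deleted” record can stay physically stored until a later compaction step removes it. The sketch below is a hypothetical illustration, not course material: it assumes an append-only log with tombstone markers, and the names `LogStore`, `put`, `delete`, `get`, and `compact` are invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    key: str
    value: str
    deleted: bool = False  # tombstone flag used for logical deletion


class LogStore:
    """Toy append-only store illustrating logical vs. physical deletion (hypothetical)."""

    def __init__(self) -> None:
        self.log: List[Record] = []

    def put(self, key: str, value: str) -> None:
        self.log.append(Record(key, value))

    def delete(self, key: str) -> None:
        # Logical delete: append a tombstone instead of erasing earlier entries.
        self.log.append(Record(key, "", deleted=True))

    def get(self, key: str) -> Optional[str]:
        # The newest entry for a key wins; a tombstone hides all older values.
        for record in reversed(self.log):
            if record.key == key:
                return None if record.deleted else record.value
        return None

    def compact(self) -> None:
        # Physical deletion: keep only the newest live version of each key.
        latest: Dict[str, Record] = {}
        for record in self.log:
            latest[record.key] = record
        self.log = [r for r in latest.values() if not r.deleted]


store = LogStore()
store.put("user42", "sensitive location history")
store.delete("user42")
print(store.get("user42"))  # None: hidden from queries
print(any(r.value == "sensitive location history" for r in store.log))  # True: still stored
store.compact()
print(any(r.value == "sensitive location history" for r in store.log))  # False
```

Deferring physical removal is usually chosen for write performance, which is exactly where a performance goal and a user’s expectation that deleted data is gone can come apart.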
Lessons Learned
Student engagement and feedback for this module were universally positive. An important lesson from student feedback is that modules have more impact when they draw on students’ own experiences. Inviting students to think about how privacy functions in their own lives encourages them to take part in identifying privacy’s value. For example, asking students to reflect (including privately) on activities they would not engage in if they had no privacy, or on the powers they are subject to under current data practices, increases buy-in and naturally enriches discussion.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution 4.0 International License.
Embedded EthiCS is a trademark of President and Fellows of Harvard College | Contact us