Human-Centered Data Science

An Introduction

About

Best practices for addressing the bias and inequality that may result from the automated collection, analysis, and distribution of large datasets.

Human-centered data science is a new interdisciplinary field that draws from human-computer interaction, social science, statistics, and computational techniques. This book, written by founders of the field, introduces best practices for addressing the bias and inequality that may result from the automated collection, analysis, and distribution of very large datasets. It offers a brief and accessible overview of many common statistical and algorithmic data science techniques, explains human-centered approaches to data science problems, and presents practical guidelines and real-world case studies to help readers apply these methods.
 
The authors explain how data scientists’ choices shape every stage of the data science workflow, and show how a human-centered approach can improve each stage by making the process more transparent, asking questions, and considering the social context of the data. They describe how tools from social science might be incorporated into data science practices, discuss different types of collaboration, and consider data storytelling through visualization. The book shows that data science practitioners can build rigorous and ethical algorithms and design projects that use cutting-edge computational tools and address social concerns.

Authors

Cecilia Aragon is Professor in the Department of Human Centered Design and Engineering at the University of Washington. Shion Guha is Assistant Professor in the Faculty of Information at the University of Toronto. Marina Kogan is Assistant Professor in the School of Computing at the University of Utah. Michael Muller is a Research staff member at IBM Research. Gina Neff is Director of the Minderoo Centre for Technology and Democracy at the University of Cambridge and Professor of Technology and Society at the Oxford Internet Institute and the Department of Sociology at the University of Oxford.

Table of Contents

Acknowledgments
1 Data Science to Human-Centered Data Science
2 The Data Science Cycle
3 Interrogating Data Science
4 Techniques and Tools for Data Science Models
5 Human-Centered Approaches to Data Science Problems
6 Human-Centered Data Science Methods
7 Collaborations across and beyond Data Science
8 Storytelling with Data
9 The Future of Human-Centered Data Science
Glossary
References
Index