The Data Nutrition Project

Empowering data scientists and policymakers with practical tools to improve AI outcomes

Our Mission

We believe that technology should help us move forward without mirroring existing systemic injustice.

The Data Nutrition Project team:

  1. Creates tools and practices that encourage responsible AI development
  2. Partners across disciplines to drive broader change
  3. Builds inclusion and equity into our work

Want to get involved? Contact Us!

The Problem

Garbage in, Garbage out

Incomplete, misunderstood, and historically problematic data can negatively influence AI algorithms.

Algorithms matter, and so does the data they’re trained on. To improve the accuracy and fairness of algorithms that determine everything from navigation directions to mortgage approvals, we need to make it easier for practitioners to quickly assess the viability and fitness of datasets they intend to train AI algorithms on.

There’s a missing step in the AI development pipeline: assessing datasets against standard quality measures that are both qualitative and quantitative. We are packaging these measures into an easy-to-use Dataset Nutrition Label.

Technical Solution

Standard interactive reports

A "nutrition label" for datasets.

The Data Nutrition Project aims to create a standard label for interrogating datasets for measures that will ultimately drive the creation of better, more inclusive algorithms.

Our current prototype includes a highly generalizable, interactive data diagnostic label that allows for exploring any number of domain-specific aspects of a dataset. Similar to a nutrition label on food, our Dataset Nutrition Label aims to highlight the key ingredients in a dataset, such as metadata and populations, as well as unique or anomalous features regarding distributions, missing data, and comparisons to other ‘ground truth’ datasets. We are currently testing our label on several datasets, with an eye towards open sourcing this effort and gathering community feedback.
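As a rough illustration of the kinds of "ingredients" described above, the following Python sketch computes basic metadata, missing-value counts, and value distributions for a small list of records. It is not the project's prototype code: the function name `summarize_dataset`, the toy records, and the shape of the resulting dictionary are all assumptions made for this example.

```python
from collections import Counter


def summarize_dataset(rows, name="example"):
    """Build a small, label-like summary from a list of dict records.

    Hypothetical helper for illustration only; it assumes every value
    is a simple hashable scalar (string, number, or None).
    """
    # Collect every column name that appears in any record.
    columns = sorted({key for row in rows for key in row})
    label = {
        "metadata": {"name": name, "rows": len(rows), "columns": columns},
        "missing": {},
        "distributions": {},
    }
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat None and empty strings as missing data.
        present = [v for v in values if v is not None and v != ""]
        label["missing"][col] = len(values) - len(present)
        label["distributions"][col] = dict(Counter(present))
    return label


# Toy records invented for this example.
rows = [
    {"specialty": "cardiology", "state": "MA"},
    {"specialty": "cardiology", "state": None},
    {"specialty": "oncology", "state": "NY"},
]
label = summarize_dataset(rows, name="toy-payments")
print(label["missing"])
print(label["distributions"]["specialty"])
```

A real label would cover far more than this, including qualitative information and comparisons against reference datasets, but the sketch shows how a few quantitative facts can be gathered in one pass.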

The design uses a ‘modular’ framework in which areas of investigation can be added or removed based on the domain of the dataset. For example, Dataset Nutrition Labels for data about people may include modules about the representation of race and gender, while Dataset Nutrition Labels for data about trees may not require those modules.
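One way to picture such a modular design is a registry that maps each dataset domain to the modules that apply to it. The Python sketch below is a minimal illustration under that assumption; the module functions, the `MODULES_BY_DOMAIN` mapping, and `build_label` are hypothetical names and do not describe how the prototype is actually built.

```python
def overview_module(rows):
    """Module that applies to every dataset: basic size information."""
    return {"rows": len(rows)}


def representation_module(rows, field="gender"):
    """Module for data about people: counts per value of one field."""
    counts = {}
    for row in rows:
        value = row.get(field, "unknown")
        counts[value] = counts.get(value, 0) + 1
    return counts


# Hypothetical registry: which modules run for which dataset domain.
MODULES_BY_DOMAIN = {
    "people": [overview_module, representation_module],
    "trees": [overview_module],
}


def build_label(rows, domain):
    """Run only the modules registered for the given domain."""
    modules = MODULES_BY_DOMAIN.get(domain, [overview_module])
    return {module.__name__: module(rows) for module in modules}


# Toy records invented for this example.
people = [{"gender": "F"}, {"gender": "M"}, {"gender": "F"}]
trees = [{"species": "oak"}, {"species": "elm"}]

people_label = build_label(people, "people")
trees_label = build_label(trees, "trees")
print(people_label)
print(trees_label)
```

Here the label for the tree data simply omits the representation module, mirroring the example in the text.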

To learn more, check out our live prototype built on the Dollars for Docs dataset from ProPublica. A first draft of our paper can be found here.

Community Solution

Workshops and Conversations
We believe that building artificial intelligence is as much about learning as it is about technical implementation. Through our workshop series, the Data Nutrition Project brings a curriculum of awareness to organizations of all sizes and types, from small technical teams to larger, non-technical communities.

Demystifying AI Workshop

Our first workshop in the series is a brief, non-technical overview of how Artificial Intelligence (AI) algorithms work. Participants take part in an experiential activity in which they get to “be the algorithm”, and afterwards reflect on how bias is perpetuated in the stages of algorithm development they experienced. We also tie this experience into current industry themes and examples and discuss the complexities of building tools that mitigate the issue.

This workshop is great for community groups looking to better understand how AI works, and how it is used in tools that we all use on a daily basis. It's also helpful for tech professionals who do not code, such as designers, project managers, etc. Contact Us to find out more!
Photo Credit: Jess Benjamin

Our Team

We are a group of researchers and technologists working together to tackle the challenges of ethics and governance of Artificial Intelligence as part of the Assembly program at the Berkman Klein Center at Harvard University and the MIT Media Lab.

Please note: This project is the work of individuals who participated in the Assembly program. If named, participants' employers are provided for identification purposes only.

Kasia Chmielinski

Project Lead
Technologist at McKinsey working to drive impact in the healthcare industry through advanced analytics. Previously at the US Digital Service (The White House) and the Scratch project at the MIT Media Lab. Ex-Googler, native Bostonian. Dabbled in architecture at the Chinese University of Hong Kong before graduating with a degree in physics from Harvard University. Avid bird-watcher.

Sarah Newman

Research & Strategy
Senior Researcher at metaLAB at Harvard, Fellow at the Berkman Klein Center for Internet & Society, AI Grant Fellow. Studies new technologies and their effects on people. Creates interactive art installations that explore social and cultural dimensions of new tech, runs research workshops with creative materials. Leads metaLAB's work on AI + Art. Persuaded by the power of metaphors.

Josh Joseph

AI Research
Chief Intelligence Architect for MIT's Quest for Intelligence. Previously, Chief Science Officer at Alpha Features, an alternative data distribution platform, and co-founder of a proprietary trading company based on machine-learning-driven strategy discovery and fully autonomous trading. Has done a variety of consulting work across finance, life sciences, and robotics. Aero/Astro PhD on modeling and planning in the presence of complex dynamics from MIT. BS in Applied Mathematics and Mechanical Engineering from RIT. Spends too much time arguing about consciousness. Terrible improviser.

Matt Taylor

Data Science & Workshop Facilitation
Freelance learning experience designer and facilitator, with a background in AI implementation. Previously worked as an engineer in natural language processing, moderation tool development, and creative coding platform development. Currently creating learning experiences in STEAM for young people, and demystifying AI for all people. Seasoned pun specialist.

Chelsea Qiu

Research Collaborator
Researcher at metaLAB at Harvard, architect in training. Previous research focused on the co-inhabitation of humans and machines. Work explores the intersection of spaces, technology, and senses through physical and digital means. Teaches the integrated process of design and fabrication. M.Arch from MIT. Fascinated by the human brain and enjoys puzzles of all kinds.

Collaborating Organizations


Humanity Innovation Labs

User Experience Research & Design Collaborator
HIL is an agile consultancy that offers exploratory research and design services for ingenious proofs of concept in wearables, such as digital experiences and physical devices. We work in the ambiguous space of emerging technologies and use qualitative and quantitative methods to drive design. The sectors we work within are health and fitness, medical, and industrial applications.



Sarah Holland

Research & Public Policy

Ahmed Hosny

Data Science
Photo Credit: Jess Benjamin

Frequently Asked Questions

A few questions you might have

Q. Do you have a prototype or more information?

Yes, we do! You can take a look at a live prototype of the Dataset Nutrition Label for the Dollars for Docs dataset that our friends at ProPublica have made available to our group. We are also currently working on a paper describing our work, the prototype, and future directions.

Q. What inspired this project?

We believe that algorithm developers want to build responsible and smart AI models, but that there is a key step missing in the standard way these models are built. This step is to interrogate the dataset for a variety of imbalances or problems it could have and ascertain if it is the right dataset for the model. We are inspired by the FDA's Nutrition Facts label in that it provides basic yet powerful facts that highlight issues in an accessible way. We aspire to do the same for datasets.

Q. Whom have you been speaking with?

We have been speaking with researchers in academia, practitioners at large technology companies, individual data scientists, organizations, and government institutions that host or open datasets to the public. If you’re interested in getting involved, please contact us.

Q. Is your work open source?

Yes. You can view our live prototype here, and the code behind the prototype on GitHub.

Q. Who is the intended beneficiary of this work?

Our primary audience for the Dataset Nutrition Label is the data science and developer community building AI models. However, we believe that a larger conversation must take place in order to shift the industry. Thus, we are also engaging with educators, policymakers, and researchers on the best ways to amplify and highlight the potential of the Dataset Nutrition Label and the importance of data interrogation before model creation. If you’re interested in getting involved, please contact us.

Q. How will this project scale?

We believe that the Data Nutrition Project addresses a broad need in the model development ecosystem, and that the project will scale to address that need. Feedback on our prototype and opportunities to build additional prototypes on more datasets will certainly help us make strides.

Q. Is this a Harvard/MIT project?

This is a project of Assembly, a program run by the MIT Media Lab and the Berkman Klein Center.

Supported By:


The Data Nutrition Project is a cross-industry collective. We are happy to welcome more into the fold, whether you are a policymaker, scientist, engineer, designer, or just a curious member of the public. We’d love to hear from you.