The Dataset Nutrition Label
Driving healthy data use through increased transparency

A Nutrition Label for Datasets
Similar to food nutrition labels, Dataset Nutrition Labels provide transparency into the contents of a dataset to drive “healthier” use.
The Dataset Nutrition Label is a free, public-facing, voluntarily disclosed dataset standard. We believe that increased visibility into dataset provenance, quality, and intended use drives better data practices and helps mitigate bias in AI systems.

How It Works
3 easy steps to get started
You’ll need a few things to get started: a user account, information about a dataset, and team member emails to enable collaboration.

1. Create your first label
Follow this link to set up your account and create your first Label.
As you follow prompts in the Label Maker, you can always save your progress and log back in to continue.

2. Add collaborators
We have found that teams often build Labels together.
You can specify collaborators and share Label drafts with them; they will be able to edit the Label you have created.

3. Click “Submit” when you’re done
Submit your completed Label for review.
When you have finished the Label Maker process, you can submit the Label to the Data Nutrition Project team for review. While your Label is under review, you will receive a watermarked “Draft” of your Label.

The Story
A “nutrition label” for datasets.
The Data Nutrition Project aims to create a standard label for interrogating datasets.
Our belief is that transparency into dataset health can lead to better decisions, which will in turn lead to better AI.
Founded in 2018 through the Harvard-MIT Assembly Fellowship, the Data Nutrition Project takes inspiration from nutritional labels on food, aiming to build labels that highlight the key ingredients in a dataset, such as metadata and demographic representation, as well as unique or anomalous features regarding distributions, missing data, and comparisons to other “ground truth” datasets.
Now in its third-generation design, the current Dataset Nutrition Label provides information about a dataset, including its intended use and other known uses; the process of cleaning, managing, and curating the data; ethical and/or technical reviews; the inclusion of subpopulations in the dataset; and a series of potential risks or limitations in the dataset.
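To make the categories listed above concrete, here is a minimal Python sketch of how a team might gather that information before opening the Label Maker. It is not the Data Nutrition Project's schema or API: the class name `LabelSketch`, its field names, and the helper `missing_sections` are all hypothetical and exist only to mirror the categories named in the paragraph above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class LabelSketch:
    """Hypothetical container for the categories a Label covers.

    Field names are illustrative assumptions, not an official format.
    """
    dataset_name: str
    intended_use: str = ""
    other_known_uses: list[str] = field(default_factory=list)
    curation_process: str = ""  # cleaning, managing, and curating the data
    reviews: list[str] = field(default_factory=list)  # ethical and/or technical
    subpopulations: list[str] = field(default_factory=list)
    risks_and_limitations: list[str] = field(default_factory=list)


def missing_sections(label: LabelSketch) -> list[str]:
    """Return the names of sections that are still empty."""
    return [f.name for f in fields(label) if not getattr(label, f.name)]


# Placeholder values only; they do not describe any real dataset.
draft = LabelSketch(
    dataset_name="example-dataset",
    intended_use="Benchmarking a text classifier",
    risks_and_limitations=["Representation of subgroups not yet assessed"],
)
print(missing_sections(draft))
```

A checklist like this can help collaborators see which parts of a draft still need input before the Label is submitted for review.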

Third Generation Dataset Nutrition Label (2022)
Take a Closer Look
See examples of Dataset Nutrition Labels
Measuring Massive Multitask Language Understanding
This is a massive multitask test consisting of multiple-choice questions from various branches of knowledge. The test spans subjects in the humanities…
ASL Citizen: A Community-Sourced Dataset for Advancing Isolated Sign Language Recognition
ASL Citizen is the first crowdsourced Isolated Sign Language Recognition (ISLR) dataset, collected with consent and containing 83,399 videos…
2024 ISIC Challenge SLICE-3D Dataset
The 2024 ISIC Challenge SLICE-3D (“Skin Lesion Image Crops Extracted from 3D Total Body Photography”) dataset was created for the 2024 ISIC…
Replication Data for: Capturing Bonding, Bridging, and Linking Social Capital through Publicly…
A growing body of research has illuminated the powerful role played by social capital in influencing disaster and resilience outcomes. Popular…
Don’t know where to begin?
Check out the tutorial above to see a step-by-step explanation of how to start your Label using our web-based Label Maker tool. If you have more questions, feel free to reach out!