NIST Unlinkable Data Challenge

Designed to advance data privacy and public safety data, the Unlinkable Data Challenge is focused on proactively protecting individual privacy while allowing for data to be used by researchers for positive purposes and outcomes.  It is well known that privacy in data release is an important area for the Federal Government (which has an Open Data Policy), state governments, the public safety sector and many commercial non-governmental organizations. Developments coming out of this competition would drive major advances in the practical applications of differential privacy for these organizations.

UnlinkableDataChallenge - Opportunity to Venture Smarter

The purpose of this series of competitions is to provide a platform for researchers to develop more advanced differentially private methods that can substantially improve the privacy protection and utility of the resulting datasets.

Important Dates:

Submission deadline             July 26, 2018 - 5pm ET

People’s Choice Voting         August 14 – August 28, 2018

Winners Announced              September 12, 2018


The Unlinkable Data Challenge is a multi-stage Challenge.  This first stage of the Challenge is intended to source detailed concepts for new approaches, inform the final design in the two subsequent stages, and provide recommendations for matching stage 1 competitors into teams for subsequent stages.  Teams will predict and justify where their algorithm fails with respect to the utility-privacy frontier curve.

In this stage, competitors are asked to propose how to de-identify a dataset using less than the available privacy budget, while also maintaining the dataset’s utility for analysis.  For example, the de-identified data, when put through the same analysis pipeline as the original dataset, produces comparable results (i.e. similar coefficients in a linear regression model, or a classifier that produces similar predictions on sub-samples of the data).

This stage of the Challenge seeks Conceptual Solutions that describe how to use and/or combine methods in differential privacy to mitigate privacy loss when publicly releasing datasets in a variety of industries such as public safety, law enforcement, healthcare/biomedical research, education, and finance.  We are limiting the scope to addressing research questions and methodologies that require regression, classification, and clustering analysis on datasets that contain numerical, geo-spatial, and categorical data.

To compete in this stage, we are asking that you propose a new algorithm utilizing existing or new randomized mechanisms with a justification of how this will optimize privacy and utility across different analysis types.  We are also asking you to propose a dataset that you believe would make a good use case for your proposed algorithm, and provide a means of comparing your algorithm and other algorithms.