A method for managing re-identification risk from small geographic areas in Canada

Authors: Khaled El Emam, Ann Brown, Philip AbdelMalik, Angelica Neisa, Mark Walker, Jim Bottomley, and Tyson Roffey

Overview

Abstract (English)

Background: A common disclosure control practice for health datasets is to identify small geographic areas and either suppress records from these small areas or aggregate them into larger ones. A recent study provided a method for deciding when an area is too small based on the uniqueness criterion. The uniqueness criterion stipulates that an area is no longer too small when the proportion of unique individuals on the relevant variables (the quasi-identifiers) approaches zero. However, using a uniqueness value of zero is quite a stringent threshold, and is only suitable when the risks from data disclosure are quite high. Other uniqueness thresholds that have been proposed for health data are 5% and 20%.

Methods: We estimated uniqueness for urban Forward Sortation Areas (FSAs) by using the 2001 long form Canadian census data representing 20% of the population. We then constructed two logistic regression models to predict when the uniqueness is greater than the 5% and 20% thresholds, and validated their predictive accuracy using 10-fold cross-validation. Predictor variables included the population size of the FSA and the maximum number of possible values on the quasi-identifiers (the number of equivalence classes).

Results: All model parameters were significant and the models had very high prediction accuracy, with specificity above 0.9, and sensitivity at 0.87 and 0.74 for the 5% and 20% threshold models respectively. The application of the models was illustrated with an analysis of the Ontario newborn registry and an emergency department dataset. At the higher thresholds, considerably fewer records would be considered to be in small areas, and therefore undergo disclosure control actions, than at the 0% threshold. We have also included concrete guidance for data custodians in deciding which one of the three uniqueness thresholds to use (0%, 5%, 20%), depending on the mitigating controls that the data recipients have in place, the potential invasion of privacy if the data is disclosed, and the motives and capacity of the data recipient to re-identify the data.

Conclusion: The models we developed can be used to manage the re-identification risk from small geographic areas. Being able to choose among three possible thresholds, a data custodian can adjust the definition of “small geographic area” to the nature of the data and recipient.
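To make the uniqueness criterion concrete, the following is a minimal Python sketch, not the authors' method. It computes uniqueness directly on a toy table, whereas the paper predicts it with logistic regression models fitted to census data. The function names (`uniqueness`, `max_equivalence_classes`, `is_small_area`), the quasi-identifiers (`sex`, `age_group`) and the records are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import prod


def uniqueness(records, quasi_identifiers):
    """Proportion of records that are unique on the quasi-identifiers."""
    if not records:
        return 0.0
    # One key per record: its combination of quasi-identifier values.
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    n_unique = sum(1 for k in keys if counts[k] == 1)
    return n_unique / len(records)


def max_equivalence_classes(domain_sizes):
    """Maximum number of value combinations on the quasi-identifiers,
    i.e. the product of the number of possible values of each one."""
    return prod(domain_sizes)


def is_small_area(records, quasi_identifiers, threshold):
    """Flag an area as 'too small' when uniqueness exceeds the threshold
    (0.0, 0.05 or 0.20 for the three thresholds in the abstract)."""
    return uniqueness(records, quasi_identifiers) > threshold


# Hypothetical records for a single geographic area (illustration only).
area = [
    {"sex": "F", "age_group": "30-39"},
    {"sex": "F", "age_group": "30-39"},
    {"sex": "M", "age_group": "30-39"},
    {"sex": "M", "age_group": "40-49"},
    {"sex": "F", "age_group": "40-49"},
    {"sex": "F", "age_group": "40-49"},
]
qids = ["sex", "age_group"]

print(round(uniqueness(area, qids), 3))   # 2 of 6 records are unique: 0.333
print(max_equivalence_classes([2, 2]))    # 2 sexes x 2 age groups: 4
print(is_small_area(area, qids, 0.20))    # 0.333 > 0.20: True
```

In this toy area two of the six records are unique, so the area would be flagged as small under any of the three thresholds. Uniqueness measured on a sample or a single dataset is not the same as population uniqueness, which is why the paper estimates it from census data instead.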

Abstract (French)

Please note that abstracts only appear in the language of the publication and might not have a translation.

Details

Type	Journal article
Author	Khaled El Emam, Ann Brown, Philip AbdelMalik, Angelica Neisa, Mark Walker, Jim Bottomley, and Tyson Roffey
Publication Year	2010
Title	A method for managing re-identification risk from small geographic areas in Canada
Volume	10
Journal Name	BMC Medical Informatics and Decision Making
Number	18
Pages	1-13
Publication Language	English


Related Publications

Khaled El Emam, Ann Brown, and Philip AbdelMalik (2009).

Evaluating predictors of geographic area population size cut-offs to manage re-identification risk

Journal of the American Medical Informatics Association, 256-266

Elizabeth Dhuey, Christine Neill, and Jean Eid (2019).

Parental employment effects of switching from half day to full day kindergarten: Evidence from Ontario's French schools

IZA Discussion Paper Series Number

Nicole Fortin, Thomas Lemieux, and Javier Torres (2016).

Foreign human capital and the earnings gap between immigrants and Canadian-born workers

Labour Economics, 104-119

Brahim Boudarbat, Thomas Lemieux, and W. Craig Riddell (2010).

The evolution of the returns to human capital in Canada, 1980-2005

Canadian Public Policy, 63-89

Benoît Dostie (2018).

Polarisation du marché du travail, structure industrielle et croissance économique

Rapport de projet

Canadian Research Data Centre Network


Theodora Pouliou (2009).

Overweight and obesity in Canada: Understanding the individual and socio-environmental determinants

Sébastien Breau (2014).

The occupy movement and the geography of the top 1% in Canada

Patrick Sabourin and Alain Bélanger (2015).

La dynamique des substitutions linguistiques au Canada

Data Used

Canadian Population Census

Research Data Centre(s)

ORDC (Ottawa)