What are microdata and how are they used in an RDC?
Microdata are observations at the level of households, businesses or individuals. In an RDC, they are protected in a way that preserves individual confidentiality. Their value is in allowing the robust analysis of group differences. For example, rather than relying on the average education, income, or health status of individuals in a particular region, microdata allow the study of important and policy-relevant subgroup differences such as: men/women, immigrants/native-born, young/old or those with less or more education.
The RDC (or Research Data Centre) is a secure facility where researchers are able to access detailed microdata. There are RDC locations at universities across Canada and an RDC for federal government employees in Ottawa.
The microdata used by CRDCN researchers come primarily from Statistics Canada Survey Master files, which contain detailed individual-level data. Increasingly, the RDCs are also repositories of administrative records from a variety of sources, including tax, employment insurance, social assistance, and hospitalization records.
Microdata confidentiality in the RDCs
Individual data are protected in three important ways.
- All names and ID numbers are removed from the data before they are placed in the RDC.
- RDC researchers are only permitted to access data on secure systems that have no internet access.
- Researchers are first trained on confidentiality and privacy issues. Specially trained Statistics Canada personnel then use “disclosure analysis” to ensure that no results leave the RDC that could identify an individual, household or business.
Statistics Canada has created a variety of Public Use Microdata Files (or PUMFs) based on the RDC masterfiles. Even so, there are many reasons why a researcher might require access to an RDC masterfile:
- Some masterfiles do not have an available PUMF, so the information can only be obtained from the masterfile.
- Masterfiles commonly have additional detail not available in PUMFs. For example, PUMFs often categorize age in broad ranges, while masterfiles provide age in years. For many analyses it is helpful to separate people into finer age groups: those aged 16 can be very different from those aged 24.
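The age example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are invented toy values (not Statistics Canada data), and the band labels and the helper names `mean_by_group`, `broad_band` and `fine_band` are hypothetical. It shows how a single broad age range can hide a difference that becomes visible once age in years is available.

```python
# Hypothetical illustration: toy records, NOT real Statistics Canada data.
from collections import defaultdict

# (age in years, hours worked per week); values invented for illustration only
records = [(16, 8), (17, 10), (19, 20), (22, 35), (24, 40), (24, 38)]


def mean_by_group(rows, group_of):
    """Average the hours field within each group returned by group_of(age)."""
    totals = defaultdict(lambda: [0, 0])  # group label -> [sum, count]
    for age, hours in rows:
        bucket = totals[group_of(age)]
        bucket[0] += hours
        bucket[1] += 1
    return {label: total / count for label, (total, count) in totals.items()}


def broad_band(age):
    # A PUMF-style broad range (assumed band; every toy record falls in it)
    return "15-24"


def fine_band(age):
    # Finer bands, possible only when age in years is available
    return "15-19" if age <= 19 else "20-24"


broad = mean_by_group(records, broad_band)
fine = mean_by_group(records, fine_band)

print("Broad band:", broad)
print("Fine bands:", fine)
```

With the broad band, all six toy records collapse into one average. With the finer bands, the younger and older groups get separate averages, which is the kind of subgroup comparison that needs the extra detail.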