Principal component approximation and interpretation in health survey and biobank data

Authors: Yi-Sheng Chao, Hsing-Chien Wu, Chao-Jung Wu, and Wei-Chih Chen

Overview

Abstract (English)

Background: Surveys and administrative databases contain increasing numbers of variables. Principal component analysis (PCA) is an important tool for summarizing data and reducing dimensionality. However, one disadvantage of PCA is that the principal components (PCs) are difficult to interpret, especially in high-dimensional databases. By analyzing the variance distribution according to PCA loadings and approximating PCs with input variables, we aim to demonstrate the importance of variables based on the proportions of total variance contributed or explained by input variables.
Methods: Five data sets of various sizes were used to assess the performance of PC approximation: Hitters; the SF-12v2 subset of the 2004-2011 Medical Expenditure Panel Survey (MEPS); the full set of 1996-2011 MEPS data; and two data sets derived from the Canadian Health Measures Survey (CHMS), namely a spirometry subset with the measures from the first spirometry trial and a full data set containing non-redundant variables. The variables in each data set were centered and scaled before PCA. PCs were approximated through two approaches. First, the PC loadings were squared to estimate the variance contributed by each variable to the PCs. Second, forward-stepwise regression was used to approximate PCs with all input variables.
Results: The first few PCs had large variances in each data set. Approximating PCs using stepwise regression identified the input variables that explain large portions of PC variances more efficiently than approximation according to PCA loadings. Stepwise regression required fewer variables to explain more than 80% of the PC variances.
Conclusion: Approximating and interpreting PCs with stepwise regression is highly feasible. PC approximation is useful to (1) interpret PCs with input variables, (2) understand the major sources of variance in data sets, (3) select unique sources of information, and (4) search and rank input variables according to the proportions of PC variance explained. This approach can be used to systematically understand databases and to search for the variables that are most important to them.
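
The two approximation approaches described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the synthetic data, the function names, and the use of R-squared gain as the forward-selection criterion are all assumptions made for illustration, and the published analysis may have used a different criterion or software.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(Z):
    """Return PC variances, loadings (one column per PC), and scores."""
    # Singular value decomposition of the standardized data.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = S**2 / (Z.shape[0] - 1)
    loadings = Vt.T
    return variances, loadings, Z @ loadings

def count_by_squared_loadings(loading, threshold=0.8):
    """Variables needed, ranked by squared loading, to reach the threshold.

    Squared loadings of one PC sum to 1, so they can be read as shares.
    """
    shares = np.sort(loading**2)[::-1]
    return int(np.searchsorted(np.cumsum(shares), threshold) + 1)

def r_squared(Z_subset, y):
    """R-squared of a least-squares fit of y on Z_subset plus an intercept."""
    A = np.column_stack([np.ones(len(y)), Z_subset])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def forward_stepwise(Z, y, threshold=0.8):
    """Greedy forward selection (assumed criterion: largest R-squared gain)."""
    selected, r2 = [], 0.0
    remaining = list(range(Z.shape[1]))
    while remaining and r2 < threshold:
        r2, best = max((r_squared(Z[:, selected + [j]], y), j) for j in remaining)
        selected.append(best)
        remaining.remove(best)
    return selected, r2

# Hypothetical synthetic data: six variables driven by two latent factors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(200, 6))
Z = standardize(X)
variances, loadings, scores = pca(Z)

selected, r2 = forward_stepwise(Z, scores[:, 0])
print("Variables needed by squared loadings:", count_by_squared_loadings(loadings[:, 0]))
print("Variables selected by stepwise regression:", selected, "R-squared:", round(float(r2), 3))
```

Comparing the two counts for each PC mirrors the comparison reported in the Results: the number of input variables each approach needs before more than 80% of a PC's variance is accounted for.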

Details

Type	Journal article
Author	Yi-Sheng Chao, Hsing-Chien Wu, Chao-Jung Wu, and Wei-Chih Chen
Publication Year	2018
Title	Principal component approximation and interpretation in health survey and biobank data
Volume	5
Journal Name	Frontiers in Digital Humanities
Pages	1-11
Publication Language	English

Related Publications

Yi-Sheng Chao, Chao-Jung Wu, Hsing-Chien Wu, and Wei-Chih Chen (2019).

Drug trends among non-institutionalized Canadians and the impact of data collection changes in the Canadian Health Measures Survey 2007 to 2015

PLoS ONE, 1-17

Yi-Sheng Chao, Chao-Jung Wu, Hsing-Chien Wu, and Wei-Chih Chen (2018).

Stages of biological development across Age: An analysis of Canadian Health Measure Survey 2007-2011

Frontiers in Public Health, 1-9

Yi-Sheng Chao, Chao-Jung Wu, Hsing-Chien Wu, Hui-Ting Hsu, Lien-Cheng Tsao, Yen-Po Cheng, Yi-Chun Lai, and Wei-Chih Chen (2020).

Opportunities and challenges from leading trends in a biomonitoring project: Canadian Health Measures Survey 2007–2017

Frontiers in Public Health, 1-9

Yi-Sheng Chao, Chao-Jung Wu, Hsing-Chien Wu, and Wei-Chih Chen (2018).

Trend analysis for national surveys: Application to all variables from the Canadian Health Measures Survey cycle 1 to 4

PLoS ONE, 1-15

Youssef Oulhote, Jonathan Chevrier, and Maryse F. Bouchard (2015).

Exposure to polybrominated diphenyl ethers (PBDEs) and hypothyroidism in Canadian women

Journal of Clinical Endocrinology and Metabolism, 590-598

Scott B. Patten and Jeanne V. A. Williams (2020).

Lithium, an infrequently used medication

Canadian Journal of Psychiatry, 204-205

P. J. Allison, T. Bailey, L. Beattie, S. Birch, L. Dempster, N. Edwards, B. Graham, J. Gray, D. Legault, N. E. MacDonald, M. McNally, R. Palmer, C. Quinonez, V. Ravaghi, and J. Steele (2014).

Improving access to oral health care for vulnerable people living in Canada

V. O. Onywera, M. Héroux, E. Jáuregui Ulloa, K. B. Adamo, J. López Taylor, I. Janssen, and Mark S. Tremblay (2013).

Adiposity and physical activity among children in countries at different stages of the physical activity transition: Canada, Mexico and Kenya

African Journal for Physical, Health Education, Recreation and Dance, 134-144

Data Used

Canadian Health Measures Survey

Cross-Sectional, Repeated (2007 to 2019)

CHMS

Research Data Centre(s)

QICSS (McGill-Concordia)

MCG

Canadian Research Data Centre Network
