Analysis of correlated data with measurement error in responses or covariates

Authors: Zhijian Chen

Overview

Abstract (English)

Correlated data frequently arise in epidemiological studies, especially familial and longitudinal studies. Longitudinal designs are used to investigate how certain characteristics change over time at the individual level and how potential factors influence those changes. Familial studies are often designed to investigate the dependence of health conditions among family members. Various models have been developed for this type of multivariate data, and a wide variety of estimation techniques have been proposed. However, data collected from observational studies are often far from perfect, as measurement error may arise from different sources such as defective measuring systems, diagnostic tests without gold references, and self-reports. Under such scenarios only rough surrogate variables are measured. Measurement error in covariates has been discussed extensively in the literature for various regression models, and it is well known that naive approaches ignoring covariate error often lead to inconsistent estimators of model parameters. In this thesis, we develop inferential procedures for analyzing correlated data with response measurement error. We consider three scenarios: (i) likelihood-based inference for generalized linear mixed models when the continuous response is subject to nonlinear measurement error; (ii) estimating equations methods for binary responses subject to misclassification; and (iii) estimating equations methods for ordinal responses when both the response variable and categorical/ordinal covariates are subject to misclassification.
The first problem arises when the continuous response variable is difficult to measure. When the true response is defined as the long-term average of measurements, a single measurement is considered an error-contaminated surrogate. We focus on generalized linear mixed models with nonlinear response error and study the bias induced in naive estimates. We propose likelihood-based methods that yield consistent and efficient estimators for both fixed-effects and variance parameters. Results of simulation studies and an analysis of a data set from the Framingham Heart Study are presented.
Marginal models have been widely used for correlated binary, categorical, and ordinal data. The regression parameters characterize the marginal mean of a single outcome, without conditioning on other outcomes or unobserved random effects. The generalized estimating equations (GEE) approach, introduced by Liang and Zeger (1986), models only the first two moments of the responses, with associations treated as nuisance characteristics. For some clustered studies, especially familial studies, however, the association structure may be of scientific interest. For binary data, Prentice (1988) proposed additional estimating equations that allow one to model pairwise correlations. We consider marginal models for correlated binary data with misclassified responses. We develop “corrected” estimating equations approaches that yield consistent estimators for both mean and association parameters. The idea is related to the method of Nakamura (1990), which was originally developed to correct the bias induced by additive covariate measurement error under generalized linear models. Our approaches can also handle correlated misclassification, rather than only the simple misclassification process considered by Neuhaus (2002) for clustered binary data under generalized linear mixed models. We extend our methods and further develop marginal approaches for the analysis of longitudinal ordinal data with misclassification in both responses and categorical covariates. Simulation studies show that our proposed methods perform very well under a variety of scenarios. Results from applying the proposed methods to real data are presented.
Measurement error can be coupled with many other features of the data, e.g., complex survey designs, that complicate inferential procedures. We explore combining survey weights with misclassification in ordinal covariates in logistic regression analyses. We propose an approach that incorporates survey weights into estimating equations to yield design-based unbiased estimators. Missing data are another common feature in applications, and developing statistical techniques that deal with both missing data and measurement error can be beneficial. In the final part of the thesis we outline some directions for future work, such as transition models and semiparametric models for longitudinal data with both incomplete observations and measurement error.

Details

Type	PhD dissertation
Author	Zhijian Chen
Publication Year	2010
Title	Analysis of correlated data with measurement error in responses or covariates
City	Waterloo, ON
Department	Statistics and Actuarial Science
University	University of Waterloo
Publication Language	English

Related Publications

Zhijian Chen, Grace Y. Yi, and Changbao Wu (2014).

Marginal analysis of longitudinal ordinal data with misclassification in both response and covariates

Biometrical Journal, 69-85

Zhijian Chen, Grace Y. Yi, and Changbao Wu (2011).

Marginal methods for correlated binary data with misclassified responses

Biometrika, 647-662

Grace Y. Yi, Zhijian Chen, and Changbao Wu (2016).

Analysis of correlated data with error-prone response under generalized linear mixed models

Peter Smith, B. T. Smith, Cameron Mustard, Hong Lu, and Rick Glazier (2013).

Estimating the direct and indirect pathways between education and diabetes incidence among Canadian men and women: A mediation analysis

Annals of Epidemiology, 143-149

Tanner Cassidy, Amanda Fortin, Stephanie Kaczmer, Jessica T. L. Shumaker, Jessica Szeto, and Stéphanie J. Madill (2017).

Relationship between back pain and urinary incontinence in the Canadian population

Physical Therapy, 449-454

F. C. Breslin and E. M. Adlaf (2005).

Part-time work and adolescent heavy episodic drinking: The influence of family and community context

Journal of Studies on Alcohol, 784-794

Scott B. Patten, Jeanne V. A. Williams, Dina H. Lavorato, Kirsten M. Fiest, Andrew G. M. Bulloch, and JianLi Wang (2014).

Antidepressant use in Canada has stopped increasing

Canadian Journal of Psychiatry, 609-614

Mélissa Murray (2007).

Le rôle de la profession et du secteur économique sur le risque de dépression majeure

Data Used

Canadian Community Health Survey - Nutrition

Cross-Sectional, Repeated (2004 to 2015)

CCHS

Canadian Research Data Centre Network