La régression expectile pour l’analyse des données longitudinales

Auteurs: Amadou Diogo Barry

Aperçu

Résumé (français)

La régression, au sens large, est l’une des méthodes d’inférence les plus utilisées en modélisation. La régression modélise la relation entre les régresseurs et la variable réponse. Cette modélisation se résume par l’estimation de l’influence des régresseurs sur la moyenne conditionnelle de la variable réponse. Alors que l’inférence sur la moyenne conditionnelle est généralement acceptable, il arrive que l’intérêt porte sur l’estimation des queues de la distribution de la variable réponse conditionnellement aux régresseurs. Dans ce contexte, la régression classique est inefficace et il faut aller au-delà de l’estimation de la moyenne conditionnelle. La littérature moderne offre des approches pour répondre à ce genre de problématique, notamment avec la régression asymétrique des moindres carrés pondérés. La régression asymétrique des moindres carrés pondérés ou régression expectile (RE) a récemment gagné en popularité, en partie grâce à ses propriétés statistiques et computationnelles attrayantes. La RE estime les fonctions expectiles/percentiles de la distribution de la variable réponse en fonction des régresseurs et de leur coefficient. Par conséquent, la RE permet d’examiner et d’analyser l’influence des régresseurs sur la distribution conditionnelle de la variable réponse, révélant ainsi une variété de formes d’hétérogénéité. De plus, la RE est très simple à mettre en oeuvre comparativement à son analogue, la régression quantile (RQ). Dans la présente thèse, nous introduisons la RE à l’analyse des données longitudinales. Nous étudions l’association de la RE au modèle GEE et au modèle linéaire avec effets-fixes (EF). Le modèle GEE et le modèle EF sont des modèles très réputés et communément utilisés en biostatistique et en économétrie. Les données longitudinales sont de loin les données observationnelles les plus appréciées. 
Les données longitudinales prennent en compte la dynamique, le développement et le changement de la population à l’étude et offrent une meilleure inférence des paramètres du modèle. Ensuite, nous présentons le plan de la thèse. Dans le chapitre préliminaire, Chapitre I, nous introduisons les statistiques asymétriques (quantile et expectile) et quelques-unes de leurs propriétés. Nous discutons leurs similarités et complémentarités. Par la suite, nous introduisons les modèles de la régression quantile (RQ) et de la régression expectile (RE) associés au modèle linéaire simple. Après l’introduction des modèles RQ et RE, nous présentons les propriétés asymptotiques de leur estimateur. Nous terminons le chapitre par la présentation succincte du modèle GEE, du modèle EF et du modèle linéaire avec effets-aléatoires (EA), ainsi que les propriétés asymptotiques de leur estimateur. Dans le second chapitre (Chapitre II), nous introduisons une nouvelle classe d’estimateurs qui découle de l’association de la régression des moindres carrés asymétriques pondérés et des équations d’estimation généralisées (GEE). Cette nouvelle classe estime l’expectile de la variable réponse en fonction des régresseurs et inclut une structure de corrélation hypothétique dans les équations d’estimation pour modéliser la dépendance des données. De plus, les structures de corrélation couramment utilisées avec le modèle GEE se généralisent et s’appliquent naturellement dans les équations d’estimation de cette nouvelle classe d’estimateurs. Cette dernière permettra au modèle GEE de capturer l’hétérogénéité des effets des régresseurs et de tenir compte de l’hétérogénéité non observée. Nous avons montré les propriétés asymptotiques de ces nouveaux estimateurs et avons proposé un estimateur robuste de leur matrice de variance-covariance. Les résultats des simulations exhaustives ont démontré leurs qualités favorables dans différents scénarios et leurs avantages par rapport à d’autres méthodes similaires. 
Finalement, nous avons étudié l’effet d’un nouveau traitement sur la douleur du travail pendant l’accouchement pour illustrer la méthode. Le troisième chapitre (Chapitre III) introduit le modèle de la régression expectile avec effets-fixes (ERFE). Le modèle ERFE hérite de propriétés attrayantes pour l’analyse des données longitudinales. D’abord, comme extension du modèle EF, le modèle ERFE, dans sa spécification, tient compte de la corrélation entre les régresseurs du modèle et les caractéristiques individuelles non-observées, comme les facteurs génétiques et environnementaux. Ensuite, grâce à l’approche de la régression des moindres carrés asymétriques pondérés, le modèle ERFE permet l’estimation et l’analyse de l’influence des régresseurs sur la localisation, l’échelle et la forme de la distribution conditionnelle de la variable réponse. Cela dit, le modèle ERFE pose aussi le problème lié au modèle EF désigné par les termes « incidental parameter problem ». Nous montrons que l’estimateur ERFE est un « iterative within-transformation estimator ». Autrement dit, l’estimateur ERFE peut être dérivé en utilisant de manière itérative la stratégie de la « within-transformation » proposée dans le cadre du modèle EF pour résoudre le problème et éliminer le paramètre individuel. Nous établissons les propriétés asymptotiques de l’estimateur ERFE et suggérons un estimateur convergent et hétéroscédastique pour sa matrice de variance-covariance. Nous avons évalué les performances de l’estimateur ERFE à travers une simulation exhaustive et l’avons comparé au modèle de la régression quantile avec effets-fixes (QRFE). Les résultats sont mitigés : le modèle ERFE est compétitif et plus performant dans certains scénarios. Nous l’avons employé pour étudier le rendement scolaire sur le salaire avec les données réelles sur l’étude de la dynamique des revenus (PSID). Le dernier chapitre (Chapitre IV) porte sur une approche originale pour résoudre le « incidental parameter problem » dans le modèle ERFE. Cette approche, que nous désignons par PERFE, consiste à appliquer une pénalité au paramètre individuel.
En plus de conserver les propriétés attrayantes du modèle ERFE, le modèle PERFE permet l’estimation des régresseurs invariants dans le temps. Nous avons appliqué la pénalité l1 afin de régulariser le paramètre individuel autour de la valeur zéro. Le degré de régularisation est contrôlé par le paramètre de régularisation et sa valeur optimale est choisie en s’appuyant sur le critère d’information bayésien (BIC). Nous appliquons également une astuce pour déterminer le chemin de la solution du paramètre de régularisation et réduire le temps de calcul. Les résultats de la simulation montrent que l’estimateur PERFE est plus performant que le modèle ERFE et le modèle QRFE avec pénalité (PQRFE). Nous appliquons le modèle PERFE aux données PSID pour étudier l’hétérogénéité du rendement scolaire.

Résumé (anglais)

Regression, in the broad sense, is one of the most widely used inference methods in modeling. It models the relationship between the regressors and the response variable, which usually amounts to estimating the influence of the regressors on the conditional mean of the response. While inference on the conditional mean is generally acceptable, interest sometimes lies in estimating the tails of the distribution of the response variable conditional on the regressors. In this context, classical regression is inadequate and one must go beyond estimating the conditional mean. The modern literature offers approaches to this kind of problem, notably asymmetric weighted least squares regression. Asymmetric weighted least squares regression, or expectile regression (ER), has recently gained popularity, in part because of its attractive statistical and computational properties. ER estimates the expectile/percentile functions of the distribution of the response variable as a function of the regressors and their coefficients. ER therefore makes it possible to examine and analyze the influence of the regressors on the conditional distribution of the response variable, revealing a variety of forms of heterogeneity. In addition, ER is very simple to implement compared with its analogue, quantile regression (QR). In this thesis, we introduce ER to the analysis of longitudinal data. We study the combination of ER with the GEE model and with the linear fixed-effects (FE) model. The GEE and FE models are well established and commonly used in biostatistics and econometrics. Longitudinal data are by far the most valued observational data: they take into account the dynamics, development and change of the study population and allow better inference on the model parameters. Next, we present the thesis outline.
In the preliminary chapter, Chapter I, we introduce asymmetric statistics (quantiles and expectiles) and some of their properties. We discuss their similarities and complementarities. We then introduce the quantile regression (QR) and expectile regression (ER) models associated with the simple linear model, and present the asymptotic properties of their estimators. We end the chapter with a brief presentation of the GEE model, the fixed-effects (FE) model and the linear random-effects model, as well as the asymptotic properties of their estimators. In the second chapter (Chapter II), we introduce a new class of estimators that arises from combining asymmetric weighted least squares regression with generalized estimating equations (GEE). This new class estimates the expectile of the response variable as a function of the regressors and includes a working (hypothesized) correlation structure in the estimating equations to model the dependence in the data. In addition, the correlation structures commonly used with the GEE model generalize and apply naturally in the estimating equations of this new class of estimators. The new class allows the GEE model to capture heterogeneity in the effects of the regressors and to account for unobserved heterogeneity. We establish the asymptotic properties of these new estimators and propose a robust estimator of their variance-covariance matrix. Extensive simulations demonstrate their favorable properties in different scenarios and their advantages over other similar methods. Finally, to illustrate the method, we study the effect of a new treatment on labor pain during childbirth. The third chapter (Chapter III) introduces the expectile regression with fixed effects (ERFE) model. The ERFE model inherits attractive properties for the analysis of longitudinal data.
First, as an extension of the fixed-effects (FE) model, the ERFE model, in its specification, accounts for the correlation between the regressors of the model and unobserved individual characteristics, such as genetic and environmental factors. Second, thanks to the asymmetric weighted least squares approach, the ERFE model allows the estimation and analysis of the influence of the regressors on the location, scale and shape of the conditional distribution of the response variable. That said, the ERFE model also faces the problem associated with the FE model known as the “incidental parameter problem”. We show that the ERFE estimator is an “iterative within-transformation estimator”. In other words, the ERFE estimator can be derived by iteratively applying the “within-transformation” strategy proposed for the FE model to solve this problem and eliminate the individual parameter. We establish the asymptotic properties of the ERFE estimator and suggest a consistent, heteroscedasticity-robust estimator of its variance-covariance matrix. We evaluated the performance of the ERFE estimator through an extensive simulation and compared it with the quantile regression with fixed effects (QRFE) model. The results are mixed: the ERFE model is competitive and performs better in certain scenarios. We used it to study the returns to schooling on wages with real data from the Panel Study of Income Dynamics (PSID). The last chapter (Chapter IV) presents an original approach to solving the “incidental parameter problem” in the ERFE model. This approach, which we refer to as PERFE, consists in applying a penalty to the individual parameter. In addition to retaining the attractive properties of the ERFE model, the PERFE model allows the effects of time-invariant regressors to be estimated. We applied the l1 penalty to shrink the individual parameter toward zero.
The degree of regularization is controlled by the regularization parameter, whose optimal value is chosen using the Bayesian information criterion (BIC). We also apply a computational shortcut to determine the solution path over the regularization parameter and reduce the computation time. The simulation results show that the PERFE estimator outperforms the ERFE model and the penalized QRFE model (PQRFE). We apply the PERFE model to the PSID data to study heterogeneity in the returns to schooling.
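
To make concrete the building block on which the abstract relies, the following is a minimal sketch of linear expectile regression for independent observations. It is not the thesis's GEE, ERFE or PERFE estimator. It assumes the standard asymmetric least squares loss, in which a squared residual is weighted by tau when the residual is non-negative and by 1 - tau otherwise, and it minimizes that loss by iteratively reweighted least squares. The function name `expectile_regression`, its parameters and the simulated data are illustrative assumptions, not taken from the thesis.

```python
import numpy as np


def expectile_regression(X, y, tau=0.5, max_iter=100, tol=1e-10):
    """Linear expectile regression by iteratively reweighted least squares.

    Hypothetical helper for illustration: minimizes
    sum_i w_i * (y_i - x_i' beta)^2, with w_i = tau if the residual is
    non-negative and w_i = 1 - tau otherwise.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from ordinary least squares, which is the tau = 0.5 solution.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        resid = y - X @ beta
        # Asymmetric weights determined by the sign of the current residuals.
        w = np.where(resid < 0, 1.0 - tau, tau)
        Xw = X * w[:, None]
        # Weighted least squares step: solve (X' W X) beta = X' W y.
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Simulated heteroscedastic data (assumed for illustration only):
# the spread of y grows with x, so the slope should differ across tau.
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0.0, 1.0, size=n)
y = 1.0 + 2.0 * x + (0.5 + x) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

for tau in (0.1, 0.5, 0.9):
    print(tau, expectile_regression(X, y, tau=tau))
```

Because each step is an ordinary weighted least squares fit, the sketch illustrates why expectile regression is described as simple to implement relative to quantile regression, whose loss is not differentiable at zero.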

Détails

Type	Thèse de doctorat
Auteur	Amadou Diogo Barry
Année de publication	2019
Titre	La régression expectile pour l’analyse des données longitudinales
Ville	Montréal, QC
Département	Département de mathématiques
Université	Université du Québec à Montréal (UQAM)
Langue de publication	Français

Publications connexes

Chantal Blouin, Nathalie Vandal, Amadou Diogo Barry, Yun Jen, Denis Hamel, Ernest Lo, et Sylvie Martel (2016).

The economic consequences associated with obesity and overweight in Québec: Costs tied to hospitalization and medical consultations

Chantal Blouin, Denis Hamel, Nathalie Vandal, Amadou Diogo Barry, Ernest Lo, Guy Lacroix, Johanne Laguë, Marie-France Langlois, Sylvie Martel, Pierre-Carl Michaud, et Louis Pérusse (2017).

The economic consequences of obesity and overweight among adults in Quebec

Canadian Journal of Public Health

Chantal Blouin, Nathalie Vandal, Amadou Diogo Barry, Yun Jen, Denis Hamel, Ernest Lo, et Sylvie Martel (2016).

Les conséquences économiques associées à l'obésité et à l'embonpoint au Québec : les coûts liés à l'hospitalisation et aux consultations médicales - Mise à jour 2016

Mila Kingsbury, Gabrielle Dupuis, Felice Jacka, Marie-Hélène Roy-Gagnon, Seanna E. McMartin, et Ian Colman (2016).

Associations between fruit and vegetable consumption and depressive symptoms: Evidence from a national Canadian longitudinal survey

Journal of Epidemiology and Community Health, 155-161

Mohammad Hajizadeh, Arnold Mitnitski, et Kenneth Rockwood (2016).

Socioeconomic gradient in health in Canada: Is the gap widening or narrowing?

Health Policy, 1040-1050

(2009).

Changes in perceived job strain and the risk of major depression: results from a population-based longitudinal study

American Journal of Epidemiology, 1085-1091

D. C. Cole, S. Ibrahim, et H. S. Shannon (2005).

Predictors of work-related repetitive strain injuries in a population cohort

American Journal of Public Health, 1233-1237

S. B. Patten, J. V. Williams, D. Lavorato, S. Khaled, et A. G. Bulloch (2011).

Weight gain in relation to major depression and antidepressant medication use

Journal of Affective Disorders, 288-293

Données utilisées

Enquête nationale sur la santé de la population - Volet ménages - longitudinal

Enquêtes longitudinales, Enquête (1994 à 2011)

ENSP

Centre(s) de données de recherche

CIQSS (Montréal)

MTL

Réseau canadien des Centres de données de recherche
