Odile Stalder
UniNE, Institut de statistique
29.10.2015
Computationally Efficient Estimation of Gene-Environment Interactions when the Genetic Component is Multivariate and Complex
Master's thesis, Master of Science in Statistics, Research Center for Statistics and Geneva School of Economics and Management, University of Geneva, August 2015
Thesis adviser: Raymond J. Carroll, Department of Statistics, Texas A&M University
Abstract
In case-control studies of gene-environment interactions that influence disease outcomes, there is an extensive literature on efficient estimation of these interactions when assumptions can be made about the relationship between the genetic and environmental variables in the source population, e.g., that they are independent within strata. All of these papers are based on distributional assumptions about the genetic variables given the environmental variables. The approach used in this literature is semiparametric, with no assumptions made about the distribution of the environmental variables. We show that it is not the distribution of the genetic variables that is important, but only an expectation of a simple function of the genetic variables given the environmental variables. We further show that, in important instances, this expectation can be estimated nonparametrically, thus dispensing with the need to model the distribution of the genetic variables. We develop methodology to exploit this insight, which can be used when the genetic variables are continuous, such as a polygenic risk score, or indeed of any form, e.g., a mixture of continuous and discrete variables. The methods are applied to a study of the interaction of a polygenic risk score and age at menarche on the risk of breast cancer.
Some Key Words: Case-control studies; Gene-environment interactions; Genetic epidemiology; Pseudolikelihood; Retrospective studies; Semiparametric methods.
Short title: Complex Gene-Environment Interactions