Robust Methods for Polygenic Analysis

Published
September 3, 2020

Modern genome-wide association studies (GWAS) have unequivocally demonstrated that complex traits are extremely polygenic: a single trait may involve thousands to tens of thousands of genetic variants. In this project, we will develop a series of novel methods that harness the power of polygenic signals in large GWAS to inform disease etiology and improve models for risk prediction.

In Aim 1, we will develop methods for enrichment analysis of GWAS association signals in relation to various population-genetic and functional-genomic characteristics of the genome. We propose to model the distribution of effect sizes across a genome-wide panel of markers using flexible normal-mixture models, in which the class membership of each marker is modelled probabilistically as a function of genomic “covariates”. The fitted models and their parameters will then be used in an empirical-Bayes framework to derive polygenic risk scores (PRS) for genetic risk prediction.

In Aim 2, we will develop novel methods for Mendelian randomization analysis, a form of instrumental-variable analysis, to investigate causal relationships between risk factors and health outcomes. We will use flexible models for the bivariate distribution of effect sizes across pairs of traits, allowing genetic correlation to arise from both causal and non-causal relationships. To address the difficult problem of estimating causal effects under this framework, we propose an innovative method for “spike detection” in the distribution of certain types of residuals.

In Aim 3, we will develop novel methods that use PRS to enhance the power of gene-environment interaction analysis in case-control studies.
We will develop retrospective methods that exploit natural assumptions about the distribution of PRS in the underlying population, including normality and independence from environmental exposures (possibly conditional on other factors). We will apply the proposed methods in large-scale analyses of existing GWAS datasets for a wide variety of traits, and we expect to make novel scientific observations regarding mechanisms of genetic susceptibility, the causal basis of epidemiologic associations, the nature of gene-environment interactions and the utility of genetic risk prediction.
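The empirical-Bayes step of Aim 1 can be illustrated with a minimal sketch. It assumes the simplest possible mixture (a point mass at zero plus a single normal component, rather than the more flexible multi-component mixtures proposed), a logistic link between one genomic covariate and the probability that a marker is non-null, hyperparameters that are already known, and approximately independent markers (no linkage disequilibrium). All function names and parameter values (`eb_posterior_mean`, `gamma0`, `gamma1`, `tau2`, and so on) are hypothetical. Each marker's PRS weight is the posterior mean of its true effect given its GWAS estimate.

```python
import numpy as np


def normal_pdf(x, var):
    """Density of N(0, var) evaluated elementwise at x."""
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2.0 * np.pi * var)


def eb_posterior_mean(beta_hat, se, covariate, gamma0, gamma1, tau2):
    """Posterior mean of each marker's true effect under an illustrative
    (hypothetical) covariate-dependent spike-and-slab normal mixture:
        beta_j ~ pi_j * N(0, tau2) + (1 - pi_j) * (point mass at 0)
        logit(pi_j) = gamma0 + gamma1 * covariate_j
        beta_hat_j | beta_j ~ N(beta_j, se_j**2)
    Hyperparameters are taken as known; in practice they would be
    estimated from the GWAS summary statistics (e.g. by EM).
    """
    pi = 1.0 / (1.0 + np.exp(-(gamma0 + gamma1 * covariate)))
    s2 = se**2
    # Prior-weighted marginal likelihood of beta_hat under each class.
    lik_slab = pi * normal_pdf(beta_hat, tau2 + s2)
    lik_null = (1.0 - pi) * normal_pdf(beta_hat, s2)
    w = lik_slab / (lik_slab + lik_null)  # posterior P(marker is non-null)
    shrink = tau2 / (tau2 + s2)           # normal-normal shrinkage factor
    return w * shrink * beta_hat


def polygenic_risk_score(genotypes, weights):
    """PRS_i = sum_j G_ij * weight_j; ignores LD between markers."""
    return genotypes @ weights


# Toy example: three markers with equal standard errors; the binary
# covariate flags a hypothetical functional annotation.
beta_hat = np.array([0.0, 0.05, 0.20])
se = np.full(3, 0.02)
annot = np.array([0.0, 1.0, 1.0])
weights = eb_posterior_mean(beta_hat, se, annot, gamma0=-3.0, gamma1=2.0, tau2=0.01)
G = np.array([[0, 1, 2], [2, 0, 0]])  # genotype dosages for 2 individuals
prs = polygenic_risk_score(G, weights)
```

Markers with a favorable covariate value receive a higher prior probability of being non-null and are therefore shrunk less toward zero; in a model of this kind, that is how functional-genomic annotation feeds through to the risk score.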
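The abstract does not specify which residuals the Aim 2 spike-detection method examines. Purely as an illustration, the sketch below borrows the idea behind the established mode-based estimator for Mendelian randomization, which is not the proposed method: if a subset of variants are valid instruments, the residuals `by_j - theta * bx_j`, scaled by `bx_j`, pile up at zero when `theta` equals the causal effect, so the tallest peak in the distribution of per-variant ratio estimates locates `theta`. The simulation settings, the fixed kernel bandwidth, and the helper names are assumptions.

```python
import numpy as np


def ratio_estimates(bx, by):
    """Per-variant Wald ratios by_j / bx_j. For a candidate causal effect
    theta, (by_j - theta * bx_j) / bx_j equals ratio_j - theta, so valid
    instruments give a spike of these residuals at zero when theta is correct."""
    return by / bx


def spike_location(values, bandwidth, grid):
    """Grid point at which a Gaussian kernel density estimate of `values` peaks."""
    z = (grid[:, None] - values[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1)
    return grid[np.argmax(density)]


# Simulated summary statistics: about 60% valid instruments and 40% with
# directional pleiotropy (a direct effect alpha on the outcome).
rng = np.random.default_rng(1)
m, theta = 500, 0.4
bx = rng.uniform(0.05, 0.15, m)
alpha = np.where(rng.random(m) < 0.4, rng.normal(0.05, 0.03, m), 0.0)
by = theta * bx + alpha + rng.normal(0.0, 0.005, m)

ratios = ratio_estimates(bx, by)
grid = np.linspace(-1.0, 2.0, 3001)
theta_spike = spike_location(ratios, bandwidth=0.05, grid=grid)
theta_mean = ratios.mean()  # naive average, pulled upward by pleiotropy
```

The naive average of the ratios absorbs the pleiotropic effects, whereas the peak location depends mainly on the variants that agree with one another, which is the intuition behind looking for a spike rather than an average.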
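The value of the population assumptions in Aim 3 can be seen in the classic case-only argument, sketched here under assumptions stated in the code: a rare disease, a logistic risk model with a multiplicative PRS-by-exposure term, and a normally distributed PRS that is independent of a binary exposure `E`. Under these assumptions the interaction is identifiable from cases alone, through the slope of PRS on `E`. The proposed retrospective methods would use both cases and controls and relax these assumptions; the function name and simulation parameters here are hypothetical.

```python
import numpy as np


def case_only_interaction(prs_cases, e_cases, prs_var):
    """Estimate the PRS-by-E interaction (log-odds scale) from cases only.

    Assumes a rare disease, PRS ~ N(mu, sigma^2) in the population, and PRS
    independent of E. Then PRS | (E, case) ~ N(mu + sigma^2 * (b_G + b_GE * E), sigma^2),
    so the slope of PRS on E among cases equals sigma^2 * b_GE.
    """
    X = np.column_stack([np.ones_like(e_cases), e_cases])
    coef, *_ = np.linalg.lstsq(X, prs_cases, rcond=None)
    return coef[1] / prs_var


# Simulated source population with a logistic risk model.
rng = np.random.default_rng(2024)
n = 1_000_000
prs = rng.normal(0.0, 1.0, n)
e = rng.binomial(1, 0.3, n).astype(float)
b_ge = 0.3
eta = -5.0 + 0.4 * prs + 0.3 * e + b_ge * prs * e
case = rng.random(n) < 1.0 / (1.0 + np.exp(-eta))

# Controls approximate the population PRS variance when the disease is rare.
b_ge_hat = case_only_interaction(prs[case], e[case], prs[~case].var())
```

Exploiting the normality and independence assumptions in this way can make interaction estimates more precise than standard logistic regression, at the cost of bias when the assumptions fail, for example if PRS and exposure are correlated in the population.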