Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America.
Department of Genetics, Yale School of Medicine, New Haven, Connecticut, United States of America.
PLoS Genet. 2021 Nov 4;17(11):e1009849. doi: 10.1371/journal.pgen.1009849. eCollection 2021 Nov.
Recent studies have demonstrated that multiple early-onset diseases share risk genes, based on findings from de novo mutations (DNMs). Therefore, we may leverage information from one trait to improve statistical power to identify genes for another trait. However, there are few methods that can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) to increase the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm both to infer the degree of association between two diseases and to estimate the gene association probability for each disease. We apply our method to a case study of jointly analyzing data from congenital heart disease (CHD) and autism. Our method identified 23 genes for CHD from joint analysis, including 12 novel genes, substantially more than single-trait analysis identified, leading to novel insights into the etiology of CHD.
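To make the Expectation-Maximization idea concrete, the following is a minimal sketch and not the M-DATA implementation. The abstract does not specify the model, so everything here is an assumption made for illustration: DNM counts per gene are treated as Poisson, each gene belongs to one of four latent classes (risk for neither trait, trait 1 only, trait 2 only, or both), risk genes carry a single enrichment factor per trait, and functional annotations are omitted. The function name `em_two_trait`, its parameters, and the simulated data are all hypothetical.

```python
import numpy as np

# Latent classes as (risk for trait 1, risk for trait 2). Assumed structure.
CLASSES = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)


def em_two_trait(y1, y2, base1, base2, n_iter=500, tol=1e-8):
    """EM for a four-class Poisson mixture over two traits (illustrative only).

    y1, y2       : observed DNM counts per gene for each trait
    base1, base2 : expected DNM counts per gene under the null (assumed known)
    Returns class proportions, per-trait enrichment factors, and the
    posterior probability that each gene is a risk gene for each trait.
    """
    y = np.stack([y1, y2], axis=1).astype(float)           # (genes, 2)
    base = np.stack([base1, base2], axis=1).astype(float)  # (genes, 2)
    pi = np.array([0.85, 0.05, 0.05, 0.05])  # arbitrary starting proportions
    gamma = np.array([5.0, 5.0])             # arbitrary starting enrichment

    for _ in range(n_iter):
        # E-step: posterior class membership for every gene. The log(y!)
        # term is the same for all classes, so it cancels and is omitted.
        log_rate = np.log(base)[:, None, :] + CLASSES[None, :, :] * np.log(gamma)
        loglik = (y[:, None, :] * log_rate - np.exp(log_rate)).sum(axis=2)
        logpost = np.log(pi) + loglik
        logpost -= logpost.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logpost)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: closed-form updates for proportions and enrichment.
        new_pi = np.clip(resp.mean(axis=0), 1e-12, None)
        new_pi /= new_pi.sum()
        risk_prob = resp @ CLASSES  # (genes, 2): P(risk gene for trait k)
        new_gamma = (risk_prob * y).sum(axis=0) / (risk_prob * base).sum(axis=0)
        # Assumption: risk genes are enriched for DNMs, so gamma >= 1.
        new_gamma = np.maximum(new_gamma, 1.0)

        change = max(np.abs(new_pi - pi).max(), np.abs(new_gamma - gamma).max())
        pi, gamma = new_pi, new_gamma
        if change < tol:
            break

    return pi, gamma, risk_prob


# Simulated data for demonstration only; these are not values from the study.
rng = np.random.default_rng(0)
n_genes = 2000
true_class = rng.choice(4, size=n_genes, p=[0.85, 0.05, 0.05, 0.05])
true_risk = CLASSES[true_class]                 # (genes, 2)
base = np.full((n_genes, 2), 0.05)              # null expected counts
counts = rng.poisson(base * 20.0 ** true_risk)  # 20-fold enrichment in risk genes

pi_hat, gamma_hat, risk_prob = em_two_trait(
    counts[:, 0], counts[:, 1], base[:, 0], base[:, 1]
)
```

In this sketch, `pi_hat[3]` is the estimated proportion of genes that are risk genes for both traits, which is one simple way to summarize how strongly the two traits overlap, and `risk_prob` gives a per-gene association probability for each trait. Sharing the latent classes across traits is what lets evidence from one trait shift the posterior for the other.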