Ionita-Laza Iuliana, McCallum Kenneth, Xu Bin, Buxbaum Joseph D
Department of Biostatistics, Columbia University, New York, New York, USA.
Department of Psychiatry, Columbia University, New York, New York, USA.
Nat Genet. 2016 Feb;48(2):214-20. doi: 10.1038/ng.3477. Epub 2016 Jan 4.
Over the past few years, substantial effort has been put into the functional annotation of variation in human genome sequences. Such annotations can have a critical role in identifying putatively causal variants for a disease or trait among the abundant natural variation that occurs at a locus of interest. The main challenges in using these various annotations include their large numbers and their diversity. Here we develop an unsupervised approach to integrate these different annotations into one measure of functional importance (Eigen) that, unlike most existing methods, is not based on any labeled training data. We show that the resulting meta-score has better discriminatory ability using disease-associated and putatively benign variants from published studies (in both coding and noncoding regions) than the recently proposed CADD score. Across varied scenarios, the Eigen score performs generally better than any single individual annotation, representing a powerful single functional score that can be incorporated in fine-mapping studies.
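To make the idea of an unsupervised meta-score concrete, here is a minimal sketch that combines several annotations into one score without using any labels. It is an illustration only and should not be read as the procedure used for Eigen. It assumes a PCA-style stand-in: each annotation is standardized, and the weights are taken from the leading eigenvector of the annotation correlation matrix. The toy data, the function names and the choice of power iteration are all hypothetical.

```python
import math
import random


def standardize(columns):
    """Scale each annotation (one list per annotation) to mean 0, variance 1."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        sd = math.sqrt(var) or 1.0  # guard against a constant annotation
        out.append([(x - mean) / sd for x in col])
    return out


def correlation_matrix(zcols):
    """Correlation matrix of already standardized annotations."""
    n = len(zcols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in zcols]
            for ci in zcols]


def leading_eigenvector(matrix, iterations=500):
    """Approximate the leading eigenvector by power iteration."""
    k = len(matrix)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(k)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    if sum(vec) < 0:  # fix the arbitrary sign so weights are mostly positive
        vec = [-x for x in vec]
    return vec


def meta_scores(columns):
    """Return (weights, scores); labels are never used (assumed PCA-style weighting)."""
    z = standardize(columns)
    w = leading_eigenvector(correlation_matrix(z))
    n = len(z[0])
    scores = [sum(w[j] * z[j][i] for j in range(len(z))) for i in range(n)]
    return w, scores


# Hypothetical toy data: 400 variants, the first 100 "functional".
rng = random.Random(0)
is_functional = [1 if i < 100 else 0 for i in range(400)]
# Three noisy annotations that track functional status, plus one pure-noise annotation.
annotations = [[f + rng.gauss(0, 0.5) for f in is_functional] for _ in range(3)]
annotations.append([rng.gauss(0, 1) for _ in is_functional])

weights, scores = meta_scores(annotations)
print("weights:", [round(w, 3) for w in weights])
```

In this toy setting the status labels are used only to simulate the data and to inspect the result afterwards, never to fit the weights. Annotations that agree with one another receive larger weights, while the uninformative one is down-weighted.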