Shihab Hashem A, Rogers Mark F, Gough Julian, Mort Matthew, Cooper David N, Day Ian N M, Gaunt Tom R, Campbell Colin
MRC Integrative Epidemiology Unit (IEU), University of Bristol, Bristol BS8 2BN, UK, Bristol Centre for Systems Biomedicine, University of Bristol, Bristol BS8 2BN, UK, Intelligent Systems Laboratory, University of Bristol, Bristol BS8 1UB, UK, Department of Computer Science, University of Bristol, Bristol BS8 1UB, UK and Institute of Medical Genetics, Cardiff University, Cardiff CF14 4XN, UK.
Bioinformatics. 2015 May 15;31(10):1536-43. doi: 10.1093/bioinformatics/btv009. Epub 2015 Jan 11.
Technological advances have enabled the identification of an increasingly large spectrum of single nucleotide variants within the human genome, many of which may be associated with monogenic disease or complex traits. Here, we propose an integrative approach, named FATHMM-MKL, to predict the functional consequences of both coding and non-coding sequence variants. Our method utilizes various genomic annotations, which have recently become available, and learns to weight the significance of each component annotation source.
We show that our method outperforms current state-of-the-art algorithms, CADD and GWAVA, when predicting the functional consequences of non-coding variants. In addition, FATHMM-MKL is comparable to the best of these algorithms when predicting the impact of coding variants. The method includes a confidence measure that can be used to rank-order predictions.
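The idea of learning a weight for each annotation source and then ranking predictions by confidence can be illustrated with a small sketch. This is not the FATHMM-MKL implementation: the weighting rule (kernel-target alignment), the kernel-vote scorer, all function names, and the toy data are assumptions chosen only to show the general shape of a multiple-kernel approach in pure Python.

```python
# Illustrative sketch only; NOT the FATHMM-MKL implementation.
# Assumptions: kernel-target alignment stands in for a real MKL optimiser,
# a simple kernel-weighted vote stands in for the classifier, and the
# data below are invented toy values.
from math import exp


def rbf_kernel(X, gamma=1.0):
    """Gram matrix of an RBF kernel over the rows of X."""
    n = len(X)
    return [[exp(-gamma * sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
             for j in range(n)] for i in range(n)]


def alignment(K, y):
    """Kernel-target alignment <K, yy^T> / (||K|| * ||yy^T||), y in {-1, +1}."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = sum(K[i][j] ** 2 for i in range(n) for j in range(n)) ** 0.5
    return num / (norm_k * n)  # ||yy^T||_F equals n for +/-1 labels


def kernel_weights(kernels, y):
    """One non-negative weight per source, normalised to sum to 1."""
    raw = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(kernels)] * len(kernels)
    return [r / total for r in raw]


def combine(kernels, weights):
    """Weighted sum of the per-source Gram matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


def vote_scores(K, y):
    """Signed score per example: kernel-weighted average of the labels."""
    n = len(y)
    return [sum(K[i][j] * y[j] for j in range(n)) / n for i in range(n)]


# Toy data: four variants, two hypothetical annotation sources.
labels = [1, 1, -1, -1]                      # +1 functional, -1 neutral
source_a = [[0.9], [1.0], [0.0], [0.1]]      # separates the two classes
source_b = [[0.5], [0.1], [0.4], [0.2]]      # carries little signal

kernels = [rbf_kernel(source_a), rbf_kernel(source_b)]
weights = kernel_weights(kernels, labels)
scores = vote_scores(combine(kernels, weights), labels)

# Rank variants from most to least confident using the score magnitude.
ranking = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
```

In this toy setting the informative source receives almost all of the weight, and the magnitude of each signed score serves as a stand-in for a confidence measure. A real system would learn the weights jointly with a classifier and evaluate on held-out variants.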