Department of Biological, Geological, and Environmental Sciences (BiGeA), University of Bologna, Via F. Selmi 3, Bologna 40126, Italy.
Department of Comparative Biomedicine and Food Science, University of Padova, Viale dell'Università, 16, 35020 Legnaro, PD, Italy.
Nucleic Acids Res. 2017 Jul 3;45(W1):W247-W252. doi: 10.1093/nar/gkx369.
One of the major challenges in human genetics is to identify the functional effects of coding and non-coding single nucleotide variants (SNVs). Several methods have been developed to identify disease-related single amino acid changes, but only a few tools are able to score the impact of non-coding variants. Among the most popular algorithms, CADD and FATHMM predict the effect of SNVs in non-coding regions by combining sequence conservation with several functional features derived from ENCODE project data. As a consequence, running CADD or FATHMM locally requires downloading a large set of pre-calculated information at installation time. To facilitate the process of variant annotation, we developed PhD-SNPg, a new easy-to-install and lightweight machine learning method that depends only on sequence-based features. Despite this reduced feature set, PhD-SNPg performs similarly to or better than more complex methods. This makes PhD-SNPg well suited for quick SNV interpretation and as a benchmark for tool development.
PhD-SNPg is accessible at http://snps.biofold.org/phd-snpg.
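To illustrate what "sequence-based features" can mean in practice, the following is a minimal sketch of encoding an SNV from local sequence alone. It is not the PhD-SNPg implementation: the function names (`one_hot`, `encode_snv`), the window size (`flank`), and the one-hot scheme are all assumptions chosen for illustration, and the abstract does not specify which features the method actually uses.

```python
from typing import List

BASES = "ACGT"


def one_hot(base: str) -> List[int]:
    """Return a 4-element indicator vector; unknown bases (e.g. N) map to all zeros."""
    return [1 if base.upper() == b else 0 for b in BASES]


def encode_snv(sequence: str, pos: int, alt: str, flank: int = 2) -> List[int]:
    """Encode an SNV as a flat feature vector (hypothetical scheme).

    sequence: reference nucleotide sequence
    pos:      0-based position of the variant in `sequence`
    alt:      alternate (mutant) nucleotide
    flank:    number of reference bases taken on each side (assumed value)
    """
    if not 0 <= pos < len(sequence):
        raise ValueError("pos is outside the sequence")
    features: List[int] = []
    # Reference window centred on the variant; positions past either end are padded with N.
    for i in range(pos - flank, pos + flank + 1):
        base = sequence[i] if 0 <= i < len(sequence) else "N"
        features.extend(one_hot(base))
    # Append the alternate allele so the classifier sees the substitution itself.
    features.extend(one_hot(alt))
    return features


if __name__ == "__main__":
    vec = encode_snv("ACGTACGTAC", pos=4, alt="G")
    print(len(vec))  # (2*2 + 1) * 4 + 4 = 24 features
```

A vector like this needs no pre-calculated annotation tracks, which is the practical advantage the abstract attributes to a sequence-only approach; any real predictor would feed such features into a trained classifier.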