Division of Informatics, Department of Pathology, University of Alabama at Birmingham, Birmingham AL, USA.
BMC Genomics. 2013;14 Suppl 3(Suppl 3):S6. doi: 10.1186/1471-2164-14-S3-S6. Epub 2013 May 28.
SNPs&GO is a method for predicting deleterious Single Amino acid Polymorphisms (SAPs) using protein functional annotation. In this work, we present the web server implementation of SNPs&GO (WS-SNPs&GO). The server is based on Support Vector Machines (SVM). For a given protein, its input comprises the sequence and/or the three-dimensional structure (when available), a set of target variations, and the protein's functional Gene Ontology (GO) terms. For each protein variation, the server outputs the probability that the variation is associated with human disease.
The server consists of two main components: updated versions of the sequence-based SNPs&GO (recently scored as one of the best algorithms for predicting deleterious SAPs) and of the structure-based SNPs&GO(3d). Both the sequence-based and structure-based algorithms were extensively tested on a large set of annotated variations extracted from the SwissVar database. On a balanced dataset of more than 38,000 SAPs, the sequence-based approach achieves 81% overall accuracy, a correlation coefficient of 0.61, and an Area Under the Receiver Operating Characteristic (ROC) curve (AUC) of 0.88. On the subset of ~6,600 variations mapped onto protein structures available in the Protein Data Bank (PDB), the structure-based method achieves 84% overall accuracy, a correlation coefficient of 0.68, and an AUC of 0.91. When tested on a new blind set of variations, the server achieves 79% and 83% overall accuracy for sequence-based and structure-based inputs, respectively.
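The accuracy, correlation coefficient and AUC figures above are standard binary-classification metrics. The Python below is a minimal sketch of how such figures can be computed from per-variation labels and predicted probabilities; it is not the authors' evaluation code. It assumes that the reported correlation coefficient is the Matthews correlation coefficient, that disease-associated (deleterious) variations are labelled 1 and neutral ones 0, and that a predicted probability of 0.5 or above counts as a deleterious prediction. All function names are hypothetical.

```python
import math

def confusion(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = deleterious, 0 = neutral)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Overall accuracy: fraction of variations classified correctly."""
    tp, tn, _, _ = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient (assumed to be the reported 'correlation coefficient')."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney formulation: the probability that a random
    deleterious variation scores higher than a random neutral one (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def threshold(scores, cutoff=0.5):
    """Turn probabilities into labels; the 0.5 cutoff is an assumption."""
    return [1 if s >= cutoff else 0 for s in scores]

if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    probs = [0.9, 0.4, 0.6, 0.1]  # toy probabilities, not SNPs&GO output
    y_pred = threshold(probs)
    print(accuracy(y_true, y_pred), mcc(y_true, y_pred), roc_auc(y_true, probs))
```

On this toy input the script prints `0.5 0.0 0.75`: one deleterious and one neutral variation are misclassified at the 0.5 cutoff, giving chance-level accuracy and MCC, while the ranking of probabilities is still better than random. AUC is threshold-free, which is why it is often reported alongside accuracy and MCC.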
WS-SNPs&GO is a valuable tool that integrates, in a single framework, information derived from protein sequence, structure, evolutionary profile, and protein function. WS-SNPs&GO is freely available at http://snps.biofold.org/snps-and-go.