Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information.

Author information

Bao Lei, Cui Yan

Affiliation

Department of Molecular Sciences, Center of Genomics and Bioinformatics, University of Tennessee Health Science Center, 858 Madison Avenue, Memphis, TN 38163, USA.

Publication information

Bioinformatics. 2005 May 15;21(10):2185-90. doi: 10.1093/bioinformatics/bti365. Epub 2005 Mar 3.

Abstract

MOTIVATION

There have been high expectations that knowledge of an individual's genotype will provide a basis for assessing susceptibility to diseases and designing individualized therapy. Non-synonymous single nucleotide polymorphisms (nsSNPs), which lead to an amino acid change in the protein product, are of particular interest because they account for nearly half of the known genetic variations related to human inherited diseases. To facilitate the identification of disease-associated nsSNPs among a large number of neutral nsSNPs, it is important to develop computational tools that predict the phenotypic effects of nsSNPs.

RESULTS

We prepared a training set based on the variant phenotypic annotation of the Swiss-Prot database and focused our analysis on nsSNPs having homologous 3D structures. Structural environment parameters derived from the homologous 3D structure, as well as evolutionary information derived from the multiple sequence alignment, were used as predictors. Two machine learning methods, support vector machine and random forest, were trained and evaluated. We compared the performance of our method with that of the SIFT algorithm, which is one of the best predictive methods to date. An unbiased evaluation study shows that for nsSNPs with sufficient evolutionary information (at least 10 homologous sequences), the performance of our method is comparable to that of the SIFT algorithm, while for nsSNPs with insufficient evolutionary information (fewer than 10 homologous sequences), our method significantly outperforms the SIFT algorithm. These findings indicate that incorporating structural information is critical to achieving good prediction accuracy when sufficient evolutionary information is not available.
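
The key comparison splits the evaluated nsSNPs by how many homologous sequences are available and scores each method separately within each group. A minimal Python sketch of such a stratified comparison follows. It assumes each variant already carries a disease/neutral label and a binary prediction from each method. The `Variant` record, its field names, and the use of plain accuracy as the metric are illustrative assumptions, not the authors' code or evaluation protocol.

```python
from dataclasses import dataclass

# Hypothetical per-variant record; field names are illustrative only.
@dataclass
class Variant:
    n_homologs: int        # number of homologous sequences in the alignment
    is_disease: bool       # true label (e.g. from Swiss-Prot variant annotation)
    pred_structural: bool  # prediction from a structure + evolution model
    pred_sift: bool        # prediction from SIFT

MIN_HOMOLOGS = 10  # threshold separating "sufficient" from "insufficient" evolutionary information


def accuracy(variants, attr):
    """Fraction of variants whose prediction in field `attr` matches the label (NaN if empty)."""
    if not variants:
        return float("nan")
    correct = sum(getattr(v, attr) == v.is_disease for v in variants)
    return correct / len(variants)


def stratified_accuracy(variants, threshold=MIN_HOMOLOGS):
    """Score both predictors separately on variants with >= threshold and < threshold homologs."""
    strata = {
        "sufficient": [v for v in variants if v.n_homologs >= threshold],
        "insufficient": [v for v in variants if v.n_homologs < threshold],
    }
    return {
        name: {
            "n": len(group),
            "structural": accuracy(group, "pred_structural"),
            "sift": accuracy(group, "pred_sift"),
        }
        for name, group in strata.items()
    }


# Toy records invented purely for illustration.
toy = [
    Variant(n_homologs=25, is_disease=True, pred_structural=True, pred_sift=True),
    Variant(n_homologs=40, is_disease=False, pred_structural=False, pred_sift=False),
    Variant(n_homologs=3, is_disease=True, pred_structural=True, pred_sift=False),
    Variant(n_homologs=5, is_disease=False, pred_structural=False, pred_sift=True),
]
result = stratified_accuracy(toy)
print(result)
```

The toy records say nothing about the paper's actual results. Scoring each method within each stratum, rather than on the pooled set, keeps an advantage on poorly aligned variants from being averaged away by the larger, well-aligned group.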

AVAILABILITY

The codes and curated dataset are available at http://compbio.utmem.edu/snp/dataset/
