Identification of deleterious non-synonymous single nucleotide polymorphisms using sequence-derived information.

Authors

Hu Jing, Yan Changhui

Affiliation

Department of Computer Science, Utah State University, Logan, UT 84322, USA.

Publication

BMC Bioinformatics. 2008 Jun 27;9:297. doi: 10.1186/1471-2105-9-297.

DOI:10.1186/1471-2105-9-297
PMID:18588693
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2446391/
Abstract

BACKGROUND

As the number of non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), increases rapidly, computational methods that can distinguish disease-causing SAPs from neutral SAPs are needed. Many methods have been developed to distinguish disease-causing SAPs based on both structural and sequence features of the mutation point. One limitation of these methods is that they are not applicable to the cases where protein structures are not available. In this study, we explore the feasibility of classifying SAPs into disease-causing and neutral mutations using only information derived from protein sequence.

RESULTS

We compiled a set of 686 features derived from protein sequence. For each feature, the distance between the wild-type residue and the mutant-type residue was computed. A greedy approach was then used to select the features that were useful for classifying SAPs, and ten features were selected. Using the selected features, a decision tree method achieves 82.6% overall accuracy with a Matthews Correlation Coefficient (MCC) of 0.607 in cross-validation. When tested on an independent set that was not seen by the method during training and feature selection, the decision tree method achieves 82.6% overall accuracy with an MCC of 0.604. We also evaluated the proposed method on all SAPs obtained from Swiss-Prot, where it achieves an MCC of 0.42 with 73.2% overall accuracy. This method allows users to make reliable predictions when protein structures are not available. Unlike previous studies, in which only a small set of features was arbitrarily chosen and considered, we used an automated method to systematically discover useful features from a large set of features that are well annotated in public databases.
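The pipeline described in this abstract (per-feature distances, greedy feature selection, MCC as the quality measure) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The two property tables, the use of an absolute difference as the "distance", the stopping rule, and the names `FEATURES`, `feature_distances`, `mcc`, and `greedy_select` are all assumptions made for illustration; the abstract does not specify these details.

```python
import math
from typing import Callable, Dict, List, Sequence

# Toy per-residue property tables for five amino acids.
# Illustrative values only: these are NOT the 686 features used in the paper.
FEATURES: Dict[str, Dict[str, float]] = {
    "toy_hydropathy": {"A": 1.8, "D": -3.5, "G": -0.4, "L": 3.8, "R": -4.5},
    "toy_volume": {"A": 1.0, "D": 2.0, "G": 0.0, "L": 4.0, "R": 5.0},
}


def feature_distances(wild: str, mutant: str, names: Sequence[str]) -> List[float]:
    """Distance between wild-type and mutant residue for each named feature.

    Assumption: the distance is the absolute difference of the property values.
    """
    return [abs(FEATURES[n][wild] - FEATURES[n][mutant]) for n in names]


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews Correlation Coefficient for binary labels (1 = disease, 0 = neutral)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def greedy_select(
    candidates: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: int,
) -> List[str]:
    """Greedy forward selection.

    At each step, add the candidate that gives the highest score for the
    enlarged subset; stop when no candidate strictly improves the score
    or when max_features is reached.
    """
    selected: List[str] = []
    best = float("-inf")
    while len(selected) < max_features:
        trials = [(score(selected + [c]), c) for c in candidates if c not in selected]
        if not trials:
            break
        top_score, top = max(trials)
        if top_score <= best:
            break
        selected.append(top)
        best = top_score
    return selected
```

In practice, `score` would train a classifier (the abstract uses a decision tree) on the distance vectors for a candidate feature subset and return its cross-validated MCC; here it is left as a caller-supplied function so that the sketch stays independent of any particular library.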

CONCLUSION

The proposed method is a useful tool for the classification of SAPs, especially when the structure of the protein is not available.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/292c/2446391/9276cccfcd99/1471-2105-9-297-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/292c/2446391/9cfceb678212/1471-2105-9-297-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/292c/2446391/964c720297da/1471-2105-9-297-3.jpg

Similar articles

1
Identification of deleterious non-synonymous single nucleotide polymorphisms using sequence-derived information.
BMC Bioinformatics. 2008 Jun 27;9:297. doi: 10.1186/1471-2105-9-297.
2
Predicting disease-associated substitution of a single amino acid by analyzing residue interactions.
BMC Bioinformatics. 2011 Jan 12;12:14. doi: 10.1186/1471-2105-12-14.
3
Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties.
PLoS One. 2010 Jul 30;5(7):e11900. doi: 10.1371/journal.pone.0011900.
4
Combination use of protein-protein interaction network topological features improves the predictive scores of deleterious non-synonymous single-nucleotide polymorphisms.
Amino Acids. 2014 Aug;46(8):2025-35. doi: 10.1007/s00726-014-1760-9. Epub 2014 May 22.
5
SySAP: a system-level predictor of deleterious single amino acid polymorphisms.
Protein Cell. 2012 Jan;3(1):38-43. doi: 10.1007/s13238-011-1130-2. Epub 2011 Dec 19.
6
Improving the prediction of disease-related variants using protein three-dimensional structure.
BMC Bioinformatics. 2011;12 Suppl 4(Suppl 4):S3. doi: 10.1186/1471-2105-12-S4-S3. Epub 2011 Jul 5.
7
Predicting deleterious nsSNPs: an analysis of sequence and structural attributes.
BMC Bioinformatics. 2006 Apr 21;7:217. doi: 10.1186/1471-2105-7-217.
8
Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information.
Bioinformatics. 2005 May 15;21(10):2185-90. doi: 10.1093/bioinformatics/bti365. Epub 2005 Mar 3.
9
CoagVDb: a comprehensive database for coagulation factors and their associated SAPs.
Biol Res. 2015 Jul 19;48(1):35. doi: 10.1186/s40659-015-0028-5.
10
Statistical geometry based prediction of nonsynonymous SNP functional effects using random forest and neuro-fuzzy classifiers.
Proteins. 2008 Jun;71(4):1930-9. doi: 10.1002/prot.21838.

Cited by

1
Effect of Structural Changes in Proteins Derived from GATA4 Nonsynonymous Single Nucleotide Polymorphisms in Congenital Heart Disease.
Indian J Pharm Sci. 2015 Nov-Dec;77(6):735-41. doi: 10.4103/0250-474x.174988.
2
Data mining strategies to improve multiplex microbead immunoassay tolerance in a mouse model of infectious diseases.
PLoS One. 2015 Jan 23;10(1):e0116262. doi: 10.1371/journal.pone.0116262. eCollection 2015.
3
DDIG-in: discriminating between disease-associated and neutral non-frameshifting micro-indels.
Genome Biol. 2013 Mar 13;14(3):R23. doi: 10.1186/gb-2013-14-3-r23.
4
Exploring functional variant discovery in non-coding regions with SInBaD.
Nucleic Acids Res. 2013 Jan 7;41(1):e7. doi: 10.1093/nar/gks800. Epub 2012 Aug 31.
5
Predicting the effects of frameshifting indels.
Genome Biol. 2012 Feb 9;13(2):R9. doi: 10.1186/gb-2012-13-2-r9.
6
Population and computational analysis of the MGEA6 P521A variation as a risk factor for familial idiopathic basal ganglia calcification (Fahr's disease).
J Mol Neurosci. 2011 Mar;43(3):333-6. doi: 10.1007/s12031-010-9445-7. Epub 2010 Sep 14.
7
Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties.
PLoS One. 2010 Jul 30;5(7):e11900. doi: 10.1371/journal.pone.0011900.

References

1
SNAP: predict effect of non-synonymous polymorphisms on function.
Nucleic Acids Res. 2007;35(11):3823-35. doi: 10.1093/nar/gkm238. Epub 2007 May 25.
2
Finding new structural and sequence attributes to predict possible disease association of single amino acid polymorphism (SAP).
Bioinformatics. 2007 Jun 15;23(12):1444-50. doi: 10.1093/bioinformatics/btm119. Epub 2007 Mar 24.
3
PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways.
Nucleic Acids Res. 2007 Jan;35(Database issue):D247-52. doi: 10.1093/nar/gkl869. Epub 2006 Nov 27.
4
Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information.
Bioinformatics. 2006 Nov 15;22(22):2729-34. doi: 10.1093/bioinformatics/btl423. Epub 2006 Aug 7.
5
Predicting deleterious nsSNPs: an analysis of sequence and structural attributes.
BMC Bioinformatics. 2006 Apr 21;7:217. doi: 10.1186/1471-2105-7-217.
6
Use of bioinformatics tools for the annotation of disease-associated mutations in animal models.
Proteins. 2005 Dec 1;61(4):878-87. doi: 10.1002/prot.20664.
7
Loss of protein structure stability as a major causative factor in monogenic disease.
J Mol Biol. 2005 Oct 21;353(2):459-73. doi: 10.1016/j.jmb.2005.08.020.
8
Prediction of the phenotypic effects of non-synonymous single nucleotide polymorphisms using structural and evolutionary information.
Bioinformatics. 2005 May 15;21(10):2185-90. doi: 10.1093/bioinformatics/bti365. Epub 2005 Mar 3.
9
Predicting disease using genomics.
Nature. 2004 May 27;429(6990):453-6. doi: 10.1038/nature02624.
10
The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants.
Hum Mutat. 2004 May;23(5):464-70. doi: 10.1002/humu.20021.